Sunday, August 26, 2007

How about remaking the world?

How about remaking the world with one click? Making it more structured, more proper, less verbose. And less human? Less full of mistakes, but maybe making the genius of it visible at first sight.

But then - what would there be left to do?

Sunday, August 19, 2007

The 'what to do now' problem

I'm extremely passionate about programming (especially Lisp and Python), about creating simple and effective solutions really quickly, about reusing existing solutions. Over the last month or two I've created an extremely scalable IM server (scales infinitely, handles tens of thousands of concurrent connections per server), a simple in-memory cache server, and the BPM software mentioned in the previous post. Then there is the tourism site I've made, which isn't online yet, but I'm finally going to release it next week.

OK, but that's done. So what next? Many ideas float around. Ideas for the software I listed above arose from everyday problems. Grono needed an IM server and a cache - so I wrote it. Grono needed BPM - so I wrote one. I'm having problems with organizing my free time - so I made that tourism site to solve that problem.

Problem is I seem not to have any more problems left. I hope some will come soon, or maybe you have some problems to be solved?

Business process/workflow management (aka BPM)

The BPM software I've been working on is practically finished. It's very simple to use, yet enables the users to closely track all the processes going on in their department/company. Nothing ever gets lost, and for every task there is always a responsible person and a deadline. All the processes are visualized so you can easily verify their correctness.

The software is closely integrated with email, contains only 2000 lines of code so it's highly manageable, and sports a very easy to use plugin architecture enabling users to create custom code to be called whenever ANYTHING changes in the system. And I mean anything, from new user creation to adding a comment to a task.

We've just started using this project at grono to manage our IT department. Once it matures a bit I will probably put this project online for you to try out.

It's 100% Python (and consequently multiplatform) and based on Pylons.

Sunday, August 12, 2007

Nokia E90 review

The communicator is finally reborn, and finally works :)

Instead of naming the virtues of this phone I will name the problems with it. I love this phone. Everything that I don't mention is great. So the problems are:
  1. Series 60 software is not adjusted to its wide screen (in many places it could utilise it a lot better).
  2. Slow switching between the external and internal screen.
If I find any more problems I will report them immediately, but at last a communicator has quick software, a great camera, excellent reception, excellent sound quality and it does not hang (one random restart after a USB transfer in over a month of usage).

Good job Nokia!

My work

For a long time I've been wondering what my work is about and what I would like to do. Well, now I've come to a conclusion:
  1. I want to concentrate on grono, make it a worldwide company.
  2. I found that I'm a backend programmer. I don't like implementing frontends because it's non-creative work, even though I consider myself very experienced in designing them, and so I will focus on backend solutions/programming.
A bunch of ideas are floating around in my head; many things can be done better than they are now, based on my vast experience with grono. And I will work on many projects enriching the available solutions. Right now I'm working on:
  1. Business workflow software - I want to use it inside grono and I wonder if I will be able to surpass Microsoft's solutions with its simplicity and integration (if not - we will probably use their solutions at grono).
  2. Free time/tourism website. This is partly GUI related work so it's been waiting to be finished for the last 1-2 months while I concentrated exclusively on grono, but now it is time to finally finish it.
  3. TODO list site. This isn't even in the planning stage; I want to wrap up the first two.
So that's about all of my secret plans :)

Saturday, October 28, 2006

Web 2.0 conferences

I'm preparing a speech on Web 2.0 for the PTI conference in Wisla (Poland). And then I'm showing up at Stanford (USA) on November 29th for the Web 2.0 conference. So I have to prepare two presentations. I don't want them to be identical so it's an even harder job.

Web 2.0 is talked about all the time and it's been defined by numerous people. But those aren't the things which are making the task of talking about it hard. What does make it a difficult task is that the whole idea is actually very simple. Web 2.0 is just about doing things the 'natural' way. It's about conveying thoughts, feelings and all other attributes of humanity to computers.

No longer are websites designed by technology people with technology in mind. Web 2.0 sites are designed by humans with all the attributes of humanity in mind. And humanity is something every one of us 'feels' and knows. So is there anything new to say about it? I hope I find those things and manage to draw the attention of conference guests to my presentation.

Sunday, August 20, 2006

Writing a distributed filesystem in 24 hours

Motivation
So you want to run Flickr and YouTube out of business with your brand new site. Whether it happens or not - you need a decent distributed filesystem to store the millions of files your users will upload.

You don't need full filesystem functionality (i.e. seeking files, reading/writing parts of files, using the filesystem as a normal Linux FS) for this purpose, so GFS is probably not for you. Besides, it seems to be kernel-based (and I hate Linux kernel-based solutions for their instability) and it also seems to be RedHat only, so it's not really a choice for you - the true hacker. There is also Coda and AFS but both seem to be immature and I wouldn't bet any money on any of those.

What you need are simple put/get file operations. This narrows your choice down to MogileFS. MogileFS has some bad things about it:
  1. It's Perl-based (yuck!)
  2. It requires a central database to store the locations of files, and when your load rises the database gets more and more hits, possibly becoming the bottleneck of the whole deployment.
If you can cope with these then you're all set. But if you want to achieve infinite scalability and get rid of the central database - read on.

Prerequisites
What you need in order to hack a fully fledged distributed FS:
  • Python, the best language a business can use without risking the lack of programmers (Common Lisp is better, but I don't know anyone who knows it except for me)
  • Some means of communication between the client and the FS - I chose HTTP (using Python's builtin urllib on the client side and Django on the server side), but you can use Twisted's PB or any other communications protocol.
Well.. that's it. No database required, no kernel hacks, no hidden costs.

So how DOES it work?
Let's say you have 9 servers to use for the filesystem, each with 400GB of storage (2 SATA drives in RAID 0). We will divide those into 3 clusters, each cluster consisting of 3 servers. So we will replicate every file in the FS to 3 independent machines. This way the machines can have cheap SATA-based hard drives running in RAID 0, and when a machine fails we still have 2 other computers storing the same files.
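To make the layout concrete, here is a minimal sketch of how those 9 machines could be grouped. The hostnames and the make_clusters helper are made up for illustration; only the numbers (9 servers, 400GB each, 3 copies of every file) come from the example above.

```python
# Hypothetical hostnames; substitute your own machines.
SERVERS = [f"fs{i}.example.com" for i in range(1, 10)]
REPLICAS = 3          # every file lives on 3 independent machines
GB_PER_SERVER = 400   # 2 SATA drives in RAID 0


def make_clusters(servers, replicas):
    """Split the server list into consecutive groups of `replicas` machines."""
    if len(servers) % replicas != 0:
        raise ValueError("server count must be a multiple of the replica count")
    return [servers[i:i + replicas] for i in range(0, len(servers), replicas)]


CLUSTERS = make_clusters(SERVERS, REPLICAS)

# Each cluster stores one full copy per machine, so the usable space
# of a cluster equals the space of a single server.
USABLE_GB = len(CLUSTERS) * GB_PER_SERVER
```

With these numbers you get 3 clusters and 1200GB of usable space out of 3600GB of raw disk, which is the price of keeping 3 copies of everything.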

On each of the servers there is an HTTP server running which channels all requests to a Python program supporting two operations - putFile and getFile. It can be a single Django view, or two Python methods exported by PB. Whatever it is, it's simple and you can implement it in a matter of minutes.

The whole magic is inside the client. What the client does is treat our server clusters as hashtable buckets. Based on the name of the file we want to put into or get from the FS, it calculates a hash of that name (MD5, for example, which is fast and available in Python's standard library) and takes it modulo the number of clusters. The result is the number of the cluster to which this file belongs. If it's a putFile operation we're doing - we execute this operation on ALL machines in this cluster, and if it's a getFile - we execute it on a random one (and if it's not working - we try another one and so on).
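Here is a minimal sketch of that client logic. The hostnames, the URL scheme (PUT and GET on /name) and the function names are assumptions for illustration, and the transport functions are passed in as parameters so you could swap HTTP for PB or anything else.

```python
import hashlib
import random
import urllib.request
from urllib.parse import quote

# Hypothetical hostnames: 3 clusters of 3 servers each.
CLUSTERS = [
    ["fs1.example.com", "fs2.example.com", "fs3.example.com"],
    ["fs4.example.com", "fs5.example.com", "fs6.example.com"],
    ["fs7.example.com", "fs8.example.com", "fs9.example.com"],
]


def cluster_index(name, num_clusters):
    """Hash the file name with MD5 and take it modulo the number of clusters."""
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_clusters


def http_put(server, name, data):
    req = urllib.request.Request(
        f"http://{server}/{quote(name)}", data=data, method="PUT")
    with urllib.request.urlopen(req, timeout=10) as resp:
        resp.read()


def http_get(server, name):
    url = f"http://{server}/{quote(name)}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()


def put_file(name, data, clusters=CLUSTERS, put=http_put):
    """Write the file to ALL machines of the cluster it hashes to."""
    for server in clusters[cluster_index(name, len(clusters))]:
        put(server, name, data)


def get_file(name, clusters=CLUSTERS, get=http_get):
    """Read from a random machine of the cluster, falling back to the others."""
    servers = list(clusters[cluster_index(name, len(clusters))])
    random.shuffle(servers)
    last_error = None
    for server in servers:
        try:
            return get(server, name)
        except OSError as exc:  # urllib's URLError and HTTPError are OSErrors
            last_error = exc
    raise OSError(f"no replica could serve {name!r}") from last_error
```

Note that put_file in this sketch simply stops at the first machine that fails, which leaves the cluster with fewer copies than intended. That is the lack of transactions at work, and a real client would need some retry or repair step on top.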

Simple, easy to implement and effective. There are of course a bunch of problems with this implementation (adding clusters, lack of transactions, etc.) but all of these can be overcome. One thing is certain - you can have it up and running within 24 hours.
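To see why adding clusters is a problem with plain modulo hashing, here is a small sketch that counts how many files would land in a different cluster when going from 3 to 4 clusters. The file names are made up and moved_fraction is a helper invented for this example.

```python
import hashlib


def cluster_index(name, num_clusters):
    """Same scheme as above: MD5 of the name, modulo the number of clusters."""
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_clusters


def moved_fraction(names, old_count, new_count):
    """Fraction of names whose cluster changes when the cluster count changes."""
    moved = sum(
        1 for n in names
        if cluster_index(n, old_count) != cluster_index(n, new_count)
    )
    return moved / len(names)


# Made-up file names, just to have something to hash.
NAMES = [f"photo-{i}.jpg" for i in range(10000)]
FRACTION = moved_fraction(NAMES, 3, 4)
```

With an evenly spread hash, about three quarters of the files change cluster in this case, so you cannot just bump the cluster count on a live system. One well known way around it (not necessarily the one I would pick here) is consistent hashing, where adding a cluster only moves a small share of the files.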

If you have any questions please post them in the comments, and perhaps I will continue to part two of this tutorial with more details and perhaps code snippets. And if you are interested in buying such a solution together with all the software required to grow and maintain it - let me know.