I'm preparing a speech on Web 2.0 for the PTI conference in Wisla (Poland), and then I'm showing up at Stanford (USA) on November 29th for the Web 2.0 conference. So I have to prepare two presentations, and since I don't want them to be identical, the job is even harder.
Web 2.0 is talked about all the time and has been defined by numerous people. But that isn't what makes talking about it hard. What does make it difficult is that the whole idea is actually very simple. Web 2.0 is just about doing things the 'natural' way: conveying thoughts, feelings and all the other attributes of humanity to computers.
Websites are no longer designed by technology people with technology in mind. Web 2.0 sites are designed by humans with all the attributes of humanity in mind. And humanity is something every one of us 'feels' and knows. So is there anything new to say about it? I hope I find those things and manage to draw the conference guests' attention to my presentation.
Saturday, October 28, 2006
Sunday, August 20, 2006
Writing a distributed filesystem in 24 hours
Motivation
So you want to put Flickr and YouTube out of business with your brand new site. Whether or not that happens, you need a decent distributed filesystem to store the millions of files your users will upload.
You don't need full filesystem functionality (i.e. seeking within files, reading/writing parts of files, using the filesystem as a normal Linux FS) for this purpose, so GFS is probably not for you. Besides, it seems to be kernel-based (and I hate Linux kernel-based solutions for their instability) and it also seems to be RedHat-only, so it's not really an option for you - the true hacker. There are also Coda and AFS, but both seem immature to me and I wouldn't bet any money on either of them.
What you need are simple put/get file operations. This narrows your choice down to MogileFS, which has some drawbacks:
- It's Perl-based (yuck!)
- It requires a central database to store the locations of files, and as your load rises the database gets more and more hits, possibly becoming the bottleneck of the whole deployment.
Prerequisites
What you need in order to hack a fully fledged distributed FS:
- Python, the best language a business can use without risking a shortage of programmers (Common Lisp is better, but I don't know anyone except me who knows it)
- Some means of communication between the client and the FS. I chose HTTP (using Python's built-in urllib on the client side and Django on the server side), but you can use Twisted's PB or any other communication protocol.
So how DOES it work?
Let's say you have 9 servers to use for the filesystem, each with 400GB of storage (2 SATA drives in RAID 0). We will divide those into 3 clusters, each cluster consisting of 3 servers. So we will replicate every file in the FS to 3 independent machines. This way the machines can have cheap SATA-based hard drives running in RAID 0, and when a machine fails we still have 2 other computers storing the same files.
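The layout and the resulting capacity can be spelled out in a few lines of Python. The hostnames below are hypothetical; the 9 servers, 3 servers per cluster and 400GB per machine are the numbers from the example. Since every file is stored on all machines of its cluster, each cluster contributes only one machine's worth of usable space.

```python
SERVER_CAPACITY_GB = 400  # 2 SATA drives in RAID 0 per machine
CLUSTER_SIZE = 3          # machines per cluster = copies of every file

# Hypothetical hostnames for the 9 storage machines.
SERVERS = [f"http://fs{i}.example.com:8000" for i in range(1, 10)]

# Split the servers into consecutive groups: [fs1..fs3], [fs4..fs6], [fs7..fs9].
CLUSTERS = [SERVERS[i:i + CLUSTER_SIZE] for i in range(0, len(SERVERS), CLUSTER_SIZE)]

raw_gb = len(SERVERS) * SERVER_CAPACITY_GB      # total disk space bought
usable_gb = len(CLUSTERS) * SERVER_CAPACITY_GB  # every file is copied to all machines of one cluster
failures_survived = CLUSTER_SIZE - 1            # dead machines per cluster before files become unreadable

print(f"{len(CLUSTERS)} clusters, {raw_gb} GB raw, {usable_gb} GB usable, "
      f"survives {failures_survived} dead machines per cluster")
```

With these numbers it reports 3 clusters, 3600 GB of raw disk and 1200 GB of usable space, with up to 2 dead machines per cluster tolerated.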
On each of the servers there is an HTTP server running which passes all requests to a Python program supporting two operations: putFile and getFile. It can be a single Django view, or two Python methods exported by PB. Whatever you choose, it's simple and you can implement it in a matter of minutes.
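A minimal sketch of such a server, assuming Python's standard-library http.server instead of Django or PB so that it runs without extra dependencies. PUT /<name> plays the role of putFile and GET /<name> the role of getFile; STORAGE_DIR, path_for, FileHandler and serve are hypothetical names chosen for illustration, not code from an existing project.

```python
import os
import tempfile
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import quote, unquote

STORAGE_DIR = "/var/lib/dfs"  # hypothetical directory holding the stored files


def path_for(name):
    """Map a file name to a path inside STORAGE_DIR, or None if the name is unusable."""
    # Names starting with "." are reserved for temporary files (and block "..").
    if not name or name.startswith("."):
        return None
    # quote() with safe="" also escapes "/", so every file lives directly in STORAGE_DIR.
    return os.path.join(STORAGE_DIR, quote(name, safe=""))


class FileHandler(BaseHTTPRequestHandler):
    def _target(self):
        # The URL path (minus the leading "/") is the percent-encoded file name.
        return path_for(unquote(self.path.lstrip("/")))

    def do_PUT(self):  # putFile
        target = self._target()
        if target is None:
            self.send_error(400, "bad file name")
            return
        data = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        os.makedirs(STORAGE_DIR, exist_ok=True)
        # Write to a hidden temporary file, then rename it into place, so a
        # concurrent getFile never sees a half-written file.
        fd, tmp = tempfile.mkstemp(prefix=".tmp-", dir=STORAGE_DIR)
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, target)
        self.send_response(201)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):  # getFile
        target = self._target()
        if target is None or not os.path.isfile(target):
            self.send_error(404)
            return
        with open(target, "rb") as f:
            data = f.read()
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(host="0.0.0.0", port=8000):
    # One thread per request, so a slow download doesn't block other clients.
    ThreadingHTTPServer((host, port), FileHandler).serve_forever()
```

On each storage machine you would call serve(); port 8000 is an arbitrary assumption. Writing to a temporary file and renaming it with os.replace is a cheap way to keep readers from ever seeing a partially uploaded file.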
The whole magic is inside the client, which treats our server clusters as hashtable buckets. Based on the name of the file we want to put into or get from the FS, it calculates a hash of the name (MD5, for example, which is fast and available in Python's standard library) and takes that hash modulo the number of clusters. The result is the number of the cluster to which the file belongs. For a putFile operation we execute it on ALL machines in that cluster; for a getFile we execute it on a random one (and if that one isn't working, we try another, and so on).
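The client logic might look roughly like this in Python, using urllib as described in the prerequisites. DFSClient, cluster_for, put_file and get_file are hypothetical names, and the servers are assumed to accept PUT and GET requests at /<percent-encoded name> (an assumed URL scheme). Note that put_file stops at the first failing machine and does not undo the copies it has already written: that is the "lack of transactions" problem.

```python
import hashlib
import random
import urllib.request
from urllib.parse import quote


class DFSClient:
    """Treats server clusters as hashtable buckets (hypothetical sketch).

    clusters: list of clusters, each a list of server base URLs,
    e.g. [["http://fs1:8000", "http://fs2:8000", "http://fs3:8000"], ...]
    """

    def __init__(self, clusters, timeout=10):
        self.clusters = clusters
        self.timeout = timeout

    def cluster_for(self, name):
        # MD5 of the file name, read as an integer, modulo the number of clusters.
        h = int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16)
        return self.clusters[h % len(self.clusters)]

    def _url(self, server, name):
        # Percent-encode the name so "/" and spaces survive the trip in the URL.
        return f"{server}/{quote(name, safe='')}"

    def put_file(self, name, data):
        # putFile: write to ALL machines in the file's cluster. A failure raises
        # immediately and earlier copies are not rolled back (no transactions).
        for server in self.cluster_for(name):
            req = urllib.request.Request(self._url(server, name), data=data, method="PUT")
            with urllib.request.urlopen(req, timeout=self.timeout):
                pass

    def get_file(self, name):
        # getFile: try the cluster's machines in random order until one answers.
        servers = list(self.cluster_for(name))
        random.shuffle(servers)
        last_error = None
        for server in servers:
            try:
                with urllib.request.urlopen(self._url(server, name), timeout=self.timeout) as resp:
                    return resp.read()
            except OSError as exc:  # URLError, HTTPError (e.g. 404) and timeouts
                last_error = exc
        raise OSError(f"no server in the cluster returned {name!r}") from last_error
```

For example, with the nine servers given as three lists of three base URLs, client.put_file("cat.jpg", data) writes the file to the three machines of one cluster, and client.get_file("cat.jpg") reads it back from whichever of them answers first.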
Simple, easy to implement and effective. There are of course a bunch of problems with this implementation (adding clusters, lack of transactions, etc.), but all of them can be overcome. One thing is certain: you can have it up and running within 24 hours.
If you have any questions, please post them in the comments, and perhaps I will write part two of this tutorial with more details and perhaps code snippets. And if you are interested in buying such a solution together with all the software required to grow and maintain it, let me know.
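Adding clusters is a problem because, with the modulo rule, most files change cluster when the number of clusters changes. A file stays put only when h mod 3 equals h mod 4, which holds for 3 of the 12 possible values of h mod 12, so roughly three quarters of all files would have to be copied elsewhere when going from 3 to 4 clusters. This short Python check, using made-up file names, measures it:

```python
import hashlib


def cluster_index(name, num_clusters):
    # The placement rule from the post: MD5 of the file name modulo the number of clusters.
    return int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16) % num_clusters


names = [f"photo-{i}.jpg" for i in range(10000)]  # made-up file names
moved = sum(1 for n in names if cluster_index(n, 3) != cluster_index(n, 4))
fraction_moved = moved / len(names)
print(f"{fraction_moved:.1%} of files change cluster when going from 3 to 4 clusters")
```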
Saturday, August 19, 2006
On Jakarta Tomcat's weakness
I've done a lot of work with Jakarta Tomcat. My Tomcat setup handled 30,000 online users. Tuning the JVM for that was a nightmare, but it pretty much worked.
But one problem with Tomcat remains, one which I've reported to the developers a number of times with no result. Once Tomcat is hit with tons of requests and all the worker threads are depleted, it fails and never recovers. So instead of waiting for one of the threads to finish its work and then passing the next request to it, Tomcat simply does nothing.
I wonder if this will ever get fixed. Until it is, I don't recommend using Tomcat for any production setup.
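The behaviour described above, where a request waits for a free worker thread instead of being dropped, can be sketched with Python's concurrent.futures. This only illustrates the idea and is not Tomcat code; handle_request, the 4 workers and the 50 requests are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def handle_request(request_id):
    # Stand-in for real request processing (hypothetical workload).
    time.sleep(0.01)
    return f"response to request {request_id}"


# Only 4 worker threads, but 50 requests arrive at once. The executor keeps the
# requests it cannot start yet in an internal queue and hands each one to a
# worker as soon as that worker finishes its current job.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(handle_request, i) for i in range(50)]
    responses = [f.result() for f in futures]

print(len(responses))
```

ThreadPoolExecutor's queue is unbounded; a production server would also cap the queue and reject requests beyond it, but it should never simply stop serving once the burst is over.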
Eten G500 review
After bad experiences with the Nokia 9300, my hunt for the perfect business phone continued. I decided to give Windows Mobile a try. As usual, I read a bunch of reviews and comparisons, and it turned out that a good choice would be the Eten G500. It looked promising: it has a big, nice screen (unlike the HP iPaq 6915) and all the features needed nowadays except Wi-Fi. It also has a GPS receiver, so I thought it would be neat to use it in my car when driving around the city or the country.
And here's why I'm going to sell it ASAP and never buy such a device again:
- The stylus is located at the bottom of the device. NEVER buy such a device. Every other time you take it out of your pocket or bag, the stylus jumps out and either gets lost in your bag or comes loose and falls on the floor at some random moment. Needless to say, this is very annoying.
- The touch screen is a curse. Whenever you receive an SMS or someone calls you, the device turns itself on and activates the touch screen, which then adds some random entry to your calendar, contacts list or whatever. So... another annoying thing.
- The voice quality is really poor. Everyone I called complained that they could barely hear me. I asked a support person about this and he told me to update the firmware. It didn't help, and he ignored my email saying so.
- A phone is not a GPS navigation device. Some people may be satisfied with a Pocket PC or palmtop-based car navigation device, but I'm not. The screen isn't bright enough, the speakerphone mode isn't loud enough, and the software I used (AutoMapa) had a really unintuitive interface. I've never used a real navigation device built into a car's console, but I bet it's better than a phone-based one.
- Windows Mobile is a PC system. I've been using the latest version (5) on this device, and it's basically PC Windows with phone software added. Some things that annoyed me: you can't add more than one mobile phone number to a person, because the Outlook-like contacts list has only one such field and you can't add another (this was possible on the Nokia 9300); and when you get a call from a number that's not in your contacts list, you can't add it to an existing contact but have to create a new one (also possible on the Nokia 9300). These things don't make the phone unusable, but they are simple bugs which could easily be fixed if the vendor tested the software a bit. I guess they don't care, and I know I won't buy Windows Mobile again (except perhaps the next version, if they fix such bugs).
- Windows Mobile hangs a lot. Sorry guys, you have to write really clean code if you want it to run for many days on such devices.
I wonder why the biggest players in phone OS software release such crappy software. Do consumers really not want their phones to work uninterrupted for 30 days or more? Do they not mind rebooting them every 3 days? I recall having a Palm Vx, which was only a PDA without an integrated phone, but it did everything I needed (email, web, calendar), I didn't reset it for ages and it never hung.
Any hints for a usable business phone? Someone suggested a BlackBerry, so I guess I'll go for that next (within 2 weeks or so), and for now I'll downgrade to the Nokia 9300i.
Nokia 9300 review
I'm a heavy mobile phone user. I expect a phone to let me make calls, send SMS, organise my time and browse the web. If a phone does those things, I'm completely satisfied with it.
I read a million reviews of the Nokia 9300 on the web before I bought it, and once I actually started using the phone it turned out that none of them were accurate. All the reviewers seem to use the device for a week before writing the review, while most of the trouble with contemporary communicator phones usually starts after 1-2 months of use.
Everyone knows the specifications of this phone, so I won't repeat them here; I'll go straight to the cons:
- It hangs, and it hangs, and it hangs. I've already had two firmware updates (at an official Nokia repair station) and they didn't help. It seems that Symbian is single-threaded, because every time two things happen at once the phone hangs. When I'm receiving an SMS and someone calls me, it hangs; when I'm browsing through the call log and someone calls me, it hangs. Sometimes it rings and I'm unable to answer the call because all the buttons stop responding; sometimes it just plain reboots.
- It's slow. Making a call from the 'outside' keyboard is a real PITA: the phone says 'contacts not ready' and you have to wait 15-20 seconds until it gets ready. Is this a phone or what? Another thing is the poor SMS handling. When you receive an SMS you only see its first few words, while with such a big screen you'd expect to see the whole message. When you want to open the SMS, you wait 5 or more seconds until it finally opens.
- It has broken software. Even such basic things as the contact book are broken. Sometimes I was unable to make a call because when searching through the contact book (or browsing) it kept sending me back to the first contact every 3 seconds. So I was unable to dial contacts beyond 'b' or 'c'.
- Nokia's PC suite software is a nightmare. I have no idea why such a successful company would make such pitiful software and not fix it over the years. The only usable thing is the backup feature - and even that is terribly slow.