This blog is 1 year old today. I’d like to say thanks to all of you for visiting, and often getting in touch.
It’s been an eventful year. The content has been a mix of posts — too infrequent — on GIS, avian flu and public health, and computing; of late, the blog has also served as a venue for disseminating information on the Libya HIV case, and the campaign to free the six medical workers facing the death penalty — see here and here.
I’ve an article in today’s Nature — Amazon puts network power online — on an interesting form of computing-on-demand from Amazon, currently in beta, that may appeal to many scientists. Computing costs $0.10 per instance-hour, and data storage $0.15 per gigabyte per month. To get started, see the FAQ, and a guide here.
When Dutch computer scientist Rudi Cilibrasi needed hundreds of hours’ worth of computing time to test a data-mining algorithm earlier this month, he went not to his IT department but to Amazon.com. He paid $60 with his credit card, and in minutes had the equivalent of ten servers installed, which crunched through his job in a couple of days — ten times faster than his desktop PC would have managed.
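For a rough sense of where a bill like that comes from, here is a back-of-envelope calculation in Python using Amazon’s quoted rates. The job duration and storage figures are my own assumptions for illustration, not Cilibrasi’s actual numbers; the gap to his $60 would be bandwidth charges and exact running time.

```python
# Amazon's quoted 2006 rates (from the article above)
INSTANCE_HOURLY_RATE = 0.10   # dollars per instance-hour (EC2)
STORAGE_MONTHLY_RATE = 0.15   # dollars per gigabyte-month (S3)

# Assumed job profile: ten servers for "a couple of days"
instances = 10
hours = 48             # two days -- an assumption
storage_gb = 5         # hypothetical data-set size
storage_months = 0.1   # data kept for roughly three days

compute_cost = instances * hours * INSTANCE_HOURLY_RATE
storage_cost = storage_gb * storage_months * STORAGE_MONTHLY_RATE
total = compute_cost + storage_cost

print(f"compute: ${compute_cost:.2f}, storage: ${storage_cost:.2f}, "
      f"total: ${total:.2f}")
```

Compute dominates: the storage cost for a few gigabytes held a few days is pennies, which is part of why the pay-as-you-go model is attractive for bursty scientific jobs.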
“I see no reason why the Amazon service wouldn’t take off,” says Alberto Pace, head of Internet services at CERN, the European particle-physics laboratory near Geneva. “For a lab that wants to go fast and cheaply, this is a huge advantage over buying material and hiring IT staff. You spend a few dollars, you have a computer farm and you get results.”
What’s most interesting is Amazon’s use of ‘virtualization’ technologies.
Virtualization uses a layer of software to allow multiple operating systems to run side by side. This means that different computers can be recreated on the same machine: one physical machine can host, say, ten ‘virtual’ computers, each with a different operating system.
That’s a big deal. Running multiple virtual computers on a single server uses available resources much more efficiently. But it also means that instead of having to physically install a machine with a particular operating system, a virtual version can be created in seconds. Such virtual computers can be copied just like a file, and will run on any machine irrespective of the hardware it is using.
Virtualization is going to be one of the next big things in computing, as it brings both large economies of compute resources and unprecedented flexibility. One of the most popular open-source systems is Xen; here’s a guide to installing it on Debian or Ubuntu Linux.
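As a concrete illustration, a minimal configuration file for a Xen guest (a ‘domU’) looks like the following — Xen’s xm config files happen to use Python syntax. All the paths, names and sizes here are hypothetical; a real file depends on your kernel and disk layout.

```python
# Illustrative Xen guest (domU) configuration, e.g. /etc/xen/vm01.cfg.
# Every value below is an example -- substitute your own kernel and disks.
kernel  = "/boot/vmlinuz-2.6-xen"              # Xen-enabled guest kernel
ramdisk = "/boot/initrd-2.6-xen.img"           # matching initial ramdisk
memory  = 256                                  # MB of RAM for this guest
name    = "vm01"                               # must be unique on the host
disk    = ["file:/var/xen/vm01.img,xvda1,w"]   # file-backed virtual disk
root    = "/dev/xvda1 ro"                      # root device seen by the guest
vif     = [""]                                 # one network interface, defaults
```

Because the guest’s disk is just a file (`vm01.img`), copying the virtual computer really is as simple as copying a file, and `xm create vm01.cfg` boots it in seconds — the flexibility described above.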
Scientists are also testing virtualization as a way to overcome one of the biggest drawbacks of most current Grids (see here and here for more on Grids) and computing clusters: they are balkanised, each using a different operating system or version, which results in poor use of the available computing resources. Virtualizing the Grid allows virtual computers — image files — to be run on top of all available resources, irrespective of the underlying operating systems.
Researchers can also develop applications on whatever software and operating system they have on their lab machine. But at present, when they go to run an application at large scale, they often need to completely rewrite it to fit the protocols and systems used by a particular cluster or Grid. Virtualization frees researchers from these constraints.
I asked Ian Foster, cofounder of the Grid computing concept, what he thought of the prospects for Amazon-type services.
“It’s neat stuff. Exactly what it means remains to be seen, but my expectation is that Amazon’s EC2 and S3 will be seen as significant milestones in the commercial realization of Grid computing. I also think that they may turn out to be important technologies for scientific communities, because they start to address the current high costs associated with hosting services.”
In passing, if anyone has tested the Amazon service, do get in touch at email@example.com to share your experience and how you have used it.