[2007: NOTE THAT THE MAP BELOW IS NOW OLD: MORE RECENT AND UPDATED TIME SERIES MAPS ARE AVAILABLE AT THIS LINK -- http://declanbutler.info/blog/?p=58 ]
Nature has a Google Earth map of avian flu outbreaks online tonight.
Download the network link directly from here.
The visualization of avian flu outbreaks is the first online map, to my knowledge, of each of the more than 1800 individual outbreaks of avian flu in birds that have been reported over the past two years. It also provides a geographical overview of confirmed human cases of infection with the H5N1 influenza virus.
Have a look. Here’s a quick description of how it was built.
The FAO supplied me with an Excel table containing information on some 1800 H5N1 outbreaks in animals from Dec 2003 to Dec 15 2005. The animal data was compiled from information held by the UN Food and Agriculture Organization (FAO), the World Organization for Animal Health (OIE), various government sources, and the FAO Emergency Prevention System (EMPRES) for Transboundary Animal and Plant Pests and Diseases. Thanks to the FAO for this data, and to Francesca Pozzi, a FAO GIS consultant at the time, for advice on issues with the data. The underlying data contains errors and omissions, but it is the best global dataset available; to avoid version control problems I have not tried to modify or add to it, apart from correcting errors in names etc when I see these.
I compiled data on human cases from World Health Organization bulletins. No single database or precise spatial data on cases could be obtained from WHO because, it says, this would require the lengthy process of requesting permission from each affected country. I found locations for many cases in WHO bulletins, although these case data, which come from governments, often give no detail of even where the case came from, let alone its clinical or epidemiological characteristics. In those instances, locations were found by cross-checking cases described in WHO updates with case descriptions in the literature, using dates, sex, and age to identify distinct cases. Errors no doubt remain as a result, so I would be grateful if any scientists who take a look would let me know of any incorrect case data, so that it can be corrected.
Mapping the FAO data posed several challenges. The biggest was that the original datasets contained no latitude and longitude data for the outbreaks, so it was impossible to map them directly! FAO says it uses an internal proprietary UN system for defining geographical units such as place names, provinces and districts that it can only share internally within UN agencies, and so could not make available. So I had a spatial dataset of outbreaks, great, with no coordinate data; less great… ;->
Also, for many outbreaks, no precise location name was available, only, at best, the name of the local ‘district’ where it occurred. I therefore used ‘districts’ as the basis for geolocating outbreaks; this I felt was sufficient for these maps, which are intended for journalistic presentation, not research.
As districts are irregular polygons, their latitude and longitude are given as ‘centroids,’ a mathematical estimation of the ‘centre’ of the polygon. FAO’s GIS people kindly supplied me with files of centroid data for district names in affected countries (it’s a bit more complicated than that as districts can have ‘islands’ or multiple-polygons, each with their own centroid, but I won’t go down that road here).
To link the centroid data to the records lacking coordinates, I exported the outbreak data to a Microsoft Access database, and the centroid data to another, and used a database query with relational joins to import the lat & long of the centroids into the outbreak table wherever an exact match was found on both the ‘district’ and the ‘province’ names. One could probably do this quicker by georeferencing within ArcGIS, but as I’m more familiar with DBs I chose this route.
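For anyone who prefers a script to Access, the exact-match join can be sketched with Python's built-in sqlite3 module. This is only an illustration of the idea, not the query I actually ran: the table names, column names, place names and coordinates below are all made up.

```python
import sqlite3

def attach_centroids(conn):
    """Copy centroid lat/lng into outbreaks where province AND district match exactly.

    Returns the outbreak rows that found no match, for manual checking.
    Table and column names are hypothetical.
    """
    conn.execute("""
        UPDATE outbreaks
        SET lat = (SELECT c.lat FROM centroids AS c
                   WHERE c.province = outbreaks.province
                     AND c.district = outbreaks.district),
            lng = (SELECT c.lng FROM centroids AS c
                   WHERE c.province = outbreaks.province
                     AND c.district = outbreaks.district)
    """)
    return conn.execute(
        "SELECT id, province, district FROM outbreaks "
        "WHERE lat IS NULL ORDER BY id"
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE outbreaks (id INTEGER PRIMARY KEY, province TEXT,
                            district TEXT, lat REAL, lng REAL);
    CREATE TABLE centroids (province TEXT, district TEXT, lat REAL, lng REAL);
""")
# Invented sample rows; the third outbreak has a deliberate typo.
conn.executemany(
    "INSERT INTO outbreaks (id, province, district) VALUES (?, ?, ?)",
    [(1, "Province A", "District 1"),
     (2, "Province A", "District 2"),
     (3, "Province B", "Distrct 3")],
)
conn.executemany(
    "INSERT INTO centroids VALUES (?, ?, ?, ?)",
    [("Province A", "District 1", 10.5, 105.25),
     ("Province A", "District 2", 11.0, 106.0),
     ("Province B", "District 3", 12.0, 107.0)],
)

unmatched = attach_centroids(conn)
print(unmatched)  # rows still lacking coordinates
```

Matching on province as well as district matters because the same district name can turn up in more than one province.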
This automatic process located geographical coordinates for most of the districts, although variations in the spellings of names, and typographical errors in the data, resulted in many having to be manually checked and corrected. Errors no doubt remain though… let me know of any clangers.
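The manual checking could be made less painful by having a script suggest likely matches for misspelt names. I did the corrections by hand, so the following is just a sketch of one possible approach, using Python's standard difflib module; the function name, the place names and the 0.8 similarity cutoff are all assumptions.

```python
import difflib

def suggest_match(name, known_names, cutoff=0.8):
    """Return the closest known name to `name`, or None if nothing is close enough.

    Comparison ignores case and surrounding whitespace. The cutoff is an
    arbitrary starting point and would need tuning on real data.
    """
    normalized = {k.strip().lower(): k for k in known_names}
    hits = difflib.get_close_matches(
        name.strip().lower(), list(normalized), n=1, cutoff=cutoff
    )
    return normalized[hits[0]] if hits else None

# Invented district names for illustration.
known = ["Northfield", "Southfield", "Eastbrook"]
print(suggest_match("Northfeild ", known))
print(suggest_match("Zzz", known))
```

Suggestions like these should still be checked by eye, since two genuinely different districts can have very similar names.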
Generating the description box for the GE placemarks
To generate the text for the “description” field that would appear in the Google Earth white pane for each placemark, I ran a database query to populate this field by concatenating other relevant fields in the table, wrapped in the <![CDATA[ expression that kml requires. The screenshot is in French…
This for example
Generates something like this for every entry:
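The same kind of concatenation can be sketched outside Access. The Python below is only an illustration: the field names (`district`, `province`, `date`, `species`, `killed`) and the sample values are assumed, and are not the actual FAO columns or the layout I used.

```python
def build_description(row):
    """Concatenate a few fields into an HTML snippet wrapped in CDATA.

    `row` is a dict with hypothetical keys; adjust to the real table.
    """
    parts = [
        "<b>Location:</b> {}, {}".format(row["district"], row["province"]),
        "<b>Date:</b> {}".format(row["date"]),
        "<b>Species:</b> {}".format(row["species"]),
        "<b>Animals killed:</b> {}".format(row["killed"]),
    ]
    html = "<br/>".join(parts)
    # A CDATA section cannot contain its own terminator, so defuse it.
    html = html.replace("]]>", "]]&gt;")
    return "<description><![CDATA[{}]]></description>".format(html)

# Invented sample record.
sample = {"district": "District 1", "province": "Province A",
          "date": "2005-01-15", "species": "chickens", "killed": 1200}
print(build_description(sample))
```

Wrapping the text in CDATA means the HTML tags inside the description do not have to be escaped for the kml to stay valid XML.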
In passing, database queries are a quick-and-dirty way to generate simple kml files from point data; say you want a quick look at part of your data, a quick query (xml syntax omitted, as WordPress blog software doesn’t like it) like this:
“Placemark name & [Province] & name & Point & coordinates” & [lng] & & [lat] & coordinates & Point & Placemark”
will generate correct kml. Then you just paste your output into a text editor, add the standard kml header
and the footer “ ”, save in UTF-8. Et voilà; instant kml. (Hat tip: http://conversationswithmyself.com/240 )
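Since the blog software swallowed the tags, here is a rough Python equivalent of the quick query, with header and footer included. It is a sketch, not my original query: the function names and sample rows are invented, and the namespace is the current OGC KML 2.2 one, which is not necessarily what the header looked like at the time.

```python
from xml.sax.saxutils import escape

# Assumed header/footer, using the OGC KML 2.2 namespace.
KML_HEADER = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<kml xmlns="http://www.opengis.net/kml/2.2">\n'
    '<Document>\n'
)
KML_FOOTER = '</Document>\n</kml>\n'

def placemark(name, lat, lng):
    """One Placemark; note that kml coordinates are longitude first, then latitude."""
    return (
        "<Placemark><name>{}</name>"
        "<Point><coordinates>{},{}</coordinates></Point>"
        "</Placemark>\n"
    ).format(escape(name), lng, lat)

def rows_to_kml(rows):
    """rows: iterable of (name, lat, lng) tuples."""
    body = "".join(placemark(name, lat, lng) for name, lat, lng in rows)
    return KML_HEADER + body + KML_FOOTER

# Invented sample points.
doc = rows_to_kml([("Province A & B", 10.5, 105.25),
                   ("Province C", 11.0, 106.0)])
with open("quick_look.kml", "w", encoding="utf-8") as fh:
    fh.write(doc)
```

Escaping the name is worth the extra call, since a stray ampersand in a place name is enough to stop the file loading.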
Conversion to kml in the real bases
To convert the database to more complicated kml, I wrote a couple of php scripts that generate a nested hierarchy of folder categories in GE (inspired by scripts by Rui Carmo – see examples here: http://the.taoofmac.com/space/blog/2005-07-04 ).
The folders and other multiple attributes in the database allow kml to be generated in various formats, eg by country, by time period etc.
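My scripts were in PHP, but the nesting idea is easy to show in a few lines of Python. This is a sketch under assumptions, not a port of those scripts: it groups by country and then by time period, and the dictionary keys and sample rows are invented.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def build_folders(rows):
    """Build Document > Folder(country) > Folder(period) > Placemark.

    rows: dicts with hypothetical keys country, period, name, lat, lng.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for r in rows:
        grouped[r["country"]][r["period"]].append(r)

    doc = ET.Element("Document")
    for country in sorted(grouped):
        country_folder = ET.SubElement(doc, "Folder")
        ET.SubElement(country_folder, "name").text = country
        for period in sorted(grouped[country]):
            period_folder = ET.SubElement(country_folder, "Folder")
            ET.SubElement(period_folder, "name").text = period
            for r in grouped[country][period]:
                pm = ET.SubElement(period_folder, "Placemark")
                ET.SubElement(pm, "name").text = r["name"]
                point = ET.SubElement(pm, "Point")
                # kml wants longitude first, then latitude
                ET.SubElement(point, "coordinates").text = "{},{}".format(
                    r["lng"], r["lat"])
    return doc

# Invented sample rows.
rows = [
    {"country": "Country A", "period": "2004", "name": "D1", "lat": 1.0, "lng": 2.0},
    {"country": "Country A", "period": "2005", "name": "D2", "lat": 3.0, "lng": 4.0},
    {"country": "Country B", "period": "2005", "name": "D3", "lat": 5.0, "lng": 6.0},
]
doc = build_folders(rows)
print(ET.tostring(doc, encoding="unicode"))
```

Swapping the two grouping keys gives the by-period-then-country layout from the same data.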
You can also do much the same, quicker, by importing the Access file into ArcGIS as X,Y data, and then exporting the data into kml using Brian Flood’s Arc2Earth tool.
I used a PHP approach here simply because I had these customized for the dataset already, but Arc2Earth works great also, and I’ll try to post a few results here.
Placemarks were coloured according to time period, using the kml StyleMap element – thanks for the suggestion to Anne Wright, a computer scientist at Ames Research Center, and a collaborator in the Global Connection. These were also set so that the labels only showed on mouseover, to avoid label clutter.
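For the curious, a StyleMap pairs a "normal" style with a "highlight" style, and setting the label scale to 0 in the normal style is one way to keep labels hidden until the mouse is over the placemark. The Python below generates one such block per time period; it is a sketch, and the style ids and colours are invented rather than the ones used in the map. kml colours are written as aabbggrr hex.

```python
STYLE_TEMPLATE = """\
<Style id="{id}_normal">
  <IconStyle><color>{color}</color></IconStyle>
  <LabelStyle><scale>0</scale></LabelStyle>
</Style>
<Style id="{id}_highlight">
  <IconStyle><color>{color}</color></IconStyle>
  <LabelStyle><scale>1</scale></LabelStyle>
</Style>
<StyleMap id="{id}">
  <Pair><key>normal</key><styleUrl>#{id}_normal</styleUrl></Pair>
  <Pair><key>highlight</key><styleUrl>#{id}_highlight</styleUrl></Pair>
</StyleMap>
"""

def style_map(style_id, kml_color):
    """Return the Style/StyleMap block for one time period.

    kml_color is aabbggrr hex, e.g. 'ff0000ff' is opaque red.
    """
    return STYLE_TEMPLATE.format(id=style_id, color=kml_color)

# Hypothetical period ids and colours.
PERIOD_COLORS = {"p2004": "ff0000ff", "p2005": "ff00a5ff"}
styles = "".join(style_map(pid, color)
                 for pid, color in sorted(PERIOD_COLORS.items()))
print(styles)
```

Each placemark then only needs a styleUrl such as `#p2004` to pick up both states.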
The diameter of the placemarks was made proportional to the size of outbreaks, much as one could do in ArcGIS, but here by using a simple DB query to create a “place_diameter” field in the DB, based on a log of the number of animals killed.
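The log keeps a handful of very large outbreaks from swamping the map. As an illustration only, a scaling of that kind might look like the Python below; the base size, the step and the use of log base 10 are assumptions, not the formula in my query.

```python
import math

def place_diameter(animals_killed, base=0.5, step=0.25):
    """Icon scale that grows with the log10 of the number of animals killed.

    `base` and `step` are arbitrary illustrative values. Missing or
    negative counts fall back to the base size.
    """
    n = max(int(animals_killed or 0), 0)
    # +1 so that a count of zero maps to log10(1) = 0
    return base + step * math.log10(n + 1)

for killed in (0, 99, 9999, 999999):
    print(killed, round(place_diameter(killed), 2))
```

With these values each tenfold increase in the count adds the same fixed amount to the icon size.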
The map is a ‘beta’, and although the data has been manually checked, errors in the positions of some locations, names, etc, cannot be excluded. The underlying animal data itself also suffers from country underreporting of outbreaks, and omissions or inaccuracies in reporting. The FAO also notes with respect to its own data, “facts and figures are to the best of our knowledge accurate and up to date” and that “FAO assumes no responsibility for any error or omission in the datasets”.