A2-DavidSun
From CS294-10 Visualization Fa07
Software
I used Tableau for this assignment.
Data
I acquired the data from the Human Development Report (HDR) [1], an organization whose goal is to bring people to the center stage of the development process. HDR periodically publishes reports on human development progress (summarized in the Human Development Index, HDI) and makes the data available for online queries.
Question
Initially, I wanted to examine the relationship between literacy rate growth and information technology infrastructure growth in developing regions and low-income countries. However, the online data is available only for the years 1990 and 2004, which makes it impossible to examine trends over time. Consequently, I modified the question to look at the correlation between literacy rate and IT infrastructure in developing regions, and I focused on the 2004 dataset because it is the more complete of the two.
Visualization
To characterize the information technology infrastructure of a country or region, I selected the following key indicators:
- Cellular Subscribers (per 1000 people)
- Internet Users (per 1000 people)
- Telephone Mainlines (per 1000 people)
Developing regions
The first dataset I queried is an aggregated dataset covering the main developing regions of the world, which HDR roughly divides into the Arab States, East Asia and the Pacific, Latin America and the Caribbean, South Asia, and Sub-Saharan Africa. I plotted literacy rate against each of the IT infrastructure indicators described above and asked Tableau to perform a linear regression. The trend line shows a positive correlation between literacy rate and IT infrastructure availability.
Next, I separately examined two groups of countries based on their levels of human development.
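As a rough illustration of what a linear trend line computes, the sketch below fits a straight line by ordinary least squares. It assumes a plain least-squares fit, which may not match Tableau's trend model in every detail. The function name `fit_trend_line` and all data values are hypothetical placeholders and are not taken from the HDR dataset.

```python
from statistics import mean


def fit_trend_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical values for illustration only (NOT real HDR figures).
internet_users_per_1000 = [10.0, 20.0, 30.0, 40.0, 50.0]
literacy_rate_pct = [55.0, 60.0, 70.0, 75.0, 90.0]

slope, intercept = fit_trend_line(internet_users_per_1000, literacy_rate_pct)
print(f"literacy ~= {slope:.2f} * internet_users + {intercept:.2f}")
```

A positive slope corresponds to the upward trend line described above. It says nothing about causation.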
Low human development countries
In total, 32 countries are classified as low human development. Not all of them have data available for all three IT infrastructure indicators that I measured. For each plot, I applied a separate filter to exclude the data tuples with null values for the particular indicator under consideration.
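The per-indicator filtering can be sketched as follows. This is a minimal illustration outside Tableau. The record layout, the field names, and the helper `rows_with_value` are assumptions, and the records are made-up placeholders. The sketch also assumes that rows with a missing literacy rate are dropped, which the text above does not state.

```python
# Hypothetical records; None marks a missing value. Not real HDR data.
countries = [
    {"country": "A", "literacy": 40.0, "cellular": 12.0, "internet": None, "mainlines": 3.0},
    {"country": "B", "literacy": 52.0, "cellular": None, "internet": 5.0, "mainlines": 8.0},
    {"country": "C", "literacy": 61.0, "cellular": 30.0, "internet": 9.0, "mainlines": 11.0},
    {"country": "D", "literacy": None, "cellular": 7.0, "internet": 2.0, "mainlines": 4.0},
]

INDICATORS = ("cellular", "internet", "mainlines")


def rows_with_value(rows, indicator):
    """Keep rows where both literacy and the chosen indicator are present."""
    return [
        r for r in rows
        if r["literacy"] is not None and r[indicator] is not None
    ]


# One filtered subset per indicator, so a gap in one indicator
# does not remove a country from the plots of the other two.
per_indicator = {name: rows_with_value(countries, name) for name in INDICATORS}

for name, rows in per_indicator.items():
    print(name, [r["country"] for r in rows])
```

Filtering separately per indicator keeps more data points in each plot than dropping every country with any missing value would.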
Medium human development countries
In total, 88 countries are classified as medium human development. Again, not all of them have data available for all three IT infrastructure indicators that I measured. For each plot, I applied a separate filter to exclude the data tuples with null values for the particular indicator under consideration.
Both sets of visualizations lend support to the thesis that a positive correlation exists between literacy rate in developing countries and IT infrastructure availability.
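The strength of such a correlation can be expressed as a single number with the Pearson correlation coefficient. The sketch below is an illustration only: the helper `pearson_r` and the data values are hypothetical, and this calculation was not part of the original analysis.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical values for illustration only (NOT real HDR figures).
mainlines_per_1000 = [10.0, 20.0, 30.0, 40.0, 50.0]
literacy_rate_pct = [20.0, 40.0, 50.0, 40.0, 50.0]

r = pearson_r(mainlines_per_1000, literacy_rate_pct)
print(f"r = {r:.3f}")
```

Values of r near +1 indicate a strong positive linear relationship, values near 0 indicate little or none, and values near -1 indicate a strong negative one.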
It is also clear from the visualizations that more data points exist for telephone mainlines than for the other indicators. Furthermore, telephony data is also available for the year 1990, since telephony has been in existence much longer than the Internet and cellular networks.
Comparing the above visualization for 1990 against the same plot for 2004, it is interesting to note that the East Asia and the Pacific region (marked by the green circle) boasted a very high literacy rate (around 80%) despite low telephony availability in 1990. By 2004 it had become the region with the greatest telephony coverage, yet its literacy rate was up by only 9%. This suggests that other factors influence the literacy rate of a region or country, as one would naturally expect. I hypothesized that East Asian societies place greater emphasis on education and that this would be reflected in government spending on education. To test this hypothesis, I next tried to examine public expenditure on education as a percentage of GDP as well as of total government expenditure. Unfortunately, this proved less successful because the data was not available from HDR.
Conclusion
The visualizations to a large extent confirmed the thesis that a positive correlation exists between literacy rate and IT infrastructure. This is what one would expect intuitively, and visualization in this case served to confirm that belief. The visualizations also pointed to the existence of other factors that influence literacy rate. I think cultural influences have a large part to play; however, more work is needed to draw that connection. Data unavailability was the main bottleneck for this assignment. This touches on an important aspect of visualization that I think has not been discussed much in class: data collection methods.








