A2-NateAgrin

From CS294-10 Visualization Fa07

Domain

I'm interested in looking into HIV prevalence in countries and determining whether it correlates with secondary factors such as literacy, education, and use of contraceptives. Much has been hypothesized about these socio-economic relationships in the popular media, and it would be useful to researchers in the HIV field to understand whether these relationships exist and how they might exploit them to reduce HIV infection rates within populations. To develop questions around this topic and subsequently answer them, I will need to acquire statistics for large populations on literacy, education rates, and other socio-economic factors.

Questions

General:

  • Are there certain socio-economic or quality-of-life factors that seem to correlate with the presence of diseases, specifically rates of HIV infection?

Specific questions:

  • Does the rate of contraceptive use correlate with the rate of HIV infection within a population?
  • Does the total number of years of education within a population influence the rate of HIV infection?
  • Do literacy rates impact the occurrence of a disease like HIV?

Dataset

Datasets were found by searching the web with a common search engine (Google) and by consulting the resources listed for the class. The datasets used were taken from the United Nations' public data repositories and from its subsidiary entity, the UNAIDS group. Finally, to determine the population of each nation, I used data from the census.gov website.

Visualizations

General Data Exploration

Image:1_hiv_per_country_redux.jpg

To get a feel for how Tableau would handle the data, I simply loaded the information on HIV/AIDS, selected the total number of infections among adults and children for each country, and plotted the graph above. The data required some formatting, including editing the axis and changing the data alias names to correct for misinterpreted characters or incorrect labels. Finally, the original data also contained rows that condensed each major geographic region into a single total. These rows were excluded from the visualization so that regional totals would not distort the comparison between individual countries.

Comments: Not surprisingly, places with high population density and known AIDS hot-spots showed up as having the highest numbers of HIV/AIDS infections.

Question 1: HIV / Contraceptive Use

Based on the data I found, I wanted to see if there was a clear correlation between HIV/AIDS and the percentage use of modern contraceptives. My hypothesis is that as the percentage of the population using contraceptives increases, the number of HIV/AIDS cases per population will decrease.

Image:2_hiv_vs_modern_contraceptive_use_redux.jpg

Here no clear correlation appeared, but this may have been due to an error in my approach to the data. Instead of using the percentage of each country's population living with HIV/AIDS, I used the total number of cases reported for each nation. This made me realize that the raw number of HIV/AIDS cases per country was a misleading representation of the trends I was searching for. Because countries with large populations would be expected to have a higher number of infections, they might overshadow countries with smaller populations but a higher percentage of infection. To adjust for total population size, I obtained population data for each country and used Tableau's computational features to create a computed field: the percentage of the population with a recorded case of HIV/AIDS. The following graph plots the percentage of the population with HIV/AIDS per country, revealing some startling and disturbing trends in Africa.
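The adjustment described above amounts to dividing each country's case count by its population. The following is a minimal sketch of that calculation in Python rather than Tableau; the country names and figures are hypothetical placeholders, not values from the UNAIDS or census.gov data, and the function name is my own.

```python
# Hypothetical figures for illustration only; NOT taken from the UNAIDS
# or census.gov datasets used in this assignment.
cases = {"Country A": 500_000, "Country B": 5_000_000, "Country C": 120_000}
population = {
    "Country A": 2_000_000,
    "Country B": 1_000_000_000,
    "Country C": 60_000_000,
}


def prevalence_percent(cases, population):
    """Return {country: percent of the population with a recorded case}.

    Countries missing from the population table, or listed with a
    non-positive population, are skipped rather than guessed at.
    """
    result = {}
    for country, n_cases in cases.items():
        pop = population.get(country)
        if pop is None or pop <= 0:
            continue
        result[country] = 100.0 * n_cases / pop
    return result


prevalence = prevalence_percent(cases, population)

# Print countries from highest to lowest prevalence.
for country, pct in sorted(prevalence.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{country}: {pct:.2f}%")
```

In this made-up example, Country B has the most cases in absolute terms, but Country A has by far the highest share of its population affected, which is the effect that raw totals hide.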

Image:4_percentage_population_with_hiv_redux.jpg

I then replotted the percentage of modern contraceptive use against the percentage of each population living with HIV/AIDS, in an attempt to retest my previous hypothesis that as contraceptive use increases, the percentage of a population living with HIV/AIDS will decrease.

Image:5_contraceptive_use_vs_percentage_w_hiv_redux2.jpg

No clear trend was found in this analysis. This might suggest that outside factors influence the prevalence of HIV within a population. For example, if a population is sedentary and does not mix much with other populations, the incidence of HIV might be low regardless of the use of contraception.

Question 2: HIV / Years of Education

In this graph I attempted to look for a correlation between the number of years of education and the incidence of HIV/AIDS infection. The educational dataset included the average number of years of education for men, for women, and for the total population. I used only the total-population figures and did not look into the figures for men or women separately.

Image:6_years_education_v_percentage_w_hiv_redux.jpg

This visualization suggests that there is no strong correlation between years of education and rates of HIV/AIDS infection. However, no statistical measure was computed on the data, so this finding rests on visual inspection of the plot alone.
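One way to move beyond visual inspection would be to compute a correlation coefficient for each pair of variables. The sketch below shows a plain Pearson correlation in Python; it was not part of my analysis, the function name is my own, and the sample values are hypothetical placeholders rather than figures from the datasets.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical paired observations, one pair per country (illustration only).
years_of_education = [4.0, 6.5, 8.0, 10.0, 12.0]
percent_with_hiv = [1.2, 7.5, 0.4, 3.1, 0.6]

r = pearson_r(years_of_education, percent_with_hiv)
print(f"r = {r:.3f}")
```

A value of r near 0 would be consistent with the lack of a visible trend, while a value near -1 would support the hypothesis. Pearson's r captures only linear relationships, so a rank-based measure might be a better fit if the data are heavily skewed.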

Question 3: HIV / Literacy

Here I question whether literacy plays a major role in influencing rates of HIV/AIDS infections. Similarly to the last two questions, I predict that as literacy rates increase, the rate of HIV/AIDS infections decreases.

The dataset for literacy contained information on Adults and Youths, and a breakdown between total population literacy rates, male literacy rates and female literacy rates. I used only the Adult information (15+ years old) and the total population literacy rates. Literacy rates are listed as a percentage.

Image:7_literacy_vs_per_w_hiv_redux.jpg

No obvious trend emerged from this analysis, suggesting that literacy rates have little, if any, influence on rates of HIV infection.


Discussion

Taken together, the plots seem to suggest that there is no strong correlation between the socio-economic factors examined and the rate of HIV/AIDS. Much has been reported about the efforts of HIV aid workers attempting to bridge the cultural divide to fight HIV, and these data seem to suggest that high rates of HIV may be more closely related to unmeasured or unmeasurable factors such as social expectations or cultural beliefs. However, further study of these data would be necessary to draw any firm conclusions.

Problems

Working with the underlying data proved to be difficult and required a bit of by-hand editing. I could not easily find a method for editing the data directly within Tableau, although I'm sure one exists. This became a particular problem when trying to join disparate sources of data, where the joins would drop rows without explicitly showing what was lost in connecting the two datasets. Relating this back to my discussion, the lost data may have influenced my final results and made the data look more erratic than they really are.
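The kind of report I was missing can be sketched outside Tableau. The following Python example joins two tables keyed by country name and lists the keys that fail to match on either side; the function name, the country names, and the values are all hypothetical, and this is an illustration of the idea rather than a description of how Tableau performs its joins.

```python
def audited_inner_join(left, right):
    """Inner-join two {key: value} tables and report the unmatched keys.

    Returns (joined, only_left, only_right), where `joined` maps each
    shared key to a (left_value, right_value) pair and the two lists
    name the keys that an inner join would silently drop.
    """
    joined = {key: (left[key], right[key]) for key in left if key in right}
    only_left = sorted(set(left) - set(right))
    only_right = sorted(set(right) - set(left))
    return joined, only_left, only_right


# Hypothetical tables (illustration only). The same country is spelled
# differently in the two sources, so a plain join loses it.
hiv_percent = {"Examplia": 3.4, "Samplestan": 0.2, "Testland": 11.0}
literacy_percent = {"Republic of Examplia": 71.0, "Samplestan": 95.5}

joined, dropped_hiv, dropped_literacy = audited_inner_join(hiv_percent, literacy_percent)

print("joined rows:", joined)
print("dropped from HIV table:", dropped_hiv)
print("dropped from literacy table:", dropped_literacy)
```

Listing the unmatched keys makes naming mismatches visible, so they can be corrected by hand before plotting instead of quietly shrinking the dataset.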

Notes

  • Tableau was fairly straightforward, if somewhat limited in its options and available configurations.
  • After typing up the results and observations I discovered that Tableau had removed some columns of data. I assume this occurred because Tableau was attempting to perform a join on the data and could not match up specific columns. It would have been useful if the user were notified of this dropped data, instead of it simply disappearing.
  • It would be nice if Tableau allowed the user to more easily see, in spreadsheet form and in real time, the data it was referencing.
  • How to highlight across data sheets and graphs was not readily obvious. Making this clearer would help users find trends across multiple visualizations.
  • The author may be a total noob who should do a better job of reading the manual...


Issues

I would have liked to have tried Spotfire out, but I kept being told that my request to try out the software was denied... frustrating.


