From CS 294-10 Visualization Sp10
Creating Visualizations with Existing Visualization Software
Ideation to Realization
I knew I wanted to do some kind of sociology/psychology visualization. First I wanted to do something on media and attention span, but I had difficulty finding data because the topic was so specific. Then I decided to look at another health issue such as depression, and maybe relate it to the GDP of countries. Finally, I decided to look at suicide rates by country instead. At first I wanted to compare two specific countries, but I had a hard time finding data for that too (although I did find it later on).
At one point, I looked specifically at UC Berkeley, but had trouble there as well. This data did not seem to be readily reported on the Berkeley websites (except for maybe the Daily Cal), and the information is stigmatized. I found a few good sources, but time series data was lacking. What eventually stopped me from pursuing this avenue further was the difficulty of data extraction: much of the data was summarized and not easily converted to a usable format.
Finally, I decided to look at suicide rates by country again. This time, with some experience, I found the suicide data on the World Health Organization (WHO) website and the Human Development Index (HDI) on a UN website. I decided to use the HDI rather than GDP because it includes more factors than GDP alone, such as life expectancy and education, in an attempt to estimate the standard of living (itself a controversial measure). The final question I posed was: "Does the standard of living affect people's willingness to live (resulting in either committing or not committing suicide)?"
While looking for data I realized a few things:
- Lots of data is summarized, which is not necessarily useful for visualizations.
- Time series data is not easy to find or collect, because sources often prominently display only the latest data.
- There is too much data, and much of it is irrelevant.
This is the main visualization, which plots each country by HDI (on the x-axis) against suicide rate per 100,000 people (on the y-axis). Instead of distinguishing individual countries, I decided to distinguish them by region, because I noticed a trend as I was sifting through the data: many of the higher rates are tied to location, specifically the Eastern/Northern European countries (many of which were part of the former Soviet Union). To do this, I had to pull and aggregate the data across 4 Excel sheets: (1) HDI, (2) suicide rates, (3) ISO 3166-1 alpha-3 country codes to numerical country codes, and (4) numerical country codes to regions.
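The four-sheet aggregation described above can be sketched in plain Python. This is only an illustration, not the spreadsheet work actually done for the assignment: it assumes the HDI and suicide-rate sheets are both keyed by alpha-3 code, and the function name `join_sheets`, the codes `AAA`/`BBB`/`CCC`, the region names, and every number are hypothetical placeholders rather than WHO or UN figures.

```python
# Placeholder rows only; these are NOT the real WHO or UN values.
hdi_by_alpha3 = {"AAA": 0.55, "BBB": 0.80, "CCC": 0.95}      # sheet 1: alpha-3 -> HDI
suicide_by_alpha3 = {"AAA": 10.0, "BBB": 30.0}                # sheet 2: alpha-3 -> rate per 100,000
alpha3_to_numeric = {"AAA": 1, "BBB": 2, "CCC": 3}            # sheet 3: alpha-3 -> numerical code
numeric_to_region = {1: "Region X", 2: "Region Y", 3: "Region X"}  # sheet 4: numerical code -> region


def join_sheets(hdi, rates, to_numeric, to_region):
    """Join the four lookups into one row per country.

    Returns (rows, missing): rows holds countries with an HDI, a suicide
    rate, and a region; missing lists the codes dropped because the rate
    or the region could not be found.
    """
    rows, missing = [], []
    for code, hdi_value in sorted(hdi.items()):
        numeric = to_numeric.get(code)
        region = to_region.get(numeric)
        rate = rates.get(code)
        if rate is None or region is None:
            missing.append(code)  # e.g. a country that reported no suicide data
            continue
        rows.append({"code": code, "region": region, "hdi": hdi_value, "rate": rate})
    return rows, missing


rows, missing = join_sheets(
    hdi_by_alpha3, suicide_by_alpha3, alpha3_to_numeric, numeric_to_region
)
for row in rows:
    print(row)
print("no suicide data or region for:", missing)
```

Keeping the dropped codes in a separate list, rather than discarding them silently, makes it easy to see which countries never reach the scatter plot.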
I've labeled the outlying countries, which are of more interest. I've also labeled the United States, so we can see where we stand in relation to the others. The data points that more or less mark the boundaries (India, Norway, and Belarus) are labeled with more detail.
Finally, I included a trend line to show that, although the variance is high, an increase in the standard of living does not make people less willing to take their lives (in fact, the opposite appears to be true). Of course, we need to take this with a grain of salt (I was also able to find that pride in one's country is directly correlated with nickel imports).
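For readers who want to see what a trend line like this computes, here is a minimal sketch of an ordinary least-squares fit in plain Python. It assumes a straight-line fit, which the visualization software may or may not have used, and the helper names `fit_trend_line` and `r_squared` as well as all data points are made up for illustration. A low R² is one way to express the "high variance" caveat.

```python
def fit_trend_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Fraction of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if ss_tot == 0:
        raise ValueError("all y values are identical; R^2 is undefined")
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_res / ss_tot


# Made-up points, NOT the real HDI or suicide-rate data.
hdi_values = [0.4, 0.6, 0.8, 1.0]
rates = [4.0, 12.0, 6.0, 14.0]

slope, intercept = fit_trend_line(hdi_values, rates)
print("slope:", slope, "intercept:", intercept)
print("R^2:", r_squared(hdi_values, rates, slope, intercept))
```

A positive slope only says the two quantities rise together in the sample; as the nickel example suggests, it says nothing about cause.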
Also, when I was looking through the data, I noticed that many countries did not report any statistics on suicides. Many of these countries are in Central Asia, Africa, and Southeast Asia, so I decided to include a map to show that. It also serves to reinforce that the Eastern/Northern European countries have some of the highest suicide rates.