From CS294-10 Visualization Fa08
The Search for a Data Set
On the Assignment 2 wiki page, I followed the link titled "Online Datasets". From there, I clicked the link under the CSV heading, which led to a web page listing a number of data set sources. I eventually settled on the Human Mortality Database. After registering (which was free), I downloaded a Deaths text file spanning the years 1933 to 2005.
Stuck With Spotfire
For this assignment, I am using Spotfire. I own a Mac, and unfortunately neither of the two visualization applications supports the Mac operating system; both run only on Windows. So I am using the Windows computers in the Windows lab at Soda Hall. I tried to install both applications, but I only succeeded in installing Spotfire on my Windows account. Tableau required some sort of administrative privileges!
Initial Question: How have the death tolls changed over the years?
To answer this question, I downloaded the CSV file containing USA death tolls from 1933 to 2005. When I opened the file in Spotfire, the default view was a scatterplot with the x-axis encoding year and the y-axis encoding number of deaths. Each point (drawn as a square) encoded one record from the file, and because each year has multiple records, the points stacked up in overlapping columns. I wanted to see the sum of deaths for each year and compare the sums, so the scatterplot was not really suitable for answering my question. Instead, I made a bar chart with the x-axis still encoding year and the y-axis encoding the sum of deaths for that year.
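The aggregation behind that bar chart (one bar per year, with height equal to the sum of that year's records) can be sketched outside Spotfire as well. This is only a minimal illustration: the `(year, age, deaths)` record layout and the function name `total_deaths_by_year` are assumptions, and the numbers are toy values, not Human Mortality Database data.

```python
from collections import defaultdict

# Toy records, NOT real Human Mortality Database values.
# Each tuple is (year, age, deaths); this layout is an assumption.
records = [
    (1933, 0, 10.0),
    (1933, 1, 4.0),
    (1934, 0, 9.0),
    (1934, 1, 5.0),
    (1934, 2, 3.0),
]


def total_deaths_by_year(rows):
    """Sum the death counts of all records that share a year."""
    totals = defaultdict(float)
    for year, _age, deaths in rows:
        totals[year] += deaths
    # Sort by year so the result reads left to right like the bar chart.
    return dict(sorted(totals.items()))


yearly_totals = total_deaths_by_year(records)
print(yearly_totals)
```

Each key of `yearly_totals` corresponds to one bar on the x-axis, and each value to that bar's height.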
As can be seen from the visualization, the death tolls appear to be increasing as time progresses.
Question #2: Is there a relationship between the number of births and the number of deaths?
I wanted to see how the number of births over the years compares to the number of deaths over the years, so I downloaded a CSV file containing the USA number of births from 1933 to 2005. With both data tables loaded, I made two separate bar charts, one for births and the other for deaths, and stacked them vertically for easier comparison.
In the bar chart for births, the count rises at first and reaches its peak in 1957. From there it decreases sharply over a few years, and after that it changes very little, remaining more or less steady. The number of deaths does not seem to have much of a relationship with the number of births. The death tolls appear to increase through time regardless of the rise and the sharp decrease in the number of births.
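One way to put a number on "not much of a relation" is to pair the two yearly series by year and compute a Pearson correlation coefficient. This is a technique I am adding for illustration, not something Spotfire produced here. The dictionaries below hold toy values, not real birth or death counts, and the helper name `pearson` is hypothetical.

```python
from math import sqrt

# Toy yearly totals, NOT real values; keys are years.
births = {1933: 100.0, 1934: 120.0, 1935: 90.0, 1936: 95.0}
deaths = {1933: 50.0, 1934: 55.0, 1935: 60.0, 1936: 65.0}


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Only compare years that appear in both tables.
years = sorted(births.keys() & deaths.keys())
r = pearson([births[y] for y in years], [deaths[y] for y in years])
print(round(r, 3))
```

A value near 0 would agree with the visual impression that the two series move independently; values near 1 or -1 would suggest a strong linear relationship. Two series that both simply trend upward over time can correlate strongly without one causing the other, so the number should be read alongside the charts, not instead of them.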
Question #3: How do the death tolls over time compare between the two sexes?
To answer this question, I decided to use a line chart instead of a bar chart. A line chart is better suited than a bar chart to comparing the two sexes over time.
According to the line chart, the death tolls for men exceeded the death tolls for women until the two lines intersect at around the year 1997. From then on, the total deaths of women were greater than the total deaths of men. Both lines generally increase over the years. I wonder what accounts for the increase in deaths. I also wonder why the deaths of women have recently (for the years 1997-2005) become greater in number than the deaths of men. What accounted for that change?
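The intersection read off the line chart can also be located directly from yearly totals per sex. The sketch below assumes two dictionaries keyed by year; the values are toy numbers chosen only to show a crossover, not real data, and the function name is hypothetical.

```python
# Toy yearly totals per sex, NOT real values.
male = {1995: 105.0, 1996: 103.0, 1997: 100.0, 1998: 99.0}
female = {1995: 100.0, 1996: 101.0, 1997: 102.0, 1998: 104.0}


def first_year_female_exceeds_male(female_by_year, male_by_year):
    """Return the earliest shared year where female deaths exceed male deaths.

    Returns None if that never happens in the shared years.
    """
    for year in sorted(female_by_year.keys() & male_by_year.keys()):
        if female_by_year[year] > male_by_year[year]:
            return year
    return None


crossover = first_year_female_exceeds_male(female, male)
print(crossover)
```

This only reports the first year in which the female total is larger. If the lines crossed back and forth, each crossing would need to be checked separately.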
Question #4: How do the death tolls compare between the ages?
I wondered how the deaths are distributed among the different ages. Since the ages in the data span from 0 to 148 years, I narrowed the range down to 20-75 years. I decided that a bar chart would be best for this task. The x-axis encodes each age from 20 to 75 years old, and the y-axis encodes the total deaths for that age.
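The same filter-then-sum step can be sketched in a few lines. As before, the `(year, age, deaths)` layout and the name `deaths_by_age` are assumptions made for illustration, and the records are toy values.

```python
from collections import defaultdict

# Toy (year, age, deaths) records, NOT real values.
records = [
    (1990, 19, 2.0),   # below the age range, ignored
    (1990, 20, 3.0),
    (1991, 20, 4.0),
    (1991, 75, 9.0),
    (1991, 76, 8.0),   # above the age range, ignored
]


def deaths_by_age(rows, min_age=20, max_age=75):
    """Total deaths per age across all years, keeping ages in [min_age, max_age]."""
    totals = defaultdict(float)
    for _year, age, deaths in rows:
        if min_age <= age <= max_age:
            totals[age] += deaths
    return dict(sorted(totals.items()))


age_totals = deaths_by_age(records)
print(age_totals)
```

Both bounds are inclusive, matching a chart whose x-axis runs from age 20 through age 75.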
The result was somewhat odd. For every age that was a multiple of 5, there was a spike in the number of deaths. The death counts for ages that were not multiples of 5 were significantly lower than the counts for the multiples of 5 on either side of them. For example, the numbers of deaths at ages 30 and 35 are significantly larger than the numbers of deaths at the ages between them, i.e. ages 31-34. I am not sure what is so magical about being divisible by 5. Taken at face value, this visualization makes it seem that if you are in your 30s, you are more likely to die at 30 or 35 than at any age in between.
This strange observation may be due to the fact that the totals encompass all the years from 1933 to 2005. However, it is still odd to have such a consistent pattern to the spikes in deaths.
If I only count death totals from the most recent two decades (1985-2005), the visualization makes more intuitive sense: older people die in greater numbers than younger people.
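To check whether the spikes are as consistent as they look, each multiple-of-5 age can be compared against its two immediate neighbors. The sketch below is an illustrative check on toy totals, not real data; the name `spike_ratios` is hypothetical. A ratio well above 1 means the age stands out from the ages on either side of it.

```python
# Toy total deaths per age, NOT real values.
age_totals = {29: 10.0, 30: 30.0, 31: 12.0, 34: 14.0, 35: 40.0, 36: 16.0}


def spike_ratios(totals, step=5):
    """For each age divisible by `step`, divide its total by the mean of
    the totals at age - 1 and age + 1. Ages missing a neighbor are skipped."""
    ratios = {}
    for age, deaths in sorted(totals.items()):
        if age % step != 0:
            continue
        lower = totals.get(age - 1)
        upper = totals.get(age + 1)
        if lower is None or upper is None:
            continue
        neighbor_mean = (lower + upper) / 2
        if neighbor_mean > 0:
            ratios[age] = deaths / neighbor_mean
    return ratios


ratios = spike_ratios(age_totals)
for age, ratio in ratios.items():
    print(age, round(ratio, 2))
```

Running the same check on totals restricted to different year ranges would show whether the pattern depends on which years are included.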
Question #5: How do age and sex relate to death over the years?
To answer this question, I settled on a line plot. Each of the two lines represents a sex, and the x-axis and y-axis encode age and death count respectively. Because I was more interested in recent years, the line plot only uses data from 1975 to 2005.
In the line chart, the death count for men is greater than the death count for women until age 79. This supports the conclusion that more men than women die at younger ages. After age 79, the death count for women is greater than the death count for men. The men's line peaks at age 78, whereas the women's line peaks at age 84. Between ages 100 and 110, both lines flatten out near zero. This can be interpreted as few people living long enough to die at those ages (which is not that surprising).
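The peak of each line can be found by restricting the records to the chosen years, summing deaths per age for each sex, and taking the age with the largest total. The sketch below assumes `(year, age, female, male)` records; that layout and the name `peak_ages` are assumptions, and the values are toy numbers, not real data.

```python
# Toy (year, age, female_deaths, male_deaths) records, NOT real values.
records = [
    (1974, 78, 99.0, 99.0),  # outside the year range, ignored
    (1980, 78, 5.0, 9.0),
    (1980, 84, 8.0, 6.0),
    (2000, 78, 6.0, 10.0),
    (2000, 84, 9.0, 5.0),
]


def peak_ages(rows, first_year=1975, last_year=2005):
    """Return (female_peak_age, male_peak_age) for years in the inclusive range.

    Raises ValueError if no record falls inside the year range.
    """
    female_totals = {}
    male_totals = {}
    for year, age, female, male in rows:
        if first_year <= year <= last_year:
            female_totals[age] = female_totals.get(age, 0.0) + female
            male_totals[age] = male_totals.get(age, 0.0) + male
    female_peak = max(female_totals, key=female_totals.get)
    male_peak = max(male_totals, key=male_totals.get)
    return female_peak, male_peak


female_peak, male_peak = peak_ages(records)
print(female_peak, male_peak)
```

Note that these are peaks in raw death counts, which depend on how many people are alive at each age, so they describe the same quantity the line chart shows and not a per-person risk of dying.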
My questions were answered by the visualizations I generated in Spotfire, but they have inspired more questions that I do not believe can be answered solely by mortality datasets. What accounts for the death count of women recently outnumbering the death count of men? And what accounts for the number of deaths increasing over the years? My guess is that the answers lie in how society in the United States has evolved and progressed.