From CS294-10 Visualization Sp11
I thought it would be interesting to analyze the Community Health Status Indicators provided by the Department of Health and Human Services' Centers for Disease Control and Prevention (CDC). These data contain 200 measures for each of the 3,141 United States counties. The CDC data was relatively easy to work with: it came in CSV format and imported into Tableau quite easily. However, fields with no data contain negative numbers, which made some modifications necessary.
What are the significant health concerns in different parts of the United States? First, I thought it would be useful to look at the overall "Leading Cause of Death" for each state. To do this, I put State Name in "Columns" and all of the causes of death in "Rows". This produced a huge number of small multiples that were not very useful (it also took a while to compile). I think filters are going to be necessary to get a good overview of this data.
OK, let's just see the leading causes of death for the United States as a whole... which gives us negative numbers. That is weird, and certainly not what one would expect from "leading causes of death." Something must be going on with the data. After some investigating, it turns out that counties without data are indicated with negative numbers (-1111 or -2222); very helpful, Centers for Disease Control. OK, I need to remove the negative numbers in order to get accurate sums for "leading causes of death."
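Treating every negative value as "no data" is the simplest rule that covers both codes. The sketch below, in Python with only the standard library, shows why the sentinels have to be dropped before summing: left in, they drag a total below zero. The sample rows and the column names (`State`, `County`, `Deaths`) are invented for illustration, and the rule assumes no real measure in the file can legitimately be negative.

```python
import csv
import io

# Made-up rows for illustration only; the column names and the single
# "Deaths" measure are hypothetical stand-ins for the real CSV layout.
SAMPLE = """State,County,Deaths
State A,County 1,120
State A,County 2,-1111
State A,County 3,-2222
State B,County 4,15
"""


def parse_measure(raw):
    """Return the measure as a float, or None when a negative sentinel marks it as missing."""
    value = float(raw)
    return None if value < 0 else value


def total_ignoring_missing(rows, field):
    """Sum one column, skipping missing-data sentinels instead of adding them in."""
    values = (parse_measure(row[field]) for row in rows)
    return sum(v for v in values if v is not None)


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
naive = sum(float(row["Deaths"]) for row in rows)  # sentinels drag the total negative
clean = total_ignoring_missing(rows, "Deaths")
print(naive, clean)  # -3198.0 135.0
```

Checking `value < 0` rather than matching -1111 and -2222 exactly also catches decimal variants of the same codes, at the cost of assuming negatives never carry meaning.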
But this is not very useful, since it's just a table of numbers. To get a better sense of the leading causes of death overall, by age group, and by ethnic group, I'll need to group some of this data.
I cannot figure out how to group these data points...I'm going to move on to risk factors.
What risk factors are important in which parts of the country?
OK, I was able to plot the average obesity rate across counties in each state as a bar graph and sort from highest to lowest obesity rate:
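The aggregation behind that bar graph can be sketched outside Tableau as a per-state mean over county rates, skipping missing-data codes, followed by a descending sort. This is a minimal sketch under assumptions: the rows are invented, and the column names `State` and `Obesity` are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Made-up rows for illustration; "Obesity" holds a county's obesity rate in percent.
SAMPLE = """State,County,Obesity
State A,County 1,33.1
State A,County 2,-1111.1
State A,County 3,29.0
State B,County 4,20.5
State B,County 5,17.5
"""


def mean_by_state(rows, field):
    """Unweighted mean of a county-level rate per state, skipping negative sentinels."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        value = float(row[field])
        if value < 0:  # missing-data code, not a real rate
            continue
        totals[row["State"]] += value
        counts[row["State"]] += 1
    return {state: totals[state] / counts[state] for state in totals}


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
# Sort states from highest to lowest average obesity rate, as in the bar graph.
ranking = sorted(mean_by_state(rows, "Obesity").items(), key=lambda kv: kv[1], reverse=True)
for state, avg in ranking:
    print(f"{state}: {avg:.2f}%")
```

This is an average of county rates, so a small rural county counts as much as a large urban one; a population-weighted mean would need a county population column.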
Was also able to plot the obesity rates on a map:
It would be useful to see dots or bars for each of the risk factors in one graph... I'm still trying to get each risk factor side by side on a map, but I was able to map the average % of the population that doesn't exercise to the color of each dot. This is going a little better than before, so I think I'll quit while I'm ahead and come back to this a little later.
I tried for quite a while to get multiple risk factors on one map, but only succeeded in showing obesity and exercise. That being said, I was able to optimize the map a bit to show the relationship between obesity, exercise and location more clearly. I am running into "resource exceeded" errors now. I'm going to try restarting to see if that fixes anything. Apparently not; now I can't even load the image...
OK, I was able to make the query a little less complicated, and now it's working again.
Here is the optimized obesity by exercise plot.
I wonder if I can filter by another dimension, like diabetes, to see if there is a trend in obesity and exercise. The following shows only counties where more than 10% of the population has diabetes.
But there are a couple of counties of interest: are they anomalies? Look at the counties in blue (0-9% obesity, and either a + or a triangle, i.e., 0-19.99% of the population who don't exercise). These have relatively low risk factors but high diabetes rates. I wonder if there are other factors at play here.
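The selection of those blue counties amounts to three threshold tests applied together. The sketch below is an assumption-laden illustration, not the Tableau filter itself: the column names `Obesity`, `No_Exercise` and `Diabetes` are hypothetical, the rows are invented, and the "0-9%" obesity bin is read as "below 10%".

```python
import csv
import io

# Made-up rows for illustration; all values are percentages of the county population.
SAMPLE = """State,County,Obesity,No_Exercise,Diabetes
State A,County 1,34.0,31.2,12.5
State B,County 2,8.7,15.3,11.0
State C,County 3,9.5,18.0,6.0
State D,County 4,-1111.1,14.0,13.0
"""

DIABETES_MIN = 10.0     # "more than 10% of the population has diabetes"
OBESITY_MAX = 10.0      # legend bin "0-9%" obesity, treated here as < 10
NO_EXERCISE_MAX = 20.0  # legend bin "0-19.99%" who don't exercise


def measure(row, field):
    """Parse a measure, mapping negative missing-data sentinels to None."""
    value = float(row[field])
    return None if value < 0 else value


def is_anomaly(row):
    """High diabetes despite low obesity and low physical inactivity."""
    diabetes, obesity, no_exercise = (
        measure(row, f) for f in ("Diabetes", "Obesity", "No_Exercise")
    )
    if None in (diabetes, obesity, no_exercise):
        return False  # a county with any missing measure can't be classified
    return (diabetes > DIABETES_MIN
            and obesity < OBESITY_MAX
            and no_exercise < NO_EXERCISE_MAX)


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
anomalies = [row["County"] for row in rows if is_anomaly(row)]
print(anomalies)  # ['County 2']
```

Counties with a missing measure are excluded rather than guessed at, so the list can only undercount anomalies, never invent them.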
Obviously I'm limited to the data points in the data source, so I'll just try some things... like poverty:
Doesn't seem conclusive. I'll also try the various ethnic groups that are tracked, including Hispanic, Asian, Native American and Black... also not conclusive. But most of these counties are primarily white.
What about age factors? Here is a plot of the % of people aged 65-85 in these counties with low no-exercise and obesity rates but high diabetes rates.
The first visualization I came up with attempts to plot three variables on a map: rates of obesity, exercise and diabetes. This visualization showed a concentration of high diabetes rates in counties with high obesity and low rates of exercise, which is unsurprising. These counties seem to be concentrated in the southeastern United States.
However, there are some anomalous counties in the Midwest and West with low rates of no-exercise and low rates of obesity that still have high rates of diabetes. To investigate this phenomenon further, I created a number of other graphs.
The following shows relatively large elderly populations in these counties, which may explain the higher rates of diabetes despite the relatively favorable exercise and obesity rates in those regions.