A2-JeffBowman
From CS294-10 Visualization Fa08
Assignment: Assignment 2: Creating Visualizations with Existing Visualization Software
Student: Jeff Bowman
For Assignment 2, I use U.S. Census data to investigate the changes in Asian population between 1990 and 2000 across each of the 50 U.S. states.
Initial question
How has the number of people identifying as Asian American changed over the years?
Data and reformulation
Initially, I used data from the U.S. Census FactFinder. However, early research made it clear that U.S. Census data was not easily available for 1980 and earlier. The analysis is also more complex for 2000, because the 2000 Census allowed people to select as many race boxes as they wanted.
There were also several different ways to measure how the number of people had changed, which led me to reformulate the question.
Specific question
How have the number and percentage of the Asian population in each of the 50 states changed between 1990 and 2000?
Data collection and importation
To produce the visualization, I first used the U.S. Census FactFinder to export the 2000 Census table for Race. The export was a pipe-delimited file, which I converted to comma-delimited using Excel. Next, I tried to do the same for 1990, but could not: the 1990 data in FactFinder did not allow pipe-delimited output. Instead, I queried for the Race data with their online query tool and assembled a spreadsheet in Excel from five pages of HTML tables. After transposing the table and saving it as comma-delimited, I opened both files in Tableau.
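The pipe-to-comma conversion was done in Excel, but the same step could be scripted. The following is a minimal sketch using Python's standard csv module, not the method used for this assignment; the function name, column headers, and sample rows are made up for illustration.

```python
import csv
import io


def pipe_to_comma(pipe_text: str) -> str:
    """Re-encode pipe-delimited text as comma-delimited text."""
    reader = csv.reader(io.StringIO(pipe_text), delimiter="|")
    out = io.StringIO()
    # The writer quotes any field that itself contains a comma.
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


# Made-up sample rows; the real export has different columns and values.
sample = "Geography|Total|Asian\nExampleland|1000|50\nSampleton, East|2000|300\n"
converted = pipe_to_comma(sample)
print(converted)
```

Going through a CSV writer rather than a plain find-and-replace keeps a field that already contains a comma intact as a single column.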
In order to produce data that worked together, I had to first convert each of the two fields (State Name in 1990 and Geography in 2000) to Geographic locations so that Tableau could calculate latitude and longitude data. I also imported both data sources into the same Data Connection, producing an Inner Join on the 1990 data's State Name and the 2000 data's Geography to link the two.
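To illustrate what the Inner Join does, here is a rough sketch in plain Python. It is not how Tableau implements the join. The key fields State Name and Geography come from the two data sources described above; the "Asian" column name and all values are placeholders.

```python
def inner_join(rows_1990, rows_2000):
    """Keep only states present in both tables, matched by name."""
    # Index the 2000 rows by their Geography field for quick lookup.
    by_geography = {row["Geography"]: row for row in rows_2000}
    joined = []
    for row in rows_1990:
        match = by_geography.get(row["State Name"])
        if match is not None:  # rows without a partner are dropped
            joined.append({
                "state": row["State Name"],
                "asian_1990": row["Asian"],
                "asian_2000": match["Asian"],
            })
    return joined


# Placeholder rows, not census figures.
rows_1990 = [{"State Name": "State A", "Asian": 10},
             {"State Name": "State B", "Asian": 20}]
rows_2000 = [{"Geography": "State A", "Asian": 15},
             {"Geography": "State C", "Asian": 5}]
joined = inner_join(rows_1990, rows_2000)
print(joined)
```

Only State A appears in the result, because an inner join discards any row whose name has no exact match in the other table. Any difference in spelling between the two name fields would silently drop a state.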
Visualizations
Initial visualizations
My first visualization (Figure 1) was a simple horizontal bar chart (horizontal so that state names could be read more easily) integrating both 1990 data and 2000 data. The colors are the defaults; I turned on data labels and muted their color so they would add specific values without distracting from the bars.
After seeing the graph, I realized that the relative total population could skew the data. Therefore, I created two calculated measures (one for each year) for the percentage of the Asian population versus the total population, and graphed that in a similar horizontal bar chart, yielding Figure 2.
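The percentage measure amounts to the Asian count divided by the total population, times 100. The sketch below expresses that in Python rather than as a Tableau calculated field; the function name and the counts are hypothetical, not census figures.

```python
def asian_share(asian: int, total: int) -> float:
    """Return the Asian population as a percentage of the total population."""
    if total <= 0:
        raise ValueError("total population must be positive")
    return 100.0 * asian / total


# Hypothetical counts: the larger state has more Asian residents in
# absolute terms but a smaller share of its total population.
large_state = asian_share(asian=500_000, total=20_000_000)
small_state = asian_share(asian=50_000, total=1_000_000)
print(large_state, small_state)
```

This is the skew the raw counts hide: a populous state can lead the count-based chart while ranking lower once total population is taken into account.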
This horizontal bar chart based on percentage is useful for data analysis, but as the question spoke more to the change in population, I created a third calculated measure for the difference in percentage between 1990 and 2000, and graphed that in a simpler horizontal bar chart, Figure 3. I changed the color to make it clear that the data is not to be confused with 1990 or 2000 single-census data.
While the bar chart in Figure 3 is the clearest comparison in relation to the question, the location information attached to the states had not yet been used. Therefore, I produced a final visualization, Figure 4. This used State Name as the series of points, with the color and size of each point set by the change in percentage used in Figure 3. However, sizing by the signed value drew Hawaii's negative change as a much smaller point than any other, when in reality it is much larger in magnitude. Thus, I created the fourth and final calculated measure, which was simply the absolute value of the change in percentage, and used it for size. While the sign of values close to 0 can be difficult to discern, this made it easier to compare relative magnitudes.
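The sizing problem can be shown with a small sketch. The state labels and percentage-point changes below are hypothetical, chosen only to mimic one large negative change among smaller positive ones; the function name is also an assumption.

```python
def change_in_share(pct_1990: float, pct_2000: float) -> float:
    """Change in percentage share, in percentage points."""
    return pct_2000 - pct_1990


# Hypothetical (1990 share, 2000 share) pairs, not census figures.
shares = {"State A": (2.0, 3.5), "State B": (1.0, 1.25), "State C": (10.0, 6.0)}
changes = {name: change_in_share(p90, p00) for name, (p90, p00) in shares.items()}

# Sizing by the signed change gives the most negative state the smallest mark.
smallest_signed = min(changes, key=changes.get)

# Sizing by the absolute change gives that same state the largest mark.
largest_magnitude = max(changes, key=lambda name: abs(changes[name]))

print(changes, smallest_signed, largest_magnitude)
```

State C has the smallest signed value and the largest magnitude at once, which is why the size encoding needs the absolute value while the sign has to be carried by another channel such as color.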
Final visualization and analysis
The visualized data shows a strong increase in Asian population in the three West Coast states bordering the Pacific Ocean, most notably California, as well as in Nevada. There is also a generally large increase in the New England states.
However, there also seems to be a relatively large increase in Minnesota, Illinois, Georgia, and Texas, states without a strong historical precedent for it. This would be a good starting point for further sociological study.
In this interpretation of the question, the Asian population in 2000 includes all who selected the "Asian" box in the census report, including those who selected more than one box. This may introduce a slight bias in the data, as those who identify with more than one race may be more likely to identify as Asian when they can choose multiple races. Also, the 1990 census contained categories for Asian and Pacific Islander, while the 2000 census specified that Pacific Islander also includes Native Hawaiian. With this in mind, some of the drop in Asian population in Hawaii may be due to differences in self-categorization rather than actual changes in population.
