From CS294-10 Visualization Sp11



Assignment 2

I decided to make global public opinion the focus of AS-2. I like to follow the annual global surveys carried out by the Pew Research Center, a think tank based in Washington DC. The ongoing Pew Global Attitudes Project (pewglobal.org) makes available all of the data the institute has collected over the years. Pewglobal.org provides very nice Flash-based visualization tools for exploring the data, but only a minuscule fraction of the collected data is accessible this way, which leaves a great deal of potential for data exploration untapped.

Pew's data sets can be freely downloaded from their website after filling out some basic information. Unfortunately, the data sets for different years are all distinct, and the questionnaires change every year. Another issue is that the set of countries surveyed varies from year to year. As a result, it is somewhat difficult to construct time-based visualizations. However, some manual labor can make some time-based analyses possible (discussed further below).

The Data

The data sets come in the .sav format used by IBM's SPSS statistical analysis software. Before the data can be used by Tableau, it must be converted to .csv format. Fortunately IBM provides a trial version of SPSS that is able to do this. SPSS also uses a tuple/column format. The data provided by Pew is essentially one huge list of tens of thousands of tuples, with one column for every question used in the survey. There is a slight peculiarity, though, in that the actual values stored in the fields are integers. Each integer is an index that maps to a particular label from the set/enumeration of labels possible for that field type. So for example, if field x represents a color and has possible values {orange, red, green}, then field x might contain "1" for orange or "3" for green. The fields are basically enums.
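As a minimal sketch of the enum idea, the snippet below decodes a column of integer codes into labels. The label table, the code for red, and the `decode` helper are hypothetical illustrations built on the color example above; they are not taken from the Pew files or from SPSS.

```python
# Hypothetical value labels for one survey column (extends the color example;
# not taken from the Pew data).
COLOR_LABELS = {1: "orange", 2: "red", 3: "green"}


def decode(codes, labels, missing="unknown"):
    """Replace each integer code with its label; unmapped codes become `missing`."""
    return [labels.get(code, missing) for code in codes]


# 9 is deliberately not in the table, to show how an unexpected code is handled.
decoded = decode([1, 3, 2, 9], COLOR_LABELS)
print(decoded)
```

Keeping the mapping in one table per column makes it easy to spot codes that have no label before the data reaches the visualization tool.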

Luckily, SPSS has an option during export that replaces the field integers with the actual labels they correspond to. This mostly worked, though I noticed that Tableau had parsed certain parts of the data strangely. It turns out some of the labels contained commas, which confuses .csv parsing since commas are used as delimiters. Modifying the data to remove all comma occurrences from the field labels resolved this problem.
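The comma problem can be reproduced in a few lines. This is only a sketch: the sample row and the `strip_commas` helper are made up, and the quoting alternative shown at the end is a standard feature of Python's `csv` module rather than something I tested with SPSS or Tableau.

```python
import csv
import io


def strip_commas(row):
    """Remove commas from every field so a plain comma split stays aligned."""
    return [field.replace(",", "") for field in row]


# Hypothetical record: a country plus a label that contains a comma.
row = ["Egypt", "Somewhat satisfied, mostly"]

# Unquoted export: the embedded comma produces a spurious third column.
naive = ",".join(row).split(",")

# The fix described above: drop commas from the labels before exporting.
cleaned = ",".join(strip_commas(row)).split(",")

# Alternative: keep the comma and let the csv module quote the field.
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
quoted_roundtrip = next(csv.reader(io.StringIO(buffer.getvalue())))

print(len(naive), len(cleaned), quoted_roundtrip)
```

Removing the commas changes the label text slightly, while quoting preserves it; either way the column count stays correct.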


I am interested in differences in opinion that occur between the people of different countries. A simple first question to explore is, "How generally satisfied are the people of various countries with their wellbeing?" In the bars shown below, the border between the blue and orange bars per country divides the proportion of people who are at least somewhat satisfied (bottom portion of the bars) and those who are at least somewhat dissatisfied (upper portion of the bars) with their lives.


There are several observations to note. First, over 90% of the Indian respondents reported being at least somewhat satisfied with life -- a remarkably high proportion. Second, Egyptian respondents fared the worst in terms of life satisfaction. Less than 40% of the respondents from Egypt reported being at least somewhat satisfied with life. The other 60% reported being dissatisfied to at least some extent, and over 30% reported being very dissatisfied. Kenya also did not fare well, with only around 50% of respondents reporting being at least somewhat satisfied. The United States has the largest number of respondents reporting being "very satisfied" with their lives, with Canada close behind.

As a follow up, I wanted to get a sense of the overall mood of the respondents at the time they were surveyed. The Pew questionnaire measured this by asking the respondents whether they had a good, bad, or typical day. I wanted to know about any anomalies that may color the rest of the survey results. For example, if the people in a certain country happened to be experiencing an extraordinarily rough patch, they might inadvertently respond more negatively to the rest of the survey questions.


It appears that most countries had more or less comparable distributions of mood on the day of the survey. I use the term "mood," for expediency, but the question is more accurately understood as measuring whether a respondent's day could be characterized as better than usual, worse than usual, or typical. Anyway, most countries followed the distribution to be expected by definition: The majority of respondents report "typical day," followed by a small number of people reporting "good day," and finally a still smaller number of people reporting "bad day."

Several countries, including the United States and especially Canada, had roughly equal proportions of people reporting "good" and "typical" days. It is unclear whether this was due to the fact that it was a particularly good day for North America or whether Americans and Canadians are typically more cheerful overall. Egypt had an especially large number of people responding "bad day." In fact, there were 10% more bad day than good day respondents in Egypt.

At this point, I could not help but wonder about Egypt. It seems peculiar, though not unexpected, that a country would fare "poorly" not only on life satisfaction but on mood as well. I wondered whether a person's mood at a particular moment can influence his perception of his life or vice versa. This might vary depending on the country a person is from, but I was curious to know whether this was the case in a general sense. So I charted the respondents based on their moods, grouped by their life satisfaction levels.
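The grouping behind such a chart amounts to a cross-tabulation normalized within each satisfaction level. Here is a rough sketch; the function name and the toy responses are invented for illustration and are not the Pew results.

```python
from collections import Counter, defaultdict


def mood_share_by_satisfaction(responses):
    """Return {satisfaction: {mood: share}}; shares sum to 1 within each group."""
    counts = defaultdict(Counter)
    for satisfaction, mood in responses:
        counts[satisfaction][mood] += 1
    return {
        group: {mood: n / sum(moods.values()) for mood, n in moods.items()}
        for group, moods in counts.items()
    }


# Made-up (satisfaction, mood) pairs, only to exercise the function.
toy = [
    ("very satisfied", "good"),
    ("very satisfied", "typical"),
    ("very dissatisfied", "bad"),
    ("very dissatisfied", "bad"),
    ("very dissatisfied", "typical"),
    ("very dissatisfied", "typical"),
]
shares = mood_share_by_satisfaction(toy)
print(shares)
```

Normalizing within each satisfaction group, rather than over all respondents, is what makes the bars comparable across groups of different sizes.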


My suspicions were confirmed in that the two measured attitudes appear to influence each other and are not independent of one another. A person who is satisfied with her life is more likely to perceive her day as being better than usual, and vice versa. Pay attention to the green and red bars in the chart. We see that there is a clear downward trend for the green and a clear upward trend for the red as we move across the page from the most satisfied to the least satisfied respondents.

In fact, there is a spike in "bad day" respondents in the "very dissatisfied with life" category -- nearly 40%! Could it be coincidence that nearly 40% of all the respondents with less satisfying lives just so happened to have a particularly bad day when the survey was taken? This seems highly unlikely. The more likely explanation is that perhaps these respondents had a bad day and in turn had a skewed perspective in their overall evaluation of their lives. Or perhaps they are always dissatisfied with life and as a result mistake typical days for particularly bad ones.

A funnier, less depressing trend can be seen in the category on the far right corresponding to respondents who "don't know" how satisfied with life they are. This group has by far the largest proportion of people responding "typical day" for mood. Not too much of a surprise, as people without opinions of their lives are probably also likely to ambivalently give a "meh" response when asked about mood. "Typical day" and "Don't know" respondents account for well over 80% of all the respondents in this group.

One common question that has seen much debate is whether wealth can buy happiness. Out of curiosity, I decided to take a brief detour and see if the Pew data could provide any insight. Income levels were measured for Chinese respondents only, so I charted the life satisfaction parameter broken down by income level. It was certainly not intended to be a conclusive visualization, but rather a curious attempt to see if any trends could be spotted.


It does seem that there is a weak trend in which the number of somewhat and very satisfied respondents increases as income levels increase. But the effect is quite small. It should be noted that the group with the lowest income also contains the largest proportion of "very dissatisfied" respondents. That being said, it would still be difficult to conclude that income level is a primary determinant of overall life satisfaction. At best, a marginal effect can be seen. Perhaps the theory that income levels matter, but only up to a certain point, is the most fitting here.

I generated a number of visualizations out of personal interest that are not directly related to the theme of this report. I decided to include the following one anyway because it was interesting. It shows the proportion of people who agree/disagree that more measures should be taken in their respective countries to restrict people from entering the country from outside. In the pie charts, blue corresponds to agreeing with more restrictions while orange is disagreeing.


Most countries around the world seem to have similar distributions, with more people agreeing than disagreeing. An interesting region is East Asia, where there is a peculiar concentration of countries (Japan, South Korea, China) in which significant numbers of people instead disagree. South Korea and Japan are notable outliers in that the majority of respondents in those countries do not support increased restrictions.

Time-based Visualizations

The Pew research institute has conducted the surveys over many years, and I wanted to see some time trends. As mentioned before, this required me to download all the separate results over the years and merge the data. This was challenging because the questionnaire changes year by year, with even common questions sometimes measured using different metrics. For example, in one year, the life satisfaction responses were measured using "satisfied" or "dissatisfied," whereas in other years the respondents were asked to rate their life quality on a scale of 0 to 10.

Merging the data also required some judgment calls. As an example, it was unclear how to interpret the number 5 given on the 0-10 scale for life satisfaction. Because there are 11 numbers in the range [0,10], 5 is the exact midpoint and does not fit neatly into a simple dichotomy of satisfied vs. dissatisfied. A large number of respondents chose 5, as it is a fairly neutral response, so how I decided to make the split was quite important. One could reason that 5 could map to undecided/don't know, but this does not seem semantically convincing. I ultimately decided to exclude "5" responses from the data sets, which may or may not have been the right call.
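The split can be written down as a small recoding rule. This is a sketch under my reading of the decision above: 0 through 4 count as dissatisfied, 6 through 10 as satisfied, and 5 is dropped. The function name and the sample scores are hypothetical.

```python
def recode_ladder(score):
    """Map a 0-10 life rating to a two-way label; the midpoint 5 maps to None."""
    if not 0 <= score <= 10:
        raise ValueError(f"score out of range: {score}")
    if score == 5:
        return None  # excluded from the merged data
    return "satisfied" if score > 5 else "dissatisfied"


# Made-up scores, only to show the rule in action.
scores = [0, 3, 5, 6, 10]
recoded = [label for label in map(recode_ladder, scores) if label is not None]
print(recoded)
```

With 5 excluded, each side of the split covers five values of the scale, so neither label is favored by the recoding itself.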

What was important was getting all the data sets to use common metrics. There were a number of formatting issues (e.g., column alignment) that had to be corrected manually. Other problems didn't show up until the merged data were loaded into Tableau and used. For example, there were varying spellings for the United States: "US," "USA," "United States," etc. Finally, because the data sets for different years are all separate files, none of their tuples had date attributes attached to them. I inserted this information manually so that Tableau would know to which years the various records belonged.
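I made these fixes by hand, but the same two steps could be scripted roughly as follows. The alias table, the helper names, and the sample rows are all assumptions for illustration; the years used are arbitrary.

```python
# Hypothetical alias table; keys are lowercased spellings with periods removed.
COUNTRY_ALIASES = {
    "us": "United States",
    "usa": "United States",
    "united states": "United States",
}


def normalize_country(name):
    """Map known spelling variants to one canonical country name."""
    key = name.strip().lower().replace(".", "")
    return COUNTRY_ALIASES.get(key, name.strip())


def tag_year(rows, year):
    """Return copies of the rows with a `year` field added."""
    return [dict(row, year=year) for row in rows]


# Two made-up single-row "data sets" standing in for two survey years.
merged = tag_year([{"country": "USA"}], 2007) + tag_year([{"country": "U.S."}], 2008)
for row in merged:
    row["country"] = normalize_country(row["country"])
print(merged)
```

Names that are not in the alias table pass through unchanged, so unexpected spellings remain visible instead of being silently remapped.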

After the data was neatly merged, I was able to construct time series data. The first one shows how respondent "satisfaction with how things are going in the country" varies over time.


China, represented by the top blue line, jumps out of the fray. China is an outlier in that not only is respondent satisfaction with the country very high, but it increases every subsequent year measured, eventually reaching levels near 90%. This may be a reflection of the rapid economic development of the country in recent decades. Eyeballing the rest of the trend lines, I noticed two similarly shaped lines and decided to isolate them for further inspection.


They happened to be Poland and Russia. Both exhibited rising satisfaction in the early to mid 2000s, with a sharp increase in satisfaction from 2007 to 2008. Both experienced a significant drop in satisfaction from 2008 to 2009. It also happens that both are Eastern European countries, with perhaps some shared geopolitical characteristics (though I am not knowledgeable enough in this regard to say), and this might suggest common causes for the similar trends. Unfortunately, Pew did not collect enough data from other countries that could be grouped in the same way to test this hypothesis further.

Another similarity between countries:


The lines represent Jordan and Pakistan. Though the shape of the lines in terms of rises and falls is remarkably similar, I don't have enough insight to attribute it to anything more than coincidence. It was a pattern that jumped out at me from the original graph and seemed worthwhile to isolate.

Keeping with the life satisfaction theme, I wanted to gauge the attitudes of various countries towards achieving success -- whether it is self-determined or due to factors beyond individual control. The chart below shows the proportion of respondents from each country that agree/disagree with the idea that success depends mostly on exterior forces beyond the control of the individual.


Two countries immediately stand out: United States and Canada. The respondents in these two countries, by far more than any others, attribute success to factors within an individual's control. On the other end of the spectrum, India, South Korea, and Pakistan respondents overwhelmingly agreed that success relies on exterior forces. The causes for the different attitudes between countries are obviously complex with geopolitical, socioeconomic, and cultural components. There is a spectrum of opinions, but US and Canada stand out very significantly, perhaps due to similarities in the aforementioned components.

This led me to a more cynical question about whether those who are dissatisfied attribute that fact to factors beyond their control while more satisfied people credit themselves for their success. This can be quite strongly influenced by factors that depend on where they are from (country), so I was careful to divide the results by country. I decided to make bars showing the proportion of people who agree that success depends on external forces, and divide them by whether they are satisfied or dissatisfied with life. My hypothesis was that there would be a consistent trend across most countries in which satisfied people attribute success to internal factors, while dissatisfied people blame their misfortune on factors beyond their control. Here is my final visualization:


I was surprised to find that there was no obvious trend present. If my hypothesis were correct, we should expect to see a shorter bar on the left followed by a taller bar on the right in most countries. If respondents' views on success are not influenced by their life satisfaction, we should expect the two bars in each country to be very close to even.

The fact is that the patterns are quite mixed across all the countries. As a matter of fact, in certain countries such as China and India, upwards of 65-80% of those who are satisfied with their lives actually believe success is attributed mostly to factors beyond individual control! In some other countries, especially Germany, the opposite pattern exists. In Germany, those who are satisfied with life are more likely to believe that success factors are within an individual's control.
