A2-RobinHeld

From CS294-10 Visualization Fa07



Choice of Domain

The assignment posting included a link to the USDA National Nutrient Database for Standard Reference (SR), which proved to be very interesting. I downloaded the MS Access version of the database [1], which covers an enormous variety of foods, from raw to processed to prepared. The nutritional information listed for each food includes protein, cholesterol, and fiber content, among many other nutrients. The quantitative values are given in g or mg per 100 g of the food. The database itself did not say which unit applied to which field, but it was straightforward to use the USDA's online search tool for the database [2], find an example entry, and look up the correct units.

The Question

It is commonly said that meat is an excellent source of protein. I've always had a general feeling that most meats, such as beef, chicken, fish, and pork, have roughly equal protein content. If so, anyone who wants a protein-rich meal could choose any of those four options. However, I don't remember ever reading a statistic that supports that claim. So my question was:

Do beef, chicken, pork and fish meat have significantly different protein content (per 100 g)?

One could make the argument that the question excludes too many other types of meat, including lamb and turkey. However, I decided to limit the analysis to only four animals to make the problem more manageable and more applicable to my personal diet.

Exploring the Data

Tableau was used to analyze the data. As stated above, the SR database contains an incredible amount of information, so the most difficult hurdle was filtering it down to just the entries pertaining to chicken, beef, fish, and pork. The field containing the food names is labeled "Shrt_Desc," and each entry has an abbreviated title. Fortunately, the abbreviated titles put the type of animal first. For instance, each beef item begins with "BEEF," followed by the specific cut. My first step was therefore to manually filter out all items that didn't begin with "BEEF," "CHICKEN," "PORK," or "FISH." When I plotted the remaining items against their protein content, I ended up with the following large bar graph:
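The filtering itself was done by hand in Tableau, but the same prefix test can be sketched in a few lines of Python. This is only an illustration: the rows and protein values below are made up, and only the "Shrt_Desc" field name and the four prefixes come from the description above.

```python
# Made-up sample rows; only the "Shrt_Desc" field name comes from the database.
rows = [
    {"Shrt_Desc": "BEEF,EXAMPLE CUT,RAW", "protein_g": 20.0},
    {"Shrt_Desc": "CHICKEN,EXAMPLE CUT,CKD", "protein_g": 27.0},
    {"Shrt_Desc": "BUTTER,EXAMPLE", "protein_g": 1.0},
    {"Shrt_Desc": "PORK,EXAMPLE SKINS", "protein_g": 60.0},
]

MEAT_PREFIXES = ("BEEF", "CHICKEN", "PORK", "FISH")


def starts_with_meat(description):
    """Return True if the description begins with one of the four meat names."""
    # str.startswith accepts a tuple and matches any of its members.
    return description.upper().startswith(MEAT_PREFIXES)


# Keep only the rows whose short description starts with a meat name.
meat_rows = [row for row in rows if starts_with_meat(row["Shrt_Desc"])]

for row in meat_rows:
    print(row["Shrt_Desc"], row["protein_g"])
```

A prefix test is deliberately strict: an item that merely mentions a meat later in its description is dropped.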

Image:AllItems.jpg

I created a "calculated" measure that searched for "BEEF," "CHICKEN," "PORK," or "FISH" in the product description and returned the string it found. This measure was then assigned to the color field, resulting in the orange, green, red, and purple colors seen above. Notice that there are so many items that it's impossible to get an overview of everything and still read the individual product descriptions. It became clear that the data would need to be refined further, showing the reader only the most important values while concealing superfluous detail. As a first step toward this goal, I narrowed the product types being displayed. As mentioned above, the foods listed in the database include raw, processed, and prepared varieties. Since preparation techniques vary a great deal within each meat type, I chose to focus on raw meats. This made my original question a little more specific, and perhaps limiting in a culinary sense, but no less interesting. A filter was applied to the data set that excluded any items without "RAW" in the product description. The resulting graph is seen below:
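The calculated measure and the "RAW" filter were built in Tableau, and the exact formula is not reproduced here. A rough Python equivalent, with hypothetical function names and made-up descriptions, might look like this:

```python
MEATS = ("BEEF", "CHICKEN", "PORK", "FISH")


def meat_type(description):
    """Return the first meat name found in the description, or None."""
    upper = description.upper()
    for name in MEATS:
        if name in upper:
            return name
    return None


def is_raw(description):
    """Substring test, mirroring the filter on "RAW" in the product description."""
    return "RAW" in description.upper()


# Made-up descriptions, for illustration only.
descriptions = [
    "BEEF,EXAMPLE CUT,RAW",
    "BEEF,EXAMPLE CUT,CKD",
    "FISH,EXAMPLE,RAW",
]

# Label every item, then keep only the raw ones.
labeled_raw = [(meat_type(d), d) for d in descriptions if is_raw(d)]
print(labeled_raw)
```

Because `is_raw` is a plain substring match, it would also accept any description that happens to contain the letters "RAW" inside a longer word, so it is only safe after the items have already been restricted to the four meats.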

Image:RawItems.jpg

It is clear that the number of items has been reduced, but not by much. An interesting change also occurred between the "all items" and "raw items" graphs. In the original graph, a handful of items had protein values that looked like outliers. Once the raw filter was applied, those outliers disappeared. I investigated the omitted items and found that they were all dried and/or skin-based items (e.g., pork skins). Dehydrating the meat seems to raise the protein content per 100 g, which makes sense: raw meat has a high water content, so removing water changes the relative proportions of the meat's other constituents.
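The dehydration effect is simple arithmetic: the protein mass stays the same while the total mass shrinks. The sketch below uses round, hypothetical numbers (20 g of protein in 100 g of raw meat, 60 g of water removed) rather than values from the database.

```python
def protein_per_100g_after_drying(protein_g, total_g, water_removed_g):
    """Protein per 100 g once some water has been removed.

    Assumes drying removes only water, so the protein mass is unchanged.
    """
    remaining_g = total_g - water_removed_g
    return 100.0 * protein_g / remaining_g


# Hypothetical raw meat: 20 g protein in 100 g of meat.
before = protein_per_100g_after_drying(20.0, 100.0, 0.0)
# Remove 60 g of water: the same 20 g of protein now sits in 40 g of product.
after = protein_per_100g_after_drying(20.0, 100.0, 60.0)

print(before, after)  # 20.0 50.0
```

With these made-up numbers the per-100 g figure more than doubles without any protein being added, which is the kind of jump seen in the dried items.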

Going back to our original question regarding relative protein content, the second graph still seems to offer too much information. A simple solution would therefore be to just display the average protein content for each type of meat, seen below:

Image:1stAverage.jpg

The bar graph shows similar values for the average protein content of the four meats. But how large would the differences need to be for us to answer the original question conclusively? A key piece of information is missing: the standard deviation of the protein values. These are included in the final visualization, presented in the following section.
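For reference, the two summary statistics can be computed per meat type as sketched below. The protein values are invented for illustration, and the use of the sample standard deviation (`statistics.stdev`) is an assumption; the write-up does not say which variant Tableau reported.

```python
import statistics

# Invented protein values (g per 100 g), grouped by meat type.
protein_by_meat = {
    "BEEF": [18.0, 20.0, 22.0],
    "FISH": [17.0, 19.0],
}

# Mean and sample standard deviation for each group.
summary = {
    meat: (statistics.mean(values), statistics.stdev(values))
    for meat, values in protein_by_meat.items()
}

for meat, (mean, sd) in summary.items():
    print(f"{meat}: mean={mean:.2f} sd={sd:.2f}")
```

Note that `statistics.stdev` needs at least two values per group, which a group as small as the raw fish items only just satisfies.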

The Visualization

Image:2ndAverages.jpg
Average protein content in grams per 100 g of raw meat for beef, chicken, fish, and pork. The bars represent the average value plus one standard deviation. Generally speaking, none of the types of meat offers a significantly different amount of protein than the others.

The original question asked whether raw beef, chicken, fish, and pork contain significantly different amounts of protein, based on the data in the USDA National Nutrient Database. The graphic above illustrates that, generally speaking, none of the types of meat offers significantly more or less protein than the others. The average protein content of each type of meat (grams of protein per 100 g of meat) is displayed, along with one standard deviation above the average. Due to limitations in Tableau, a box plot could not be created, and the lower standard deviation could not be included. One could also argue that 95% confidence intervals would have been preferable. However, the single standard deviation presented here serves the purpose of demonstrating variability within the data. The shapes are also simple and easy to read, and the figure conveys the point that the protein values of the different types of meat overlap each other. Previous versions of the graph (see above) included a color legend. However, the column labels fit the shape of the graph better, are very clear, and made the legend seem superfluous.
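The overlap argument can be made explicit by building the full interval (mean minus and plus one standard deviation) for each meat and checking every pair. The means and standard deviations below are placeholders, not the values from the figure, and overlapping one-standard-deviation ranges are an informal check rather than a significance test.

```python
from itertools import combinations

# Placeholder (mean, standard deviation) pairs in g of protein per 100 g.
summaries = {
    "BEEF": (20.0, 2.0),
    "CHICKEN": (19.0, 2.5),
    "FISH": (18.0, 1.5),
    "PORK": (19.5, 3.0),
}


def one_sd_interval(mean, sd):
    """Return (lower, upper) bounds one standard deviation around the mean."""
    return (mean - sd, mean + sd)


def intervals_overlap(a, b):
    """Two closed intervals overlap if each starts before the other ends."""
    return a[0] <= b[1] and b[0] <= a[1]


intervals = {meat: one_sd_interval(*stats) for meat, stats in summaries.items()}

# Check every pair of meats for overlapping ranges.
all_overlap = all(
    intervals_overlap(intervals[m1], intervals[m2])
    for m1, m2 in combinations(intervals, 2)
)
print(intervals)
print("all pairs overlap:", all_overlap)
```

This also recovers the lower bound that could not be drawn in the chart.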

An additional piece of information that could be useful is the number of products included in each average. For instance, there were nearly 250 raw beef items, versus only two raw fish items. This is likely because the database was assembled by the USDA, which is mostly livestock-oriented. A larger database, perhaps one created by the FDA, may include additional information about fish, which would help bolster the legitimacy of any findings.


