A2-Ken-ichiUeda

From CS294-10 Visualization Fa07

Setup

Domain

I am broadly interested in biodiversity informatics, the use of information technology to manage, query, and distribute biological information for use in ecology, systematics, evolutionary biology, natural resource management, and related fields.

Question(s) and Dataset

My questions were somewhat biased by my choice of dataset. Several years ago I worked on a project called the Wieslander Vegetation Type Mapping Project, where I had to manage the digitization and distribution of data from a series of vegetation mapping surveys conducted in California during the 1930s. In the spirit of eating my own dog food, I decided I would see how easy it would be to answer some questions using the data that I helped prepare for use by researchers.

The data included GIS data files containing point data of every individual plot surveyed, and tab-delimited text files containing the results of tree surveys and ground cover surveys conducted at each plot. The dataset also includes original and georeferenced plot location maps and the final vegetation maps, but these GIS data weren't appropriate for the visualization software I used. More information on the data can be found at http://vtm.berkeley.edu/about/

My questions were:

  1. What was the tallest tree species surveyed?
  2. What was the widest species?

Acquiring Data & Importing Into Spotfire

I downloaded the "All VTM Plot Data Archive" from http://vtm.berkeley.edu/data/download.php?path=download/Plot/Data/vtm-plotdata.zip and simply tried opening the vtm-plotdata-trees.txt tab-delimited text file in Spotfire. Sure enough, Spotfire had no trouble importing it and immediately creating some charts.

I also downloaded the GIS shapefile for the plot locations from http://vtm.berkeley.edu/data/download.php?path=download/Plot/Vectorized/vtm-plots.zip. Spotfire doesn't seem to support ESRI shapefiles as data sources (it does support them as basemaps in map visualizations). However, the shapefile is actually several files: most are proprietary ESRI formats encoding the vector data, but the database is a dBase IV file that Excel can read. So I converted it to an Excel workbook, and Spotfire had no problem importing that.

Iteration 1: Initial Observations

The first thing Spotfire did was show me a scatter plot, so I changed the axes to height and species. Immediately, an outlier popped out: a pine that was apparently 1440 ft tall. If you believe Wikipedia or Reuters, the world's tallest tree is around 370 ft to 380 ft, so this is clearly an error, but Spotfire made it easy to filter out. Here's the filtered scatter plot:

Image:a2-kueda-init_scatter.png
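
For anyone who wants to repeat this sanity check outside Spotfire, here is a minimal Python sketch of the same filter. The column names (`species`, `height_ft`), the 400 ft cutoff, and the sample rows are all assumptions made for illustration; the real VTM tree file may label its columns differently.

```python
import csv
import io

# Hypothetical column name; check the header of the real tree file.
HEIGHT_COL = "height_ft"

# Assumed cutoff, set a little above the roughly 370-380 ft record height.
MAX_PLAUSIBLE_HEIGHT_FT = 400.0


def split_by_plausible_height(rows, max_height=MAX_PLAUSIBLE_HEIGHT_FT):
    """Return (kept, rejected) lists of rows based on the height column."""
    kept, rejected = [], []
    for row in rows:
        try:
            height = float(row[HEIGHT_COL])
        except (KeyError, TypeError, ValueError):
            rejected.append(row)  # missing or non-numeric height
            continue
        if 0 < height <= max_height:
            kept.append(row)
        else:
            rejected.append(row)
    return kept, rejected


# Made-up rows standing in for the tab-delimited tree file.
SAMPLE = (
    "species\theight_ft\n"
    "Pinus sp.\t1440\n"
    "Pinus sp.\t120\n"
    "Quercus sp.\t60\n"
)
rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter="\t"))
kept, rejected = split_by_plausible_height(rows)
print(len(kept), "kept;", len(rejected), "rejected")
```

Keeping the rejected rows, rather than silently dropping them, makes it easy to review the suspect records later.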

The initial and most obvious way to answer my first question seemed to be a bar chart, which was easily achieved by plotting average height against species name:

Image:a2-kueda-height_vs_sp.jpg


Turns out the tallest tree species in the data is Sequoia gigantea, which was the scientific name of the giant sequoia at the time (the current name is Sequoiadendron giganteum). I ordered the bars by height to make it easy to differentiate between species that had similar average heights, and I colored by the number of plots used to calculate the average height, to give some indication of how representative the average was given the amount of data (e.g. an average height based on two plots might not be very convincing).
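
The aggregation behind that bar chart (average height per species, ordered tallest first, with the number of contributing plots alongside) can be sketched in a few lines of Python. This is only an approximation of what Spotfire computes, and the record layout `(plot_id, species, height_ft)` and the sample values are invented for illustration.

```python
from collections import defaultdict
from statistics import mean


def average_height_by_species(records):
    """records: iterable of (plot_id, species, height_ft) tuples.

    Returns a list of (species, mean_height, n_plots), tallest first.
    """
    heights = defaultdict(list)
    plots = defaultdict(set)
    for plot_id, species, height in records:
        heights[species].append(height)
        plots[species].add(plot_id)  # count each plot once per species
    summary = [(sp, mean(hs), len(plots[sp])) for sp, hs in heights.items()]
    return sorted(summary, key=lambda item: item[1], reverse=True)


# Made-up records; heights are not taken from the VTM data.
records = [
    ("A1", "Sequoia gigantea", 200.0),
    ("A2", "Sequoia gigantea", 180.0),
    ("B1", "Pinus lambertiana", 150.0),
    ("B1", "Quercus kelloggii", 60.0),
    ("B2", "Quercus kelloggii", 40.0),
    ("B3", "Quercus kelloggii", 50.0),
]
for species, avg, n_plots in average_height_by_species(records):
    print(f"{species}: {avg:.1f} ft over {n_plots} plot(s)")
```

Carrying the plot count with each average plays the same role as the bar color: an average based on one or two plots deserves less trust.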

Then it was on to question 2. For this I used an identical graph, except I plotted the average number of trees per plot with a diameter-at-breast-height (DBH) over 36 inches. For each plot, the original surveyors counted stems in 4 DBH classes for each species, so for example, they may have recorded 3 tanoaks between 4" and 11", 0 between 12" and 23", 10 between 24" and 35", and 0 over 35". So the data couldn't tell me which species had the widest individual tree, but it could tell me which species tended, on average, to have the widest stems across all the plots:

Image:a2-kueda-avg_over_36_vs_sp.jpg


This chart showed me some more errors in the data. "LITTER" has the highest value, but it is generally used in the ground cover data as a stand-in for unidentifiable leaf litter, and shouldn't even be in the tree data. Next is Salvia spathacea, a species of sage that isn't even a tree. Pirola picta is next, which is not a valid name, and may be a misspelling of Pyrola picta, an herbaceous plant that also has no place in the tree data. The fourth species is the first tree, making Pinus jeffreyi (Jeffrey pine) the widest tree in the data, with an average of 2.5 stems over 36" per plot.
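
A rough Python sketch of this ranking, with the three bad names from the chart excluded, might look like the following. It assumes one row per species per plot holding the stem count for the largest DBH class; the row layout, the placeholder species names, and the counts are made up for illustration.

```python
from collections import defaultdict

# Names the chart exposed as not belonging in the tree data.
NON_TREE_NAMES = {"LITTER", "Salvia spathacea", "Pirola picta"}


def rank_by_large_stems(rows, exclude=frozenset()):
    """rows: iterable of (species, stems_in_largest_dbh_class) pairs.

    Returns (species, mean_count) pairs, largest mean first,
    skipping any species listed in `exclude`.
    """
    totals = defaultdict(lambda: [0, 0])  # species -> [sum of counts, row count]
    for species, count in rows:
        if species in exclude:
            continue
        totals[species][0] += count
        totals[species][1] += 1
    ranked = [(sp, total / n) for sp, (total, n) in totals.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Made-up rows; "tree sp. A" and "tree sp. B" are placeholders.
sample = [
    ("LITTER", 9),
    ("tree sp. A", 3),
    ("tree sp. A", 2),
    ("tree sp. B", 1),
    ("tree sp. B", 0),
]
print(rank_by_large_stems(sample)[0])
print(rank_by_large_stems(sample, NON_TREE_NAMES)[0])
```

Running the ranking once without the exclusion list is still worthwhile, since that is how the bad records show up in the first place.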

Iteration 2: Further Investigation

With my initial questions answered, I wanted to test out Spotfire's GIS capabilities, so I tried creating a map visualization. At first I tried plotting the tree plots over a basemap of California counties, but after some puzzling over why no points were displaying, I noticed the line in the docs that says all geo data must be in the WGS84 datum (and, I assume, lat/lon coordinates). So I reprojected my base map using ArcGIS, and I also reprojected the original VTM plot points shapefile, and generated new columns in its data table for latitude and longitude using the XTools Pro extension to ArcGIS. Pulling all these together in Spotfire was then relatively painless (after adding those X and Y columns to the tree data table, joining on the PLOTKEY attribute), and I got a map like this:

Image:a2-kueda-map.jpg
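
The join step can also be sketched in plain Python. Only the PLOTKEY attribute and the X and Y column names come from the steps above; the function name, the dictionary row layout, and the sample coordinates are assumptions for illustration.

```python
def attach_coordinates(tree_rows, plot_rows, key="PLOTKEY"):
    """Left-join plot coordinates onto tree rows by the shared key.

    Returns (joined, unmatched): tree rows with X and Y added, and
    tree rows whose key had no matching plot.
    """
    coords = {p[key]: (p["X"], p["Y"]) for p in plot_rows}
    joined, unmatched = [], []
    for row in tree_rows:
        xy = coords.get(row.get(key))
        if xy is None:
            unmatched.append(row)
        else:
            joined.append({**row, "X": xy[0], "Y": xy[1]})
    return joined, unmatched


# Made-up rows; X is longitude and Y is latitude in WGS84 (assumed).
plots = [{"PLOTKEY": "P1", "X": -119.5, "Y": 36.6}]
trees = [
    {"PLOTKEY": "P1", "species": "Sequoia gigantea"},
    {"PLOTKEY": "P9", "species": "Pinus lambertiana"},
]
joined, unmatched = attach_coordinates(trees, plots)
print(len(joined), "joined;", len(unmatched), "without coordinates")
```

Reporting the unmatched rows separately is a cheap way to notice plot keys that were lost or mistyped somewhere between the two tables.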


Then I could ask questions like "Where are the tallest and widest trees?" The tall sequoias were strung out in small groves in the Sierras:

Image:a2-kueda-tallest_plots.jpg


And the wide Jeffrey pines seem to be in the Transverse Ranges in Southern California:

Image:a2-kueda-widest_plots.jpg


I could also ask questions like "What was the tallest pine?" (Pinus lambertiana) and "Where did they find it?" (mountainous areas, mostly in the Sierra) by filtering by species name. I changed the filter type to text and entered "Pinus", and Spotfire showed every record with a species name beginning with "Pinus" (the genus of all pines). Here's what it looked like when I highlighted Pinus lambertiana:

Image:a2-kueda-tallest_pine.jpg
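
The text filter is essentially a prefix match on the species name. A minimal Python sketch of "tallest species within a genus", using mean height as the measure, could look like this; the function name, record layout, and heights are invented for illustration.

```python
from collections import defaultdict
from statistics import mean


def tallest_in_genus(records, genus):
    """records: iterable of (species, height_ft) pairs.

    Keeps species whose name starts with the genus followed by a space
    and returns the (species, mean_height) pair with the greatest mean,
    or None if nothing matches.
    """
    heights = defaultdict(list)
    for species, height in records:
        if species.startswith(genus + " "):
            heights[species].append(height)
    if not heights:
        return None
    means = [(sp, mean(hs)) for sp, hs in heights.items()]
    return max(means, key=lambda item: item[1])


# Made-up records; heights are not taken from the VTM data.
records = [
    ("Pinus lambertiana", 150.0),
    ("Pinus lambertiana", 130.0),
    ("Pinus ponderosa", 120.0),
    ("Sequoia gigantea", 200.0),
]
print(tallest_in_genus(records, "Pinus"))
```

Matching on the genus plus a trailing space is slightly stricter than a bare "begins with" filter, so a different genus that merely shares the same leading letters would not slip through.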

Summary

Although I was able to answer my relatively modest questions, I did not uncover any particularly amazing patterns in the data. However, Spotfire was able to point out some problems with the dataset (impossibly tall trees, impossibly wide plants that are not even trees), which was certainly useful, and I don't think I really even scratched the surface of its full potential. The GIS support was definitely pretty cool, though support for layering different GIS data sources would be even better. A proper zoom tool for the map would also be nice.

Overall, I'd say the greatest advantage this kind of software supplies is the opportunity for interactivity. Brushing and filtering were the most interesting and most informative parts of my experience with Spotfire. In a few hours, I was able to throw together a fun and flexible interface for perusing this dataset, the kind of thing that would probably take me months to engineer on my own in a web environment.


