From CS 294-10 Visualization Sp10



Final Visualization

File:Final A3.jpg



How the visualization answers the question:

Process Wiki

Initial questions and Data set selection:

I am using an initial data set collected in a global warming experiment being conducted at Niwot Ridge in the Eastern Rockies of Colorado. There are three sites along an elevational gradient: Lower Subalpine, Upper Subalpine, and Alpine (in order of increasing elevation). Each subalpine site has 20 plots and the alpine site has 40 plots. In each plot there are 5 sensors that measure both soil temperature and soil volumetric water content. Four sensors are placed at the corners of a 1-meter square at a depth of 5 cm. The fifth is at the center of the 1-meter square at a depth of 15 cm. Half of the plots at each elevation are heated with infrared heaters that were turned on halfway through the data collection (although I do not know the date). This initial data collection is spotty due to staggered equipment installation, equipment malfunction, and miscommunication about the collection procedure. Measurements are ideally collected at 15-minute intervals.

I am interested in microclimate variability at various spatial scales.

Initial questions: 1) Are there any overlapping data collection periods in the initial data collection? 2) Does measurement variability differ within plots, between plots, and between elevations? 3) Does measurement variability change when infrared heat is applied?

Data Preparation

First try: I'm using R to manipulate my data for Tableau. The data come directly from four different dataloggers, which I did not program, so I'm dealing with weird formatting. Basically there is some header info, which I am dumping, then a matrix of time (down the rows) and sensors (across the columns). Each sensor appears twice: one column is the soil temperature and the other is the soil moisture. The labels in the sensor headers do not contain info about which sensor it is, so I correct this by changing the column names. I then attempted to loop through all the data to create a matrix with the sensor info and measurement type as column fields and only one measurement per tuple. The matrices are on average ~5,000 tuples by 200 fields before rearranging, so I expect to have about 1 million tuples per logger file. R crashes on this. I'm not thinking of how to code this differently.
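The rearrangement described above (one measurement per tuple, with sensor and measurement type as fields) can be sketched as a streaming wide-to-long conversion. This is only a sketch in Python rather than the R used here, and the column names, sensor labels, timestamp format, and sample values are all hypothetical. It emits one tuple at a time instead of building the full matrix in memory, which may help with crashes on large files.

```python
import csv
import io


def wide_to_long(rows, sensor_info):
    """Yield one (timestamp, sensor_id, measure, value) tuple per reading.

    rows: iterable of dicts, one per logger time step, with a "timestamp" key.
    sensor_info: maps each logger column name to (sensor_id, measure).
    """
    for row in rows:
        timestamp = row["timestamp"]
        for column, (sensor_id, measure) in sensor_info.items():
            value = row.get(column, "")
            if value == "":  # skip missing readings instead of storing blanks
                continue
            yield (timestamp, sensor_id, measure, float(value))


# Hypothetical logger output: time down the rows, sensor columns across.
wide_csv = """timestamp,col1,col2,col3,col4
2010-01-01 00:00,1.5,0.21,1.7,0.19
2010-01-01 00:15,1.4,,1.6,0.20
"""

# Hypothetical lookup table that says which sensor and measure each column is.
sensor_info = {
    "col1": ("plot01_corner1", "soil_temp"),
    "col2": ("plot01_corner1", "soil_moisture"),
    "col3": ("plot01_center", "soil_temp"),
    "col4": ("plot01_center", "soil_moisture"),
}

reader = csv.DictReader(io.StringIO(wide_csv))
long_rows = list(wide_to_long(reader, sensor_info))
```

For a real logger file, the generator could be fed from an open file and written straight to a `csv.writer`, so that only one row is held in memory at a time.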

Second try: (In the meantime I finally get a virtual PC up and running.) I have an idea to try to link the column names in Tableau to the appropriate sensor identification fields. I get this all ready to go in R and save the files as CSV files. I go to import into Tableau and figure out that I can't use CSV files. I change them to Excel and eventually put all 4 datalogger files into one Excel file on different worksheets. Then I figure out that I can join the field names to tuples in the sensor info table. So I try to play around with just one of the datalogger tables. It looks like I can make a time series using a custom time field. Tableau tries to parse the datalogger time stamp into days, minutes, hours, months, etc. and then combines the data for the resulting categories. This doesn't help me look for data gaps.
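One possible way to look for data gaps before the data reach Tableau is to compare consecutive time stamps against the intended 15-minute interval. The sketch below is in Python, not R or Tableau, and assumes a time stamp format and sample values that are hypothetical. The function name `find_gaps` is made up for illustration.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, interval=timedelta(minutes=15)):
    """Return (last_before_gap, first_after_gap) pairs.

    A gap is any pair of consecutive readings spaced more than `interval` apart.
    """
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > interval]


# Hypothetical time stamps for one sensor; the 00:30 and 00:45 readings are missing.
raw = [
    "2010-01-01 00:00",
    "2010-01-01 00:15",
    "2010-01-01 01:00",
    "2010-01-01 01:15",
]
stamps = [datetime.strptime(s, "%Y-%m-%d %H:%M") for s in raw]
gaps = find_gaps(stamps)
```

Running the same check per logger would give the start and end of each collection period, which is the information needed to see where periods overlap.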
