User:David Purdy

From CS294-10 Visualization Fa07


I'm a PhD candidate in Statistics.

Here's my homepage in the Stats Department.

My area of research is machine learning, and I'm especially interested in very high-dimensional data sets, beginning at hundreds of variables and going up to millions. These are a little hard to visualize.  :) As for the number of observations, I'll take as many as I can get hold of: thousands, millions, whatever.

As for visualization, everything changes at this scale. We can no longer observe individual points, so densities become more important.
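
To make the point about densities concrete, here is a minimal sketch of binning points into a grid and keeping only per-cell counts, which could then be drawn as a heatmap instead of a scatterplot. The function name `bin_counts` and its parameters are made up for illustration; a real analysis would more likely use a library histogram routine.

```python
from collections import Counter

def bin_counts(points, x_range, y_range, bins):
    """Count points per cell of a bins-by-bins grid (hypothetical helper)."""
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    counts = Counter()
    for x, y in points:
        if not (x_lo <= x <= x_hi and y_lo <= y <= y_hi):
            continue  # skip points outside the plotting window
        # Clamp so a point exactly on the upper edge lands in the last cell.
        i = min(int((x - x_lo) / (x_hi - x_lo) * bins), bins - 1)
        j = min(int((y - y_lo) / (y_hi - y_lo) * bins), bins - 1)
        counts[(i, j)] += 1
    return counts

points = [(0.1, 0.1), (0.15, 0.12), (0.9, 0.9), (1.0, 1.0), (2.0, 0.5)]
grid = bin_counts(points, (0.0, 1.0), (0.0, 1.0), bins=2)
```

The memory needed depends on the number of occupied cells rather than the number of observations, which is what makes this kind of summary workable for millions of points.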

Books and articles I like

  • I really like "Graphics of Large Datasets: Visualizing a Million" - it has a lot of interesting ideas, and there's still a lot of room for new methods.
  • "Scagnostics" by Wilkinson et al. is also very nice. Scagnostics was a Tukey & Tukey term for Scatterplot Diagnostics, the idea being that measurements of scatterplots can indicate how interesting they might be. For plots of pairs of features, we can use these diagnostics to decide which scatterplots to look at first. The idea is quite appealing, especially for machine learning with very large, very high-dimensional data sets: if we can do machine learning on the data, why not do it on the plots, and so accelerate our understanding of what's going on by visualizing the interesting aspects of the data first?
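
As a rough illustration of ranking scatterplots by a measurement, the sketch below scores every pair of features and sorts the pairs so the most "interesting" ones come first. It is not the set of measures from the Scagnostics paper: the score used here, squared Spearman rank correlation, is just a simple stand-in for "how monotonic does this plot look", and the names `monotonic_score` and `rank_pairs` are invented for this example.

```python
from itertools import combinations

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / (var_a * var_b) ** 0.5

def monotonic_score(x, y):
    """Squared Spearman correlation: Pearson correlation of the ranks, squared."""
    return pearson(ranks(x), ranks(y)) ** 2

def rank_pairs(columns):
    """Score every pair of named columns, highest score first."""
    scored = [
        (monotonic_score(columns[a], columns[b]), a, b)
        for a, b in combinations(sorted(columns), 2)
    ]
    return sorted(scored, reverse=True)

columns = {
    "a": [1, 2, 3, 4, 5],
    "b": [1, 4, 9, 16, 25],  # monotone in "a", though not linear
    "c": [3, 1, 4, 1, 5],
}
ranking = rank_pairs(columns)
```

With d features there are d(d-1)/2 pairs, so at millions of variables even a cheap score like this cannot be computed exhaustively; some sampling or pre-filtering of pairs would be needed.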

