FP-KesavaMallela-ChrisVolz-HannesHesse
From CS294-10 Visualization Fa07
Visualizing the Twitter Stream
Proposal
Group Members
- Kesava Mallela
- Chris Volz
- Hannes Hesse
Description
The microblogging service Twitter offers a rich text corpus with insights into the behaviors, interests, and emotions of a large number of people.
Our group aims to mine data from Twitter's public timeline and create visualizations that make the data easier to navigate and that reveal information about Twitter posts that might otherwise be lost. Planned visualizations include:
- Keyword trends over time (the most popular terms plotted against a time axis): Like Google Trends or Zeitgeist, the Twitter corpus offers very timely insights into current events and trends. We will try to identify emerging topics and highlight them visually.
- Geographic anomalies: Twitter posts are not geotagged, but user profiles generally include a static location. Using this 'home address', we want to explore geographic segmentation of the stream to identify region-specific content, possibly supported by an interactive map.
- User interconnectedness: Do a specific user's contacts post at the same time? What does it mean if two users' streams are similar by some measure? How can we visualize this?
The application would serve partly as an interactive tool for data exploration and partly as a dashboard for spotting and analyzing current trends.
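The proposal does not specify how these analyses are computed. As a hedged illustration only, all three could start from per-term counts: the sketch below buckets posts by hour and ranks terms whose relative frequency rose (keyword trends), groups counts by the free-text profile location (geographic segmentation), and compares two users' term-count vectors with cosine similarity (stream similarity). The input formats, the tokenizer, and the scoring heuristic are assumptions, not the project's method; the project's own analysis code is the TF Analysis archive listed under Final Deliverables.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical tokenizer: lowercase runs of 3+ letters. The project's real
# tokenization and stop-word handling are not described in this proposal.
TOKEN_RE = re.compile(r"[a-z]{3,}")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def bucket_counts(posts, bucket_seconds=3600):
    """Keyword trends: count terms per fixed time window.

    posts: iterable of (epoch_seconds, text) pairs (an assumed input format).
    Returns {bucket_start: Counter of term -> count}.
    """
    buckets = defaultdict(Counter)
    for ts, text in posts:
        start = int(ts) // bucket_seconds * bucket_seconds
        buckets[start].update(tokenize(text))
    return buckets


def trending_terms(previous, current, min_count=2, smoothing=1.0):
    """Rank terms whose relative frequency rose from `previous` to `current`.

    Score = current rate / add-`smoothing` previous rate. This is a simple
    heuristic, not necessarily what the project used.
    """
    prev_total = sum(previous.values())
    cur_total = sum(current.values()) or 1
    scores = {}
    for term, count in current.items():
        if count < min_count:
            continue
        prev_rate = (previous[term] + smoothing) / (prev_total + smoothing)
        scores[term] = (count / cur_total) / prev_rate
    return sorted(scores, key=scores.get, reverse=True)


def counts_by_region(posts):
    """Geographic segmentation: term counts keyed by the profile location.

    posts: iterable of (location, text) pairs. Real profile locations are
    unnormalized ("SF", "San Francisco, CA", ...); this only lowercases them.
    """
    regions = defaultdict(Counter)
    for location, text in posts:
        regions[location.strip().lower() or "unknown"].update(tokenize(text))
    return regions


def cosine_similarity(a, b):
    """Stream similarity: cosine of two term-count vectors (0.0 to 1.0)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

For example, if "earthquake" is absent in one hour and appears three times in the next, `trending_terms` ranks it above terms that were already common in the previous hour. Cosine similarity ignores how much each user posts and compares only the mix of terms.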
Previous work
- Google trends: http://www.google.com/trends
- Google Zeitgeist: http://www.google.com/press/zeitgeist.html
- Jaffe, A., Naaman, M., Tassa, T., and Davis, M. 2006. Generating summaries and visualization for large collections of geo-referenced photographs. In Proceedings of the 8th ACM international Workshop on Multimedia information Retrieval (Santa Barbara, California, USA, October 26 - 27, 2006). MIR '06. ACM Press, New York, NY, 89-98. DOI= http://doi.acm.org/10.1145/1178677.1178692
Initial Problem Presentation
- Link to slides here
Midpoint Design Discussion
Presentation Slides: http://docs.google.com/Presentation?id=ahg6rvptvq9m_192d4m2q6
Final Deliverables
- Geographic Visualization: http://groups.ischool.berkeley.edu/twitter/TwitterVis2.html
- Term Frequency Analysis: http://groups.ischool.berkeley.edu/twitter/msgexplorer/twitterviz.html
- Source code
  - Twitter Public Timeline poller: http://groups.ischool.berkeley.edu/twitter/scrubbed/tweetgrabber.py
  - TF Analysis code: http://groups.ischool.berkeley.edu/twitter/scrubbed/msgexplorer.tar
  - Geographic Code (Flash): http://groups.ischool.berkeley.edu/twitter/scrubbed/TwitterVisGeo.zip
  - REST API: http://groups.ischool.berkeley.edu/twitter/scrubbed/rest.tar
- Final Paper: http://groups.ischool.berkeley.edu/twitter/TwitterVis_Final_Writeup.pdf
- Poster: http://groups.ischool.berkeley.edu/twitter/twittervis-poster-2.ppt
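The public timeline poller (tweetgrabber.py) is linked above but not described on this page. As a rough, hypothetical sketch of what such a collector does, the code below repeatedly calls a fetch function and stores the returned posts in SQLite, using the status id as the primary key so that posts seen in overlapping polls are stored only once. The `fetch` callable, the table schema, and the polling interval are all assumptions; this sketch does not model Twitter's actual API.

```python
import sqlite3
import time


def init_db(path=":memory:"):
    """Create the message store; this schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tweets (
               id INTEGER PRIMARY KEY,  -- status id, so re-fetched posts collide
               screen_name TEXT,
               location TEXT,           -- free-text profile location, may be empty
               created_at TEXT,
               text TEXT)"""
    )
    return conn


def store(conn, statuses):
    """Insert status dicts, skipping ids already stored; return the number added."""
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO tweets (id, screen_name, location, created_at, text) "
        "VALUES (:id, :screen_name, :location, :created_at, :text)",
        statuses,
    )
    conn.commit()
    return conn.total_changes - before


def poll(conn, fetch, interval_seconds=60, rounds=None, sleep=time.sleep):
    """Call `fetch` repeatedly and store what it returns.

    `fetch` is a hypothetical zero-argument callable returning a list of
    status dicts. `rounds=None` polls forever. Returns the number of new
    posts stored.
    """
    added = 0
    done = 0
    while rounds is None or done < rounds:
        added += store(conn, fetch())
        done += 1
        if rounds is None or done < rounds:
            sleep(interval_seconds)
    return added
```

Deduplicating in the database rather than in the poller keeps the collector stateless, so it can be restarted without re-storing posts it has already seen. The `sleep` parameter exists only so the loop can be exercised without waiting.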
Team Member Responsibilities
- Hannes Hesse was the primary developer of the Geographic Visualization interface.
- Kesava Mallela was the primary developer of the Term Frequency Analysis interface.
- Christopher Volz was the primary developer of the Twitter message collector and of the REST interface for accessing the database.
- All team members contributed to the presentation slides and the final write-up.
