A3-AlvarezOshea

From CS294-10 Visualization Fa07

Pubster - the Vizster of publications - Final Implementation

Description

Pubster provides an interactive visualization of the relationships between scientific publications, along with their full citations. The graph of publications is shown in the central panel, and commonalities across publications are shown as edges, which can be dynamically queried using the checkboxes at the bottom of the display. Currently, common authors, publication years, keywords, and journals can form edges. More than one edge checkbox can be selected at a time; selecting multiple edge types combines them with a logical OR, so two papers are linked if they share any of the checked attributes. The default view shows keyword, year, and author links between nodes. Nodes (papers) are identified by first author and publication year. Clicking on a node displays its citation in the right panel; hovering over a node highlights its connections. Selected nodes remain selected and highlighted even after the graph structure has changed. Nodes can be rearranged by dragging, and clicking on a node centers and anchors it. Not every node has edges in every graph structure; nodes with no edge of the selected type are removed from the display.
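The checkbox behaviour described above can be sketched in plain Java. This is a minimal sketch under assumptions, not Pubster's prefuse code: the class `EdgeFilter`, its nested `Edge` type, and the idea that each edge carries a single attribute tag ("author", "year", "keyword", "journal") are hypothetical. The visible edges are the union (logical OR) of the checked types, and any node left without a visible edge is dropped.

```java
import java.util.*;

/**
 * Hypothetical sketch of edge-type filtering: the visible edge set is the
 * union (logical OR) of the checked types, and isolated nodes are hidden.
 */
class EdgeFilter {
    /** An undirected edge between two paper labels, tagged with the attribute that links them. */
    static final class Edge {
        final String a, b, type;
        Edge(String a, String b, String type) { this.a = a; this.b = b; this.type = type; }
    }

    /** Keeps every edge whose type is among the checked boxes (OR semantics). */
    static List<Edge> visibleEdges(List<Edge> all, Set<String> checked) {
        List<Edge> out = new ArrayList<>();
        for (Edge e : all) {
            if (checked.contains(e.type)) out.add(e);
        }
        return out;
    }

    /** Returns the nodes touched by at least one visible edge; isolated nodes are dropped. */
    static Set<String> visibleNodes(List<Edge> visible) {
        Set<String> nodes = new TreeSet<>();
        for (Edge e : visible) {
            nodes.add(e.a);
            nodes.add(e.b);
        }
        return nodes;
    }
}
```

With edges A-B (author), B-C (year), and C-D (keyword), checking "author" and "year" keeps two edges and shows nodes A, B, and C; D disappears because its only link is a keyword edge.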

The citations were downloaded and entered into a spreadsheet, which was then used to specify the nodes and edges in the XML data files.
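As an illustration of the spreadsheet-to-XML step, the following sketch emits one edge file. The write-up does not state the schema of the data files; here we assume GraphML, an XML graph format that prefuse can read, and the class `GraphMlWriter`, its `label` key, and the string node ids are hypothetical. One such file would be written per edge type.

```java
import java.util.*;

/**
 * Hypothetical sketch: writes paper nodes and undirected links as a GraphML
 * document (one file per edge type, e.g. author links or keyword links).
 */
class GraphMlWriter {
    /** Escapes the characters that are special in XML text and attribute values. */
    static String esc(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&apos;");
    }

    /**
     * @param labels node id -> display label, e.g. first author and year
     * @param edges  pairs {sourceId, targetId} that share this file's attribute
     */
    static String write(Map<String, String> labels, List<String[]> edges) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<graphml xmlns=\"http://graphml.graphdrawing.org/xmlns\">\n");
        // Declare the string attribute that holds each node's label.
        sb.append("  <key id=\"label\" for=\"node\" attr.name=\"label\" attr.type=\"string\"/>\n");
        sb.append("  <graph edgedefault=\"undirected\">\n");
        for (Map.Entry<String, String> n : labels.entrySet()) {
            sb.append("    <node id=\"").append(esc(n.getKey())).append("\"><data key=\"label\">")
              .append(esc(n.getValue())).append("</data></node>\n");
        }
        for (String[] e : edges) {
            sb.append("    <edge source=\"").append(esc(e[0]))
              .append("\" target=\"").append(esc(e[1])).append("\"/>\n");
        }
        sb.append("  </graph>\n</graphml>\n");
        return sb.toString();
    }
}
```

Escaping matters here because citation fields such as titles can contain characters like `&` that would otherwise produce malformed XML.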

Pubster was implemented in Java using the prefuse visualization toolkit.

Screenshots

Image:AlvarezOsheaPubster1.png

Screenshot of Pubster in the default view (keyword and year edges on), with one node selected and its details visible in the bottom panel


Image:AlvarezOsheaPubster2.png

Searching for an author's name (top) highlights two nodes: Ringach was the first author on one paper and a co-author on the other


Image:AlvarezOsheaPubster3.png

Selecting 'Delahunt 2004' and checking the year checkbox displays only the papers from that year


Image:AlvarezOsheaPubster4.png

Papers sharing a common keyword, with the view zoomed in on the graph so that the nodes appear larger

Changes

The final implementation defines edges by four attributes: author, year, keywords, and journal. 'Journal' was kept as an edge type even though all the publications in our dataset came from the same journal (Journal of Vision, JOV), and hence are all connected under that edge type. We downloaded full citations for the 20 most-downloaded papers from JOV, along with their associated images (which we would have liked to use as nodes, but they overlapped too much spatially). Download counts for each paper are included in the data but are not displayed, because we did not create space in the bottom panel to show them. The same is true of the thumbnail images.

Due to the complexity of implementation, selecting nodes by highlighting or clicking on an attribute in the details panel is not enabled. Edges had to be specified manually since, unlike in Vizster, not all nodes are inherently always 'on' and connected. This meant that the citations had to be eyeballed for shared attributes, such as keywords, and the nodes then had to be linked by hand in the input XML files. One consequence is that if a node is selected and an edge type is checked for which that node has no links, all other nodes disappear; e.g., selecting 'Troje 2002' and checking only the author checkbox causes all other nodes to vanish, because Troje has no author links to other papers. We would have preferred the other nodes to remain floating, unconnected, nearby. Ideally, the data would not have to be specified in four different files, and the program would query for the edges as the boxes are checked instead of having the edges specified beforehand.
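The ideal version described above, in which edges are computed from the data rather than listed by hand, could use an inverted index from attribute value to papers. This is a sketch under stated assumptions: `EdgeQuery` and `sharedValueEdges` are hypothetical names, values are assumed to be normalized already (e.g. consistent author spellings), and edges are returned as `"a|b"` strings with the two ids in sorted order, which assumes ids never contain `|`.

```java
import java.util.*;

/**
 * Hypothetical sketch of computing edges on demand: index papers by attribute
 * value, then link every pair of papers that share at least one value.
 */
class EdgeQuery {
    static Set<String> sharedValueEdges(Map<String, List<String>> valuesByPaper) {
        // value -> papers carrying that value (e.g. keyword -> papers)
        Map<String, List<String>> index = new TreeMap<>();
        for (Map.Entry<String, List<String>> p : valuesByPaper.entrySet()) {
            // De-duplicate so a paper is listed at most once per value.
            for (String v : new LinkedHashSet<>(p.getValue())) {
                index.computeIfAbsent(v, k -> new ArrayList<>()).add(p.getKey());
            }
        }
        // Link every pair within each bucket; the sorted "a|b" key removes duplicate edges.
        Set<String> edges = new TreeSet<>();
        for (List<String> papers : index.values()) {
            for (int i = 0; i < papers.size(); i++) {
                for (int j = i + 1; j < papers.size(); j++) {
                    String a = papers.get(i), b = papers.get(j);
                    edges.add(a.compareTo(b) < 0 ? a + "|" + b : b + "|" + a);
                }
            }
        }
        return edges;
    }
}
```

Passing each paper's keyword list yields the keyword edges; passing one-element lists of publication years yields the year edges. Bucketing by value means pairs of papers with no value in common are never compared.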

Online Demo

You can interact with a live demo of Pubster at the following site:

http://jposhea.org/pubster

The host server is somewhat slow, so please be patient and allow up to one minute for the demo to load. This demo uses the 20 most-downloaded papers from the Journal of Vision site as its set of publications.

Code

The code for Pubster can be downloaded here. The zip archive includes a README file with more details on running the code; its contents are reproduced below:

README: Pubster.java was written using the Prefuse toolkit. We based it on the demos found in the Prefuse distribution and tested it within the prefuse environment. Our Pubster class can be found in the folder "assignment3" under the top-level prefuse directory.

Pubster requires 4 input files: pubster_author.xml, pubster_journal.xml, pubster_keyword.xml, pubster_year.xml

These should be placed in the current working directory. Pubster requires no command-line arguments.

The PrefuseWithPubster.zip archive should be opened in a Java development environment, e.g., Eclipse (http://eclipse.org).

Within the Eclipse Workbench, select File, "Import". Then select "Existing Projects into Workspace". In the resulting dialog, click the radio button for "Select archive file" and browse for the PrefuseWithPubster zip file. The "prefuse" project should then appear in the area below. Click the "Finish" button to import the project and build it. Once prefuse has been loaded as a project within Eclipse, you can run Pubster from within Eclipse by right-clicking the source file (prefuse/assignment3/prefuse/assignment3/Pubster.java) and selecting "Run >> Java Application" from the menu.
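A reader for the four input files listed in the README might look like the following. The sketch assumes GraphML-style files in which each link is an `<edge source="..." target="..."/>` element; that schema, the class `PubsterInputs`, and its methods are assumptions for illustration, not the code in the Pubster zip. It uses only the JDK's DOM parser.

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;
import javax.xml.parsers.*;
import org.w3c.dom.*;

/** Hypothetical loader for pubster_author.xml, pubster_journal.xml, pubster_keyword.xml, pubster_year.xml. */
class PubsterInputs {
    static final String[] TYPES = {"author", "journal", "keyword", "year"};

    /** Reads the source/target pair of every <edge> element in a GraphML-style stream. */
    static List<String[]> readEdges(InputStream in) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = db.parse(in);
        NodeList nl = doc.getElementsByTagName("edge");
        List<String[]> edges = new ArrayList<>();
        for (int i = 0; i < nl.getLength(); i++) {
            Element e = (Element) nl.item(i);
            edges.add(new String[] {e.getAttribute("source"), e.getAttribute("target")});
        }
        return edges;
    }

    /** Loads pubster_<type>.xml for each edge type from the given directory. */
    static Map<String, List<String[]>> loadAll(Path dir) throws Exception {
        Map<String, List<String[]>> byType = new LinkedHashMap<>();
        for (String t : TYPES) {
            try (InputStream in = Files.newInputStream(dir.resolve("pubster_" + t + ".xml"))) {
                byType.put(t, readEdges(in));
            }
        }
        return byType;
    }
}
```

Calling `PubsterInputs.loadAll(Paths.get(""))` resolves the file names against the current working directory, matching the README's requirement that the files be placed there.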

Comment and Breakdown

Total time: 30 hours

Writing and tweaking the code took the most time in this project. The storyboard discussions also took longer than expected, but were important for the direction and clarity of the project. Manually transforming the data from BibTeX citations into XML files (by copying and pasting) was tedious.

Jamie's portion: contributed to storyboard and part 1; coded the checkboxes, data, and functionality into Pubster; final writeup.

Amanda's portion: contributed to storyboard and part 1; obtained citations and images, created input data files; final writeup, description and screenshots.

Pubster - Part 1 storyboard and description

Pubster is an interactive tool for visualizing how scientific publications are related. The system allows exploration of the links between publications at many different levels; the goal is to help users understand how the papers in a set (on your hard drive, for example) are related in terms of authors, keywords, and other attributes. Pubster allows visualization of the intellectual achievements that constitute the 'community' (as in Vizster) of a field, and lets you see how the field and the authors' research interests have evolved.

Overview

Pubster presents the set of publications as an undirected graph in which each node represents one of the papers from the data set. Two nodes are connected by an edge if they share a common entry in one of the attribute fields selected for providing structure to the graph (e.g., authors). The edges are weighted according to the number of common entries, and this weight is encoded by the thickness of the edges. The system also lets the user select subsets of the data by searching for text strings contained within any of the other fields in the publications' bibliographic records (e.g., abstracts containing the word 'graph').
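The planned edge weighting, where thickness encodes the number of common entries, reduces to a set intersection. In this sketch the class `EdgeWeight` and the choice of one pixel of stroke width per shared entry, capped at a maximum, are hypothetical; Part 1 does not specify the mapping.

```java
import java.util.*;

/** Hypothetical sketch: edge weight = number of shared entries, drawn as stroke width. */
class EdgeWeight {
    /** Number of distinct entries two papers share in one field (e.g. common authors). */
    static int sharedCount(Collection<String> a, Collection<String> b) {
        Set<String> common = new HashSet<>(a);
        common.retainAll(new HashSet<>(b));
        return common.size();
    }

    /** One pixel per shared entry, capped at maxWidth; zero means no edge is drawn. */
    static float strokeWidth(int weight, float maxWidth) {
        if (weight <= 0) return 0f;
        return Math.min(maxWidth, (float) weight);
    }
}
```

Capping the width keeps a pair of papers with many shared keywords from producing an edge that obscures its neighbours.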

Data

Pubster visualizes data from the domain of journal article references organized in the form of a relational database. Each record in the database corresponds to a particular journal article, and the different fields in the table describe various attributes of the article. The following list describes each field in detail:

  • ID: a unique numeric identifier for the article record (NOTE: we may omit this field and rely solely on the DOI)
  • Title: a text string containing the title of the journal article
  • Authors: an array of text strings, each of which contains a single author of the article. The array may have an arbitrary number of author entries
  • Year: a numeric string corresponding to the year of publication
  • Keywords: an array of text strings, each of which contains a keyword or phrase that describes the contents of the article. This array may have an arbitrary number of entries
  • Journal Title: a text string containing the title of the journal in which the article was published
  • DOI: a text string containing the article's unique Digital Object Identifier (DOIs include a prefix and a suffix separated by a slash, so they are not purely numeric)
  • Citations: an array of text strings, each of which contains the DOI of one of the article's references.
  • Abstract: a string of characters which contains the article's abstract

We acquired full citations for approximately 50 journal articles to demonstrate the capabilities of the system.

Not all fields can be used to encode the edges of the graph. Currently, Pubster allows the user to assign graph structure by searching for common entries within the author, keywords, journal, and year fields. The remaining data fields provide details about the specific publications as well as further query possibilities (see Features below).
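The record described in the field list above, together with the restriction of edges to four of its fields, can be written as a plain Java class. This is a sketch: the class name `Article`, the field name `abstractText` (since `abstract` is a reserved word in Java), and the `valuesFor` helper are our own assumptions.

```java
import java.util.*;

/** Sketch of one bibliographic record; field types follow the Data list. */
final class Article {
    final int id;                  // may be dropped in favour of the DOI
    final String title;
    final List<String> authors;    // arbitrary number of authors
    final String year;             // numeric string, e.g. "2004"
    final List<String> keywords;   // arbitrary number of keywords or phrases
    final String journalTitle;
    final String doi;
    final List<String> citations;  // DOIs of the article's references
    final String abstractText;     // "abstract" is a Java keyword

    Article(int id, String title, List<String> authors, String year, List<String> keywords,
            String journalTitle, String doi, List<String> citations, String abstractText) {
        this.id = id;
        this.title = title;
        this.authors = Collections.unmodifiableList(new ArrayList<>(authors));
        this.year = year;
        this.keywords = Collections.unmodifiableList(new ArrayList<>(keywords));
        this.journalTitle = journalTitle;
        this.doi = doi;
        this.citations = Collections.unmodifiableList(new ArrayList<>(citations));
        this.abstractText = abstractText;
    }

    /** Values of one edge-forming field; only these four fields can define edges. */
    List<String> valuesFor(String field) {
        switch (field) {
            case "author":  return authors;
            case "keyword": return keywords;
            case "journal": return Collections.singletonList(journalTitle);
            case "year":    return Collections.singletonList(year);
            default: throw new IllegalArgumentException("not an edge field: " + field);
        }
    }
}
```

The remaining fields (title, DOI, citations, abstract) stay available for the details panel and for text search, but `valuesFor` rejects them as edge sources.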

Techniques

Pubster employs a physics-based layout algorithm to present the graph of publication nodes and their connections to other nodes in the graph. The user can select the attributes to use for assigning graph structure using dynamic queries. Dynamic queries are also employed to allow the user to select a subset of the graph (shown as highlighted nodes) by searching for text strings within any of the data fields in the bibliographic records.
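To make the physics-based layout concrete, here is one iteration of a generic spring-electrical layout in the style of Fruchterman and Reingold. It is not prefuse's `ForceDirectedLayout`, which runs its own force simulation; the class `ForceStep`, the constant `k` (ideal edge length), and the step cap `temp` are illustrative assumptions. Every pair of nodes repels with force k*k/d, linked pairs attract with force d*d/k, and each node moves at most `temp` units per step.

```java
/** Generic spring-electrical layout step (illustrative; not prefuse's implementation). */
class ForceStep {
    /**
     * @param pos   node positions as {x, y}, updated in place
     * @param edges index pairs {a, b} of linked nodes
     * @param k     ideal edge length
     * @param temp  maximum distance any node may move in this step
     */
    static void step(double[][] pos, int[][] edges, double k, double temp) {
        int n = pos.length;
        double[][] disp = new double[n][2];
        // Repulsion between every pair of nodes.
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double dx = pos[i][0] - pos[j][0], dy = pos[i][1] - pos[j][1];
                double d = Math.max(1e-9, Math.hypot(dx, dy));
                double f = k * k / d;
                disp[i][0] += dx / d * f; disp[i][1] += dy / d * f;
                disp[j][0] -= dx / d * f; disp[j][1] -= dy / d * f;
            }
        }
        // Spring attraction along each edge.
        for (int[] e : edges) {
            int a = e[0], b = e[1];
            double dx = pos[a][0] - pos[b][0], dy = pos[a][1] - pos[b][1];
            double d = Math.max(1e-9, Math.hypot(dx, dy));
            double f = d * d / k;
            disp[a][0] -= dx / d * f; disp[a][1] -= dy / d * f;
            disp[b][0] += dx / d * f; disp[b][1] += dy / d * f;
        }
        // Move each node along its net force, capped at temp.
        for (int i = 0; i < n; i++) {
            double len = Math.hypot(disp[i][0], disp[i][1]);
            if (len > 0) {
                double s = Math.min(len, temp) / len;
                pos[i][0] += disp[i][0] * s;
                pos[i][1] += disp[i][1] * s;
            }
        }
    }
}
```

Repeating the step while gradually lowering `temp` lets the graph settle. Anchoring a clicked node, as Pubster does, would amount to skipping that node's position update.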

These techniques are effective in this data domain primarily because the sets of publications people collect tend to have similar attributes within their bibliographic records. These common attributes lend themselves to the formation of a graph structure that shows how various publications are related. We chose to emulate the interactive visualization techniques adopted by Vizster because publications, like people, form networks based on shared attributes, yet it is often difficult to visualize and explore the patterns made by these relationships. Not only does the network structure provide useful metadata about the field, but exploring commonalities across nodes (i.e., papers) can also deepen understanding of the scientific trends in the field, the topics under active research, and the authors making valuable contributions.

Features

Pubster allows users to visualize the similarities and relationships between publications by presenting journal articles as a graph of interconnected nodes. The user can choose how the publication nodes are connected by selecting one of several attribute fields in the bibliographic records (authors, keywords, etc.). Pubster also allows the user to select a subset of publications (shown as highlighted nodes) by searching for common text entries or by mousing over elements within the detailed record descriptions. The user can see detailed descriptions of the publications (details-on-demand) by clicking one of the nodes in the graph.
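Text-based selection can be sketched as a case-insensitive substring match over every field of every record. The record layout (a map from field name to text), the class `TextSearch`, and the rule that an empty query highlights nothing are assumptions for this sketch.

```java
import java.util.*;

/** Hypothetical sketch of selecting a subset of papers by text search. */
class TextSearch {
    /** Returns the ids of papers where any field contains the query, ignoring case. */
    static Set<String> highlight(Map<String, Map<String, String>> records, String query) {
        String q = query.toLowerCase(Locale.ROOT);
        Set<String> hits = new TreeSet<>();
        if (q.isEmpty()) return hits;  // an empty query highlights nothing
        for (Map.Entry<String, Map<String, String>> r : records.entrySet()) {
            for (String text : r.getValue().values()) {
                if (text != null && text.toLowerCase(Locale.ROOT).contains(q)) {
                    hits.add(r.getKey());
                    break;  // one matching field is enough
                }
            }
        }
        return hits;
    }
}
```

The returned ids would be the nodes drawn as highlighted; all other nodes stay in place, unhighlighted.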

Layout

Image:AlvarezOsheaA3Layout.png

The layout of Pubster consists of three panels shown simultaneously. On the left is the main visualization window which displays the graph of the publications. The upper right panel contains the dynamic queries used for assigning structure to the graph and selecting subsets of nodes. The lower right panel displays the details-on-demand for a selected publication node. By highlighting text strings found within this bibliographic record, users can interactively select subsets of nodes containing the selected text strings. The nodes for each publication are identified with an image from the paper (if available), or simply a text string containing the first author's last name and year of publication (e.g. 'Agrawala 2004').
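Deriving a node label such as 'Agrawala 2004' from a record might look like this. Author strings are assumed to be either "Last, First" or "First Last"; that format and the class `NodeLabel` are hypothetical, and surnames made of several words would need more care.

```java
import java.util.*;

/** Hypothetical sketch: node label = first author's last name plus year. */
class NodeLabel {
    static String label(List<String> authors, String year) {
        if (authors.isEmpty()) return year;
        String first = authors.get(0).trim();
        int comma = first.indexOf(',');
        String last = comma >= 0
                ? first.substring(0, comma).trim()            // "Agrawala, M." form
                : first.substring(first.lastIndexOf(' ') + 1); // "M. Agrawala" form
        return last + " " + year;
    }
}
```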

Implementation

Pubster will be implemented in Java using Jeffrey Heer's Prefuse Toolkit.

Potential Extensions

Structure the graph according to citations

Given a large enough set of publications, it will often be the case that certain papers will directly cite other papers within the set. These relationships could potentially be used to assign new structure to the graph.
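Citation-based structure would differ from the other edge types in being directed. A sketch, assuming references are stored as DOIs as in the Citations field described under Data: keep only references that point to another paper in the collection. The class `CitationEdges` and its `within` method are hypothetical.

```java
import java.util.*;

/** Hypothetical sketch: directed citing -> cited edges inside a collection. */
class CitationEdges {
    /** @param refsByDoi each paper's DOI mapped to the DOIs it cites */
    static List<String[]> within(Map<String, List<String>> refsByDoi) {
        List<String[]> edges = new ArrayList<>();
        for (Map.Entry<String, List<String>> p : refsByDoi.entrySet()) {
            for (String cited : p.getValue()) {
                // Ignore references outside the set and accidental self-references.
                if (refsByDoi.containsKey(cited) && !cited.equals(p.getKey())) {
                    edges.add(new String[] {p.getKey(), cited});
                }
            }
        }
        return edges;
    }
}
```

Because the edges are directed, they could be drawn with arrowheads, unlike the undirected attribute edges.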

Encode one of the attributes with the size of the nodes or characteristics of the edges

New attributes, such as the number of citations or the number of downloads, could be added to the bibliographic record. These data could then be encoded by the size of the node (i.e., the more downloads, the larger the node). Similarly, we could control the characteristics of the edges (width, color) to encode the number or type of shared attributes.
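For the node-size encoding, one reasonable choice is to make node area, rather than radius, proportional to the download count, since scaling the radius linearly would exaggerate differences. The class `NodeSize`, the square-root mapping, and the minimum radius that keeps small nodes clickable are assumptions, not part of the original design.

```java
/** Hypothetical sketch: node radius chosen so that area is proportional to downloads. */
class NodeSize {
    static double radius(long downloads, long maxDownloads, double maxRadius, double minRadius) {
        if (maxDownloads <= 0) return minRadius;
        // area ~ downloads  =>  radius ~ sqrt(downloads)
        double r = maxRadius * Math.sqrt((double) downloads / maxDownloads);
        return Math.max(minRadius, r);  // floor keeps rarely downloaded papers visible
    }
}
```

With a maximum radius of 20, a paper with a quarter of the top download count gets radius 10, i.e. a quarter of the area.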


