Group Members

Mason Smith

Description: Visual Queries for Tree Structures

The Penn Treebank is a collection of parse trees for sentences from the Wall Street Journal and other corpora, used for NLP tasks like part-of-speech tagging and parsing. Because the data is tree-structured, querying the treebank for patterns or substructures is difficult with standard text-searching tools like regular expressions. Some specialized text-based tools, such as tgrep/tgrep2 and tregex, allow for structure-aware querying of parse trees. However, such text-based queries become unwieldy when they span multiple tree levels.

For my project, I plan to create an interface for constructing queries visually. The user will create and connect nodes to form a search structure that can be run against a treebank. Because the query is itself built as a tree, its visual form will closely match the structure being searched for, allowing for more complex, yet still comprehensible, tree queries.
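
To make the idea of a node-based search structure concrete, here is a minimal sketch in Python of how a query built from connected nodes might be matched against a bracketed parse tree. It is only an illustration under stated assumptions, not the planned implementation: the names Tree, QueryNode, parse, matches_at, and find are hypothetical, the "*" wildcard is an invented convention, and the only relation supported is "has a direct child matching this node".

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Tree:
    """A parse-tree node: a label plus ordered children (words are leaves)."""
    label: str
    children: List["Tree"] = field(default_factory=list)


@dataclass
class QueryNode:
    """A node the user would draw (hypothetical). "*" matches any label."""
    label: str
    children: List["QueryNode"] = field(default_factory=list)


def parse(text: str) -> Tree:
    """Read a Penn-Treebank-style bracketing such as (NP (DT the) (NN dog))."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read() -> Tree:
        nonlocal pos
        if tokens[pos] != "(":
            leaf = Tree(tokens[pos])  # a bare token is a word (leaf)
            pos += 1
            return leaf
        pos += 1  # skip "("
        node = Tree(tokens[pos])
        pos += 1
        while tokens[pos] != ")":
            node.children.append(read())
        pos += 1  # skip ")"
        return node

    return read()


def matches_at(query: QueryNode, tree: Tree) -> bool:
    """True if the labels agree and every query child matches some direct child.

    Simplification: two query children may match the same tree child.
    """
    if query.label != "*" and query.label != tree.label:
        return False
    return all(
        any(matches_at(q_child, t_child) for t_child in tree.children)
        for q_child in query.children
    )


def find(query: QueryNode, tree: Tree) -> Iterator[Tree]:
    """Yield every node in the tree, in pre-order, where the query matches."""
    if matches_at(query, tree):
        yield tree
    for child in tree.children:
        yield from find(query, child)


sentence = parse(
    "(S (NP (DT the) (NN dog)) (VP (VBD sat) (PP (IN on) (NP (DT the) (NN mat)))))"
)

# Three connected nodes: a VP with a PP child that itself has an NP child.
query = QueryNode("VP", [QueryNode("PP", [QueryNode("NP")])])
hits = list(find(query, sentence))
print([hit.label for hit in hits])
```

The query here spans three tree levels but is still just three connected nodes, which is the kind of structure the visual interface would let the user draw. A fuller version would need more relations (for example ancestor, sibling order, or negation), which this sketch deliberately leaves out.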

