FinalPrototype-Group:MELODY
From CS160 User Interfaces Fa06
Team Members
- Nankun Huang – MIDI conversion and software playback system
- Cheng-Lun Yang – Note recognition algorithm
- Roland Carlos – Changes to the paper interface, note recognition algorithm
- Julius Cheng – Writeup and presentation
Problem and solution overview
Problem
Today, music is composed either on paper or on the computer with digital editing tools. Unfortunately, composers who write only on paper miss out on the powerful, time-saving features of music composition software, while those who use software give up the portability, intuitiveness, and ease of navigation of paper. There is currently little to bridge the two mediums: paper-based music cannot easily be transferred to the computer, and although computers can print out music, such printouts are not easily editable.
Solution
MELODY seeks to bridge this divide. It gives paper composers access to the benefits of digital sound software while preserving the paper composing process that is natural to them, and it gives digital composers the freedom to use paper when using the computer is inconvenient. Users will be able to write their music on paper and still have access to many features available with regular sheet music, such as writing pieces with multiple voices, selecting different instruments, and erasing handwritten notes.
Target user group
Any music composer who has access to both paper and digital music tools can stand to benefit, since MELODY allows users to switch between the two mediums at will, providing the affordances of each when they are wanted or needed.
More specifically, MELODY is best suited to composers who are familiar with traditional music notation and prefer composing with it. With the democratization of digital editing tools, people without experience in traditional notation can create music using production software that looks and feels drastically different from standard sheet music. Because the MELODY system reads traditional music notation, it cannot serve composers unfamiliar with it. Older composers and composers of older genres of music (classical, jazz, big band) are therefore the most likely to benefit from MELODY.
Representative tasks
Composing a moderately complex tune with different instruments and multiple voices (Hard)
Users should be able to achieve some level of complexity in compositions they create with the MELODY system. The MELODY paper interface allows users to select different instruments, and specify multiple voices by joining two or more staffs together. While the standard for "complex" is subjective, to us it is sufficient if users can fully use the instrument selector and staff combining features.
This task represents the main purpose of the MELODY project - success in supporting this task reflects the success of the entire project.
Making edits to a previously completed composition (Medium)
Users should be able to edit their work, especially given the constraint of having to use the non-erasable Anoto pen. For this task, a user should be able to tell the note recognition system that particular notes should be disregarded. In the MELODY system, this is done by activating "erase mode" with the appropriate region on the page and then marking over the desired notes.
Composers using regular pen and paper can easily cross out their work, and pencil is erasable. In order to replicate this particular affordance of pen and paper, it is necessary to implement some way of editing work.
Playing back a composition through the digital organizing system (Easy)
Users should be able to browse all previously uploaded compositions and play them back. The organizing system should be intuitive and easy to use, and it should support features that make searching easier. Our software allows users to search by title, tag, or date, and to narrow a search down to a single tag. Once a song is found, playing it should be trivially easy: the MELODY browser contains an onboard MIDI player, so playing a song is simply a matter of selecting it and pressing the play button.
This task represents the necessary digital counterpart to the paper system. MELODY seeks to bridge the paper and digital worlds, so software to handle Anoto pen data is vital to the concept of the project.
Design evolution
Initial sketches
The very first rendering of MELODY’s paper interface has many of the features necessary to recreate the freedom of traditional paper sheet music. As labeled on the image, the interface contains title and tag fields that make digital organizing easier. In addition to the standard five-line bars for writing notes, there are instrument selector boxes next to each bar and a “cue line,” a dotted line spanning all of the bars, which allows the user to specify that multiple bars should be played together when they are connected with the Anoto pen. Finally, a feature unique to the initial sketch was item E, the preview button. This button would upload all of the stroke data to a cell phone running software that automatically converts the stroke data into MIDI and plays the composition.
One of the tasks we wished to support was erasing notes and notation, since easy error correction is one of the primary affordances of music paper. Rather than cluttering the paper interface with an erase option, we planned a modeless, gesture-based system of erasing, whereby a user would mark an ‘X’ over the target notes and the MIDI converter would recognize which notes to omit.
Not pictured is the software interface that organizes uploaded MIDI compositions. At this point, we had not yet received the pens, let alone information about the R3 toolkit, and were in the dark about the specific mechanics of the pen, so we did not make a visual sketch of it. We did, however, plan for the same features we have now: the ability to organize and search by title, tag, and date, and a rudimentary MIDI player.
Low-fidelity testing
The above image is a simulation of the music browser that organizes and plays uploaded compositions. As mentioned before, this software had previously only been discussed, so for low-fidelity testing we created a visual sketch of its interface to print out. The sketch contains the basic features needed to locate and play a composition, allowing the user to search for it by title, tag, and date. It also contains a rudimentary MIDI player with volume and duration sliders and the ability to pause.
The paper interface remained the same between the initial brainstorming sketch and low-fidelity testing, since no user testing took place to evaluate it. We simply printed out our initial sketch to test with users.
Interactive prototype/Pilot usability test
By the time we built a demonstrable interactive interface, we had become familiar with the Anoto pens and the R3 toolkit. Our new knowledge of how to implement the MELODY sheet music paper in practice, together with the results of low-fidelity user testing, led us to make several important changes to the functionality of the paper interface.
The most visually obvious change is that all of the fields denoted in the interface now contain the dotted Anoto pattern. This was of course necessary to have the prototype work with the pen.
The “cue line” used to combine multiple bars so that they play at the same time became a long, thin rectangle instead of a dotted line. We had overlooked the fact that a thin line cannot capture Anoto pen strokes. The new “bar combiner region” allows the user to draw a bracket, or even just a line, spanning the desired bars.
The final major change was to how our second task, erasing notes and notation, is done. Our modeless gesture-based system was inadequate because a) users tended to scribble or make marks other than an ‘X’ and b) when a single ‘X’ was used to erase multiple notes, the area it covered was poorly defined to the user. Allowing only one note to be erased at a time would avoid the second problem but would make erasing longer passages tedious. Our solution is mode-based, with two regions at the bottom of the page that toggle an “erase mode” on and off. When erase mode is on, any marking made on a note erases it, so users may scribble, dash, or mark over deletions in any way. Despite what we have learned about avoiding modes, we decided that the added risk of a mode was worth the ease of use gained by being less restrictive.
The interactive prototype of the browser software is simply a more polished, professional-looking version of the low-fidelity sketch. The only functional difference is a blue help call-out button.
The current version
After we conducted the pilot usability test with the interactive prototype, several issues came to light. First, the Anoto pen is limited by design: it does not report entirely accurate coordinates when it is tilted, because the camera’s focus is displaced. We did not foresee this critical issue, and note recognition that can accurately tell where a note is placed is vital to our system. Rather than forcing users to hold the pen awkwardly upright, which would unfairly shift the burden of the pen’s limitations onto them, the only viable solution was to enlarge the five-line bars so that the effect of any displacement is minimized. The wrong note will still be read if the pen is held at an unusually slanted angle, but enlarging the bars any further would sacrifice the appearance and feel of regular sheet music. We are not pleased with the current appearance; to us, the size of the bars is at the very threshold of aesthetic acceptability.
The second issue was that users in the pilot study would sometimes forget that erase mode was on, since pen and paper alone cannot give the user feedback about it. This leads to a serious error, though not a catastrophic loss of work: users who believe erase mode is off do not mark over existing notes but write new ones, so in most cases a user simply loses the new material they intended to write. In the end, we did not change the design because we expect expert users to make this error less often and to take extra precautions, such as marking the erase mode deactivator before writing, whether or not erase mode is on. Additionally, returning to the modeless form of erasing with an ‘X’ would heavily interfere with the note recognition algorithm we implemented for the pilot study.
Details of the note recognition algorithm are discussed in the Implementation section, since it came late in the design process.
There is no picture of the music browser; its design remained the same, since our focus was primarily on getting note recognition and MIDI conversion running, and user testing revealed no problems with it. Of course, it was no longer the dummy interface used during the interactive prototype, but a working version that actually performs music conversion and organizes MIDI files.
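The benefit of enlarging the staffs can be made concrete under one simplifying assumption of ours (not stated in this report): that the recognizer snaps a stroke's y-coordinate to the nearest line or space. With staff line spacing s and the top line at y_top, a note drawn exactly on a line or space has slot index k, and a tilt offset Δy leaves that slot unchanged whenever it is less than a quarter of the spacing:

```latex
k = \operatorname{round}\!\left(\frac{y - y_{\mathrm{top}}}{s/2}\right),
\qquad
\operatorname{round}\!\left(\frac{y + \Delta y - y_{\mathrm{top}}}{s/2}\right) = k
\quad\text{whenever}\quad |\Delta y| < \frac{s}{4}.
```

Under this model, doubling the line spacing doubles the offset the pen can tolerate before the wrong note is read. This is why larger bars help, and also why they cannot eliminate the problem entirely.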
Testing technique evaluation
It is difficult to say which evaluation technique was most valuable. Low-fidelity testing gave us a first taste of how users interact with unfamiliar interfaces. Even though we did not make large design changes as a result, we gained a sense of what users thought about certain features, which informed later design decisions. The interactive prototype testing was not much different, since the difficult note recognition and MIDI conversion had not yet been implemented. The software browser was ‘wizard-of-oz’ed, and so was effectively the same as the paper version used in low-fidelity testing.
The pilot study was about as useful for studying user behavior, since our ‘wizard-of-oz’ run-throughs had already covered all of the important user interactions. By the time of the pilot study, note recognition and MIDI conversion had been implemented, but we were testing most of the same features over again, only with semi-working note recognition and stroke conversion systems. However, the pilot study required us to apply some of the rigorous logging techniques from the course readings and to interpret the resulting data. This did not lead to any significant design changes, but it helped us better understand user testing in general.
Each of the testing sessions was about equally useful, mainly because the sessions were very similar in terms of the interface we presented and the data we gathered.
The Final Interface
Functionality
The MELODY system is composed of two parts: a paper interface and a playback system. The Anoto pen is used to write music on the special paper interface, and the strokes are later uploaded to the playback system. The playback system interprets the pen strokes as musical notation, converts them into MIDI, and organizes the result by title, tags, and date.
Each sheet of MELODY paper is an individual composition that the author can title and tag. Notes can be written in dotted-pattern regions, and users may specify a different instrument for a particular bar. Multiple lines can be combined to indicate that they are to be played in score form, that is, simultaneously rather than sequentially.
Even though the Anoto pen is not erasable, the paper interface supports erasing via an “erase mode” that can be activated and deactivated directly on the page.
When the pen data is uploaded, the playback system automatically analyzes the pen strokes, excluding erased notes, and converts them into a single MIDI file. The software lists all previously uploaded compositions, and a user can locate a file by the title and/or tags written on the page, or by the upload date. A file can be played directly from the interface's MIDI player.
User interface design
The labels on the above image indicate special features of the paper interface that allow the MELODY user to take advantage of the supported functionality above. Each item acts as follows:
A. Title field - The user must title his or her composition by writing a name into this field. Anything written in this field is read with OCR and converted into an ASCII string that serves as the name of the resulting MIDI file.
B. Tag field – The user may write zero to four tags that describe and categorize the composition here. These regions are also OCR-activated, and the written tag, once uploaded with the file, can be used to narrow down the search for a song in the playback browser.
C. Standard five-line staffs – These regions form the main body of the page and are where notes and notation are written. Notes written here are recognized and uploaded. Unless “combined” (see item E), the staffs are sequential rather than simultaneous: each staff is played after the previous one.
D. Instrument selector boxes – Next to each staff is a square box in which the user may write a single letter that is read by OCR. This letter is the code for the musical instrument with which the neighboring staff should be played.
E. Bar combiner region – This long vertical region allows the user to indicate which staffs should be played together, instead of sequentially. One can draw a bracket or simply a line spanning the staffs that should be played together. The converter automatically interprets strokes made here.
F. Erase mode activator – Making a stroke in this region activates erase mode, which, when on, causes any stroke over an existing note to “delete” it, meaning that the MIDI converter will omit any such notes.
G. Erase mode deactivator – Making a stroke here turns off erase mode. One way to avoid accidentally leaving erase mode on is to mark this region before beginning to write.
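Items F and G can be modeled as a single piece of state that is consulted for each incoming stroke. The Java sketch below is illustrative only. The class names, the rectangular regions, and the bounding-box overlap rule for deciding which notes a stroke erases are all our assumptions; as noted later, MELODY does not yet actually erase notes, and a bounding-box test is a crude approximation that ignores stroke shape. The sketch also shows why marking G "just in case" is safe: turning erase mode off is idempotent.

```java
import java.util.List;

// Hypothetical axis-aligned rectangle in page coordinates.
record Box(double minX, double minY, double maxX, double maxY) {
    boolean contains(double x, double y) {
        return x >= minX && x <= maxX && y >= minY && y <= maxY;
    }

    boolean intersects(Box o) {
        return minX <= o.maxX && o.minX <= maxX && minY <= o.maxY && o.minY <= maxY;
    }
}

// A recognized note; notes flagged as erased would be skipped by the MIDI converter.
final class PlacedNote {
    final Box bounds;
    boolean erased;

    PlacedNote(Box bounds) { this.bounds = bounds; }
}

final class EraseModeTracker {
    private final Box activator;   // item F
    private final Box deactivator; // item G
    private boolean eraseMode = false;

    EraseModeTracker(Box activator, Box deactivator) {
        this.activator = activator;
        this.deactivator = deactivator;
    }

    boolean isEraseMode() { return eraseMode; }

    /**
     * Handles one stroke, given its first sample and bounding box.
     * Returns true if the stroke was consumed (a toggle or an erasure),
     * false if it should be passed on to note recognition.
     */
    boolean handleStroke(double startX, double startY, Box strokeBounds, List<PlacedNote> notes) {
        if (activator.contains(startX, startY)) {
            eraseMode = true;
            return true;
        }
        if (deactivator.contains(startX, startY)) {
            eraseMode = false; // idempotent: marking G while already off changes nothing
            return true;
        }
        if (!eraseMode) {
            return false;
        }
        // In erase mode, any stroke overlapping a note's bounds marks that note as erased.
        for (PlacedNote n : notes) {
            if (n.bounds.intersects(strokeBounds)) {
                n.erased = true;
            }
        }
        return true;
    }
}
```

Note that while erase mode is on, every stroke is consumed. This matches the failure described in the pilot study: new notes written while the user forgets the mode are lost rather than recognized.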
This is a screenshot of the counterpart of the paper interface: the software interface that receives and manipulates the data. The labels on the above image indicate special features of the software browser interface that allow the MELODY user to take advantage of supported functionality. Each item acts as follows:
A. Song selection field – All songs created by the software are listed here. To make searching easier, clicking the title, tag, or date header sorts the list by that field, and clicking the same header again switches between ascending and descending order.
B. Tag selection field - This window lists all the tags of every song uploaded and how many songs there are of each. Selecting a tag here limits the list of songs in the song selection field to only songs containing that tag to facilitate searching through a large library of songs.
C. MIDI player – Once a song is selected, it can be played with the onboard MIDI player. The song can be paused, it can be fast-forwarded or rewound by manipulating the duration slider, and its volume can be adjusted.
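The sorting and tag filtering in items A and B come down to a comparator and a filter over the song list. The real browser is built with Adobe Flex and MySQL, so the following Java sketch only illustrates the logic. The Song record and the method names are hypothetical, and sorting by "tag" is assumed here to mean sorting by a song's first tag.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.stream.Collectors;

record Song(String title, List<String> tags, LocalDate uploaded) {}

final class SongLibrary {
    enum SortKey { TITLE, TAG, DATE }

    // Item A: sort by the clicked header; clicking again flips ascending/descending.
    static List<Song> sorted(List<Song> songs, SortKey key, boolean ascending) {
        Comparator<Song> cmp = switch (key) {
            case TITLE -> Comparator.comparing(Song::title, String.CASE_INSENSITIVE_ORDER);
            case TAG -> Comparator.comparing((Song s) -> s.tags().isEmpty() ? "" : s.tags().get(0));
            case DATE -> Comparator.comparing(Song::uploaded);
        };
        List<Song> out = new ArrayList<>(songs);
        out.sort(ascending ? cmp : cmp.reversed());
        return out;
    }

    // Item B: every tag with the number of songs carrying it, in alphabetical order.
    static SortedMap<String, Long> tagCounts(List<Song> songs) {
        return songs.stream()
                .flatMap(s -> s.tags().stream().distinct())
                .collect(Collectors.groupingBy(t -> t, TreeMap::new, Collectors.counting()));
    }

    // Item B: restrict the song list to songs containing the selected tag.
    static List<Song> withTag(List<Song> songs, String tag) {
        return songs.stream().filter(s -> s.tags().contains(tag)).collect(Collectors.toList());
    }
}
```

In the actual system, sorting and filtering could equally be pushed into MySQL queries. The in-memory version is shown only because it is compact.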
Implementation
The entire paper implementation is programmed in Java and Swing with the R3 toolkit. We do not currently fully support batch data because the R3 toolkit's batch support is incomplete, so all pen/paper interactions are streamed to a running Java program that displays strokes in real time. The details of how this works are not particularly novel; it is a basic streaming and display mechanism.
What happens to the strokes is worth some discussion. Our note recognition algorithm is somewhat strict because our implementation is simple, rather than a full statistical analysis like many modern character recognition methods.
We support detection of four kinds of notes (eighth, quarter, half, and whole notes) along with each note's pitch. Each note must be written with a single stroke. The algorithm, contained in the Java program, first determines pitch from the y-coordinate of the first sample in the stroke. It then determines the note type, starting with the stroke's minimum and maximum y. If the range is small, the note is a whole note, which has no stem. Otherwise, if the stroke was drawn over a short amount of time, it must be a half note, because quarter and eighth notes must be filled in and therefore take longer to draw. Finally, to distinguish quarter notes from eighth notes, a stroke whose end point is close to its origin is classified as a quarter note, while a distant end point indicates the “tail” on the end of an eighth note. The pitch and type of each note are written to a text file that is read by the MIDI converter.
As one might tell, this algorithm requires a rigid way of writing for a note to be recognized correctly. Since the focus of this class and project is user interfaces rather than algorithms, we did not apply a more accurate, but computationally expensive and complicated, statistical analysis of the samples.
As discussed further in the final section, we have not yet implemented some of the advanced features, although they are not far from completion and we have a clear idea of how to implement them. OCR is not yet implemented, but this is only a matter of integrating R3’s new OCR capabilities into our Java program. Once OCR is supported, the title, tag, and instrument selector boxes will be fully functional. Supporting the staff combiner is just a matter of reading the minimum and maximum y-coordinates of its strokes. Once these capabilities are added, the Java program will write this data to a text file.
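The decision sequence above can be sketched in Java as follows. This is a reconstruction from the prose, not MELODY's source. The Sample and Note types, the three numeric thresholds, the treble-clef pitch mapping, and the nearest-line-or-space rounding are all assumptions made for illustration.

```java
import java.util.List;

// One pen sample: page coordinates (y grows downward) and a timestamp in milliseconds.
record Sample(double x, double y, long timeMs) {}

enum NoteType { WHOLE, HALF, QUARTER, EIGHTH }

record Note(int midiPitch, NoteType type) {}

final class NoteClassifier {
    // Hypothetical thresholds; real values would be tuned against recorded pen data.
    static final double WHOLE_MAX_Y_RANGE = 12.0;  // whole notes have no stem
    static final long HALF_MAX_DURATION_MS = 600;  // filled heads take longer to draw
    static final double TAIL_MIN_DISTANCE = 8.0;   // an eighth note's tail moves the end point away

    // Semitone offsets of C D E F G A B within an octave.
    private static final int[] SEMITONES = {0, 2, 4, 5, 7, 9, 11};
    // Diatonic index of F5, the top line of a treble staff (octave 5 * 7 + F at index 3).
    private static final int TOP_LINE_DIATONIC = 5 * 7 + 3;

    private final double staffTopY;   // y of the staff's top line
    private final double lineSpacing; // distance between adjacent staff lines

    NoteClassifier(double staffTopY, double lineSpacing) {
        this.staffTopY = staffTopY;
        this.lineSpacing = lineSpacing;
    }

    /** MIDI pitch for a y-coordinate, snapped to the nearest line or space (treble clef assumed). */
    int pitchOf(double y) {
        int stepsBelowTop = (int) Math.round((y - staffTopY) / (lineSpacing / 2.0));
        int diatonic = TOP_LINE_DIATONIC - stepsBelowTop;
        int octave = Math.floorDiv(diatonic, 7);
        return (octave + 1) * 12 + SEMITONES[Math.floorMod(diatonic, 7)];
    }

    /** Classifies a single-stroke note using the three tests described in the text. */
    Note classify(List<Sample> stroke) {
        Sample first = stroke.get(0);
        Sample last = stroke.get(stroke.size() - 1);
        int pitch = pitchOf(first.y()); // pitch comes from the first sample

        double minY = Double.POSITIVE_INFINITY;
        double maxY = Double.NEGATIVE_INFINITY;
        for (Sample s : stroke) {
            minY = Math.min(minY, s.y());
            maxY = Math.max(maxY, s.y());
        }
        // Test 1: a small vertical range means no stem, so a whole note.
        if (maxY - minY < WHOLE_MAX_Y_RANGE) {
            return new Note(pitch, NoteType.WHOLE);
        }
        // Test 2: a quickly drawn stroke is an unfilled head, so a half note.
        if (last.timeMs() - first.timeMs() < HALF_MAX_DURATION_MS) {
            return new Note(pitch, NoteType.HALF);
        }
        // Test 3: an end point near the origin means no tail (quarter); otherwise eighth.
        double endToStart = Math.hypot(last.x() - first.x(), last.y() - first.y());
        return new Note(pitch, endToStart < TAIL_MIN_DISTANCE ? NoteType.QUARTER : NoteType.EIGHTH);
    }
}
```

For example, with the top line at y = 100 and a line spacing of 10, a stroke beginning at y = 140 lies on the bottom line and maps to E4 (MIDI 64). The fixed order of the tests is what makes the algorithm strict. A slowly drawn half note, for instance, falls through to the quarter/eighth test.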
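As the text notes, reading the staff combiner needs only a stroke's minimum and maximum y-coordinates. Here is a hypothetical sketch. It assumes each staff's vertical extent is known from the page layout and that a bracket stroke joins every staff whose extent overlaps the stroke's y-range. The StaffCombiner name and the overlap rule are our assumptions.

```java
import java.util.ArrayList;
import java.util.List;

final class StaffCombiner {
    // staffTop[i] and staffBottom[i] give the vertical extent of staff i (y grows downward).
    private final double[] staffTop;
    private final double[] staffBottom;

    StaffCombiner(double[] staffTop, double[] staffBottom) {
        if (staffTop.length != staffBottom.length) {
            throw new IllegalArgumentException("staffTop and staffBottom must have the same length");
        }
        this.staffTop = staffTop.clone();
        this.staffBottom = staffBottom.clone();
    }

    /** Indices of the staffs joined by a combiner stroke spanning [strokeMinY, strokeMaxY]. */
    List<Integer> staffsSpanned(double strokeMinY, double strokeMaxY) {
        List<Integer> joined = new ArrayList<>();
        for (int i = 0; i < staffTop.length; i++) {
            boolean overlaps = strokeMinY <= staffBottom[i] && staffTop[i] <= strokeMaxY;
            if (overlaps) {
                joined.add(i);
            }
        }
        // A stroke that touches only one staff combines nothing.
        return joined.size() >= 2 ? joined : List.of();
    }
}
```

Under this rule, a stroke that merely grazes the edge of a staff still joins it. Requiring a minimum overlap would be a simple refinement.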
Bridging the gap from paper music to digital music is the MIDI converter, which uses the Java Sound API included in the JDK. Its custom code supports different instruments and multiple voices even though our paper interface cannot yet provide that data. The converter reads the text file output by the paper-reading program and converts the title, tags, notes, and other data within it into a MIDI file.
The resulting MIDI file is handled by the music browser, whose interface is built with Adobe Flex and whose file organization system uses MySQL.
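For readers unfamiliar with the Java Sound API, here is a minimal sketch of what such a conversion step can look like with `javax.sound.midi`. Each simultaneous voice becomes one track on its own channel, an instrument letter is mapped to a General MIDI program, and notes are appended one after another. The single-letter instrument codes, the Voice record, and the one-track-per-voice layout are assumptions. The report specifies neither MELODY's instrument codes nor its text-file format, so parsing that file is left out.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.Map;
import javax.sound.midi.InvalidMidiDataException;
import javax.sound.midi.MidiEvent;
import javax.sound.midi.MidiSystem;
import javax.sound.midi.Sequence;
import javax.sound.midi.ShortMessage;
import javax.sound.midi.Track;

final class MidiWriterSketch {
    static final int PPQ = 480; // ticks per quarter note

    // Hypothetical instrument letters mapped to 0-based General MIDI programs.
    static final Map<Character, Integer> PROGRAMS =
            Map.of('P', 0, 'V', 40, 'T', 56, 'F', 73); // piano, violin, trumpet, flute

    /** One voice: an instrument letter plus parallel arrays of MIDI pitches and lengths in quarter notes. */
    record Voice(char instrument, int[] pitches, double[] quarters) {
        Voice {
            if (pitches.length != quarters.length) {
                throw new IllegalArgumentException("pitches and quarters must have the same length");
            }
        }
    }

    static Sequence toSequence(List<Voice> simultaneousVoices) throws InvalidMidiDataException {
        // Channel 9 is reserved for percussion in General MIDI, so use channels 0 to 8 only.
        if (simultaneousVoices.size() > 9) {
            throw new IllegalArgumentException("this sketch supports at most 9 voices");
        }
        Sequence seq = new Sequence(Sequence.PPQ, PPQ);
        for (int ch = 0; ch < simultaneousVoices.size(); ch++) {
            Voice v = simultaneousVoices.get(ch);
            Track track = seq.createTrack();
            int program = PROGRAMS.getOrDefault(v.instrument(), 0); // unknown letters fall back to piano
            track.add(new MidiEvent(new ShortMessage(ShortMessage.PROGRAM_CHANGE, ch, program, 0), 0));
            long tick = 0;
            for (int i = 0; i < v.pitches().length; i++) {
                long length = Math.round(v.quarters()[i] * PPQ);
                track.add(new MidiEvent(new ShortMessage(ShortMessage.NOTE_ON, ch, v.pitches()[i], 96), tick));
                track.add(new MidiEvent(new ShortMessage(ShortMessage.NOTE_OFF, ch, v.pitches()[i], 0), tick + length));
                tick += length; // notes within a voice play sequentially
            }
        }
        return seq;
    }

    static void write(Sequence seq, File out) throws IOException {
        MidiSystem.write(seq, 1, out); // type 1 MIDI file: one track per voice
    }
}
```

Putting combined staffs on separate tracks that all start at tick 0 is what makes them sound simultaneously. Uncombined staffs would instead be concatenated within one voice.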
What was left out and why/'wizard-of-Oz' techniques used
As of the writing of this report, some of the highlighted features of the MELODY system have been left unimplemented, though in the days between the deadline of this report and the presentation, some of the following omitted features may instead be working.
Most of our effort was directed towards the note recognition algorithm and the conversion of recognized notes into MIDI, since those are MELODY's core features. For the final presentation, we decided it was best to showcase these essential features. Many of the advanced features we planned did not fit into our timetable, since we preferred having a few basic features work flawlessly over having many features that might not always work.
Optical character recognition has not yet been implemented, because our time was spent on note recognition and MIDI conversion. This leaves the title, tag, and instrument selector regions unusable for now. In the meantime, we “wizard-of-Oz” the title of the piece by giving it a generic name upon uploading.
The staff combiner does not yet work. Analyzing strokes to determine which staffs the user wants to combine is relatively easy, but again, our focus on the core features left us little time for it. The staff combiner data is not yet used for anything, although the pen strokes are still captured and displayed on the screen during streaming. No ‘wizard-of-Oz’ technique can really be used here: the conversion algorithm ignores the field, and we cannot fake its effect on the output. The same applies to the instrument selectors.
Erase mode can be toggled on and off with the activator and deactivator regions, but notes are not yet actually erased. Currently, any algorithm we can imagine for detecting drawing over existing strokes is exorbitantly computationally expensive.
Lastly, the current system does not fully support batch data. The R3 toolkit did not fully support batch data during our design process, so most of our work was done with streaming data. This limits the portability of MELODY paper to the range of a Bluetooth device, but fortunately for our purposes, streaming data is very conducive to testing and live demonstrations. No ‘wizard-of-Oz’ technique is needed, since the functionality is nearly the same as it would be in batch mode.
In future work, we plan to implement OCR and staff combiner handling. Our MIDI converter can already read that data, so we only need to integrate the R3 toolkit’s OCR support and write our own stroke analyzer for the staff combiner. These features are relatively easy to add at this point. After them come the more time-consuming tasks of supporting batch data and improving note recognition.
Oral Presentation
Image:MELODYfinalpresentation.zip
Poster
(To be added)








