Pilot Usability Study-Group:100 Proof
From CS 160 User Interfaces Sp10
Introduction
Beerpressions gives beer aficionados an easy and non-intrusive way to detail the subjective experience of tasting a beer in real time. With many different kinds of beers all around the world, it is difficult to remember particular beers and one's opinions of them. Our application aims to address these issues by providing an easy-to-use interface for recording one's thoughts that isn't intrusive to the drinking experience. In this initial usability study, we took the interactive prototype of our beer rating-and-ranking app into the field for our first real hands-on user testing. Our purpose in this study was to observe users working with our app in a natural setting for its use (namely: a beer-focused establishment) and measure their successes and failures, both quantitatively and qualitatively. In this way, we intended to gain insights into ways to further improve the usability of our app, and to catch any flaws that we had overlooked thus far.
Implementation and Improvements
- Fine-tuned natural-language results screen so that aspects of a beer not rated at all on the taste screen were excluded, to reduce unnecessary clutter.
- Sorted the taste wheel scores on the results screen from lowest to highest so that reading the results is easier on the eyes.
- Moved the "pop-up" location of adjectives when sliding the taste wheel to avoid problems with finger occlusion that were not noticed in the simulator.
- Modified the default tags to remove overlap with taste wheel characteristics.
- Changed the tag icons to be created programmatically, instead of the "hand"-drawn static images that had been used. This allows for much greater extensibility of the tag screen.
- Back end code improvements, to make it easier to access tag variables.
- Added full line-up of Jupiter beers to the "Rank by Category" screen, in preparation for doing testing at Jupiter.
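The first two changes in the list above (dropping unrated aspects, then ordering scores from lowest to highest) can be sketched in a few lines. This is only an illustration in Python, not the app's actual code: the function name `results_rows`, the use of `None` for an unrated aspect, and the example scores are all assumptions made for this sketch.

```python
from typing import Optional


def results_rows(ratings: dict[str, Optional[int]]) -> list[tuple[str, int]]:
    """Drop unrated aspects, then order the rest from lowest to highest score."""
    rated = [(aspect, score) for aspect, score in ratings.items() if score is not None]
    return sorted(rated, key=lambda row: row[1])


# Hypothetical input: None marks an aspect the user never touched.
example = {"Hoppy": 4, "Malty": None, "Sweet": 2, "Nutty": None, "Fruity": 3}

for aspect, score in results_rows(example):
    print(aspect, score)
```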
Method
Participants
As in our previous studies, we found our participants by visiting a local beer bar, as it is the sort of place where, by definition, our target users are likely to spend time. We picked Jupiter this time, as it offers a wide variety of beers and is close to campus. All three of our participants were college-educated, male beer-drinking enthusiasts, aged 26-42. Other than selecting a subset of people by virtue of our location, our selection was essentially random. We did, however, pick only participants who were drinking beer and not there to have a meal, in order to avoid casual drinkers. We ended up with two iPhone owners and one person who was completely unfamiliar with the iPhone (!).
Apparatus
For this study, we installed the latest version of our app onto a class-provided iPod touch. We used the stopwatch feature on a group member's iPhone to track time. Simple notepads and pens were used to take down observations. As previously mentioned, our study took place at Jupiter, a brewpub near the UC Berkeley campus.
Tasks
Task 1: Record Information About a New Beer and Tag It
This "easiest" of tasks is also the most vital: adding a brand-new beer to the application's database. This task includes the absolute basics: the beer's name and brand. It also includes entering some important details - price, location had, and seasonality - as well as a few ratings, including an overall score and notes on a beer's color and aroma. Finally, the user will use the tag screen to "tag" a beer with characteristics that jump out at them. This may mean dragging a default tag into the sack, or entering one by hand - it depends on the person and the beer!
Task 2: Rank Beers Within a Specific Category
Our task of "medium" difficulty involves ranking a beer within a particular category. With the goal of flexibility in mind, we decided to let the user pick a category - which can be a brand, a location, or for ultimate choice, a 'tag' (which can be user-defined) - and sort beers within that category. In this way, the user can not only give beers absolute ratings, but also rate them subjectively, relative to each other.
Task 3: Taste Wheel
Our most "difficult" task involves users inputting a number of different pieces of information about the beer they are tasting through the use of our Taste Wheel feature. Our wheel goes beyond the offerings of most other similar apps out there, and uses a somewhat novel interface to do so. It allows the users to interactively rate eight key aspects of a beer's taste, in a quick and fun manner, all on one screen. It is deemed the most difficult task because while we feel it is easy to learn to use, it is the most novel of our interfaces, and therefore the most potentially challenging to new users.
Procedure
We located our subjects by looking for people who seemed to be at Jupiter to enjoy the tastes of its various microbrews, and not simply there for a meal. (Jupiter offers house brews and "guest" brews, and also has a full dinner menu.) Once we located the subjects, we approached them, explained who we were and what we were trying to do, and asked for their (signed) consent. After showing them a quick demo of our app, we had each participant complete each of our three tasks as outlined above, one at a time. As they navigated through the app, we had them voice their thoughts so we could better understand their interactions. We did not provide step-by-step instructions, but just a general goal, so that we could see our app through the eyes of a user unfamiliar with it. Finally, we thanked our participants, and bade them enjoy the rest of their beers!
Test Measures
- Time to complete each task - to make sure that each screen can be used fairly quickly (under a minute is our goal).
- Number and type of errors made - to see how many interface problems an average user would run into, and how severe they are.
- Type of and details about any critical incidents - to gain insight into areas where we might improve either our UI or our functionality.
- Whether use of the app precludes drinking beer - because it's all about the beer!
We also attempted to make quantitative measurements to determine how successful each of our three tasks was at its intended purpose: replacing handwritten beer notes.
Task 1 hypothesis: The aspects of a beer that a user tagged would be a subset of the aspects they described verbally. Independent variable: # of aspects described verbally. Dependent variable: # of aspects tagged. Rationale: We suspected that the effort required to tag and input a beer would impede people from tagging all the aspects they associated with a beer.
Task 2 hypothesis: Users would be more efficient at ranking a beer by drag-and-drop than by individually giving each beer a score. Independent variable: # of rankings to assign. Dependent variable: # of "drag" actions executed. Rationale: Users would spend less time creating rankings by drag-and-drop than by assigning 1-5 stars to multiple beers.
Task 3 hypothesis: The qualitative values input into the taste wheel would closely reflect the values the user would attribute on a quantitative scale. Independent variable: User's "actual" quantification of a taste aspect (written on worksheet, from 10 to 100). Dependent variable: Quantity inputted on taste wheel (measured as distance from wheel origin). Rationale: Users will be able to visually correlate the size of the "slice" they input in the taste wheel with their subjective score, say, from 1 to 100.
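For the Task 2 hypothesis, one possible baseline for the "drag" count is the fewest drags that could produce a given ranking, compared with one star assignment per beer. The Python sketch below is not something we ran in the study. It assumes that one drag removes a single beer and reinserts it anywhere in the list, and the function names are made up for illustration. Under that assumption, beers lying on a longest increasing subsequence of target ranks can stay put, and every other beer needs exactly one drag.

```python
from bisect import bisect_left


def min_drags(current_order: list[int]) -> int:
    """Fewest single-item drags needed to sort `current_order` ascending.

    Each entry is a beer's target rank (1 = best, all ranks distinct).
    Beers on a longest increasing subsequence never move; the rest
    each need one drag.
    """
    tails: list[int] = []  # tails[k] = smallest tail of an increasing run of length k + 1
    for rank in current_order:
        pos = bisect_left(tails, rank)
        if pos == len(tails):
            tails.append(rank)
        else:
            tails[pos] = rank
    return len(current_order) - len(tails)


def star_assignments(current_order: list[int]) -> int:
    """Scoring each beer individually takes one action per beer."""
    return len(current_order)


# Hypothetical list of four beers shown in the order of their target ranks 2, 1, 3, 4.
shown = [2, 1, 3, 4]
print(min_drags(shown), "drag(s) vs", star_assignments(shown), "star assignments")
```

A participant's observed drag count could then be compared against both numbers, although real users will rarely hit the minimum.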
Results and Discussion
Results of Pilot Study and Critical Incidents
Notes:
- All times include participants "thinking out loud" while using the app, which seemed to slow them all down somewhat.
- None of our participants' beer drinking was impeded by app use. All users would pause, hold the iPod in one hand, and take a sip with the other, when they wanted to remind themselves of the taste.
Participant 1
Task 1: 3 minutes, 7 seconds. (General: 2:02; Tags: 1:05)
Task 2: 57 seconds.
Task 3: 50 seconds. (did not read help text)
Errors: 3
1. (Task 1) Had trouble with slider - tried to tap instead of slide. Heuristic: Match between system and real world, since the wheel affords "filling" the slice rather than tapping. Severity 1 since both input methods work exactly the same.
2. (Task 1) Couldn't tell sack was clickable, despite on-screen instructions next to it. Heuristic: Match between system and real world, since the sack represents the collection of all tags. Severity 2, since there are few other ways for the user to get feedback on what he has tagged the beer with.
3. (Task 3) Claimed the taste wheel "didn't work." (Unclear - seemed to stem from frustration and/or lack of interest) Heuristic: Efficiency of use. Severity: 4, major problem since there is no other way to motivate the user to enter data.
Participant 2
Task 1: 2 minutes, 16 seconds (General: 1:36; Tags: 0:40)
Task 2: 46 seconds
Task 3: 1 minute, 30 seconds (spent 21 seconds reading help text)
Errors: 2
1. (Task 2) Tried to find Jupiter under "Brewery", and not "Location" (an odd case we hadn't considered: it is both!) Heuristic: match between system and real world. Severity: 2, since it is really a specific case of brewery/location (Jupiter happened to be one)
2. (Task 2) Tried to drag beers from center of text box instead of by the 3-dash "drag icon" on the right. Heuristic: Match between system and real world, since the box "affords" draggability, not just the icon. Severity: 1, since we would have to rewrite the dragging routine to support this, and users familiar with the iPhone will recognize the bars.
Critical Incidents:
- Complained about visibility of adjective pop-ups in taste wheel
Heuristic: Visibility of system status. Severity: 2, since the adjectives are secondary to the visual display, but the problem is persistent.
- Claimed taste wheel instructions were misleading: he tapped, while the instructions said drag. (either works)
Heuristic: Consistency, since it says to drag while tapping also works. Severity: 2, since the system works fine once the user learns the ropes.
Participant 3
Task 1: 2 minutes, 30 seconds (General: 1:40, Tags: 0:50)
Task 2: 50 seconds
Task 3: 53 seconds (spent 12 seconds reading help text)
Errors: 2
1. (Task 1) Hit a textfield and keyboard popped up, but decided not to fill it out. Tried to keep editing page without tapping "return" to dismiss keyboard, then tried to find a different way to dismiss it (tapping outside of the keyboard). Heuristic: error recovery. Severity: 2, since the user worked their way around the keyboard easily.
2. (Task 2) Also tried to find Jupiter under "Brewery" and not "Location"
Critical Incidents:
- Did not notice "add custom tags" button.
Heuristic: Flexibility, since the user could have put in more detailed information but the app did not give him the apparent power to do so. Severity: 1, since the "custom tags" are underutilized already.
- Thought the ability to enter text or voice recording "would be cool".
- Thought his beer was "too bitter" - wasn't sure how to note a negative like this.
- Expressed desire for a social, Yelp-like beer information system.
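The completion times reported for the three participants above can be checked against the "under a minute" goal from our test measures. The short Python sketch below does that arithmetic. The times are transcribed from the notes above, with Task 1 split into its "General" and "Tags" portions. The name `summarize` and the strict "less than 60 seconds" reading of the goal are assumptions of the sketch.

```python
from statistics import mean

GOAL_SECONDS = 60  # "under a minute", read here as strictly less than 60 s

# Completion times in seconds for participants 1, 2 and 3.
times = {
    "Task 1 (general)": [122, 96, 100],  # 2:02, 1:36, 1:40
    "Task 1 (tags)": [65, 40, 50],       # 1:05, 0:40, 0:50
    "Task 2": [57, 46, 50],
    "Task 3": [50, 90, 53],              # 0:50, 1:30, 0:53
}


def summarize(task_times):
    """Return (mean seconds, number of participants under the goal)."""
    under = sum(1 for t in task_times if t < GOAL_SECONDS)
    return mean(task_times), under


for task, task_times in times.items():
    avg, under = summarize(task_times)
    print(f"{task}: mean {avg:.0f}s, {under}/{len(task_times)} under {GOAL_SECONDS}s")
```

With only three participants, and with all times including think-aloud narration, these figures are indicative at best.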
Results of Quantitative Experiments
Task 1: Number of Tags
Number of Tags
| Participant | Aspects Named | Aspects Tagged |
|---|---|---|
| 1 | 2 | 5 |
| 2 | 1 | 4 |
| 3 | 1 | 6 |
What we found was that the users actually tagged MORE taste aspects in the tag screen than they had named out loud. The explanation for this is simple: while the user may not have thought of the "wheaty" aspect of a beer when naming aspects out loud, they would have it in front of them in the application. Thus, tagging aspects became a recognition task rather than a recall task. The implications for the app are twofold: first, users submit more tags than we had expected, and second, the custom tag function is underutilized. One way we could investigate this further is by collecting broader data about custom tags using a larger participant pool. If users tend to over-tag beers, then it would make sense to have as many tags on the screen as possible to most accurately describe their taste experience.
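The comparison against the Task 1 hypothesis is simple enough to spell out. A set of tagged aspects can only be a subset of the named aspects if it is no larger, so "tagged <= named" is a necessary condition. The Python sketch below checks that condition using the counts from the table. The function name is made up for illustration.

```python
# (aspects named aloud, aspects tagged) per participant, from the table above
counts = {1: (2, 5), 2: (1, 4), 3: (1, 6)}


def subset_possible(named: int, tagged: int) -> bool:
    """A subset cannot be larger than its superset, so tagged <= named is necessary."""
    return tagged <= named


for participant, (named, tagged) in counts.items():
    print(
        f"Participant {participant}: "
        f"subset possible = {subset_possible(named, tagged)}, "
        f"extra tags = {tagged - named}"
    )
```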
Task 2: Ranking of Beer
The results of our experiments were essentially null, since all of the users were either confused by some part of the ranking system, or had not tried enough beers on the list for it to be meaningful. Thus, although we had planned out the measurements for the experiment (how many reorderings were performed) we collected no meaningful data. The fact that we could barely convince any of our participants to use the ranking system means that we should redesign the interface to be more flexible in terms of what beers appear on the rankings.
Task 3: Taste Wheel
We gave our first user a scorecard where they recorded a number for each of the eight aspects on the taste wheel. We then compared the output of the taste wheel to their written response.
Taste Wheel Accuracy
| Aspect | Scorecard Rating | Taste Wheel Input |
|---|---|---|
| Hoppy | 100 | 80-100 |
| Creamy | 50 | 55-80 |
| Floral | 80 | 55-80 |
| Sweet | 60 | 30-55 |
| Fruity | 90 | 55-80 |
| Full Bodied | 100 | 55-80 |
| Malty | 0 | 1-30 |
| Nutty | 30 | 30-55 |
The conclusion is that while the taste wheel is accurate for the "most prominent features", such as hoppiness in this case, the results are much more varied for the intermediate taste aspects. Several of the aspects in this example fall outside of the range reported by the taste wheel. Our best explanation for this is that the slice of the wheel is a 2D area, while the measurement we actually take is linear (the distance from the origin). Thus, we should adjust the data output of the taste wheel (maybe using a quadratic or log function) to make the intermediate measurements more accurate. One straightforward way to do this is with repeated trials on a larger population: assign scoring functions to different participant pools, run several trials with each participant, and then compute the squared difference between the eight scorecard ratings and the actual (integer-valued) output from the Taste Wheel. We could then pick the function which most accurately represents people's subjective taste scores.
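The range check and the squared-difference comparison described above can be sketched in Python using the one scorecard we collected. Several things here are assumptions rather than parts of our app or study: the three candidate mappings are illustrative choices, the raw wheel reading was not recorded so the midpoint of each reported range stands in for it, and all function names are invented for the sketch.

```python
# (scorecard rating, taste wheel range) per aspect, from the table above
results = {
    "Hoppy": (100, (80, 100)),
    "Creamy": (50, (55, 80)),
    "Floral": (80, (55, 80)),
    "Sweet": (60, (30, 55)),
    "Fruity": (90, (55, 80)),
    "Full Bodied": (100, (55, 80)),
    "Malty": (0, (1, 30)),
    "Nutty": (30, (30, 55)),
}


def outside_range(data):
    """Aspects whose scorecard rating is not inside the wheel's reported range."""
    return [a for a, (score, (lo, hi)) in data.items() if not lo <= score <= hi]


# Candidate mappings from the wheel's linear reading (0-100) to a score (0-100).
# Illustrative only; these are not functions the app currently uses.
CANDIDATES = {
    "linear": lambda x: x,
    "area": lambda x: 100 * (x / 100) ** 2,    # score follows the slice's area
    "sqrt": lambda x: 100 * (x / 100) ** 0.5,  # inverse of the area mapping
}


def squared_error(data, mapping):
    """Sum of squared differences between mapped readings and scorecard ratings."""
    total = 0.0
    for score, (lo, hi) in data.values():
        reading = (lo + hi) / 2  # assumption: range midpoint stands in for the raw reading
        total += (mapping(reading) - score) ** 2
    return total


def best_mapping(data, candidates=CANDIDATES):
    """Name of the candidate with the smallest summed squared error."""
    return min(candidates, key=lambda name: squared_error(data, candidates[name]))


print("Outside reported range:", outside_range(results))
for name, fn in CANDIDATES.items():
    print(name, round(squared_error(results, fn), 2))
```

A single participant and midpoint stand-ins cannot settle which mapping is best. The sketch only shows the shape of the comparison that a larger study would run on the raw wheel readings.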
What We Learned
Overall, despite having one participant who seemed uninterested in mobile computing in general (despite his interest in beer), our results were fairly positive. None of the errors that we encountered were large problems with our application, but we gained insights into how to further fine-tune our application. Also, we were pleased to note that the use of the app while concurrently enjoying a beer seemed natural to all participants.
We did run across a few issues that we plan to address, though. The most obvious, as it occurred with two of our three participants, is to make sure that a "Location" that brews its own beer is also listed in the "Brewery" list. The two lists might well be different - Jupiter, for example, serves beers that they did not brew - but people should be able to think about beer rankings in whatever manner makes sense to them without worrying about the app's setup.
While we did make a modification to it between our prototype presentation and this exercise, it was clear that our taste wheel's pop-up adjectives still need to display themselves in a better spot. We believe we've found a way to do this, by moving the wheel down, and having all adjectives (regardless of which slice is being manipulated) appear to the upper-left of the wheel.
Another minor tweak we may investigate is to see if we can cause the keyboard to retract by tapping outside of its area. This would provide a quick and easy way to dismiss the keyboard, and would address one of the errors we ran across in our experiments - when a user taps a textfield but then decides not to enter anything.
Because one of our users did not notice our "add custom tag" button, we discussed options for making the button more visible. Our thinking is currently that the button should be changed from a textual description to a more eye-catching visual. We feel the clearest choice here would be to have a "+" icon, probably with a small description underneath.
For the Taste Wheel's "help" dialog, we may re-work the text slightly to note that you can also simply tap the wheel, instead of touching and dragging. With the current instructions, users felt that the only way to interact with the wheel was to touch & drag.
While it wasn't something that we set out to measure, we couldn't help but notice that our participants seemed somewhat constrained by the options our application initially provides. That is to say, while they may have mentioned a beer characteristic out loud, they did not necessarily think to input it if it wasn't a default tag or element of the taste wheel. We believe this goes to reinforce our previously-held idea that our taste wheel should be customizable, as well as our new finding that the "custom tag" button should be more prominent.
Finally, we pondered how to address the issue of "negative" notes that was brought up. We toyed briefly with the thought of having an "evil" taste wheel that could be accessed as well. However, without clear antonyms of the current taste wheel characteristics, we felt the added complexity and potential confusion this would bring wouldn't be worth any gains in functionality. We feel that our "tagging" system is still an acceptable way to note negative aspects, especially when used in conjunction with the overall rating and within-category rankings that we offer.
Screenshots of Changes
Appendices
Materials
File:100 Proof demo script.doc
File:100 Proof consent form.doc
File:100 Proof beerevalform.doc
