Quantitative Evaluation

Lecture on Mar 6, 2008

Slides

Lecture Video: Windows Media Stream / Downloadable .zip

Readings

Jonathan Chow - Mar 03, 2008 11:34:26 pm

Parts of these articles were pretty interesting, although I can't say that much new information was presented to us. Most of us should have learned the scientific method somewhere before in high school. Nonetheless, some points were further elaborated and useful. One such point was in regards to pilot testing the experiment yourself to make sure that it measures the right things and that administering the experiment is doable with the number of experimenters present (referring to the complicated equipment example). In some ways this point about designing an experiment reminded me of designing an interface. Also, one of the other things that I found interesting was the discussion of using composite dependent variables. The idea that we could have a bunch of detailed numbers that don't matter to people is an interesting concept that I think heavily applies to interface design too. As a side note, I did find the little comics amusing; too bad all readings don't have those...
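
A minimal sketch of what a composite dependent variable might look like in practice (the measures, the equal weighting, and the z-score normalization here are invented for illustration, not taken from Martin's text):

    import statistics

    def z_scores(values):
        """Standardize raw scores to mean 0, standard deviation 1."""
        mean = statistics.mean(values)
        stdev = statistics.stdev(values)
        return [(v - mean) / stdev for v in values]

    # Invented per-participant measures from a hypothetical interface test.
    completion_time = [48.0, 61.0, 55.0, 72.0]  # seconds (lower is better)
    error_count = [2, 5, 3, 7]                  # errors (lower is better)
    satisfaction = [6, 4, 5, 3]                 # 1-7 rating (higher is better)

    # Flip "lower is better" measures so all point the same way, standardize
    # each measure, then average into a single composite score per person.
    standardized = [
        z_scores([-t for t in completion_time]),
        z_scores([-e for e in error_count]),
        z_scores(satisfaction),
    ]
    composite = [sum(parts) / len(parts) for parts in zip(*standardized)]
    print(composite)  # one overall "usability" number per participant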

JessicaFitzgerald - Mar 03, 2008 11:50:19 pm

I read through these chapters trying to find how we could apply what we read to help us with user interface design. I didn't see a strong connection with experiments, although it seems as though it is always helpful to know experimental analysis. I did find the three methods of determining reliability interesting. I thought it might be beneficial to use test-retest reliability, where the experiment is run on the same group at a different time, except that the interface would be modified to incorporate the feedback from the first test. This way, the test subjects would be able to further evaluate the interface with a completely updated version of it. In the situation of user interfaces, for the most part we are able to observe the behaviors we are looking for as responses to using the interface. Therefore it seems unnecessary to use so many variables and scientific techniques to help us determine the usefulness of the design.
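
A quick sketch of how test-retest reliability is typically quantified, as the correlation between the same subjects' scores across two sessions (the scores are invented, and statistics.correlation needs Python 3.10+):

    # Test-retest reliability as the Pearson correlation between two
    # administrations of the same test to the same group (invented scores).
    from statistics import correlation  # available in Python 3.10+

    session_1 = [12, 18, 9, 22, 15, 11]   # first administration
    session_2 = [14, 17, 10, 20, 16, 12]  # same subjects, later retest

    r = correlation(session_1, session_2)
    print(f"test-retest reliability r = {r:.2f}")  # near 1.0 = stable measure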

Gerard Sunga - Mar 04, 2008 08:26:54 pm

The readings were fairly interesting, but I'm not sure I see the use of testing a user interface in the context of an experiment when compared to the other methods advocated in our previous readings throughout the past weeks. Of course, in a sense, the testing of an implementation of the user interface is an experiment of sorts, with the user interface's layout and style acting as the independent variables, and so forth, with other aspects of testing the implementation matching up with the corresponding aspects of an experiment. Nevertheless, the chapters provide insight into some of the more subtle aspects of testing a user interface, especially the environment where one uses the interface in question, as well as the desire to produce findings that will be consistent and reliable.

Gary Miguel - Mar 04, 2008 10:18:51 pm

"Science" is in the title of my major, and yet this is the first time I've actually been reminded about the scientific method. It is actually quite important to keep in mind that there is only one way of determining truth in the world: experimentation. I enjoyed these chapters because they exposed me to how complicated it can be to run an internally and externally valid experiment. I can see how a designer of an experiment could easily get caught up in the details of his experiment and miss a major flaw that actually invalidates his results. Just like a programmer who misses a big bug because he's focused on the syntax of each line. The part about pilot experiments reminded me of the iterative design process. Practice in low fidelity, cheaply to find some of the errors in the design, then do it for real.

I just read an article [1] about how scientists aggregated a bunch of studies to try to find trends, but people were upset because some of the studies weren't "high quality". It truly is difficult to ascertain truth. But it's important.

Gordon Mei - Mar 05, 2008 02:03:40 pm

One of the most interesting points made by the article, to me, was that tests in the conventional scientific sense require behavioral reactions by the users that we can see publicly. Considering that we're testing user interfaces on users, much of the emotion and learning is largely private, and consequently hidden from us. This means that our scientific tests would have to rely partly on what users tell us as feedback, which may not be comprehensive, accurate, or fully revealing. The article mentions that physiological measures are used (such as polygraphs for lie detection), but in cases like that, they require a level of consent to be used as evidence or for whatever purpose they may serve. It seems that in order for the tests to qualify as true scientific tests, there are continued efforts to use physiological methods to gauge these emotions, from attempts to correlate pupil sizes with happiness, to brain imaging.

Michelle Au - Mar 05, 2008 04:47:31 pm

The section about threats to internal validity brought up some important points to consider when testing the usability of our interfaces. The points on maturation, testing, and selection seem most applicable to our situation. When selecting test subjects, we should consider their previous experiences with similar products and their familiarity with our interface. In addition, participants should also be selected in such a way that they cover the entire scope of the target user group. In regards to maturation, test subjects with similar experiences may be able to navigate through the application with fewer errors and perhaps may not pick up on poor design choices because they have already become accustomed to such interfaces. This can also be the case for subjects who have already tested the interface and become familiar with the application. These threats to internal validity have to be taken into account when conducting design and user tests.

Khoa Phung - Mar 05, 2008 07:48:07 pm

This reading was very helpful on how to create testing environments and what to watch out for. I wasn't aware that there are so many variables that can cause the results to be invalidated. The examples were nice and concrete, making it easy to understand the differences between each possible error. Understanding these differences will enable me to create better tests and take into account what I might have overlooked.

The second reading goes into more detail on how to construct a test. It shows the steps in detail, from defining variables and choosing appropriate ranges to running a pilot before selecting a strategy for how to perform the actual testing. The author lists the pros and cons of each approach, which helps to predict certain outcomes. In addition, the author goes into detail concerning validation and reliability, and the ruler example was nice. The author also again mentions different interfering variables. I believe that these two texts have made me more aware of testing practices and problems that I had not been aware of before. This will enable me to construct tests more carefully and take these errors into account when actually evaluating the results.

Chris Myers - Mar 05, 2008 07:54:51 pm

Overall I found the reading rather enjoyable and well-presented. I noticed a lot of emphasis on definitions and choosing variables. This process can become complex when trying to gather quantitative data from highly variant sources. It is very easy to knowingly skew information based on the type of sampling and how a term is operationally defined. If we select users to interview who are tech savvy and enthusiastic about it, we will get significantly different results than from people who are ignorant and unhappy. Does it work? I don't know; we should have interviewed people who are the target of our app. (hypothetically)

Eric Cheung - Mar 05, 2008 08:15:16 pm

I thought while reading the two chapters that some of the techniques they describe to get internally valid data (e.g. randomness, good selection) are easier to follow when you have a lot of potential test subjects. For the user studies in this class, I'd think it would be hard to get non-biased data when you only have 3 or 4 people in your sample size. It seems like the techniques given would provide a good starting ground for trying to get the most diverse group possible, but it's still kind of limited. I guess this is why we went over heuristic evaluation and other forms of testing without users, so that we can still have a comprehensive overview of our interfaces. I think Chapter 7's section on validity can generalize pretty well to lo-fi prototyping. When we're observing our users, we should probably make sure that we're picking tasks that accurately measure the effectiveness of the interfaces.

Nir Ackner - Mar 05, 2008 10:06:34 pm

Martin's discussion of how the act of testing can bias results was particularly interesting to me. While the focus was on potential effects of reusing test subjects, another issue is the reuse of the same test administrators. Often, when people work on an interface they are reactionary, fixing a design issue from one test by introducing a new design issue. Focusing on whether the fix works in subsequent testing, test administrators will often miss the new problems that have been introduced into the design. Varying the team of designers and test observers as well as the test subjects themselves can help address this problem.

Megan Marquardt - Mar 05, 2008 10:40:41 pm

I found the reading pretty dry, considering it was about experimental procedure. The portion that was most intriguing was the discussion of psychological vs. physical experiments. The definition of each term in the experiment is needed, similar to a philosophy paper. This was explicitly discussed in the notion of a "murder", where there are several different situations, motives, results, etc. that define what it is. The rest of the reading seemed to be things already dealt with in science and statistics classes, such as figuring out reliability and validity, which correlate to precision and accuracy (respectively) in my mind. There are small differences within each pairing, but they seem to map the psychological experiment's measurement requirements onto the physical experiment's ones.

Maxwell Pretzlav - Mar 05, 2008 11:20:39 pm

These two chapters did a good job of giving very clear and concise explanations of all the factors that must be considered in conducting psychological experiments. I can see how these ideas can be transferred to many different areas of testing and evaluation; however, their direct applicability to this class is slightly unclear to me -- I guess these are the factors that must be considered when doing user testing. Martin raised a number of points which I had never fully considered as being necessary to pay close attention to in testing users, particularly statistical regression and the problems with multiple dependent variables. I also liked the idea of randomization within constraints, which I can see as being very useful in doing user testing of software in different situations.
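
One way to read "randomization within constraints" is counterbalancing: every participant sees every task, but orders are rotated so each task appears in each serial position equally often, and the orders themselves are handed out at random. A minimal sketch (the task names are placeholders, not from the reading):

    # Constrained randomization of task order: rotate the task list so
    # each task occupies each position once, then randomly assign the
    # resulting orders to participants.
    import random

    tasks = ["add contact", "send message", "edit profile", "search"]

    def rotated_orders(items):
        """One rotation per starting task: simple counterbalancing."""
        return [items[i:] + items[:i] for i in range(len(items))]

    orders = rotated_orders(tasks)
    random.shuffle(orders)  # the random part, within the constraint
    for participant, order in enumerate(orders, start=1):
        print(f"participant {participant}: {order}")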

Paul Mans - Mar 05, 2008 11:10:12 pm

I enjoyed these readings because the author presented a useful list of factors to consider when trying to construct a well-designed experiment. I think what made both chapters flow nicely and keep me engaged were the examples Martin used, both from his own work and from other experimenters. In particular I liked the examples Martin used because many of them (like the violent television causality one) were testing behaviors that don't obviously lend themselves to empirical study. Thus it was interesting reading about some of the creative solutions experimenters came up with for measuring these behaviors. As some others already mentioned in comments, it was also nice to be reminded of the formalities of the scientific method--statistics and experimental findings are thrown around so frequently that it is easy to forget how much work goes into obtaining good results (good results meaning in this case results that are internally and externally valid).

Zhou Li - Mar 05, 2008 11:35:21 pm

This reading is more about controlled experiments in general rather than interface-design-specific tests. Basically, for the results of any experiment to be meaningful, the independent variables have to be well chosen and the dependent variables have to be carefully measured. Independent variables need to have ranges that are broad enough to reflect changes in dependent variables, yet still realistic enough to model the real-world environment. While the independent variables might not be obvious for some experiments, the dependent variables are even harder to decide on and describe. Like the example in the reading: for an experiment meant to find out whether TV violence leads to violent behavior in children, different researchers might have different opinions about how to measure the results. There are other types of variables involved in an experiment too. Some of them are kept constant and controlled, while other variables are either random or random with constraints. The more controlled the circumstances in an experiment, the more accurate the results will be. However, more control variables also make the experiment harder to generalize. The most useful section I found in the reading is the one about confounding variables. These are the hidden variables that can affect the dependent variables, making it impossible to check whether the independent variables are causing any changes in the result.

Hsiu-Fan Wang - Mar 05, 2008 10:48:14 pm

The comics made me laugh aloud, which I found slightly worrying.

Having taken psych classes, most of this material was in a sort of "yes, I know that" vein, but like someone else mentioned earlier, it is nice to have more science in the computer science curriculum (as opposed to the usual software engineering fare). Early in chapter 7, Martin emphasizes the difficulty in defining something to measure for things like "happiness", and I think that this is particularly pertinent for our Android applications. I had a "clever" idea of adding help text that guided the user through some bits of the interface (basically when switching to edit mode certain info panes were replaced with "select something to edit"), which I think helped move users through a task that we had imagined would present significant difficulties. Yet these prompts take up valuable screen space and have other trade-offs (they obscure other info panes, for example). In the context of which design is "better", it became obvious that users who had experience with the Android platform acted in substantially different ways than those who had not (particularly in use of the menu/back/home keys). This goes back to issues of selecting representative samples of users, but it also requires developers to make hard decisions about what kind of users they are targeting and which specific tasks are seen as important, as changes to make things easier (wizards, for example) may make things substantially more frustrating for experienced users, and so forth.

Glen Wong - Mar 06, 2008 12:01:05 am

This reading was quite interesting. After reading the first couple of pages of Chapter 2, I thought this was just going to be a boring regurgitation of the scientific method. However, I couldn't have been more wrong. A lot of the material presented was very thought-provoking because what he wrote made sense. In particular, the example where he cites a study reporting more husband -> wife homicides than wife -> husband homicides, while noting that this is also affected by the disproportionate number of reported cases of these two types that made it to court, was intriguing. I've read a lot of surprising statistics from studies before and just taken them as truth, but after reading this article I'm not so sure. In fact, from this reading it seems doing any sort of experiment is not easy at all. So many things can affect the quality of your results that one begins to wonder if any experimental results we see tell the complete story.

Harendra Guturu - Mar 05, 2008 11:24:47 pm

I thought that chapter 2 did a very good job of covering the experimental method. I especially found the sections that discussed possible sources of error that would be a "threat to internal validity" very concise and informative. I can see our group getting affected by some of the threats, such as "maturation" and "testing". Maturation can be seen as the users getting used to the Android interface after the first try; when approached for a second iteration of testing, they may be more confident in dealing with the design problems even if the problems still exist. Testing can cause a problem since, after being introduced to a mobile application, users may try more of the mobile applications on their phone and no longer fit their demographic, in case they are being targeted as infrequent mobile application users. Chapter 7 is a good complement to chapter 2, since it clarifies how to manipulate the variables properly to perform the experiment.

Benjamin Sussman - Mar 06, 2008 12:16:24 am

I also really liked chapter 2's thorough discussion of experimental method. In response to Glen's comment, I agree that maturation and testing are going to be two very important aspects of our experiment which cannot be ignored. I would argue, however, that if we were able to control the amount of learning and understanding the subject gains as the test goes forward, it may be a very valuable tool, because realistically our users will have at least some experience with the Android UI when using it on the actual device. This is because they will no doubt have made calls on the phone, browsed the internet, and played with the varying types of inputs available on the device. If anything, we may want to "mature" our users to get them to a more accurate level of experience with Android before we ask them to do the most important and (usually) more difficult tasks.

Brian Taylor - Mar 06, 2008 12:40:28 am

Wow, I haven't read about the scientific method since back in the day... 6th grade science fair season kind of times... But wow, there is a lot more in this reading than I remember from that old process we had to follow. The readings make it seem nearly impossible to get a useful result with all the likely confounding variables, and surely we cannot get very thorough data from simply interviewing a couple of 'users' about our projects. There are so many things beyond our simple interfaces that could influence our test users. Not only do our reactions, the way we speak, what we tell each user, etc. affect their responses, but so do the differences among our subjects. Some will have the thought of midterms in the back of their heads, while others may be suffering from the pangs of hunger after a long hard day. Hopefully, our test subjects will be mostly comfortable and we can attribute these differences to randomness that will allow us to generalize our results. Overall, though, the readings talked a lot about variables: knowing your confounding variables, choosing your variables, and making sure that they accurately measure what you are trying to measure. I found the example of tracing a star in the mirror a bit laughable when it mentioned that one could use the number of line crossings as an accurate measure of improvement. In the end, these readings can surely help us look deeper into the methods we use to test our interface with various users, and hopefully obtain a bit more information about the user (particularly, those pesky confounding variables).

Yunfei Zong - Mar 06, 2008 12:51:07 am

The reading was a canonical statistics 101 review, so I'm not going to bother commenting on the material.

Instead, I will continue my rant on the crappy format of the readings. First of all, why did the readings have to be 20MB each? They were only 10 pages each! Back in the day, a standard text document with a b&w image was about 200 KB. This comes out to about 2 megs per reading, not the ridiculous 20 that we are forced to download per reading. Some of us have other more important things to download, like music or games. This might just sound like I'm complaining, which I am, but I'm also going to propose a solution here: If it's legal, why not photoshop the readings so all we see are the relevant text and images? We don't really need to see the subtle shading of the pages to fully grasp the material being presented. The author's bio can also be shopped out; there's a reason why bios are at the end of the book or on the inside cover.

Also, why do the readings have to be sideways whenever I load them? We are not fax machines that can scan words in whichever direction the master feeds in the paper. Why can't it load correctly such that a normal person [without a head that's bent 90 degrees to the right] can appropriately read the document? The crappy Adobe PDF reader plugin only rotates clockwise, which means I have to rotate each reading THREE times to fix this. Multiply this by the dozens of people who take this class and what you'll end up with is enough time wasted to prepare at least 5 sandwiches. That's enough lunch for a week.

Benjamin Lau - Mar 06, 2008 01:01:31 am

I thought that of all the readings so far these were among the most boring. Most of chapter 2 is basic stuff that has been long drilled into anyone who has taken an introductory statistics or psychology class (aka probably everyone in the class). There is some stuff rarely covered like threats to validity, concretely divided into types like mortality or maturation. Also mentions the Heisenberg-ish testing effect. But overall I didn't feel like I learned anything new. Same deal with Chapter 7, which talks about dependent variables. I honestly fell asleep at my keyboard twice. Come on guys, throw something harder and more interesting at us.

Jonathan Wu Liu - Mar 06, 2008 09:39:52 am

I appreciated the section on threats to internal validity. I liked the part on statistical regression, but I think their example is a little flawed in demonstrating the effect, which I am glad is mentioned in the footnote. It seems to be a minor problem if we are given a large sample size, but it's important to mention regardless. It is good to remember not to choose members specifically within a group without thinking about the potential error. I think the effect that we will have to worry about the most when giving tests to our users using our application is maturation. It seems like we will not be able to reuse subjects unless we totally revamp the entire user interface. Even then, if the subject knows of certain features to look out for or menu options that are available, the test will be skewed.

Mike Ross - Mar 06, 2008 11:09:45 am

That was a nice refresher on general principles of the scientific method and statistics. I'm not immediately sure how it ties in with what we're doing in our project, though. Given the limited number of users and amount of user time we have for this project, I don't think we'll be running in-depth experiments to find causal relations behind what makes an interface more efficient; it seems like the ten heuristics are meant to guide us towards that. I do think it was useful as a reminder about limiting different types of bias, though. We would like to avoid selection bias in choosing candidates, and we might want to vary our testing environments so that our lo-fi test runs aren't biased because a room was too cold, dark, loud, or whatever.

Edward Chen - Mar 06, 2008 11:14:45 am

The first reading wasn't quite that useful, as I think pretty much everyone in engineering has heard of the scientific method and knows it pretty well. It did use some terminology and explain it in a very awkward way that didn't make sense at first, but made a lot of sense once they started using examples to illustrate their point. Overall, the reading was only interesting because of all the examples that Martin brings up, which really helped illustrate a lot of what I thought were obscure points. However, I still didn't feel like the reading related that much to interface testing. Since we're not trying to study or prove anything, but to evaluate the design of the interface, things like random variables and coming up with a hypothesis really have no application to the user testing.

Katy Tsai - Mar 06, 2008 11:44:47 am

In general, I felt these two readings were pretty straightforward. I think we've all conducted experiments sometime in our lifetimes and have gone through the process of identifying independent variables and dependent variables. However, what I thought the readings did well was to point out how to identify the external factors and confounding variables that often affect our experiments. It seems really challenging to achieve random selection and to avoid the effects of confounding variables when there are a lot of external factors you can't control.

I also thought it was interesting when Martin mentioned that when you take the two extremes of a test group and retest them, they tend to converge towards the mean. This goes to show that the extremities exist only during rare conditions. Sometimes it could be the individual himself, but oftentimes it's just mere "luck". Also when Martin pointed out in the star tracing example how Participant 2's first trial and second trial both had the same number of boundaries crossed, it made me realize that what seemed like a pretty straightforward standard of measurement was flawed. It goes to show that the evaluation process of an experiment and having some sort of criterion for evaluating results can be as difficult as strategizing the actual experiment itself.
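
The convergence described above is easy to reproduce in simulation: if an observed score is stable ability plus noise, the extremes of any first test are partly noise, so a retest pulls them back toward the mean. A toy sketch (all numbers invented):

    # Regression toward the mean: select the top scorers on test 1 and
    # watch their average fall back toward the population mean on test 2.
    import random

    random.seed(0)
    ability = [random.gauss(100, 10) for _ in range(1000)]  # stable trait
    test1 = [a + random.gauss(0, 10) for a in ability]      # trait + noise
    test2 = [a + random.gauss(0, 10) for a in ability]      # fresh noise

    top = sorted(range(1000), key=lambda i: test1[i], reverse=True)[:50]
    mean1 = sum(test1[i] for i in top) / len(top)
    mean2 = sum(test2[i] for i in top) / len(top)
    print(f"top group on test 1: {mean1:.1f}")   # well above 100
    print(f"same group on test 2: {mean2:.1f}")  # closer to 100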

Cole Lodge - Mar 06, 2008 11:58:49 am

In most of my lower division computer science classes, I continually heard computer science compared to an art. This was directly in contradiction to the very name of the major: computer science. In science there are results that can be measured and quantified; the correctness of a solution can be measured. I was happy to finally, after three years here at Berkeley, see science applied to CS. I am not just talking about quantifying results; we have all found the average seek time on a disk or the number of disk accesses needed for a particular sort method. I am talking about the scientific method and experimentation. We have all seen it in physics and chemistry; it is nice to see it applied to computer science. If you are not sure which interface design would be better, use science.

Ravi Dharawat - Mar 06, 2008 12:19:10 pm

The first reading was a nice refresher. It reminded me of a few subtleties I would not have remembered otherwise, statistical regression being the most subtle of those. The second article was a good follow-up to the first, and I commend whoever chose these articles for dipping into a psychology textbook. Many of the same problems that arise in creating experiments for psychology also arise in developing user interfaces, especially those relating to the degree of subjectivity when considering definitions.

Brian Trong Tran - Mar 06, 2008 12:27:56 pm

I would say that most of this information is not new to the majority of the students. In science projects and statistics classes, we learned all about variables and how they could influence results. It is, however, key to keep these ideas in mind to make sure that we don't have a skewed representation of user surveys. Such misrepresentations could lead to very poor design decisions or make designers think that they are done, neither of which is a good idea. I think one of the most important parts is randomizing the variables, just because not all people act the same in a certain controlled environment.

Hannah Hu - Mar 06, 2008 12:48:12 pm

It seems to me that a good number of people found the readings dry, irrelevant, and/or transparent. However, even though most of the concepts covered were pretty transparent, I found them quite useful, as they were a review of both experimental design and statistics, and served as a refresher. You might say, "Okay, I knew this since 5th grade" or "How does this apply to user interface design?". In response to both:

1) I can't count the number of times that someone said they know some procedure and yet neglect to either use it or misuse it. The neglect comes from not being reminded of the procedure or thinking it irrelevant.

2) It doesn't apply so much to actual designing of the interface, but in testing it, the information is invaluable. The fact that we have to consider control variables and independent variables can make a huge difference between getting accurate evaluations and getting impractical ones. You could, for example, make a great interface that is intuitive to use, visually appealing, and practical, but if the user can see nothing but a washed-out screen in broad daylight because of strange colors, that will constitute a weakness.

Raymond Planthold - Mar 06, 2008 12:48:45 pm

Both chapters were interesting and readable this time. I'm not sure of the best way to put user testing in that framework though, partly since in the types of testing we're apt to perform, the user is not being evaluated but rather the interface. I especially don't see how to adapt the idea of control variables.

Other parts were quite useful though. The section about confounding variables seemed important. Since user tests are on the small side, it's probably crucial to be aware of how external factors can contribute to the results you see.

Alex Choy - Mar 06, 2008 01:05:59 pm

These chapters presented many definitions used in experiments. They were made clearer with the examples given. I liked the discussion on reliability and validity. I find that many surveys that I have taken perform test-retest reliability because they repeat some questions on page 15 that were on page 3. They also like to change the wording of the questions as done in the alternative-form method. I can also see how it is harder to test for validity than reliability since repeatedly getting similar results is easier than knowing whether we are measuring what we want to measure. In addition, I thought that the example about single dependent variables was good. Martin's single dependent variable on border crosses seemed like a good measure, but ended up being a bad one.

Lita Cho - Mar 06, 2008 12:48:30 pm

Being a Psychology minor, I love doing field studies and using the experimental method to gain a better understanding of human behavior and society. I don't agree that most of the reading is what students already knew from statistics. Different factors need to be considered when conducting research on trends in psychology/sociology. Taking behavioral and physiological measures with various methods is important to conducting a study. I can clearly see how we can apply this information to our user interviews. Considering the user's behavior and various other factors is very important when trying to form statements about our interface. Maybe a group of college students finds the touch screen better than the d-pad on their phones. But later you find out they all have iPhones.

I also found a lot of the case studies really interesting, especially the one about finding out whether violent TV shows raise aggression in children. Doing an initial aggression test is very important before conducting the study. I think when testing our users, we need to do an initial test of their behavior, such as their tech-savviness and their patience. If a UI design is bad, some users will do everything they can to figure it out, while others will just give up. I think it is important to know this while conducting our interviews.

Max Preston - Mar 06, 2008 01:02:09 pm

I thought that these articles were pretty trivial and that they aren't very applicable to what we're doing, especially since we aren't testing our interface on large groups of people. I don't think these articles are really useful unless you have to be extremely scientific and methodical for the purpose of getting your results published in some scientific journal. Otherwise, I really don't see the point. For refining an interface, I think that simply letting people use it and observing/questioning them is all that is really necessary. If we had a large budget that we were trying to conserve, or if we had a large selection of applicants to pick a diverse group from, the information in these articles could be useful, but in our circumstances, there's only so much we can do.

Jeffrey Wang - Mar 06, 2008 01:20:54 pm

The articles were pretty straightforward and easy to understand. I felt like these procedures were for a more "traditional" experiment, compared to those described in the previous readings. The article begins by explaining the variables: controlled, random, independent, dependent, and confounding. Also, the author explains the threats to internal validity: history, maturation, selection, testing, statistical regression, and interactions with selection. So far, it seems like our project is large enough to apply all aspects of the experiment. However, one thing that is important to keep in mind is to select users that are diverse and random. It is important to fully cover the spectrum of your target group. Also, it is important not to discriminate based on people's history.

Jeremy Syn - Mar 06, 2008 01:04:28 pm

The material in this reading wasn't completely new to me. I have encountered the experimental method in past science courses and especially in psychology courses. One of the elements of the experimental method that I didn't quite know about was the confounding variable. I thought they brought up an interesting example with the Coca-Cola and Pepsi-Cola comparison, of how the participants' answers can vary so much depending not on the product but rather on subliminal differences such as the letters on the cups. I also thought it was interesting how Martin directly communicates with the reader by saying things like: people have made randomization mistakes, but you should never make mistakes like theirs.
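
A toy simulation of how a confound like the cup letters can manufacture an apparent preference (the 70% bias figure is invented): if choices are driven by the letter on the cup, but one drink is always served in the same cup, the drink appears to win.

    # A confounding variable at work: preference depends only on the cup
    # letter, yet one drink is always served in the favored "M" cup.
    import random

    random.seed(1)
    TRIALS = 1000
    m_cup_wins = sum(random.random() < 0.7 for _ in range(TRIALS))
    print(f"drink served in cup M preferred in {m_cup_wins}/{TRIALS} trials")
    # The drink never mattered; the cup letter (the confound) did.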

Johnny Tran - Mar 06, 2008 12:26:18 pm

The experimental methods introduced by the readings seem like valuable tools for evaluating user interface effectiveness. While most of it is review (and hopefully should be review for scientists and engineers), I am reminded of how difficult it is to construct a proper experiment, and how many potential pitfalls such as confounding variables there are.

One thing that I learned was the very subtle phenomenon of statistical regression. While many experiments or projects are aimed at specific audiences, actually finding those audiences looks to be tricky. If I wanted to admit only tech-savvy people into my testing group, for example, it seems that simply making a test for tech-savviness would open the door to statistical regression. In these cases, it seems prudent to design tests with lower statistical variance.

Daniel Gallagher - Mar 06, 2008 01:37:21 pm

Martin's writing is easy to read and understand and had a couple of nuggets of interest to me. What I took away from this were the stories of successful/unsuccessful experiments that he described, because they were either surprising or funny. The Pepsi/Coke experiment was a good warning to be cautious of confounding variables; in the future, if I designed an experiment, I would remember it and be more cautious in general even without remembering a single vocab word from the reading. Similarly (from ch. 7), the description of the experiment to find out whether people were more likely to murder unrelated housemates than blood relations was interesting because of the possible pitfalls in thinking that Martin pointed out. Something like wives getting away without trial more often than husbands I never would have considered before, but it's something I'll probably remember, and it will act as a check on my thinking in experiments simply because of how sensational it is.

William tseng - Mar 06, 2008 01:02:30 pm

The second reading brings up a good point in that the process of "experimenting" itself should also be an iterative process. The idea of a pilot study to work out bugs in the experiment is a particularly good point. Our group has done one of our low fidelity tests, but from our debrief afterwards we realized there were a lot more specific questions and things we wanted to measure that we hadn't thought of before the first interview. If we had a set of interviewees to try out our questions on first, it would likely make the second set of interviews a lot more productive and insightful.

EricChung - Mar 06, 2008 01:57:20 pm

Some of this stuff we learned back in high school and middle school, but there is one thing I didn't learn way back when that bothered me until now, and that's the concept of external validity. I'm glad to know that there is a term for it and that people have thought about it. A lot of this reading is stuff that we should know but haven't formally thought about, like how humans are bad at random sampling and such. I also didn't realize what a problem confounding variables were (to the point of making the experiment worthless), or that there were so many harmful confounding variables. The second article is similar, except there is a lot less review from high school and middle school, but it is stuff that we already sort of know but haven't formally thought about. Choosing the range of variables is something we do already (although maybe without as much careful thought as the article says). All of it is good to know, though.

Brandon Lewis - Mar 06, 2008 01:52:08 pm

The first reading explains some of the problems involved in conducting psychological experiments. I think it's informative, and useful information to consider. It also helps me view psychology with a little less skepticism, since there are ways to help ensure validity within a psychological study. I had a hard time reading the second page because acrobat has a poor user interface for rotating images. The provided command doesn't seem to work consistently. After 10 minutes of reading sideways, I gave up. Give me evince or xpdf, any day.

Randy Pang - Mar 06, 2008 01:41:24 pm

Although I agree that most people probably already knew this information, I think the underlying ideas are really important (despite the fact that I didn't really like plowing through the reading). In particular, I don't think memorizing all these terms and their uses is the most important thing (though I assume it will be for our midterm); rather, I feel that in general, people don't think enough. With anything in life, there's an uncountable number of factors that go into determining it, its result, and how that result is evaluated. Of course, these factors are not all equal; some are clearly more important than others. For example, in a presidential primary, people's minds immediately wander to the most important factors, such as which candidate won, what was the margin of victory, etc. But if they dig further, they might find interesting things in the geographical distribution of the votes. And if they dig even further, they might find some interesting reasons as to why those distributions are so: connections with local spending, figures, or groups. The point is, you can always think of more things, more factors, but you don't always, and rightfully so. You don't have enough time to consider everything, but in my opinion, most people should think just a bit more and figure out what things actually mean (instead of just believing what the surface rhetoric or your initial conclusion tells you).

Jiahan Jiang - Mar 06, 2008 02:14:49 pm

The readings for this lecture have some interesting points; a lot of the concepts are not new, but it is nice to see them in an organized fashion (such as the different kinds of variables). I really enjoyed the discussion on threats to internal validity; issues such as historical influences are factors I had never considered, but I can see how they have a lot of influence over the outcome of the evaluations.

Bruno Mehech - Mar 06, 2008 02:20:19 pm

As many people have commented before, this reading is extremely basic. All of it is stuff we already know, or at least have heard somewhere else before and are well aware of. The first reading was a lot worse than the second one in terms of stating the obvious, and other than being a basic review was pretty useless. The second reading presented some more interesting ideas, though most of it was also really basic and didn't seem to be very relevant to the kind of experiments one would do for a user interface. One of the few good things these readings talk about is the importance of choosing what you are measuring and how you are going to measure it. It has some good examples of things that might be hard to measure and define, but the solutions that the author proposes don't seem very plausible.

Diane Ko - Mar 06, 2008 02:24:20 pm

While the articles were very informative on psychological study approaches and goals, I wonder how relevant it is to our current testing methods. I can see where it would be relevant if we were to extend these experiments to much larger groups with much larger subject pools, but for our current purposes it is somewhat superfluous. Even if there were a larger subject pool, I still feel like many of these psychological study principles are beyond the scope of testing interfaces. Specifically, it's hard to really call one particular group a control group or a test group with interface testing. Does that mean that the people in the control group don't do anything with the program? I find also that with many of the psychological studies that I've taken part in for the past 3 semesters, there is a general trend that the psychologists are looking for. Perhaps there is some sort of interface action trend that we're looking for when experimenting with interfaces, but I feel like the application of it is very different. Rather than trying to prove some sort of hypothesis, our goal instead is to try to determine whether or not our interface is effective.

Tam La - Mar 06, 2008 02:29:03 pm

This was a good overview of the external and internal threats to an experiment's validity, and I found the examples about Coke/Pepsi and left-handers to be very useful at illustrating the author's points. These examples stressed the weakness of the experimental method when it comes to making causal statements - they can be wrong if the wrong assumptions are made. As we conduct our pilot usability tests, we will have to watch out for making wrong assumptions and thus wrong recommendations for our interface designs. For instance, it will be easy to see if some aspect of the design does not work, but it will be harder to identify the reason why.

I would like to see more specific examples about how these different types of variables and validity apply to usability tests. As Bowen mentioned, how much of an experiment's findings are determined by the fact that there is a learning curve? For the tracing stars experiment, there was a marked improvement with repetition of the experiment. Perhaps this is why we are supposed to find new users for each stage of our design process, as this will hopefully give us a better idea of how first-time users might view our project. On the other hand, people are bound to bring in design metaphors (particularly for Anoto-based systems) and so we may never quite be able to eliminate the learning curve.

Robert Glickman - Mar 06, 2008 02:28:07 pm

While I would typically find a reading like this interesting, I actually thought the information presented was quite elementary. The information was essentially a review of the numerous psychology, statistics, and biology courses covering this subject. Also, I thought that the readings lacked strong relevance to the course (aside from some general experimental techniques) since currently, this course is focusing on user study, rather than clinical trials -- related on a high level, but lacking a close relationship on a deeper level.

Kai Man Jim - Mar 06, 2008 02:38:45 pm

The readings were good. Actually, the terms independent variable and control variable are what we learned in high school during physics classes, so it was easy for me to apply them to this reading. However, the content of the reading turns out to be more statistics than computer science (UI), so I doubt how strong the relation is between the readings and the experiment that we are going to do for this class.

Joe Cancilla - Mar 06, 2008 02:37:05 pm

I thought this article was interesting, but I am skeptical about the ability of researchers to comprehensively eliminate or minimize confounding variables with regard to human research. I think the example of the lifespan of right-handers vs. left-handers is a good example of this. A cultural bias against left-handedness skewed the data, making it impossible to tell if there is any relation between lifespan and handedness. There are just so many variables in quantitative human experimentation that a cost-benefit analysis of the value of the research would lead one to want to do other forms of usability testing.

Pavel Borokhov - Mar 06, 2008 02:28:59 pm

In the first reading, I found it interesting to see very specific and quantitative methods of statistics and experimentation demonstrated, an important factor in evaluating user design tests and a good reminder of the fact that what we're doing should be scientific and not just based on whim. I also found interesting the mention of the study on left-handedness; as a left-handed person myself, I have a very clear personal interest in finding out the real implications of such a study. A quick search on Wikipedia turned up some scientific papers refuting the original claim. In Brazil, for example, it was found that left-handed people actually had a slight longevity advantage (albeit a statistically insignificant one), while a study on macaques found that although use of the left hand declined in the group with age, this was entirely behavioral, and left-handed juveniles were not any more or less likely to live longer.

As for both readings, they seemed to touch on experimentation concepts that are very general. While certainly useful reading and a good refresher for anyone who hasn't done a well-structured and controlled experiment in some time, it would have been better if they related more directly to the topics we are concerned with in user interface design.

Timothy Edgar - Mar 06, 2008 02:37:37 pm

The articles seem quite standard. We learned about the scientific method back in middle school, though this covered a bit more detail, such as confounding variables and random variables with constraints. The second article seemed to be a lot more interesting (well, the comics were by far the most interesting aspect, as I did laugh too) in deciding a balance between independent variables so as to be able to differentiate conclusions from the various factors. It seems everything is a trade-off in reliability, cost, contamination, detail, and other aspects. It brings up a lot of points relevant to our analysis for the lo-fi, such as making sure we keep things as controlled as possible so that we are drawing the right conclusions about our interface.

Yang Wang - Mar 06, 2008 02:49:55 pm

Well, I have seen some comments saying this reading is not closely related to our topic and should reference user interface design more directly. In fact, this reading is not about user interfaces at all; the title reads how to conduct psychological experiments. I guess that says a lot. This reading is not meant to be closely related to user interface design. It gives some general guidelines for scientific experimental practice. It is understandable for psychological study, as experimental psychology is a very important division of psychological study. While this is not directly related to our course material, reading this can be a helpful reminder of how we should conduct our future studies. Though honestly, I have heard enough of this from my chemistry roommate, but anyway.

Adam Singer - Mar 06, 2008 02:51:28 pm

I don't know about everyone else, but these chapters gave me a good refresher on good scientific practices for experiments. I don't think I've conducted a 'serious' experiment since high school, and it was definitely a good read. Martin is careful to distinguish between the many different kinds of variables that may be present in an experiment. It was interesting to read about how some serious research studies were ruined by confounding variables or an unrealistic choice of independent variable range. With the sheer volume of factors that can threaten internal validity, it's amazing that any experiments can be applicably translated to the real world. With these factors in mind as we test our user interfaces, it is important that we control our variables well enough that we don't get bogus results from exposing our test users to unrealistic environments. If we aren't careful to keep variables controlled, we may find the results from our experiments to be inconclusive. If this happens, our finished application's user interface may not be as polished as it could be.

Zhihui Zhang - Mar 06, 2008 02:36:52 pm

Most of the material presented seems to match up with what is presented in an intro statistics course. I am curious as to how some of what's been mentioned can be applied to what we're doing now. The statistical methods seem to be geared more towards a high number of participants, whereas with our user trials we only have 2-4 users to work with. For example, we don't really have control variables that we vary...

Reid Hironaga - Mar 06, 2008 02:50:10 pm

The readings from Martin are good guidelines for research parameters and specific segments of the iterative design process, including testing with users. Raising the issue of bias in people with respect to their involvement in the design of a product, and its development in reaction to perceived problems, brings focus to how people may be overprotective of their work in such a way as to focus on certain problems rather than seek out every possible solvable problem. The research side of the reading covered basic concepts of the scientific method and how parameters such as the range of variables and specific definitions should be applied in order to avoid confusion and too much variation in personal interpretation of elements of the research. For example, qualitative measures should be especially scrutinized as a potentially inconsistent meter of any particular element, since they can be so greatly based on personal experiences.

Bo Niu - Mar 06, 2008 03:00:29 pm

Today's reading reminds me of the experimental methods used in psychology studies, as I have taken several psychology courses. Even though the previous readings were theoretical, at least those were based strictly on user interface design. But the readings are getting more and more general, and I really don't think these things should be emphasized in this class. One might argue that to make a good user interface, the developer must have this background theory in mind. It's true that this "common sense" is important, but I just don't think it should be the focus of the course, at least not to this extent. It's just like: to be a good programmer you must be a good person to start with, but you wouldn't expect to learn how to act as a good person in a CS course...

Roseanne Wincek - Mar 06, 2008 02:59:41 pm

I thought these readings were clear and concise, yet intuitive for anyone who has taken any kind of intro psychology class, or anyone who has ever done any science research. The author makes clear that experiments become more difficult and harder to randomize when dealing with human subjects. Thinking about factors such as history and external validity is important even when designing our own studies, like the contextual inquiries and the lo-fi prototype project. The goal of achieving a random sample of the population is not realistic for the exercises in this class (3 is too small a group to be statistically significant, anyway). However, it's still important to be aware of possible biases within the test group and to minimize them whenever possible.

David Jacobs - Mar 06, 2008 03:10:04 pm

I found Martin's discussion of control variables surprisingly interesting (given the potentially dry subject matter). His argument for leaving factors as random variables holds up very well against the decade or so of schooling that has taught me to believe otherwise. I guess the part that makes me happy is the idea that your model is always an approximation. If you want a model that takes temperature, subject age, the rotational velocity of a pulsar across the galaxy, etc. into account, then the only way to create such a model is to leave them as independent variables. For independent variables that cannot be set (or are assumed to be relatively unimportant), we can simply take a bunch of samples and get a model that works well for "most" sets of circumstances.

Michael So - Mar 06, 2008 03:10:14 pm

I found the readings to be sort of nostalgic, because topics like the independent variable, dependent variable, and control variable were things I have learned before. Anyway, I liked reading about the different threats to internal validity that could confound your results. I particularly liked the example of the left-handed and right-handed experiment, because I am left-handed myself and I found the results of that test to be grim. But then that conclusion is disputable because of interaction with selection and the history of the stigma attached to left-handed people, and how left-handers were forced to become right-handers. But nowadays, we left-handers are apparently more accepted.

In the second reading, I don't think I got a clear idea about the different kinds of validity like content validity and predictive validity. I still am unclear on the meanings of those terms. The stuff about specifying an operational definition when choosing independent variable(s) and dependent variable(s) was a good point to bring up because just saying you are going to measure "aggressiveness" or "violence" is too subjective and an experimenter should realize that.

Daniel Markovich - Mar 06, 2008 03:07:33 pm

In Martin's How To Do Experiments he gives an overview of different types of variables and the role that they play determining the results of a scientific experiment. I was slightly discouraged after starting the reading as I thought it had no relevant information that would help us in our Android prototype testing. But in the latter portion of the reading he gives solid examples of how different types of variables threaten internal validity, and I feel that thinking about these types of issues will definitely help us during our initial testing. I felt the most important concept that we must adhere to during our testing is to try not to control the situation too much, as it may bias our results and be too specific to that particular situation. This would not be beneficial as it will not generalize over all target users in different environments.

Henry Su - Mar 06, 2008 03:22:42 pm

Martin's chapter 2 closely paralleled what I learned about experiments in an introductory psychology class. I like Martin's organization better, however, as it consolidates all of the different variables into a chart at the end, with "circumstances" on one side, and "behaviors" on the other.

Martin's chapter 7 presents some interesting information. The importance of an "operational definition" can be seen when, say, you are testing how much potential users like your design, based on a user test. Would the number of times they say "cool" be the benchmark? How about the speed at which they are able to go through your tasks? Choosing one or the other may give quite different results, but either is an operational definition and can be quantitatively measured. Pilot experiments also seem very important. The idea is similar to how we are all advised to run through a mock user test before testing on real users. This way, we can see if our experiment is realistic, and avoid wasting real users' precious time. The idea about multiple or composite dependent variables seems like a good one. Going back to user interface design, it would probably not be a good idea to rate our design solely based on the number of times the user said "cool". The amount of time he/she gets stuck, for example, might be important as well. By using multiple dependent variables, we can make more robust conclusions.
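
Recording multiple dependent variables is mostly a matter of logging several operationally defined measures per task instead of one; a minimal sketch (the field names and numbers are invented):

    # Several operationally defined dependent variables per task, so no
    # single measure has to carry the whole conclusion.
    from dataclasses import dataclass

    @dataclass
    class TaskObservation:
        participant: int
        task: str
        seconds_to_complete: float  # operational measure of speed
        times_stuck: int            # operational measure of confusion
        cool_remarks: int           # operational measure of delight

    log = [
        TaskObservation(1, "send message", 42.0, 1, 0),
        TaskObservation(2, "send message", 65.5, 3, 1),
    ]
    avg_stuck = sum(o.times_stuck for o in log) / len(log)
    print(f"mean times stuck: {avg_stuck:.1f}")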

In general, I think that these two chapters, although seeming to come from a psychology textbook, have the potential to help us conduct user studies more effectively. In particular, it tells us how to set up the experiment correctly to yield useful results (in terms of setting up control and independent variables), and what to watch for and record when the user is testing the interface (dependent variables).

Ilya Landa - Mar 06, 2008 03:14:13 pm

Finally, an interesting and applied reading. The highlight: lab mice getting an equivalent of a truckload of marijuana per day. They had probably turned into jelly by the end of the experiment. The readings provided clear and practical instructions on how to set up variables in experiments. Even though I already knew most of it, these articles can be used as directions for setting up an experiment. They also emphasized an important point about not overloading the control variables. Last semester I participated in an experiment that attempted to control the movements of bugs with light. After several designs failed to affect the bugs, it was discovered that bugs always turn away from a full-size lightbulb 5cm away. Upon seeing this, one of the experiment designers commented: "I don't think we can write a paper on controlling bugs through sticking burning lightbulbs into their faces."

Scott Crawford - Mar 06, 2008 03:06:58 pm

"If you happen to be a computer buff, you can use the computer to generate random numbers or event." I just found this amusing (anyone in the class falls into the computer buff class by this definition), so I thought I'd site it. I liked the section about threats to the internal validity of an experiment's results. In general, it's anything that causes the testing group to be non-representative of the actual user group, whether that be from a predisposition in the testing group, or from some element of change during the testing process. In the other article, I definitely like the idea of a pilot experiment, because actually running a test, even if you're the one who created it, is much different than the actual test creation process, and so can yield quite promising insights into both the validity/efficiency of the test and improvement thereof.

Jeff Bowman - Mar 06, 2008 03:21:37 pm

Martin's description of variables and control is pretty exhaustive, and definitely good to know. However, I think it's of varying use to user interface designers: While some user interface designers will go out and recruit test participants themselves, many more will rely on secretaries and assistants to do the recruiting. I suppose it is in the interest of the user interface professional to know these things, teach their assistants about them, and to acknowledge biases that could arise out of the testing protocol.

It's a shame we don't have an opportunity to use this kind of specific research in our own projects, at least not to the degree he describes.

Siyu Song - Mar 06, 2008 03:28:24 pm

This reading was a good review of when I took Psych 2. I thought the Coke/Pepsi letter preference experiment was really interesting. The results were really surprising, and it was a good introduction to the types of errors to look out for. Things like statistical regression and mortality are interesting to keep in mind, but I thought they left out things like experimenter bias, where test subjects inadvertently give responses that they think the experimenter wants, even if they do not know the experimenter; experimenters need to watch out for the things they themselves are doing.

Andrew Wan - Mar 06, 2008 03:34:14 pm

The reading was useful in outlining important statistical considerations in testing prototypes. The process of selecting testers and identifying dependent/independent variables helps the experimenter define tests. Other considerations, like threats to internal validity, help determine the accuracy of testing. Mapping the behaviors of subjects is clearly as (or more) important than simply taking their word at face value.


