Quantitative Evaluation
From CS160 User Interfaces Fa06
Lecture on Nov 6, 2006
Readings
- How To Do Experiments. Doing Psychology Experiments. Chap 2. Martin.
- How to Decide Which Variables To Manipulate and Measure. Doing Psychology Experiments. Chap 7. Martin.
Ramy Ghabrial - Nov 04, 2006 10:47:46 am
How to do experiments: For the most part, this is a good refresher on middle school scientific theory couched in slightly more elaborate language. I imagine it is good to keep in mind when doing user testing. It is important not to bias the observed experimental results we take from our users (internal validity). It is also important to make sure the users we pick can be generalized to represent all our possible users (external validity). In the case of our project, dependent variables would have to do with ease of use, efficiency and time taken to complete tasks, while independent variables would have to do with aspects of our interface.
How to decide which variables to manipulate and measure: Reading this (particularly the notes on choosing the right independent variables), after commenting on the last reading, reminded me that my selection of independent variables is not entirely accurate. This is because our independent variables do not have two or more values; instead, we are testing users on just one interface configuration. We could perhaps rectify this by performing user testing before and after changing parts of our interface and determining which variations produced better results; however, this would lead to confounding variables such as practice. I think we need a lot more users and testing procedures than we are actually employing in order to make use of these readings (for example, to make use of the reliability and validity tests). However, the importance of properly selecting dependent variables is a good thing to take away from this reading.
Jason Shangkuan - Nov 04, 2006 02:12:09 pm
How To Do Experiments:
This article helps us approach the next steps after our hi-fi prototyping because with the hi-fi prototype, we have to see if our results from the lo-fi are directly reflected in our design. I think there are two main points that really have to be addressed when we do experimenting, which are controlling the random variables and internal validity. For random variables, the audience, setting, and materials all have to be properly understood so that we can predict their potential interactions. The more variables we have when trying to analyze data, the more difficult it is to understand the sources of error or even success. For internal validity, age range and user backgrounds have to be considered when trying to understand the correctness of data. Internal validity and random variables seem to relate in terms of what each does to affect results.
How To Decide Which Variables to Manipulate and Measure:
This article follows logically from "How To Do Experiments" because the previous article mentions how different variables affect results, and my concern from that article was how to control them. The independent variable, for example, may in general be the ease of use of the pen. For example, the disproportionate size-to-weight ratio of the Anoto pen may be something that can be controlled. Typically a pen of the Anoto size should be heavier, which can distinguish it as being high quality. Otherwise it can be confusing as to why it is so large but so light. In order to establish whether this is an issue, this can be discussed or monitored during a pilot experiment. This way the issue can be addressed with weights during the final experiment.
Andrew Hao - Nov 04, 2006 03:29:19 pm
Experiments
HCI experiments do not seem to be the same deductive-type experiments that this article was addressing. Certainly, we controlled certain variables (i.e. we kept the introduction and testing procedure the same throughout each trial). However, we had no quantitative measurements to perform statistical analysis with. Rather, our methods [on Lo-Fi] seemed to reflect an open-ended ethnographic approach, allowing us to discover new things about our project that quantifiable benchmarks would not otherwise have.
Choosing Variables
So to sum it up, this article describes the types of variables that come up in a study and how to control them so that we may come up with a scientifically valid study.
By this point, I'm somewhat confused. I'm not very certain why we're studying these sorts of experimental measures. How will we set up this scientifically standard study with measurable results? I'm guessing that this comes into our Usability Studies where we'll have to somehow quantify the improvements made in our interface design to the imaginary marketing department.
Bowen Li - Nov 04, 2006 09:45:50 pm
Ch2: I find the point about the Coke and Pepsi experiment interesting. It just goes to show that a lot of times when experiments happen, we can't be sure about the results, especially when judging people's reactions is involved. I think this should be reflected in our user tests. We shouldn't assume the results of tests are necessarily due to the causes we think of. Overall, I think the paper was a good review of experiments and statistical analysis.
Ch7: I think it's interesting how there seem to be conflicting interests with regard to the design cycle/iterations and the fact that in doing experiments people have a learning curve. It will be difficult to use the same people in tests because they have seen the interface before, so quantifying their response will be non-trivial.
Tak Wong - Nov 04, 2006 10:32:40 pm
How to Do Experiments: This sounds like the first part of a statistics class. Before doing an experiment, we should find out what we are trying to test/prove, what outside factors can affect the result, and how to identify and minimize those factors. The new concept in this article is internal validity and its threats, including history, maturation, selection, mortality, testing, and statistical regression. These concepts seem to be very helpful in identifying problems with a study in a large project. However, since our project involves small and most likely different groups of people, I doubt there is much effect between user groups.
How to decide which variable to manipulate and measure: This article sounds somewhat like an overall summary of this class so far. The part on choosing the independent variables sounds like choosing a focus that is neither too narrow nor too broad, focusing on something one can actually prove, and doing a test run. Then the article goes on to talk about the reliability of the experiment, and it sounds like the master-apprentice model and its alternative models. The last part of this article talks about measuring the results by IQ or some other number, much like our cognition and GOMS model. The only difference is applying the same theories to psychology instead of cognitive science.
Jonathan Yen - Nov 04, 2006 11:06:04 pm
How to Do Experiments: Having taken a few psychology and statistics courses, there doesn't seem to be anything significantly new in this chapter. Though this stuff is familiar, I don't intend to say that designing experiments is simple. I can definitely see where this chapter is heading in terms of how we would be doing quantitative evaluation, and I think that coming up with an appropriate experiment requires a fair amount of creativity and thought.
How to Decide Which Variables to Manipulate and Measure: The section on reliability and validity appears to be pretty important and very useful in terms of deciding how to set up an experiment and interpret it. From past experience, coming up with independent variables and dependent variables isn't really a trivial task. Overall, I think this chapter provides some good rules of thumb for deciding which variables to change.
David Hoffman - Nov 04, 2006 11:24:32 pm
How to do experiments: This is a fairly basic discussion of how to conduct an experiment using good science. Probably the most important aspect of the chapter was the discussion of external validity. This means that the results can be applied to some problem outside the lab. It is also one of the most important aspects of the experiment. The chapter also went over the use of controlled variables in an experiment. This is important and fundamental to a good experiment. This type of discussion of experimental procedure is useful in theory; however, it is often very difficult to remove all confounding variables. In the absence of being able to eliminate a confounding variable, it should at least be looked at in isolation to see what type of influence it could be responsible for.
How to decide which variables to manipulate and measure: This chapter covers how to define a variable and how to know whether or not it should be controlled or tested. They talk about the value of a pilot study for setting up the experiment in such a way that you identify potential problems with the experimental setup early on and also find out what kind of theory you want to have for your results. This is important because having a bunch of variables does not mean that you understand anything better. These variables have to answer some question instead of just being found. They also must be in units which other people can appreciate.
Maksim Lirov - Nov 05, 2006 12:43:55 am
How To Do Experiments: This chapter was a very good overview of performing statistics and experiments. One thing I got from this reading is that we need to carefully choose the user testers for the next assignment. If we are not careful, then some variables of the group might be biased to a certain side and thus it would be harder to establish that the results of the testing pertain to all members of our target user group. It might be a good idea for us to try to randomize some of the variables of the user testing: the room where the testing takes place, the time of day, etc. The Pepsi vs Coca Cola experiment was the most interesting example for me in this chapter, because the results were related to an overlooked detail (that people preferred the letter M to the letter Q), and thus the experiment didn't really prove that Coke drinkers preferred Pepsi.
How To Decide Which Variables To Manipulate and Measure: This chapter is a good introduction to the area of choosing variables to control and/or measure. It is important to choose levels of independent variables that will provide meaningful resulting ranges. I thought the sections on dependent variables were helpful in this chapter. I think this knowledge will help in the upcoming Pilot Usability Study assignment where we will be able to better interpret the results of task completion time, number of task errors, etc.
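As a rough illustration of the randomization idea in the comment above, here is a minimal Python sketch that assigns each tester a room and a time slot at random, so that neither is systematically tied to a particular kind of participant. The participant IDs, room names, time slots, and the helper name assign_conditions are all hypothetical and are not taken from the reading.

```python
import random

def assign_conditions(participants, rooms, time_slots, seed=None):
    """Randomly pair each participant with a room and a time slot.

    A fixed seed makes the schedule reproducible; leave it as None
    for a fresh random schedule on every run.
    """
    rng = random.Random(seed)
    schedule = {}
    for person in participants:
        schedule[person] = (rng.choice(rooms), rng.choice(time_slots))
    return schedule

# Hypothetical values, for illustration only.
participants = ["P1", "P2", "P3", "P4", "P5", "P6"]
rooms = ["Room A", "Room B"]
time_slots = ["morning", "afternoon", "evening"]

schedule = assign_conditions(participants, rooms, time_slots, seed=160)
for person, (room, slot) in schedule.items():
    print(person, room, slot)
```

With only a handful of testers, a purely random draw can still come out lopsided (for example, everyone landing in the same room), so it is worth looking over the schedule before using it.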
Andrew Tran - Nov 05, 2006 02:33:06 am
How to do experiments: Who ever thought randomization was such a big deal for experiments? I know that interviewees are sometimes biased, but I would not have thought there were other factors that could bias them. From the children and violent TV example, I would never have guessed that the time of day or social events that occurred during that day or week would have influenced the children. I agree with Martin that not all variables should be controlled, or else the results are too narrow and specific and cannot be generalized. The Pepsi/Coke test was quite interesting because it shows how confounding variables can affect the outcome. I like how Martin gives good examples in all the sections on threats to internal validity. Those examples made the threats, and the reasons for them, much easier to understand.
How to decide which variables to manipulate and measure: Why does this chapter make choosing an independent or dependent variable seem that hard? I don't think it should be that technical to have to define an operational definition of the variables. For example, with the children's aggressiveness experiment, I believe the majority of people already share a general sense of what aggressiveness means. I am not disagreeing with what Martin has to say about how we really need to clearly define our independent and dependent variables; I am saying I don't believe it should be that technical. Therefore I believe reliability and validity are very important in experiments because you never know the outcome. I like the three methods of determining test reliability; I would never have guessed there are three or more ways of testing reliability.
Tabassum Khan - Nov 05, 2006 08:29:38 am
How To Do Experiments: To illustrate the types of variables that go into an experiment, Martin described a very interesting study that he and his colleagues conducted. The goal of the experiment was to measure the impact of an instructor's lecture pace on the attentiveness of the students. Martin mentions that to get a reliable, quantitative measure of students' attentiveness, the background noise level in the room was recorded, and it was inferred that when students were quietest, they were most attentive. This is where I disagree with Martin. I believe that students can be quiet for several other reasons, such as sleeping in class, surfing the net on their laptops, or working on homework for some other class. These would result in the students being quiet but not attentive to the lecture.
How To Decide Which Variables to Manipulate and Measure: I believe that the selection of dependent and independent variables, such as the degree of violence in children's behavior, is also highly influenced by the cultural background of the panel and the experimenter. In some cultures even calling names is considered to be violent and inappropriate behavior, while in others children freely play video games that are based on the concept of fighting, killing, shooting, etc., and in some the controller is in the form of a gun to give the player the feeling of a real fight.
Kang Chen - Nov 05, 2006 10:19:51 am
How to do experiments
This chapter included many of the concepts that are taught in stat courses. As mentioned in the chapter, there are many ways to bias the findings of our experiment. Each step in the experiment must be carried out with caution. For instance, in order to isolate the causal relationship between the independent variable and dependent variables, we must carefully set the control variables and random variables. However, if we are too careful, the results of the experiment will not generalize well to the population we intend to represent. Even if the balance of constraints is set appropriately, there's also the randomness of the sample to take into account. The list goes on and on with confounding variables and other elements of designing experiments.
How to decide which variables to manipulate and measure
This chapter provided a wealth of information on the importance of reliability and validity of experiment measures. It's crucial that the test provides reliable measurement of the dependent variables, or it will be difficult to come to a conclusion about the magnitude of the effects. At the same time, the test is worthless if it is invalid, despite producing reliable and repeatable results. I particularly liked the ruler measurement example Martin used in this chapter, where the ruler yielded the length of an object in "inches" but an "inch" on this ruler is actually twice as long as the standard inch.
Patti Bao - Nov 05, 2006 11:48:28 am
How to do experiments: This was a good overview of the external and internal threats to an experiment's validity, and I found the examples about Coke/Pepsi and left-handers to be very useful at illustrating the author's points. These examples stressed the weakness of the experimental method when it comes to making causal statements - they can be wrong if the wrong assumptions are made. As we conduct our pilot usability tests, we will have to watch out for making wrong assumptions and thus wrong recommendations for our interface designs. For instance, it will be easy to see if some aspect of the design does not work, but it will be harder to identify the reason why.
How to decide which variables to manipulate and measure: I would like to see more specific examples about how these different types of variables and validity apply to usability tests. As Bowen mentioned, how much of an experiment's findings are determined by the fact that there is a learning curve? For the tracing stars experiment, there was a marked improvement with repetition of the experiment. Perhaps this is why we are supposed to find new users for each stage of our design process, as this will hopefully give us a better idea of how first-time users might view our project. On the other hand, people are bound to bring in design metaphors (particularly for Anoto-based systems) and so we may never quite be able to eliminate the learning curve.
Simon Tan - Nov 05, 2006 06:49:06 pm
Martin-Chapter 2:
"Correlation does not imply causation" - in so many ways. At first, I was tempted to dismiss this reading as a rehash of lower division psychology (and even middle school science, as some have suggested). However, upon reading the "Threats to Internal Validity" section, my respect for the topic grew. The basics of experiment design cover independent and dependent variables and introduce the concept of a confounding variable. This reading goes further by defining the notion of 'internal validity' and naming a thorough list of possible threats to it. Reading through them made sense, but a few were quite subtle and something I probably would have never considered (e.g. statistical regression). The examples provided were also very helpful in understanding the material, as well as providing interesting trivia (e.g. the Pepsi/Coke scandal and the reason why there are no 80-year-old left-handers).
Martin-Chapter 7:
The sometimes extraneous illustrations that the author makes get even more so here. I believe the important takeaway, though, is to remember to be careful when creating operational definitions of variables when doing an experiment. In the experiments I have done in the past, this always seems to be the most difficult aspect of setting an experiment up. How *do* you define what a "violent movie" is? And is there some accepted standard somewhere? Because if there isn't, then all the experiments happening based on "violence" may all give slightly skewed results.
Michael Moeng - Nov 05, 2006 07:51:25 pm
How to do Experiments:
Although the beginning of the article was somewhat stale (perhaps this was an introductory psychology class, so basics in studies needed to be covered), once Martin began covering threats to validity several interesting points came up. Time seemed to be a major cause of threats to validity, so ideally all experiments would be one-time tests of large groups. This doesn’t quite solve the problem of gauging the effect of a drug or treatment on a group over time, however.
How to Decide which Variables to Manipulate and Measure:
I particularly enjoyed the section written about “being realistic.” Using the Television – Violence experiment as an example, it could be argued that traumatizing viewers with excessively violent television might make them less likely to participate in violent acts…
Antonis Mannaris - Nov 05, 2006 09:54:23 pm
How to do experiments: This chapter offers some great insight on how to perform experiments in any field. The classification of the different types of variables is quite clear and important in any type of experiment. It is also quite clear that applying these principles in our experiments is going to be very challenging. It is unlikely that we will have a large enough number of users to claim our selection is random and representative. We can, however, try to be careful in classifying other variables. The other problem is that in our case, our experiments are considerably longer (per user session) than the examples in the chapter. How can we make sure that all the variables are kept constant during the whole duration of the experiment? For example, towards the end of the experiment, the user will likely be more familiar with the overall look of the design, so the task time is likely to decrease.
Which variables to manipulate: Experimenting for UI design presents many problems when it comes to choosing variables. First of all, we are unlikely to be able to change our whole design for the experiment, so the UI itself is hard to consider an independent variable. I think we need to decide which parts of the design we will manipulate and see the effect they have on the dependent variables. Another approach would be to consider two values for the variable: with and without our system. We could compare the time it took users to perform a particular task without our system with the time it takes them with it. On the other hand, our dependent variables should be relatively straightforward to determine.
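The with/without comparison suggested above can be sketched in a few lines of Python. This is only an illustration: the completion times below are made-up numbers, not data from any study, and the helper name summarize is hypothetical.

```python
from statistics import mean, stdev

def summarize(times):
    """Return sample size, mean, and sample standard deviation of task times."""
    return {"n": len(times), "mean": mean(times), "sd": stdev(times)}

# Made-up task completion times in seconds, for illustration only.
without_system = [62.0, 75.5, 58.0, 80.0, 69.5]  # independent variable level 1
with_system = [48.0, 55.5, 51.0, 60.0, 50.5]     # independent variable level 2

baseline = summarize(without_system)
treatment = summarize(with_system)

# Positive difference means the task was faster with the system.
difference = baseline["mean"] - treatment["mean"]
print(f"without: {baseline['mean']:.1f}s (sd {baseline['sd']:.1f})")
print(f"with:    {treatment['mean']:.1f}s (sd {treatment['sd']:.1f})")
print(f"mean difference: {difference:.1f}s")
```

With this few users, a difference in means says little on its own, and if the same people perform the task under both conditions, practice becomes a confounding variable, as noted in earlier comments.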
Ming Huang - Nov 06, 2006 12:06:38 am
Chapter 2 of Martin’s “Doing Psychology Experiments” shows us some basic concepts on the design of an experiment, the quantities involved in the design, and how to tweak the design to preserve internal and external validity. Guidelines on confounding variables and internal validity are indeed, per previous comments, a step up from our normal psychology teachings. All in all, this chapter reminds us to evaluate the design critically and points out some, perhaps non-trivial, aspects that might influence the correctness or applicability of results obtained by our experiments.
Chapter 7 of DPE talks about how to search for, define, and interpret variables and measurements. Interestingly, the methods listed as good for arriving at a good experimental design bear the same spirit as the design techniques taught in class. For example, the pilot experiment helps iron out immediate and obvious errors and provides helpful insight for better experiments. This is just like how low fidelity prototypes provide the preliminary testing ground for the final user interface design. The difficulty and subjectivity of dependent variables are just as elusive as designing a good interface. Everyone thinks their interface makes the most sense (face validity); however, only user testing and field data support the usability of the interface (content validity). This is yet another example of how deeply related the fields of HCI, psychology, and cognitive science are.
Tony Yu Tung Lai - Nov 05, 2006 10:14:10 pm
Experiment:
I think most of the information in this chapter is something that we know already, but it is always good to put it in words so we can review what's important. In particular, I find the Coke-Pepsi Slugfest to be interesting. It sounds like Pepsi really thought it through when they chose M and Q as the symbols for Pepsi and Coke. What I really want to know, though, is the reason why people like "m" more than "q". I am guessing it has to do with the fact that "m" is more commonly used than "q".
Variables:
Although I don't think we will be using physiological measures as a dependent variable for anything, the concept certainly sounds interesting. The development of tools like the lie detector gave experimenters, though not completely, another dimension for evaluating their experiments. However, I don't think a physiological measure should be the sole dependent variable for an experiment, since there are simply way too many unknowns regarding the human body's responses, especially when brain signals are involved.
Robert Taylor - Nov 06, 2006 12:18:15 am
Chapter 2: I found this reading to be rather generic and in a sense similar to the heuristic evaluation reading in that we already have an informal idea of most of the concepts. For example, it's rather trivial to take the concept of independent and dependent variables and apply them to experimentation. Apart from bits on generally setting up an experiment, I'm actually not even sure how HCI would even really apply so much to this reading. We're really doing more of a heuristic evaluation for usability testing; the kind of experiments one might do with this guide seem more related to medical experiments or testing a drug, not usability and design...
Chapter 7: I was impressed by the number of ways validity can be defined or corrupted. Given that our experiments will likely have fairly subjective measures in the final stages of our project, not only defining them but actually assessing the validity of the results we get will likely be very difficult. That is because what is usable for someone may not be usable for someone else. Also, the different forms of experiments for testing consistency (i.e. whether the experiment is a good test of what we're trying to measure) may be ineffective: if we give someone the same interface twice, they probably will have learned it after the first time. Same with a slightly altered one (I actually find the slightly altered one to have a built-in contradiction: either it's not similar enough to the original and it's not a reliable test for consistency, or it's so close that it's basically the first kind of experiment). The third test, using the odd and even numbered questions, seems like it could work though.
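The odd/even idea mentioned above (split-half reliability) can be sketched as follows: score each participant separately on the odd-numbered and even-numbered questions, then correlate the two sets of half-scores. This is a minimal sketch under stated assumptions; the questionnaire responses are invented 1-to-5 ratings, and the function names pearson and split_half_reliability are hypothetical.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def split_half_reliability(responses):
    """Correlate each participant's odd-item total with their even-item total.

    responses: one list of item scores per participant, all the same length.
    """
    odd_totals = [sum(r[0::2]) for r in responses]   # questions 1, 3, 5, ...
    even_totals = [sum(r[1::2]) for r in responses]  # questions 2, 4, 6, ...
    return pearson(odd_totals, even_totals)

# Invented ratings (1-5) on a six-question usability survey, one row per user.
responses = [
    [5, 4, 5, 4, 5, 5],
    [2, 3, 2, 2, 3, 2],
    [4, 4, 3, 4, 4, 3],
    [1, 2, 1, 1, 2, 2],
    [3, 3, 4, 3, 3, 4],
]

r = split_half_reliability(responses)
print(f"split-half correlation: {r:.2f}")
```

A correlation near 1 suggests the two halves measure the same thing consistently; it says nothing about whether the questionnaire is valid, that is, whether it measures usability at all.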
Rayhan Lal - Nov 06, 2006 12:34:15 am
How to Do Experiments: It is interesting that this is the first time in the computer science department that I have encountered the scientific method directly. As several others have mentioned, these subjects are discussed extensively in almost any psychology class one takes (particularly psychology 101). Though this material may be vaguely familiar and seem simple, designing a good experiment is no simple task. One must come up with findings that can be generalized (producing external validity) and reproduced while eliminating confounds (maintaining internal validity). Choosing a good sample from a population is challenging in itself, even if one had access to the entire population. Any constraints that limit the accessible sample can also create third variables.
How to Decide Which Variables to Manipulate and Measure: For the different stages of development, our independent variable has been the design of the interface (which we have attempted to refine through each iteration). Our dependent variables are measures of usability which we hope are both reliable and valid. As the article describes, operationalizing both independent and dependent variables can be a challenge, but thankfully in many cases the work has already been done for us. Since we will be performing a pilot usability study, I think our choice of participants need not be as important as evaluating our experiment and measures of variables.
Robert Held - Nov 06, 2006 12:37:45 am
Chapter 2: This chapter did a decent job of outlining how much time and care must be taken to produce scientifically sound studies. The use of examples was also effective. However, their example of lecture pace vs. student attentiveness seemed weak. In particular, their metrics were questionable. They decided to measure attentiveness by recording the lecture and quantifying the background noise. Their reasoning was that lower background noise indicated more attentive students. However, I would argue that lower background noise could also indicate more sleeping students. The options for interpretation of the results actually serve to emphasize the author's point that care must be taken to produce solid, interpretable results.
Chapter 7: The author's description of brain functional imaging did not seem to give the field much credit. He gives a detailed description of fMRI and its possible benefits, but then concludes with a footnote referring to someone who seriously doubts the technology. I think it's important to give both sides of an argument, but simply listing a vague criticism seems like a disservice to the reader. I think it would have been more useful for him to specifically describe some revelations about brain functions that have arisen from fMRI and PET studies, and then to also mention the specifics of the two technologies that could lead to some doubt in the results.
Roland Carlos - Nov 06, 2006 12:32:32 am
How To Do Experiments: Nothing too fancy here I think. There are just a lot of technical terms for stuff we probably all did for science fairs back in high school. Still, I like the refresher and it's good to formalize some of the techniques. The reading hit on two pretty important points. One, we need to make sure we know who our users are. This has been a point we've worked with in the past (test your project on the right users). But we need to make sure that the results we get from the users we test with can be applied to more than just that testing group (can we generalize the results). Two, we need to make sure we establish reasonable controls on the experiment. We can't control everything in the test, otherwise we end up with a very unique testing situation, but then again, you don't want to leave it all up to chance, because then the situation becomes too random and testing results become flawed.
How to Decide Which Variables To Manipulate and Measure: Makes sense reading this right after Chapter 2 (I wonder what's in 3-6 then?). This reading ventures into a newer, fresher topic for me. Of course, it has a very valid application to the projects we're all doing. I found it interesting that we have to run tests (a pilot experiment) just to make sure that the variables we choose make sense (reasonable results, for example). And then of course, there are various ways to test these variables (test-retest, alternative-form, split-half), each with its own advantages and disadvantages. It seems rather complicated, but the reading makes it clear there is a method to the madness. I'm not sure how much of what we read here will be applicable, but there are definitely some basic principles to take away for our future testing.
Alex Wallisch - Nov 06, 2006 12:21:35 am
How to do an experiment
This is one of those articles that everybody should read, even if they aren't a psychologist. Even if they aren't planning on conducting any experiments anytime soon, this kind of article provides insight on just how studies are carried out and what the results mean. Specifically, it gives clues as to what to watch out for when a report claims something like, "Stanford grads are twice as likely to make over $70,000 out of college than Cal grads." In this case, it would be important to look for confounding variables such as the geographical location where the subjects worked or the fields they went into.
My parents are both researchers and I've been exposed to a lot of these concepts before. However, I still feel that this reading was worthwhile, as it illustrated many of these ideas particularly well.
How to choose variables
I was aware of the difficulties in defining certain terms before I read this chapter, but I think I developed a greater appreciation for just how tricky it can be. I find it ironic that, not only do you have to come up with a good operational definition of your terms, but you need to be able to come up with a good operational definition of what it means to have a "good operational definition." This is a part that I've never been particularly good at, so I'm glad to have the opportunity to read about it.
Qingyun Tang - Nov 06, 2006 12:12:59 am
How to do experiments: This article has reviewed a lot of statistical concepts such as variables, random sampling, etc. As in statistics, when we do an experiment, we need to control variables, improve the external validity, set random variables, and minimize confounding variables. These will help us in the next step of optimizing the hi-fi model of our project. I found the IQ test example in the statistical regression section to be particularly interesting. Overall, I think the article gives a very helpful statistical review.
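Statistical regression can be illustrated with a small simulation. The Python sketch below is not the example from the reading; it assumes a simple model in which every observed score is a stable "true" score plus independent random noise, with arbitrary parameter values. It picks the top scorers on a first test and then looks at how the same people score on a retest.

```python
import random
from statistics import mean

def simulate_retest(n=10000, true_sd=10.0, noise_sd=10.0,
                    top_fraction=0.1, seed=0):
    """Return (first-test mean, retest mean) for the top scorers on test one.

    Assumed model: observed score = true score + independent noise.
    """
    rng = random.Random(seed)
    true_scores = [rng.gauss(100, true_sd) for _ in range(n)]
    first = [t + rng.gauss(0, noise_sd) for t in true_scores]
    second = [t + rng.gauss(0, noise_sd) for t in true_scores]

    # Select the people whose first score is above the top-fraction cutoff.
    cutoff = sorted(first, reverse=True)[int(n * top_fraction)]
    top = [i for i in range(n) if first[i] > cutoff]

    return mean(first[i] for i in top), mean(second[i] for i in top)

first_mean, second_mean = simulate_retest()
print(f"top scorers, first test: {first_mean:.1f}")
print(f"same people, retest:     {second_mean:.1f}")
```

Under this model the selected group still scores above the overall average of 100 on the retest, but less extremely than on the first test, even though nothing about the people changed. An experiment that selects participants for extreme scores can mistake this drift for a treatment effect.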
How to decide which variables to manipulate and measure: I agree with the article that specifying an operational definition is very important. Anyone can therefore do the same experiment by doing exactly the same procedures. There are many variables involved in the chapter. I would like to see more detail on how each variable functions and where to use each of the variables. Also I would like more examples on the validity of tests. The article only gives one example on a test score.
Cheng-Lun Yang - Nov 06, 2006 01:37:56 am
Chapter 2: The concepts presented in this chapter are mostly concepts that we learned in other introductory experimental classes such as Chemistry, Psychology, or Biology. The terms control variable, independent variable, and dependent variable are familiar terms that we learned in high school. I found the real life problem of designing the alarm system for pilots amusing. It’s hard to imagine one experimenter being so detail-oriented as to control all the irrelevant variables. I also think finding the balance between controlling enough independent variables to not affect the dependent variables and maintaining generality is a hard task.
Chapter 7: I found it very interesting that psychologists have to settle on operational definitions before they can start their experiments but physical scientists do not. In one of my earlier classes, I heard a theory that physicists derived the equations for fluid mechanics before the invention of multivariable calculus. Later on, mathematicians tried to make sense of the equations and invented multivariable calculus. This is an example of what was discussed previously: physicists did not have to define the mathematical terms used in the formula but mathematicians did.
Michael Mai - Nov 06, 2006 03:15:05 am
Chapter 2
This chapter introduces how to evaluate the statistical data we get from the remaining phases of our project. Unfortunately, I feel that a lot of the material can be skipped and is not necessary to know. Although randomization is important when collecting results, our projects are geared towards specific needs and industries and hence do not need to be generalized. Also, statistical regression does not play a factor unless the person takes the test multiple times, and for the theory, a person who scores below the mean technically can not move above the mean. Finally, a concept to be careful about if we do proceed with testing may be confounding factors. Watching out for these will prevent a voided data set.
Chapter 7
I feel that the core of this chapter revolves around the concept of operational definitions. It is important to settle upon what level of stimuli should be considered a response for each variable. This concept will probably be combined with the composite concept to allow for a broader range of acceptable values. Applied to our project, the use of composite value results will allow for evaluations of different phases and sections of our projects through assignment of importance factor. Hopefully by having designated members of the group as judges of the same variables, we will be able to ensure reliability and validity.
Chen Chang - Nov 06, 2006 05:22:47 am
How to do experiments: This reading gives good insight on how to conduct an experiment properly and successfully regardless of which field of research. It definitely reminded me of a statistics course that I have taken in the past when the technical terms started flying out from the threats to internal validity section. I felt the going over of all the different kinds of variables at the start to be somewhat redundant and almost common sense -- it was the internal validity section that caught my attention. Obviously confounding variables should be avoided if at all possible simply because they cause many threats to internal validity. The example that particularly stuck out for me was the computer visuals versus the traditional overhead projector slides to demonstrate the history threat to internal validity. Overall, I feel that we should keep in mind the material presented in this chapter for the next steps of our project as we should pick a group of users that can represent the population of all possible users as well as avoiding biased experimental results.
How to decide which variables to manipulate and measure: This chapter is mainly concerned with presenting the reader definitions of all types of dependent and independent variables. I feel that this reading comes right in line with our pilot usability study portion of our project up next. We need to keep in mind the two main terms: reliability and validity when working on our projects. Reliability is described as obtaining the same result over and over again when a measurement is taken. Validity is described as an answer or solution that generally obeys the mass consensus of the population and agrees with a commonly accepted standard. Lastly, I enjoyed the comical visuals that appeared on a few pages of this chapter.
Julius Cheng - Nov 06, 2006 05:25:02 am
Chapter 2:
I remember learning about the scientific method in 6th grade. Since then, we've become so familiar with the precepts of good experiment design that we instinctually apply them to informal experiments in our everyday lives. I suppose it couldn't hurt to hear them again, and to be reminded of them as we begin to conduct formal tests on our interfaces, although realistically, we won't be able to apply much of what the author wrote about randomizing and selection biases, since we won't be able to get a large pool of subjects. "Internal validity" is a new term that I've never heard of, but that's about the only thing I truly learned from this reading.
Chapter 7:
This chapter reminds us that it's not enough to simply get results, but that the results must be valid and bias-free. I'm sure we will not totally be able to fall in line within the rigorous guidelines provided by this reading, especially under our limited time and pool of test subjects, but it is good nonetheless to be aware of the shortcomings of our tests and to properly report them. Furthermore, this reading seems to be written for cognitive psychological tasks that use EEGs, fMRIs, and other brain testing devices that will be unavailable to us. Our variables will likely be measured in seconds or tenths of seconds, rather than milliseconds, so our testing is not likely to be as scientifically rigorous as these two readings would like us to be.
Heung Tai - Nov 06, 2006 07:28:44 am
Chap 2. Marin: This is like a review for a statistics class. For an experiment, we have to know what we can control, that is, the independent variable, and what we can't control, that is, the dependent variable. Watch out for lurking variables (confounding variables). The interesting part is randomization within constraints, because it exerts control on the outcome while still letting some of it go. That is quite useful because by exerting control, we know there is a cause and effect relation between two variables. The randomization would just give us some more information about the exact correlation.
Chap 7. Marin: Chapter 2 introduces the terms; this chapter illustrates how we should choose the values of those terms. Many people are easily confused by the fact that one variable is called independent and another is called dependent. For example, take a temperature given in degrees Celsius and translated to Fahrenheit. What are the independent and dependent variables? It is just correlation and no cause and effect! I think in order to know which variables are in our control and which are not, we need more imagination. In my example, we can imagine that we know the Celsius degree already but just don't know Fahrenheit. That means we control Celsius and so it is a control variable.
David Eitan Poll - Nov 06, 2006 08:53:36 am
Ch 2: I found it quite interesting to force myself to think about what could go wrong in an experiment where controls are not so clearly defined as in other scientific fields. While it's true that much of the things mentioned are part of "6th grade science," I think that the important aspect here is that people don't behave with the same reliability as chemicals in a test tube. Each person reacts in his own way for a large variety of reasons, and this has to be addressed through experimentation. This is particularly important for UI evaluation, as it is, in effect, a psychological experiment, and the techniques described can be very useful under such circumstances.
Ch 7: Some of the content here was a little dry again, but interesting nonetheless. I especially appreciated the piece about turning multiple independent variables into composite variables. It makes it possible to draw conclusions from an experiment even when there is concern that not enough control was put into the experiment to suppress multiple side-effects. Seeing as there is rarely just one measurable change in people due to some stimulus, this would be an important technique to have mastered for any psychological experiment.
Yimin Yao - Nov 06, 2006 09:10:34 am
How to do experiments: This article definitely reminds me of the biology labs and psychological experiments I have been doing for the past few years. Although some people feel that these things are more 'common sense' or middle school material, these concepts of conducting well-planned and controlled experiments are critical to cutting-edge research experiments. Many scientific claims have been found invalid because researchers failed to take care of confounding variables or failed to select a random pool of participants that can represent the general population.
I am trying to envision how these concepts would be incorporated into our projects, since testing out applications inherently involves more variables than standard controlled and well-defined experiments. And given our limitation in number and location of experimenters, it is hard to collect a good pool of users without significant bias. But this article would be a good guideline for us to follow in order to minimize those biases or confounding variables.
How to decide which variables to manipulate and measure: The article provides some good examples to remind us about being careful in defining what you present to users and what you measure in experiments. I found the star tracing example pretty effective in illustrating that defining the right measurements to take is critical in preventing misleading results. If we want to test some features of the UI, perhaps to see whether the location of a particular button affects the ability of users to complete a task, do we measure the time it takes for users to do the task, or the number of clicks the user takes? I guess in this case, we must also take care of the historical (learning) effects as well.
Scott Friedheim - Nov 06, 2006 10:42:36 am
How to do experiments:
I appreciated the level of detail that the author goes into in describing the randomization of tests and variables. For people like me who ordinarily don't conduct these kinds of experiments, it is overwhelming to think about everything that must be done to ensure a test is indeed random while controlled, but not too controlled! Also, something that I would never have thought to consider when looking at a study done by others is the issue of morality. If a test is against the moral values of many people such that the test subjects are not representative of a total group, then how valid is the test anyway?
How to decide which variables to manipulate and measure:
While reading some of the earlier comments I found one that I really liked. Patti (above) made the example, "Perhaps this is why we are supposed to find new users for each stage of our design process, as this will hopefully give us a better idea of how first-time users might view our project." I thought this was a great connection between the reading and how we are testing our applications. It makes perfect sense that as the design process continues to iterate, so must the pool of test subjects. As a quick example, some test subjects being re-used in later testing might pay more attention to aspects that they previously complained or made suggestions about, resulting in observations that are narrowly focused rather than on the bigger picture.
Melissa Jiang - Nov 06, 2006 11:05:43 am
Experiment: As many others have pointed out already, this article reiterated simple statistical measurement rules that we probably learned in high school. However, I did find the idea of confounding variables a good warning. Although it just sounds like another name for hidden variables in stats, people do have a tendency to forget about those confounding variables. I probably would never have thought that something like the time of day would be a confounding variable. While it seems good to take into account and try to eliminate the confounding variables, I can also see that not all confounding variables can be accounted for. In order to take into account all the confounding variables, experimenters may need to constrain the time, the lighting, and the place where the experiment is run. Those variables seem rather easy to control, but it will get ridiculous if experimenters begin controlling what the subject eats or drinks before the experiment, how many hours they sleep, who they see and interact with (depending on whom they interact with, their emotional levels may differ), etc.
On another note, what ever happened with the experiment with the students?
Variables: This article provided a good brief view on choosing and defining the correct variables. I have always found that once you define which independent and dependent variables you want, the experiment will roll on much more easily than if you did not properly define them. By defining the variables, you are setting a type of guideline for what you want to follow instead of blindly looking for any results and then trying to interpret the results later.
Kimberly Lau - Nov 06, 2006 11:42:31 am
How to do Experiments: This article reminds me greatly of an introductory psychology class I once took, in which the experimental process was detailed. Even so, I was still surprised to see how many variables are taken into account during an experiment. As testers, we must maintain firm control and observation over all the variables, relative to what those variables are intended to do (i.e., random variables should vary randomly, independent variables are manipulated for the experiment, control variables are not supposed to change) because variables influence results. Ultimately, this allows us to maintain internal validity as high as possible, because there is a very real danger of outside uncontrollable threats like maturation or history.
How to Decide Which Variables: Whereas the first article described the many variables and what could happen with them, the second article guides us on picking the right variables for our experiment. Although these articles are overall very useful, I feel that these tips would be more helpful for psychology experiments where the many variables named could be applicable. In terms of our project, we are more geared towards having people test our product and give us input on what to improve and what their feelings about it were. In effect, we are not trying to "manipulate" the independent variable. Rather, we want them in their normal state to help us.
Johnathan Hawley - Nov 06, 2006 11:52:47 am
Doing Psychology Experiments - I've taken a couple of psychology classes before, and most of what was mentioned was review: things like independent, dependent, and confounding variables. This article dove into the whole process a little deeper than what I've been taught before. Topics like maturation, history, and mortality I have not seen before. However, they seem like common sense given that we are working with the scientific method. I thought statistical regression was an unusual phenomenon. I'm not quite sure how scores move to the mean as tests are subsequently taken.
How to Decide Which Variables to Manipulate and Measure - Once again this article seems similar to what I've read in other psychology books. After this reading I can see why our next assignment is the pilot experiment. I suppose our interface is the independent variable, and user satisfaction is the dependent variable. It is kind of difficult for me to break our interface into a more measurable independent variable. We could concentrate on time: how long it takes for our user to accomplish a task. If it takes the user a really long time to accomplish a task, it will definitely be reflected in how satisfied they are with our design.
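One way to see how scores can drift toward the mean on a retest is a small simulation. The sketch below is illustrative only and does not come from the reading. It assumes a hypothetical model in which each observed score is a stable ability plus independent luck on each sitting; the function name `simulate_retest` and all the means and spreads are made up for the example.

```python
import random
import statistics

def simulate_retest(n=10000, seed=0):
    """Simulate two sittings of a test and compare the top scorers across them."""
    rng = random.Random(seed)
    # Assumed model: observed score = stable ability + luck that differs per sitting.
    ability = [rng.gauss(100, 10) for _ in range(n)]
    test1 = [a + rng.gauss(0, 10) for a in ability]
    test2 = [a + rng.gauss(0, 10) for a in ability]
    # Pick the top 10% using the first test only.
    cutoff = sorted(test1)[int(0.9 * n)]
    top = [i for i in range(n) if test1[i] >= cutoff]
    return {
        "population_mean": statistics.mean(test1),
        "top_first": statistics.mean([test1[i] for i in top]),
        "top_second": statistics.mean([test2[i] for i in top]),
    }

result = simulate_retest()
print(result)
```

Under this assumed model, the people selected for a high first score still average above the population on the second test, but less far above it, because part of their first score was luck that does not repeat.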
Patrick Rodriguez - Nov 06, 2006 11:58:33 am
How to do experiments: This reminded me of some of the other classes that I have taken in college and high school. It's a good reminder that there is a scientific component to UI design, rather than it just being based purely on the aesthetic whims of a designer. If we really want to understand why our designs will or will not appeal to a certain demographic, then we need to adopt the experimental techniques used in other fields. The article did not directly address UI design, but it's clear from its inclusion that it just as well could have.
How to Decide Which Variables To Manipulate and Measure: This was a good complement to the previous article. The selection of proper variables, both dependent and independent, is key to the success of an experiment. Yet, it isn't always such an easy task to accomplish. What variables should we pick when testing a UI design?
Eric Yoon - Nov 06, 2006 11:59:39 am
How to Do Experiments. This is an interesting article about how to define what you are testing. They introduce the concepts of independent variables and dependent variables, random variables and internal validity. They also touch upon threats to internal validity such as selection -- the problem in which your assessment may be thrown off by the fact that you somehow pre-selected for a certain type of tester with certain kinds of traits. I wondered about this quite a bit in choosing our own testers. For example, in one case we gave a financial incentive (a gift certificate) for people to be our testers; in another case, we used a friend of a friend. What is a "worse" tester, someone with indirect personal ties, or someone who is given some money? I guess both might be seen as being predisposed to help us or say good things, for different reasons.
How to Decide Which Variables to Manipulate and Measure. The article continues to refine our understanding of what exactly we will be measuring when we perform further tests of our prototype. Among other things, the article goes into how difficult it can be to pick a dependent variable and accurately obtain a result from your observations. For example, it occurred to me that we could assess the simplicity of our interface in terms of how fast the user could perform an action. That would be directly observable and quantifiable. But after reading the article, I realized how challenging even that seemingly empirical approach is to get right. For example, speed might very well be dependent on how urgent the tone of your voice is in commanding the tester to act as quickly as possible. In that environment, previous experience with technical interfaces plays a big role too. What I'm learning is that it's quite challenging to come up with a good test!
Joe Hart - Nov 06, 2006 12:17:51 pm
Experiments It seems to me that HCI has an issue with scientific experiments. Like psychology, HCI has the human component that prevents the experimenter from getting rid of all confounding variables. Even though you can try to control all the variables it usually comes down to statistical analysis and guessing at the results.
Choosing Variables This should come into use when we start our usability testing. What variables should we be looking at tweaking? How do we determine which ones are important? unimportant? Determining the correct variables to examine could result in a much improved user interface. One such variable that comes to mind is time to navigate, and time spent on each view of the interface.
Hiroki Terashima - Nov 06, 2006 12:06:35 pm
How to Do Experiments. I learned most of what is mentioned in this article when I was in eighth grade, so I have forgotten most of it. The part about selection and how the crop of students differs from semester to semester is easy to relate to: when I was a TA in CS3, the students that I had in the fall were more competitive (because they were mostly eecs/cs wanna-be's), and the students that I had in the spring were chill (because most of them were just trying out CS). The examples that he used helped me understand what he was trying to say: like many others, I also enjoyed the Pepsi and Coke example; it helped me understand confounding variables and their effects on the outcome of the experiment. All in all, the point is to try to minimize things (confounding variables) that can influence your experiment in unwanted ways.
How to Decide Which Variables to Manipulate and Measure. - I liked the "Be realistic" section and the example it had with the mouse and marijuana, although I feel very sorry for the mouse for having to go through the experiment. It says that "determining the best range of [independent variable of] an experiment is...guesswork", which leads me to believe that there are a lot of things that can differ from person to person even though they're doing the same experiment, because we think differently. Also, this article seems to suggest that doing the experiment "right" requires lots of hard work and consideration of these small details. Add on top of this the physiological and behavioral measures, which are hard to interpret, and our group has a bit more pressure and more to think about on our next assignment.
Vahe Oughourlian - Nov 06, 2006 12:06:12 pm
How to do experiments
Though somewhat wordy and repetitive ("the random element of randomness", hmm), this paper is good for the definitions it provides, in terms of giving us terms for different elements in our experiments, and for the number of pitfalls we should be avoiding, such as the confounding variables mentioned or the internal validity we must maintain throughout our experiments. A related note is the difference between the terms correlation and cause. Two things may occur at the same time, but one may not necessarily have caused the other, an inference this paper points out that we should be wary of making.
The one section I didn't quite appreciate was the definition of randomness and random variables. It seemed that the definitions for these terms were too loose, in that the author refers to "random variables within constraints." If it's random, it's random. Putting constraints on a variable kind of takes away from the randomness, yes?
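A small sketch may help with the question of whether a constrained assignment is still random. The code below is only one possible reading of "randomization within constraints", not the author's procedure: it fixes how many participants end up in each condition (the constraint) and leaves who lands in which condition to chance. The function name, participant labels, and condition names are all hypothetical.

```python
import random

def assign_with_constraint(participants, conditions, seed=None):
    """Randomly assign participants so every condition gets the same number."""
    if len(participants) % len(conditions) != 0:
        raise ValueError("participants must divide evenly among conditions")
    rng = random.Random(seed)
    per_condition = len(participants) // len(conditions)
    # Constraint: build exactly per_condition slots for each condition.
    slots = [c for c in conditions for _ in range(per_condition)]
    # Randomness: the order of the slots, and so who gets which, is shuffled.
    rng.shuffle(slots)
    return dict(zip(participants, slots))

participants = ["P1", "P2", "P3", "P4", "P5", "P6"]
assignment = assign_with_constraint(
    participants, ["interface A", "interface B"], seed=1
)
print(assignment)
```

The group sizes are fixed in advance, but no individual's condition is chosen by the experimenter, which is the sense in which the assignment stays random.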
How to decide which variables to manipulate and measure
Here, we find more concrete definitions for the variables we choose, which is somewhat useful going into our user tests. However, the multiple variable definitions begin to get confusing towards the end of the chapter. Another somewhat frustrating aspect of this writing is the "I already know this stuff" nature of some of the advice. For instance, when he mentions "being realistic" and "selecting a range that shows effect", these should be quite obvious to those who took any sort of science course involving the scientific method. It just seems rote to go over it again. The useful stuff comes in relating his terminology (independent variable, for instance) to something we can relate to so we can use his method effectively.
Sung Yi - Nov 06, 2006 12:35:07 pm
How to do experiments:
This article involves a lot of psychological aspects, discussing how various factors can alter an experiment's results, and thus its validity. It also outlines some statistical concepts like independent/dependent variables, random variables, random sampling, etc. When making explicit causal statements, we need to be careful that we don't make wrong assumptions that can lead to wrong results, especially for the next assignment.
How to decide which variables to manipulate and measure:
This article involved some questionable arguments. When deciding independent/dependent variables, we need to consider other factors such as background, cultures, morality, values etc. How could we so easily select them?
Yen Pai - Nov 06, 2006 12:23:36 pm
How to do experiments: While some comments have focused on the fundamental/basic aspects of this particular reading, I think the author's primary point was to illustrate that it is easy to make mistakes. It is not difficult to understand the definitions of independent, dependent, random, and confounding variables; it is not difficult to create and conduct an experiment if you know the variables you are trying to isolate and if you think you know how to choose an unbiased subject group. However, it is very easy to make a mistake in isolating the correct variables and very easy to not think about threats to internal validity (especially for longer term tests). The Pepsi/Coke experiments are a good example of such subtleties at play. Applied to the necessarily more informal nature of experiments conducted within HCI, the reading really shows just how much UI testing is an inexact science and how much care has to be taken in conducting and interpreting results.
How to Decide Which Variables To Manipulate and Measure: Again, I believe the primary point here is to illustrate the subtleties of variable choice and measurement and to demonstrate how easy it can be to make a mistake. Of particular interest to me are the methods for isolating unobservable or hard-to-measure variables like cognition. What is striking about something like dual-task methodology is that even though it provides a practical way to measure the usage of cognitive resources, it presumes a specific model of cognition (single resource).
Siyan Wang - Nov 06, 2006 12:31:14 pm
How to do Experiments: From my background in cognitive science and psychology, most of these topics are quite familiar to me. However, it seems in this class we haven't really focused on these types of very precise experiments, rather our evaluations seem to have been carried out in a more qualitative fashion, examining the user's responses, etc. This chapter did reveal some new information that I was previously unfamiliar with in the section about threats to internal validity. It seems this would be a problem in our test with the hi-fi prototype if we tested it with people who were experienced with the lo-fi prototype.
How to Decide Which Variables to Manipulate: Again, with my background in psychology and cognitive science, this material is quite familiar to me. However, this seems to be less of a deciding factor in HCI experiments, since it seems we are always trying to improve the usability of our interface. The only thing that would really matter is our operational definition as to what qualifies as a more "usable" interface, but it seems that we often use speed of learning and manipulation as a quantitative measure, such as with the GOMS analysis.
Michael Udaltsov - Nov 06, 2006 12:38:08 pm
Ch2 - Experiments - As other people noted, this seems like a general introduction to the experimental method and how to perform valid scientific experiments. This chapter doesn't specifically mention how to apply this to HCI or what kinds of variables to consider when evaluating user interfaces and the interactions, though I expect more information will be presented in lecture and other readings. Personally, it seems odd to use the standard experimental method to evaluate interfaces, since there are so many random variables when dealing with computers and users of various skill levels. I think other methods of testing interfaces and identifying flaws would allow more and better improvements to be made.
Ch7 - Variables - This chapter starts with some more detailed information on choosing the variables for an experiment, but I think the parts that will apply the most to this class are the Physiological and Behavioral Measures. Monitoring users' behaviors while they use an interface could serve as a good indication of places where the interface is easy or difficult, and what parts should be reviewed and updated afterwards. At the same time, since it's hard to accurately measure indirect variables such as behavior and stress, the results might not always be reliable.
Sean Carr - Nov 06, 2006 12:37:17 pm
How To Do Experiments:
This chapter seems to be a review of high school science and stats classes. It is a good refresher though because I haven't read these rules for experiments in a while. He covers "Threats to Internal Validity" better than I remember in high school. Specifically the ideas of history and mortality.
How To Decide Which Variables to Manipulate and Measure:
I was hoping this chapter would not be another refresher on experimental method, but it was. It did go into more detail and had some good examples, but I think most students at Cal (not just CS students) have learned about how to properly set up and conduct an experiment. Once again it is good to have a refresher, but 40 pages seems like a little much. This could probably just be discussed for a little bit in section and that would suffice.
Dexter Lau - Nov 06, 2006 12:50:03 pm
How to do experiments: This article discusses the scientific process of creating an experiment and then correctly analyzing the data it produces. When creating an experiment, there is a given situation with an independent variable and any number of control variables, random variables, or variables randomized with constraints. The point of the experiment is to measure the resulting dependent variable. After collecting data, you must be aware of all circumstances that may also alter the way the data is interpreted such as history, maturation, selection, interactions with the selection, mortality, testing, and statistical regression.
Variables to manipulate and measure: Coupling well with Chapter 2, this article deals with the processes of creating the same conditions with the given variables. You must also specify the range of variables that can be used. This is tricky to do because you want to be realistic yet describe the effect of the variable with full emphasis. Everything must be arranged in such a way that it can all be recreated with the full support of valid and reliable results. Reliability can be measured through a number of ways: test-retest, alternative-form, and split-half. Likewise there are a number of tests for validity: face, content, concurrent, and predictive. Directly observable variables are easy to measure objectively, however their counterparts, indirectly observable variables, are very hard to measure since there is a level of subjectivity involved. This can be mitigated with behavioral measures.
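Of the reliability checks listed above, split-half is easy to sketch in code. The example below is a hedged illustration, not the chapter's procedure: it assumes a questionnaire whose items are split into odd and even positions, and it uses the Pearson correlation between the two half-totals as the reliability estimate. The ratings are made up, and the helper names `pearson` and `split_half_reliability` are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def split_half_reliability(responses):
    """Correlate each participant's total on odd-position items with even-position items."""
    first_half = [sum(row[0::2]) for row in responses]
    second_half = [sum(row[1::2]) for row in responses]
    return pearson(first_half, second_half)

# Made-up data: one row per participant, one column per questionnaire item (1-5 ratings).
responses = [
    [4, 5, 4, 4],
    [2, 1, 2, 2],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [1, 2, 1, 1],
]
reliability = split_half_reliability(responses)
print(round(reliability, 2))
```

A value near 1 would suggest the two halves measure the same thing consistently; with only a handful of participants, as in most class projects, the estimate should be treated as rough.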
Tom McClure - Nov 06, 2006 12:51:50 pm
It is interesting to consider that in adopting the UI design cycle, the designer must for some stages of the cycle take off his or her design hat and don the hat of the scientist. For very large projects, specialists might be brought in for these stages of the cycle, but I would guess that for most projects that require UI development, the designers themselves would need to commit to this role of scientific experimenter. It's helpful to be reminded of the guidelines for conducting experiments, and getting results that you can trust. Bias can creep in from all sorts of unexpected places, even in setting up and designing the experiment itself. My wife always says she's never seen a statistic she can trust, and this seems especially true in her field of medicine where a great deal of what she does is to read studies published in journals, and she always does so with a skeptical eye toward how the authors' conclusions were drawn and whether they "followed the rules."
Huangnankun - Nov 06, 2006 12:58:17 pm
The scientific methods described in this section are applicable to a variety of mediums and scientific research. While I think they apply to a certain degree to HCI experiments, I feel that they are "overkill" most of the time, since in HCI many of the constants in experiments are actually variables due to human differences, and it's very hard for the conductors of the experiment to keep an exact log of and control over the experiment.
This article deals with specific experimental techniques which allow us to conduct unbiased experiments. The tools used in this article are "variables", factors which the user can control in order to evaluate the result of an experiment. I think this will come in handy when we are doing our hi-fi testing.
Siu Pang Chu - Nov 06, 2006 12:41:53 pm
How to do Experiments: The article gives some basic ideas on the design of an experiment. It defines some variables that we may need to record during experiments: independent variable, dependent variable, control variable, confounding variable, and random variable. At the end of the article, there is an example of an experiment. In the diagram, some circumstances are chosen as the independent variables, control variables, and random variables. As the example shows, many variables become control variables and do not vary throughout the experiment.
Eric Vacca - Nov 08, 2006 04:20:25 am
Ch2: This is a good review of the basic aspects of running an experiment. The material covered in this chapter is pretty intuitive and most sciency people should be familiar with the terms already. These concepts are very important to UI design because in testing our UI designs we're testing how people use them. So while it's called testing for UID, they still really are experiments, and being aware of random variables and other threats to internal validity is very important.
Ch7: This chapter expanded upon Ch2 and gave us more of an idea of how to do psychological experiments. Procedures were given on how to use and assign each variable, with clear, concise examples. One part which I thought seemed unnecessary was the pilot experiment. It was unclear why an experimenter should choose the range beforehand and what meaning it really has. The author weakly implies that the range we are choosing here relates to the type of test we are running, but the example is too simple to clarify the situation.
Jason Lee - Nov 09, 2006 03:48:55 pm
Chapter 2: Reading this article was interesting, as it reminded me very much of an AP Statistics class that I took in high school, as well as a psychology class, that taught us the fundamentals of random samples, control variables, and other such things. One of the more interesting things to consider about experiments is the role of the confounding variable. In the Coke vs. Pepsi example, one wonders whether Pepsi knew about the preference of M over Q ahead of time and set up the experiment that way on purpose. As someone who strongly believes that there are always more than two sides to each story, I am always thinking of other explanations for something that may seem clear for a single reason. This has come up quite often in coding in every CS class I've taken, especially in this class, in which confusion over user interfaces can stem from a variety of reasons. It is important to always consider that other variables/factors could be contributing to your results.
Chapter 7: Once again, this brought back early school memories, particularly about learning the scientific method and how to properly go about setting up an experiment. It is important to choose the correct independent variable and the correct range in which to vary it, as the outcome and the analysis of the experiment hinge on correctly deciding these factors. Incorrectly choosing an independent variable can lead to criticisms of the experiment giving misleading results. This often comes up in real life, as the results of psychological experiments are sometimes discredited or criticized due to the methods of the experiment. This also comes up in the methods of creating our user interface, as the tasks we create for our experimental users must accurately reflect real tasks and also reveal any potential UI problems.
Utsav Shah - Nov 12, 2006 06:26:32 pm
Ch. 2
This reading was quite interesting and demonstrated the complexity of conducting experiments (and in terms of class, usability tests). One of the key points all experiments must strive for is what Marin calls external validity (after all, if the experiment is not generalizable then it is practically useless). One thing that caught my attention was Marin's explanation of random assignment and selection. However, in most design projects I'd say that the primary concern is to find a specific population of users to observe or test. The question then is how we account for results from a specific group, but still handle weird cases and exceptions.
Marin advocates an unbiased experiment with valid generalizable results that take care of all the variables (random and controlled) in a systematic way. I think this is a pretty interesting assertion and if you look at usability studies done by big companies like Yahoo and Microsoft they are often conducted in this scientific method manner in a closed environment. However, I believe that our previous readings and lectures have demonstrated that there is more relevant information in an experiment conducted in a user-centered manner. Marin says that it’s important for validity's sake to have a controlled environment. However, often for design purposes we want ethnographic information where the environment is open to randomness. This way we can design a system that is more robust.
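The random assignment mentioned in the Ch. 2 notes above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the participant IDs, the two interface variants, and the function name `randomly_assign` are all hypothetical, and the round-robin dealing is one simple way to keep group sizes balanced, not a procedure taken from the reading.

```python
import random

def randomly_assign(participants, conditions, seed=None):
    """Shuffle participants, then deal them round-robin into conditions
    so that group sizes differ by at most one."""
    rng = random.Random(seed)  # a fixed seed makes the assignment repeatable
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {c: [] for c in conditions}
    for i, person in enumerate(shuffled):
        groups[conditions[i % len(conditions)]].append(person)
    return groups

# Hypothetical participant IDs and interface variants.
testers = [f"P{n}" for n in range(1, 9)]
groups = randomly_assign(testers, ["interface_A", "interface_B"], seed=160)
for condition, members in groups.items():
    print(condition, members)
```

Note that this only covers assignment of the people already recruited to conditions; random selection of who gets recruited in the first place is a separate step that this sketch does not address.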
Ch. 7
This chapter was where Marin went into lengthy details from Chapter 2. One of the things I didn't quite agree with was the manner in which ambiguous concepts were tested. Marin writes that if there is an experiment to test whether children are more aggressive after exposure to violent versus nonviolent TV programs, then we have to give operational definitions (systematically reproducible versions) of our independent and dependent variables. I guess that means if we are testing something like 'ease of use' then we must define it. However, this definition is only definable according to the system we are testing. For example, if I am testing the CalMail system, ease of use may be how quickly I can access my mail. For someone else doing the same experiment, it may be how easily I can compose messages. It’s hard to come up with a specific definition of “ease of use”. Marin says to look at existing scientific literature; however, for something like design, this is very difficult since everyone has a different system.
CharlesLeung - Nov 13, 2006 12:09:02 pm
Ch. 2
I thought that this reading was pretty much review because most people in this class took cs70, and that class already went through taking surveys and what kinds of problems should be avoided (like non-random samples and confounding variables). I suppose this is still important to review for most people because designing good experiments can be hard to do if one does not keep these things in mind. I'm not sure how as a group we can use these methods because it is hard enough already to get people to test our programs. In addition, it would be hard to make sure that our testers are randomly selected. I'm sure all the groups in this class have selection biases for their testers.
One thing that this chapter mentioned, which I found was very interesting and new was the mortality threat to internal validity. In fact, I'm sure that we would encounter this problem if we needed to interview one of our interviewees for the last project again. I suppose if one offered some sort of reward to our testers, he/she could better fend off mortality.
Ch. 7
I thought that the way they defined the range of whether a program was violent or not was very interesting. I definitely would not have thought of using the percent of people who thought that the program was violent or not as the range for the independent variable. Although this seems like a very clever way to define the range, I was thinking that it still doesn't differentiate between how violent a show was. For example, a show could be obviously mildly violent, or obviously extremely violent. In this example they would both get a 100% violent rating.
I thought that their example of concurrent validity was kind of confusing. Although it seems like this version of validity was supposed to be the strongest form, the example they used seemed to be dependent on the subjectivity of the respondents. I thought that the example for predictive validity was much better and more sound than the example for concurrent validity.
Aleksandr (Sasha) Ashpis - Nov 13, 2006 12:42:45 pm
Chapter 2
- Although I have taken a science class or two in my day and in theory understand why variables of all types are important and necessary, I don’t see the need for them in app development. First, large-scale testing needs to be done to get any kind of valid/valuable information. In addition, I see it only helping if the results show something way out of proportion. Other than that, shaving seconds off a mean or trying to fit into a standard deviation seems pointless.
- However, I am a big proponent of doing user testing; I think that’s an extremely valuable part of the design and implementation process that unfortunately too often gets overlooked. Furthermore, statistics are a dangerous thing: as the article states, tests and data are prone to statistical regression even when no one intends it, let alone when someone is trying to manipulate them. Another negative with statistics is the stacking-the-deck problem, as demonstrated in the reading by there being no 80-year-old left-handed people and concluding that left-handed people live short lives. But as the article points out, this overlooks a key element: left-handed people were usually socially pressured to change to being right-handed, and thus 80 years later the statistics are stacked against left-handed people, and the conclusions reached are not valid if scrutinized closely.
Chapter 7
- I completely agree that an independent variable is the most important part that needs to be chosen correctly, but there is no standard method for choosing one, so independent variables are sometimes incorrectly chosen and produce meaningless results. Also, reliability and validity are of course critical to any experiment, but once again one must be careful. The article states that measuring something twice and coming up with the same result is a good test for reliability, which is true for a ruler, but what about a mechanical machine that after a certain period of time needs to be calibrated to produce accurate results? The calibration period is usually derived with a cost-benefit analysis of how often one needs to calibrate and the cost of doing so; of course a company would want to keep the number of times low, a great selling point. Therefore, if the calibration period is once a year, then the month before calibration the machine might be giving unreliable results, and operators and users incorrectly accept them as valid.
- As mentioned previously, the choice of which data and/or statistics to use can have a profound impact on the outcome, which can be far from the real truth. Sometimes this is done intentionally to prove something with faulty science, and sometimes it's just overlooked, but in the end the statistics are flawed and the conclusions reached are not accurate. That is why one must be careful when quoting statistics and using them solely to prove anything. Furthermore, due to this concept I am not sure what purpose the actual numbers serve in designing and implementing a UI, unless there is an industry standard for measuring. But having a standard way has other problems, because one cannot group all UIs into one standard, and these are just some of the dilemmas.
Jae Chang - Nov 13, 2006 12:49:51 pm
How to do experiments:
Comment 1: This article was very interesting. It basically teaches us how to define what we are testing. Although identifying variables in experiments is a key to success, I believe that it is a very difficult task. However, this article tells us how to define variables and how the variables affect experiments. The variable factors can alter the result of the experiment and also ensure the validity of the result. Wrong assumptions can lead to invalid conclusions. Defining the different types of variables and selecting the right variables without wrong assumptions is crucial to the success of an experiment.
Comment 2: The article introduces the experimental method, which allows causal statements to be made. The independent variable is the circumstance that is manipulated, and the dependent variable is the behavior that is measured, so called because it can depend on the levels of the independent variable. Control variables are other circumstances that are set at a particular level and not allowed to vary, and random variables are other circumstances that are allowed to vary randomly. I believe that defining variables in experiments is very important because an experiment cannot be run without knowing its variables. Also, the variables are very useful because they reveal the correlation among different experiments.
How to Decide Which Variable To Manipulate and Measure
Comment 1: The article basically covers the same concepts as the article How to do experiments. Choosing an independent variable is very important, but there can be a problem: the difference in precision between what the general public will accept in defining a term and what an experimental psychologist will accept. Also, choosing the range of the independent variable can alter the result of the experiment. Choosing a dependent variable is the next step, and the reliability and validity of the dependent variable should be resolved logically. The approach introduced to build multiple variables may help the experiment, but I think that there can be too many unknown variables, so that the validity of the experiment cannot be ensured.
Comment 2: The operational definition of the variables should be specified first when we choose an independent variable. Another important thing we need to consider is that the range of the variables should be selected very carefully: the range should be large enough to show the experimental effect but small enough to be realistic. However, some people might argue with the article's approach to choosing variables, because other background factors such as region, culture, language, and sex should be considered, but these factors are very difficult to define. It somewhat seems like the author is trying to provide readers a way to convince other people using the variable methods.
Bryce Lee - Nov 13, 2006 12:20:15 pm
Chapter 2
This chapter emphasizes the critical, yet often overlooked, elements of an experiment that we must keep in mind. Oftentimes, individuals are eager to accept results which match their hypothesis, disregarding that other factors may have contributed or that the results are not externally valid. For example, for my senior year statistics project in high school, my group did a survey dealing with the amount and distribution of costs of Senior prom among dates. However, we did not take into account school-specific traditions about girls paying for certain portions and the fact that the area's mean income may have affected the amount spent.
Martin's point on the number of control variables affecting the external validity and generality is also something that I tend to overlook. There needs to be a fine balance between generality and specification if studies are to extend to the general case. The fact such consideration is ignored often brings doubts to many modern claims and even research publications. Statistics has been largely discredited due to people ignoring these critical components.
Chapter 7
The correct definition of an independent variable oftentimes goes against the desired outcomes of the experimenters. The more granularity exercised, the further away the experiment moves from appearing to have a large, major result. Martin mentions that extreme selections in a range should be avoided due to other influences that may be in play. However, if a person intends to study a certain range away from the mean, how should such selections be made to avoid this problem?
Also, I believe that Martin takes the ease of pilot studies a little too far. He talks about how you can coax people to participate or even participate yourself. By doing so, you introduce unaccounted variables that may act as confounding components to your end conclusions. I think a much better method would be to approach the pilot study with the same rigid standards as a normal experiment. The point of a pilot study is to refine your experiment before the actual experiment, but not to loosen parameters.
Charles Lee - Nov 13, 2006 12:17:59 pm
Chapter 2:
- This chapter seems to cover basic statistics and experiment design, nonspecific to user interfaces. Lurking variables and correlations can sometimes be hard to remove - they can be strongly tied to the variables being tested for. Ideally, a well-designed experiment would have some method of indirection to separate the tested variables from any related ones.
- Even the examples provided in the chapter potentially have linked variables or false assumptions. The chapter linked a quiet room with an attentive class, but a quiet room often indicates a sleeping class, or at least an entirely unengaged class. The invalid assumption of "quiet == attentive" caused the rest of the experiment to show false results.
Chapter 7:
- More users would benefit our UI testing. The benefit of running experiments on a large sample is that it averages out some of the weakly linked fluctuations, such as some people already knowing a similar interface and thus completing tasks quicker. With a large sample, we can at least claim knowledge about an average computer user, and be further from sampling errors.
- Experiments in the scientific community are often published so that they can be repeated elsewhere. This solves many of the problems involved in designing an effective experiment. First of all, more eyes on an experiment allow for a greater chance that accidentally linked variables will be discovered. Second, more people attempting to repeat the experiment with different subjects could expose linked variables from the first experiment, if the subjects from a different area did not have the same linked variables. Unfortunately, the most reliable solution to badly designed experiments is an expensive one: wider testing.
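The point above about large samples averaging out fluctuations can be made concrete with a small simulation. This is a rough sketch, not anything from the reading: the task-completion times are simulated from a normal distribution with a made-up mean of 60 seconds and standard deviation of 15 seconds, and the function name is hypothetical.

```python
import random
import statistics

def spread_of_sample_means(sample_size, trials=2000, seed=0):
    """Draw many samples of simulated task times and return the standard
    deviation of their means (an empirical standard error)."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        # Hypothetical task-completion times: mean 60 s, std dev 15 s.
        sample = [rng.gauss(60, 15) for _ in range(sample_size)]
        means.append(statistics.mean(sample))
    return statistics.stdev(means)

small = spread_of_sample_means(3)
large = spread_of_sample_means(30)
print(f"n=3:  spread of means ~ {small:.1f} s")
print(f"n=30: spread of means ~ {large:.1f} s")
```

Under these assumptions the spread of the sample mean should shrink roughly in proportion to one over the square root of the sample size, so a mean from three testers would be far noisier than one from thirty.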
Keenahn Jung - Dec 08, 2006 10:08:36 am
Chapter 2: All too often, as engineers, we forget good scientific principles. We rush to get something that works empirically, but does that really prove that our interface is good? Modify one variable, test it against a control group, and crunch the data. That's the only way we can know for sure. This chapter is a basic review, and anyone who has been published in a peer-reviewed journal knows that their study will be torn apart if they don't adhere to these principles. Even for people who have simply read peer-reviewed articles, this stuff is review.
However, one thing that they point out that is great is that it is often very difficult to remove all possible confounding factors, especially when dealing with human subjects. Being at college, most of our user pool for our prototype testing are going to be students, and thus have actually very similar economic backgrounds and states of mind. Thus, unless we test our products on the general public, it will be difficult to see how well our interfaces do on non-students. Even if we tested on more than three people, there would still be subtle inherent biases, no matter how hard we try to remove them, in our demo scripts and instructions, how we present the interface, etc. It is nearly impossible to conduct a study completely objectively, but by following these guidelines we will minimize the effects.
Chapter 7: In the scientific community, we realize the importance of publishing our results. If results cannot be independently verified by third parties, then your study is invalidated as a fluke. In fact, the community moves at a blinding pace to declare studies invalid. You often do NOT see this in industry, where companies conduct their own research, which is not reproducible, and then make claims based on it. This may be good business, as they can manipulate the results as they see fit, but it is horrible science.
The basic overview of statistics analysis techniques may be useful for the uninitiated, but for those of us who have conducted experiments before, with the intention of the results being published in peer reviewed journals, this is all review. I do like how the author stresses the point that correlation does NOT mean causation, which, although it seems obvious, is often presented as such in non-scientific circles. The news often publishes articles of studies that have been conducted by private parties (not academic institutions) that link two seemingly unrelated factors, trying to establish causality.
Robin Franco - Dec 15, 2006 07:27:12 pm
How to do experiments 1: Most of the data presented in this article is the typical information one would learn in an introductory psychology class or statistics class. The basis for setting up a scientific experiment is an important aspect to learn if one wants to have any credibility in their findings, so I believe this is very important information despite it not being directly related to user interfaces.
How to do experiments 2: I find it very interesting how such a small aspect of an experiment, such as simply naming one choice "M" and another "Q", could have such strong effects on the outcome of an experiment. It just goes to show you how careful one has to be in the design of their experiment. But in this specific case one has to wonder what exactly leads to the preference of one over the other. Can this be extrapolated to other aspects of society? Is there a correlation between the winner of elections and the sounds in their names? One can only surmise at this point.
Variables to manipulate and measure 1: This extends on the ideas learned in the other reading. It gives good concrete examples of the various variables. This information is useful for setting up our future studies, but in my opinion this level of detail is not as important for our purposes.
Variables to manipulate and measure 2: We are presented with the concepts of reliability and validity. Reliability is determined by how often the experiment creates similar results. Validity is determined by the relationship of the dependent variables and the hypothesis we're trying to test. The examples presented to test for these two concepts will be useful for future studies.
