Wednesday, August 4, 2010

Human Computation Math

Luis von Ahn, a professor of Computer Science at Carnegie Mellon University and inventor of the CAPTCHA, or “Completely Automated Public Turing test to tell Computers and Humans Apart” (those annoying things you fill out when you sign up for services online), gives a Google Tech Talk on Human Computation. The actual talk is about 40 minutes long, with an additional ~20 minutes of Q&A at the end. Take some time and watch the video below; it’s well worth it.



A great mix of Statistics concepts:
@3:14 On the problem with polls on the Internet
A few years ago Slashdot, which is a very popular website, put up a poll on their site asking which is the best computer science graduate school in the United States. This is a very dangerous question to ask over the web. As with most online polls, the IP addresses of voters were recorded to make sure that each person could vote at most once. However, as soon as the poll went up, students at CMU wrote a program that voted for CMU thousands and thousands of times. The next day, students at MIT wrote their own program, and a few days later the poll had to be taken down, with CMU and MIT having like a gazillion votes and every other school having less than a thousand. I guess the poll worked in this case. In general, this is a huge problem, because you simply cannot trust the results of an online poll: anybody could just write a program to vote for their favorite option thousands and thousands of times.
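To make his point concrete for students, here’s a tiny, purely illustrative simulation (all numbers invented) of how a couple of scripted voters swamp the honest responses in a poll like that:

```python
# Tiny, purely illustrative simulation (all numbers invented): a couple of scripted
# voters can completely swamp the honest responses in an online poll.

import random

random.seed(1)
schools = ["CMU", "MIT", "Stanford", "Berkeley", "Other"]

# Honest votes: each of 1,000 real visitors picks one school.
tally = {school: 0 for school in schools}
for _ in range(1000):
    tally[random.choice(schools)] += 1

# Two scripts stuff the ballot box with thousands of automated votes each.
tally["CMU"] += 50_000
tally["MIT"] += 40_000

for school, votes in sorted(tally.items(), key=lambda kv: -kv[1]):
    print(f"{school:10s} {votes:7d}")
```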

@16:45 On the probability of data corruption from cheating when two players play the ESP Game (a game that labels images)
One thing that some of you may be wondering about is… uh… what about cheating? For instance, could you try to cheat to screw up the labels? Something like “my office mate and I could try to log in to the game at exactly the same time; maybe we’ll get paired with each other, and if we get paired with each other we can agree on any word for any image,” or even worse, someone could go on Slashdot and say “Hey everybody, let’s play the ESP Game and let’s all agree on the word ‘a’ for every image.” Could happen. Fortunately, I thought about this, and the ESP Game has several mechanisms that prevent cheating. Let me tell you a few of the things that we do to prevent cheating. Here’s one. At random, we actually give players test images. These are just images that are there to test whether the players are playing honestly or not. What they are is images for which we know all the most common things that people enter for them, and we only store a player’s guesses and the words that they agreed on if they successfully label the test images. So, if you think about it, in a way this gives a sort of probabilistic guarantee that a given label is not corrupt: what’s the probability that a given label is corrupt, given that the players successfully labeled all their test images? This guarantee can be boosted by the next strategy, which is repetition. We only store a label after N pairs of players have agreed on it, where N is a parameter [here he means parameter as an input variable to a function, not a population parameter as opposed to a sample statistic] that can be tweaked. So every now and then we actually delete all the taboo lists for the images and put the images back into the game afresh, and we only store a label after N pairs of players have agreed on it. So if we let X be the probability of a label being corrupt given that the players successfully labeled their test images, then after N repetitions the probability of corruption is X to the N. This assumes that the N repetitions are independent of each other. But if X is small, X to the N is really, really small.
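To see how quickly X to the N shrinks, here’s a quick Python sketch of the repetition argument; the talk doesn’t give actual numbers, so the values of X and N below are made up for illustration:

```python
# A minimal sketch of the repetition argument above. X is an assumed probability
# that a single agreed-upon label is corrupt given that the players passed their
# test images; N is the number of independent pairs that must agree before the
# label is stored. The talk gives no actual values, so these are made up.

def corruption_probability(x: float, n: int) -> float:
    """P(label is corrupt after n independent agreements) = x**n."""
    return x ** n

for x in (0.1, 0.05, 0.01):   # hypothetical single-pair corruption probabilities
    for n in (1, 2, 3, 5):    # number of independent agreeing pairs required
        print(f"X = {x:.2f}, N = {n}: P(corrupt) = {corruption_probability(x, n):.2e}")
```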

@23:50 On statistically significant difference when measuring play time in A/B Testing
We continually do measurements to try to figure out what the things are that make people play longer. So let me explain one of these measurements to you. At some point in the history of the game, I added this very small message in the corner of the screen alerting you to whether your partner had already entered a guess or not. So, a very tiny message (this is just a magnification of it) that just tells you if your partner has already entered a guess or not. When this was added to the game, it wasn’t added for all the players; it was just added for a small random subset of the players, and then we measured whether the players who had this feature played longer than those who didn’t. It turns out that those who had this feature played a whopping 4% longer than those who didn’t. You might not think that 4% is very large, but it actually is a statistically significant difference. And, if you think about it, just a very tiny message in the corner of the screen makes people play 4% longer.
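The talk doesn’t say how the significance was checked, so here’s just a rough sketch of one common approach: a two-sample (Welch’s) t-test on simulated play times, with every number (group sizes, means, spread) invented for illustration:

```python
# Hedged sketch: the talk doesn't say which test was used, so this only illustrates
# one common way to check whether a ~4% difference in mean play time between a
# control group and a feature group is statistically significant.
# Every number here (group sizes, means, spread) is invented for illustration.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated session lengths in minutes (roughly exponential, like many session-length data).
control = rng.exponential(scale=20.0, size=50_000)          # players without the tiny message
treatment = rng.exponential(scale=20.0 * 1.04, size=50_000) # hypothetical 4% longer on average

t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)  # Welch's t-test
print(f"mean control:   {control.mean():.2f} minutes")
print(f"mean treatment: {treatment.mean():.2f} minutes")
print(f"t = {t_stat:.2f}, p = {p_value:.2e}")
```

With tens of thousands of sessions in each group, even a 4% shift in mean play time shows up with a very small p-value.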

@39:10 On sampling bias: the population of gamers vs. the general population (gamers and non-gamers)
Audience question: Does it concern you at all that the fact that you are using a game will automatically give you a very biased population of people that are giving us answers to problems that we want answers to? And this population of people are people who have way more time on their hands and are not motivated to maybe get a job or do something… [indistinct]
Luis von Ahn: Very good question. Very good question. It’s true that the population is biased. There’s no question about that. But for a lot of really simple things, I mean… anyone can do it. So it is true, it is true that the population is biased. That’s definitely true.
Audience question: Have you seen any results? Any…
Luis von Ahn: I don’t know of any… I can tell you that the population is biased, but I have not seen anything that really tells me “Oh, because we are using gamers, this is happening instead of what would happen with the general population.” I have not seen that.

This is a great video to start a discussion. One problem for high school students is Luis von Ahn’s rate of speech; it’s a little too fast for many students. Also, there might not be enough context to understand the concepts from the video alone, so it’s probably better shown after students have learned the concepts.

There’s also a lot of rate/work-related data:
@5:05 Spam companies’ use of sweatshops to solve CAPTCHAs.
@7:00 Interesting data on human-hours (a more gender-neutral term): Solitaire played in 2003 (9 billion human-hours), the Empire State Building (7 million human-hours), the Panama Canal (20 million human-hours). See the quick arithmetic sketch after this list.
@15:10 An estimate for labeling the entire web. Probably a good tie-in to the estimation in URL Shortener Math.
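The human-hour figures at @7:00 make for a nice back-of-envelope exercise; here’s a quick sketch using just the numbers quoted above:

```python
# Back-of-envelope arithmetic with the human-hour figures quoted in the talk.
solitaire_2003 = 9e9   # human-hours of Solitaire played in 2003
empire_state = 7e6     # human-hours to build the Empire State Building
panama_canal = 20e6    # human-hours to build the Panama Canal

print(f"Empire State Buildings per year of Solitaire: {solitaire_2003 / empire_state:,.0f}")
print(f"Panama Canals per year of Solitaire:          {solitaire_2003 / panama_canal:,.0f}")
print(f"Empire State Buildings per day of Solitaire:  {solitaire_2003 / 365 / empire_state:,.1f}")
```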
I usually show these clips on the odd days when many students (usually at least half) are pulled out for testing or various other reasons. We usually follow up by actually playing these games with a purpose to see how they work. This is not so much a video for a lesson as it is motivation to learn the stuff that we do. A great example for #whenwilliusemath.
