Source code serves a dual purpose: it communicates program instructions to machines and programmer intent to people. Unfortunately, a person reading a piece of code often reaches a different conclusion about its behavior than the machine does when executing it. Such differences in interpretation are expected in some situations, such as those involving randomness, poorly understood APIs, or undefined behavior. However, they also arise in small, self-contained lines of code. These small patterns are easy to misinterpret, and so they lead naturally to bugs.
We are studying these atoms of confusion using empirical, objective techniques. We have run, and will continue to run, multiple human-subject studies. We design our studies to gather objective measures of performance and correctness rather than subjective data.
All of our experiments are designed so that their data can be shared with other researchers online. Many projects make their data available on request, but we have decided to publish all of our data online by default, once it has been scrubbed of personally identifying information. We hope that this free exchange of data will make it easier for other researchers to work alongside us. We also hope it will encourage others doing separate work to share their data with the general public.