Atom | Example | Effect Size | p-value |
---|---|---|---|
Change of Literal Encoding | printf("%d", 013) | 0.63 | 2.93e-14 |
Preprocessor in Statement | int V1 = 1<br>#define M1 1<br>+1; | 0.54 | 8.53e-11 * |
Macro Operator Precedence | #define M1 64-1<br>2*M1 | 0.53 | 1.77e-07 * |
Assignment as Value | V1 = V2 = 3; | 0.52 | 3.78e-10 |
Logic as Control Flow | V1 && F2(); | 0.48 | 5.62e-09 |
Post-Increment/Decrement | V1 = V2++; | 0.45 | 6.98e-08 |
Type Conversion | (double)(3/2) | 0.42 | 5.17e-07 |
Reversed Subscripts | 1["abc"] | 0.40 | 1.52e-06 |
Conditional Operator | V2 = (V1==3)?2:V2 | 0.36 | 1.74e-05 * |
Infix Operator Precedence | 0 && 1 || 2 | 0.33 | 5.90e-05 |
Comma Operator | V3 = (V1+=1, V1) | 0.30 | 2.46e-04 |
Pre-Increment/Decrement | V1 = ++V2; | 0.28 | 6.89e-04 |
Implicit Predicate | if (4 % 2) | 0.24 | 4.27e-03 |
Repurposed Variable | argc = 7; | 0.22 | 6.66e-03 |
Omitted Curly Braces | if (V) F(); G(); | 0.22 | 8.64e-03 |
Unaccepted Atom Candidates | | | |
Dead, Unreachable, Repeated | V1 = 1;<br>V1 = 2; | 0.16 | 0.059 |
Arithmetic as Logic | (V1-3) * (V2-4) | 0.10 | 0.248 |
Pointer Arithmetic | "abcdef"+3 | 0.03 | 0.752 * |
Constant Variables | int V1 = 5;<br>printf("%d", V1); | 0.00 | 1.000 |
Contingency Tables
In our Existence experiment we analyzed our results by comparing pairs of obfuscated and transformed code. For each pair we categorized whether subjects got both snippets correct, only one, or neither. From these counts we could apply a clustering-adjusted variation of McNemar's test (supplied by clust.bin.pair). Below are these data.
Atom | Both Correct | Only Obfuscated Correct | Only Transformed Correct | Neither Correct |
---|---|---|---|---|
Change of Literal Encoding | 35 | 2 | 89 | 20 |
Preprocessor in Statement | 39 | 5 | 73 | 28 |
Macro Operator Precedence | 53 | 2 | 37 | 5 |
Assignment as Value | 59 | 6 | 68 | 13 |
Logic as Control Flow | 32 | 12 | 72 | 30 |
Post-Increment/Decrement | 75 | 4 | 54 | 13 |
Type Conversion | 75 | 10 | 52 | 9 |
Reversed Subscripts | 70 | 6 | 40 | 30 |
Conditional Operator | 109 | 1 | 34 | 1 |
Infix Operator Precedence | 105 | 4 | 25 | 11 |
Comma Operator | 53 | 17 | 50 | 26 |
Pre-Increment/Decrement | 82 | 11 | 35 | 18 |
Implicit Predicate | 108 | 6 | 20 | 12 |
Repurposed Variables | 54 | 15 | 33 | 44 |
Omitted Curly Braces | 77 | 13 | 33 | 23 |
Dead, Unreachable, Repeated | 138 | 1 | 6 | 1 |
Arithmetic as Logic | 133 | 4 | 8 | 1 |
Pointer Arithmetic | 67 | 21 | 23 | 35 |
Constant Variables | 142 | 2 | 2 | 0 |
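
As a rough illustration of how the discordant counts in this table drive the test, the sketch below computes the classical, unadjusted McNemar statistic for one row. It is not the clustering-adjusted variant from clust.bin.pair used in the analysis, so it should not be expected to reproduce the reported p-values. The function name `mcnemar_chi2` is made up for this example.

```c
/* Classical McNemar chi-squared statistic (no continuity correction,
 * no clustering adjustment). Only the discordant pairs matter:
 *   b = pairs where only the obfuscated snippet was answered correctly
 *   c = pairs where only the transformed snippet was answered correctly
 * Returns 0.0 when there are no discordant pairs. */
double mcnemar_chi2(int b, int c)
{
    int n = b + c;
    if (n == 0) {
        return 0.0;
    }
    double diff = (double)(b - c);
    return diff * diff / (double)n;
}
```

For the Change of Literal Encoding row (b = 2, c = 89), the statistic is 87 * 87 / 91, roughly 83.2. For Constant Variables (b = 2, c = 2) it is 0.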
Atom descriptions
Change of Literal Encoding
All numbers are stored in binary inside of a computer, but for human convenience we tend to represent numbers in decimal, and occasionally hexadecimal or octal for certain uses. Even though different representations can hold the same number, their accessibility to humans for different computations can be very different.
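
A minimal sketch of the point about representation: the same bitwise AND is written once in decimal and once in hexadecimal, next to an octal literal like the one in the results table. The function names are hypothetical.

```c
/* 208 & 13 written in decimal: the bit patterns are not visible. */
int mask_decimal(void) { return 208 & 13; }

/* The same operands in hexadecimal: 0xD0 = 1101 0000, 0x0D = 0000 1101,
 * so it is easy to see that no bit positions overlap. */
int mask_hex(void) { return 0xD0 & 0x0D; }

/* A leading zero makes a literal octal: 013 is eleven, not thirteen. */
int octal_literal(void) { return 013; }
```

Both masks evaluate to 0, and `octal_literal()` returns 11.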
Confusing: 208 & 13
Non-confusing: 0xD0 & 0x0D
Preprocessor in Statement
Preprocessor directives must stand alone on their own line. After the preprocessor runs, however, that line is treated as whitespace. As a result, preprocessor directives may be present in the middle of an expression as long as they are on their own lines. Since the preprocessor directive and the source code are processed in different compiler phases, they are processed independently. Yet, to the casual reader, they appear to interact with each other.
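
The following sketch wraps such a statement in a function so the result can be observed. The function name is an assumption for illustration.

```c
int preprocessor_in_statement(void)
{
    int V1 = 1
#define M1 1
    +1;
    /* After preprocessing, the directive line is gone and the
     * declaration reads: int V1 = 1 +1; M1 plays no part in it. */
    return V1;
}
```

The function returns 2. The macro M1 is defined but never expanded inside the declaration.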
Confusing:
int V1 = 1
#define M1 1
+1;
Non-confusing:
#define M1 1
int V1 = 1 + 1;
Macro Operator Precedence
Macros can be used to add many features to C, including guaranteed inlining, duck-typing, and adding metadata like line number and file name to program output. Unfortunately, macro references are impossible to distinguish from other identifiers and can often act in ways that variables and functions cannot. This can cause readers to be misled.
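
One way a macro can act unlike a variable is textual expansion into a surrounding expression. In this sketch, `M1_SAFE` and the function names are hypothetical additions for comparison.

```c
#define M1 64 - 1        /* unparenthesized body */
#define M1_SAFE (64 - 1) /* hypothetical parenthesized variant */

/* Expands to 2 * 64 - 1: the multiplication binds before the subtraction. */
int macro_unparenthesized(void) { return 2 * M1; }

/* Expands to 2 * (64 - 1): what a reader treating M1 as a value expects. */
int macro_parenthesized(void) { return 2 * M1_SAFE; }
```

The first function returns 127, the second 126.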
Confusing:
#define M1 64 - 1
2 * M1
Non-confusing:
2 * 64 - 1
Assignment as Value
The assignment expression changes the underlying state of the machine when it executes. However, it also returns a value. Often when reading an assignment expression people will forget one of the two effects of the expression.
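
Both effects can be observed at once in a small sketch. The function name and the use of pointers (so a caller can inspect the stored values) are assumptions for illustration.

```c
/* Writes through both pointers and returns the value of the whole
 * expression. Assignment is right-associative, so this parses as
 * *v1 = (*v2 = 3): the inner assignment stores 3 in *v2 and yields 3,
 * which is then stored in *v1. */
int chained_assignment(int *v1, int *v2)
{
    return *v1 = *v2 = 3;
}
```

After the call, both targets hold 3 and the function itself returns 3.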
Confusing: V1 = V2 = 3;
Non-confusing: V2 = 3; V1 = V2;
Logic as Control Flow
Traditionally, the && and || operators are used for logical conjunction and disjunction, respectively, in predicates. Due to short-circuiting, they can also be used for conditional execution.
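
Short-circuiting can be made visible by counting calls. In this sketch the counter `f2_calls` and the wrapper function names are hypothetical.

```c
static int f2_calls = 0; /* counts how often F2 actually runs */

int F2(void) { f2_calls++; return 1; }

/* F2() is evaluated only when V1 is non-zero, because && stops as soon
 * as its left operand is false. The (void) cast only silences the
 * compiler's unused-value warning. */
void short_circuit(int V1) { (void)(V1 && F2()); }

/* The same behavior written as ordinary control flow. */
void explicit_branch(int V1) { if (V1) F2(); }
```

Calling either function with 0 leaves the counter unchanged; calling it with a non-zero argument increments the counter once.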
Confusing: V1 && F2();
Non-confusing: if (V1) F2();
Post-Increment/Decrement
The post-increment (and decrement) operator increases the value of its operand variable by 1, while returning the original value of the variable. Confusion here arises because the value of the expression is different from the resultant value of the variable.
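
The gap between the expression's value and the variable's final value can be seen in a short sketch. The function name and the pointer parameter are assumptions made so the caller can inspect both.

```c
/* Returns the value assigned to V1 and updates *V2 in place. */
int post_increment(int *V2)
{
    int V1 = (*V2)++; /* V1 gets the old value; *V2 is then one larger */
    return V1;
}
```

Starting from 5, the function returns 5 while the variable it points to becomes 6.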
Confusing: V1 = V2++;
Non-confusing: V1 = V2; V2 += 1;
Type Conversion
The C compiler will implicitly convert types in various situations when there is a mismatch, but sometimes this conversion also results in an implicit change of outcome from what the author may have intended.
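
A small sketch of how the position of a conversion changes the outcome, with hypothetical function names:

```c
/* Integer division happens first (3 / 2 == 1); the cast then converts 1. */
double cast_after_division(void) { return (double)(3 / 2); }

/* Converting an operand first makes the division floating-point. */
double cast_before_division(void) { return (double)3 / 2; }
```

The first function returns 1.0 and the second returns 1.5.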
Confusing: 3/2;
Non-confusing: trunc(3.0/2.0);
Reversed Subscripts
Arrays can be indexed using the subscript operator, but underneath, "E1[E2] is identical to (*((E1)+(E2)))". Since addition is commutative, so too is the subscript operator.
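
The identity can be checked directly. The three hypothetical functions below spell the same access in three ways.

```c
/* All three spell the same access: *("abc" + 1). */
char normal_subscript(void)   { return "abc"[1]; }
char reversed_subscript(void) { return 1["abc"]; }
char explicit_pointer(void)   { return *("abc" + 1); }
```

Each function returns the character 'b'.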
Confusing: 1["abc"];
Non-confusing: "abc"[1]
Conditional Operator
The conditional operator is the only ternary operator in C, and functions similarly to an if/else block. However, the conditional operator is an expression for which the value is that of the executed branch.
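
Because it is an expression, the conditional operator can appear wherever a value is expected. A sketch with hypothetical function names:

```c
/* The conditional operator yields a value, so it can be returned directly. */
int with_conditional(int V1) { return V1 == 3 ? 2 : 1; }

/* The same selection written as statements. */
int with_if_else(int V1)
{
    int V2;
    if (V1 == 3) { V2 = 2; } else { V2 = 1; }
    return V2;
}
```

Both functions return 2 for an argument of 3 and 1 otherwise.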
Confusing: V2 = V1 == 3 ? 2 : 1;
Non-confusing: if (V1 == 3) { V2 = 2; } else { V2 = 1; }
Infix Operator Precedence
There are 32 binary operators (operators which accept one operand before and one operand after) in C. Each of these operators falls into one of 15 precedence classes and has either right-to-left or left-to-right associativity. Needless to say, the average programmer knows only a functional subset of the information needed to correctly parse complicated expressions of binary operations.
Our preferred method for removing precedence confusion is with parentheses. Other removal transformations are possible, such as introducing intermediate identifiers. These other strategies can have larger impacts on the structure of the code and so were avoided when possible.
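
To illustrate why the grouping matters, this sketch evaluates one expression under its implicit grouping and under both explicit groupings. The function names are hypothetical.

```c
/* && binds tighter than ||, so this parses as (0 && 1) || 2. */
int implicit_grouping(void) { return 0 && 1 || 2; }

/* The same grouping made explicit with parentheses. */
int explicit_grouping(void) { return (0 && 1) || 2; }

/* The other possible reading, which gives a different answer. */
int other_grouping(void) { return 0 && (1 || 2); }
```

The first two functions return 1 (the result of || is always 0 or 1, never 2), while the third returns 0.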
Confusing: 0 && 1 || 2
Non-confusing: (0 && 1) || 2
Comma Operator
The comma operator is used to sequence an otherwise ambiguous series of computations. Whether due to its eccentricity, or its odd precedence, the comma operator is commonly misinterpreted.
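
Both the sequencing and the low precedence can be shown in a few lines. The function names here are assumptions for illustration.

```c
/* The left operand is evaluated for its side effect and discarded;
 * the value of the whole expression is the right operand. */
int comma_value(int V1)
{
    int V3 = (V1 += 1, V1);
    return V3;
}

/* Without parentheses, assignment binds tighter than the comma:
 * this parses as (V3 = V1), (V1 += 10), so V3 keeps the old V1. */
int comma_precedence(int V1)
{
    int V3;
    V3 = V1, V1 += 10;
    return V3;
}
```

With an argument of 4, `comma_value` returns 5 and `comma_precedence` returns 4.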
Confusing: V3 = (V1++, V1);
Non-confusing: V1++; V3 = V1;
Pre-Increment/Decrement
Similar to post-increment and post-decrement, the pre-increment and pre-decrement operators change a variable's value by one. In contrast to the post- forms, pre-increment and pre-decrement first update the variable and then return the new value, instead of the old.
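
A short sketch of the update-then-return order, using a hypothetical function that takes a pointer so the caller can inspect the variable afterwards:

```c
/* Returns the value assigned to V1; *V2 is updated before it is read. */
int pre_increment(int *V2)
{
    int V1 = ++(*V2);
    return V1;
}
```

Starting from 5, the function returns 6 and the variable it points to is also 6.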
Confusing: V1 = ++V2;
Non-confusing: V2 += 1; V1 = V2;
Repurposed Variable
By convention, variables tend to have a single conceptual identity and represent one idea. When a variable is used in different roles across the lifetime of the program, its current purpose can be difficult to follow.
Confusing: int main(int argc, char **argv) { argc = 7; ...
Non-confusing: int main(int argc, char **argv) { int V1 = 7; ...
Omitted Curly Braces
C's looping and selection statements control a single trailing statement. That statement can optionally be enclosed in braces, either for clarity or to extend the number of sub-statements governed by the loop or conditional. Confusion may arise when the braces are omitted for brevity.
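
Which statement the condition governs can be checked by counting calls. The counters and the wrapper function in this sketch are hypothetical.

```c
static int f1_calls = 0, f2_calls = 0; /* hypothetical call counters */

void F1(void) { f1_calls++; }
void F2(void) { f2_calls++; }

/* Only F1() is governed by the if; F2() always runs, whatever the
 * layout on the line suggests. */
void without_braces(int V1)
{
    if (V1) F1(); F2();
}
```

Calling `without_braces(0)` still increments `f2_calls`, while `f1_calls` changes only for a non-zero argument.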
Confusing: if (V1) F1(); F2();
Non-confusing: if (V1) { F1(); } F2();
Dead, Unreachable, Repeated
Redundant code is code that either will never be executed or whose effects are immediately invalidated. It can be counter-intuitive that code exists to have no impact on the output of the program.
Confusing: V1 = 1; V1 = 2;
Non-confusing: V1 = 2;
Arithmetic as Logic
Arithmetic operators are capable of mimicking any predicate formulated with logical operators. Arithmetic, however, implies a non-Boolean range, which may be confusing to a reader.
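
The difference in range shows up when an arithmetic predicate is compared with its logical counterpart. The function names below are assumptions for illustration.

```c
/* Non-zero exactly when both factors are non-zero
 * (ignoring overflow for extreme inputs). */
int arithmetic_predicate(int V1, int V2) { return (V1 - 3) * (V2 - 4); }

/* The logical spelling always yields 0 or 1. */
int logical_predicate(int V1, int V2) { return V1 != 3 && V2 != 4; }
```

For V1 = 5 and V2 = 7, the arithmetic form yields 6 and the logical form yields 1. Both are true in a condition, but only the second looks like a truth value.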
Confusing: (V1 - 3) * (V2 - 4)
Non-confusing: V1 != 3 && V2 != 4
Pointer Arithmetic
Pointers admit several operations like integer addition/subtraction, but, in many cases, these operations are interpreted by the reader to update the target data instead of the pointer data.
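
A minimal sketch showing that adding to a pointer moves the pointer rather than changing the characters it points at. The function names are hypothetical.

```c
#include <string.h>

/* Adding 3 to the pointer moves it; the characters are untouched. */
const char *advance_pointer(void) { return "abcdef" + 3; }

/* Taking the address of element 3 reaches the same position
 * within the string. */
const char *address_of_element(void) { return &"abcdef"[3]; }
```

Both functions return a pointer to the substring "def".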
Confusing: "abcdef"+3
Non-confusing: &"abcdef"[3]
Constant Variables
Constant variables are a layer of abstraction that, in the context of a complex system, let us focus on the concept a value represents rather than the value itself. When simply trying to determine the output of a piece of code, having a layer of indirection that hides the value of your data can cause difficulty.
Confusing: V1 = V2;
Non-confusing: V1 = 5;