Tainted Data and Format String Attack Strike Again

A recent code execution vulnerability (also called a code injection vulnerability) was discovered in Palo Alto Networks’ GlobalProtect SSL VPN, a product that handles SSL handshakes, and in particular in certain versions of PAN-OS, the software running on these products. The vulnerability was discovered by security researchers Orange Tsai and Meh Chang and documented on their blog. What they found is a remote code execution (RCE) vulnerability in certain versions of the PAN-OS software, which has been reported as CVE-2019-1579.

This is an interesting vulnerability in that it combines several related weaknesses that are both exploitable and high risk. It also involves the types of weaknesses that static analysis can detect: tainted data used unchecked in potentially dangerous code constructs, and a format string attack that leads to code injection, which in turn allows remote code execution by the attacker. Although CVE-2019-1579 is categorized under the software weakness CWE-20, “Improper Input Validation,” the attack itself exploits another, CWE-134, “Use of Externally-Controlled Format String,” which is a form of format string vulnerability. There is also CWE-94, “Improper Control of Generation of Code,” or simply code injection, which is the dangerous part of this attack: executable code or syntax is allowed to change the behavior of the software, eventually leading to arbitrary code execution. Let’s look at the anatomy of the vulnerabilities exploited in this attack.

Use of Tainted Data

In terms of secure programming, it’s a best practice to consider any and all unchecked input values as “tainted.” In this model, a tainted data source is a location in the program where data is read from a risky source, for instance a call to the function getenv() in C. A tainted data sink is a location to which tainted data should not flow unless it has been checked for validity; passing tainted data to the function strcpy() is one example. Once a value has been checked, it is said to have been cleansed and is no longer tainted.
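
As a minimal sketch of the source-to-sink idea, the C fragment below reads an environment variable (the source) and copies it into a fixed-size buffer only after checking its length. The function names copy_checked() and load_home() and the variable name APP_HOME are hypothetical and used only for illustration.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy src into dst only if it fits, including the
 * terminating NUL. Returns 0 on success, -1 if the value is rejected. */
int copy_checked(char *dst, size_t dst_size, const char *src) {
    if (dst == NULL || dst_size == 0 || src == NULL) {
        return -1;                  /* missing input is treated as invalid */
    }
    size_t len = strlen(src);
    if (len >= dst_size) {
        return -1;                  /* too long: would overflow dst */
    }
    memcpy(dst, src, len + 1);      /* length was checked, so the copy is safe */
    return 0;
}

/* Taint source: getenv() returns externally controlled data (or NULL).
 * APP_HOME is an assumed variable name for this sketch. */
int load_home(char *dst, size_t dst_size) {
    const char *tainted = getenv("APP_HOME");
    /* The unsafe alternative would be strcpy(dst, tainted): an unchecked sink. */
    return copy_checked(dst, dst_size, tainted);
}
```

A length check is only one kind of cleansing; what counts as valid depends on how the value is used later in the program.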

Tainted data vulnerabilities should always be a concern for developers. Any software that reads input from a sensor, file, environment variable or user should treat all values as potentially dangerous. Good, secure programming practices dictate that these input values be checked for correctness before use in the rest of the program. Static analysis tools can help detect these vulnerabilities; more on that later.

Format String Attacks

A format string attack occurs when an attacker is able to manipulate the formatting options in string formatting functions, usually those in the C library. Examples of vulnerable functions include sprintf(), fprintf() and the rest of the printf() family. If a string taken unchecked from user input is passed as the format parameter to one of these functions, the attacker can include format string syntax in it, causing unintended behavior.
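
Format syntax in the input matters because a printf-style function has only the format string to tell it how many arguments to fetch. The sketch below models that decision in a simplified way. The function count_conversions() is hypothetical: it counts every % directive except the literal %%, and it does not model details such as * widths, which consume additional arguments.

```c
#include <stddef.h>

/* Hypothetical, simplified model: count the directives in fmt that would
 * each make printf() fetch one argument. "%%" prints a literal percent
 * sign and fetches nothing. '*' widths and precisions are not modeled. */
size_t count_conversions(const char *fmt) {
    size_t count = 0;
    for (size_t i = 0; fmt[i] != '\0'; i++) {
        if (fmt[i] != '%') {
            continue;
        }
        if (fmt[i + 1] == '%') {
            i++;            /* skip the second '%' of "%%" */
        } else if (fmt[i + 1] != '\0') {
            count++;        /* a directive such as %d, %s, %x or %n */
        }
    }
    return count;
}
```

For the string “%d %d %d %d” this returns 4, so a call that passes that string as its only argument leaves printf() reading four values that were never supplied.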

Format string functions take a variable number of arguments and must use the format string (the tainted data input!) to determine how many arguments are intended. All these functions assume that the corresponding number of arguments was pushed on the stack. Because no mechanism in the C runtime exists to tell the function that there are no more arguments, printf(), for example, will simply pick the next item that happens to be on the stack, interpret it as an integer, and print it. It’s easy to see that this can be used to print an arbitrary amount of information from the stack. If the input string contained “%d %d %d %d”, for example, it would print the values of the next four words on the stack. Consider this example from CWE-134:

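    #include <stdio.h>
    #include <string.h>

    void printWrapper(char *string) {
        /* Vulnerable: the caller-supplied string is used as the format. */
        printf(string);
    }

    int main(int argc, char **argv) {
        char buf[5012];
        /* argv[1] is used without any check, as in the CWE-134 example. */
        memcpy(buf, argv[1], 5012);
        printWrapper(argv[1]);
        return (0);
    }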

In this case, the first command-line argument is passed directly to the printf() function in printWrapper() as the format string. A simple attack can expose the data on the stack, which is a data leakage risk, but it’s also possible to inject data onto the stack, creating a more severe code injection vulnerability.
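
One common remedy, sketched here under the assumption that the program only needs to echo its input, is to keep the format string constant and pass the untrusted text as an ordinary argument. The helper names print_untrusted() and format_untrusted() are hypothetical.

```c
#include <stdio.h>

/* Hypothetical fix: the format string is the constant "%s", so any
 * '%' characters in the untrusted input are printed literally. */
int print_untrusted(const char *untrusted) {
    return printf("%s", untrusted);
}

/* Same idea when writing into a buffer: snprintf() bounds the copy and
 * the constant "%s" keeps the input from being parsed as a format.
 * Returns the length the full string needs, as snprintf() does. */
int format_untrusted(char *dst, size_t dst_size, const char *untrusted) {
    return snprintf(dst, dst_size, "%s", untrusted);
}
```

With this change, an input such as “%d %d %n” is copied or printed character for character instead of being interpreted.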

Code Injection

Code injection occurs whenever an attacker is able to supply code or interpretable syntax as input to a program and then trick the program into executing it. When an exploit is possible, it’s one of the most severe vulnerabilities, since it gives the attacker free rein to execute code on your systems. The well-known SQL injection is a form of this vulnerability and, despite its notoriety, it is still one of the most commonly exploited vulnerabilities.

Going back to the format string attack, the end result is either data leakage or remote code injection and execution. The code injection is the final step: tainted data from outside the system is used in a vulnerable data sink, in this case a format string function, and a well-crafted string manipulates the stack in such a way that the injected code is executed as part of a function call return.

The format specifier that makes this possible is “%n”. Normally, the corresponding argument is a pointer to an integer. As the format string is being interpreted to build up the result string, when the %n is seen, the number of bytes written so far is placed in the memory location indicated by this pointer. In an attack, that pointer is whatever value happens to sit in the argument position, so a crafted string can turn %n into a write to memory of the attacker’s choosing. In normal use, after the printf() below has completed, the value in i will be 4:

    printf("1234%n", &i);
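
A self-contained sketch of the printf() call above follows. It uses snprintf() with a constant format so that the output goes to a local buffer; the function name bytes_before_n() is hypothetical. Some C libraries restrict or disable %n, so treat this as an illustration for a typical Linux toolchain rather than portable code.

```c
#include <stdio.h>

/* Hypothetical helper: returns the value stored by %n once "1234"
 * has been produced, i.e. the number of bytes written so far. */
int bytes_before_n(void) {
    char out[16];
    int i = -1;                         /* overwritten by %n */
    snprintf(out, sizeof out, "1234%n", &i);
    return i;
}
```

Here the format is a literal and the pointer is supplied by the programmer, so the write is harmless. The danger arises only when the format string itself comes from outside the program.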

Static Analysis to the Rescue

Static analysis tools like GrammaTech CodeSonar are effective in detecting the vulnerabilities discussed above. Almost any static analysis tool will warn developers about using unsafe format string functions; however, this can generate a lot of unwanted reports and false positives. The real strength of advanced static analysis is tainted data checking, which relies on sophisticated data flow analysis to track tainted input data through the program logic to its eventual use in a vulnerable data sink. This analysis weeds out the unwanted reports and focuses on the cases where use of format string functions is truly dangerous: when they can be exploited by external data.
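
Compilers offer a lightweight form of the same check. As a sketch, assuming GCC or Clang, the format attribute below tells the compiler that a hypothetical wrapper log_msg() takes a printf-style format, so warnings such as -Wformat-security can flag calls where the format is not a string literal. This is far weaker than tainted data checking and is shown only to illustrate the idea.

```c
#include <stdarg.h>
#include <stdio.h>

/* GCC/Clang extension (an assumption of this sketch): marks a function as
 * taking a printf-style format at position fmt_idx, with the variadic
 * arguments starting at position args_idx. Expands to nothing elsewhere. */
#if defined(__GNUC__)
#define PRINTF_LIKE(fmt_idx, args_idx) \
    __attribute__((format(printf, fmt_idx, args_idx)))
#else
#define PRINTF_LIKE(fmt_idx, args_idx)
#endif

/* Hypothetical logging wrapper that formats into a caller-supplied buffer. */
int log_msg(char *dst, size_t dst_size, const char *fmt, ...)
    PRINTF_LIKE(3, 4);

int log_msg(char *dst, size_t dst_size, const char *fmt, ...) {
    va_list args;
    va_start(args, fmt);
    int n = vsnprintf(dst, dst_size, fmt, args);   /* bounded formatting */
    va_end(args);
    return n;
}
```

With this declaration, a call such as log_msg(buf, sizeof buf, user_input) can be reported at compile time, while log_msg(buf, sizeof buf, "%s", user_input) is accepted.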

Summary

The Palo Alto Networks vulnerability in PAN-OS is another example of a tainted data attack that leads to code injection via a format string weakness. It’s a severe security issue for customers running vulnerable versions, since arbitrary remote code execution is possible on affected systems. It’s an interesting case study because it’s the kind of vulnerability that is detectable with advanced static analysis. It speaks to the need for a preventive approach to security in which these types of exploits are mitigated with proper coding standards, automated testing and static code analysis.