To verify the quality of software, you have to use a lot of different tools, including static and dynamic analyzers. In this article, we'll try to figure out why relying on only one type of analysis, whether static or dynamic, may not be enough for comprehensive software analysis and why it's preferable to use both.
Our team writes a lot about the usefulness of static analysis and the benefits it brings to your projects. We like to run our tool on various open-source projects to find possible bugs, which is our way of popularizing the static code analysis method. Static analysis, in turn, helps make programs higher-quality and more reliable and reduces the number of potential vulnerabilities. Perhaps everyone who works directly with source code knows the feeling of satisfaction at having bugs fixed. But even if the process of successfully spotting (and fixing) bugs doesn't trigger your endorphins, you will surely enjoy the thought of having development expenses reduced thanks to the static analyzer, which has helped your programmers use their time more effectively and efficiently. To find out more about the monetary benefits of static analysis, see this article. It gives an approximate estimate for PVS-Studio, but those results can be extrapolated to other static analysis tools available on the market.
Everything said above seems to suggest that the purpose of static analysis is to find bugs in the source code as early as possible, thus reducing the cost of fixing them. But why do we need dynamic analysis then, and why might sticking to only one of the two techniques be insufficient? Let's give more formal and clear definitions of static and dynamic analysis and try to answer these questions.
Static code analysis is the process of detecting errors and code smells in a program's source code. To analyze a program, you don't need to execute it; the analysis is performed on the available code base. The closest analogy to static analysis is the so-called code review, except that static analysis is an automated version of it (i.e. performed by a program rather than a human reviewer).
The main pros of static analysis:
- Bug detection at the early development stages. This helps to make bug fixing much cheaper because the earlier a defect is detected, the easier — and, therefore, the cheaper — it is to fix.
- It allows you to precisely locate the potential bug in the source code.
- Full code coverage. No matter how often one block of code or another gets control while executing, static analysis checks the entire code base.
- Easy to use. You don't need to prepare any input data sets to do a check.
- Static analyzers detect typos and copy-paste related mistakes fairly quickly and easily, as shown in the sketch right after this list.
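As a small illustration, here's the kind of copy-paste slip a static analyzer typically flags right away (a made-up snippet, not taken from a real project):

struct Rect { int width; int height; };

// A duplicated subexpression caused by careless copy-paste: the second
// operand was almost certainly meant to check 'rect.height'. A static
// analyzer reports the suspicious '||' with identical operands; at run
// time the code works "well enough" for the slip to go unnoticed.
bool IsValidRect(const Rect &rect) {
  return !(rect.width < 0 || rect.width < 0);   // should be rect.height
}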
The objective cons of static analysis:
- Inevitable false positives. A static analyzer may complain about code fragments that actually don't contain any bugs. Only the programmer can resolve this by reviewing the warning and marking it as a false positive, which takes up some of their working time.
- Static analysis is generally bad at detecting memory leaks and concurrency-related errors. To detect such errors, you'd effectively have to execute part of the program virtually, which is an extremely difficult task; besides, such algorithms would require too much memory and CPU time. Static analyzers typically don't go any further than analyzing a few simple cases. Dynamic analyzers are better suited to diagnosing memory leaks and concurrency-related errors (see the sketch right after this list).
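For example, here's a minimal sketch (again a made-up example) of a data race, one of those concurrency errors that a dynamic analyzer such as ThreadSanitizer (enabled with the -fsanitize=thread flag in GCC/Clang) reports at run time, while most static analyzers stay silent:

#include <thread>
#include <iostream>

int counter = 0;                       // shared and unsynchronized

void worker() {
  for (int i = 0; i < 100000; ++i)
    ++counter;                         // racy read-modify-write
}

int main() {
  std::thread t1(worker);
  std::thread t2(worker);
  t1.join();
  t2.join();
  std::cout << counter << "\n";        // the result is unpredictable
}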
It should be noted that static analyzers don't focus exclusively on bug catching. For instance, they can provide recommendations on code formatting. Some tools let you check your code for compliance with the coding standard your company sticks to: indentation of various constructs, the use of space/tab characters, and so on. In addition, static analysis can be helpful for measuring metrics. A software metric is a quantitative measure of the degree to which a program or its specifications possess some property. See this article to learn about other uses of static analysis.
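As a small illustration of such a metric, consider cyclomatic complexity, which is roughly the number of decision points in a function plus one (the function below is a made-up example):

// This function has two decision points (the loop condition and the 'if'),
// so its cyclomatic complexity is 3. A metrics tool would report this
// number and warn once it exceeds a configured threshold.
int CountNegatives(const int *values, int count) {
  int negatives = 0;
  for (int i = 0; i < count; ++i) {
    if (values[i] < 0)
      ++negatives;
  }
  return negatives;
}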
Dynamic code analysis is the analysis performed on a program at execution time. This means you must have your source code converted into an executable file first. In other words, code containing compilation or build errors can't be checked by this type of analysis. The check is done with a set of input data fed to the program under analysis. That's why the effectiveness of dynamic analysis directly depends on the quality and quantity of the test input data. It is this data that determines the extent of code coverage at the end of the test.
With dynamic testing, you can get the following metrics and warnings:
- Resources used: execution time of the entire program or its individual parts, the number of external queries (for instance, to a database), the amount of RAM and other resources used by the program.
- The extent of code coverage by tests and other metrics (see the coverage sketch after this list).
- Software bugs: division by zero, null dereference, memory leaks, race conditions.
- Some security vulnerabilities.
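As an illustration of the coverage metric, here's a minimal sketch of how coverage might be collected with GCC's gcov; the file name and exact commands are just an assumption, and Clang or other toolchains offer similar workflows:

// Build, run, and inspect coverage, for example:
//   g++ --coverage -O0 example.cpp -o example
//   ./example
//   gcov example.cpp      (produces example.cpp.gcov with hit counts)
#include <cstdio>

int Absolute(int x) {
  if (x < 0)        // with only non-negative test inputs, this branch
    return -x;      // is never executed and shows up as uncovered
  return x;
}

int main() {
  std::printf("%d\n", Absolute(5));   // a single, one-sided test input
  return 0;
}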
The main pros of dynamic analysis:
- You don't have to have access to the program's source code to analyze it. It should be noted, however, that dynamic analysis tools are differentiated by the way they interact with the program under analysis (this is discussed in more detail here). For example, one quite common dynamic analysis technique involves code instrumentation before the check, i.e. the addition of special code fragments to the application's source code for the analyzer to be able to diagnose errors. In that case, you do need to have the source code of the program at hand.
- It can detect complex memory handling errors such as indexing beyond array bounds and memory leaks.
- It can analyze multithreaded code at execution time, thus detecting potential problems that have to do with access to shared resources or possible deadlocks.
- Most implementations of dynamic analyzers don't generate false positives since errors get caught as they occur. Therefore, a warning issued by a dynamic analyzer is not a prediction made by the tool based on the analysis of the program model but a mere statement of the fact that an error has occurred.
The cons of dynamic analysis:
- Full code coverage is not guaranteed. That is, you are very unlikely to get 100% coverage by dynamic testing.
- Dynamic analyzers are bad at detecting logic errors. For example, an always-true condition is not a bug from a dynamic analyzer's perspective, since such an incorrect check simply disappears earlier, at the compilation step (see the example after this list).
- It's more difficult to precisely locate the error in the code.
- Dynamic analysis is more difficult to use than static analysis, as you need to feed the program enough data to get good results and achieve the fullest code coverage possible.
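Here's a small made-up example of such a logic error: a static analyzer flags the suspicious always-true condition, while a dynamic analyzer has nothing to report because the program never visibly misbehaves:

// The programmer almost certainly meant '&&': the condition below is true
// for any value, so the function always returns true. Nothing crashes and
// no memory is corrupted, so dynamic analysis sees no error here.
bool IsValidPercentage(int value) {
  if (value >= 0 || value <= 100)   // always true; should be '&&'
    return true;
  return false;                     // unreachable
}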
Dynamic analysis is particularly useful in areas where program reliability, response time, or resource consumption is the primary concern. Examples include a real-time system managing a critical production facility or a database server. Any error in these areas can be critical.
Getting back to the question of why sticking to only one of the two types of analysis may not be sufficient, let's look at a couple of fairly trivial examples of bugs that one analysis method diagnoses with no trouble while the other fails to detect, and vice versa.
The following example is taken from the Clang project:
MapTy PerPtrTopDown;
MapTy PerPtrBottomUp;

void clearBottomUpPointers() {
  PerPtrTopDown.clear();
}

void clearTopDownPointers() {
  PerPtrTopDown.clear();
}
A static analyzer would point out that the bodies of the two functions are identical. Of course, two functions with identical bodies aren't necessarily a definite sign of a bug, but it is very likely that they resulted from careless use of copy-paste, which leads to unexpected behavior. In this case, the clearBottomUpPointers method should call PerPtrBottomUp.clear. Dynamic analysis wouldn't notice anything wrong in this example, because from its point of view this is an absolutely legitimate piece of code.
Another example. Suppose we have the following function:
void OutstandingIssue(const char *strCount)
{
  unsigned nCount;
  sscanf_s(strCount, "%u", &nCount);

  int array[10];
  memset(array, 0, nCount * sizeof(int));
}
In theory, a static analyzer could suspect there's something wrong with this code, but implementing such a diagnostic is a very difficult and pointless task. The example is taken from this article, which also elaborates on why it's a bad idea to teach static analyzers how to diagnose errors like that. In brief, static analyzers are very bad at figuring out that a call of the memset function may result in indexing beyond array bounds as they cannot foresee what number will be read from the strCount string; and if the value of strCount is read from a file, it becomes an impossible task for static analysis altogether. On the other hand, a dynamic analyzer would have no trouble noticing and pointing out the memory handling error in this code (given that the program is fed the right data).
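For illustration, here's a sketch of how such a run-time check might look with AddressSanitizer; sscanf_s is MSVC-specific, so this portable variant uses sscanf, and the file name and commands are just an assumption:

// Build and run, for example:
//   clang++ -g -fsanitize=address overflow.cpp -o overflow
//   ./overflow 100      (AddressSanitizer reports a stack-buffer-overflow
//                        inside memset)
#include <cstdio>
#include <cstring>

void OutstandingIssue(const char *strCount) {
  unsigned nCount = 0;
  std::sscanf(strCount, "%u", &nCount);

  int array[10];
  std::memset(array, 0, nCount * sizeof(int));   // overflows when nCount > 10
}

int main(int argc, char **argv) {
  if (argc > 1)
    OutstandingIssue(argv[1]);
  return 0;
}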
This article doesn't aim to compare static and dynamic analysis. There is no single technique that can diagnose the whole variety of software defects, and neither type of analysis can completely replace the other. To improve the quality of your programs, you'll have to use different kinds of tools so that they complement each other. I hope the examples shown above are persuasive enough.
I don't wish to look too biased toward static analysis, but it is this technique that has been talked about the most lately and, more importantly, that companies have been including in their CI processes. Static analysis acts as one of the so-called quality gates on the way to building a reliable, high-quality software product. We believe static analysis will become a standard software development practice in a couple of years, just like unit testing once did.
To wrap up, I'd like to point out once again that dynamic analysis and static analysis are just two different methods, which complement each other. In the end, all these techniques serve the single purpose of increasing software quality and reducing development expenses.
References:
- Terminology. Static code analysis.
- Terminology. Dynamic code analysis.
- Andrey Karpov. Static and Dynamic Code Analysis.
- Andrey Karpov. Myths about static analysis. The third myth — dynamic analysis is better than static analysis.
- Andrey Karpov. PVS-Studio ROI.
Author: Ilya_Gainulin