What are the gaps between symbolic execution and taint analysis?

I recently read a paper titled "All You Ever Wanted to Know about Dynamic Taint Analysis and Forward Symbolic Execution (but Might Have Been Afraid to Ask)" by Dr. E. J. Schwartz. The paper mainly discusses applications of the two techniques in a binary-level security context.

I'm curious about the exact differences between dynamic taint analysis and forward symbolic execution.

From what I can see, taint analysis tracks information flow from an object x (the source) to an object y (the sink) whenever information stored in x is transferred to y. So the major concern is which objects can be transitively affected by the source. Symbolic execution, in contrast, treats some inputs as symbolic values and tries to express other variables in terms of them; it thereby answers under what conditions the symbolic input affects the rest of the program.

I can see that at the binary level, taint analysis is often mentioned together with vulnerabilities caused by an overwritten return address, while symbolic execution can deal with more types of vulnerabilities, such as integer overflows, runtime assertion errors, resource leaks (e.g., memory leaks, unbalanced file open/close), and buffer overflows.

However, it seems that modern taint analysis does not only involve data-flow analysis; most implementations also track control-flow conditions, and in several vulnerability-detection scenarios the tainted input is represented as a symbolic value and propagated the way symbolic execution does. On the other side, symbolic execution engines cannot fully exploit symbolic values across different path conditions because of the limitations of the underlying constraint solvers and the execution/interpretation runtime; as a result, they cannot achieve the high branch or path coverage one would expect.

So in general cases, can we say that taint analysis is a kind of coarse symbolic execution, or symbolic execution is a kind of precise taint analysis?

Interesting question! Here's my 2 cents: symbolic execution uses a kind of taint analysis to construct path constraints. Symbolic execution also employs an SMT/SAT solver to generate concrete values for variables and/or inputs such that a certain path constraint is satisfied.

Since taint analysis does not employ an SMT/SAT solver, I would say it is not a kind of symbolic execution. Maybe one could say taint analysis is a part of symbolic execution.

This is just an opinion. Please feel free to challenge it.

I agree with @Benny, it is a really interesting question. You probably learn a lot by formulating these kinds of questions, and even more when you try to answer them.

I would like to add to Benny's answer:

In order to implement taint tracking and symbolic execution, one has to define the semantics of the language (e.g., x86 assembly in the case of binaries). For example, one has to describe what

add eax, ebx

'means', i.e., what it does to the state. The taint tracking semantics could perhaps be seen as a kind of subset of the symbolic execution semantics: whatever taint tracking records is also encoded in the symbolic execution semantics. The common part is

  • If ebx is tainted, then eax is tainted.
  • If ebx is symbolic (in other words, it contains an SMT formula over one or more symbolic variables), then eax is symbolic.

Yet the semantics for symbolic execution have to contain further information (e.g., the exact arithmetic operation):

  • eax is "whatever was in eax before" + "whatever was in ebx before"
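To make the comparison concrete, here is a minimal Java sketch of the two semantics for add eax, ebx. Everything in it is an assumption made for illustration: the names AddSemantics, taintAdd and symAdd are invented, and symbolic values are kept as plain strings rather than real SMT terms.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model, not taken from any real analysis framework.
class AddSemantics {
    // Taint semantics: one bit per register. After `add dst, src`,
    // dst is tainted if either operand was tainted before.
    static void taintAdd(Map<String, Boolean> taint, String dst, String src) {
        boolean t = taint.getOrDefault(dst, false) || taint.getOrDefault(src, false);
        taint.put(dst, t);
    }

    // Symbolic semantics: one expression per register. After `add dst, src`,
    // dst holds the expression "(old dst + old src)".
    static void symAdd(Map<String, String> sym, String dst, String src) {
        sym.put(dst, "(" + sym.get(dst) + " + " + sym.get(src) + ")");
    }

    // Runs both semantics with ebx as the tainted / symbolic input.
    static String demo() {
        Map<String, Boolean> taint = new HashMap<>();
        taint.put("eax", false);
        taint.put("ebx", true);
        taintAdd(taint, "eax", "ebx");

        Map<String, String> sym = new HashMap<>();
        sym.put("eax", "5");        // concrete constant
        sym.put("ebx", "input0");   // symbolic variable (made-up name)
        symAdd(sym, "eax", "ebx");

        return "tainted=" + taint.get("eax") + ", expr=" + sym.get("eax");
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "tainted=true, expr=(5 + input0)"
    }
}
```

The taint map can be recovered from the symbolic map (a register is "tainted" exactly when its expression mentions a symbolic variable), but not the other way round, which is the subset relation described above.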

Please comment or correct me!

In my opinion, to answer your question we have to decide which of the following holds: i) symbolic execution has the potential to find all execution paths that taint analysis can, and more; ii) taint analysis has the potential to find all execution paths that symbolic execution can, and more; iii) both have the potential to find exactly the same execution paths; iv) each can compute execution paths that the other cannot.

In my opinion, iv) is correct, meaning that neither is a subset of the other. However, I do agree that there is indeed a big overlap.

We can eliminate options i) and iii) because symbolic execution only finds feasible execution paths, whereas taint analysis may find infeasible ones, as it does not resort to constraint solving.
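A small Java sketch of what "infeasible" means here. It assumes a path-insensitive taint analysis that never checks branch conditions, and it replaces the SMT solver with a brute-force search over an 8-bit input; the class and method names are invented for the example.

```java
// Hypothetical toy example; the exhaustive loop stands in for an SMT solver.
class FeasibilityDemo {
    // Program under analysis (conceptually), with x as the untrusted input:
    //   if (x > 10) { if (x < 5) { sink(x); } }

    // A taint analysis without constraint solving only asks whether tainted
    // data reaches the sink along some syntactic path. Here it does.
    static boolean taintReportsSink() {
        boolean xTainted = true; // x comes from the taint source
        return xTainted;         // sink(x) reads x, so a flow is reported
    }

    // Symbolic execution also checks the path condition (x > 10 && x < 5).
    // Assumption: x is an 8-bit signed value, so we can try every candidate.
    static boolean pathFeasible() {
        for (int x = -128; x <= 127; x++) {
            if (x > 10 && x < 5) {
                return true; // found a concrete input that reaches the sink
            }
        }
        return false; // no input satisfies the path condition
    }

    public static void main(String[] args) {
        System.out.println("taint analysis reports a flow: " + taintReportsSink());
        System.out.println("path to sink is feasible:      " + pathFeasible());
    }
}
```

The taint side reports a flow into sink(x), while the feasibility check finds no input that can actually reach it, so a symbolic executor would discard that path.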

To eliminate option ii), I think (correct me if I'm wrong) that there are execution paths that symbolic execution can expose and taint analysis cannot. For example:

for (int i = 0; i < 3; i++) {
    if (someString.charAt(i) == '4') {
        // do something
    } else {
        // do something else
    }
}

In such a case symbolic execution exposes all eight possible execution paths, whereas taint analysis (if I'm not mistaken) does not.
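The count of eight can be checked with a rough Java sketch. It assumes the three characters are independent symbolic values and represents each path condition as a plain string; PathEnumeration and its methods are made-up names, not part of any real engine. For contrast, concretePath shows the single path that one concrete run (which is all a dynamic taint tracker observes) would follow.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of forking at each branch of the loop above.
class PathEnumeration {
    // Forks on both outcomes of someString.charAt(i) == '4' for i = 0..2
    // and collects one path condition per complete path.
    static List<String> symbolicPaths() {
        List<String> paths = new ArrayList<>();
        explore(0, "", paths);
        return paths;
    }

    private static void explore(int i, String cond, List<String> out) {
        if (i == 3) {          // loop finished: record the path condition
            out.add(cond);
            return;
        }
        String prefix = cond.isEmpty() ? "" : cond + " && ";
        explore(i + 1, prefix + "s[" + i + "] == '4'", out); // then-branch
        explore(i + 1, prefix + "s[" + i + "] != '4'", out); // else-branch
    }

    // A single concrete execution follows exactly one of those paths.
    static String concretePath(String someString) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 3; i++) {
            if (i > 0) {
                sb.append(" && ");
            }
            String op = someString.charAt(i) == '4' ? "==" : "!=";
            sb.append("s[").append(i).append("] ").append(op).append(" '4'");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(symbolicPaths().size() + " symbolic paths");
        System.out.println("concrete run on \"4a4\": " + concretePath("4a4"));
    }
}
```

A real symbolic executor would additionally ask a solver for a concrete string per path condition; that step is left out here.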

I think the key difference is whether the execution is concrete or symbolic: whether you are interested in taint propagation (for checking an information leak or a control-flow hijack) on a single concrete execution, or whether you want to explore other possibilities of such propagation by harnessing the power of solvers. The merit of dynamic taint analysis is its comparatively low overhead, which makes it suitable for runtime monitoring. On the other hand, (pure/dynamic) symbolic execution is capable of exploring paths other than the concrete one, and is thus suitable for offline analysis of the security properties you are interested in.
