
Best Approach to Parse for SQL in PHP Files?

For my senior thesis, I developed a program that automatically detects SQL injection vulnerabilities and suggests fixes using prepared statements, specifically with the mysqli extension for PHP. My question for the SO community is this: what would your preferred approach be to detect the SQL in PHP source code?

I used an enum containing the SQL keywords (SELECT, INSERT, ...) and basically parsed each line, iterating over the enum to determine whether any SQL was present. Additionally, I had to make sure that the parser was not erroneously detecting HTML (for example, a <select> tag).
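A minimal sketch of the line-by-line keyword scan described above, written in Python for brevity (the same logic should port to C# with System.Text.RegularExpressions). The keyword tuple, the HTML_TAG pattern, and the function name are assumptions for illustration, not the asker's actual code.

```python
import re

# Hypothetical keyword list standing in for the enum described above.
SQL_KEYWORDS = ("SELECT", "INSERT", "UPDATE", "DELETE")

# Matches an HTML tag such as <select name="x"> or </select>.
HTML_TAG = re.compile(r"</?[a-zA-Z][^>]*>")


def find_sql_keywords(line):
    """Return the SQL keywords found in a line, ignoring anything inside HTML tags."""
    stripped = HTML_TAG.sub(" ", line)
    found = []
    for keyword in SQL_KEYWORDS:
        # \b keeps words like "selected" or "updated_at" from counting as keywords.
        if re.search(r"\b" + keyword + r"\b", stripped, re.IGNORECASE):
            found.append(keyword)
    return found
```

This is only a heuristic: a comparison such as `a<b AND c>d` inside a query would be mistaken for a tag and stripped, so a real tool would want to tokenize PHP strings first.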

For me this solution worked fine, but I now have a little more time on my hands and have thought about refactoring the code to use a more elegant (and efficient) solution. Please limit your solutions to C#, as that is what I wrote my program in.

Your solution seems fine to me. The other way would be to parse the PHP file with a Lex/Yacc-style parser using the grammar for PHP. There is one good C# grammar parsing tool called Coco/R: http://www.ssw.uni-linz.ac.at/coco/

However, I believe that if you do parse the language, you will end up spending too much time (in development and in computing) for no additional results.

I would stick with your opportunistic approach, but test it against a variety of PHP code and tweak it to cover as many cases as possible.

Maybe there's some mileage in parsing text lines against the BNF for, say, SQL92, and scoring each line on how closely its fragments match the grammar.

Sounds like some heavy lifting though. Your simple approach will already catch a large percentage of real-world cases.
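To give a feel for the scoring idea without a full grammar, here is a rough Python sketch. It is a crude stand-in, not a BNF parse: it only checks whether a statement's main clause keywords appear in order. The SKELETONS table and the function name are assumptions made up for illustration and fall far short of the SQL-92 grammar.

```python
import re

# Simplified clause skeletons; an assumption for illustration only.
SKELETONS = {
    "SELECT": ["SELECT", "FROM"],
    "INSERT": ["INSERT", "INTO", "VALUES"],
    "UPDATE": ["UPDATE", "SET"],
    "DELETE": ["DELETE", "FROM"],
}


def sql_score(line):
    """Return a score in [0, 1]: the best fraction of a skeleton's keywords found in order."""
    words = re.findall(r"[A-Za-z_]+", line.upper())
    best = 0.0
    for skeleton in SKELETONS.values():
        position = 0
        matched = 0
        for keyword in skeleton:
            try:
                # Search only after the previous keyword so order is enforced.
                position = words.index(keyword, position) + 1
            except ValueError:
                break
            matched += 1
        best = max(best, matched / len(skeleton))
    return best
```

A line such as `Please select an option` scores 0.5 while a complete SELECT ... FROM scores 1.0, so a threshold could separate likely SQL from stray keywords. HTML like `<select name="from">` would still score 1.0, so this would need to be combined with the HTML filtering mentioned in the question.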

I do not know the specifics of variables in C#, so you will have to forgive me or down-vote me for using PHP, but 70% of the time my SQL query goes into a variable like so:

$sql = "SELECT * FROM table;";

Beyond that, I am unable to think of anything you can do to improve on what you already have.

Do you take into account statements that are created over several lines and use variables within the string? (Example below)

$sql = "SELECT * FROM table WHERE fname = $fname OR sname = $sname";

I would say it would be best to look for function calls instead of looking for the SQL itself. Possibly modify the PHP parser to look for function calls that run an SQL query which is not a prepared query.
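A lightweight version of this idea, short of modifying the PHP parser, is to scan for the calls themselves. The Python sketch below flags lines that call query-executing functions from the old mysql extension or mysqli; the list of names is an assumption for illustration and is not exhaustive, and find_query_calls is a hypothetical helper.

```python
import re

# Query-executing PHP calls that bypass prepared statements. The list is an
# assumption for illustration and is not exhaustive.
QUERY_CALL = re.compile(
    r"(?:\b(?:mysql_query|mysqli_query|mysqli_real_query|mysqli_multi_query)"
    r"|->\s*(?:query|real_query|multi_query))\s*\(",
    re.IGNORECASE,
)


def find_query_calls(php_source):
    """Return (line_number, line_text) for each line containing an unprepared query call."""
    hits = []
    for number, line in enumerate(php_source.splitlines(), start=1):
        if QUERY_CALL.search(line):
            hits.append((number, line.strip()))
    return hits
```

Because it keys on the call rather than on the SQL text, this also catches queries stored in oddly named variables. The `->query(` branch will match any object with a query method (PDO included), so the hits are candidates to review rather than confirmed vulnerabilities.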

"I do not know the specifics of variables in C# so you will have to forgive or down-vote me for using PHP but 70% of the time my SQL query goes into a variable like so ..."

Yeah, my original approach was to just look for $sql variables, since that is what most people use, but after testing against a few PHP apps I quickly threw that solution out because some developers use some funky variable names...

"Do you take into account statements that are created over several lines and use variables within the string?"

Yep. I also attempted to handle statements that were generated conditionally, but that didn't always work so well. ;)

A simple regex to detect all CRUD SQL statements passed to functions (assuming $script contains the whole PHP script):

preg_match_all('/\(\s*?"(?:SELECT|INSERT|UPDATE|DELETE) .*?"\s*?\)\s*?;/is', 
               $script, $matches);

It should match all SELECT, INSERT, UPDATE, and DELETE statements that are placed within parentheses and double quotes. It's case insensitive and should match statements that span multiple lines too.
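For anyone who wants to try the pattern outside PHP, here is a direct port to Python's re module as a sketch; the PCRE flags i and s map to re.IGNORECASE and re.DOTALL. The names CALL_WITH_SQL and sql_call_arguments are made up for this example.

```python
import re

# Same pattern as the preg_match_all call above, with /is mapped to Python flags.
CALL_WITH_SQL = re.compile(
    r'\(\s*?"(?:SELECT|INSERT|UPDATE|DELETE) .*?"\s*?\)\s*?;',
    re.IGNORECASE | re.DOTALL,
)


def sql_call_arguments(script):
    """Return every parenthesized, double-quoted CRUD string found in the script."""
    return CALL_WITH_SQL.findall(script)
```

Note that the opening parenthesis has to sit directly before the quoted string, so a procedural call such as mysqli_query($link, "SELECT ...") is not matched, and neither are single-quoted strings.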

Edit #1: Regex for matching string assignments that look like CRUD statements:

preg_match_all('/\$\w+\s*?=\s*?"(?:SELECT|INSERT|UPDATE|DELETE) .*?"\s*?;/is', 
               $script, $matches);

Edit #2:

// Version of the first regex that only matches strings containing a $variable
preg_match_all('/\(\s*?"(?:SELECT|INSERT|UPDATE|DELETE) .*?\$\w+.*?"\s*?\)\s*?;/is', 
               $script, $matches);
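Since interpolated variables are the injection risk, it can help to report which variables appear in each matched string. The Python sketch below is a variation on the regex above, not a straight port: it swaps `.` for `[^"]` so a match cannot run past the closing quote of one string into a later statement. That substitution, and the names used, are assumptions of this example.

```python
import re

# Variation on the variable-detecting regex: [^"] keeps each match inside one string.
INTERPOLATED_SQL = re.compile(
    r'\(\s*"((?:SELECT|INSERT|UPDATE|DELETE) [^"]*?\$\w+[^"]*)"\s*\)\s*;',
    re.IGNORECASE,
)
PHP_VARIABLE = re.compile(r"\$\w+")


def interpolated_queries(script):
    """Return (sql_text, [variables]) for each quoted SQL argument containing a $variable."""
    results = []
    for match in INTERPOLATED_SQL.finditer(script):
        sql = match.group(1)
        results.append((sql, PHP_VARIABLE.findall(sql)))
    return results
```

The trade-off is that strings containing escaped quotes (\") are no longer matched, so this favors fewer false positives over completeness.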


 