
RegExp Capture literals

I need a way to strip all string literals from PHP files. My current regexp solution works fine when there are no nested quotes in the string. I tried updating it to handle escaped quotes as well, which worked in most cases, except when the string contains escaped escape characters.

This is what it should be able to handle if it is to be done correctly:

"text"
"\"text\""
"\\"
"\"\\\""

As I see it, it needs to handle both an even and an odd number of consecutive escape characters. But how do you express that in a regexp?

Update

I want to clean up PHP files to make them easier to search through and to index their different parts; it is for a small project that I am playing with. Since literals can contain almost anything, they can also contain data that looks like what some of the searches are looking for. So I want to remove anything in the files that is wrapped in " or '.

"/\"[^\"]*\"/"

This works unless there is a nested quote, as in "\"data\"".

"/\"(\\\\\"|[^\"])*\"/"

This works unless the string is "\\".
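To make that failure concrete, here is a small sketch (not part of the original post) that runs the second pattern on a literal ending in an escaped backslash. The variable names and the sample input are assumptions chosen for illustration.

```php
<?php
// Second pattern from the question, written as a single-quoted PHP literal.
// The regex engine sees: /"(\\"|[^"])*"/
$pattern = '/"(\\\\"|[^"])*"/';

// Illustrative input (an assumption): two literals, the first one being "\\".
$source = '$a = "\\\\"; $b = "x";';

// The pattern reads the second backslash plus the closing quote as an
// escaped quote, so the match runs on into the next literal.
$result = preg_replace($pattern, '', $source);

echo $result, PHP_EOL; // prints: $a = x";
```

The pattern cannot tell whether a backslash before a quote is itself escaped, which is exactly the even-versus-odd problem described earlier.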

This is what I need:

$var = "...";

Becomes

$var = ;

You could use this substitution based on a regular expression:

Find: ((?<!\\)(?:\\.)*)(["'])(?:\\.|(?!\2).)*?\2
Replace: $1

Note that if you are going to use this regular expression in PHP (where you encode it as a string literal), you need to escape the backslashes and the quote in that regular expression, like this:

preg_replace("~((?<!\\\\)(?:\\\\.)*)([\"'])(?:\\\\.|(?!\\2).)*?\\2~s", "$1", $input);

As PHP string literals can span multiple lines, the s modifier is added so that . also matches newline characters.
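As a quick check, the following sketch runs that substitution over the cases listed in the question. The helper name strip_literals and the sample input are assumptions made for illustration; the pattern is the one from the preg_replace call above.

```php
<?php
// Hypothetical helper wrapping the substitution shown above.
function strip_literals(string $source): string
{
    $pattern = "~((?<!\\\\)(?:\\\\.)*)([\"'])(?:\\\\.|(?!\\2).)*?\\2~s";
    $result = preg_replace($pattern, '$1', $source);
    if ($result === null) {
        // preg_replace returns null when the regex engine reports an error.
        throw new RuntimeException('preg_replace failed');
    }
    return $result;
}

// The four cases from the question plus a single-quoted literal.
// A nowdoc keeps every backslash exactly as typed.
$source = <<<'SRC'
$a = "text";
$b = "\"text\"";
$c = "\\";
$d = "\"\\\"";
$e = 'it\'s';
SRC;

echo strip_literals($source), PHP_EOL;
```

With this input, each line should come out with its literal removed, for example `$a = ;`. Escaped pairs are consumed two characters at a time by `\\.`, so a quote only closes the literal when an even number of backslashes precedes it.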

See it run on eval.in

NB: You'll need to think about heredoc notation also...
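One way to sidestep the heredoc problem is to let PHP's own tokenizer find the literals instead of a regex. This is a different technique from the substitution above, sketched under several assumptions: the helper name is hypothetical, the input must contain an opening <?php tag (otherwise token_get_all treats everything as inline HTML), and backtick strings and double-quoted strings nested inside an interpolation are not handled.

```php
<?php
// Hypothetical helper: drops string literals, heredocs and nowdocs by
// walking the token stream produced by token_get_all().
function strip_literals_with_tokenizer(string $source): string
{
    $out = '';
    $inHeredoc = false; // between T_START_HEREDOC and T_END_HEREDOC
    $inQuoted = false;  // inside a double-quoted string with interpolation

    foreach (token_get_all($source) as $token) {
        // Tokens are either [id, text, line] arrays or single characters.
        $id = is_array($token) ? $token[0] : null;
        $text = is_array($token) ? $token[1] : $token;

        if ($id === T_START_HEREDOC) { $inHeredoc = true; continue; }
        if ($id === T_END_HEREDOC) { $inHeredoc = false; continue; }
        if ($inHeredoc) { continue; }

        // An interpolated string is delimited by bare '"' tokens.
        if ($id === null && $text === '"') { $inQuoted = !$inQuoted; continue; }
        if ($inQuoted) { continue; }

        // Plain quoted strings without interpolation arrive as one token.
        if ($id === T_CONSTANT_ENCAPSED_STRING) { continue; }

        $out .= $text;
    }
    return $out;
}

$source = <<<'SRC'
<?php
$a = "text";
$b = "hi $name";
$c = <<<EOT
line "one"
EOT;
$d = 1;
SRC;

echo strip_literals_with_tokenizer($source), PHP_EOL;
```

Because the tokenizer already understands PHP's escaping rules, the even-versus-odd backslash question does not have to be solved by hand here; the trade-off is that the input has to be tokenizable PHP.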
