regex: remove all text within “double-quotes” (multiline included)

I'm having a hard time removing text within double-quotes, especially text spread over multiple lines:

$file=file_get_contents('test.html');

$replaced = preg_replace('/"(\n.)+?"/m','', $file);

I want to remove ALL text within double-quotes (quotes included). Some of the text within them will be spread over multiple lines.

I read that newlines can be \r\n as well as \n.

Try this expression:

"[^"]+"

Also make sure you replace globally. In many languages that means a g flag; PHP's preg_replace already replaces every match by default, so no extra flag is needed there.
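To see the expression above at work in PHP, here is a minimal sketch. The sample string is made up for illustration ($html is a hypothetical stand-in for the contents of test.html). Because [^"] matches any character other than a quote, it also matches \n and \r\n without any modifier:

```php
<?php
// Hypothetical sample input; the second attribute value spans two lines (\r\n).
$html = "<a href=\"x.html\" title=\"first\r\nsecond\">link</a>";

// [^"]+ matches a run of non-quote characters, newlines included,
// and preg_replace() replaces every match by default.
$replaced = preg_replace('/"[^"]+"/', '', $html);

echo $replaced, "\n"; // <a href= title=>link</a>
```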

Another edit: daalbert's solution is best: a quote, followed by one or more non-quote characters, ending with a quote.

I would make one slight modification if you're parsing HTML: make it 0 or more non-quote characters, so the regex will be:

"[^"]*"

EDIT:

On second thought, here's a better one:

"[\S\s]*?"

This says: "a quote followed by either a non-whitespace character or a whitespace character, any number of times, non-greedily, ending with a quote".

The one below uses capture groups where they aren't necessary, and its wildcard doesn't make explicit that . matches everything except the newline character. So it's clearer to say "either a non-whitespace char or a whitespace char" :) Not that it makes any difference to the result.
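A small sketch of the point about the wildcard, using a made-up sample string ($text is hypothetical). With no modifiers, [\S\s] crosses a newline while a plain dot does not:

```php
<?php
// Hypothetical sample: one quoted string on a single line, one spanning two lines.
$text = "a \"one\" b \"two\nthree\" c";

// [\S\s] matches any character at all, so no modifier is needed.
$viaClass = preg_replace('/"[\S\s]*?"/', '', $text);

// A plain dot stops at \n unless the s (dotall) modifier is set,
// so only the single-line quoted string is removed.
$viaDot = preg_replace('/".*?"/', '', $text);

echo $viaClass, "\n"; // a  b  c
echo $viaDot, "\n";   // a  b "two
                      // three" c
```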


There are many regexes that can solve your problem, but here's one:

"(.*?(\s)*?)*?"

This reads as:

Find a quote, optionally followed by (any number of non-newline characters, non-greedily, followed by any number of whitespace characters, non-greedily), repeated any number of times non-greedily, then a closing quote.

Greedy means the quantifier first grabs as much as it can, up to the end of the string, and then tries to match the rest of the pattern. If that fails, it gives back one character from the end and tries again, and so on. Non-greedy means it takes as few characters as possible while still satisfying the pattern.
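The difference is easiest to see on a line with two quoted strings. This is only an illustrative sketch with a made-up input ($text is hypothetical):

```php
<?php
// Hypothetical sample with two separate quoted strings.
$text = 'x "one" y "two" z';

// Greedy: .* runs to the end of the line and backs off to the LAST quote,
// so everything from the first quote to the last one is removed.
$greedy = preg_replace('/".*"/', '', $text);

// Non-greedy: .*? stops at the first closing quote it can reach,
// so each quoted string is removed on its own.
$lazy = preg_replace('/".*?"/', '', $text);

echo $greedy, "\n"; // x  z
echo $lazy, "\n";   // x  y  z
```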

Great reference on regex: http://www.regular-expressions.info
Great site for testing regexes: http://regexpal.com/

Remember that your regex may have to change slightly depending on the language you're using it from.

You can use single-line mode (also known as dotall), and the dot will match even newlines (whatever they are):

/".+?"/s

You are using multiline mode, which simply changes the meaning of ^ and $ from beginning/end of the string to beginning/end of each line. You don't need it here.
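A short sketch of what each modifier actually changes, on a made-up input ($text is hypothetical): m affects only the anchors, s affects only the dot.

```php
<?php
// Hypothetical sample: four lines, with a quoted string spanning lines 2 and 3.
$text = "first\n\"quoted\ntext\"\nlast";

// With m, ^ and $ match at every line boundary: "first" and "last" both qualify.
$linesWithM = preg_match_all('/^\w+$/m', $text);
// Without m, ^ and $ refer to the whole string, which is not a single word.
$linesWithoutM = preg_match_all('/^\w+$/', $text);

// m does not let the dot cross a newline, so nothing is removed...
$withM = preg_replace('/".+?"/m', '', $text);
// ...while s does, so the multi-line quoted string is removed.
$withS = preg_replace('/".+?"/s', '', $text);

var_dump($linesWithM);      // int(2)
var_dump($linesWithoutM);   // int(0)
var_dump($withM === $text); // bool(true)
var_dump($withS);           // string(11) "first\n\nlast" (with real newlines)
```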

"[^"]+"

Something like the below. s is dotall mode, where . will match even a newline:

/".+?"/s

$replaced = preg_replace('/"[^"]*"/s','', $file);

will do this for you. Note, however, that it doesn't allow for escaped double quotes: A "test \" quoted string" B will result in A  quoted string" B (the text after the escaped quote survives), not in A  B as you might expect.
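If backslash-escaped quotes do need to be handled, one possible approach is an escape-aware pattern. This is a sketch, not something taken from the answers above, and it assumes the backslash is the escape character (HTML itself uses &quot; instead, so this mostly applies to embedded script or string literals):

```php
<?php
// The example string from the note above; in a single-quoted PHP string,
// \" stays a literal backslash followed by a quote.
$input = 'A "test \" quoted string" B';

// Simple pattern: stops at the first quote it sees, escaped or not.
$naive = preg_replace('/"[^"]*"/', '', $input);

// Escape-aware pattern (an assumption, not from the answers above):
// inside the quotes, accept either a character that is neither a quote nor a
// backslash, or a backslash followed by any character.
$escapeAware = preg_replace('/"(?:[^"\\\\]|\\\\.)*"/s', '', $input);

echo $naive, "\n";       // A  quoted string" B
echo $escapeAware, "\n"; // A  B
```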
