Regular Expression: Match everything except one specific character

I am trying to build a specific expression, but I am not able to solve my actual problem. Maybe some of you can...

I have the string:

<!ENTITY a0 "dosdosdosdosdosdosdosdosdosdosdosdosdosdos"

and now I want to match everything (at least 10 characters) between the quotation marks, except a quotation mark.

I started with:

"(.{10,}?)" 

At first sight this matches very well, but it also matches the following string, which is wrong for me.

<!ENTITY a0 "dosd" 
<!ENTITY a0 "osdos"

The match starts with the first quotation mark of the first line and ends with the last quotation mark of the last line.

I understand why this happens, but I am not able to build a regular expression that matches any character except a quotation mark at least 10 times. The dot is just too generic.

Edit: new problem

New string:

<data>&a0;&a0;asddd&a0;&a0;&a0; 234324&a0;&a0;&a0;&a0;&a0;&a0;</data>

Now I tried to match specific expressions between those two XML tags, up to the first "<". Between these tags I need at least 10 occurrences of "&a0", which results in

&[a-zA-Z0-9]+;

The problem is that other character strings (anything except <) may also occur between those tags. Is this possible to solve?

I tried with:

<[a-zA-Z0-9]+>([^<]{10,}?)<\/[a-zA-Z0-9]+>

But now it matches everything, not just the wanted &[a-zA-Z0-9]+;

Thanks, guys!

You may use

"([^"\r\n]{10,})"

See the regex demo.

The [^"\r\n]{10,} pattern matches 10 or more occurrences of any char except ", CR and LF.
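As a rough illustration of the pattern above, here is a minimal Python sketch. The helper name `quoted_values` and the variable names are made up for this example, and Python's `re` module is only one possible regex flavor. It also shows why CR and LF are excluded: with a plain `[^"]` class, the text between the closing quote of one line and the opening quote of the next would count as a match.

```python
import re

# Pattern from the answer: 10 or more chars that are not ", CR or LF.
QUOTED = re.compile(r'"([^"\r\n]{10,})"')

# Sample inputs taken from the question.
long_value = '<!ENTITY a0 "dosdosdosdosdosdosdosdosdosdosdosdosdosdos"'
short_values = '<!ENTITY a0 "dosd" \n<!ENTITY a0 "osdos"'


def quoted_values(text):
    """Return every quoted run of at least 10 characters (hypothetical helper)."""
    return QUOTED.findall(text)


print(quoted_values(long_value))    # the long quoted value is captured
print(quoted_values(short_values))  # [] because both values are too short

# Without \r\n in the class, the gap between the two lines is matched instead.
print(re.findall(r'"([^"]{10,})"', short_values))
```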

Note that a greedy limiting (range/interval) quantifier is fine here: the negated character class cannot match the closing ", so the lazy ? is no longer needed.

To restrict a generic pattern, a good idea is to check your requirements. If you actually plan to match only letters, digits and _, you may replace the [^"\r\n] negated character class with the \w shorthand character class.
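A small sketch of the \w variant, again in Python and with a made-up helper name (`word_values`). Note that in Python 3, \w on str patterns also matches non-ASCII letters and digits unless the `re.ASCII` flag is passed, so the exact character set depends on the flavor and flags you use.

```python
import re

# Same shape as before, but only word characters (letters, digits, _) are allowed.
# re.ASCII limits \w to [a-zA-Z0-9_]; drop it to allow Unicode word characters.
WORD_RUN = re.compile(r'"(\w{10,})"', re.ASCII)


def word_values(text):
    """Return quoted runs of 10 or more word characters (hypothetical helper)."""
    return WORD_RUN.findall(text)


print(word_values('<!ENTITY a0 "dosdosdosdos"'))     # ['dosdosdosdos']
print(word_values('<!ENTITY a0 "dos dos dos dos"'))  # [] since spaces are not \w
```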
