I am trying to write a specific regular expression, but I can't get it to solve my actual problem. Maybe some of you can help.
I have the string:
<!ENTITY a0 "dosdosdosdosdosdosdosdosdosdosdosdosdosdos"
and now I want to match everything between the quotation marks (at least 10 characters) except a quotation mark.
I started with:
"(.{10,}?)"
At first sight this matches very well, but it also matches the following string, which I don't want:
<!ENTITY a0 "dosd"
<!ENTITY a0 "osdos"
The match starts at the first quotation mark of the first line and ends at the last quotation mark of the last line.
I understand why this happens, but I can't build a regular expression that matches at least 10 characters of anything except a quotation mark. The dot is just too generic.
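To illustrate why the dot is too generic, here is a minimal sketch using Python's `re` module (an assumption; the question does not name a regex flavour). It uses a simplified one-line input, `sample`, instead of the two-line string above, because in Python's default mode `.` does not match a newline.

```python
import re

# The original attempt: even a lazy dot accepts '"' itself,
# so the match can run straight past a closing quote.
LAZY_DOT = re.compile(r'"(.{10,}?)"')

# Simplified one-line input (assumption, not the question's exact data).
sample = '"dosd" "osdos"'

match = LAZY_DOT.search(sample)
print(match.group(1))  # dosd" "osdos
```

The captured text contains two quotation marks, which is exactly what a pattern that excludes `"` from the repeated part must prevent.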
Edit: new problem
New string:
<data>&a0;&a0;asddd&a0;&a0;&a0; 234324&a0;&a0;&a0;&a0;&a0;&a0;</data>
Now I am trying to match specific expressions between those two XML tags, up to the first "<". Between these tags I need at least 10 occurrences of "&a0;", which gives the pattern
&[a-zA-Z0-9]+;
The problem is that other character strings (anything except <) may also occur between those tags. Is this possible to solve?
I tried with:
<[a-zA-Z0-9]+>([^<]{10,}?)<\/[a-zA-Z0-9]+>
But now it matches everything, not just the wanted &[a-zA-Z0-9]+; entities.
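One possible approach, sketched with Python's `re` module (an assumption; the names `ENTITY_RUN` and `entity_heavy_content` are hypothetical): a lookahead first checks that at least 10 `&[a-zA-Z0-9]+;` entities appear before the next `<`, and only then is the content consumed up to a closing tag whose name matches the opening one via the backreference `\1`.

```python
import re

ENTITY_RUN = re.compile(
    r"<([a-zA-Z0-9]+)>"                   # opening tag, name captured in group 1
    r"(?=(?:[^<]*?&[a-zA-Z0-9]+;){10})"   # lookahead: 10+ entities before the next "<"
    r"([^<]*)"                            # group 2: the tag content (no "<" allowed)
    r"</\1>"                              # closing tag with the same name
)

def entity_heavy_content(text):
    """Return the content of the first tag holding at least 10 entities, else None."""
    m = ENTITY_RUN.search(text)
    return m.group(2) if m else None

s = "<data>&a0;&a0;asddd&a0;&a0;&a0; 234324&a0;&a0;&a0;&a0;&a0;&a0;</data>"
print(entity_heavy_content(s) is not None)                       # True
print(entity_heavy_content("<data>&a0;&a0;asddd</data>"))       # None
```

Because the lookahead consumes nothing, other text such as `asddd` or ` 234324` may appear between the entities; it only has to be free of `<`.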
Thanks, guys!
You may use
"([^"\r\n]{10,})"
The [^"\r\n]{10,} pattern matches 10 or more occurrences of any character except ", CR and LF.
Note that you may use a greedy limiting (range/interval) quantifier here: the negated class can never consume a ", so the match cannot run past the closing quote.
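A minimal sketch of this pattern in Python's `re` module (an assumption, since no flavour is named; `quoted_values` is a hypothetical helper) shows that it accepts the long value and rejects both short ones:

```python
import re

# 10+ characters that are not '"', CR or LF, between two quotes.
# Greedy {10,} is safe: the class cannot swallow the closing quote.
QUOTED_10 = re.compile(r'"([^"\r\n]{10,})"')

def quoted_values(text):
    """Return every quoted value that is at least 10 characters long."""
    return QUOTED_10.findall(text)

long_entity = '<!ENTITY a0 "dosdosdosdosdosdosdosdosdosdosdosdosdosdos"'
short_entities = '<!ENTITY a0 "dosd"\n<!ENTITY a0 "osdos"'

print(quoted_values(long_entity))     # ['dosdosdosdosdosdosdosdosdosdosdosdosdosdos']
print(quoted_values(short_entities))  # []
```

One caveat: if both declarations sit on a single line, the closing quote of `"dosd"` can act as an opening quote, and the text up to the next `"` (here ` <!ENTITY a0 `, 13 characters) is matched. Excluding CR and LF is what keeps one declaration per line from leaking into the next.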
To restrict a generic pattern, it is a good idea to check your requirements. If you actually plan to match only letters, digits and _, you may replace the [^"\r\n] negated character class with the \w shorthand character class.
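The difference between the two classes can be sketched in Python (assuming the `re` module; the entity `a1` is a hypothetical example). A value containing spaces passes the negated class but not `\w`. Note that in Python 3, `\w` on `str` also matches Unicode letters and digits unless the `re.ASCII` flag is given.

```python
import re

# Negated class: anything except '"', CR, LF (spaces and punctuation allowed).
NEGATED = re.compile(r'"([^"\r\n]{10,})"')
# Shorthand class: only word characters (letters, digits, underscore).
WORD_ONLY = re.compile(r'"(\w{10,})"')

with_spaces = '<!ENTITY a1 "dos dos dos dos"'

print(NEGATED.findall(with_spaces))    # ['dos dos dos dos']
print(WORD_ONLY.findall(with_spaces))  # []
```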