
Need a little help on this regular expression

I have a Java string which looks like this, it is actually an XML tag:

"article-idref="527710" group="no" height="267" href="pc011018.pct" id="pc011018" idref="169419" print-rights="yes" product="wborc" rights="licensed" type="photo" width="322" "

Now I want to remove the article-idref="527710" segment using a regular expression. I came up with the following one:

trimedString.replaceAll("\\article-idref=.*?\"","");

but it doesn't seem to work. Could anybody tell me where my regular expression went wrong? I need this to be represented as a String in my Java class, so HTMLParser probably won't help me much here. Thanks in advance!

Try this:

trimedString = trimedString.replaceAll("article-idref=\"[^\"]*\" *","");

I corrected the regular expression by adding quotes and a word boundary (to prevent false matches). Also, in case you didn't, remember to reassign to your string after the replacement:

trimmedString = trimmedString.replaceAll("\\barticle-idref=\".*?\"", "");

See it working at ideone.
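For anyone wondering why the original attempt failed: in Java's regex syntax, \a is the escape for the alert (bell) character, so "\\article-idref" looks for a bell character followed by "rticle-idref" and never matches. Even without the backslash, =.*?\" would stop at the opening quote and leave the value behind. Here is a minimal, self-contained sketch of the corrected pattern. The class and method names are made up for illustration, and it assumes attribute values never contain an escaped double quote.

```java
import java.util.regex.Pattern;

public class RemoveAttributeDemo {
    // Attribute name at a word boundary, its quoted value, then any trailing whitespace.
    private static final Pattern ARTICLE_IDREF =
            Pattern.compile("\\barticle-idref=\"[^\"]*\"\\s*");

    // Returns a new string with every article-idref="..." attribute removed.
    static String removeArticleIdref(String attributes) {
        return ARTICLE_IDREF.matcher(attributes).replaceAll("");
    }

    public static void main(String[] args) {
        String input = "article-idref=\"527710\" group=\"no\" height=\"267\"";
        // Strings are immutable, so the result has to be captured.
        String output = removeArticleIdref(input);
        System.out.println(output); // prints: group="no" height="267"
    }
}
```

Compiling the pattern once is only worthwhile if the replacement runs many times; for a one-off, the replaceAll call above does the same job.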

Also, since this comes from an XML document, it might be better to use an XML parser to extract the correct attributes instead of a regular expression, because XML is quite a complex data format to parse correctly. The example in your question is simple enough, but a regular expression could break on a more complex case, such as a document that includes XML comments. This could be an issue if you are reading data from an untrusted source.
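A rough sketch of the parser route, using the DOM API that ships with the JDK. The question only shows the attribute list, so the element name "figure" is a placeholder, and the class and method names are invented for this example. It also assumes the whole tag is available as well-formed XML.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class ParserRemoveAttributeDemo {
    // Parses a single XML element from a string and removes one attribute from it.
    static Element parseAndRemove(String xml, String attributeName) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Reject DOCTYPE declarations, a common hardening step for untrusted input.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();
        root.removeAttribute(attributeName);
        return root;
    }

    public static void main(String[] args) throws Exception {
        // "figure" is a hypothetical element name; the question does not show the real one.
        String xml = "<figure article-idref=\"527710\" group=\"no\" id=\"pc011018\"/>";
        Element element = parseAndRemove(xml, "article-idref");
        System.out.println(element.hasAttribute("article-idref")); // false
        System.out.println(element.getAttribute("id"));            // pc011018
    }
}
```

Getting the element back out as a String would need an extra serialization step (for example with javax.xml.transform), which is why the regex stays attractive for a simple, trusted input like the one in the question.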

If you are sure the article-idref attribute is always at the beginning, try this:

// removes everything from the beginning through the first whitespace character
trimedString = trimedString.replaceFirst("^\\S*\\s","");

Be sure to assign the result to trimedString again, since replaceFirst does not modify the string itself but returns a new string.
