RegExp to match visible non-letter characters before line break

I am working on a VBScript regexp that detects a tag containing text followed by a CRLF before the closing tag.

I am currently using \\w+[:;?!.,""\\)\\]-~]*(\\s)*(\\r\\n\\s*)(<\\/.*>)

Reading the expression from the end: I match any closing tag, then a CRLF optionally followed by whitespace, then optional whitespace before the CRLF, and finally it should optionally match any other visible non-letter characters that follow a word.

This is to match things like

myword! CRLF</tag>
mywordCRLF</tag>
myword    CRLF</tag>
myword...CRLF     </tag>

etc.

However, I do not want to match the following, because I need to detect tags that contain TEXT followed by a line break.

</otherclosingtag>   CRLF </tag>

I am concerned about the \\w+[:;?!.,""\\)\\]-~]* part: it doesn't look right to me, and I would need to list quite a large number of characters in it.
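One concrete thing that looks off in that character class is the \]-~ part. Inside a class, a hyphen between two characters forms a range, so \]-~ covers everything from ] to ~, which includes the lowercase letters, and a literal hyphen is not in the class at all. The sketch below checks this with Python's re module, used here only as a stand-in for VBScript's RegExp. It assumes the doubled backslashes and doubled quotes in the post are string-literal escaping, so the class is written with single ones. The names ORIGINAL_CLASS, FIXED_CLASS and class_matches are made up for illustration.

```python
import re

# The punctuation class from the question, un-doubled (assumption: "\\" and ""
# in the post are string-literal escapes for \ and ").
ORIGINAL_CLASS = r'[:;?!.,"\)\]-~]'

# Same characters, but with the hyphen moved to the end so it is a literal "-"
# instead of forming the range ]..~
FIXED_CLASS = r'[:;?!.,"\)\]~-]'


def class_matches(char_class: str, ch: str) -> bool:
    """Return True if the single character ch is accepted by char_class."""
    return re.fullmatch(char_class, ch) is not None


if __name__ == "__main__":
    # Print, per character, whether each variant of the class accepts it.
    for ch in ["!", "-", "~", "a", "z", "_"]:
        print(repr(ch), class_matches(ORIGINAL_CLASS, ch), class_matches(FIXED_CLASS, ch))
```

Moving the hyphen to the end of the class (or escaping it) is one common way to keep it literal. It does not solve the larger problem of having to enumerate every punctuation character.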

I tried replacing it with \\S and with \\W, but both seem to match the CRLF characters as well.
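The two shorthands actually behave differently here. CR and LF are non-word characters, so \W accepts them, while \S (non-whitespace) does not. \S does accept < and >, though, which may be why the unwanted example still appeared to match. A quick check, sketched in Python's re module as a stand-in for VBScript's RegExp (the assumption being that both engines treat \W and \S the same way for these characters). The helper name accepts is hypothetical.

```python
import re


def accepts(shorthand: str, ch: str) -> bool:
    """Return True if the one-character string ch matches the shorthand class."""
    return re.fullmatch(shorthand, ch) is not None


# For each sample character, record (matched by \W, matched by \S).
SAMPLES = [("CR", "\r"), ("LF", "\n"), ("<", "<"), (">", ">"), ("!", "!")]
RESULTS = {name: (accepts(r"\W", ch), accepts(r"\S", ch)) for name, ch in SAMPLES}

if __name__ == "__main__":
    for name, (non_word, non_space) in RESULTS.items():
        print(f"{name}: \\W={non_word} \\S={non_space}")
```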

Any ideas?

Cheers!

How about using a non-greedy quantifier:

\w+\W*?\r\n\s*(<\/.*>)

or

\w+[^\r\n]*\r\n\s*(<\/.*>)
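To see how these two suggestions behave on the samples from the question, here is a small sketch in Python's re module, which is only a stand-in for VBScript's RegExp (an assumption; the constructs used here are common to both flavors). CRLF in the samples is written out as "\r\n". The names PATTERNS, WANTED, UNWANTED and matches are made up for this illustration.

```python
import re

# The two suggested patterns, copied as posted.
PATTERNS = {
    "lazy non-word run": r"\w+\W*?\r\n\s*(<\/.*>)",
    "anything but CR/LF": r"\w+[^\r\n]*\r\n\s*(<\/.*>)",
}

# Samples from the question that should match.
WANTED = [
    "myword! \r\n</tag>",
    "myword\r\n</tag>",
    "myword    \r\n</tag>",
    "myword...\r\n     </tag>",
]

# Sample from the question that should NOT match.
UNWANTED = "</otherclosingtag>   \r\n </tag>"


def matches(pattern: str, text: str) -> bool:
    """Return True if pattern is found anywhere in text."""
    return re.search(pattern, text) is not None


if __name__ == "__main__":
    for name, pattern in PATTERNS.items():
        wanted_ok = all(matches(pattern, s) for s in WANTED)
        print(name, "| wanted all match:", wanted_ok,
              "| unwanted matches:", matches(pattern, UNWANTED))
```

In this sketch both patterns accept all four wanted samples, but both also still match the unwanted one: \w+ picks up "otherclosingtag", and both \W and [^\r\n] are free to consume the > that follows it.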

The solution that I used:

\\w+[^\\r\\n<>]*(\\r\\n\\s*)(<\\/.*>)

It matches a word, then anything that is not CR, LF, < or >, so it doesn't match openingtag> CRLF</closingtag>.

This is a modified version of what M42 proposed; I added <> to the excluded characters to make sure it won't match a tag.
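A quick way to check that pattern against the samples from the question, sketched in Python's re module as a stand-in for VBScript's RegExp. The pattern is written with single backslashes, on the assumption that the doubled ones in the post are string escaping, and CRLF is written out as "\r\n". SOLUTION, WANTED, UNWANTED and matches are names invented for this example.

```python
import re

# The pattern from this answer, with the backslashes un-doubled (assumption).
SOLUTION = r"\w+[^\r\n<>]*(\r\n\s*)(<\/.*>)"

# Samples from the question that should match.
WANTED = [
    "myword! \r\n</tag>",
    "myword\r\n</tag>",
    "myword    \r\n</tag>",
    "myword...\r\n     </tag>",
]

# Inputs that should NOT match: a tag, not text, precedes the line break.
UNWANTED = [
    "</otherclosingtag>   \r\n </tag>",
    "openingtag> \r\n</closingtag>",
]


def matches(text: str) -> bool:
    """Return True if SOLUTION is found anywhere in text."""
    return re.search(SOLUTION, text) is not None


if __name__ == "__main__":
    for sample in WANTED + UNWANTED:
        print(repr(sample), "->", matches(sample))
    # Group 1 is the CRLF plus trailing whitespace, group 2 the closing tag.
    m = re.search(SOLUTION, "<tag>myword\r\n</tag>")
    print(repr(m.group(0)), repr(m.group(1)), repr(m.group(2)))
```

Excluding < and > from the run after the word is what stops the match from crossing a tag boundary: after "otherclosingtag" the next character is >, which neither the class nor \r\n can consume.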

Thanks for the suggestions!

Try this:

^.*[\n\t\s]*</.*>$ --> BAD

^.*[\r\n\t\s]*</.*>$
