
Regex in Notepad++ to select on string length between specific XML tags

I'm working with Emergency Services data in the NEMSIS XSD. I have a field that is constrained to 50 characters. I've searched this site extensively and tried many solutions, but Notepad++ rejects all of them with a "not found" message.

Here's an XML Sample:

<E09>
        <E09_01>-5</E09_01>
        <E09_02>-5</E09_02>
        <E09_03>-5</E09_03>
        <E09_04>-5</E09_04>
        <E09_05>this one is too long Non-Emergency - PT IS BEING DISCHARGED FROM H AFTER BEING ADMITTED FOR FAILURE TO THRIVE AND ALCOHOL WITHDRAWAL</E09_05>
</E09>
<E09>
        <E09_01>-5</E09_01>
        <E09_02>-5</E09_02>
        <E09_03>-5</E09_03>
        <E09_04>-5</E09_04>
        <E09_05>this one is is okay</E09_05>
</E09>

I've tried solutions naming the E09_05 tag in different ways, using <\/E09_05> for the closing tag as I've seen in some examples, and as just </E09_05> as I've seen in others. I've tried ^.{50,}$ between them, or [a-zA-Z]{50,}$ between them, I've tried wrapping those in-between expressions in () and without. I even tried just [\s\S]*? in between the tags. The only thing that Notepad++ finds is when I use ^.{50,}$ by itself with no XML tags ... but then I wind up hitting on all the E13_01 tags (which are EMS narratives, and always > 50 characters) -- making for painstaking and wrist-aching clicks.

I wanted to XSLT this, but each E09_05 field needs too much individual, hands-on tweaking to automate it. Perl is not an option in this environment (and not a tool I know at all anyway).

To be truly sublime, the search should select both E09_05 and E09_08 fields whose content is longer than 50 characters ... but no other elements of any kind or length.

Thanks in advance. I'm sure I'm just missing some subtle \, or () or [] somewhere ... hopefully ...

The following regex will find the text content of <E09_05> elements with more than 50 characters.

(?<=<E09_05>).{51,}?(?=</E09_05>)

Explanation

(?<=<E09_05>)     Start matching right after <E09_05>

.{51,}?           Match 51 or more characters (in a single line)
                  The ? makes it reluctant (lazy), so it stops at the first </E09_05>

(?=</E09_05>)     Stop matching right before </E09_05>
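The pattern can be checked outside Notepad++ with a minimal Python sketch. This assumes Python's `re` module treats these constructs (fixed-width lookbehind, lazy quantifier, lookahead) the same way Notepad++'s Boost engine does, and that ". matches newline" is unchecked in Notepad++, since Python's `.` also stops at line breaks by default. The sample text and the `find_too_long` helper are hypothetical. The last lines show why the `?` matters when two elements share a line.

```python
import re

# Hypothetical sample modeled on the question's XML.
SAMPLE = """<E09>
        <E09_04>-5</E09_04>
        <E09_05>this one is too long Non-Emergency - PT IS BEING DISCHARGED FROM H AFTER BEING ADMITTED FOR FAILURE TO THRIVE AND ALCOHOL WITHDRAWAL</E09_05>
</E09>
<E09>
        <E09_05>this one is is okay</E09_05>
</E09>
"""

# The answer's pattern: start after <E09_05>, take 51+ characters lazily,
# stop right before </E09_05>. "." does not cross line breaks here.
TOO_LONG = re.compile(r"(?<=<E09_05>).{51,}?(?=</E09_05>)")


def find_too_long(xml_text):
    """Return the text content of every <E09_05> longer than 50 characters."""
    return TOO_LONG.findall(xml_text)


if __name__ == "__main__":
    for text in find_too_long(SAMPLE):
        print(len(text), text)

    # Two elements on one line: the lazy pattern yields two separate matches,
    # while a greedy .{51,} runs on to the LAST closing tag and yields one.
    one_line = "<E09_05>" + "a" * 60 + "</E09_05><E09_05>" + "b" * 60 + "</E09_05>"
    greedy = re.compile(r"(?<=<E09_05>).{51,}(?=</E09_05>)")
    print(len(TOO_LONG.findall(one_line)), len(greedy.findall(one_line)))
```

Only the first E09_05 in the sample is reported; the 19-character one is left alone.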

For truly sublime matching, i.e. both E09_05 and E09_08 fields with string lengths >50, use:

(?<=<(E09_0[58])>).{51,}?(?=</\1>)

Explanation

<(E09_0[58])>     Match <E09_05> or <E09_08>, and capture the name as group 1

</\1>             Use \1 backreference to match name inside </name>
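A minimal Python sketch of the backreference version, under the same assumption that Python's `re` handles a capturing group inside a fixed-width lookbehind like Notepad++ does. The sample lines and the `find_long_fields` helper are hypothetical; the sample includes a long E13_01 narrative and a mismatched closing tag, neither of which should be selected.

```python
import re

# Group 1 captures the tag name inside the lookbehind; \1 in the lookahead
# forces the closing tag to carry the same name.
LONG_05_OR_08 = re.compile(r"(?<=<(E09_0[58])>).{51,}?(?=</\1>)")


def find_long_fields(xml_text):
    """Return (tag name, content) pairs for E09_05/E09_08 content over 50 characters."""
    return [(m.group(1), m.group(0)) for m in LONG_05_OR_08.finditer(xml_text)]


# Hypothetical sample: E13_01 is long too, but must not be selected.
SAMPLE = "\n".join([
    "<E09_05>" + "x" * 55 + "</E09_05>",
    "<E09_08>" + "y" * 60 + "</E09_08>",
    "<E09_08>short</E09_08>",
    "<E13_01>" + "z" * 80 + "</E13_01>",
    "<E09_05>" + "w" * 55 + "</E09_08>",  # mismatched closing tag
])

if __name__ == "__main__":
    for tag, text in find_long_fields(SAMPLE):
        print(tag, len(text))
```

Only the long E09_05 and E09_08 lines are reported. The backreference is what rejects the last line, where the opening and closing names differ.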

If you want to shorten the text with an ellipsis at the end (e.g. Hello World with a max length of 8 becomes Hello...), use:

Find what: (?<=<(E09_0[58])>)(.{47}).{4,}(?=</\1>)
Replace with: \2...
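The replacement keeps 47 characters (group 2) and appends "...", giving exactly 50; the `.{4,}` part means only content of 51 or more characters is touched. A minimal Python sketch of the same find/replace, assuming `re.sub` interprets `\2` like Notepad++ does. The `truncate_fields` helper and its `max_len` parameter are hypothetical, added so the answer's "Hello World" example can be reproduced; the answer itself hard-codes 50.

```python
import re


def truncate_fields(xml_text, max_len=50):
    """Shorten E09_05/E09_08 content longer than max_len so it ends in "...".

    max_len is a hypothetical parameter; the answer's pattern hard-codes 50.
    """
    keep = max_len - 3  # leave room for the three-character "..."
    # Group 1: tag name (for \1); group 2: the characters that are kept.
    # ".{4,}" demands at least 4 more characters, so content of max_len
    # characters or fewer never matches and is left unchanged.
    pattern = re.compile(r"(?<=<(E09_0[58])>)(.{%d}).{4,}(?=</\1>)" % keep)
    return pattern.sub(r"\2...", xml_text)


if __name__ == "__main__":
    print(truncate_fields("<E09_05>Hello World</E09_05>", max_len=8))
    # prints <E09_05>Hello...</E09_05>
```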

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.