Python regex to extract data while excluding comment lines

I need to extract the data lines in a data file that begin with the letter "U" or "L", and exclude the comment lines that begin with the character "/".

Example:

/data file FLG.dat
UAB-AB      LRD1503     / reminder latches

I used a regex pattern in my Python program, but it only captures the comment lines, not the identifiers that begin with "U" or "L".

You can use ^([UL].+?)(?:/.*|)$ . Code:

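import re

s = """/data file FLG.dat
UAB-AB      LRD1503     / reminder latches
LAB-AB      LRD1503     / reminder latches
SAB-AB      LRD1503     / reminder latches"""
# re.MULTILINE makes ^ and $ match at the start and end of every line;
# the lazy group stops at the first "/" (or at the end of the line)
lines = re.findall(r"^([UL].+?)(?:/.*|)$", s, re.MULTILINE)
print(lines)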
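To make the behaviour of that pattern visible, here is a small sketch. It adds one data line without a comment (UAB-AC      LRD1600, borrowed from the second answer's example) to show that the empty alternative in (?:/.*|) makes the comment optional. repr() is used so that the trailing spaces kept by the lazy group can be seen.

```python
import re

# Sample data from the question, plus one data line that has no comment
s = """/data file FLG.dat
UAB-AB      LRD1503     / reminder latches
LAB-AB      LRD1503     / reminder latches
UAB-AC      LRD1600
SAB-AB      LRD1503     / reminder latches"""

# (?:/.*|) is a non-capturing group with two alternatives:
# "/" followed by the rest of the line, or the empty string.
# The empty alternative lets lines without a comment match as well.
pattern = re.compile(r"^([UL].+?)(?:/.*|)$", re.MULTILINE)

lines = pattern.findall(s)
for line in lines:
    print(repr(line))
```

With this input it prints three strings: the two commented lines still end in the spaces before the slash, and the uncommented line is captured whole. The "/data" and "SAB-AB" lines are skipped because they do not start with U or L.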

If you want to remove the spaces at the end of each string, you can use a list comprehension with the same regular expression:

lines = [match.group(1).strip() for match in re.finditer(r"^([UL].+?)(?:/.*|)$", s, re.MULTILINE)]

Or you can edit the regular expression so that it does not capture the spaces before the slash, ^([UL].+?)(?:\s*/.*|)$ :

lines = re.findall(r"^([UL].+?)(?:\s*/.*|)$", s, re.MULTILINE)
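A quick way to check that the two approaches agree is to run both on the question's sample data. This is only a sketch; the variable names stripped and in_regex are illustrative.

```python
import re

s = """/data file FLG.dat
UAB-AB      LRD1503     / reminder latches
LAB-AB      LRD1503     / reminder latches
SAB-AB      LRD1503     / reminder latches"""

# Variant 1: capture with the original pattern, then strip in Python
stripped = [m.group(1).strip()
            for m in re.finditer(r"^([UL].+?)(?:/.*|)$", s, re.MULTILINE)]

# Variant 2: \s* consumes the whitespace before the slash,
# so the lazy group stops before those spaces
in_regex = re.findall(r"^([UL].+?)(?:\s*/.*|)$", s, re.MULTILINE)

print(stripped)
print(in_regex)
```

Both lists should contain the same two strings, "UAB-AB      LRD1503" and "LAB-AB      LRD1503". Stripping in Python keeps the pattern simpler; putting \s* in the regex avoids the extra call.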

If the comments in your data lines are optional, here is a regular expression that covers both cases: lines with and without a comment.

The regular expression for that is R"^([UL][^/]*)" (edited; the original RE was R"^([UL][^/]*)(/.*)?$"). The first group is the data you want to extract; in the original RE, the second (optional) group would catch the comment, if there is one.
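If you also want the comment, the original two-group RE can be used as sketched below. It reuses the sample lines from this answer's example; the names datare_with_comment and results are illustrative.

```python
import re

# The original two-group RE: group 1 is the data, group 2 the optional comment
datare_with_comment = re.compile(r"^([UL][^/]*)(/.*)?$")

lines = ["/data file FLG.dat",
         "UAB-AB      LRD1503     / reminder latches",
         "UAB-AC      LRD1600",
         "MAB-AD      LRD1700     / does not start with U or L"]

results = []
for line in lines:
    m = datare_with_comment.match(line)
    if m:
        # group(2) is None when the optional comment group did not participate
        data, comment = m.group(1).strip(), m.group(2)
        results.append((data, comment))
        print(data, "|", comment)
```

Because [^/]* cannot cross a slash, group 1 always ends right before the comment, and the trailing "?" on (/.*) lets lines without a comment match too.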

This example code prints only the 2 valid data lines.

import re

lines = ["/data file FLG.dat",
         "UAB-AB      LRD1503     / reminder latches",
         "UAB-AC      LRD1600",
         "MAB-AD      LRD1700     / does not start with U or L"]

# Group 1: a leading U or L followed by everything up to the first "/"
datare = re.compile(R"^([UL][^/]*)")

# Generator: match each line, keep only successful matches, strip trailing spaces
matches = (match.group(1).strip()
           for match in (datare.match(line) for line in lines)
           if match)

for match in matches:
    print(match)

Note how match.group(1) extracts the first group of your RE, and strip() removes any trailing spaces from the match.

Also note that you can replace lines in this example with a file handle and it will work the same way: each line read from a file ends with a newline, which [^/]* also matches, but strip() removes it.
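Here is a sketch of that, assuming the data lives in a file called FLG.dat (the name comes from the comment line in the question). A temporary directory is used only so the snippet can run on its own.

```python
import os
import re
import tempfile

datare = re.compile(r"^([UL][^/]*)")

# Sample content standing in for FLG.dat
content = ("/data file FLG.dat\n"
           "UAB-AB      LRD1503     / reminder latches\n"
           "UAB-AC      LRD1600\n"
           "MAB-AD      LRD1700     / does not start with U or L\n")

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "FLG.dat")
    with open(path, "w") as f:
        f.write(content)

    extracted = []
    with open(path) as f:
        # Iterating over a file handle yields lines that still end in "\n";
        # [^/]* matches that newline too, and strip() removes it.
        for line in f:
            match = datare.match(line)
            if match:
                extracted.append(match.group(1).strip())

print(extracted)
```

In real code you would simply open your own data file instead of writing a temporary one.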

If the matches = ... line looks too complicated, note that it is just a compact, lazily evaluated way of writing this:

for line in lines:
    match = datare.match(line)
    if match:
        print(match.group(1).strip())
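On Python 3.8 or later, an assignment expression gives a middle ground between the two forms: a single list comprehension that still calls match() only once per line. This is a sketch of an alternative, not part of the original answer.

```python
import re

datare = re.compile(r"^([UL][^/]*)")

lines = ["/data file FLG.dat",
         "UAB-AB      LRD1503     / reminder latches",
         "UAB-AC      LRD1600",
         "MAB-AD      LRD1700     / does not start with U or L"]

# The assignment expression (:=) binds the match object inside the
# comprehension's condition, so it can be reused in the output expression
matches = [m.group(1).strip() for line in lines if (m := datare.match(line))]
print(matches)
```

Unlike the generator version, this builds the whole list at once, which is fine for small files but uses more memory for very large ones.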
