Python regex to extract data lines while excluding comment lines

Question

I need to extract the lines in a data file that begin with the letter "U" or "L", and exclude the comment lines that begin with the character "/".

Example:

/data file FLG.dat
UAB-AB      LRD1503     / reminder latches

The regex pattern I used in my Python program captures only the comment lines, not the data lines that begin with "U" or "L".

Answer 1

You can use ^([UL].+?)(?:/.*|)$ together with the re.MULTILINE flag, so that ^ and $ match at the start and end of every line. Code:

import re

s = """/data file FLG.dat
UAB-AB      LRD1503     / reminder latches
LAB-AB      LRD1503     / reminder latches
SAB-AB      LRD1503     / reminder latches"""
lines = re.findall(r"^([UL].+?)(?:/.*|)$", s, re.MULTILINE)
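To see what this returns, and to check that the empty alternative in (?:/.*|) also accepts data lines that have no comment, you can run the pattern on a sample containing such a line. The line "UAB-AC      LRD1600" below is a made-up addition for illustration, not part of the original data.

```python
import re

# Sample data; the comment-free "UAB-AC" line is a hypothetical addition.
s = """/data file FLG.dat
UAB-AB      LRD1503     / reminder latches
UAB-AC      LRD1600
SAB-AB      LRD1503     / reminder latches"""

# The lazy group stops at the first position where either a "/" comment
# follows (first alternative) or the line ends (empty alternative).
lines = re.findall(r"^([UL].+?)(?:/.*|)$", s, re.MULTILINE)

for line in lines:
    print(repr(line))
```

The first result still includes the spaces that came before the slash, which is what the following two variants deal with.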

If you want to remove the spaces at the end of each match, you can use a list comprehension with the same regular expression:

lines = [match.group(1).strip() for match in re.finditer(r"^([UL].+?)(?:/.*|)$", s, re.MULTILINE)]

Or you can change the regular expression so that it does not capture the spaces before the slash, ^([UL].+?)(?:\s*/.*|)$ :

lines = re.findall(r"^([UL].+?)(?:\s*/.*|)$", s, re.MULTILINE)
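Two details of the \s* version are worth knowing: \s also matches a newline, so with re.MULTILINE the \s*/.* branch can run on into a following comment line (the captured group is still correct, but the match spans two lines), and a line without any comment can keep its trailing spaces, because the empty alternative only matches at the very end of the line. A possible variant that avoids both uses [ \t]* instead of \s* and makes the comment optional. This is a sketch, not part of the answer above, and the trailing-space line "UAB-AC      LRD1600   " is hypothetical.

```python
import re

# Hypothetical sample: the "UAB-AC" line has no comment but ends in spaces.
s = (
    "/data file FLG.dat\n"
    "UAB-AB      LRD1503     / reminder latches\n"
    "LAB-AB      LRD1503     / reminder latches\n"
    "UAB-AC      LRD1600   \n"
    "SAB-AB      LRD1503     / reminder latches"
)

# [ \t]* matches only spaces and tabs, so a match never runs past the end
# of its own line, and trailing blanks are dropped whether or not the
# line has a "/" comment.
pattern = re.compile(r"^([UL].+?)[ \t]*(?:/.*)?$", re.MULTILINE)

lines = pattern.findall(s)
print(lines)
```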

Answer 2

If the comments on your data lines are optional, here's a regular expression that covers both kinds of line, with or without a comment.

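The regular expression for that is R"^([UL][^/]*)" (edited; the original RE was R"^([UL][^/]*)(/.*)?$"). Its group captures the data you want to extract. In the original RE, the optional second group would catch the comment, if any; because match() only has to match at the beginning of the line, the shorter RE extracts the same data.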
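If you also want the comment text, here is a short sketch using the original two-group RE. The sample lines come from the question, with a comment-free line added for illustration.

```python
import re

# Original two-group RE: group 1 = data, group 2 = optional "/" comment.
datare = re.compile(r"^([UL][^/]*)(/.*)?$")

lines = [
    "/data file FLG.dat",
    "UAB-AB      LRD1503     / reminder latches",
    "UAB-AC      LRD1600",
]

results = []
for line in lines:
    match = datare.match(line)
    if match:
        # group(2) is None when the line has no comment
        results.append((match.group(1).strip(), match.group(2)))

for data, comment in results:
    print(data, "|", comment)
```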

This example prints only the two valid data lines.

import re

lines = ["/data file FLG.dat",
         "UAB-AB      LRD1503     / reminder latches",
         "UAB-AC      LRD1600",
         "MAB-AD      LRD1700     / does not start with U or L"]

datare = re.compile(R"^([UL][^/]*)")

matches = (match.group(1).strip() for match in (datare.match(line) for line in lines) if match)

for match in matches:
    print(match)

Note how match.group(1) extracts the first group of the RE, and strip() removes any trailing spaces from the match.

Also note that you can replace lines in this example with a file handle and it will work the same way, because iterating over a file yields it one line at a time.
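A minimal sketch of that, assuming the data lives in a file; here io.StringIO stands in for open("FLG.dat") so the example runs without a real file. Lines read from a file keep their "\n", and on lines without a comment [^/]* captures it too, which strip() then removes.

```python
import io
import re

datare = re.compile(r"^([UL][^/]*)")

# io.StringIO stands in for a real file handle such as open("FLG.dat");
# both yield the data one line at a time, each ending in "\n".
fake_file = io.StringIO(
    "/data file FLG.dat\n"
    "UAB-AB      LRD1503     / reminder latches\n"
    "UAB-AC      LRD1600\n"
    "MAB-AD      LRD1700     / does not start with U or L\n"
)

# strip() removes both the spaces before the "/" and the newline that
# [^/]* picks up on lines without a comment.
data = [m.group(1).strip() for m in map(datare.match, fake_file) if m]
print(data)
```

With a real file, the same comprehension works inside a with open("FLG.dat") as f: block, passing f instead of fake_file.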

If the matches = line looks too complicated, note that it is just a compact way of writing this:

for line in lines:
    match = datare.match(line)
    if match:
        print(match.group(1).strip())

solution1
1 2019-08-31 19:37:04

solution2
1 2019-08-31 20:26:09