I need to extract the lines of a data file that begin with the letter "U" or "L", and exclude the comment lines that begin with the character "/".
Example:
/data file FLG.dat
UAB-AB LRD1503 / reminder latches
I used a regex pattern in my Python program, but it captures only the comment lines, not the identifiers beginning with "U" or "L".
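You can use the pattern ^([UL].+?)(?:/.*|)$ together with the re.MULTILINE flag, so that ^ and $ match at the start and end of every line. Code: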
import re
s = """/data file FLG.dat
UAB-AB LRD1503 / reminder latches
LAB-AB LRD1503 / reminder latches
SAB-AB LRD1503 / reminder latches"""
lines = re.findall(r"^([UL].+?)(?:/.*|)$", s, re.MULTILINE)
print(lines)  # ['UAB-AB LRD1503 ', 'LAB-AB LRD1503 ']
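To make the pattern easier to read, here is the same regular expression written with re.VERBOSE, which lets each part carry its own comment. This is only a readability sketch; it should match exactly what the one-line pattern matches.

```python
import re

# Same pattern as ^([UL].+?)(?:/.*|)$, written with re.VERBOSE so each part can be commented.
PATTERN = re.compile(
    r"""
    ^            # start of a line (re.MULTILINE makes ^ match after every newline)
    ([UL].+?)    # group 1: 'U' or 'L', then as few characters as possible
    (?:/.*|)     # either a '/' comment running to the end of the line, or nothing
    $            # end of the line
    """,
    re.MULTILINE | re.VERBOSE,
)

s = """/data file FLG.dat
UAB-AB LRD1503 / reminder latches
LAB-AB LRD1503 / reminder latches
SAB-AB LRD1503 / reminder latches"""

print(PATTERN.findall(s))  # ['UAB-AB LRD1503 ', 'LAB-AB LRD1503 ']
```

Note the trailing space in each result: the lazy group stops right before the "/", so the space in front of it is still captured.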
If you want to remove the spaces at the end of each match, you can use a list comprehension with the same regular expression:
lines = [match.group(1).strip() for match in re.finditer(r"^([UL].+?)(?:/.*|)$", s, re.MULTILINE)]
Or you can change the regular expression to ^([UL].+?)(?:\s*/.*|)$ so that it does not capture the spaces before the slash:
lines = re.findall(r"^([UL].+?)(?:\s*/.*|)$", s, re.MULTILINE)
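As a quick check of the difference between the two patterns, this small sketch runs both on one line with a comment and one without (the line UAB-AC LRD1600 is borrowed from the example further down):

```python
import re

s = """UAB-AB LRD1503 / reminder latches
UAB-AC LRD1600"""

# Without \s* the space before the '/' ends up inside group 1.
plain = re.findall(r"^([UL].+?)(?:/.*|)$", s, re.MULTILINE)
# With \s* the spaces before the '/' are consumed by the non-capturing part instead.
trimmed = re.findall(r"^([UL].+?)(?:\s*/.*|)$", s, re.MULTILINE)

print(plain)    # ['UAB-AB LRD1503 ', 'UAB-AC LRD1600']
print(trimmed)  # ['UAB-AB LRD1503', 'UAB-AC LRD1600']
```

Both patterns keep the line without a comment, because the empty alternative after the "|" lets the group run to the end of the line.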
If the comments in your data lines are optional, here's a regular expression that covers both cases: lines with and without a comment.
The regular expression for that is R"^([UL][^/]*)" (edited; the original RE was R"^([UL][^/]*)(/.*)?$"). The first group is the data you want to extract; in the original RE, the second (optional) group would catch the comment, if any.
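A sketch of how the original two-group RE could be used when you also want to keep the comment; split_data_line is a hypothetical helper name, and a line without a comment simply gets None as its second group:

```python
import re

# Original version of the RE: group 1 is the data, optional group 2 is the comment.
DATA_AND_COMMENT = re.compile(r"^([UL][^/]*)(/.*)?$")

def split_data_line(line):
    """Return (data, comment) for a line starting with U or L, else None.

    comment is None when the line has no '/' part.
    """
    match = DATA_AND_COMMENT.match(line)
    if not match:
        return None
    return match.group(1).strip(), match.group(2)

for line in ["/data file FLG.dat", "UAB-AB LRD1503 / reminder latches", "UAB-AC LRD1600"]:
    print(split_data_line(line))
# None
# ('UAB-AB LRD1503', '/ reminder latches')
# ('UAB-AC LRD1600', None)
```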
The example code below uses the edited RE and prints only the two valid data lines.
import re
lines=["/data file FLG.dat",
"UAB-AB LRD1503 / reminder latches",
"UAB-AC LRD1600",
"MAB-AD LRD1700 / does not start with U or L"
]
datare=re.compile(R"^([UL][^/]*)")
matches = (match.group(1).strip() for match in (datare.match(line) for line in lines) if match)
for match in matches:
    print(match)
Note how match.group(1).strip() extracts the first group of your RE, and strip() removes any trailing spaces from the match.
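Also note that you can replace lines in this example with a file handle and it works the same way, since iterating over a file yields one line at a time. Each line read from a file ends with a newline, which [^/]* also captures and strip() then removes.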
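A minimal sketch of the file-handle version, assuming the data lives in a plain text file; here io.StringIO stands in for the file so the example runs on its own, and extract_data is a hypothetical helper name:

```python
import io
import re

DATA_RE = re.compile(r"^([UL][^/]*)")

def extract_data(lines):
    """Yield the stripped data part of every line that starts with U or L."""
    for line in lines:
        match = DATA_RE.match(line)
        if match:
            # strip() also removes the trailing newline that file iteration leaves on each line
            yield match.group(1).strip()

# io.StringIO stands in for a real file object; an opened file is iterated the same way
fake_file = io.StringIO(
    "/data file FLG.dat\n"
    "UAB-AB LRD1503 / reminder latches\n"
    "UAB-AC LRD1600\n"
    "MAB-AD LRD1700 / does not start with U or L\n"
)
result = list(extract_data(fake_file))
print(result)  # ['UAB-AB LRD1503', 'UAB-AC LRD1600']
```

With a real file, the call would be along the lines of: with open("FLG.dat") as f: print(list(extract_data(f)))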
If the matches = ... line looks too complicated, it is just a compact, lazily evaluated way of writing this:
for line in lines:
    match = datare.match(line)
    if match:
        print(match.group(1).strip())
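On Python 3.8 or newer, an assignment expression (:=) offers a third way to write the same thing, as a single list comprehension. This is an alternative sketch, not part of the original answer:

```python
import re

datare = re.compile(r"^([UL][^/]*)")
lines = [
    "/data file FLG.dat",
    "UAB-AB LRD1503 / reminder latches",
    "UAB-AC LRD1600",
    "MAB-AD LRD1700 / does not start with U or L",
]

# (m := datare.match(line)) binds the match object so the filter can test it
# and the output expression can reuse it without calling match() twice.
data = [m.group(1).strip() for line in lines if (m := datare.match(line))]
print(data)  # ['UAB-AB LRD1503', 'UAB-AC LRD1600']
```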