Suppose I have a data file:
# cat 1.txt
#$$!#@#VM - This is VM$^#^#$^$^
%#%$%^SAS - This is SAS&%^#$^$
!@#!@%^$^MD - This is MD!@$!@%$
Now I want to extract the entries starting with VM or SAS (excluding MD).
Expected results:
VM - This is VM
SAS - This is SAS
I am using this code but all lines are shown.
import re
f = open("1.txt", "r")
for line in f:
    p = re.match(r'.+?((SAS|VM)[-a-zA-Z0-9 ]+).+?', line)
    if p:
        print (p.groups()[0])
In regular expressions, I can use (pattern1|pattern2) to match either pattern1 or pattern2. But in re.match, parentheses are also used to capture the matched pattern.
How do I specify an "either" match in the re.match() function?
This is one approach.
Ex:
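import re
filename = "1.txt"  # the data file from the question
with open(filename) as infile:
    for line in infile:
        # Drop every character that is not a letter, a hyphen or whitespace
        line = re.sub(r"[^A-Za-z\-\s]", "", line.strip())
        # startswith() accepts a tuple, so this tests for either prefix
        if line.startswith(("VM", "SAS")):
            print(line)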
Output:
VM - This is VM
SAS - This is SAS
Try it like this:
import re
with open('1.txt') as f:
    for line in f:
        # group(1) is the outer capture; the inner (SAS|VM) is group(2)
        extract = re.match(r'.+?((SAS|VM)[-a-zA-Z0-9 ]+).+?', line)
        if extract:
            print(extract.group(1))
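On the "either" part of the question: in Python's re module, `|` inside parentheses already means "either", and a group written as `(?:...)` groups the alternatives without capturing them. Below is a minimal sketch of that idea, not a definitive solution. It swaps `re.match` for `re.search` so that no leading `.+?` is needed; the names `PATTERN`, `extract_entries` and `sample` are made up for illustration, and the sample lines are copied from the question instead of being read from 1.txt.

```python
import re

# (?:SAS|VM) is a non-capturing group: it means "either SAS or VM"
# without creating a numbered group of its own, so the only
# capture group is the outer one, group(1).
PATTERN = re.compile(r"((?:SAS|VM)[-a-zA-Z0-9 ]+)")


def extract_entries(lines):
    """Return the 'VM ...' / 'SAS ...' part of every line that has one."""
    results = []
    for line in lines:
        # search() scans the whole line, unlike match(), which only
        # tries to match at the start of the string.
        found = PATTERN.search(line)
        if found:
            results.append(found.group(1).strip())
    return results


# Sample lines copied from the question's 1.txt
sample = [
    "#$$!#@#VM - This is VM$^#^#$^$^",
    "%#%$%^SAS - This is SAS&%^#$^$",
    "!@#!@%^$^MD - This is MD!@$!@%$",
]

for entry in extract_entries(sample):
    print(entry)
```

Note that this sketch matches VM or SAS anywhere in the line, including inside a longer word, so it assumes the data looks like the sample in the question.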