This is similar to the question about an alternative to "grep" in Python, but the complication here is that the patterns being grepped for are not fixed: they are the lines of another file. I can't figure out how to do this with functions like re.findall().
file1:
1 20 200
1 30 300
file2:
1 20 200 0.1 0.5
1 20 200 0.3 0.1
1 30 300 0.2 0.6
1 40 400 0.9 0.6
2 50 300 0.5 0.7
Each line of file1 is a pattern, and I need to search for those patterns in file2. The result should be:
1 20 200 0.1 0.5
1 20 200 0.3 0.1
1 30 300 0.2 0.6
I've been trying to solve this in either bash or Python, but can't figure it out. Thanks.
Here's a non-regex based solution:
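with open('/tmp/file1') as f:
    # Strip each pattern once and skip blank lines (an empty prefix would match everything)
    prefixes = [line.strip() for line in f if line.strip()]

with open('/tmp/file2') as f:
    for line in f:
        if any(line.startswith(p) for p in prefixes):
            print(line, end='')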
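A possible refinement, sketched under the assumption that fields are separated by single spaces as in the sample files: str.startswith accepts a tuple of prefixes, so the any() loop can become a single call, and appending a space to each pattern keeps "1 20 200" from also matching an illustrative line such as "1 20 2000 0.4 0.4". The function name filter_lines is hypothetical.

```python
def filter_lines(pattern_lines, data_lines):
    """Yield each data line that begins with one of the pattern lines as whole fields."""
    # The trailing space marks a field boundary, so "1 20 200" won't match "1 20 2000".
    prefixes = tuple(p.strip() + " " for p in pattern_lines if p.strip())
    for line in data_lines:
        # Pad the line as well, so a line consisting only of the pattern still matches.
        # str.startswith accepts a tuple and returns True if any prefix matches.
        if (line.rstrip("\n") + " ").startswith(prefixes):
            yield line


file1 = ["1 20 200\n", "1 30 300\n"]
file2 = ["1 20 200 0.1 0.5\n", "1 20 2000 0.4 0.4\n", "1 30 300 0.2 0.6\n"]
for line in filter_lines(file1, file2):
    print(line, end="")
```

With real files, the open file objects can be passed directly, e.g. filter_lines(f1, f2) inside a single `with open('file1') as f1, open('file2') as f2:` statement, since both arguments only need to be iterables of lines.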
You can take advantage of the fact that the | character in a regular expression means to match either the pattern on its left or the pattern on its right:
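import re

with open('file1') as file1:
    # Escape each line so it is matched literally; skip blank lines,
    # which would add an empty alternative that matches every line
    patterns = "|".join(re.escape(line.rstrip()) for line in file1 if line.strip())
regexp = re.compile(patterns)

with open('file2') as file2:
    for line in file2:
        if regexp.search(line):
            print(line.rstrip())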
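Note that regexp.search finds a pattern anywhere in the line, just as grep does. If the file1 lines should only match the leading fields of a file2 line, one option (a sketch, not part of the answer above) is to wrap the alternation in a non-capturing group, anchor it with match() instead of search(), and require whitespace or end of line right after it with a lookahead. The helper name build_prefix_regex is hypothetical.

```python
import re


def build_prefix_regex(pattern_lines):
    """Compile one regex matching any pattern line as a whole-field prefix."""
    alternatives = [re.escape(p.strip()) for p in pattern_lines if p.strip()]
    if not alternatives:
        return re.compile(r"(?!)")  # a regex that can never match
    # (?:...) groups the alternation so the lookahead applies to every branch;
    # (?=\s|$) requires a field boundary right after the matched pattern.
    return re.compile(r"(?:" + "|".join(alternatives) + r")(?=\s|$)")


regexp = build_prefix_regex(["1 20 200\n", "1 30 300\n"])
for line in ["1 20 200 0.1 0.5\n", "1 40 400 0.9 0.6\n", "9 1 20 200 0.0\n"]:
    # match() only tries at the start of the string, unlike search()
    if regexp.match(line):
        print(line.rstrip())
```

Here only "1 20 200 0.1 0.5" is printed: the last line contains "1 20 200" but not at the start, so match() rejects it where search() would not.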
When I tried this on your sample files, it output:
1 20 200 0.1 0.5
1 20 200 0.3 0.1
1 30 300 0.2 0.6
Incidentally, if you want to solve this problem in bash, the following should do it:
grep -f file1 file2
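I think you'll need your own loop:
import re

with open('file1') as f1:
    # One compiled pattern per non-blank line of file1, matched literally
    file1patterns = [re.compile(re.escape(l.rstrip())) for l in f1 if l.strip()]

lineToMatch = 0
matchedLines = []
with open('file2') as f2:
    for line in f2:
        if file1patterns[lineToMatch].match(line):
            matchedLines.append(line)
            lineToMatch += 1
        else:
            # Sequence broken: start over, but the current line may begin a new run
            lineToMatch = 0
            matchedLines = []
            if file1patterns[0].match(line):
                matchedLines.append(line)
                lineToMatch = 1
        if lineToMatch == len(file1patterns):
            # Every pattern matched on consecutive lines: print the block
            print(''.join(matchedLines), end='')
            lineToMatch = 0
            matchedLines = []
(Note that this loop looks for all of file1's patterns as a run of consecutive lines in file2, so on the sample files it prints only the second and third lines of file2, not the expected output above.)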
Step 1: Read all lines from file1, split them, and add them as tuples to a set. This makes the lookups in the next step fast.
with open('file1', 'r') as f:
    file1_lines = {tuple(line.split()) for line in f}
Step 2: Filter the lines of file2 that meet your criteria, i.e., whose first three fields match one of the lines in file1:
with open('file2', 'r') as f2:
    # filter() is lazy in Python 3, like itertools.ifilter in Python 2
    for line in filter(lambda x: tuple(x.split()[:3]) in file1_lines, f2):
        print(line, end='')
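The slice x.split()[:3] hard-codes three fields, which fits the sample data. If file1 lines could have different numbers of fields, one possible extension (a sketch, with the hypothetical name filter_by_fields and an illustrative two-field pattern "2 50") is to record every key length that occurs and test the corresponding prefix of each file2 line against the set. Lookups stay set-based, so the cost per line is one hash lookup per distinct key length.

```python
def filter_by_fields(pattern_lines, data_lines):
    """Yield data lines whose first N fields equal the N fields of some pattern line."""
    keys = {tuple(p.split()) for p in pattern_lines if p.split()}
    # Distinct key lengths, e.g. [3] for the sample file1
    widths = sorted({len(k) for k in keys})
    for line in data_lines:
        fields = tuple(line.split())
        if any(fields[:w] in keys for w in widths):
            yield line


file1 = ["1 20 200\n", "2 50\n"]
file2 = ["1 20 200 0.1 0.5\n", "1 40 400 0.9 0.6\n", "2 50 300 0.5 0.7\n"]
for line in filter_by_fields(file1, file2):
    print(line, end="")
```

Because fields are compared whole after split(), "1 20 200" cannot accidentally match a line starting "1 20 2000", and runs of extra whitespace between fields do not matter.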