So I have 2 files, file1 and file2, of unequal size and at least a million newline-separated lines each. I want to match the content of file1 against file2 and, if a match exists, remove that line from file1. Example:
+------------+-----------+--------------------------+
| file1      | file2     | after processing - file1 |
+------------+-----------+--------------------------+
| google.com | in.com    | google.com               |
+------------+-----------+--------------------------+
| apple.com  | quora.com | me.com                   |
+------------+-----------+--------------------------+
| me.com     | apple.com |                          |
+------------+-----------+--------------------------+
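Going by the stated rule (drop from file1 every line that also appears in file2, and keep the remaining lines in their original order), the example reduces to a set-membership filter. A minimal in-memory sketch, with Python lists standing in for the two files:

```python
# Example data: each list stands in for the lines of one file.
file1_lines = ["google.com", "apple.com", "me.com"]
file2_lines = ["in.com", "quora.com", "apple.com"]

# A set gives average O(1) membership tests, which matters at a million lines.
exclude = set(file2_lines)

# Keep file1's order; drop every line that also appears in file2.
result = [line for line in file1_lines if line not in exclude]
print(result)  # ['google.com', 'me.com']
```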
My code looks like this:
import fileinput

with open(file2) as fin:
    exclude = set(line.rstrip() for line in fin)
for line in fileinput.input(file1, inplace=True):
    if line.rstrip() not in exclude:
        print
        line,
This just deletes all the contents of file1. How can I fix that? Thanks.
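Your print statement and its argument are on separate lines, so print on its own just writes an empty line, and line, is a separate expression that is evaluated and thrown away. Put print line, on a single line instead.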
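The print line, fix is Python 2 syntax. In Python 3, print is a function, so the equivalent is print(line, end=""), since each line read from the file already ends with its own newline. A sketch of the question's code in Python 3, wrapped in a hypothetical helper remove_lines_in(file1, file2) for reuse:

```python
import fileinput


def remove_lines_in(file1, file2):
    """Delete from file1, in place, every line that also appears in file2."""
    # Load file2 into a set for fast membership tests.
    with open(file2) as fin:
        exclude = {line.rstrip() for line in fin}
    # With inplace=True, stdout is redirected into file1 while looping,
    # so whatever gets printed becomes the new content of file1.
    with fileinput.input(file1, inplace=True) as lines:
        for line in lines:
            if line.rstrip() not in exclude:
                # line already ends in "\n"; suppress print's own newline.
                print(line, end="")
```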
If memory is not a concern, I'd suggest a crude solution: load file2 into a set, then iterate over file1 and write only the lines that don't appear in file2 to a temporary file:
import os
import shutil

FILE1 = "file1"  # path to file1
FILE2 = "file2"  # path to file2

# first load FILE2 into memory
with open(FILE2, "r") as f:  # open FILE2 for reading
    file2_lines = {line.rstrip() for line in f}  # use a set for FILE2 for fast matching

# open FILE1 for reading and a FILE1.tmp file for writing
with open(FILE1, "r") as f_in, open(FILE1 + ".tmp", "w") as f_out:
    for line in f_in:  # loop through the FILE1 lines
        if line.rstrip() not in file2_lines:  # no match in FILE2, keep the line
            f_out.write(line)

# finally, overwrite FILE1 with the temporary FILE1.tmp
os.remove(FILE1)
shutil.move(FILE1 + ".tmp", FILE1)
EDIT: Apparently, fileinput.input() does pretty much the same thing, so your problem was indeed a typo. Oh well, I'm leaving this answer for posterity, as it gives you more control over the whole process.
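A variant of the temporary-file approach, sketched under the assumption that you're on Python 3.3+ (for os.replace): the filtered copy is written to a tempfile.mkstemp file in file1's directory and then swapped in with os.replace, which overwrites file1 in a single rename instead of a remove-then-move pair, and the temporary file is cleaned up if anything fails. The helper name filter_file is hypothetical.

```python
import os
import shutil
import tempfile


def filter_file(file1, file2):
    """Rewrite file1 without the lines that also appear in file2."""
    # Load file2 into a set for fast membership tests (needs enough RAM for file2).
    with open(file2) as f:
        exclude = {line.rstrip() for line in f}

    # Create the temporary file next to file1 so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(file1)))
    try:
        with open(file1) as f_in, os.fdopen(fd, "w") as f_out:
            for line in f_in:
                if line.rstrip() not in exclude:
                    f_out.write(line)
        # mkstemp creates an owner-only file; give it file1's permission bits.
        shutil.copymode(file1, tmp_path)
        # Swap the filtered copy into place, overwriting file1 in one step.
        os.replace(tmp_path, file1)
    except BaseException:
        # On failure, remove the temporary file and leave file1 untouched.
        os.remove(tmp_path)
        raise
```

Because the temporary file lives in the same directory as file1, os.replace is a plain rename, so other readers see either the old file1 or the new one, never a half-written file.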