I am fairly new to the Python environment and gradually working my way forward.
We have about 10,000 files in a folder containing similar information, with one major difference: some files contain the string 'string1' and the others contain 'string2'. To clarify, the string is not in the filename but in the file contents. The file content is character-delimited.
I tried to create two separate lists, one for the files containing string1 and one for the files containing string2, and wrote various lines of code but got nowhere. Both lists should contain only the filenames.
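For comparison, here is a minimal pure-Python sketch of what the question asks for. It assumes the files are UTF-8 text small enough to read whole, and that a plain substring test is enough to tell the two groups apart. The function name split_by_marker and its parameters are hypothetical. It reads each file once, checks it for both strings, and returns bare filenames as requested.

```python
import os


def split_by_marker(folder, marker1="string1", marker2="string2"):
    """Return (list1, list2): names of files in `folder` containing marker1 / marker2.

    Hypothetical helper. Assumes UTF-8 text files small enough to read at once.
    """
    list1, list2 = [], []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories and other non-regular entries
        with open(path, encoding="utf-8", errors="replace") as f:
            content = f.read()
        # Substring test: the marker may appear anywhere in the file
        if marker1 in content:
            list1.append(name)
        if marker2 in content:
            list2.append(name)
    return list1, list2
```

Usage would be list1, list2 = split_by_marker("foo"), where "foo" stands in for the real folder.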
I often use grep for this kind of thing. In this case I would use the following (edited to add file extensions):
grep -l string1 *.txt > string1_files.txt; grep -l string2 *.txt > string2_files.txt
This one-liner searches for string1 in the .txt files in the current directory and writes the names of the matching files to string1_files.txt, then does the same for string2.
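If you want to stay in Python but still let grep do the searching, the same search can be driven through the subprocess module. This is a sketch, not part of the original answer: the helper name grep_files_with_matches is hypothetical, and it assumes a grep supporting -l and -F is on the PATH. It relies on grep's exit status (0 if something matched, 1 if nothing matched, 2 on error), so "no matching files" is treated as an empty result rather than a failure.

```python
import subprocess


def grep_files_with_matches(pattern, files):
    """Run `grep -l -F -- pattern FILE...` and return the matching filenames.

    Hypothetical helper. Exit status 1 just means "no matches", so only
    other non-zero statuses are treated as errors.
    """
    if not files:
        return []  # with no file arguments grep would read standard input
    result = subprocess.run(
        # -F: treat the pattern as a fixed string, not a regex
        # --: end of options, so a pattern starting with '-' is safe
        ["grep", "-l", "-F", "--", pattern, *files],
        capture_output=True,
        text=True,
    )
    if result.returncode not in (0, 1):
        raise RuntimeError(result.stderr.strip())
    return result.stdout.splitlines()
```

For example, with files = sorted(glob.glob("*.txt")), calling grep_files_with_matches("string1", files) and grep_files_with_matches("string2", files) gives the two lists.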
From man grep:
-l, --files-with-matches
Only the names of files containing selected lines are written to standard output. grep will only search a file until a match has been found, making searches potentially less expensive. Pathnames are listed once per file searched. If the standard input is searched, the string ``(standard input)'' is written.
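The early-exit behaviour described in that excerpt is straightforward to mirror in Python. This is a minimal sketch assuming UTF-8 text and a search string that never spans a line break; files_with_matches is a hypothetical name. Each file is read line by line and abandoned at the first hit, so every matching path is reported exactly once.

```python
def files_with_matches(needle, paths):
    """Yield each path whose contents contain `needle`, in the spirit of `grep -l -F`.

    Hypothetical helper. Assumes the needle does not span a line break.
    """
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as f:
            for line in f:
                if needle in line:
                    yield path
                    break  # stop reading this file after the first match
```

Because it is a generator, list(files_with_matches("string1", paths)) collects the results, and the file is closed as soon as the loop breaks.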
Hope this helps a bit, but you might want to grep only certain file extensions.
Edit for files without extensions (in case they don't have any, as mentioned in the question comments):
grep -l string1 * > string1_files.txt; grep -l string2 * > string2_files.txt
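Without an extension filter, the bare * also matches subdirectories (grep prints a warning for those) and any output file already written into the same folder. A hedged Python sketch for listing only the regular files, skipping any names passed in exclude; regular_files and exclude are hypothetical names. Like the shell's *, it ignores names starting with a dot.

```python
import os


def regular_files(folder, exclude=()):
    """Return the sorted names of regular files in `folder`.

    Hypothetical helper. Skips subdirectories, dotfiles (as the shell's `*`
    does) and any names listed in `exclude`, such as output files.
    """
    with os.scandir(folder) as entries:
        return sorted(
            e.name
            for e in entries
            if e.is_file()
            and not e.name.startswith(".")
            and e.name not in exclude
        )
```

For example, regular_files(".", exclude={"string1_files.txt", "string2_files.txt"}) keeps the output files out of the search.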
Assuming each file contains only the string you want to compare, you just need to do:
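import glob
import os

folder = 'foo'
# keep regular files only; open() would fail on a subdirectory
files = [p for p in glob.glob(os.path.join(folder, "*")) if os.path.isfile(p)]
list1 = []  # files whose whole content is 'string1'
list2 = []  # every other file
for file in files:
    with open(file, 'r') as f:
        # readlines() returns a list, which has no .strip(); read the whole text instead
        if f.read().strip() == 'string1':
            list1.append(file)
        else:
            list2.append(file)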
If your files contain more data, you need to process the contents (for example with f.readlines()) and compare accordingly.
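Since the question says the content is character-delimited, one way to compare properly is to split each line into fields and compare whole fields rather than substrings. This is a sketch assuming ';' as the delimiter (the question does not say which character is used) and UTF-8 text; file_has_field is a hypothetical helper built on the standard csv module. Comparing whole fields avoids false hits such as 'string10' matching a substring search for 'string1'.

```python
import csv


def file_has_field(path, value, delimiter=";"):
    """Return True if any delimited field in the file equals `value`.

    Hypothetical helper. The ';' default is an assumption; pass the real
    delimiter used by your files.
    """
    with open(path, newline="", encoding="utf-8", errors="replace") as f:
        for row in csv.reader(f, delimiter=delimiter):
            # compare whole fields, ignoring surrounding whitespace
            if any(field.strip() == value for field in row):
                return True
    return False
```

For example, list1 = [p for p in files if file_has_field(p, "string1")] builds the first list.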