Python - glob.glob with grep?
I am fairly new to the Python environment and gradually working my way forward.
We have about 10,000 files in a folder containing similar information, but with one major difference: some files contain the string 'string1' and the others contain 'string2'. To clarify, the string is not in the filename but in the file itself. The file content is character-delimited.
I tried to create two separate lists, one for string1 and one for string2, and wrote various lines of code, but I am getting nowhere. Both lists should contain only the filenames.
I often use grep for this kind of thing. In this case I would use:
Edited to add file extensions:
grep -l string1 *.txt > string1_files.txt && grep -l string2 *.txt> string2_files.txt
This one-liner searches for string1 in the txt files in the current directory, writing the output to string1_files.txt, and does the same for string2.
Copying from man grep:
-l, --files-with-matches
Only the names of files containing selected lines are written to
standard output. grep will only search a file until a match has
been found, making searches potentially less expensive. Path-
names are listed once per file searched. If the standard input
is searched, the string ``(standard input)'' is written.
Hope this helps a bit, but you might want to grep only certain file extensions.
Edit for no file extensions (in case they are not available, as mentioned in the question comments):
grep -l string1 * > string1_files.txt && grep -l string2 *> string2_files.txt
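For readers who would rather stay in Python, the behavior of grep -l quoted above can be approximated with glob and a line-by-line scan. This is only a minimal sketch: the function name files_with_match, the UTF-8 encoding, and the choice to replace undecodable bytes are assumptions, not something taken from the question.

```python
import glob
import os


def files_with_match(pattern, needle, encoding="utf-8"):
    """Return the paths matching `pattern` whose content contains `needle`.

    Like `grep -l`, reading a file stops at the first matching line.
    Hypothetical helper; the encoding is an assumption.
    """
    matches = []
    for path in sorted(glob.glob(pattern)):
        if not os.path.isfile(path):
            continue  # a bare "*" pattern can also match directories
        # errors="replace" keeps the scan going if some bytes do not decode
        with open(path, "r", encoding=encoding, errors="replace") as f:
            # any() stops consuming lines as soon as one contains the needle
            if any(needle in line for line in f):
                matches.append(path)
    return matches
```

Calling files_with_match("*.txt", "string1") from inside the folder returns bare filenames, since glob keeps paths relative to the pattern it was given.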
Assuming each file contains only the string you want to compare, you just need to do:
import glob
import os

folder = 'foo'
files = glob.glob(os.path.join(folder, "*"))
list1 = []
list2 = []
for file in files:
    with open(file, 'r') as f:
        # read() returns a single string; readlines() returns a list, which has no strip()
        if f.read().strip() == 'string1':
            list1.append(file)
        else:
            list2.append(file)
If your files contain more data, you just need to process the file content (for example, the list of lines returned by f.readlines()) and compare accordingly.
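When the files hold more than the bare string, a substring test on the content is one way to "compare accordingly". The sketch below reads each file once and checks for both strings. The function name classify_files, the UTF-8 encoding, and the decision to store only the base filename are assumptions made for illustration.

```python
import glob
import os


def classify_files(folder, first="string1", second="string2"):
    """Split the files in `folder` into two lists of bare filenames.

    Hypothetical helper: a file containing both strings lands in both
    lists, and a file containing neither lands in none.
    """
    first_files, second_files = [], []
    for path in sorted(glob.glob(os.path.join(folder, "*"))):
        if not os.path.isfile(path):
            continue  # skip subdirectories
        # Assumes text files; undecodable bytes are replaced, not fatal
        with open(path, "r", encoding="utf-8", errors="replace") as f:
            content = f.read()  # whole file in memory, fine for small files
        name = os.path.basename(path)  # the question asks for filenames only
        if first in content:
            first_files.append(name)
        if second in content:
            second_files.append(name)
    return first_files, second_files
```

For example, list1, list2 = classify_files('foo') would fill the two lists from the folder used in the answer above.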