Python - glob.glob and grep?

Question

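I'm still new to Python and am working my way forward step by step.

We were given about 10,000 files in a folder. They contain similar information, with one major difference: some files contain the string "string1", and the rest contain "string2". Just to clarify, the strings are not in the file names but inside the files themselves. The file contents are character-delimited.

I tried to build two separate lists, one for string1 and one for string2, and ended up with many lines of code that got me nowhere. Both lists should contain only the file names.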

Answer 1

I often use grep for this kind of thing. In this case, I would use

Edit, adding a file extension:

grep -l string1 *.txt > string1_files.txt && grep -l string2 *.txt > string2_files.txt

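This one-liner searches the .txt files in the current directory for string1 and writes the matching file names to string1_files.txt, then does the same for string2. Because the two commands are joined with &&, the second grep runs only if the first one finds at least one match (grep exits with a non-zero status otherwise).

Copied from man grep: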

 -l, --files-with-matches
         Only the names of files containing selected lines are written to
         standard output.  grep will only search a file until a match has
         been found, making searches potentially less expensive.  Path-
         names are listed once per file searched.  If the standard input
         is searched, the string ``(standard input)'' is written.

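Hope this helps, though you may want to grep only files with certain extensions.

Edit, without a file extension (in case the files don't have one, per the comments on the question):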

grep -l string1 * > string1_files.txt && grep -l string2 * > string2_files.txt
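
If you'd rather stay in Python, here is a rough sketch of what `grep -l` does, under a few assumptions: the function name `files_with_match` is made up for illustration, the files are read as UTF-8 text with undecodable bytes replaced, and the search is a plain substring test (closer to `grep -F` than to grep's default regular-expression matching). Like `-l`, it stops reading a file at the first matching line.

```python
from pathlib import Path


def files_with_match(needle, paths):
    """Return the names of the files in `paths` that contain `needle`."""
    matches = []
    for path in paths:
        path = Path(path)
        if not path.is_file():
            continue  # skip directories and anything that is not a regular file
        # errors="replace" is an assumption: it keeps odd bytes from aborting the scan
        with path.open("r", encoding="utf-8", errors="replace") as handle:
            # any() stops at the first matching line, so the rest of the file is not read
            if any(needle in line for line in handle):
                matches.append(path.name)
    return matches
```

Calling it with something like `files_with_match("string1", Path(".").glob("*.txt"))` should give roughly the same names that the first grep command writes to string1_files.txt.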

Answer 2

Assuming each file contains nothing but the string you want to compare against, you just need to do

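import glob
import os

folder = 'foo'
files = glob.glob(os.path.join(folder, "*"))

list1 = []
list2 = []
for file in files:
  with open(file, 'r') as f:
    # read() returns the whole file as one string; readlines() returns a list, which has no strip()
    if f.read().strip() == 'string1':
      list1.append(file)
    else:
      list2.append(file)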

If your files contain more data, you just need to process the output of f.readlines() and adjust the comparison accordingly.
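
For files that hold more than the bare string, one possible approach is a substring test on the file contents. This is only a sketch: `split_by_marker` and its parameter names are hypothetical, the files are assumed to be UTF-8 text small enough to read in one go, and only the base file names are kept, since the question asks for file names. It uses two separate `if` checks rather than `else`, so a file containing neither string lands in neither list.

```python
import glob
import os


def split_by_marker(folder, first="string1", second="string2"):
    """Return two lists of file names: files containing `first`, files containing `second`."""
    with_first, with_second = [], []
    for path in sorted(glob.glob(os.path.join(folder, "*"))):
        if not os.path.isfile(path):
            continue  # glob also returns sub-directories; skip them
        # errors="replace" is an assumption so that stray bytes do not stop the loop
        with open(path, "r", encoding="utf-8", errors="replace") as f:
            content = f.read()
        name = os.path.basename(path)
        if first in content:
            with_first.append(name)
        if second in content:
            with_second.append(name)
    return with_first, with_second
```

With the folder from the snippet above, `list1, list2 = split_by_marker('foo')` would fill both lists in a single pass over the files.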


Answer 1 (score 2, 2020-04-27 19:18:25)

Answer 2 (score -1, 2020-04-27 19:13:17)
