Searching a file with regular expressions in Python

Question

I have a file that has many lines. Each line starts with {"id": followed by the id number in quotes (i.e. {"id": "106"). I am trying to use regex to search the whole document line by line and print the lines that match 5 different id values. To do this I made a list with the ids and want to iterate through the list, matching only lines that start with {"id": "(id number from list)". I am really confused about how to do this. Here is what I have so far:

f= "bdata.txt"    
statids = ["85", "106", "140", "172" , "337"] 
x= re.findall('{"id":', statids, 'f')
for line in open(file):
            print(x)

The error I keep getting is: TypeError: unsupported operand type(s) for &: 'str' and 'int'

I need the whole line to be matched so I can split it and put it into a class.

Any advice? Thanks for your time.

Answer 1

You can retrieve the id from the line using the regex ^\{"id": "(\d+)", where the value of group 1 gives you the id. Then you can check whether the id is present in statids.

Demo:

import re

statids = ["85", "106", "140", "172", "337"]

with open("bdata.txt") as file:
    for line in file:
        # Raw string so that \{ and \d reach the regex engine unchanged
        search = re.search(r'^\{"id": "(\d+)"', line)
        if search:
            line_id = search.group(1)  # the digits between the quotes
            if line_id in statids:
                print(line.rstrip())

For the following sample content in the file:

{"id": "100" hello
{"id": "106" world
{"id": "2" hi
{"id": "85" bye
{"id": "10" ok
{"id": "140" good
{"id": "165" fine
{"id": "172" great
{"id": "337" morning
{"id": "16" evening

the output will be:

{"id": "106" world
{"id": "85" bye
{"id": "140" good
{"id": "172" great
{"id": "337" morning
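Since the question mentions splitting each matched line and putting it into a class, the demo above could be extended along the following lines. This is only a sketch: the `Record` dataclass, its field names, and the `parse_records` helper are hypothetical, and it assumes that everything after the quoted id is free text, as in the sample content.

```python
import re
from dataclasses import dataclass

# Group 1 captures the id, group 2 captures whatever follows it on the line.
LINE_RE = re.compile(r'^\{"id": "(\d+)"\s*(.*)$')


@dataclass
class Record:
    # Hypothetical container; the field names are assumptions.
    id: str
    rest: str


def parse_records(lines, wanted_ids):
    """Return a Record for every line whose id is in wanted_ids."""
    wanted = set(wanted_ids)  # set membership checks are fast
    records = []
    for line in lines:
        match = LINE_RE.search(line.rstrip("\n"))
        if match and match.group(1) in wanted:
            records.append(Record(id=match.group(1), rest=match.group(2)))
    return records


sample = ['{"id": "100" hello\n', '{"id": "106" world\n', '{"id": "85" bye\n']
for record in parse_records(sample, ["85", "106", "140", "172", "337"]):
    print(record)
```

An open file object can be passed as `lines` as well, since iterating over it yields one line at a time.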

Answer 2

I think the issue here is the way you're using re.findall. According to the docs, you have to pass a regular expression as the first argument and the string you want to match it against as the second argument. In your case, I think this is how you should do it:

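import re

# statids and f are the list and file name defined in the question
pattern = re.compile(r'^\{"id": "(?:' + "|".join(statids) + r')".*')
with open(f) as file:
    for line in file:
        match = pattern.search(line)  # a match object, or None if the line does not match
        if match:
            print(match.group(0))  # the whole line, without the trailing newline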

In the regex, the pipe operator "|" works like "or", so joining all the ids into one string with | between them matches every line that has one id or another. The match.group line returns the text that was matched.
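As a self-contained sketch of the same idea, the alternation can be built once and reused. The helper name `build_id_pattern` is made up for illustration, and `re.escape` is not strictly needed for purely numeric ids; it is included only in case an id ever contains a regex metacharacter.

```python
import re


def build_id_pattern(ids):
    """Compile a pattern for lines that start with {"id": "<one of ids>"."""
    alternation = "|".join(re.escape(i) for i in ids)
    # The closing quote after the group keeps id 106 from matching a line with id 1065.
    return re.compile(r'^\{"id": "(?:' + alternation + r')"')


pattern = build_id_pattern(["85", "106", "140", "172", "337"])
lines = ['{"id": "106" world', '{"id": "10" ok', '{"id": "1065" no']
matching = [line for line in lines if pattern.search(line)]
print(matching)  # ['{"id": "106" world']
```

Compiling the pattern once outside the loop avoids rebuilding the alternation for every line of the file.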

2 answers

Answer 1
0 2021-10-19 21:08:33

Answer 2
0 2021-10-19 21:17:39
