Python file search using regex
I have a file that has many lines. Each line starts with {"id": followed by the id number in quotes (i.e. {"id": "106").
I am trying to use regex to search the whole document line by line and print the lines that match 5 different id values. To do this I made a list with the ids and want to iterate through the list, only matching lines that start with {"id": "(id number from list)".
I am really confused about how to do this. Here is what I have so far:
f = "bdata.txt"
statids = ["85", "106", "140", "172", "337"]
x = re.findall('{"id":', statids, 'f')
for line in open(file):
    print(x)
The error I keep getting is: TypeError: unsupported operand type(s) for &: 'str' and 'int'
I need the whole line to be matched so I can split it and put it into a class.
Any advice? Thanks for your time.
You can retrieve the id from each line using the regex ^\{"id": "(\d+)", where the value of group #1 gives you the id. Then you can check whether that id is present in statids.
Demo:
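import re

statids = ["85", "106", "140", "172", "337"]
with open("bdata.txt") as file:
    for line in file:
        # Raw string avoids invalid-escape warnings; group 1 captures the id digits
        search = re.search(r'^\{"id": "(\d+)"', line)
        if search:
            line_id = search.group(1)  # avoid shadowing the built-in id()
            if line_id in statids:
                print(line.rstrip())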
For the following sample content in the file:
{"id": "100" hello
{"id": "106" world
{"id": "2" hi
{"id": "85" bye
{"id": "10" ok
{"id": "140" good
{"id": "165" fine
{"id": "172" great
{"id": "337" morning
{"id": "16" evening
The output will be:
{"id": "106" world
{"id": "85" bye
{"id": "140" good
{"id": "172" great
{"id": "337" morning
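To try the approach above without bdata.txt, here is a minimal sketch that runs the same regex over the sample content held in an io.StringIO. It also compiles the pattern once, stores statids in a set, and, since the question mentions putting each matching line into a class, builds one record per match. Record, record_id, text and matching_records are hypothetical names for illustration only, and treating everything after the quoted id as free text is an assumption based on the sample lines, not on the real file format.

```python
import io
import re
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical container for one matching line; the asker's real class is not shown."""
    record_id: str
    text: str


# Compiled once and reused for every line; group 1 captures the id digits.
ID_PATTERN = re.compile(r'^\{"id": "(\d+)"')

# A set makes each membership test a hash lookup instead of a list scan.
statids = {"85", "106", "140", "172", "337"}


def matching_records(lines, wanted):
    """Yield a Record for every line whose leading id is in `wanted`."""
    for line in lines:
        match = ID_PATTERN.match(line)
        if match and match.group(1) in wanted:
            # Assumption: whatever follows the quoted id is free text, as in the sample.
            yield Record(record_id=match.group(1), text=line[match.end():].strip())


# The sample content from above, held in memory instead of bdata.txt.
sample = io.StringIO(
    '{"id": "100" hello\n'
    '{"id": "106" world\n'
    '{"id": "2" hi\n'
    '{"id": "85" bye\n'
    '{"id": "10" ok\n'
    '{"id": "140" good\n'
    '{"id": "165" fine\n'
    '{"id": "172" great\n'
    '{"id": "337" morning\n'
    '{"id": "16" evening\n'
)

records = list(matching_records(sample, statids))
for record in records:
    print(record)
```

This yields records for 106, 85, 140, 172 and 337 in file order. Because ID_PATTERN.match only matches at the start of the string, the ^ is redundant there but harmless; to read the real file, pass the object returned by open("bdata.txt") instead of sample.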
I think the issue here is the way you're using re.findall. According to the docs, you have to pass a regular expression as the first argument and the string you want to match it against as the second argument (a string, not the list of ids). The optional third argument is flags, which is why passing 'f' there raises the TypeError. In your case I think this is how you should do it:
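import re

# Alternation of the ids, e.g. ^\{"id": "(85|106|140|172|337)"
pattern = r'^\{"id": "(' + "|".join(statids) + r')"'
with open(f) as file:
    for line in file:
        match = re.search(pattern, line)
        if match:
            print(line.rstrip())  # match.group(1) is the id that matched
In the regex, the pipe operator | works like "or", so joining all the ids into one string with | between them matches a line that starts with any one of the ids. The closing quote in the pattern keeps an id such as 85 from also matching 850. re.search returns a match object, or None when the line doesn't match; match.group(1) holds the id that was found, and the whole line is still available for you to split and put into your class.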
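A small sketch of the argument order described above: re.findall takes the pattern, then the string, then optional flags. The three-line text is a made-up excerpt of the sample content, and re.MULTILINE is used so that ^ anchors at the start of every line rather than only at the start of the whole string. The last part repeats the call from the question to show that the 'f' in the flags slot is what fails.

```python
import re

text = '{"id": "106" world\n{"id": "2" hi\n{"id": "85" bye\n'

# re.findall(pattern, string, flags): pattern first, the string to search second.
# With one capture group, findall returns the captured ids as a list of strings.
ids = re.findall(r'^\{"id": "(\d+)"', text, re.MULTILINE)
print(ids)  # ['106', '2', '85']

# The call from the question puts the string 'f' where an integer flag is expected.
question_call_failed = False
try:
    re.findall('{"id":', ["85", "106"], 'f')
except TypeError as exc:
    question_call_failed = True
    print("TypeError:", exc)
```

Reading the file line by line with re.search, as in the answers, avoids needing re.MULTILINE at all; findall over the whole text is an alternative when the file is small enough to read at once.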