
Extract matching substrings from a list into a new list in Python

I have a text file that looks like this:

garbage
moregarbaged89849843
MDeduri09ri44830
Some short sentence
Whatever ... key: d11001bfa937eee2f84f55a11b207356 (KID=01002d737832455680cffbadf1092baf)
Whatever2 ... key: a0ee2d0f8272355f750c5434db85291a (KID=0101bfa0ab9641a0b863ef76519a48d3)
Whatever3 ... key: fe216ba17e5af807ce5af8e43cf3c031 (KID=0102900a2bc54111833631ea7bb855ed)
77EB0A2C7C42EDC27A3D26E72A02BB29:01002d737832455680cffbadf1092baf status 'garbage'
blah blah:0101bfa0ab9641a0b863ef76519a48d3 has status 'usable'
77EB0A2C7C42EDC27A3D26E72A02BB29:blah blah

I only care about the key and KID parts, and want to extract them into separate lists.

My regexes for these are key: (\w|\d){30,} and KID=(\w|\d){30,}, respectively.

The code I'm using is:

import re

# Raw strings keep \w and \d from being read as (invalid) string escapes
matchkid = re.compile(r'KID=(\w|\d){30,}')
matchkey = re.compile(r'key: (\w|\d){30,}')

filteredkids = [a for a in lis if matchkid.search(a)]
filteredkeys = [b for b in lis if matchkey.search(b)]

print(filteredkids)
print('\n')
print(filteredkeys)

where lis is a list made from the lines of the text file.

The output is:

['Whatever ... key: d11001bfa937eee2f84f55a11b207356 (KID=01002d737832455680cffbadf1092baf)', 'Whatever2 ... key: a0ee2d0f8272355f750c5434db85291a (KID=0101bfa0ab9641a0b863ef76519a48d3)', 'Whatever3 ... key: fe216ba17e5af807ce5af8e43cf3c031 (KID=0102900a2bc54111833631ea7bb855ed)']


['Whatever ... key: d11001bfa937eee2f84f55a11b207356 (KID=01002d737832455680cffbadf1092baf)', 'Whatever2 ... key: a0ee2d0f8272355f750c5434db85291a (KID=0101bfa0ab9641a0b863ef76519a48d3)', 'Whatever3 ... key: fe216ba17e5af807ce5af8e43cf3c031 (KID=0102900a2bc54111833631ea7bb855ed)']

This is wrong; the desired output is:

['KID=01002d737832455680cffbadf1092baf', 'KID=0101bfa0ab9641a0b863ef76519a48d3', 'KID=0102900a2bc54111833631ea7bb855ed']

['key: d11001bfa937eee2f84f55a11b207356', 'key: a0ee2d0f8272355f750c5434db85291a', 'key: fe216ba17e5af807ce5af8e43cf3c031']

I have tried tweaking my regex and looking at other similar questions, but nothing seems to work, and most of the time I just get empty lists.

Hoping to find some guidance here. Thanks in advance!

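(\w|\d){30,} is not a good pattern: it creates a repeated capturing group, and it is redundant, since \w already matches digits, so \w{30,} is enough. The capturing group also matters once you use re.findall: when a pattern contains a group, findall returns the group's contents instead of the whole match, and for a repeated group that is only the last repetition, i.e. a single character.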
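A small sketch of that capturing-group effect, using two of the sample lines from the question. The variable names are illustrative only; the comparison shows the original pattern, a non-capturing version of it, and the simplified \w{30,} pattern.

```python
import re

# Two of the sample lines from the question
sample = (
    "Whatever ... key: d11001bfa937eee2f84f55a11b207356 (KID=01002d737832455680cffbadf1092baf)\n"
    "Whatever2 ... key: a0ee2d0f8272355f750c5434db85291a (KID=0101bfa0ab9641a0b863ef76519a48d3)"
)

# Repeated capturing group: findall returns only the group's last repetition
with_group = re.findall(r'KID=(\w|\d){30,}', sample)

# Non-capturing group: findall returns the whole match instead
non_capturing = re.findall(r'KID=(?:\w|\d){30,}', sample)

# \w already covers digits, so the simplified pattern matches the same text
simplified = re.findall(r'KID=\w{30,}', sample)

print(with_group)                   # ['f', '3']
print(simplified == non_capturing)  # True
```

Dropping the group entirely is the simplest fix; (?:...) is only needed when you want grouping without capturing.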

Next, you use re.search only as a filter in your list comprehension: it returns a Match object (or None), and the comprehension keeps the entire line whenever a match exists. That is why you get whole lines back, while what you need is the matched substrings themselves, i.e. all matches from your strings.
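If you would rather keep working line by line with lis, here is a minimal sketch of the difference between filtering and extracting. It assumes a few of the question's lines as a stand-in for lis and uses the simplified KID=\w{30,} pattern; the variable names are illustrative.

```python
import re

# A few lines standing in for `lis` (the text file's lines) from the question
lis = [
    "garbage",
    "Whatever ... key: d11001bfa937eee2f84f55a11b207356 (KID=01002d737832455680cffbadf1092baf)",
    "blah blah:0101bfa0ab9641a0b863ef76519a48d3 has status 'usable'",
]

matchkid = re.compile(r'KID=\w{30,}')

# search() used as a filter keeps the *whole line* whenever it matches
whole_lines = [line for line in lis if matchkid.search(line)]

# Calling .group() on each Match object keeps only the matched substring
kids = [m.group() for m in map(matchkid.search, lis) if m]

# findall() per line also works and collects every match on each line
kids_all = [s for line in lis for s in matchkid.findall(line)]

print(whole_lines)
print(kids)
```

search() stops at the first match on a line, so the findall() variant is the safer choice if a line could contain more than one KID.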

You can fix the code by calling re.findall on the whole file contents as a single string (text), rather than on the list of lines:

filteredkids = re.findall(r'KID=\w{30,}', text)
filteredkeys = re.findall(r'key: \w{30,}', text)
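If you only need the hex values without the "key: " or "KID=" prefix, a capturing group around just the value makes re.findall return only that part, and two groups make it return tuples. This is a sketch, not part of the original answer; the pairing pattern assumes, as in the sample, that each key is followed by its KID in parentheses on the same line.

```python
import re

# Two sample lines from the question (assumed format: key: <hex> (KID=<hex>))
text = (
    "Whatever ... key: d11001bfa937eee2f84f55a11b207356 (KID=01002d737832455680cffbadf1092baf)\n"
    "Whatever2 ... key: a0ee2d0f8272355f750c5434db85291a (KID=0101bfa0ab9641a0b863ef76519a48d3)"
)

# One capturing group: findall returns only the captured value
kid_values = re.findall(r'KID=(\w{30,})', text)
key_values = re.findall(r'key: (\w{30,})', text)

# Two capturing groups: findall returns one (key, kid) tuple per match
pairs = re.findall(r'key: (\w{30,}) \(KID=(\w{30,})\)', text)

print(kid_values)
print(key_values)
print(pairs)
```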

Here is a full Python demo:

import re
text = """garbage
moregarbaged89849843
MDeduri09ri44830
Some short sentence
Whatever ... key: d11001bfa937eee2f84f55a11b207356 (KID=01002d737832455680cffbadf1092baf)
Whatever2 ... key: a0ee2d0f8272355f750c5434db85291a (KID=0101bfa0ab9641a0b863ef76519a48d3)
Whatever3 ... key: fe216ba17e5af807ce5af8e43cf3c031 (KID=0102900a2bc54111833631ea7bb855ed)
77EB0A2C7C42EDC27A3D26E72A02BB29:01002d737832455680cffbadf1092baf status 'garbage'
blah blah:0101bfa0ab9641a0b863ef76519a48d3 has status 'usable'
77EB0A2C7C42EDC27A3D26E72A02BB29:blah blah"""
filteredkids = re.findall(r'KID=\w{30,}', text)
filteredkeys = re.findall(r'key: \w{30,}', text)
print(filteredkids)
print(filteredkeys)

Output:

['KID=01002d737832455680cffbadf1092baf', 'KID=0101bfa0ab9641a0b863ef76519a48d3', 'KID=0102900a2bc54111833631ea7bb855ed']
['key: d11001bfa937eee2f84f55a11b207356', 'key: a0ee2d0f8272355f750c5434db85291a', 'key: fe216ba17e5af807ce5af8e43cf3c031']

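Statement: the technical post pages on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.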
