
Saving regex matches on a list

I have a file of accounts that looks like this:


<A0001><$241><div class="parent"><img class="img" title="" src="/static/assets/images/thumb/1535.png"width="64" height="64"></div><1231>
<div class="parent"><img class="img" title="" src="/static/assets/images/thumb/1510.png"width="64" height="64"></div>
<div class="parent"><img class="img" title="" src="/static/assets/images/thumb/1403.png"width="64" height="64"></div>
<div class="parent"><img class="img" title="" src="/static/assets/images/thumb/1388.png"width="64" height="64"></div>
<div class="parent"><img class="img" title="" src="/static/assets/images/thumb/1323.png"width="64" height="64"></div>
<div class="parent"><img class="img" title="" src="/static/assets/images/thumb/1322.png"width="64" height="64"></div>
<div class="parent"><img class="img" title="" src="/static/assets/images/thumb/1172.png"width="64" height="64"></div>
<div class="parent"><img class="img" title="" src="/static/assets/images/thumb/1069.png"width="64" height="64"></div>
<div class="parent"><img class="img" title="" src="/static/assets/images/thumb/0966.png"width="64" height="64"></div>
<div class="parent"><img class="img" title="" src="/static/assets/images/thumb/0796.png"width="64" height="64"></div>
<div class="parent"><img class="img" title="" src="/static/assets/images/thumb/1430.png"width="64" height="64"></div>


<A0002><$111><div class="parent"><img class="img" title="" src="/static/assets/images/thumb/1535.png"width="64" height="64"></div><3112>
<div class="parent"><img class="img" title="" src="/static/assets/images/thumb/1510.png"width="64" height="64"></div>
<div class="parent"><img class="img" title="" src="/static/assets/images/thumb/1403.png"width="64" height="64"></div>
<div class="parent"><img class="img" title="" src="/static/assets/images/thumb/1388.png"width="64" height="64"></div>
<div class="parent"><img class="img" title="" src="/static/assets/images/thumb/1323.png"width="64" height="64"></div>
<div class="parent"><img class="img" title="" src="/static/assets/images/thumb/1322.png"width="64" height="64"></div>
<div class="parent"><img class="img" title="" src="/static/assets/images/thumb/1172.png"width="64" height="64"></div>
<div class="parent"><img class="img" title="" src="/static/assets/images/thumb/1069.png"width="64" height="64"></div>
<div class="parent"><img class="img" title="" src="/static/assets/images/thumb/0966.png"width="64" height="64"></div>
<div class="parent"><img class="img" title="" src="/static/assets/images/thumb/0796.png"width="64" height="64"></div>
<div class="parent"><img class="img" title="" src="/static/assets/images/thumb/1430.png"width="64" height="64"></div>
...

As you can see, the number of images is not fixed; it varies from one account to another. I already have a script that uses regex to find an image's name, but how can I find all the images in the file and save them to a list in which each index holds all the image names of a specific account? Like this:


List = [

'src="resources/images/thumb/1129.png"
src="resources/images/thumb/1129.png"
src="resources/images/thumb/1129.png"
src="resources/images/thumb/1129.png"
src="resources/images/thumb/1129.png"
src="resources/images/thumb/1129.png"'

,

'src="resources/images/thumb/1129.png"
src="resources/images/thumb/1129.png"
src="resources/images/thumb/1129.png"
src="resources/images/thumb/1129.png"'


,
'src="resources/images/thumb/1129.png"
src="resources/images/thumb/1129.png"
src="resources/images/thumb/1129.png"
src="resources/images/thumb/1129.png"
src="resources/images/thumb/1129.png"
src="resources/images/thumb/1129.png"
src="resources/images/thumb/1129.png"'

] # And so on


EDITED: I also edited the sample to show how the file may really look, so the split() method may not work at all. Sorry for the misunderstanding.

If I understand you correctly, you want to "group" images under a specific section. For example:

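import re

r1 = re.compile(r"A\d+")           # account marker, e.g. A0001 or <A0001>
r2 = re.compile(r'src="([^"]*)"')  # image path inside src="..."

out, key = {}, None
with open("your_file.txt", "r") as f_in:
    for line in f_in:
        # search() instead of match(): the marker and the src attribute
        # do not have to sit at the very start of the line
        if (m := r1.search(line)):  # walrus operator needs Python 3.8+
            key = m.group()
        # no elif: an account line may also carry the first image
        for src in r2.findall(line):
            out.setdefault(key, []).append(src)

print(out)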

Prints (presumably for the sample input as first posted; the paths differ from the edited sample above):

{
    "A0001": [
        "resources/images/thumb/1634.png",
        "resources/images/thumb/1234.png",
        "resources/images/thumb/1145.png",
        "resources/images/thumb/1243.png",
    ],
    "A0002": [
        "resources/images/thumb/1129.png",
        "resources/images/thumb/1235.png",
    ],
}
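
The question asked for a list with one entry per account rather than a dictionary. A minimal sketch of that conversion, assuming `out` has the shape printed above. The helper name `to_account_list` is hypothetical, and each entry joins the account's image paths with newlines, roughly as in the desired `List` from the question:

```python
def to_account_list(grouped):
    """Return one newline-joined string of image paths per account."""
    # Dicts preserve insertion order (Python 3.7+), so accounts come out
    # in the order they appeared in the file.
    return ["\n".join(images) for images in grouped.values()]


# Sample data copied from the output above.
out = {
    "A0001": [
        "resources/images/thumb/1634.png",
        "resources/images/thumb/1234.png",
        "resources/images/thumb/1145.png",
        "resources/images/thumb/1243.png",
    ],
    "A0002": [
        "resources/images/thumb/1129.png",
        "resources/images/thumb/1235.png",
    ],
}

account_list = to_account_list(out)
print(account_list[1])
```

If a list of lists is more convenient than joined strings, `list(out.values())` gives that directly.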

EDIT: To get only images:

import re

r = re.compile(r'src="([^"]*)"')  # image path inside src="..."

out = []
with open("your_file.txt", "r") as f_in:
    for line in f_in:
        # findall() collects every src value on the line, wherever it appears
        out.extend(r.findall(line))

print(out)

Prints:

[
    "resources/images/thumb/1634.png",
    "resources/images/thumb/1234.png",
    "resources/images/thumb/1145.png",
    "resources/images/thumb/1243.png",
    "resources/images/thumb/1129.png",
    "resources/images/thumb/1235.png",
]
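
For the edited file layout, where the account marker shares a line with the first image, another option is to read the whole file and split it at each marker. A rough sketch, assuming markers always look like `<A0001>` (that is, `<A`, digits, `>`). The function name `images_per_account` and the shortened sample text are made up for illustration:

```python
import re

ACCOUNT = re.compile(r"<A\d+>")     # assumed marker format, e.g. <A0001>
SRC = re.compile(r'src="([^"]*)"')  # image path inside src="..."


def images_per_account(text):
    """Return a list with one list of image paths per account."""
    # Splitting on the marker leaves one chunk per account; the chunk
    # before the first marker belongs to no account and is dropped.
    chunks = ACCOUNT.split(text)[1:]
    return [SRC.findall(chunk) for chunk in chunks]


# Shortened sample in the edited layout from the question.
sample = '''
<A0001><$241><div class="parent"><img class="img" title="" src="/static/assets/images/thumb/1535.png"width="64" height="64"></div><1231>
<div class="parent"><img class="img" title="" src="/static/assets/images/thumb/1510.png"width="64" height="64"></div>

<A0002><$111><div class="parent"><img class="img" title="" src="/static/assets/images/thumb/1535.png"width="64" height="64"></div><3112>
'''

result = images_per_account(sample)
print(result)
```

With a real file, the text would come from something like `open("your_file.txt").read()`. Splitting the whole text rather than iterating line by line means the grouping does not depend on where the line breaks fall.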
