How to use a regular expression to retrieve data in Python?
I have a string defined as:
content = """f(1, 4, 'red', '/color/down1.html');
f(2, 5, 'green', '/color/colorpanel/down2.html');
f(3, 6, 'blue', '/color/colorpanel/colorlibrary/down3.html');"""
Here is the code I tried, but it doesn't work:
results = re.findall(r"f(.*?)", content)
for each in results:
    print(each)
How can I use a regular expression to retrieve the links within the content? Thanks.
You can learn the basics of regular expressions at https://regex101.com/ and http://regexr.com/
In [4]: import re
In [5]: content = "f(1, 4, 'red', '/color/down1.html'); \
...: f(2, 5, 'green', '/color/colorpanel/down2.html'); \
...: f(3, 6, 'blue', '/color/colorpanel/colorlibrary/down3.html');"
In [6]: p = re.compile(r'(?=/).*?(?<=\.html)')
In [7]: p.findall(content)
Out[7]:
['/color/down1.html',
'/color/colorpanel/down2.html',
'/color/colorpanel/colorlibrary/down3.html']
. matches any character (except for line terminators)
*? Quantifier: matches between zero and unlimited times, as few times as possible, expanding as needed (lazy)
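As a rough illustration of the lazy quantifier, and of why the pattern in the question returns only empty strings, here is a minimal sketch. The string `sample` is a made-up, shortened stand-in for the question's data, not part of the original post.

```python
import re

# Hypothetical one-line sample, shortened from the question's data.
sample = "f(1, 'a', '/x/one.html'); f(2, 'b', '/y/two.html');"

# Unescaped parentheses form a capture group, and a lazy .*? with nothing
# after it is satisfied by the empty string, so each match captures "".
attempt = re.findall(r"f(.*?)", sample)

# Greedy: .* runs on to the last ".html" in the string.
greedy = re.findall(r"/.*\.html", sample)

# Lazy: .*? stops at the first ".html" it can reach.
lazy = re.findall(r"/.*?\.html", sample)

print(attempt)  # ['', '']
print(greedy)   # one long match spanning both calls
print(lazy)     # ['/x/one.html', '/y/two.html']
```

The greedy version returns a single match that swallows everything between the first `/` and the last `.html`, which is why the lazy form is used here.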
You can also get just the file name after the last /:
In [8]: p2 = re.compile(r'[^/]*\.html')
In [9]: p2.findall(content)
Out[9]: ['down1.html', 'down2.html', 'down3.html']
[^/] matches a single character that is not /
* Quantifier: matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\. matches the character . literally (an unescaped . would match any character except line terminators)
html matches the characters html literally (case sensitive)
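The difference between an unescaped `.` and an escaped `\.` can be checked with a small sketch. The input `text` below is hypothetical: its second path was invented to look like an `.html` name without actually containing a dot.

```python
import re

# Hypothetical input: the second path only resembles an .html file name.
text = "'/color/down1.html' '/color/downXhtml'"

loose = re.findall(r"[^/]*.html", text)    # . matches any character
strict = re.findall(r"[^/]*\.html", text)  # \. matches only a literal dot

print(loose)   # ['down1.html', 'downXhtml']
print(strict)  # ['down1.html']
```

On the question's data both patterns give the same result, so the escape only matters when the input may contain look-alike names.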
Or, you can extract each complete f(...) call:
In [15]: p3 = re.compile(r"(?=f\().*?(?<=\);)")
In [16]: p3.findall(content)
Out[16]:
["f(1, 4, 'red', '/color/down1.html');",
"f(2, 5, 'green', '/color/colorpanel/down2.html');",
"f(3, 6, 'blue', '/color/colorpanel/colorlibrary/down3.html');"]
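If the individual arguments are needed rather than the raw call text, one possible follow-up is to capture only the argument list and parse it with `ast.literal_eval` from the standard library. This is a sketch that assumes every argument is a plain Python literal (a number or a quoted string), as in the question's data. The names `arg_strings`, `rows`, and `links` are illustrative.

```python
import ast
import re

content = """f(1, 4, 'red', '/color/down1.html');
f(2, 5, 'green', '/color/colorpanel/down2.html');
f(3, 6, 'blue', '/color/colorpanel/colorlibrary/down3.html');"""

# Capture only the text between "f(" and ");" for each call.
arg_strings = re.findall(r"f\((.*?)\);", content)

# Wrap each argument list in parentheses so it parses as a tuple literal.
# Assumption: every argument is a plain literal (number or quoted string).
rows = [ast.literal_eval("(" + args + ")") for args in arg_strings]

# The link is the last argument of each call.
links = [row[-1] for row in rows]

print(rows[0])  # (1, 4, 'red', '/color/down1.html')
print(links)
```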
You can do it like this:
re.findall(r"f\(.*,.*,.*, '(.*)'", content)
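One caveat worth checking: the greedy `.*` pieces in this pattern depend on each call sitting on its own line, because `.` does not cross newlines. A sketch of a variant using negated character classes, which is an alternative and not the answerer's pattern, behaves the same way when several calls share a line. The string `one_line` is a made-up example.

```python
import re

# Hypothetical input with two calls on the same line.
one_line = ("f(1, 4, 'red', '/color/down1.html'); "
            "f(2, 5, 'green', '/color/colorpanel/down2.html');")

# Greedy .* spans both calls, so only the last link is captured.
greedy = re.findall(r"f\(.*,.*,.*, '(.*)'", one_line)

# [^,]* cannot cross a comma and [^']* cannot cross a quote,
# so each call is matched separately.
bounded = re.findall(r"f\([^,]*,[^,]*,[^,]*, '([^']*)'", one_line)

print(greedy)   # ['/color/colorpanel/down2.html']
print(bounded)  # ['/color/down1.html', '/color/colorpanel/down2.html']
```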
You can try it like so:
import re
content = """f(1, 4, 'red', '/color/down1.html');
f(2, 5, 'green', '/color/colorpanel/down2.html');
f(3, 6, 'blue', '/color/colorpanel/colorlibrary/down3.html');"""
print(re.findall(r"(\/[^']+?)'", content))
Output:
['/color/down1.html', '/color/colorpanel/down2.html', '/color/colorpanel/colorlibrary/down3.html']
Regex:
(\/[^']+?)' - match / followed by one or more non-' characters up to the first occurrence of ', and capture them in group 1.
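To make the role of group 1 concrete, here is a small sketch using `re.search` on a single call. The variable names are illustrative, and the `/` is written without the optional backslash, which matches the same text.

```python
import re

line = "f(1, 4, 'red', '/color/down1.html');"
match = re.search(r"(/[^']+?)'", line)

whole = match.group(0)  # full match, including the closing quote
link = match.group(1)   # only the captured group, without the quote

print(repr(whole))
print(repr(link))
```

`re.findall` returns the contents of group 1 when the pattern has exactly one capture group, which is why the closing quote does not appear in the output above.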