How to use regular expressions to retrieve data in Python?

Question

I have a string defined as:

content = """f(1, 4, 'red', '/color/down1.html');
f(2, 5, 'green', '/color/colorpanel/down2.html');
f(3, 6, 'blue', '/color/colorpanel/colorlibrary/down3.html');"""

Here is the code I tried, but it doesn't work:

import re

results = re.findall(r"f(.*?)", content)
for each in results:
    print(each)

How can I use a regular expression to retrieve the links within the content? Thanks.

Answer 1

You can learn the basics of regular expressions at https://regex101.com/ and http://regexr.com/

In [4]: import re

In [5]: content = "f(1, 4, 'red', '/color/down1.html');    \
   ...: f(2, 5, 'green', '/color/colorpanel/down2.html');   \
   ...: f(3, 6, 'blue', '/color/colorpanel/colorlibrary/down3.html');"

In [6]: p = re.compile(r'(?=/).*?(?<=.html)')

In [7]: p.findall(content)
Out[7]: 
['/color/down1.html',
 '/color/colorpanel/down2.html',
 '/color/colorpanel/colorlibrary/down3.html']

In .*?, the . matches any character (except for line terminators).

The *? quantifier matches between zero and unlimited times, as few times as possible, expanding as needed (lazy).
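To make the lazy quantifier concrete, here is a minimal sketch that also shows why the pattern from the question returns only empty strings. The `sample` string is a shortened, hypothetical stand-in for the question's `content`, and the variable names are illustrative only.

```python
import re

# Shortened, hypothetical sample in the same shape as the question's string.
sample = ("f(1, 4, 'red', '/color/down1.html'); "
          "f(2, 5, 'green', '/color/colorpanel/down2.html');")

# In the question's pattern the parentheses form a capture group instead of
# matching a literal "(", and the trailing lazy .*? is satisfied by zero characters.
question_result = re.findall(r"f(.*?)", sample)

# Escaping the parentheses and giving the lazy part something to stop at fixes both.
lazy_result = re.findall(r"f\((.*?)\);", sample)

# The greedy form runs on to the last ");" instead of stopping at the first.
greedy_result = re.findall(r"f\((.*)\);", sample)

print(question_result)
print(lazy_result)
print(greedy_result)
```

The lazy version yields one entry per call, while the greedy version swallows both calls into a single match.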

You can also get just the part after the last /:

In [8]: p2 = re.compile(r'[^/]*.html')

In [9]: p2.findall(content)
Out[9]: ['down1.html', 'down2.html', 'down3.html']

[^/]* matches any character that is not in the bracketed list, here any character other than /.

The * quantifier matches between zero and unlimited times, as many times as possible, giving back as needed (greedy).

Inside the brackets, / matches the character / literally (case sensitive).

. matches any character (except for line terminators). Because the dot is not escaped, it is not limited to a literal dot; write \. to match only a literal dot.

html matches the characters html literally (case sensitive).
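The difference between the unescaped dot and an escaped one is easy to miss, so here is a small sketch. The path `/docs/xhtml` is a made-up input chosen to expose the difference; it is not part of the original question.

```python
import re

# Hypothetical path whose last segment ends in "html" but contains no dot.
path = "/docs/xhtml"

# Unescaped dot: "." accepts the "x", so the segment is still matched.
unescaped = re.findall(r"[^/]*.html", path)

# Escaped dot: a literal "." is required, so nothing matches.
escaped = re.findall(r"[^/]*\.html", path)

# On a real link from the question the escaped form still finds the file name.
file_name = re.findall(r"[^/]*\.html", "/color/down1.html")

print(unescaped)
print(escaped)
print(file_name)
```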

Or you can extract each complete f(...) call:

In [15]: p3 = re.compile(r"(?=f\().*?(?<=\);)")

In [16]: p3.findall(content)
Out[16]: 
["f(1, 4, 'red', '/color/down1.html');",
 "f(2, 5, 'green', '/color/colorpanel/down2.html');",
 "f(3, 6, 'blue', '/color/colorpanel/colorlibrary/down3.html');"]
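If the individual arguments are needed rather than the whole call, capture groups can split them out in one pass. This is only a sketch under the assumption that every call has exactly two integers followed by two single-quoted strings, as in the question; `call_pattern` and `rows` are illustrative names.

```python
import re

content = ("f(1, 4, 'red', '/color/down1.html');    "
           "f(2, 5, 'green', '/color/colorpanel/down2.html');    "
           "f(3, 6, 'blue', '/color/colorpanel/colorlibrary/down3.html');")

# One capture group per argument; with several groups, findall returns tuples.
call_pattern = re.compile(r"f\((\d+),\s*(\d+),\s*'([^']*)',\s*'([^']*)'\);")
rows = call_pattern.findall(content)

for first, second, color, link in rows:
    print(first, second, color, link)
```

All captured values are strings, so the two numbers would still need `int()` if they are to be used numerically.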

Answer 2

You can do it like this:

re.findall(r"f\(.*,.*,.*, '(.*)'", content)
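One caveat worth checking: .* is greedy, so this pattern depends on each f(...) call sitting on its own line (the dot does not match a newline by default). The sketch below compares it with a stricter variant that uses negated character classes; the short links and the names `greedy`, `strict`, `one_line`, and `multi_line` are hypothetical.

```python
import re

greedy = r"f\(.*,.*,.*, '(.*)'"
# Assumed alternative: [^,]* and [^']* cannot run past a comma or a quote.
strict = r"f\([^,]*,[^,]*,[^,]*, '([^']*)'"

multi_line = "f(1, 4, 'red', '/a.html');\nf(2, 5, 'green', '/b.html');"
one_line = "f(1, 4, 'red', '/a.html'); f(2, 5, 'green', '/b.html');"

greedy_multi = re.findall(greedy, multi_line)
greedy_single = re.findall(greedy, one_line)
strict_multi = re.findall(strict, multi_line)
strict_single = re.findall(strict, one_line)

print(greedy_multi, greedy_single)
print(strict_multi, strict_single)
```

On the single-line input the greedy pattern collapses both calls into one match and reports only the last link, while the stricter pattern finds both links in either layout.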

Answer 3

You can try it like this:

import re

content = """f(1, 4, 'red', '/color/down1.html');    
    f(2, 5, 'green', '/color/colorpanel/down2.html');    
    f(3, 6, 'blue', '/color/colorpanel/colorlibrary/down3.html');"""

print(re.findall(r"(\/[^']+?)'", content))

Output:

['/color/down1.html', '/color/colorpanel/down2.html', '/color/colorpanel/colorlibrary/down3.html']

Regex:

(\/[^']+?)' - matches / followed by one or more characters other than ', up to the first occurrence of ', and captures the result (without the closing ') in group 1.
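The role of the capture group can be seen by comparing the whole match with group 1. This is a small sketch on a single call taken from the question; `line` and `match` are illustrative names.

```python
import re

line = "f(1, 4, 'red', '/color/down1.html');"

match = re.search(r"(\/[^']+?)'", line)

# group(0) is the whole match, including the closing quote.
whole = match.group(0)
# group(1) is only what the parentheses captured; findall returns this part.
captured = match.group(1)

print(whole)
print(captured)
```

Because the pattern contains exactly one group, re.findall returns the group 1 text for each match, which is why the output list has no trailing quotes.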

Answer 1
1, accepted, 2017-02-11 08:20:34

Answer 2
0, 2017-02-11 08:30:46

Answer 3
0, 2017-02-11 09:57:26
