Matching the beginning and end of a string in Python with regex

Question

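I am trying to use Python to pull the parsable-cite information out of this webpage. For example, for the page listed, I would pull pl/111/148 and pl/111/152. My current regex is listed below, but it seems to return everything after parsable-cite. It is probably something simple, but I am relatively new to regexes. Thanks in advance.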

re.findall(r'^parsable-cite=.*>$',page)

Answer 1

I strongly recommend using this regex to capture what you want:

re.findall(r'parsable-cite=\\\"(.*?)\\\"\>',page)

Explanation:

parsable-cite= matches the characters parsable-cite= literally (case sensitive)
  \\ matches the character \ literally
  \" matches the character " literally
  1st Capturing group (.*?)
  .*? matches any character (except newline)
      Quantifier: Between zero and unlimited times, as few times as possible,
           expanding as needed
  \\ matches the character \ literally
  \" matches the character " literally
  \> matches the character > literally

Using ? is the key ;)

Hope this helps.
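A minimal sketch of the pattern above in use. The `page` string is a made-up sample, assuming the HTML is embedded in JSON so that every quote appears as `\"`:

```python
import re

# Hypothetical sample: HTML embedded in JSON, so each quote shows up as \"
page = (
    r'<external-xref parsable-cite=\"pl/111/148\">Public Law 111-148</external-xref> '
    r'<external-xref parsable-cite=\"pl/111/152\">Public Law 111-152</external-xref>'
)

# \\ matches a literal backslash, \" a literal quote;
# (.*?) lazily captures everything up to the next \">
cites = re.findall(r'parsable-cite=\\\"(.*?)\\\"\>', page)
print(cites)
```

Because the pattern has one capturing group, `re.findall` returns only the captured values rather than the whole match.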

Answer 2

Make your regex lazy:

re.findall(r'^parsable-cite=.*?>$',page)
                              ^

Or use a negated class (preferable):

re.findall(r'^parsable-cite=[^>]*>$',page)

.* is greedy by default and will match as much as it possibly can before concluding the match.
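To see the difference, here is a small sketch on a made-up one-line sample (the `line` string is an assumption, and the anchors are left out so both tags can match):

```python
import re

# Hypothetical one-line sample containing two tags
line = 'parsable-cite="pl/111/148">first</a> parsable-cite="pl/111/152">second</a>'

# Greedy: .* runs to the end of the line, then backs up to the last >
greedy = re.findall(r'parsable-cite=.*>', line)

# Lazy: .*? stops at the first > it can reach
lazy = re.findall(r'parsable-cite=.*?>', line)

print(len(greedy), greedy)
print(len(lazy), lazy)
```

The greedy version yields a single match spanning the whole line, while the lazy version yields one short match per tag.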

regex101 demo

If you want to get only the part you need, you can use a capture group:

re.findall(r'^parsable-cite=([^>]*)>$',page)

regex101 demo

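However, judging from the layout of your webpage, it does not seem that you need the anchors (^ and $) at all (unless the newlines are somehow stripped on the site...)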
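A short sketch of why the anchors get in the way, using a made-up `page` string in which the attribute sits in the middle of a line. Without `re.MULTILINE`, `^` only matches at the start of the whole string; with it, `^` also matches after each newline, which still does not help here:

```python
import re

# Hypothetical sample: the attribute is in the middle of a line
page = 'intro\n<external-xref parsable-cite="pl/111/148">text</external-xref>\noutro'

# ^ and $ anchor to the start and end of the whole string
anchored = re.findall(r'^parsable-cite=([^>]*)>$', page)

# With re.MULTILINE they anchor to each line, but no line starts with parsable-cite
anchored_ml = re.findall(r'^parsable-cite=([^>]*)>$', page, re.MULTILINE)

# Without anchors the attribute is found wherever it occurs
unanchored = re.findall(r'parsable-cite=([^>]*)>', page)

print(anchored, anchored_ml, unanchored)
```

Note that the captured value here still includes the surrounding quotes, since `[^>]*` takes everything up to the `>`.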

Answer 3

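The .* you have is "greedy", which means it will match as much as it can, including any number of > characters and whatever comes after them.

If what you really want is "everything up to the next >", then you should say [^>]*>, which means "any number of non-> characters, followed by a >".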
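As a rough illustration on made-up strings (both samples are assumptions, not taken from the actual page), the negated class stops at the first `>`, and unlike `.` it also matches newlines, so a tag split across lines is still covered:

```python
import re

# Hypothetical samples
same_line = 'parsable-cite="pl/111/148">a > b'
split_tag = 'parsable-cite="pl/111/152"\n  id="x">c'

# Greedy .* swallows the first > and stops only at the last one
greedy = re.findall(r'parsable-cite=.*>', same_line)

# [^>]* cannot cross a >, so the match ends at the first one
negated = re.findall(r'parsable-cite=[^>]*>', same_line)

# [^>] also matches a newline, so the split tag is matched in full
negated_split = re.findall(r'parsable-cite=[^>]*>', split_tag)

print(greedy, negated, negated_split)
```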

Answer 4

Maybe something like this:

(?<=parsable-cite=\\\")\w{2}\/\d{3}\/\d{3}

http://regex101.com/r/kE9uE3
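A possible way to use this lookbehind from Python, again on a made-up `page` sample that assumes JSON-escaped quotes. The lookbehind only checks what precedes the match, so `re.findall` returns the cite itself without needing a capture group:

```python
import re

# Hypothetical sample with JSON-escaped quotes
page = (
    r'<external-xref parsable-cite=\"pl/111/148\">Public Law 111-148</external-xref> '
    r'<external-xref parsable-cite=\"pl/111/152\">Public Law 111-152</external-xref>'
)

# (?<=...) asserts that parsable-cite=\" comes right before the match
pattern = re.compile(r'(?<=parsable-cite=\\\")\w{2}\/\d{3}\/\d{3}')
cites = pattern.findall(page)
print(cites)
```

This pattern is tied to the shape of the value: exactly two word characters followed by two three-digit groups. Cites with a different layout would be skipped.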

Answer 5

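Although this is a JSON string with HTML embedded in it, you can still use BeautifulSoup for this purpose:

import re
from bs4 import BeautifulSoup

# htmls is assumed to hold the HTML text taken from the page
soup = BeautifulSoup(htmls, "html.parser")
tags = soup.find_all("external-xref", {"parsable-cite": re.compile("")})
for t in tags:
    print(t["parsable-cite"])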
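For comparison, a sketch of the same idea using only the standard library instead of BeautifulSoup: decode the JSON first, then walk the HTML with `html.parser.HTMLParser`. The payload and its `"html"` key are assumptions made up for this example, as is the `CiteCollector` class name:

```python
import json
from html.parser import HTMLParser

# Hypothetical JSON payload; the "html" key is assumed for illustration
raw = (
    r'{"html": "<p><external-xref parsable-cite=\"pl/111/148\">A</external-xref>'
    r'<external-xref parsable-cite=\"pl/111/152\">B</external-xref></p>"}'
)


class CiteCollector(HTMLParser):
    """Collects the parsable-cite attribute of every start tag that has one."""

    def __init__(self):
        super().__init__()
        self.cites = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag
        for name, value in attrs:
            if name == "parsable-cite":
                self.cites.append(value)


# json.loads turns each \" back into a plain quote
html_text = json.loads(raw)["html"]
collector = CiteCollector()
collector.feed(html_text)
print(collector.cites)
```

Decoding the JSON up front means the parser sees ordinary HTML, so no backslashes need to be handled by hand.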

Answer 6

If the value sits between \\" delimiters, this might work:

 #  \bparsable-cite\s*=\s*\"((?s:(?!\").)*)\"

 \b 
 parsable-cite
 \s* = \s* 
 \"
 (                             # (1 start)
      (?s:
           (?! \" )
           . 
      )*
 )                             # (1 end)
 \"

Or just

 #  (?s)\bparsable-cite\s*=\s*\"(.*?)\"

 (?s)
 \b 
 parsable-cite
 \s* = \s* 
 \"
 ( .*? )                 # (1)
 \"
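One way the expanded layout above could be written in Python is with `re.VERBOSE`, which ignores whitespace and allows comments inside the pattern, combined with `re.DOTALL` in place of the inline `(?s)`. The `text` sample is made up and assumes plain `"` quotes, for example after the JSON has been decoded:

```python
import re

# Hypothetical sample with plain quotes and optional spaces around =
text = (
    '<external-xref parsable-cite = "pl/111/148">A</external-xref>\n'
    '<external-xref parsable-cite="pl/111/152">B</external-xref>'
)

pattern = re.compile(r'''
    \b parsable-cite
    \s* = \s*
    \"
    ( .*? )      # group 1: the attribute value
    \"
''', re.VERBOSE | re.DOTALL)

cites = pattern.findall(text)
print(cites)
```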

Answer 7

If you think it will be very similar every time:

re.findall(r"pl/\d+/\d+", page)
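A quick sketch of this simpler pattern on a made-up `page` sample. It relies on the assumption that every cite starts with `pl/` followed by two slash-separated numbers:

```python
import re

# Hypothetical sample; the link text uses a hyphen, so it is not picked up
page = (
    r'<external-xref parsable-cite=\"pl/111/148\">Public Law 111-148</external-xref> '
    r'<external-xref parsable-cite=\"pl/111/152\">Public Law 111-152</external-xref>'
)

cites = re.findall(r"pl/\d+/\d+", page)
print(cites)
```

Since the pattern ignores the attribute name, it would also pick up the same shape anywhere else on the page.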

7 answers

Answer 1: 2, accepted, 2014-03-27 21:30:57
Answer 2: 1, 2014-03-27 21:00:40
Answer 3: 1, 2014-03-27 21:00:56
Answer 4: 1, 2014-03-27 21:07:05
Answer 5: 1, 2014-03-27 21:07:09
Answer 6: 1, 2014-03-27 21:35:14
Answer 7: 1, 2014-03-27 21:35:42
