ValueError: too many values to unpack (expected 3) - regex match - Python

Question

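In my Python code I have a string, and I am trying to check whether it contains a particular pattern (names followed by a number). To do this, I use re.match and then call groups() on the result to get the desired output, like this: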

authors_and_year = re.match(r'((.*)\. (\d{4})\.)', line)
texts, authors, year = authors_and_year.groups()

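So if I have a string like this:

Regina Barzilay and Lillian Lee. 2004. Catching the drift: Probabilistic content models, with applications to generation and summarization. In Proceedings of NAACL-HLT.

it returns this (as expected):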

('Regina Barzilay and Lillian Lee. 2004.', 'Regina Barzilay and Lillian Lee', '2004')

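But in some cases I have a string like this:

J. Cohen. 1968a. Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Volume 70, pages 213-220

or this:

Ralph Weischedel, Jinxi Xu, and Ana Licuanan. 1968b. A hybrid approach to answering biographical questions. In Mark Maybury, editor, New Directions In Question Answering, chapter 5. AAAI Press

where the year is followed by a letter, so the regex above fails. To handle this case, I tried adding a second alternative to the regex, like this: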

authors_and_year = re.match('((.*)\. (\d{4})\.|(.*)\. (\d{4})(a-z){1}\.)', line)
texts, authors, year = authors_and_year.groups()

But it gives me this error:

ValueError: too many values to unpack (expected 3)

When I inspect the groups of authors_and_year, I get this:

('Regina Barzilay and Lillian Lee. 2004.', 'Regina Barzilay and Lillian Lee', '2004', None, None, None)

I don't know where the last 3 None values come from. Can anyone point out what I am doing wrong here?

Answer 1

Your regex can be simplified to ((.*)\.[ ](\d{4})[a-z]?\.)
This makes the letter after the year optional while keeping the 3 capturing groups.
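
As a rough check, the pattern can be run against shortened versions of the three sample lines from the question. This is only a sketch: the helper name parse_citation and the truncated sample strings are illustrative assumptions, not part of the original code.

```python
import re

# Pattern from this answer: the letter after the year is optional and is not
# captured, so groups() always yields exactly three values.
PATTERN = re.compile(r'((.*)\.[ ](\d{4})[a-z]?\.)')

def parse_citation(line):
    """Return (text, authors, year), or None when the line does not match."""
    m = PATTERN.match(line)
    return m.groups() if m else None

# Shortened sample lines (hypothetical abbreviations of the question's strings).
lines = [
    "Regina Barzilay and Lillian Lee. 2004. Catching the drift.",
    "J. Cohen. 1968a. Weighted kappa.",
    "Ralph Weischedel, Jinxi Xu, and Ana Licuanan. 1968b. A hybrid approach.",
]
for line in lines:
    result = parse_citation(line)
    if result is not None:
        text, authors, year = result  # unpacking three values is now safe
        print(authors, year)
```

With this pattern the year group holds only the four digits. If the trailing letter should be kept as part of the year, moving it inside the group, as in (\d{4}[a-z]?), should do it. Checking for None before unpacking also avoids a different failure when a line does not match at all.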

Answer 2

This is how groups work with |. The None values come from the second alternative. Look:

>>> re.match('(foo)|(bar)', 'foo').groups()
('foo', None)
>>> re.match('(foo)|(bar)', 'bar').groups()
(None, 'bar')
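
The same effect can be seen on the pattern from the question. The sketch below assumes a shortened sample line; it uses the groups attribute of a compiled pattern, which reports how many capturing groups the pattern contains.

```python
import re

# The pattern from the question: one outer group plus five inner ones.
question_pattern = re.compile(r'((.*)\. (\d{4})\.|(.*)\. (\d{4})(a-z){1}\.)')
print(question_pattern.groups)  # 6

# Shortened sample line (hypothetical abbreviation of the question's string).
m = question_pattern.match("Regina Barzilay and Lillian Lee. 2004. Catching the drift.")
print(len(m.groups()))  # 6, whichever alternative matched
print(m.groups()[3:])   # (None, None, None): groups of the unused alternative
```

The tuple returned by groups() has one entry per capturing group in the whole pattern, so unpacking it into three names fails. Note also that (a-z) matches the literal text a-z; a character class needs square brackets, as in [a-z].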

You can filter out the groups that did not match:

>>> [group for group in re.match('(foo)|(bar)', 'foo').groups() if group is not None]
['foo']
>>> [group for group in re.match('(foo)|(bar)', 'bar').groups() if group is not None]
['bar']

Or you can use named groups:

>>> match = re.match('(?P<first>foo)|(?P<second>bar)', 'foo')
>>> res = match.groupdict()["first"] or match.groupdict()["second"]
>>> res
'foo'
>>> match = re.match('(?P<first>foo)|(?P<second>bar)', 'bar')
>>> res = match.groupdict()["first"] or match.groupdict()["second"]
>>> res
'bar'

If empty matches are possible (a group equal to the empty string), this code won't work; you need to do something like

...
res = match.groupdict()["first"]
if res is None:
    res = match.groupdict()["second"]
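
To make the difference concrete, here is a small sketch with a made-up pattern whose first alternative can match the empty string. The pattern and the helper name first_matched are assumptions chosen for illustration only.

```python
import re

def first_matched(match):
    """Return the named group that took part in the match, keeping empty strings."""
    groups = match.groupdict()
    res = groups["first"]
    if res is None:
        res = groups["second"]
    return res

# Hypothetical pattern: x* succeeds with an empty match at the start of 'bar',
# so "first" is '' and "second" is None.
m = re.match(r'(?P<first>x*)|(?P<second>bar)', 'bar')

shortcut = m.groupdict()["first"] or m.groupdict()["second"]
print(repr(shortcut))          # None: the empty string is falsy, so `or` skips it
print(repr(first_matched(m)))  # '': the explicit None check keeps the empty match
```

The explicit comparison with None distinguishes "this group did not participate" from "this group matched nothing", which the or shortcut cannot do.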

Problem description

2 solutions

Solution 1
2, accepted

Solution 2
1, 2020-07-08 21:37:28
