How to fix my re.compile statement in Python

Question

I have a text file and I am using re to locate a specific section of text (a list containing water usage in different towns) and put the information into a pandas DataFrame. The list items are labelled with letters, e.g. (a), (b), (c), etc. The code works fine and returns all the information I need into the DataFrame until the labelling switches to double letters, e.g. (aa), (ab), (ac), etc.

How can I fix my re statement so that it also works for the double-letter indexes in the text list?

Here is the code:

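import pandas as pd
import regex  # third-party module; unlike the built-in re, it allows variable-width lookbehind

def create_df(finalText):  # enclosing function; the name is assumed (taken from Answer 2)
    pattern = regex.compile(r'\d+ (?=ML\/year)|(?<= in the |the )[\w \/\(\)]+')
    columns = ('Water Usage', 'Town')

    res = [dict(zip(columns, pattern.findall(line))) for line in finalText.splitlines() if pattern.match(line)]
    df = pd.DataFrame(res)

    return df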

And here is an example of the text:

(w) 218 ML/year in the Murrumbidgee I Water Source,
(x) 133 ML/year in the Murrumbidgee II Water Source,
(y) 116 ML/year in the Murrumbidgee III Water Source,
(z) 73 ML/year in the Murrumbidgee North Water Source,
(aa) 476 ML/year in the Murrumbidgee Western Water Source,
(ab) 92 ML/year in the Muttama Water Source,
(ac) 150 ML/year in the Numeralla East Water Source,

As I said, it works for all rows with single-letter indexes but not for those with double letters.

Answer 1

You can use https://regex101.com/ or https://regexr.com/ to troubleshoot your regular expression. Here's one that matches the key components:

^\([^)]+\)\s+(\S+)\s+(.*\/year)\s+in the\s+(.*),
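A minimal sketch of Answer 1's regex in use, with Python's built-in re module and single backslashes in a raw string. It assumes each entry sits on its own line and ends with a comma, as in the sample text; parse_usage, the 'Unit' key and the int() conversion are illustrative choices, not part of the question or the answer. Because [^)]+ accepts a label of any length, (aa) is handled the same way as (z).

```python
import re

# Answer 1's pattern as a raw string, so each backslash reaches re intact.
# Group 1: the amount, group 2: the unit ending in "/year", group 3: the water source.
# [^)]+ accepts a label of any length, so "(aa)" is handled like "(z)".
LINE_RE = re.compile(r'^\([^)]+\)\s+(\S+)\s+(.*\/year)\s+in the\s+(.*),')


def parse_usage(text):
    """Return one dict per matching line (hypothetical helper for illustration)."""
    rows = []
    for line in text.splitlines():
        # The pattern is anchored with ^, so match() and search() behave the same here.
        m = LINE_RE.match(line)
        if m:
            amount, unit, town = m.groups()
            rows.append({'Water Usage': int(amount), 'Unit': unit, 'Town': town})
    return rows


sample = """(z) 73 ML/year in the Murrumbidgee North Water Source,
(aa) 476 ML/year in the Murrumbidgee Western Water Source,
(ab) 92 ML/year in the Muttama Water Source,"""

for row in parse_usage(sample):
    print(row)
```

The returned list of dicts can be passed straight to pd.DataFrame, as the question's code does with res.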

Answer 2

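Python's built-in re module doesn't allow variable-width patterns in lookbehind assertions, so the lookbehind (?<= in the |the ) has to become a single fixed-width one such as (?<= in the). With that corrected, using search() instead of match() makes it work: match() only tries the pattern at the very start of each line, where the label such as (aa) sits, while search() scans the whole line.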

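import re
import pandas as pd

def create_df(finalText):
    # Single fixed-width (7-character) lookbehind, which the built-in re module accepts
    pattern = re.compile(r'\d+ (?=ML\/year)|(?<= in the)[\w \/\(\)]+')
    columns = ('Water Usage', 'Town')
    # search() scans the whole line; match() would only try the start, where the label is
    res = [dict(zip(columns, pattern.findall(line))) for line in finalText.splitlines() if pattern.search(line)]
    df = pd.DataFrame(res)
    return df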
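A standard-library check of the two points in Answer 2 (pandas is left out so the snippet runs on its own). The first part confirms that re refuses the question's lookbehind, whose alternatives are 8 and 4 characters wide; the third-party regex module imported in the question accepts it, which is why the original code compiled. The second part shows match() returning None on a double-letter line while search() finds it, and what findall() then returns. The leftover spaces in '476 ' and ' Murrumbidgee ...' come from the pattern itself; the .strip() step is an optional addition, not part of the answer.

```python
import re

# The question's lookbehind mixes alternatives of different widths
# (" in the " is 8 characters, "the " is 4), which re refuses to compile.
try:
    re.compile(r'\d+ (?=ML\/year)|(?<= in the |the )[\w \/\(\)]+')
    variable_width_ok = True
except re.error as exc:
    variable_width_ok = False
    print('re.error:', exc)

# Answer 2's pattern: one fixed-width (7-character) lookbehind.
pattern = re.compile(r'\d+ (?=ML\/year)|(?<= in the)[\w \/\(\)]+')
line = '(aa) 476 ML/year in the Murrumbidgee Western Water Source,'

print(pattern.match(line))               # None: match() only tries position 0, where "(aa)" is
print(pattern.search(line) is not None)  # True: search() scans the whole line

parts = pattern.findall(line)
print(parts)  # ['476 ', ' Murrumbidgee Western Water Source']

# Optional clean-up: strip the stray spaces before building the row.
row = dict(zip(('Water Usage', 'Town'), (p.strip() for p in parts)))
print(row)
```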

2 answers

Answer 1
0 2020-09-08 02:38:04

Answer 2
0 2020-09-28 10:19:05