How can I fix my re compile statement in Python
I have a text file, and I am using re to locate a specific section of the text (a list of water usage in different towns) and put the information into a pandas DataFrame. The list is ordered with letters, e.g. (a), (b), (c). The code works fine and returns everything I need into the DataFrame until the ordering switches to double letters, e.g. (aa), (ab), (ac).
How can I fix my re statement so that it also works for double-lettered indexes in the text list?
Here is the code:
pattern = regex.compile('\d+ (?=ML\/year)|(?<= in the |the )[\w \/\(\)]+')
columns = ('Water Usage', 'Town')
res = [dict(zip(columns, pattern.findall(line))) for line in finalText.splitlines() if pattern.match(line)]
df = pd.DataFrame(res)
return df
And here is an example of the text:
(w) 218 ML/year in the Murrumbidgee I Water Source,
(x) 133 ML/year in the Murrumbidgee II Water Source,
(y) 116 ML/year in the Murrumbidgee III Water Source,
(z) 73 ML/year in the Murrumbidgee North Water Source,
(aa) 476 ML/year in the Murrumbidgee Western Water Source,
(ab) 92 ML/year in the Muttama Water Source,
(ac) 150 ML/year in the Numeralla East Water Source,
As I said, it works for all the rows with single-letter indexes but not for the rows with double-letter indexes.
You can use https://regex101.com/ or https://regexr.com/ to troubleshoot your regular expression. Here's one that matches the key components:
^\([^)]+\)\s+(\S+)\s+(.*\/year)\s+in the\s+(.*),
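As a rough sketch of how that pattern could be applied line by line, the snippet below writes it as a Python raw string (single backslashes) and collects the three capture groups into dictionaries. The helper name `parse_rows` and the extra `Unit` key are assumptions made for illustration; the `Water Usage` and `Town` keys follow the question.

```python
import re

# The pattern from the answer, as a raw string so backslashes are not doubled.
LINE_RE = re.compile(r'^\([^)]+\)\s+(\S+)\s+(.*/year)\s+in the\s+(.*),')

SAMPLE = """\
(z) 73 ML/year in the Murrumbidgee North Water Source,
(aa) 476 ML/year in the Murrumbidgee Western Water Source,
(ab) 92 ML/year in the Muttama Water Source,
"""

def parse_rows(text):
    """Hypothetical helper: return one dict per line that fits the pattern."""
    rows = []
    for line in text.splitlines():
        # [^)]+ accepts any index length, so (z) and (aa) are treated alike.
        m = LINE_RE.match(line.strip())
        if m:
            amount, unit, source = m.groups()
            rows.append({'Water Usage': amount, 'Unit': unit, 'Town': source})
    return rows

rows = parse_rows(SAMPLE)
```

The resulting list of dictionaries can then be passed to pd.DataFrame(rows) in the same way as in the question.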
Python's re module doesn't allow variable-width patterns in lookbehind assertions. After correcting that, if you had used search() instead of match(), it would have worked: match() only tries the pattern at the very start of each line, whereas search() scans the whole line.
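import re
import pandas as pd

def create_df(finalText):
    # Fixed-width lookbehind, so the standard re module accepts it.
    pattern = re.compile(r'\d+ (?=ML\/year)|(?<= in the)[\w \/\(\)]+')
    columns = ('Water Usage', 'Town')
    # search() scans the whole line; match() would only try position 0.
    res = [dict(zip(columns, pattern.findall(line))) for line in finalText.splitlines() if pattern.search(line)]
    df = pd.DataFrame(res)
    return df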
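To make the two points in this answer concrete, here is a small check that uses only the standard library and one sample line from the question. It is a sketch, not part of the original answer: the variable names are made up for illustration, and the `.strip()` call is an addition to trim the space that the pattern leaves around each value.

```python
import re

line = "(aa) 476 ML/year in the Murrumbidgee Western Water Source,"

# re rejects a lookbehind whose alternatives differ in width
# (" in the " is 8 characters, "the " is 4).
try:
    re.compile(r'(?<= in the |the )\w+')
    variable_width_ok = True
except re.error:
    variable_width_ok = False

# Same pattern as in create_df, with the fixed-width lookbehind.
pattern = re.compile(r'\d+ (?=ML\/year)|(?<= in the)[\w \/\(\)]+')

at_start = pattern.match(line)    # only tries index 0, where the line has "("
anywhere = pattern.search(line)   # scans forward until something matches

# findall() keeps the adjacent spaces, so trim them before building rows.
values = [v.strip() for v in pattern.findall(line)]
```

Here `at_start` is None while `anywhere` is a match object, which is why filtering lines with match() drops them and filtering with search() keeps them.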