Extract numbers before a list of words in a pandas dataframe using regex

Question

I want to extract only the numbers that appear before a list of specific words, then put the extracted numbers in a new column.

The list of words is: l = ["car", "truck", "van"]. I only list the singular forms here, but the extraction should also apply to plurals.

import pandas as pd

df = pd.DataFrame(columns=["description"], data=[["have 3 cars"], ["a 1-car situation"], ["may be 2 trucks"]])

We can call the new column for the extracted numbers df["extracted_num"].

Thank you!

Answer 1

You can use Series.str.extract:

l = ["car", "truck", "van"]

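# raw f-string so "\d" and "\s" reach the regex engine without escape warnings
pat = rf"(\d+)[\s-](?:{'|'.join(l)})"
# expand=False returns a Series, since the pattern has a single capture group
df['extracted_num'] = df['description'].str.extract(pat, expand=False)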

Output:

>>> print(pat)
(\d+)[\s-](?:car|truck|van)

>>> df

         description extracted_num
0        have 3 cars             3
1  a 1-car situation             1
2    may be 2 trucks             2
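Note that str.extract returns strings (and NaN for rows without a match), so extracted_num above holds text, not integers. A minimal sketch of converting it to a nullable integer column, assuming a numeric dtype is wanted; the fourth row "no vehicles here" is a hypothetical addition that shows the no-match case:

```python
import pandas as pd

l = ["car", "truck", "van"]
df = pd.DataFrame(
    columns=["description"],
    data=[["have 3 cars"], ["a 1-car situation"], ["may be 2 trucks"],
          ["no vehicles here"]],  # hypothetical row with no match
)

pat = rf"(\d+)[\s-](?:{'|'.join(l)})"
extracted = df["description"].str.extract(pat, expand=False)  # strings / NaN

# to_numeric gives float64 here (because of the NaN); "Int64" keeps whole
# numbers and stores the missing value as <NA>
df["extracted_num"] = pd.to_numeric(extracted, errors="coerce").astype("Int64")
print(df)
print(df["extracted_num"].dtype)
```

The nullable "Int64" dtype is used instead of plain int64 because a regular integer column cannot hold a missing value.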

Explanation:

(\d+) - matches one or more digits and captures them as a group;
[\s-] - matches a single whitespace character or a hyphen;
(?:{'|'.join(l)}) - matches any word from the list l without capturing it. Because the word is not required to end there, plurals such as "cars" and "trucks" match as well.
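Since the pattern only requires the number to be followed by the start of a listed word, it also matches words such as "carpets", and it is case-sensitive. A stricter sketch, assuming matches should end at a word boundary (with an optional plural "s") and ignore case; the rows "bought 4 carpets" and "2 Vans" are hypothetical additions for comparison:

```python
import re
import pandas as pd

l = ["car", "truck", "van"]
df = pd.DataFrame(
    columns=["description"],
    data=[["have 3 cars"], ["a 1-car situation"], ["may be 2 trucks"],
          ["bought 4 carpets"], ["2 Vans"]],  # last two rows are hypothetical
)

# Pattern from the answer: a prefix match, so "4 carpets" matches "car"
loose = rf"(\d+)[\s-](?:{'|'.join(l)})"

# Stricter variant (an assumption about the intended meaning):
#   \b         the number starts at a word boundary
#   re.escape  protects against regex metacharacters in the word list
#   s?         optional plural "s"
#   \b         the word must end here, so "carpets" is rejected
words = "|".join(map(re.escape, l))
strict = rf"\b(\d+)[\s-](?:{words})s?\b"

df["loose"] = df["description"].str.extract(loose, expand=False)
df["strict"] = df["description"].str.extract(strict, flags=re.IGNORECASE, expand=False)
print(df)
```

On these rows the loose column picks up 4 from "bought 4 carpets" and misses "2 Vans", while the strict column does the opposite.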

Answer 1: accepted, 2022-08-09 22:20:32 (score: 2)
