Use regex to extract number before a list of words in pandas dataframe
I want to extract only the numbers that appear directly before any word from a specific list, then put the extracted numbers in a new column.
The list of words is:
l = ["car", "truck", "van"]
I only listed the singular forms here, but it should also apply to the plural forms.
df = pd.DataFrame(columns=["description"], data=[["have 3 cars"], ["a 1-car situation"], ["may be 2 trucks"]])
The new column for the extracted numbers can be called df["extracted_num"].
Thank you!
You can use Series.str.extract:
l = ["car", "truck", "van"]
pat = rf"(\d+)[\s-](?:{'|'.join(l)})"
df['extracted_num'] = df['description'].str.extract(pat)
Output:
>>> print(pat)
(\d+)[\s-](?:car|truck|van)
>>> df
         description extracted_num
0        have 3 cars             3
1  a 1-car situation             1
2    may be 2 trucks             2
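Note that the values in the output above are strings, and rows with no match come back as NaN. A minimal sketch of converting the column to numbers is shown here; the third sample row and the choice of the nullable Int64 dtype are assumptions added for illustration, not part of the original answer.

```python
import pandas as pd

# Sample data; the last row is a made-up example with no matching number
df = pd.DataFrame(
    {"description": ["have 3 cars", "a 1-car situation", "no vehicles here"]}
)
pat = r"(\d+)[\s-](?:car|truck|van)"

# expand=False returns a Series when the pattern has a single capture group
extracted = df["description"].str.extract(pat, expand=False)

# Convert the extracted strings to numbers; unmatched rows stay missing (<NA>)
df["extracted_num"] = pd.to_numeric(extracted).astype("Int64")
print(df)
```

If missing values are not expected in your data, a plain astype(int) on the extracted strings would also work, but it raises an error as soon as one row has no match.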
Explanation:
(\d+) - Matches one or more digits and captures them as a group;
[\s-] - Matches a single whitespace character or hyphen;
(?:{'|'.join(l)}) - Matches any word from the list l without capturing it.
Because the pattern does not require anything after the word, plural forms such as "cars" and "trucks" match as well.
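One caveat: since the pattern only checks that the text starts with one of the words, it would also pick up the number in something like "4 carpets". If that matters for your data, a stricter variant might look like the sketch below. The optional s, the trailing \b word boundary, the use of re.escape, and the "bought 4 carpets" row are all assumptions added for illustration, not part of the original answer.

```python
import re
import pandas as pd

l = ["car", "truck", "van"]

# Escape each word in case the list ever contains regex metacharacters
words = "|".join(re.escape(w) for w in l)

# s? allows a simple plural; \b requires the word to end there
strict_pat = rf"(\d+)[\s-](?:{words})s?\b"

df = pd.DataFrame(
    {
        "description": [
            "have 3 cars",
            "a 1-car situation",
            "may be 2 trucks",
            "bought 4 carpets",  # hypothetical row that should not match
        ]
    }
)
df["extracted_num"] = df["description"].str.extract(strict_pat, expand=False)
print(df)
```

The s? only covers plurals formed by adding "s"; irregular plurals would need to be listed explicitly in l.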