How to return matched keywords in pandas str.contains using the regex parameter?
This is my sample code:
import pandas as pd
df = pd.DataFrame({'A':
['btcrr',
'You have crypto here',
'coinbase.com was there ',
'hotwalletint']
})
regex = r"(^|\W)(?:btc|crypto|coinbase|hotwallet)[^A-Za-z0-9]"
tagged_df = df[df['A'].str.contains(regex, na=False, regex=True, case=False)]
The output of tagged_df:
A
1 You have crypto here
2 coinbase.com was there
In this case, only the rows that match the regex I gave are returned. But I want pandas to return the matched keyword instead of the whole row. I am expecting something like this in tagged_df.
The expected output of tagged_df:
A
1 crypto
2 coinbase.com
If pandas does not have this ability, please suggest alternatives that can solve this case.
Use pandas.Series.str.extract(). For each capture group in the regular expression (a non-capture group is just a group with ?: at the beginning, e.g. (?:abc)), a new column will be created containing the matched value for that group, for that row. You can also add ?P<your_name> to the very beginning of a capture group to name the output column associated with that group:
new_df = df['A'].str.extract(r'(?:^|\W)(?P<A>btc|crypto|coinbase|hotwallet)[^A-Za-z0-9]')
Output:
>>> new_df
A
0 NaN
1 crypto
2 coinbase
3 NaN
>>> new_df.dropna()
A
1 crypto
2 coinbase
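Note that the expected output in the question shows coinbase.com, whereas the pattern above captures only coinbase. Below is a minimal sketch of one way to close that gap. It assumes the keyword may be followed by a single dot-separated suffix such as .com (this suffix rule is an assumption, not something stated in the question), and it passes flags=re.IGNORECASE to mirror the case=False used with str.contains:

```python
import re

import pandas as pd

df = pd.DataFrame({'A': ['btcrr',
                         'You have crypto here',
                         'coinbase.com was there ',
                         'hotwalletint']})

# Same keywords as above. The named group now also takes an optional
# ".suffix" (assumed rule: a dot followed by letters, e.g. ".com").
pattern = (r'(?:^|\W)'
           r'(?P<A>(?:btc|crypto|coinbase|hotwallet)(?:\.[a-z]+)?)'
           r'[^A-Za-z0-9]')

# extract() returns NaN for rows without a match; dropna() removes them
# while keeping the original row index.
tagged_df = df['A'].str.extract(pattern, flags=re.IGNORECASE).dropna()
print(tagged_df)
```

With the sample data this should leave rows 1 and 2 holding crypto and coinbase.com. If a row can contain several keywords, str.extract() only reports the first match per row; pandas.Series.str.extractall() may be a better fit in that case.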