Pandas DataFrame string replace followed by split and set intersection
I have the following pandas DataFrame:
data = ['18#38#123#23=>21', '18#38#23#55=>35']
d = pd.DataFrame(data, columns = ['rule'])
and I have a list of integers:
r = [18, 55]
and I want to filter the rules from the above DataFrame, keeping a rule only if all of the integers in the list r are present in it. I tried the following code and it failed:
d[d['rule'].str.replace('=>','#').split('#').astype(set).issuperset(set(r))]
How can I achieve the desired filtering with pandas?
You were heading in the right direction; you just need to use the apply function:
d[d['rule'].str.replace('=>','#').str.split('#').apply(lambda x: set(x).issuperset(set(map(str,r))))]
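A self-contained sketch of this fix on the question's data. The key change is using .str.split followed by apply, since set operations are not vectorized on a Series, and mapping r to strings because the split tokens are strings:

```python
import pandas as pd

# The question's data and filter list.
data = ['18#38#123#23=>21', '18#38#23#55=>35']
d = pd.DataFrame(data, columns=['rule'])
r = [18, 55]

# Replace '=>' with '#', split on '#', then test each token set
# against the (stringified) integers in r.
mask = (d['rule']
        .str.replace('=>', '#')
        .str.split('#')
        .apply(lambda tokens: set(tokens).issuperset(map(str, r))))
print(d[mask])  # keeps only the rule containing both 18 and 55
```

Only the second rule contains both 18 and 55, so the filter keeps one row.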
My initial instinct would be to use a list comprehension:
df = pd.DataFrame(['18#38#123#23=>21', '188#38#123#23=>21', '#18#38#23#55=>35'], columns = ['rule'])
def wrap(n):
return r'(?<=[^|^\d]){}(?=[^\d])'.format(n)
patterns = [18, 55]
pd.concat([df['rule'].str.contains(wrap(pattern)) for pattern in patterns], axis=1).all(axis=1)
Output:
0 False
1 False
2 True
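The lookaround wrapper can be illustrated with plain `re` on ordinary strings; this sketch (a reconstruction, not the answer's exact code) shows why a bare substring test would not work: 18 would also match inside 188 or 123:

```python
import re

def wrap(n):
    # Mirror of the answer's pattern: the integer must not be
    # preceded or followed by another digit (or by '|' / '^').
    return r'(?<=[^|^\d]){}(?=[^\d])'.format(n)

rules = ['18#38#123#23=>21', '188#38#123#23=>21', '#18#38#23#55=>35']
patterns = [18, 55]

# A rule passes only if every pattern matches as a whole token.
result = [all(re.search(wrap(p), rule) for p in patterns)
          for rule in rules]
print(result)  # [False, False, True]
```

Note that the single-character lookbehind also fails at the start of a string, which is why only the third rule (with its leading '#') matches.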
Using str.get_dummies:
d.rule.str.replace('=>','#').str.get_dummies(sep='#').loc[:, list(map(str, r))].all(axis=1)
Output:
0 False
1 True
dtype: bool
Detail: get_dummies + loc returns

   18  55
0   1   0
1   1   1
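A runnable sketch of this route on the question's data: get_dummies turns the rule string into an indicator matrix with one column per token, and loc then selects the columns for r:

```python
import pandas as pd

d = pd.DataFrame(['18#38#123#23=>21', '18#38#23#55=>35'], columns=['rule'])
r = [18, 55]

# One 0/1 column per distinct token across all rules.
dummies = d['rule'].str.replace('=>', '#').str.get_dummies(sep='#')

# Column labels are strings, so map r to str; a concrete list keeps
# .loc happy on newer pandas versions.
mask = dummies.loc[:, [str(x) for x in r]].all(axis=1)
print(d[mask])
```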
My approach is similar to @RafaelC's answer, but converts all strings into ints:
new_df = d.rule.str.replace('=>','#').str.get_dummies(sep='#')
new_df.columns = new_df.columns.astype(int)
has_all = new_df[r].all(axis=1)
# then you can assign new column for initial data frame
d['new_col'] = 10
d.loc[has_all, 'new_col'] = 100
Output:
+-------+-------------------+------------+
| | rule | new_col |
+-------+-------------------+------------+
| 0 | 18#38#123#23=>21 | 10 |
| 1 | 188#38#23#55=>35 | 10 |
| 2 | 18#38#23#55=>35 | 100 |
+-------+-------------------+------------+
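An end-to-end sketch of this variant; the three-row frame mirrors the answer's output table above:

```python
import pandas as pd

d = pd.DataFrame(['18#38#123#23=>21', '188#38#23#55=>35', '18#38#23#55=>35'],
                 columns=['rule'])
r = [18, 55]

new_df = d['rule'].str.replace('=>', '#').str.get_dummies(sep='#')
# Integer column labels let r index the dummies directly.
new_df.columns = new_df.columns.astype(int)
has_all = new_df[r].all(axis=1)

# Flag the rows that contain every integer in r.
d['new_col'] = 10
d.loc[has_all, 'new_col'] = 100
print(d)
```

Only the last rule contains both 18 and 55 as whole tokens (188 does not count as 18), so only that row gets new_col = 100.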