
Pandas DataFrame string replace followed by split and set intersection

I have the following pandas DataFrame:

import pandas as pd

data = ['18#38#123#23=>21', '18#38#23#55=>35']
d = pd.DataFrame(data, columns=['rule'])

and I have a list of integers:

r = [18, 55]

I want to keep only the rules from the DataFrame above that contain every integer in the list r. I tried the following code, but it failed:

d[d['rule'].str.replace('=>','#').split('#').astype(set).issuperset(set(r))]

How can I achieve the desired filtering with pandas?

You are heading in the right direction; you just need to use the apply function:

d[d['rule'].str.replace('=>','#').str.split('#').apply(lambda x: set(x).issuperset(set(map(str,r))))]
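For anyone wondering why the attempt in the question fails: a Series has no `.split` method (only `.str.split`), `astype(set)` is not a supported conversion, and the split tokens are strings while `r` holds ints. Below is a minimal self-contained sketch of the same idea with plain Python string methods inside `apply`. The helper name `contains_all` is made up for illustration, and it assumes tokens are only ever separated by `#` or `=>`.

```python
import pandas as pd

data = ['18#38#123#23=>21', '18#38#23#55=>35']
d = pd.DataFrame(data, columns=['rule'])
r = [18, 55]

# Tokens produced by split are strings, so compare against string versions of r.
required = {str(n) for n in r}

def contains_all(rule):
    # Hypothetical helper: treat '=>' like '#', split, and test set inclusion.
    tokens = set(rule.replace('=>', '#').split('#'))
    return required.issubset(tokens)

mask = d['rule'].apply(contains_all)
filtered = d[mask]
print(filtered)
```

Only the second rule survives here, since the first one has no 55 in it.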

My initial instinct would be to use a list comprehension:

df = pd.DataFrame(['18#38#123#23=>21', '188#38#123#23=>21', '#18#38#23#55=>35'], columns = ['rule'])

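def wrap(n):
    # Match n only when it is not part of a longer number; negative
    # lookarounds also allow n at the very start or end of the string.
    return r'(?<!\d){}(?!\d)'.format(n)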

patterns = [18, 55]
pd.concat([df['rule'].str.contains(wrap(pattern)) for pattern in patterns], axis=1).all(axis=1)

Output:

0    False
1    False
2     True
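To check how a digit-boundary pattern behaves without pandas, here is a small sketch using the standard `re` module. It uses negative lookarounds (`(?<!\d)` and `(?!\d)`), which also accept a number at the very start or end of the string. The function name `digit_bounded` is just an illustrative choice.

```python
import re

def digit_bounded(n):
    # Match n only when it is not part of a longer run of digits.
    return re.compile(r'(?<!\d){}(?!\d)'.format(n))

rules = ['18#38#123#23=>21', '188#38#123#23=>21', '#18#38#23#55=>35']
patterns = [18, 55]

# A rule qualifies only if every pattern is found somewhere in it.
results = [all(digit_bounded(p).search(rule) for p in patterns) for rule in rules]
print(results)  # [False, False, True]
```

Note that `188` does not count as a match for `18`, which is the whole point of the boundary check.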

Using str.get_dummies:

d.rule.str.replace('=>','#').str.get_dummies(sep='#').loc[:, list(map(str, r))].all(axis=1)

Output:

0    False
1     True
dtype: bool

Detail:

get_dummies + loc returns:

    18  55
0   1   0
1   1   1
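One thing to watch with the `loc` selection: if a value in `r` never appears in any rule, there is no such indicator column and the lookup raises a `KeyError`. A possible way around that, sketched below, is to use `reindex` with `fill_value=0` so that missing values become all-zero columns. The helper name `has_all` is hypothetical, and the value 99 is only there to demonstrate the missing case.

```python
import pandas as pd

d = pd.DataFrame(['18#38#123#23=>21', '18#38#23#55=>35'], columns=['rule'])

def has_all(frame, values):
    # Hypothetical helper: one indicator column per token, then require every value.
    dummies = frame['rule'].str.replace('=>', '#', regex=False).str.get_dummies(sep='#')
    wanted = [str(v) for v in values]
    # reindex adds an all-zero column for values that never occur, avoiding a KeyError.
    return dummies.reindex(columns=wanted, fill_value=0).all(axis=1)

print(has_all(d, [18, 55]).tolist())  # [False, True]
print(has_all(d, [18, 99]).tolist())  # [False, False]
```

The result is a boolean Series, so `d[has_all(d, r)]` gives the filtered rows.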

My approach is similar to @RafaelC's answer, but converts all the strings to int:

new_df = d.rule.str.replace('=>','#').str.get_dummies(sep='#')
new_df.columns = new_df.columns.astype(int)
has_all = new_df[r].all(1)

# then you can assign a new column on the initial data frame
d['new_col'] = 10
d.loc[has_all, 'new_col'] = 100

Output:

+-------+-------------------+------------+
|       |    rule           |   new_col  |
+-------+-------------------+------------+
|    0  | 18#38#123#23=>21  |      10    |
|    1  | 188#38#23#55=>35  |      10    |
|    2  | 18#38#23#55=>35   |     100    |
+-------+-------------------+------------+
