How to create a DF from a DF based on a condition
My current DF looks like this:
Combinations Count
1 ('IDLY', 'VADA') 3734
6 ('DOSA', 'IDLY') 2020
9 ('CHAPPATHI', 'DOSA') 1297
10 ('IDLY', 'POORI') 1297
11 ('COFFEE', 'TEA') 1179
13 ('DOSA', 'VADA') 1141
15 ('CHAPPATHI', 'IDLY') 1070
16 ('COFFEE', 'SAMOSA') 1061
17 ('COFFEE', 'IDLY') 1016
18 ('POORI', 'VADA') 1008
Let's say I filter the above data frame by the keyword 'DOSA'. I get the output below:
Combinations Count
6 ('DOSA', 'IDLY') 2020
9 ('CHAPPATHI', 'DOSA') 1297
13 ('DOSA', 'VADA') 1141
But I would like the output to look like the df below (which drops the filter keyword, since it is common to every row):
Combinations Count
6 IDLY 2020
9 CHAPPATHI 1297
13 VADA 1141
What concept of pandas needs to be used here? How can this be achieved?
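For anyone who wants to run the answers below, here is a minimal sketch that rebuilds the frame from the question. It assumes the Combinations column holds real tuples (not strings) and that the leftmost numbers in the printout are the index; the name `filtered` is only for illustration.

```python
import pandas as pd

# Sample data copied from the question; the index values are assumed
# to be the leftmost numbers in the printed frame.
df = pd.DataFrame(
    {
        "Combinations": [
            ("IDLY", "VADA"), ("DOSA", "IDLY"), ("CHAPPATHI", "DOSA"),
            ("IDLY", "POORI"), ("COFFEE", "TEA"), ("DOSA", "VADA"),
            ("CHAPPATHI", "IDLY"), ("COFFEE", "SAMOSA"), ("COFFEE", "IDLY"),
            ("POORI", "VADA"),
        ],
        "Count": [3734, 2020, 1297, 1297, 1179, 1141, 1070, 1061, 1016, 1008],
    },
    index=[1, 6, 9, 10, 11, 13, 15, 16, 17, 18],
)

keyword = "DOSA"
# Keep only the rows whose tuple contains the keyword
# (this reproduces the filtered frame shown in the question).
filtered = df[df["Combinations"].apply(lambda combo: keyword in combo)]
print(filtered)
```

This only reproduces the starting point; the tuples still contain the keyword at this stage.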
You can also try creating a dataframe as a reference, then mask where the keyword matches, with stack for dropping NaN:
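import pandas as pd

keyword = 'DOSA'
# one column per tuple element, aligned on df's index
m = pd.DataFrame(df['Combinations'].tolist(), index=df.index)
# rows where any element equals the keyword (axis must be passed by name in pandas 2.x)
c = m.eq(keyword).any(axis=1)
# blank out the keyword, stack to a single column, drop the blanks, assign back;
# the explicit dropna() covers pandas versions where stack() keeps NaN
df[c].assign(Combinations=
    m[c].where(m[c].ne(keyword)).stack().dropna().droplevel(1))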
Combinations Count
6 IDLY 2020
9 CHAPPATHI 1297
13 VADA 1141
If the Combinations column holds strings rather than tuples, you can convert them into tuples with:
import ast
df['Combinations'] = df['Combinations'].apply(ast.literal_eval)
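As a quick check of what that conversion does, here is a small sketch on a made-up two-row frame (the name `raw` and its contents are assumptions for illustration, not the asker's data):

```python
import ast

import pandas as pd

# Hypothetical frame whose Combinations column was read in as text,
# e.g. after a round trip through CSV.
raw = pd.DataFrame(
    {
        "Combinations": ["('DOSA', 'IDLY')", "('DOSA', 'VADA')"],
        "Count": [2020, 1141],
    }
)

# literal_eval safely parses each string into the Python literal it spells out.
raw["Combinations"] = raw["Combinations"].apply(ast.literal_eval)
print(type(raw.loc[0, "Combinations"]))
```

ast.literal_eval only accepts Python literals, so it will raise on anything that is not a valid literal instead of executing it.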
In general, it's not ideal to have lists, tuples, sets, etc. inside a dataframe. It's better to have multiple records for each instance when needed.
You can use explode to turn Combinations into this form and filter on that:
keyword = 'DOSA'
s = df.explode('Combinations')
# compare against the keyword variable, not the literal string 'keyword'
s.loc[s.Combinations.eq(keyword).groupby(level=0).transform('any') & s.Combinations.ne(keyword)]
Or chain the two commands with .loc[lambda ]:
(df.explode('Combinations')
.loc[lambda x: x.Combinations.ne(keyword) &
x.Combinations.eq(keyword).groupby(level=0).transform('any')]
)
Output:
Combinations Count
6 IDLY 2020
9 CHAPPATHI 1297
13 VADA 1141
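The explode approach above can be wrapped up as a self-contained sketch. The helper name `partners_of` and the shortened sample frame are assumptions for illustration; the logic is the same explode, groupby and filter shown in the answer.

```python
import pandas as pd


def partners_of(df, keyword, col="Combinations"):
    """Return rows whose tuple contains `keyword`, with the keyword itself removed."""
    # One row per tuple element; the original index label is repeated.
    s = df.explode(col)
    # True for every exploded row whose original tuple contained the keyword.
    has_keyword = s[col].eq(keyword).groupby(level=0).transform("any")
    # Keep those rows, minus the keyword itself.
    return s.loc[has_keyword & s[col].ne(keyword)]


# Shortened copy of the question's data (hypothetical subset).
df = pd.DataFrame(
    {
        "Combinations": [
            ("IDLY", "VADA"), ("DOSA", "IDLY"),
            ("CHAPPATHI", "DOSA"), ("DOSA", "VADA"),
        ],
        "Count": [3734, 2020, 1297, 1141],
    },
    index=[1, 6, 9, 13],
)

result = partners_of(df, "DOSA")
print(result)
```

Note that if a tuple had more than two elements, this would return one row per remaining element, each sharing the same index label and Count.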
What I will do:
x=df.explode('Combinations')
x=x.loc[x.index[x.Combinations=='DOSA']].query('Combinations !="DOSA"')
x
Combinations Count
6 IDLY 2020
9 CHAPPATHI 1297
13 VADA 1141
d = df[df['Combinations'].transform(lambda x: 'DOSA' in x)].copy()
d['Combinations'] = d['Combinations'].apply(lambda x: set(x).difference(['DOSA']).pop())
print(d)
Prints:
ID Combinations Count
1 6 IDLY 2020
2 9 CHAPPATHI 1297
5 13 VADA 1141