My current DataFrame looks like this:
Combinations Count
1 ('IDLY', 'VADA') 3734
6 ('DOSA', 'IDLY') 2020
9 ('CHAPPATHI', 'DOSA') 1297
10 ('IDLY', 'POORI') 1297
11 ('COFFEE', 'TEA') 1179
13 ('DOSA', 'VADA') 1141
15 ('CHAPPATHI', 'IDLY') 1070
16 ('COFFEE', 'SAMOSA') 1061
17 ('COFFEE', 'IDLY') 1016
18 ('POORI', 'VADA') 1008
Let's say I filter the above DataFrame by the keyword 'DOSA'. I get the output below:
Combinations Count
6 ('DOSA', 'IDLY') 2020
9 ('CHAPPATHI', 'DOSA') 1297
13 ('DOSA', 'VADA') 1141
But I would like the output to look like the DataFrame below, which drops the filter keyword because it is common to every row:
Combinations Count
6 IDLY 2020
9 CHAPPATHI 1297
13 VADA 1141
What concept of pandas needs to be used here? How can this be achieved?
You can also try creating a reference DataFrame from the tuples, masking the cells that match the keyword, and then stacking to drop the NaN values:
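import pandas as pd

keyword = 'DOSA'
# one column per tuple element, aligned on df's index
m = pd.DataFrame(df['Combinations'].tolist(), index=df.index)
# rows where any element equals the keyword (axis must be passed by name in pandas 2.x)
c = m.eq(keyword).any(axis=1)
# blank out the keyword, stack to one value per row; the explicit dropna() covers stack() variants that keep NaN
df[c].assign(Combinations=m[c].where(m[c].ne(keyword)).stack().dropna().droplevel(1))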
Combinations Count
6 IDLY 2020
9 CHAPPATHI 1297
13 VADA 1141
If the column holds strings rather than tuples, you can convert them to tuples with:
import ast
df['Combinations'] = df['Combinations'].apply(ast.literal_eval)
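As a quick check of that conversion, here is a minimal sketch. The frame `raw` and its two rows are made up for illustration (for example, what you might get back after a round trip through a CSV file); only `ast.literal_eval` and the column names come from the answer above.

```python
import ast

import pandas as pd

# Hypothetical frame where the tuples arrived as plain strings
raw = pd.DataFrame(
    {
        "Combinations": ["('DOSA', 'IDLY')", "('CHAPPATHI', 'DOSA')"],
        "Count": [2020, 1297],
    },
    index=[6, 9],
)

# literal_eval parses Python literals only; it does not execute arbitrary code
parsed = raw.assign(Combinations=raw["Combinations"].apply(ast.literal_eval))

print(type(raw.loc[6, "Combinations"]).__name__)     # str
print(type(parsed.loc[6, "Combinations"]).__name__)  # tuple
```

Once the values are real tuples, membership tests such as `'DOSA' in x` match whole items instead of substrings.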
In general, it's not ideal to have lists, tuples, sets, etc. inside a DataFrame. It's better to have multiple records for each instance when needed. You can use explode to turn Combinations into this form and filter on that:
keyword = 'DOSA'
s = df.explode('Combinations')
s.loc[s.Combinations.eq(keyword).groupby(level=0).transform('any') & s.Combinations.ne(keyword)]
Or chain the two commands with .loc[lambda x: ...]:
(df.explode('Combinations')
.loc[lambda x: x.Combinations.ne(keyword) &
x.Combinations.eq(keyword).groupby(level=0).transform('any')]
)
Output:
Combinations Count
6 IDLY 2020
9 CHAPPATHI 1297
13 VADA 1141
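To see why the groupby(level=0) step is needed, it may help to look at the exploded frame first. This is a minimal sketch on three rows taken from the question's data; the names `long_df`, `has_kw` and `result` are mine, not part of the answer.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Combinations": [("IDLY", "VADA"), ("DOSA", "IDLY"), ("CHAPPATHI", "DOSA")],
        "Count": [3734, 2020, 1297],
    },
    index=[1, 6, 9],
)

# explode gives one row per tuple element and repeats the original index label
long_df = df.explode("Combinations")
print(long_df.index.tolist())  # [1, 1, 6, 6, 9, 9]

keyword = "DOSA"
# True for every row whose original tuple contained the keyword
has_kw = long_df["Combinations"].eq(keyword).groupby(level=0).transform("any")
# keep the partner item only, not the keyword itself
result = long_df[has_kw & long_df["Combinations"].ne(keyword)]
print(result)
```

Because both halves of a pair share one index label, grouping on the index is what ties a partner item back to the row that contained the keyword.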
What I would do:
x=df.explode('Combinations')
x=x.loc[x.index[x.Combinations=='DOSA']].query('Combinations !="DOSA"')
x
Combinations Count
6 IDLY 2020
9 CHAPPATHI 1297
13 VADA 1141
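# keep rows whose tuple contains the keyword
d = df[df['Combinations'].apply(lambda x: 'DOSA' in x)].copy()
# remove the keyword from each pair and take the single remaining item
d['Combinations'] = d['Combinations'].apply(lambda x: set(x).difference(['DOSA']).pop())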
print(d)
Prints:
ID Combinations Count
1 6 IDLY 2020
2 9 CHAPPATHI 1297
5 13 VADA 1141
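If you need this for more than one keyword, the same idea can be wrapped in a small function. This is only a sketch: `partners_of` is a hypothetical name, and it assumes every value in Combinations is a pair of strings. It uses `next(...)` instead of `set.pop()` so that a pair such as ('DOSA', 'DOSA') yields None rather than raising KeyError.

```python
import pandas as pd


def partners_of(df: pd.DataFrame, keyword: str) -> pd.DataFrame:
    """Keep rows whose pair contains `keyword`, reduced to the other item."""
    has_kw = df["Combinations"].apply(lambda pair: keyword in pair)
    out = df[has_kw].copy()
    # first item that is not the keyword; None if there is no such item
    out["Combinations"] = out["Combinations"].apply(
        lambda pair: next((item for item in pair if item != keyword), None)
    )
    return out


df = pd.DataFrame(
    {
        "Combinations": [
            ("IDLY", "VADA"),
            ("DOSA", "IDLY"),
            ("CHAPPATHI", "DOSA"),
            ("DOSA", "VADA"),
        ],
        "Count": [3734, 2020, 1297, 1141],
    },
    index=[1, 6, 9, 13],
)

result = partners_of(df, "DOSA")
print(result)
```

The original `df` is left untouched because the function works on a copy of the filtered rows.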