
How to create a DataFrame from a DataFrame based on a condition

My current DataFrame looks like this:

    Combinations            Count
1   ('IDLY', 'VADA')        3734
6   ('DOSA', 'IDLY')        2020
9   ('CHAPPATHI', 'DOSA')   1297
10  ('IDLY', 'POORI')       1297
11  ('COFFEE', 'TEA')       1179
13  ('DOSA', 'VADA')        1141
15  ('CHAPPATHI', 'IDLY')   1070
16  ('COFFEE', 'SAMOSA')    1061
17  ('COFFEE', 'IDLY')      1016
18  ('POORI', 'VADA')       1008

Let's say I filter the above DataFrame by the keyword 'DOSA'. I get the output below:

    Combinations           Count
6   ('DOSA', 'IDLY')        2020
9   ('CHAPPATHI', 'DOSA')   1297
13  ('DOSA', 'VADA')        1141

But I would like the output to look like the DataFrame below, which drops the filter keyword because it is common to every row:

    Combinations    Count
6   IDLY            2020
9   CHAPPATHI       1297
13  VADA            1141

What concept of pandas needs to be used here? How can this be achieved?
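
For anyone who wants to reproduce the problem, here is a minimal sketch that rebuilds the frame shown above and applies a keyword filter. It assumes the Combinations column holds real Python tuples (not strings), and the membership test used for filtering is an assumption, since the question does not show how the filter was done.

```python
import pandas as pd

# Rebuild the question's frame; values and index labels are copied from the post.
df = pd.DataFrame(
    {
        "Combinations": [
            ("IDLY", "VADA"), ("DOSA", "IDLY"), ("CHAPPATHI", "DOSA"),
            ("IDLY", "POORI"), ("COFFEE", "TEA"), ("DOSA", "VADA"),
            ("CHAPPATHI", "IDLY"), ("COFFEE", "SAMOSA"), ("COFFEE", "IDLY"),
            ("POORI", "VADA"),
        ],
        "Count": [3734, 2020, 1297, 1297, 1179, 1141, 1070, 1061, 1016, 1008],
    },
    index=[1, 6, 9, 10, 11, 13, 15, 16, 17, 18],
)

keyword = "DOSA"
# Assumed filter: keep only the rows whose tuple contains the keyword.
filtered = df[df["Combinations"].apply(lambda combo: keyword in combo)]
print(filtered)
```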

You can also build a reference DataFrame with one column per tuple element, mask the cells that match the keyword, and then stack the result to drop the NaN cells:

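import pandas as pd

keyword = 'DOSA'

# one column per tuple element, aligned to df's index
m = pd.DataFrame(df['Combinations'].tolist(), index=df.index)
# rows where any element equals the keyword
c = m.eq(keyword).any(axis=1)
# blank out the keyword, stack, and drop NaN (dropna() in case stack() keeps them)
df[c].assign(Combinations=
             m[c].where(m[c].ne(keyword)).stack().dropna().droplevel(1))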

   Combinations  Count
6          IDLY   2020
9     CHAPPATHI   1297
13         VADA   1141
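
To make the intermediate steps of this masking approach easier to follow, here is a self-contained sketch on a four-row subset of the question's data. The names `partners` and `remaining` are illustrative only, and the explicit `dropna()` is a precaution added here for pandas versions whose `stack()` keeps NaN values.

```python
import pandas as pd

df = pd.DataFrame(
    {"Combinations": [("IDLY", "VADA"), ("DOSA", "IDLY"),
                      ("CHAPPATHI", "DOSA"), ("DOSA", "VADA")],
     "Count": [3734, 2020, 1297, 1141]},
    index=[1, 6, 9, 13],
)
keyword = "DOSA"

# One column per tuple position, same index as df.
m = pd.DataFrame(df["Combinations"].tolist(), index=df.index)
# True for rows where any position equals the keyword.
c = m.eq(keyword).any(axis=1)
# Replace the keyword itself with NaN, keeping the other item.
partners = m[c].where(m[c].ne(keyword))
# Long form, NaN cells removed, column-position level dropped.
remaining = partners.stack().dropna().droplevel(1)
# assign() aligns on the index, so each row gets its remaining item.
result = df[c].assign(Combinations=remaining)
print(result)
```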

If the Combinations column holds strings rather than tuples, you can convert them to tuples with:

import ast
df['Combinations'] = df['Combinations'].apply(ast.literal_eval)
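
A small sketch of what that conversion does, using a hypothetical two-element Series `raw` that stands in for a Combinations column read from a CSV file:

```python
import ast
import pandas as pd

# Hypothetical input: tuples stored as their string representation.
raw = pd.Series(["('DOSA', 'IDLY')", "('CHAPPATHI', 'DOSA')"])

# literal_eval safely parses Python literals without executing arbitrary code.
parsed = raw.apply(ast.literal_eval)

print(type(raw.iloc[0]).__name__, type(parsed.iloc[0]).__name__)
```

Checking `'DOSA' in value` on the unparsed strings would be a substring test rather than a tuple membership test, which is why the conversion matters here.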

In general, it's not ideal to store lists, tuples, sets, etc. inside a DataFrame. It's better to have multiple rows for each instance when needed.

You can use explode to turn Combinations into this form and filter on that:

keyword = 'DOSA'

s = df.explode('Combinations')

s.loc[s.Combinations.eq(keyword).groupby(level=0).transform('any') & s.Combinations.ne(keyword)]

Or chain the two steps using .loc with a lambda:

(df.explode('Combinations')
   .loc[lambda x: x.Combinations.ne(keyword) & 
            x.Combinations.eq(keyword).groupby(level=0).transform('any')]
)

Output:

   Combinations  Count
6          IDLY   2020
9     CHAPPATHI   1297
13         VADA   1141
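
The explode approach can be wrapped in a small helper, sketched below. The function name `partners_of` is hypothetical, and the sketch assumes the original index labels are unique, because `groupby(level=0)` uses them to tell which exploded rows came from the same tuple.

```python
import pandas as pd

def partners_of(df, keyword):
    """Rows whose tuple contains `keyword`, with the keyword itself removed."""
    # One row per tuple element; the original index label is repeated.
    exploded = df.explode("Combinations")
    is_keyword = exploded["Combinations"].eq(keyword)
    # Mark every row of a group if any row in that group is the keyword.
    has_keyword = is_keyword.groupby(level=0).transform("any")
    return exploded.loc[has_keyword & ~is_keyword]

df = pd.DataFrame(
    {"Combinations": [("IDLY", "VADA"), ("DOSA", "IDLY"),
                      ("CHAPPATHI", "DOSA"), ("COFFEE", "TEA"), ("DOSA", "VADA")],
     "Count": [3734, 2020, 1297, 1179, 1141]},
    index=[1, 6, 9, 11, 13],
)

result = partners_of(df, "DOSA")
print(result)
```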

What I would do:

x=df.explode('Combinations')
x=x.loc[x.index[x.Combinations=='DOSA']].query('Combinations !="DOSA"')
x
   Combinations  Count
6          IDLY   2020
9     CHAPPATHI   1297
13         VADA   1141

Another option is to filter with a membership test and then strip the keyword from each tuple:

d = df[df['Combinations'].apply(lambda x: 'DOSA' in x)].copy()
d['Combinations'] = d['Combinations'].apply(lambda x: set(x).difference(['DOSA']).pop())
print(d)

Prints:

   ID Combinations  Count
1   6         IDLY   2020
2   9    CHAPPATHI   1297
5  13         VADA   1141
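
Note that `set(...).pop()` returns a single arbitrary element, so it fits pairs but would silently discard items for longer tuples and would raise on a tuple such as ('DOSA', 'DOSA'). As a hedged variant, the sketch below joins whatever remains after removing the keyword. The comma separator and the three-row sample frame are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"Combinations": [("DOSA", "IDLY"), ("CHAPPATHI", "DOSA"), ("POORI", "VADA")],
     "Count": [2020, 1297, 1008]},
    index=[6, 9, 18],
)
keyword = "DOSA"

# Keep only the rows whose tuple contains the keyword.
d = df[df["Combinations"].apply(lambda combo: keyword in combo)].copy()
# Join every item that is not the keyword; for a pair this is just the other item.
d["Combinations"] = d["Combinations"].apply(
    lambda combo: ", ".join(item for item in combo if item != keyword)
)
print(d)
```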
