
Removing the rows of a filtered subset dataframe from the original dataframe

I am fighting with the following issue:

Let's say I have the following input dataframe:

df
something library                                                     other_info
FOO       NaN                                                         blaa      
BAR       ['bar/libBAR.a', 'bar/libBAR.cpp.so', 'bar/libBARFIGHT.so'] bluu          
MEH       ['meh/libMEH.a', 'meh/libMEH.so', 'meh/libMEH.other.so']    blqq      

Then I explode the library column:

df1 = df.explode('library')

something library             other_info
FOO       NaN                 blaa      
BAR       bar/libBAR.a        bluu     
BAR       bar/libBAR.cpp.so   bluu      
BAR       bar/libBARFIGHT.so  bluu      
MEH       meh/libMEH.a        blqq      
MEH       meh/libMEH.so       blqq
MEH       meh/libMEH.other.so blqq

Afterwards I filter with a regex to create a subset dataframe:

regex = r'.*/lib.*\.a'
df2 = df1[df1.library.str.contains(regex, regex=True, na=False)]

something library      other_info
BAR       bar/libBAR.a bluu
MEH       meh/libMEH.a blqq

So now I am trying to remove the entries that I've filtered from df1 using a "condition".

First I create the condition (a True/False Series):

condition = df1['library'].isin(df2['library'])

something 
FOO       False
BAR       True
BAR       False
BAR       False
MEH       True
MEH       False
MEH       False

With this condition, I am trying to remove the matching entries from df1 (without creating a new dataframe):

df1.drop(df1[condition].index, inplace=True)

The result, though, is quite surprising:

something library             other_info
FOO       NaN                 blaa  

So, all of the entries for BAR and MEH have been dropped from the dataframe even though only one row per "something" matched.

What am I doing wrong? What is the correct way to remove only the "True" rows, and can it be done with the "filter" method?

You can simply filter on the negated condition, like so:

df3 = df1[~condition]
df3

which produces:

    something   library             other_info
0   FOO         NaN                 blaa
1   BAR         bar/libBAR.cpp.so   bluu
1   BAR         bar/libBARFIGHT.so  bluu
2   MEH         meh/libMEH.so       blqq
2   MEH         meh/libMEH.other.so blqq

Does that work for you?
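
For reference, here is a minimal, self-contained sketch of the steps above. The input frame is rebuilt by hand from the question's table, and df3_direct is a name introduced here only for illustration; it shows one possible shortcut that negates the regex mask directly instead of going through df2 and isin.

```python
import numpy as np
import pandas as pd

# Toy data rebuilt from the table in the question.
df = pd.DataFrame({
    "something": ["FOO", "BAR", "MEH"],
    "library": [
        np.nan,
        ["bar/libBAR.a", "bar/libBAR.cpp.so", "bar/libBARFIGHT.so"],
        ["meh/libMEH.a", "meh/libMEH.so", "meh/libMEH.other.so"],
    ],
    "other_info": ["blaa", "bluu", "blqq"],
})

df1 = df.explode("library")

regex = r".*/lib.*\.a"
df2 = df1[df1["library"].str.contains(regex, regex=True, na=False)]

# Boolean mask: True for the rows that ended up in df2.
condition = df1["library"].isin(df2["library"])

# Keep only the rows where the mask is False.
df3 = df1[~condition]
print(df3)

# Hypothetical shortcut: negate the regex mask itself.
# na=False makes the NaN row evaluate to False, so ~ keeps it.
df3_direct = df1[~df1["library"].str.contains(regex, regex=True, na=False)]
```

As for the "filter" method: DataFrame.filter selects rows or columns by their axis labels, not by cell values, so it is probably not the right tool for this kind of content-based filtering.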

Your original code will work if you update your explode statement like so:

df1 = df.explode('library', ignore_index=True)

This re-indexes the dataframe, so your subsequent manipulations use unique index values rather than the original ones (which are repeated for every row exploded from the same source row). drop removes rows by index label, so with repeated labels it removed every row that shared a label with a matched row.
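
The effect of the repeated index labels can be seen in a small sketch. The two-row frame and the names dup, uniq and mask are made up for illustration, and str.endswith stands in for the question's regex to keep the example short.

```python
import pandas as pd

df = pd.DataFrame({
    "something": ["BAR", "MEH"],
    "library": [
        ["bar/libBAR.a", "bar/libBAR.so"],
        ["meh/libMEH.a", "meh/libMEH.so"],
    ],
})

# Default explode: each exploded row keeps the label of its source row.
dup = df.explode("library")
print(dup.index.tolist())               # [0, 0, 1, 1]

mask = dup["library"].str.endswith(".a")
# dup[mask].index holds the labels 0 and 1, and drop removes
# every row carrying one of those labels, not just the matched rows.
print(len(dup.drop(dup[mask].index)))   # 0

# ignore_index=True assigns fresh labels 0..n-1, so each label is unique.
uniq = df.explode("library", ignore_index=True)
mask = uniq["library"].str.endswith(".a")
uniq.drop(uniq[mask].index, inplace=True)
print(uniq["library"].tolist())         # ['bar/libBAR.so', 'meh/libMEH.so']
```

If the frame has already been exploded, calling reset_index(drop=True) on it before building the condition should give the same unique labels.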
