I am fighting with the following issue:
Let's say I have the following input dataframe:
df
something library other_info
FOO NaN blaa
BAR ['bar/libBAR.a', 'bar/libBAR.cpp.so', 'bar/libBARFIGHT.so'] bluu
MEH ['meh/libMEH.a', 'meh/libMEH.so', 'meh/libMEH.other.so'] blqq
Then, using the DataFrame explode functionality:
df1 = df.explode('library')
something library other_info
FOO NaN blaa
BAR bar/libBAR.a bluu
BAR bar/libBAR.cpp.so bluu
BAR bar/libBARFIGHT.so bluu
MEH meh/libMEH.a blqq
MEH meh/libMEH.so blqq
MEH meh/libMEH.other.so blqq
Afterwards I am applying filtering with a regex, to create a subset dataframe:
regex = r'.*/lib.*\.a'
df2 = df1[df1.library.str.contains(regex, regex=True, na=False)]
something library other_info
BAR bar/libBAR.a bluu
MEH meh/libMEH.a blqq
So now I am trying to remove the entries that I've filtered from df1 using a "condition":
Creating a condition (True/False Series):
condition = df1['library'].isin(df2['library'])
something
FOO False
BAR True
BAR False
BAR False
MEH True
MEH False
MEH False
With this condition, I am trying to remove the entries that I want from df1 (without creating a new dataframe):
df1.drop(df1[condition].index, inplace=True)
The result, though, is quite surprising:
something library other_info
FOO NaN blaa
So, all of the entries for BAR and MEH have been dropped from the dataframe even though only one row per "something" matched.
What am I doing wrong? What is the correct way to remove only the "True" rows, and can it be done using the "filter" method?
You can just filter on the negated condition, like so:
df3 = df1[~condition]
df3
produces
something library other_info
0 FOO NaN blaa
1 BAR bar/libBAR.cpp.so bluu
1 BAR bar/libBARFIGHT.so bluu
2 MEH meh/libMEH.so blqq
2 MEH meh/libMEH.other.so blqq
Does that work for you?
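For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the example frame from the question and applies the negated mask. The names is_static and df3 are just illustrative, and it builds the mask directly from str.contains rather than going through df2 and isin, which should give the same rows for this data.

```python
import numpy as np
import pandas as pd

# Rebuild the input frame from the question.
df = pd.DataFrame({
    "something": ["FOO", "BAR", "MEH"],
    "library": [
        np.nan,
        ["bar/libBAR.a", "bar/libBAR.cpp.so", "bar/libBARFIGHT.so"],
        ["meh/libMEH.a", "meh/libMEH.so", "meh/libMEH.other.so"],
    ],
    "other_info": ["blaa", "bluu", "blqq"],
})

# explode keeps the original row label for every exploded row,
# so the index here is 0, 1, 1, 1, 2, 2, 2.
df1 = df.explode("library")

# Boolean mask: True where the library looks like a static archive.
is_static = df1["library"].str.contains(r".*/lib.*\.a", regex=True, na=False)

# Boolean indexing selects row by row, so repeated index labels do no harm.
df3 = df1[~is_static]
print(df3)
```

Because the mask is applied positionally, only the two .a rows are removed, even though they share an index label with their sibling rows.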
Your original code will work if you update your explode statement like so:
df1 = df.explode('library', ignore_index = True)
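This re-indexes the dataframe, so your subsequent manipulations use unique index values rather than the original ones (which are repeated for all rows that were exploded from the same source row). Because drop removes rows by index label, those repeated labels are why every BAR and MEH row disappeared.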
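To see the difference side by side, here is a small sketch on a cut-down, two-row version of the data (the frame contents and the helper name drop_static are made up for illustration). It runs the same drop-by-index logic once on the default exploded frame and once on the re-indexed one.

```python
import pandas as pd

# Cut-down, hypothetical version of the question's data.
df = pd.DataFrame({
    "something": ["BAR", "MEH"],
    "library": [
        ["bar/libBAR.a", "bar/libBAR.so"],
        ["meh/libMEH.a", "meh/libMEH.so"],
    ],
})

dup = df.explode("library")                      # index labels: 0, 0, 1, 1
uniq = df.explode("library", ignore_index=True)  # index labels: 0, 1, 2, 3


def drop_static(frame):
    """Drop rows whose library matches the regex, using label-based drop."""
    mask = frame["library"].str.contains(r".*/lib.*\.a", regex=True, na=False)
    # drop works on labels: every row carrying a matching label is removed.
    return frame.drop(frame[mask].index)


dropped_dup = drop_static(dup)    # labels 0 and 1 cover all four rows
dropped_uniq = drop_static(uniq)  # labels 0 and 2 cover only the .a rows

print(len(dropped_dup))
print(dropped_uniq)
```

With the repeated labels, dropping label 0 and label 1 takes out all four rows; with unique labels, only the two matching rows go.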