
Split a Pandas dataframe, keep both parts

I'm creating a dataframe by importing a .csv file. I then need to delete rows based on certain conditions. Because the number of rows deleted is quite small, it is easier to validate the conditions by checking what has been deleted instead of what remains. I end up doing something like this:

    dfcd=df.loc[(~df.Course_Code.str.contains('MG')) & (~df.Course_Code.str.contains('DE'))]
    df=df.loc[(df.Course_Code.str.contains('MG')) | (df.Course_Code.str.contains('DE'))]

But this feels very clumsy, and as the conditions get more complex I worry that I am going to write the inverse condition incorrectly. (Reading another thread on SO, I realise I could have simplified the above by wrapping the whole condition in another set of parentheses with the ~ outside them, but anyway.)

Is there a command that will create two dataframes, one where the condition is true and the other where it is false? Something like:

    df,dfcd=df.<another_command>[(df.Course_Code.str.contains('MG')) | (df.Course_Code.str.contains('DE'))]

Or is there another better way to do this?

Inside a regular expression, | means "or", so both patterns fit into a single str.contains call. Store the resulting boolean mask once, filter with it to get the rows where the condition is True, and invert it with ~ to get the rows where it is False:

    m = df.Course_Code.str.contains('MG|DE')
    # same as:
    # m = (df.Course_Code.str.contains('MG')) | (df.Course_Code.str.contains('DE'))

    df1, df2 = df[m], df[~m]
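
As far as I know, pandas has no single built-in command that returns both halves, but the mask approach above can be wrapped in a small helper. The following is only a sketch: split_by_mask is a hypothetical name, the sample data is invented for illustration, and na=False is an assumption about how missing course codes should be treated (as "no match").

```python
import pandas as pd


def split_by_mask(frame, mask):
    """Hypothetical helper: return (rows where mask is True, rows where it is False)."""
    mask = mask.astype(bool)
    return frame[mask], frame[~mask]


# Invented sample data; the real Course_Code values are not shown in the question.
df = pd.DataFrame({
    "Course_Code": ["MG101", "DE202", "CS303", None, "MG404"],
    "Student": ["a", "b", "c", "d", "e"],
})

# na=False makes a missing code count as "no match", so the mask stays purely
# boolean and can safely be inverted with ~.
m = df["Course_Code"].str.contains("MG|DE", na=False)

kept, removed = split_by_mask(df, m)
```

Because the condition is written only once, the two results cannot drift apart: every row of df lands in exactly one of kept or removed.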
