Split a Pandas dataframe, keep both parts

Question

I'm creating a dataframe by importing a .csv file. I then need to delete rows based on certain conditions. Because the number of rows deleted is quite small, it is easier to validate the conditions by checking what has been deleted instead of what remains. I end up doing something like this:

    dfcd=df.loc[(~df.Course_Code.str.contains('MG')) & (~df.Course_Code.str.contains('DE'))]
    df=df.loc[(df.Course_Code.str.contains('MG')) | (df.Course_Code.str.contains('DE'))]

But this feels very clumsy, and as the conditions get more complex I worry that I am going to write the inverse condition incorrectly. (Reading another thread on SO, I realise I could have simplified the above by using another set of parentheses with the ~ outside them, but anyway.)
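For reference, the parentheses-and-~ simplification mentioned here can be checked on a small example. This is only a sketch: the sample rows are made up, and only the Course_Code column name comes from the question.

```python
import pandas as pd

# Hypothetical sample data; only the Course_Code column name is from the question.
df = pd.DataFrame({"Course_Code": ["MG101", "DE202", "CS303", "MA404"]})

has_mg = df.Course_Code.str.contains("MG")
has_de = df.Course_Code.str.contains("DE")

# Inverse written out by hand: neither MG nor DE.
inverse_by_hand = ~has_mg & ~has_de

# Inverse obtained by negating the whole condition in one go.
inverse_by_negation = ~(has_mg | has_de)

# Both masks select exactly the same rows.
same = inverse_by_hand.equals(inverse_by_negation)
print(same)
```

Negating the whole parenthesised condition means the inverse never has to be rewritten by hand when the condition changes.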

Is there a command that will create two dataframes, one where the condition is true and the other where it is false? Something like:

    df,dfcd=df.<another_command>[(df.Course_Code.str.contains('MG')) | (df.Course_Code.str.contains('DE'))]

Or is there another better way to do this?

Answer 1

In a regex, | means OR, so you can simplify your solution by computing the condition once as a boolean mask and inverting it with ~ to select the rows where the condition is False:

    m = df.Course_Code.str.contains('MG|DE')
    # same as
    # m = (df.Course_Code.str.contains('MG')) | (df.Course_Code.str.contains('DE'))

    df1, df2 = df[m], df[~m]
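As far as I know pandas has no single built-in call that returns both halves, but the mask approach is easy to wrap. Below is a minimal runnable sketch; the helper name split_by_mask and the sample rows are made up for illustration, and na=False is an added assumption so that missing course codes count as non-matching instead of breaking the ~ inversion.

```python
import pandas as pd


def split_by_mask(frame, mask):
    """Return (rows where mask is True, rows where mask is False)."""
    return frame[mask], frame[~mask]


# Hypothetical sample data, including one missing course code.
df = pd.DataFrame({"Course_Code": ["MG101", "DE202", "CS303", None]})

# na=False makes missing values evaluate to False, so the mask is purely boolean.
m = df.Course_Code.str.contains("MG|DE", na=False)

df1, df2 = split_by_mask(df, m)

print(df1.Course_Code.tolist())  # rows that matched MG or DE
print(len(df2))                  # rows that did not match, including the missing code
```

Because the condition is written once and only negated, the two parts always add up to the original dataframe, which makes it easy to inspect the removed rows.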

solution1: 3 ACCEPTED 2022-06-07 06:38:58
