Split a pandas dataframe into two by columns

Question

I have a dataframe and I want to split it into two dataframes, one that has all the columns beginning with foo and one with the rest of the columns. Is there a quick way of doing this?

Answer 1

You can use list comprehensions to select the column names:

import pandas as pd

df = pd.DataFrame({'fooA':[1,2,3],
                   'fooB':[4,5,6],
                   'fooC':[7,8,9],
                   'D':[1,3,5],
                   'E':[5,3,6],
                   'F':[7,4,3]})

print (df)
   D  E  F  fooA  fooB  fooC
0  1  5  7     1     4     7
1  3  3  4     2     5     8
2  5  6  3     3     6     9

foo = [col for col in df.columns if col.startswith('foo')]
print (foo)
['fooA', 'fooB', 'fooC']

other = [col for col in df.columns if not col.startswith('foo')]
print (other)
['D', 'E', 'F']

print (df[foo])
   fooA  fooB  fooC
0     1     4     7
1     2     5     8
2     3     6     9

print (df[other])
   D  E  F
0  1  5  7
1  3  3  4
2  5  6  3

Another solution uses filter to pick the foo columns and Index.difference to get the remaining ones:

df1 = df.filter(regex='^foo')
print (df1)
   fooA  fooB  fooC
0     1     4     7
1     2     5     8
2     3     6     9

print (df.columns.difference(df1.columns))
Index(['D', 'E', 'F'], dtype='object')

print (df[df.columns.difference(df1.columns)])
   D  E  F
0  1  5  7
1  3  3  4
2  5  6  3
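One caveat with this approach: Index.difference returns its result sorted by default, so the remaining columns may not keep their original order. The sketch below illustrates this on a small made-up frame (the column names Z and A are assumptions chosen only to make the reordering visible) and shows drop as one possible order-preserving alternative:

```python
import pandas as pd

# Hypothetical frame whose non-foo columns are not in alphabetical order.
df = pd.DataFrame({'fooA': [1, 2], 'Z': [3, 4], 'fooB': [5, 6], 'A': [7, 8]})

df1 = df.filter(regex='^foo')

# difference() sorts the leftover labels, so Z and A swap places.
sorted_rest = df[df.columns.difference(df1.columns)]

# drop() removes the foo columns and leaves the others in their original order.
ordered_rest = df.drop(columns=df1.columns)

print(list(sorted_rest.columns))   # ['A', 'Z']
print(list(ordered_rest.columns))  # ['Z', 'A']
```

If column order matters for later processing, the drop variant (or the list comprehension above) is probably the safer choice.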

Timings (the functions a, b and c are defined below; c builds a boolean mask with str.startswith):

In [123]: %timeit a(df)
1000 loops, best of 3: 1.06 ms per loop

In [124]: %timeit b(df3)
1000 loops, best of 3: 1.04 ms per loop

In [125]: %timeit c(df4)
1000 loops, best of 3: 1.41 ms per loop

df3 = df.copy()
df4 = df.copy()

def a(df):
    df1 = df.filter(regex='^foo')
    df2 = df[df.columns.difference(df1.columns)]
    return df1, df2

def b(df):
    df1 = df[[col for col in df.columns if col.startswith('foo')]]
    df2 = df[[col for col in df.columns if not col.startswith('foo')]]
    return df1, df2

def c(df):
    df1 = df[df.columns[df.columns.str.startswith('foo')]]
    df2 = df[df.columns[~df.columns.str.startswith('foo')]]
    return df1, df2

df1, df2 = a(df)
print (df1)
print (df2)

df1, df2 = b(df3)
print (df1)
print (df2)

df1, df2 = c(df4)
print (df1)
print (df2)
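
If the split is needed in more than one place, the boolean-mask idea from c can be wrapped in a small helper. This is only a sketch: the name split_by_prefix is made up for illustration, and it assumes all column names are strings.

```python
import pandas as pd

def split_by_prefix(df, prefix):
    """Return (matching, rest): columns starting with prefix, and all others.

    Hypothetical helper; assumes every column name is a string.
    """
    # Boolean array, True where the column name starts with the prefix.
    mask = df.columns.str.startswith(prefix)
    # Select with the mask and its negation; original column order is kept.
    return df.loc[:, mask], df.loc[:, ~mask]

df = pd.DataFrame({'fooA': [1, 2, 3],
                   'fooB': [4, 5, 6],
                   'fooC': [7, 8, 9],
                   'D': [1, 3, 5],
                   'E': [5, 3, 6],
                   'F': [7, 4, 3]})

foo_df, rest_df = split_by_prefix(df, 'foo')
print(foo_df)
print(rest_df)
```

Because both halves come from the same mask, every column lands in exactly one of the two results.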
