I have a dataframe and I want to split it into two dataframes, one that has all the columns beginning with foo
and one with the rest of the columns. Is there a quick way of doing this?
You can use list comprehensions to select the column names:
import pandas as pd

df = pd.DataFrame({'fooA':[1,2,3],
                   'fooB':[4,5,6],
                   'fooC':[7,8,9],
                   'D':[1,3,5],
                   'E':[5,3,6],
                   'F':[7,4,3]})
print (df)
   D  E  F  fooA  fooB  fooC
0  1  5  7     1     4     7
1  3  3  4     2     5     8
2  5  6  3     3     6     9
foo = [col for col in df.columns if col.startswith('foo')]
print (foo)
['fooA', 'fooB', 'fooC']
other = [col for col in df.columns if not col.startswith('foo')]
print (other)
['D', 'E', 'F']
print (df[foo])
   fooA  fooB  fooC
0     1     4     7
1     2     5     8
2     3     6     9
print (df[other])
   D  E  F
0  1  5  7
1  3  3  4
2  5  6  3
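If you need this for more than one prefix, the two list comprehensions above can be wrapped in a small helper. This is only a sketch: the name split_by_prefix and its prefix argument are made up for illustration, and it assumes the column labels are strings (str() is applied so that other label types do not raise).

```python
import pandas as pd

def split_by_prefix(df, prefix):
    """Return (matching, rest): columns starting with prefix, then all other columns."""
    # Same two list comprehensions as above, with the prefix as a parameter.
    matching = [col for col in df.columns if str(col).startswith(prefix)]
    rest = [col for col in df.columns if not str(col).startswith(prefix)]
    return df[matching], df[rest]

df = pd.DataFrame({'fooA': [1, 2, 3], 'fooB': [4, 5, 6], 'fooC': [7, 8, 9],
                   'D': [1, 3, 5], 'E': [5, 3, 6], 'F': [7, 4, 3]})

foo_df, rest_df = split_by_prefix(df, 'foo')
print(foo_df)
print(rest_df)
```

Both results keep the columns in the order they have in df.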
Another solution uses filter and difference:
df1 = df.filter(regex='^foo')
print (df1)
   fooA  fooB  fooC
0     1     4     7
1     2     5     8
2     3     6     9
print (df.columns.difference(df1.columns))
Index(['D', 'E', 'F'], dtype='object')
print (df[df.columns.difference(df1.columns)])
   D  E  F
0  1  5  7
1  3  3  4
2  5  6  3
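One thing to watch with difference: by default it returns the remaining labels sorted, so the second frame may not keep the original column order. If the order matters, dropping the matched columns is one alternative. A small sketch, using a made-up frame whose other columns are deliberately not alphabetical, and assuming a pandas version where drop accepts the columns keyword:

```python
import pandas as pd

# Hypothetical frame: the non-foo columns 'z' and 'a' are not in alphabetical order.
df = pd.DataFrame([[1, 3, 5, 7], [2, 4, 6, 8]],
                  columns=['z', 'fooA', 'a', 'fooB'])

df1 = df.filter(regex='^foo')

# Index.difference sorts the remaining labels by default.
sorted_rest = df[df.columns.difference(df1.columns)]

# drop removes the matched columns and leaves the others in their original order.
ordered_rest = df.drop(columns=df1.columns)

print(list(sorted_rest.columns))   # ['a', 'z']
print(list(ordered_rest.columns))  # ['z', 'a']
```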
Timings (the functions a, b and c and the copies df3 and df4 are defined below):
In [123]: %timeit a(df)
1000 loops, best of 3: 1.06 ms per loop
In [124]: %timeit b(df3)
1000 loops, best of 3: 1.04 ms per loop
In [125]: %timeit c(df4)
1000 loops, best of 3: 1.41 ms per loop
df3 = df.copy()
df4 = df.copy()
# filter + difference
def a(df):
    df1 = df.filter(regex='^foo')
    df2 = df[df.columns.difference(df1.columns)]
    return df1, df2

# list comprehensions
def b(df):
    df1 = df[[col for col in df.columns if col.startswith('foo')]]
    df2 = df[[col for col in df.columns if not col.startswith('foo')]]
    return df1, df2

# boolean mask from str.startswith
def c(df):
    df1 = df[df.columns[df.columns.str.startswith('foo')]]
    df2 = df[df.columns[~df.columns.str.startswith('foo')]]
    return df1, df2
df1, df2 = a(df)
print (df1)
print (df2)
df1, df2 = b(df3)
print (df1)
print (df2)
df1, df2 = c(df4)
print (df1)
print (df2)
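The timings above were taken in IPython on the 3x6 sample frame, where fixed overhead dominates. If you want to repeat the comparison in a plain script, something like the following should work with the standard timeit module. It is a sketch only: the function names, the wider test frame and the repeat count are my own choices, and the numbers will vary with machine and pandas version, so no results are shown.

```python
import timeit
import pandas as pd

def split_filter(df):
    # filter + difference (note: difference sorts the remaining labels)
    df1 = df.filter(regex='^foo')
    return df1, df[df.columns.difference(df1.columns)]

def split_listcomp(df):
    # two list comprehensions over the column names
    foo = [col for col in df.columns if col.startswith('foo')]
    other = [col for col in df.columns if not col.startswith('foo')]
    return df[foo], df[other]

def split_mask(df):
    # boolean mask built with the vectorised str.startswith
    mask = df.columns.str.startswith('foo')
    return df.loc[:, mask], df.loc[:, ~mask]

# Hypothetical wider frame: 200 'foo' columns and 200 other columns, 3 rows.
names = ['foo%d' % i for i in range(200)] + ['col%d' % i for i in range(200)]
wide = pd.DataFrame([[0] * len(names)] * 3, columns=names)

timings = {}
for func in (split_filter, split_listcomp, split_mask):
    # Total seconds for 100 calls of func on the wide frame.
    timings[func.__name__] = timeit.timeit(lambda: func(wide), number=100)
    print(func.__name__, timings[func.__name__])
```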