[英]Split a pandas dataframe into two by columns
我有一個數據框,我想將其拆分為兩個數據框,一個包含所有列以foo
開頭的列,另一個包含其余列的列。 有快速的方法嗎?
您可以使用list comprehensions
選擇所有列名稱:
df = pd.DataFrame({'fooA':[1,2,3],
'fooB':[4,5,6],
'fooC':[7,8,9],
'D':[1,3,5],
'E':[5,3,6],
'F':[7,4,3]})
print (df)
D E F fooA fooB fooC
0 1 5 7 1 4 7
1 3 3 4 2 5 8
2 5 6 3 3 6 9
foo = [col for col in df.columns if col.startswith('foo')]
print (foo)
['fooA', 'fooB', 'fooC']
other = [col for col in df.columns if not col.startswith('foo')]
print (other)
['D', 'E', 'F']
print (df[foo])
fooA fooB fooC
0 1 4 7
1 2 5 8
2 3 6 9
print (df[other])
D E F
0 1 5 7
1 3 3 4
2 5 6 3
具有filter
和difference
另一種解決方案:
df1 = df.filter(regex='^foo')
print (df1)
fooA fooB fooC
0 1 4 7
1 2 5 8
2 3 6 9
print (df.columns.difference(df1.columns))
Index(['D', 'E', 'F'], dtype='object')
print (df[df.columns.difference(df1.columns)])
D E F
0 1 5 7
1 3 3 4
2 5 6 3
時間 :
In [123]: %timeit a(df)
1000 loops, best of 3: 1.06 ms per loop
In [124]: %timeit b(df3)
1000 loops, best of 3: 1.04 ms per loop
In [125]: %timeit c(df4)
1000 loops, best of 3: 1.41 ms per loop
df3 = df.copy()
df4 = df.copy()
def a(df):
df1 = df.filter(regex='^foo')
df2 = df[df.columns.difference(df1.columns)]
return df1, df2
def b(df):
df1 = df[[col for col in df.columns if col.startswith('foo')]]
df2 = df[[col for col in df.columns if not col.startswith('foo')]]
return df1, df2
def c(df):
df1 = df[df.columns[df.columns.str.startswith('foo')]]
df2 = df[df.columns[~df.columns.str.startswith('foo')]]
return df1, df2
df1, df2 = a(df)
print (df1)
print (df2)
df1, df2 = b(df3)
print (df1)
print (df2)
df1, df2 = c(df4)
print (df1)
print (df2)
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.