Split pandas dataframe by String
I'm new to working with pandas DataFrames. I have data like this in a .csv:
foo, 1234,
bar, 4567
stuff, 7894
New Entry,,
morestuff,1345
I'm reading it into a DataFrame with
df = pd.read_csv
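However, what I really want is a new DataFrame (or a way to split the current one) every time I hit a "New Entry" row, obviously excluding that row itself. How can I do this?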
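So, using your sample data concatenated 3 times: after loading it (I named the columns 'a', 'b', 'c' for convenience), we find the index of every row containing "New Entry" and build a list of tuples that mark the start and end of each range.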
Then we can iterate over this list of tuples, slice the original df, and append each slice to a list:
In [22]:
import io
import pandas as pd
t="""foo,1234,
bar,4567
stuff,7894
New Entry,,
morestuff,1345"""
df = pd.read_csv(io.StringIO(t),header=None,names=['a','b','c'] )
df = pd.concat([df]*3, ignore_index=True)
df
Out[22]:
a b c
0 foo 1234 NaN
1 bar 4567 NaN
2 stuff 7894 NaN
3 New Entry NaN NaN
4 morestuff 1345 NaN
5 foo 1234 NaN
6 bar 4567 NaN
7 stuff 7894 NaN
8 New Entry NaN NaN
9 morestuff 1345 NaN
10 foo 1234 NaN
11 bar 4567 NaN
12 stuff 7894 NaN
13 New Entry NaN NaN
14 morestuff 1345 NaN
In [30]:
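idx = df[df['a'] == 'New Entry'].index.tolist()
# (start, stop) pairs: top of df to the first 'New Entry', each pair of
# consecutive 'New Entry' rows, and the last 'New Entry' to the end of df
idx_list = [(0, idx[0])] + list(zip(idx, idx[1:])) + [(idx[-1], len(df))]
idx_list
Out[30]:
[(0, 3), (3, 8), (8, 13), (13, 15)]
In [31]:
df_list = []
for i in idx_list:
    print(i)
    if i[0] == 0:
        # the first block starts at row 0, which is not a 'New Entry' row
        df_list.append(df[i[0]:i[1]])
    else:
        # skip the 'New Entry' row that opens the block
        df_list.append(df[i[0]+1:i[1]])
df_list
(0, 3)
(3, 8)
(8, 13)
(13, 15)
Out[31]:
[ a b c
0 foo 1234 NaN
1 bar 4567 NaN
2 stuff 7894 NaN, a b c
4 morestuff 1345 NaN
5 foo 1234 NaN
6 bar 4567 NaN
7 stuff 7894 NaN, a b c
9 morestuff 1345 NaN
10 foo 1234 NaN
11 bar 4567 NaN
12 stuff 7894 NaN, a b c
14 morestuff 1345 NaN]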
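An alternative sketch, not part of the original answer, avoids the (start, stop) bookkeeping: a cumulative sum over a boolean "is this a New Entry row?" mask gives every row a block number, and groupby does the splitting. It assumes the same column names 'a', 'b', 'c' as above and yields every block, including the one after the last "New Entry":

```python
import io

import pandas as pd

# Same sample data as above, concatenated three times.
t = """foo,1234,
bar,4567
stuff,7894
New Entry,,
morestuff,1345"""
df = pd.read_csv(io.StringIO(t), header=None, names=['a', 'b', 'c'])
df = pd.concat([df] * 3, ignore_index=True)

# True on separator rows; the running count of separators seen so far
# numbers the blocks (0 for the rows before the first separator).
is_sep = df['a'] == 'New Entry'
block_id = is_sep.cumsum()

# Drop the separator rows, then group the remaining rows by block number.
df_list = [group for _, group in df[~is_sep].groupby(block_id[~is_sep])]
```

Because separator rows are filtered out before grouping, two "New Entry" rows in a row simply produce no empty block.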
1) One way is to read the file line by line and check for the NewEntry break condition as you go; this is a fast approach.
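A minimal sketch of approach 1, assuming the separator row's first field is exactly "New Entry" (as in the question's data) and that the first field is never quoted; the helper name read_blocks is hypothetical. It collects the raw lines between separators and hands each group to pd.read_csv, so column types are still inferred:

```python
import io

import pandas as pd


def read_blocks(lines, separator='New Entry', **read_csv_kwargs):
    """Hypothetical helper: return one DataFrame per block of lines
    between separator rows (the separator lines themselves are dropped).

    A line is a separator when its first comma-separated field, stripped
    of whitespace, equals `separator`. The naive split assumes that field
    is not quoted.
    """
    blocks, current = [], []
    for line in lines:
        if line.split(',', 1)[0].strip() == separator:
            blocks.append(current)   # close the block collected so far
            current = []
        elif line.strip():           # ignore blank lines
            current.append(line)
    blocks.append(current)           # the block after the last separator
    # Parse each non-empty block with pd.read_csv so dtypes are inferred.
    return [pd.read_csv(io.StringIO(''.join(block)), **read_csv_kwargs)
            for block in blocks if block]


t = """foo,1234,
bar,4567
stuff,7894
New Entry,,
morestuff,1345"""
frames = read_blocks(io.StringIO(t), header=None, names=['a', 'b', 'c'])
```

With a real file, pass the open file object instead, e.g. `with open('data.csv') as f: frames = read_blocks(f, header=None, names=['a', 'b', 'c'])`, where 'data.csv' is a placeholder name.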
2) Another way, if the DataFrame already exists, is to find the NewEntry rows and slice the frame into several pieces, stored in a dict, dff = {}:
df
col1 col2
0 foo 1234
1 bar 4567
2 stuff 7894
3 NewEntry NaN
4 morestuff 1345
Find the NewEntry rows, adding [-1] and [len(df.index)] as boundary conditions:
import numpy as np
rows = [-1] + np.where(df['col1']=='NewEntry')[0].tolist() + [len(df.index)]
[-1, 3, 5]
Create a dictionary of DataFrames:
dff = {}
for i, r in enumerate(rows[:-1]):
    # rows after boundary r, up to (not including) the next boundary
    dff[i] = df[r+1: rows[i+1]]
The dictionary of DataFrames, {0: dataframe1, 1: dataframe2}:
dff
{0: col1 col2
0 foo 1234
1 bar 4567
2 stuff 7894, 1: col1 col2
4 morestuff 1345}
DataFrame 1
dff[0]
col1 col2
0 foo 1234
1 bar 4567
2 stuff 7894
DataFrame 2
dff[1]
col1 col2
4 morestuff 1345
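The same boundary idea can be wrapped in a reusable helper. This is a sketch, not from the original answer, and the name split_on is hypothetical. It slices with iloc to make the positional intent explicit (np.where/np.flatnonzero return positions, not index labels) and skips empty pieces, which the loop above produces when a NewEntry row is the last row or two appear back to back:

```python
import numpy as np
import pandas as pd


def split_on(df, col, value):
    """Hypothetical helper: return {block_number: DataFrame}, splitting df
    at rows where df[col] == value and excluding those rows.
    Empty blocks are skipped and the remaining ones numbered from 0."""
    # Positions of the separator rows, padded with -1 and len(df) so the
    # first and last blocks are handled like the middle ones.
    rows = [-1] + np.flatnonzero((df[col] == value).to_numpy()).tolist() + [len(df)]
    pieces = (df.iloc[start + 1:stop] for start, stop in zip(rows, rows[1:]))
    return dict(enumerate(p for p in pieces if not p.empty))


df = pd.DataFrame({'col1': ['foo', 'bar', 'stuff', 'NewEntry', 'morestuff'],
                   'col2': [1234, 4567, 7894, np.nan, 1345]})
dff = split_on(df, 'col1', 'NewEntry')
```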