
Split pandas dataframe by String

I'm new to using Pandas dataframes. I have data in a .csv like this:

foo, 1234,
bar, 4567
stuff, 7894
New Entry,,
morestuff,1345

I am reading it into a dataframe with

 df = pd.read_csv

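However, what I really want is a new dataframe (or a way of splitting the current one) every time I hit a "New Entry" line (obviously without including that line). How could this be done?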

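So, using your sample data, which I concatenated 3 times, after loading (I named the cols 'a', 'b', 'c' for convenience) we find the indices where you have 'New Entry' and produce a list of tuples of these positions, stepwise, to mark the begin and end of each range.

We can then iterate over this list of tuples, slice the orig df, and append each slice to a list: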

In [22]:

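import io
import pandas as pd

t="""foo,1234,
bar,4567
stuff,7894
New Entry,,
morestuff,1345"""
df = pd.read_csv(io.StringIO(t),header=None,names=['a','b','c'] )
# repeat the sample three times so there are several 'New Entry' rows
df = pd.concat([df]*3, ignore_index=True)
df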
Out[22]:
            a     b   c
0         foo  1234 NaN
1         bar  4567 NaN
2       stuff  7894 NaN
3   New Entry   NaN NaN
4   morestuff  1345 NaN
5         foo  1234 NaN
6         bar  4567 NaN
7       stuff  7894 NaN
8   New Entry   NaN NaN
9   morestuff  1345 NaN
10        foo  1234 NaN
11        bar  4567 NaN
12      stuff  7894 NaN
13  New Entry   NaN NaN
14  morestuff  1345 NaN
In [30]:

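# index labels of the 'New Entry' rows
idx = df[df['a'] == 'New Entry'].index
# first range runs from the top of the frame to the first marker
idx_list = [(0,idx[0])]
# remaining ranges pair each marker with the next one
idx_list = idx_list + list(zip(idx, idx[1:]))
idx_list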
Out[30]:
[(0, 3), (3, 8), (8, 13)]
In [31]:

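df_list = []
for i in idx_list:
    print(i)
    if i[0] == 0:
        # first range starts at row 0, which is a data row
        df_list.append(df[i[0]:i[1]])
    else:
        # later ranges start on a 'New Entry' row, so skip it
        df_list.append(df[i[0]+1:i[1]])
df_list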
(0, 3)
(3, 8)
(8, 13)
Out[31]:
[       a     b   c
 0    foo  1234 NaN
 1    bar  4567 NaN
 2  stuff  7894 NaN,            a     b   c
 4  morestuff  1345 NaN
 5        foo  1234 NaN
 6        bar  4567 NaN
 7      stuff  7894 NaN,             a     b   c
 9   morestuff  1345 NaN
 10        foo  1234 NaN
 11        bar  4567 NaN
 12      stuff  7894 NaN]
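
One thing to note about the ranges above: they end at the last 'New Entry' row, so anything after it (index 14 in this sample) does not land in any slice. As an alternative sketch, not the method used above, the rows can be labelled with a running count of markers and then split with groupby, which also picks up that trailing block. The names is_marker and block_id are made up for illustration.

```python
import io

import pandas as pd

# Same sample data as above, concatenated three times.
t = """foo,1234,
bar,4567
stuff,7894
New Entry,,
morestuff,1345"""
df = pd.read_csv(io.StringIO(t), header=None, names=['a', 'b', 'c'])
df = pd.concat([df] * 3, ignore_index=True)

# True on every separator row.
is_marker = df['a'] == 'New Entry'

# Running count of separators seen so far; labels the blocks 0, 1, 2, ...
block_id = is_marker.cumsum()

# Drop the separator rows, then split the remaining rows by block label.
data_rows = df[~is_marker]
df_list = [chunk for _, chunk in data_rows.groupby(block_id[~is_marker])]

for chunk in df_list:
    print(chunk, end='\n\n')
```

With the sample data this gives four frames, the last one holding only the final morestuff row.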

1) One way is to read the file line by line and check for the NewEntry break as you go; this is a quick approach.
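
A rough sketch of option 1, assuming the file is small enough to hold in memory and that every data line has at least two fields. The helper split_rows and the column names 'a' and 'b' are hypothetical, and io.StringIO stands in for the real file. The marker here is 'New Entry', as in the question's data.

```python
import csv
import io

import pandas as pd

# Stand-in for the real file; for a file on disk, pass open(path, newline='') instead.
raw = """foo,1234,
bar,4567
stuff,7894
New Entry,,
morestuff,1345"""


def split_rows(lines, marker='New Entry'):
    """Yield lists of [name, value] rows, starting a new list at each marker line."""
    chunk = []
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        if row[0].strip() == marker:
            if chunk:
                yield chunk  # close the block collected so far
            chunk = []
        else:
            # Keep the first two fields; the sample has a stray trailing comma.
            chunk.append([field.strip() for field in row[:2]])
    if chunk:
        yield chunk  # rows after the last marker


frames = [pd.DataFrame(rows, columns=['a', 'b'])
          for rows in split_rows(io.StringIO(raw))]

for frame in frames:
    print(frame, end='\n\n')
```

The values in column b stay as strings in this sketch; pd.to_numeric can convert them afterwards if numbers are needed.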

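2) The other way, if the dataframe already exists, is to find the NewEntry rows and slice the dataframe into several pieces, stored in a dict dff = {}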

df                                                                 
        col1  col2  
0        foo  1234    
1        bar  4567                
2      stuff  7894                                                        
3   NewEntry   NaN                       
4  morestuff  1345 

Find the NewEntry rows, adding [-1] and [len(df.index)] as boundary conditions

import numpy as np
rows = [-1] + np.where(df['col1']=='NewEntry')[0].tolist() + [len(df.index)]
[-1, 3, 5]

Create a dict of dataframes

dff = {}
for i, r in enumerate(rows[:-1]):
    # rows from just after this marker up to (not including) the next one
    dff[i] = df[r+1: rows[i+1]]

Dict of dataframes {0: dataframe1, 1: dataframe2}

dff                           
{0:     col1  col2            
 0    foo  1234               
 1    bar  4567               
 2  stuff  7894, 1:         col1  col2  
 4  morestuff  1345}

Dataframe 1

dff[0]              
    col1  col2      
0    foo  1234      
1    bar  4567      
2  stuff  7894      

Dataframe 2

dff[1]              
        col1  col2  
4  morestuff  1345 
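
The steps above could be wrapped in a small helper, sketched here under a couple of assumptions: the function name split_on_marker is hypothetical, slicing is done by position with iloc, and empty pieces (which would appear if a marker sits on the first or last row, or if two markers are adjacent) are skipped rather than stored.

```python
import numpy as np
import pandas as pd


def split_on_marker(df, column, marker):
    """Return a dict {0: chunk0, 1: chunk1, ...} of the rows between marker rows."""
    # Positions of the marker rows, padded with -1 and len(df) as outer boundaries.
    mask = (df[column] == marker).to_numpy()
    bounds = [-1] + np.where(mask)[0].tolist() + [len(df)]
    chunks = {}
    for start, stop in zip(bounds[:-1], bounds[1:]):
        piece = df.iloc[start + 1:stop]
        if not piece.empty:  # skip empty pieces between adjacent markers
            chunks[len(chunks)] = piece
    return chunks


df = pd.DataFrame({
    'col1': ['foo', 'bar', 'stuff', 'NewEntry', 'morestuff'],
    'col2': [1234, 4567, 7894, np.nan, 1345],
})
dff = split_on_marker(df, 'col1', 'NewEntry')
print(dff[0])
print(dff[1])
```

Each piece keeps its original index labels, so dff[1] still shows row 4, as in the output above.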

