
How to split dataframe with multiple types of information into separate dataframes based on string?

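I have the following dataframe, where a single CSV file contains multiple datasets: in this case "Cash Balance", "Account Order History", and "Equities" (information intentionally left blank). I want to put the Cash Balance information into one dataframe and the Account Order History into another. My idea is to look at the index of the first column, check whether it equals "Cash Balance", and then read each row until index = "Account Order History", and so on, but I am not sure whether this is the right approach.

How can I code this in Python? Please help, thanks!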

Cash Balance
Date        Time     Type    ID#    Commission  Amount   Balance
11/9/20     9:30am   Single  1234   2%          $200     $2500
11/9/20     9:40am   Single  1234   2%          $200     $2500
11/9/20     9:45am   Single  2234   2%          $200     $2500
Account Order History
Notes                Time    Spread Side        Qty      Price   Symbol Order
                     9:30am  STOCK  BUY         10       $42.87  NIO    Filled
                     9:30am  STOCK  Sell        10       $43.87  NIO    Filled
Equities

Is this what you want?

import pandas as pd
df = pd.read_csv("new.csv",header=None)
df

0   1   2   3   4   5   6   7
0   Cash Balance    NaN NaN NaN NaN NaN NaN NaN
1   Date    Time    Type    ID# Commission  Amount  Balance NaN
2   11/09/2020  9:30am  Single  1234    2%  $200    $2,500  NaN
3   11/09/2020  9:40am  Single  1234    2%  $200    $2,500  NaN
4   11/09/2020  9:45am  Single  2234    2%  $200    $2,500  NaN
5   Account Order History   NaN NaN NaN NaN NaN NaN NaN
6   Notes   Time    Spread  Side    Qty Price   Symbol  Order
7   NaN 9:30am  STOCK   BUY 10  $42.87  NIO Filled
8   NaN 9:30am  STOCK   Sell    10  $43.87  NIO Filled

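table_names = ["Cash Balance", "Account Order History"]
# Each title row increments the running count, so all rows of a section share one number
groups = df[0].isin(table_names).cumsum()
# Key: the title in the section's first cell; value: the remaining rows of that section
df_combined = {g.iloc[0,0]: g.iloc[1:] for _, g in df.groupby(groups)}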


cash_balance = df_combined['Cash Balance'].reset_index(drop=True)
cash_balance.columns = cash_balance.iloc[0]
cash_balance.drop(cash_balance.index[0], inplace = True)

cash_balance

Date    Time    Type    ID# Commission  Amount  Balance NaN
1   11/09/2020  9:30am  Single  1234    2%  $200    $2,500  NaN
2   11/09/2020  9:40am  Single  1234    2%  $200    $2,500  NaN
3   11/09/2020  9:45am  Single  2234    2%  $200    $2,500  NaN

acct_order_hist = df_combined['Account Order History'].reset_index(drop=True)
acct_order_hist.columns = acct_order_hist.iloc[0]
acct_order_hist.drop(acct_order_hist.index[0], inplace = True)
acct_order_hist

Notes   Time    Spread  Side    Qty Price   Symbol  Order
1   NaN 9:30am  STOCK   BUY 10  $42.87  NIO Filled
2   NaN 9:30am  STOCK   Sell    10  $43.87  NIO Filled
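
The same steps can be wrapped in a loop so that every section is handled in one pass. What follows is only a minimal sketch: the inline `raw` text is a comma-separated stand-in for the file, `split_sections` is a hypothetical helper name, and `names=list(range(8))` is an assumption added so that rows with fewer fields are padded rather than rejected by the parser.

```python
import io
import pandas as pd

# Hypothetical inline copy of the sample data, comma-separated for the demo.
raw = """Cash Balance
Date,Time,Type,ID#,Commission,Amount,Balance
11/9/20,9:30am,Single,1234,2%,$200,$2500
11/9/20,9:40am,Single,1234,2%,$200,$2500
11/9/20,9:45am,Single,2234,2%,$200,$2500
Account Order History
Notes,Time,Spread,Side,Qty,Price,Symbol,Order
,9:30am,STOCK,BUY,10,$42.87,NIO,Filled
,9:30am,STOCK,Sell,10,$43.87,NIO,Filled
Equities
"""

# Eight explicit column names let short rows be padded with NaN.
df = pd.read_csv(io.StringIO(raw), header=None, names=list(range(8)))


def split_sections(df, table_names):
    """Return {section title: DataFrame}, using the title rows as boundaries."""
    # Assumes the first row of df is a title row.
    groups = df[0].isin(table_names).cumsum()
    sections = {}
    for _, g in df.groupby(groups):
        title = g.iloc[0, 0]
        body = g.iloc[1:]
        if body.empty:
            # A title with nothing under it, such as "Equities" here.
            sections[title] = pd.DataFrame()
            continue
        # Drop columns this section never uses, then promote its first row to header.
        body = body.dropna(axis=1, how="all")
        table = body.iloc[1:].reset_index(drop=True)
        table.columns = body.iloc[0].tolist()
        sections[title] = table
    return sections


sections = split_sections(
    df, ["Cash Balance", "Account Order History", "Equities"]
)
cash_balance = sections["Cash Balance"]
acct_order_hist = sections["Account Order History"]
```

Dropping all-NaN columns before promoting the header removes the trailing NaN column that appears in the Cash Balance output above.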

There are multiple ways to do it.

Using numpy.split()

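import numpy as np

# Assumes df is indexed by its first column, so the section titles are index labels
idx = [(df.index=='Account Order History').tolist().index(True)]
idx.append((df.index=='Equities').tolist().index(True))
np.split(df, idx)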

Using a list comprehension

idx = [0] + idx + [df.shape[0]] # reuses idx from the numpy.split() snippet above
[df.iloc[idx[i]:idx[i+1]] for i in range(len(idx)-1)]
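
The snippets in this answer take for granted that `df` is indexed by its first column. As a rough, self-contained sketch of that setup, the block below reads the data with `index_col=0` and finds the boundaries automatically instead of naming each title by hand. The inline `raw` text and the names `titles`, `starts`, `bounds`, and `parts` are illustrative assumptions, not part of the original answer.

```python
import io
import pandas as pd

# Hypothetical inline copy of the sample data, comma-separated for the demo.
raw = """Cash Balance
Date,Time,Type,ID#,Commission,Amount,Balance
11/9/20,9:30am,Single,1234,2%,$200,$2500
11/9/20,9:40am,Single,1234,2%,$200,$2500
11/9/20,9:45am,Single,2234,2%,$200,$2500
Account Order History
Notes,Time,Spread,Side,Qty,Price,Symbol,Order
,9:30am,STOCK,BUY,10,$42.87,NIO,Filled
,9:30am,STOCK,Sell,10,$43.87,NIO,Filled
Equities
"""

# index_col=0 turns the first column (titles included) into the index.
df = pd.read_csv(
    io.StringIO(raw), header=None, names=list(range(8)), index_col=0
)

titles = ["Cash Balance", "Account Order History", "Equities"]

# Positions of the title rows, in file order.
starts = [i for i, label in enumerate(df.index) if label in titles]
# Add the end of the frame so the last section has a closing boundary.
bounds = starts + [len(df)]

# Slice between consecutive boundaries, skipping the title row itself.
parts = {
    df.index[a]: df.iloc[a + 1:b]
    for a, b in zip(bounds[:-1], bounds[1:])
}
```

Each value in `parts` still has the section's own header as its first row, so it would need the same header-promotion step as the other answer.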

Using pandas Categorical and groupby

cat = pd.Categorical(df.index, categories=['Cash Balance','Account Order History','Equities'])
cat = cat.fillna('Account Order History')
[v for k, v in df.groupby(cat)]

As an alternative for more complex cases, you can first split the dataframe by index into a dictionary of dataframes, as described in this question:

dict(tuple(df.groupby(level=0)))

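The keys are the index values and the values are the dataframes. This alone does not solve your problem, but it can help you manage the dataframe as intended.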
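
For the dictionary to hold one entry per section, each row's index first has to carry its section title. One possible way to get there, shown purely as a sketch, is to forward-fill the titles down the rows before grouping. The miniature frame and the names `titles`, `section`, `labelled`, and `frames` are made up for illustration.

```python
import pandas as pd

# Hypothetical miniature of the data: column 0 mixes titles, header cells and values.
df = pd.DataFrame({
    0: ["Cash Balance", "Date", "11/9/20", "Account Order History", "Notes", None],
    1: [None, "Time", "9:30am", None, "Time", "9:30am"],
})
titles = ["Cash Balance", "Account Order History"]

is_title = df[0].isin(titles)

# Keep the title only on title rows, then copy it down to the rows below.
section = df[0].where(is_title).ffill()

# Use the section title as the index and drop the title rows themselves.
labelled = df.set_index(section)[~is_title.to_numpy()]

# One dataframe per distinct index value, keyed by section title.
frames = dict(tuple(labelled.groupby(level=0)))
```

Here `frames["Cash Balance"]` and `frames["Account Order History"]` each hold that section's header row followed by its data rows.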

