How to split dataframe with multiple types of information into separate dataframes based on string?
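I have the following dataframe, which comes from a single CSV file that contains several datasets: in this case "Cash Balance", "Account Order History", and "Equities" (intentionally left blank). I want to put the Cash Balance information into one dataframe and the Account Order History into another. My idea is to look at the first column, check whether it equals "Cash Balance", then read each row until the value equals "Account Order History", and so on, but I am not sure whether this is the right approach.
How can I code this in Python? Please help, thank you!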
Cash Balance
Date Time Type ID# Commission Amount Balance
11/9/20 9:30am Single 1234 2% $200 $2500
11/9/20 9:40am Single 1234 2% $200 $2500
11/9/20 9:45am Single 2234 2% $200 $2500
Account Order History
Notes Time Spread Side Qty Price Symbol Order
9:30am STOCK BUY 10 $42.87 NIO Filled
9:30am STOCK Sell 10 $43.87 NIO Filled
Equities
Is this what you want?
import pandas as pd
df = pd.read_csv("new.csv",header=None)
df
0 1 2 3 4 5 6 7
0 Cash Balance NaN NaN NaN NaN NaN NaN NaN
1 Date Time Type ID# Commission Amount Balance NaN
2 11/09/2020 9:30am Single 1234 2% $200 $2,500 NaN
3 11/09/2020 9:40am Single 1234 2% $200 $2,500 NaN
4 11/09/2020 9:45am Single 2234 2% $200 $2,500 NaN
5 Account Order History NaN NaN NaN NaN NaN NaN NaN
6 Notes Time Spread Side Qty Price Symbol Order
7 NaN 9:30am STOCK BUY 10 $42.87 NIO Filled
8 NaN 9:30am STOCK Sell 10 $43.87 NIO Filled
table_names = ["Cash Balance", "Account Order History"]
groups = df[0].isin(table_names).cumsum()
df_combined = {g.iloc[0,0]: g.iloc[1:] for k,g in df.groupby(groups)}
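The grouping step relies on isin(...).cumsum(): each section-title row is marked True, and the running sum then gives every row the number of the section it belongs to. A minimal sketch on a made-up column (the values are illustrative stand-ins, not the real file):

```python
import pandas as pd

# Hypothetical miniature of the first column of the file.
col = pd.Series(["Cash Balance", "Date", "11/09/2020",
                 "Account Order History", "Notes", None])
table_names = ["Cash Balance", "Account Order History"]

is_title = col.isin(table_names)   # True only on section-title rows
groups = is_title.cumsum()         # running count of titles seen so far

print(groups.tolist())  # [1, 1, 1, 2, 2, 2]
```

Rows that share a group number form one section, which is what df.groupby(groups) then iterates over.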
cash_balance = df_combined['Cash Balance'].reset_index(drop=True)
cash_balance.columns = cash_balance.iloc[0]
cash_balance.drop(cash_balance.index[0], inplace = True)
cash_balance
Date Time Type ID# Commission Amount Balance NaN
1 11/09/2020 9:30am Single 1234 2% $200 $2,500 NaN
2 11/09/2020 9:40am Single 1234 2% $200 $2,500 NaN
3 11/09/2020 9:45am Single 2234 2% $200 $2,500 NaN
acct_order_hist = df_combined['Account Order History'].reset_index(drop=True)
acct_order_hist.columns = acct_order_hist.iloc[0]
acct_order_hist.drop(acct_order_hist.index[0], inplace = True)
acct_order_hist
Notes Time Spread Side Qty Price Symbol Order
1 NaN 9:30am STOCK BUY 10 $42.87 NIO Filled
2 NaN 9:30am STOCK Sell 10 $43.87 NIO Filled
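The per-table steps above repeat the same pattern, so they could be wrapped in one helper. The following is only a sketch under stated assumptions: split_sections is a hypothetical name, the inline string stands in for the CSV file, every section is assumed to be a title row followed by a header row and data rows, and names=list(range(8)) is passed so that rows of different widths can be read.

```python
import io
import pandas as pd

# Inline stand-in for the CSV file (assumed layout: title row, header row, data rows).
raw = """Cash Balance
Date,Time,Type,ID#,Commission,Amount,Balance
11/9/20,9:30am,Single,1234,2%,$200,$2500
11/9/20,9:40am,Single,1234,2%,$200,$2500
Account Order History
Notes,Time,Spread,Side,Qty,Price,Symbol,Order
,9:30am,STOCK,BUY,10,$42.87,NIO,Filled
,9:30am,STOCK,Sell,10,$43.87,NIO,Filled
Equities
"""

# Fixed column names let pandas pad the short rows with NaN.
df = pd.read_csv(io.StringIO(raw), header=None, names=list(range(8)), dtype=str)


def split_sections(df, table_names):
    """Return {title: DataFrame} for each titled section of df (hypothetical helper)."""
    groups = df[0].isin(table_names).cumsum()
    sections = {}
    for _, block in df.groupby(groups):
        title = block.iloc[0, 0]      # first row of the block holds the title
        body = block.iloc[1:]         # header row plus data rows
        if body.empty:                # a title with nothing under it
            sections[title] = pd.DataFrame()
            continue
        data = body.iloc[1:].reset_index(drop=True)
        data.columns = body.iloc[0].tolist()              # promote the header row
        sections[title] = data.loc[:, data.columns.notna()]  # drop unnamed columns
    return sections


sections = split_sections(df, ["Cash Balance", "Account Order History", "Equities"])
print(sections["Cash Balance"])
print(sections["Account Order History"])
```

An empty section such as "Equities" comes back as an empty dataframe rather than being skipped, which keeps the dictionary keys predictable.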
There are several ways to do this.
Using numpy.split():
import numpy as np
# positions of the rows where the second and third sections start
idx = [(df.index=='Account Order History').tolist().index(True)]
idx.append((df.index=='Equities').tolist().index(True))
np.split(df, idx)
Or a list comprehension:
idx = [0] + idx + [df.shape[0]] # needed to use the code chunk above
[df.iloc[idx[i]:idx[i+1]] for i in range(len(idx)-1)]
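Or groupby with pd.Categorical:
# index values that are not one of the categories become NaN
cat = pd.Categorical(df.index, categories=['Cash Balance','Account Order History','Equities'])
# assign every unlabeled row to 'Account Order History'
cat = cat.fillna('Account Order History')
[v for k, v in df.groupby(cat)]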
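A constant fill puts every unlabeled row into the same section. If the index carries a section name only on the title row (an assumption about the layout, matching the sample data in the question), forward-filling the labels may be closer to what is wanted. A sketch on a made-up frame, with illustrative values only:

```python
import pandas as pd

# Hypothetical frame: section titles sit in the index, other rows carry other labels.
df = pd.DataFrame(
    {"Time": [None, "9:30am", "9:40am", None, "9:30am", None]},
    index=["Cash Balance", "11/9/20", "11/9/20",
           "Account Order History", "note", "Equities"],
)
titles = ["Cash Balance", "Account Order History", "Equities"]

# Keep the label on title rows, blank it elsewhere, then carry it forward.
idx = pd.Series(df.index)
labels = idx.where(idx.isin(titles)).ffill()

# Group by position (a plain array) so duplicate index values do not matter.
parts = {name: part for name, part in df.groupby(labels.to_numpy(), sort=False)}

print(parts["Cash Balance"])
```

Each part still includes its title row as the first row, so it can be dropped with part.iloc[1:] if it is not needed.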
As an alternative for more complex cases, you can first split the dataframe by index into a dictionary of dataframes, as described in this question:
dict(tuple(df.groupby(level=0)))
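where the keys are the index values and the values are dataframes. This alone does not solve your problem, but it may help you manage the dataframe as intended.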
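To illustrate what that expression returns, here is a small sketch on a made-up frame. It assumes every row already carries its section name in the index, which is not the case for the raw file in the question:

```python
import pandas as pd

# Hypothetical frame whose index names the section of every row.
df = pd.DataFrame(
    {"Amount": [200, 200, 10]},
    index=["Cash Balance", "Cash Balance", "Account Order History"],
)

# groupby yields (key, sub-frame) pairs; dict(tuple(...)) collects them by key.
frames = dict(tuple(df.groupby(level=0)))

print(sorted(frames))
print(frames["Cash Balance"])
```

Because groupby(level=0) makes one group per distinct index value, the section labels would first need to be spread to all rows of a section for this to split the file as intended.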