Split one dataframe into multiple dataframes with same column header based on values
I have a dataframe as shown below:
+------+------+---+---+---+
| S.No | A | B | C | D |
+------+------+---+---+---+
| 1 | 0.25 | 2 | 1 | 5 |
+------+------+---+---+---+
| 2 | 1.1 | 4 | 2 | 5 |
+------+------+---+---+---+
| 3 | 1.5 | 6 | 3 | 5 |
+------+------+---+---+---+
| 4 | 0.32 | 3 | 4 | 5 |
+------+------+---+---+---+
| 5 | 1.45 | 5 | 5 | 5 |
+------+------+---+---+---+
| 6 | 1.9 | 7 | 6 | 5 |
+------+------+---+---+---+
| 7 | 0.5 | 3 | 4 | 5 |
+------+------+---+---+---+
| 8 | 1.49 | 5 | 5 | 5 |
+------+------+---+---+---+
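I want to split it into 3 dataframes with the same column headers, based on the values in column A: the first dataframe should start at 0.25 and end at 1.5, the second should start at 0.32 and end at 1.9, and the third should start at 0.5 and end at 1.49. In other words, a new split should begin whenever the value in column A is between 0 and 1, and every piece should keep the same column headers. The expected output is shown below. I am new to this and don't know how to do it properly, so any help would be greatly appreciated.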
Dataframe 1:
+------+------+---+---+---+
| S.No | A | B | C | D |
+------+------+---+---+---+
| 1 | 0.25 | 2 | 1 | 5 |
+------+------+---+---+---+
| 2 | 1.1 | 4 | 2 | 5 |
+------+------+---+---+---+
| 3 | 1.5 | 6 | 3 | 5 |
+------+------+---+---+---+
Dataframe 2:
+------+------+---+---+---+
| S.No | A | B | C | D |
+------+------+---+---+---+
| 4 | 0.32 | 3 | 4 | 5 |
+------+------+---+---+---+
| 5 | 1.45 | 5 | 5 | 5 |
+------+------+---+---+---+
| 6 | 1.9 | 7 | 6 | 5 |
+------+------+---+---+---+
Dataframe 3:
+------+------+---+---+---+
| S.No | A | B | C | D |
+------+------+---+---+---+
| 7 | 0.5 | 3 | 4 | 5 |
+------+------+---+---+---+
| 8 | 1.49 | 5 | 5 | 5 |
+------+------+---+---+---+
Let's use cumsum:
d={x: y for x , y in df.groupby(df.A.between(0,1).cumsum())}
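To see why this works: `df.A.between(0, 1)` is True on each row that should start a new piece, and the cumulative sum of that boolean mask goes up by one at each such row, so every row ends up labelled with the number of the piece it belongs to. A minimal, self-contained sketch, with the example frame rebuilt by hand from the table in the question (the names `starts`, `group_id` and `pieces` are only illustrative):

```python
import pandas as pd

# Example frame rebuilt from the table in the question
df = pd.DataFrame({
    "S.No": [1, 2, 3, 4, 5, 6, 7, 8],
    "A": [0.25, 1.1, 1.5, 0.32, 1.45, 1.9, 0.5, 1.49],
    "B": [2, 4, 6, 3, 5, 7, 3, 5],
    "C": [1, 2, 3, 4, 5, 6, 4, 5],
    "D": [5, 5, 5, 5, 5, 5, 5, 5],
})

# True on every row where a new piece starts (A between 0 and 1, inclusive)
starts = df["A"].between(0, 1)

# Running count of starts: 1, 1, 1, 2, 2, 2, 3, 3 for this data
group_id = starts.cumsum()

# One dataframe per group id; each keeps the original column headers
pieces = {key: part for key, part in df.groupby(group_id)}

for key, part in pieces.items():
    print(f"Dataframe {key}:")
    print(part)
```

Note that any rows appearing before the first value in the 0 to 1 range would be collected under key 0, since the cumulative sum is still zero there.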
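First, identify the indices of the rows where the value is between 0 and 1. This is done by combining between and index. Once you have the indices, you can split the dataframe with the iloc method: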
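# Identifies indices based on variable A
# (assumes df has the default 0..n-1 index, so labels can be used as positions)
splitIndices = df.index[df.A.between(0, 1)].tolist()
# Add the end of the dataframe so that the last piece is not dropped
splitIndices.append(len(df))
dfList = []
for i in range(len(splitIndices) - 1):
    startIndex = splitIndices[i]
    endIndex = splitIndices[i + 1]
    tempDf = df.iloc[startIndex:endIndex]
    # Appends the dataframe subset to the output list
    dfList.append(tempDf.copy())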
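The loop above passes index labels to `iloc` as if they were positions, which only lines up when the frame has the default 0..n-1 index. Below is a hedged sketch of the same idea that works on integer positions instead. The helper `split_at_starts` and its parameters are hypothetical names introduced here, not part of the original answer, and the example frame is given a letter index on purpose to show that labels no longer matter:

```python
import numpy as np
import pandas as pd


def split_at_starts(df, column="A", low=0, high=1):
    """Split df into consecutive pieces, starting a new piece at every
    row whose value in `column` lies in [low, high]."""
    # Integer positions (not labels) of the rows that start a piece
    starts = np.flatnonzero(df[column].between(low, high).to_numpy())
    # Each piece ends where the next one starts; the last ends at len(df)
    ends = np.append(starts[1:], len(df))
    return [df.iloc[s:e].copy() for s, e in zip(starts, ends)]


# Example frame from the question, with a non-default index on purpose
df = pd.DataFrame(
    {
        "S.No": [1, 2, 3, 4, 5, 6, 7, 8],
        "A": [0.25, 1.1, 1.5, 0.32, 1.45, 1.9, 0.5, 1.49],
        "B": [2, 4, 6, 3, 5, 7, 3, 5],
        "C": [1, 2, 3, 4, 5, 6, 4, 5],
        "D": [5, 5, 5, 5, 5, 5, 5, 5],
    },
    index=list("abcdefgh"),
)

dfList = split_at_starts(df)
for part in dfList:
    print(part)
```

As with the loop, rows that come before the first starting row are not part of any piece, so they would be left out of the result.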
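Based on the explanation you provided, you included a condition such as:
the first dataframe should start at 0.25 and end at 1.5
That means a value like 0.32 should also be included in that dataframe.
Using that logic, you can do the following: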
l = [.25, 1.5, .32, 1.9, .5, 1.49]
# Pair the values up as (start, end) ranges
r = [(a, b) for a, b in zip(l[::2], l[1::2])]
for i in r:
    # between() includes both endpoints by default
    print(df[df['A'].between(*i)].sort_values('A'))
    print("----------------------------------")
S.No A B C D
0 1.0 0.25 2.0 1.0 5.0
3 4.0 0.32 3.0 4.0 5.0
6 7.0 0.50 3.0 4.0 5.0
1 2.0 1.10 4.0 2.0 5.0
4 5.0 1.45 5.0 5.0 5.0
7 8.0 1.49 5.0 5.0 5.0
2 3.0 1.50 6.0 3.0 5.0
----------------------------------
S.No A B C D
3 4.0 0.32 3.0 4.0 5.0
6 7.0 0.50 3.0 4.0 5.0
1 2.0 1.10 4.0 2.0 5.0
4 5.0 1.45 5.0 5.0 5.0
7 8.0 1.49 5.0 5.0 5.0
2 3.0 1.50 6.0 3.0 5.0
5 6.0 1.90 7.0 6.0 5.0
----------------------------------
S.No A B C D
6 7.0 0.50 3.0 4.0 5.0
1 2.0 1.10 4.0 2.0 5.0
4 5.0 1.45 5.0 5.0 5.0
7 8.0 1.49 5.0 5.0 5.0