Split one dataframe into multiple dataframes with the same column headers based on values
I have a dataframe that looks like the one below:
+------+------+---+---+---+
| S.No | A | B | C | D |
+------+------+---+---+---+
| 1 | 0.25 | 2 | 1 | 5 |
+------+------+---+---+---+
| 2 | 1.1 | 4 | 2 | 5 |
+------+------+---+---+---+
| 3 | 1.5 | 6 | 3 | 5 |
+------+------+---+---+---+
| 4 | 0.32 | 3 | 4 | 5 |
+------+------+---+---+---+
| 5 | 1.45 | 5 | 5 | 5 |
+------+------+---+---+---+
| 6 | 1.9 | 7 | 6 | 5 |
+------+------+---+---+---+
| 7 | 0.5 | 3 | 4 | 5 |
+------+------+---+---+---+
| 8 | 1.49 | 5 | 5 | 5 |
+------+------+---+---+---+
I want to split it into 3 dataframes that all keep the same column headers. The split is based on the value in column A: the 1st dataframe should start at 0.25 and end at 1.5, the second should start at 0.32 and end at 1.9, and the third should start at 0.5 and end at 1.49. In other words, a new dataframe should start whenever the value in column A is between 0 and 1.
The expected output is as follows. Since I am new to this, I don't know how to do it properly; any help would be appreciated.
Dataframe 1:
+------+------+---+---+---+
| S.No | A | B | C | D |
+------+------+---+---+---+
| 1 | 0.25 | 2 | 1 | 5 |
+------+------+---+---+---+
| 2 | 1.1 | 4 | 2 | 5 |
+------+------+---+---+---+
| 3 | 1.5 | 6 | 3 | 5 |
+------+------+---+---+---+
Dataframe 2:
+------+------+---+---+---+
| S.No | A | B | C | D |
+------+------+---+---+---+
| 4 | 0.32 | 3 | 4 | 5 |
+------+------+---+---+---+
| 5 | 1.45 | 5 | 5 | 5 |
+------+------+---+---+---+
| 6 | 1.9 | 7 | 6 | 5 |
+------+------+---+---+---+
Dataframe 3:
+------+------+---+---+---+
| S.No | A | B | C | D |
+------+------+---+---+---+
| 7 | 0.5 | 3 | 4 | 5 |
+------+------+---+---+---+
| 8 | 1.49 | 5 | 5 | 5 |
+------+------+---+---+---+
Let's use cumsum:
d={x: y for x , y in df.groupby(df.A.between(0,1).cumsum())}
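To make the one-liner easier to follow, here is a minimal sketch that unpacks it step by step. The dataframe is rebuilt from the table in the question; the names starts and group_id are only illustrative and do not appear in the original answer. Note that between(0, 1) includes both endpoints, so a value of exactly 1 would also start a new block.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "S.No": [1, 2, 3, 4, 5, 6, 7, 8],
    "A": [0.25, 1.1, 1.5, 0.32, 1.45, 1.9, 0.5, 1.49],
    "B": [2, 4, 6, 3, 5, 7, 3, 5],
    "C": [1, 2, 3, 4, 5, 6, 4, 5],
    "D": [5, 5, 5, 5, 5, 5, 5, 5],
})

# True on every row that starts a new block (A between 0 and 1, inclusive)
starts = df["A"].between(0, 1)

# Running count of the True values: 1, 1, 1, 2, 2, 2, 3, 3
group_id = starts.cumsum()

# One dataframe per group id; each part keeps the original column headers
d = {key: part for key, part in df.groupby(group_id)}

for key, part in d.items():
    print(f"Dataframe {key}:")
    print(part.to_string(index=False))
```

The dictionary keys are the running counts (1, 2, 3), so d[1] is the first dataframe, d[2] the second, and so on.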
You start off by identifying the indices where the values are between 0 and 1. This is done with a combination of between and index. Once you have the indices, you can split the dataframe using the iloc method.
# Identify the indices where A is between 0 and 1 (assumes the default 0..n-1 index)
splitIndices = df.index[df.A.between(0, 1)].tolist()
# Add the end of the dataframe so the last group is not dropped
splitIndices.append(len(df))
dfList = []
for i in range(len(splitIndices) - 1):
    startIndex = splitIndices[i]
    endIndex = splitIndices[i + 1]
    tempDf = df.iloc[startIndex:endIndex]
    # Append the dataframe subset to the output list
    dfList.append(tempDf.copy())
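The loop above passes index labels to iloc, which only works when the labels happen to equal row positions. As a hedged variant, the sketch below takes positions from the boolean mask with numpy.flatnonzero instead, and runs the final block through to the end of the dataframe. The letter index and the names starts and ends are assumptions made for illustration; only columns A and B from the question are reproduced.

```python
import numpy as np
import pandas as pd

# Subset of the question's data with a non-numeric index,
# to show that labels are never used for slicing
df = pd.DataFrame(
    {
        "A": [0.25, 1.1, 1.5, 0.32, 1.45, 1.9, 0.5, 1.49],
        "B": [2, 4, 6, 3, 5, 7, 3, 5],
    },
    index=list("abcdefgh"),
)

# Integer positions (not labels) of the rows where A is between 0 and 1
starts = np.flatnonzero(df["A"].between(0, 1).to_numpy())

# Each block ends where the next one starts; the last block ends at len(df)
ends = np.append(starts[1:], len(df))

# Slice by position and copy so each piece is an independent dataframe
dfList = [df.iloc[s:e].copy() for s, e in zip(starts, ends)]

for part in dfList:
    print(part)
```

Any rows that come before the first value between 0 and 1 are not part of any block in this sketch, which matches the behaviour of the loop above.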
According to your explanation, each dataframe is defined by a between condition, e.g.:
1st dataframe should start from 0.25 and end in 1.5
Read that way, a value like 0.32 falls within that range and should be included in the first dataframe.
With that logic you can do the following:
l = [.25, 1.5, .32, 1.9, .5, 1.49]
# Pair the values up as (start, end) ranges
r = [(a, b) for a, b in zip(l[::2], l[1::2])]
for i in r:
    # between() includes both endpoints by default
    print(df[df['A'].between(*i)].sort_values('A'))
    print("----------------------------------")
S.No A B C D
0 1.0 0.25 2.0 1.0 5.0
3 4.0 0.32 3.0 4.0 5.0
6 7.0 0.50 3.0 4.0 5.0
1 2.0 1.10 4.0 2.0 5.0
4 5.0 1.45 5.0 5.0 5.0
7 8.0 1.49 5.0 5.0 5.0
2 3.0 1.50 6.0 3.0 5.0
----------------------------------
S.No A B C D
3 4.0 0.32 3.0 4.0 5.0
6 7.0 0.50 3.0 4.0 5.0
1 2.0 1.10 4.0 2.0 5.0
4 5.0 1.45 5.0 5.0 5.0
7 8.0 1.49 5.0 5.0 5.0
2 3.0 1.50 6.0 3.0 5.0
5 6.0 1.90 7.0 6.0 5.0
----------------------------------
S.No A B C D
6 7.0 0.50 3.0 4.0 5.0
1 2.0 1.10 4.0 2.0 5.0
4 5.0 1.45 5.0 5.0 5.0
7 8.0 1.49 5.0 5.0 5.0