
Split one dataframe into multiple dataframes with same column header based on values

I have a dataframe that looks like this:

+------+------+---+---+---+
| S.No | A    | B | C | D |
+------+------+---+---+---+
| 1    | 0.25 | 2 | 1 | 5 |
+------+------+---+---+---+
| 2    | 1.1  | 4 | 2 | 5 |
+------+------+---+---+---+
| 3    | 1.5  | 6 | 3 | 5 |
+------+------+---+---+---+
| 4    | 0.32 | 3 | 4 | 5 |
+------+------+---+---+---+
| 5    | 1.45 | 5 | 5 | 5 |
+------+------+---+---+---+
| 6    | 1.9  | 7 | 6 | 5 |
+------+------+---+---+---+
| 7    | 0.5  | 3 | 4 | 5 |
+------+------+---+---+---+
| 8    | 1.49 | 5 | 5 | 5 |
+------+------+---+---+---+

I want to split it into 3 dataframes with the same column headers. The split is based on the Column A value, i.e. the 1st dataframe should start from 0.25 and end in 1.5, the second dataframe should start from 0.32 and end in 1.9, and the 3rd dataframe should start from 0.5 and end in 1.49. In other words, whenever the value in column A is between 0 and 1, a new split should start. They should all retain the same column headers. The expected output is as follows. Since I am new to this, I don't know how to get this done properly; any help would be appreciated.

Dataframe 1:

+------+------+---+---+---+
| S.No | A    | B | C | D |
+------+------+---+---+---+
| 1    | 0.25 | 2 | 1 | 5 |
+------+------+---+---+---+
| 2    | 1.1  | 4 | 2 | 5 |
+------+------+---+---+---+
| 3    | 1.5  | 6 | 3 | 5 |
+------+------+---+---+---+

Dataframe 2:

+------+------+---+---+---+
| S.No | A    | B | C | D |
+------+------+---+---+---+
| 4    | 0.32 | 3 | 4 | 5 |
+------+------+---+---+---+
| 5    | 1.45 | 5 | 5 | 5 |
+------+------+---+---+---+
| 6    | 1.9  | 7 | 6 | 5 |
+------+------+---+---+---+

Dataframe 3:

+------+------+---+---+---+
| S.No | A    | B | C | D |
+------+------+---+---+---+
| 7    | 0.5  | 3 | 4 | 5 |
+------+------+---+---+---+
| 8    | 1.49 | 5 | 5 | 5 |
+------+------+---+---+---+

Let's use cumsum:

d={x: y for x , y in df.groupby(df.A.between(0,1).cumsum())}
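To see why this works, here is a minimal self-contained sketch that rebuilds the sample data from the question. The names `starts`, `group_id` and `frames` are mine, not from the answer. `between(0, 1)` marks each row where a new block starts, and `cumsum` turns those marks into a running group number that `groupby` can split on.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "S.No": [1, 2, 3, 4, 5, 6, 7, 8],
    "A": [0.25, 1.1, 1.5, 0.32, 1.45, 1.9, 0.5, 1.49],
    "B": [2, 4, 6, 3, 5, 7, 3, 5],
    "C": [1, 2, 3, 4, 5, 6, 4, 5],
    "D": [5, 5, 5, 5, 5, 5, 5, 5],
})

# True on every row where A falls in [0, 1], i.e. where a new block starts
starts = df["A"].between(0, 1)

# Running count of starts: 1, 1, 1, 2, 2, 2, 3, 3 for the sample data
group_id = starts.cumsum()

# One dataframe per group number; each keeps the original column headers
frames = {key: part for key, part in df.groupby(group_id)}

for key, part in frames.items():
    print(f"Dataframe {key}:")
    print(part.to_string(index=False))
```

Note that if the first row of the data did not fall in [0, 1], the rows before the first start would end up in a group numbered 0.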

You start off by identifying the indices where the values are between 0 and 1. This is done with a combination of between and index. Once you have the indices, you can start splitting the dataframe using the iloc method.

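#Identifies indices based on variable A
#(assumes the default 0..n-1 index, so labels match iloc positions)
splitIndices = df.index[df.A.between(0,1)].tolist()

#Adds the end of the dataframe so the last group is not dropped
splitIndices.append(len(df))

dfList = []

for i in range(len(splitIndices)-1):
    startIndex = splitIndices[i]
    endIndex = splitIndices[i+1]

    tempDf = df.iloc[startIndex : endIndex]

    #Appends the dataframe subset to the output list
    dfList.append(tempDf.copy())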
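The loop above feeds index labels to iloc, which only lines up when the dataframe has the default 0..n-1 index. A sketch of the same idea that works from row positions instead is shown here. It assumes NumPy is available, and the letter index and the names `start_positions`, `end_positions` and `df_list` are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with a hypothetical non-default index
df = pd.DataFrame(
    {
        "S.No": [1, 2, 3, 4, 5, 6, 7, 8],
        "A": [0.25, 1.1, 1.5, 0.32, 1.45, 1.9, 0.5, 1.49],
        "B": [2, 4, 6, 3, 5, 7, 3, 5],
        "C": [1, 2, 3, 4, 5, 6, 4, 5],
        "D": [5, 5, 5, 5, 5, 5, 5, 5],
    },
    index=list("abcdefgh"),
)

# Row positions (not index labels) where A falls in [0, 1]
start_positions = np.flatnonzero(df["A"].between(0, 1).to_numpy())

# Each block ends where the next one starts; the last block runs to the end
end_positions = list(start_positions[1:]) + [len(df)]

# Slice by position and copy so each piece is independent of df
df_list = [
    df.iloc[start:end].copy()
    for start, end in zip(start_positions, end_positions)
]

for number, part in enumerate(df_list, start=1):
    print(f"Dataframe {number}:")
    print(part.to_string(index=False))
```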

According to the explanation you have provided, you include a between condition, e.g.:

1st dataframe should start from 0.25 and end in 1.5

This means values like 0.32 should be included in the dataframe.

With that logic you can do the below:

l=[.25,1.5,.32,1.9,.5,1.49]
#Pairs up consecutive values as (start, end) ranges
r=[(a,b) for a,b in zip(l[::2],l[1::2])]
for i in r:
    #between is inclusive on both ends by default
    print(df[df['A'].between(*i)].sort_values('A'))
    print("----------------------------------")

   S.No     A    B    C    D
0   1.0  0.25  2.0  1.0  5.0
3   4.0  0.32  3.0  4.0  5.0
6   7.0  0.50  3.0  4.0  5.0
1   2.0  1.10  4.0  2.0  5.0
4   5.0  1.45  5.0  5.0  5.0
7   8.0  1.49  5.0  5.0  5.0
2   3.0  1.50  6.0  3.0  5.0
----------------------------------
   S.No     A    B    C    D
3   4.0  0.32  3.0  4.0  5.0
6   7.0  0.50  3.0  4.0  5.0
1   2.0  1.10  4.0  2.0  5.0
4   5.0  1.45  5.0  5.0  5.0
7   8.0  1.49  5.0  5.0  5.0
2   3.0  1.50  6.0  3.0  5.0
5   6.0  1.90  7.0  6.0  5.0
----------------------------------
   S.No     A    B    C    D
6   7.0  0.50  3.0  4.0  5.0
1   2.0  1.10  4.0  2.0  5.0
4   5.0  1.45  5.0  5.0  5.0
7   8.0  1.49  5.0  5.0  5.0
