
Python pandas dataframe splitting

I have this kind of DataFrame, which I would like to split into separate DataFrames:

A B C Mark
3 5 6 T
4 5 2 T
3 4 5 B
5 6 7 B
3 4 5 T
2 5 2 T

For instance, the table above should be split into three pandas DataFrames: the first holds the first two rows with Mark "T", the second holds the next two rows with Mark "B", and the third holds the last two rows with Mark "T".

df1

A B C Mark
3 5 6 T
4 5 2 T

df2

A B C Mark
3 4 5 B
5 6 7 B

df3

A B C Mark
3 4 5 T
2 5 2 T

Create a dictionary of DataFrames. Build a counter for consecutive groups by comparing Mark with its shifted values (shift) and taking the cumulative sum (cumsum), group by that counter, then convert the groupby object to tuples and then to a dictionary:

dfs = dict(tuple(df.groupby(df['Mark'].ne(df['Mark'].shift()).cumsum())))
print (dfs)
{1:    A  B  C Mark
0  3  5  6    T
1  4  5  2    T, 2:    A  B  C Mark
2  3  4  5    B
3  5  6  7    B, 3:    A  B  C Mark
4  3  4  5    T
5  2  5  2    T}

Select each DataFrame:

print (dfs[1])
print (dfs[2])
print (dfs[3])
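
To see why this works, it can help to build the grouping key step by step. The sketch below rebuilds the sample table from the question and prints the intermediate values. The variable names changed and run_id are only illustrative and do not appear in the answer above.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "A": [3, 4, 3, 5, 3, 2],
    "B": [5, 5, 4, 6, 4, 5],
    "C": [6, 2, 5, 7, 5, 2],
    "Mark": ["T", "T", "B", "B", "T", "T"],
})

# True wherever Mark differs from the previous row. The first row is always
# True, because shift() leaves a missing value there.
changed = df["Mark"].ne(df["Mark"].shift())

# Running total of the True values: each consecutive run gets its own id
run_id = changed.cumsum()
print(run_id.tolist())  # [1, 1, 2, 2, 3, 3]

# Group on the run id rather than on Mark itself, so the two "T" runs
# end up in separate DataFrames
dfs = {key: group for key, group in df.groupby(run_id)}
print(len(dfs))  # 3
```

Grouping on Mark directly would merge the first and last runs into a single "T" group, which is why the counter is needed.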

Create a dictionary as below:

frames = {}
# Group on a counter that increases each time Mark changes
for i, grp in df.groupby(df.Mark.ne(df.Mark.shift()).cumsum()):
    frames['df_' + str(i)] = grp
print(frames)

{'df_1':    A  B  C Mark
 0  3  5  6    T
 1  4  5  2    T, 'df_2':    A  B  C Mark
 2  3  4  5    B
 3  5  6  7    B, 'df_3':    A  B  C Mark
 4  3  4  5    T
 5  2  5  2    T}

You can then check the result by printing any of the DataFrames, for example:

print(frames['df_1'])

   A  B  C Mark
0  3  5  6    T
1  4  5  2    T

Another way is to use np.array_split, which here splits the DataFrame into three equal parts by position. It returns a list of DataFrames, so you can index into the list or loop over it.

Result:

>>> import numpy as np
>>> np.array_split(df, 3)
[   A  B  C Mark
0  3  5  6    T
1  4  5  2    T,    A  B  C Mark
2  3  4  5    B
3  5  6  7    B,    A  B  C Mark
4  3  4  5    T
5  2  5  2    T]

Listing them as individual DataFrames, after storing the result in a list:

>>> parts = np.array_split(df, 3)
>>> parts[0]
   A  B  C Mark
0  3  5  6    T
1  4  5  2    T

>>> parts[1]
   A  B  C Mark
2  3  4  5    B
3  5  6  7    B

>>> parts[2]
   A  B  C Mark
4  3  4  5    T
5  2  5  2    T

Or you can assign them names:

df1 = parts[0]
df2 = parts[1]
df3 = parts[2]
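
Note that np.array_split cuts by row position and does not look at the values in Mark, so it matches the desired groups here only because every run happens to have two rows. The sketch below uses made-up data (the uneven frame is not from the question) with runs of different lengths to show the difference. As an assumption of this sketch, it splits the row positions and selects them with iloc rather than passing the DataFrame itself to np.array_split.

```python
import numpy as np
import pandas as pd

# Made-up data with runs of unequal length: T, T, T, B, T, T
uneven = pd.DataFrame({
    "A": [3, 4, 3, 5, 3, 2],
    "Mark": ["T", "T", "T", "B", "T", "T"],
})

# Positional split into 3 chunks of 2 rows each, ignoring Mark
positions = np.array_split(np.arange(len(uneven)), 3)
by_position = [uneven.iloc[idx] for idx in positions]
print([chunk["Mark"].tolist() for chunk in by_position])
# [['T', 'T'], ['T', 'B'], ['T', 'T']]

# Split on consecutive runs of Mark instead
run_id = uneven["Mark"].ne(uneven["Mark"].shift()).cumsum()
by_run = [group for _, group in uneven.groupby(run_id)]
print([chunk["Mark"].tolist() for chunk in by_run])
# [['T', 'T', 'T'], ['B'], ['T', 'T']]
```

If the runs in your data can vary in length, the shift and cumsum key is probably the safer choice.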
