Python pandas dataframe splitting
I have this kind of DataFrame, which I would like to split into separate DataFrames:
A B C Mark
3 5 6 T
4 5 2 T
3 4 5 B
5 6 7 B
3 4 5 T
2 5 2 T
For instance, the table above should be split into three pandas DataFrames: the first with the two rows marked "T", the second with the next two rows marked "B", and the third with the last two rows marked "T".
df1
A B C Mark
3 5 6 T
4 5 2 T
df2
A B C Mark
3 4 5 B
5 6 7 B
df3
A B C Mark
3 4 5 T
2 5 2 T
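To try the answers below, the sample frame from the question can be rebuilt with the DataFrame constructor. This is a minimal sketch that assumes A, B and C hold integers and Mark holds strings; the question itself does not state the dtypes.

```python
import pandas as pd

# Sample data from the question (dtypes assumed: integer A, B, C and string Mark)
df = pd.DataFrame({
    'A': [3, 4, 3, 5, 3, 2],
    'B': [5, 5, 4, 6, 4, 5],
    'C': [6, 2, 5, 7, 5, 2],
    'Mark': ['T', 'T', 'B', 'B', 'T', 'T'],
})
print(df)
```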
Create a dictionary of DataFrames: build a counter that increases at every change of Mark using shift and cumsum, group by that counter, then convert the groupby object to tuples and then to a dictionary:
# the counter gets a new label whenever Mark differs from the row above
dfs = dict(tuple(df.groupby(df['Mark'].ne(df['Mark'].shift()).cumsum())))
print(dfs)
{1: A B C Mark
0 3 5 6 T
1 4 5 2 T, 2: A B C Mark
2 3 4 5 B
3 5 6 7 B, 3: A B C Mark
4 3 4 5 T
5 2 5 2 T}
Select each DataFrame:
print(dfs[1])
print(dfs[2])
print(dfs[3])
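To see what the grouping key in the first answer looks like, the sketch below rebuilds the question's data (dtypes assumed) and splits the one-liner into its two steps. The names starts and run_id are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [3, 4, 3, 5, 3, 2],
    'B': [5, 5, 4, 6, 4, 5],
    'C': [6, 2, 5, 7, 5, 2],
    'Mark': ['T', 'T', 'B', 'B', 'T', 'T'],
})

# True wherever Mark differs from the row above; the first row is compared with NaN, so it is True
starts = df['Mark'].ne(df['Mark'].shift())
# The running total of run starts gives each consecutive run its own integer label
run_id = starts.cumsum()
print(run_id.tolist())  # [1, 1, 2, 2, 3, 3]

# Same result as the one-liner: {1: first T run, 2: B run, 3: second T run}
dfs = dict(tuple(df.groupby(run_id)))
print(sorted(dfs))
```

Because both T runs get different labels (1 and 3), they end up in different DataFrames, which a plain df.groupby('Mark') would not do.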
Alternatively, create a dictionary with keys such as 'df_1':
frames = {}
# key each consecutive run as 'df_1', 'df_2', ...
for i, grp in df.groupby(df.Mark.ne(df.Mark.shift()).cumsum()):
    frames[f'df_{i}'] = grp
print(frames)
{'df_1': A B C Mark
0 3 5 6 T
1 4 5 2 T, 'df_2': A B C Mark
2 3 4 5 B
3 5 6 7 B, 'df_3': A B C Mark
4 3 4 5 T
5 2 5 2 T}
You can then check the result by printing each DataFrame, for example:
print(frames['df_1'])
A B C Mark
0 3 5 6 T
1 4 5 2 T
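If you would rather end up with plain variables like df1, df2 and df3 from the question, a list comprehension over the same groupby can be unpacked directly. This sketch assumes there are exactly three runs; with a different number of runs the unpacking raises a ValueError.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [3, 4, 3, 5, 3, 2],
    'B': [5, 5, 4, 6, 4, 5],
    'C': [6, 2, 5, 7, 5, 2],
    'Mark': ['T', 'T', 'B', 'B', 'T', 'T'],
})

run_id = df['Mark'].ne(df['Mark'].shift()).cumsum()
# Run labels increase down the frame, so groupby's default key sorting keeps the original order
df1, df2, df3 = [grp for _, grp in df.groupby(run_id)]
print(df2)
```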
Another way, for the data in this post, is np.array_split, which returns a list of DataFrames that you can index individually or loop through. Note that it splits by position into chunks of (as nearly as possible) equal size rather than by the Mark column, so it matches the desired output here only because each run is exactly two rows long.
>>> import numpy as np
>>> dfs = np.array_split(df, 3)
>>> dfs
[ A B C Mark
0 3 5 6 T
1 4 5 2 T, A B C Mark
2 3 4 5 B
3 5 6 7 B, A B C Mark
4 3 4 5 T
5 2 5 2 T]
Access them as individual DataFrames:
>>> dfs[0]
A B C Mark
0 3 5 6 T
1 4 5 2 T
>>> dfs[1]
A B C Mark
2 3 4 5 B
3 5 6 7 B
>>> dfs[2]
A B C Mark
4 3 4 5 T
5 2 5 2 T
Or assign them to names:
df1 = dfs[0]
df2 = dfs[1]
df3 = dfs[2]
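To make the difference between a positional split and a run-based split concrete, the sketch below uses made-up data whose runs are 1, 3 and 2 rows long. split_runs is a hypothetical helper wrapping the shift/cumsum/groupby idea from the first answer; the positional split is applied to the row positions (np.arange) so that only chunk sizes are compared.

```python
import numpy as np
import pandas as pd


def split_runs(frame, col):
    """Hypothetical helper: one DataFrame per run of equal consecutive values in `col`."""
    run_id = frame[col].ne(frame[col].shift()).cumsum()
    return [grp for _, grp in frame.groupby(run_id)]


# Made-up data with runs of unequal length: 1 x T, 3 x B, 2 x T
df = pd.DataFrame({'A': range(6), 'Mark': ['T', 'B', 'B', 'B', 'T', 'T']})

# Positional split into 3 chunks: sizes depend only on the row count, not on Mark
positional_sizes = [len(chunk) for chunk in np.array_split(np.arange(len(df)), 3)]
# Run-based split: one chunk per consecutive run of Mark
run_sizes = [len(part) for part in split_runs(df, 'Mark')]
print(positional_sizes)  # [2, 2, 2]
print(run_sizes)         # [1, 3, 2]
```

With runs of unequal length, splitting into three positional chunks would still cut after every second row and mix T and B rows, whereas the groupby approach follows the Mark column.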