How do I split a pandas dataframe with varying column sizes into separate dataframes?

Question

I have a large pandas dataframe in which the number of columns varies throughout the dataframe. Here is an example: (image: current dataframe example)

I would like to split the dataframe into multiple dataframes, based on the number of columns it has.

Example output image here: (image: output)

Thanks.

Answer 1

If you have a dataframe of, say, 10 columns and you want the records with 3 NaN values to go into a different result dataframe than those with 1 NaN, you can do this as follows:

import pandas as pd

# count the number of NaNs per row
num_counts = df.isna().sum(axis='columns')
# group by this number and add each grouped
# dataframe to a dictionary
results = dict()
for key, sub_df in df.groupby(num_counts):
    results[key] = sub_df

After executing this code, results contains subsets of df, where every row in a subset has the same number of NaNs (and therefore the same number of non-NaN values).
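To make the grouping step concrete, here is a minimal self-contained sketch of the same idea on a tiny dataframe. The sample data is hypothetical (it is not from the question), and the dict comprehension is just a compact variant of the loop above.

```python
import pandas as pd

# Hypothetical sample data, not taken from the question.
df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": ["x", None, None],
    "c": [0.5, None, 2.0],
})

# Number of missing values in each row.
num_counts = df.isna().sum(axis="columns")

# One sub-dataframe per distinct count of missing values.
results = {key: sub_df for key, sub_df in df.groupby(num_counts)}

for key, sub_df in results.items():
    print(f"{key} missing value(s):")
    print(sub_df)
```

Each key is a count of missing values, and each value holds the rows of df with exactly that many NaNs.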

If you want to write your results to an Excel file, you just need to execute the following code:

with pd.ExcelWriter('sorted_output.xlsx') as writer:
    for key, sub_df in results.items():
        # if you want to avoid the detour through a dictionary,
        # just replace the previous line with
        # for key, sub_df in df.groupby(num_counts):
        sub_df.to_excel(
            writer,
            sheet_name=f'missing {key}',
            na_rep='',
            inf_rep='inf',
            float_format=None,
            index=True,
            header=True)

Example:

# create an example dataframe
df = pd.DataFrame(dict(a=[1, 2, 3, 4, 5, 6], b=list('abbcac')))
df.loc[[2, 4, 5], 'c'] = list('xyz')
df.loc[[2, 3, 4], 'd'] = list('vxw')
df.loc[[1, 2], 'e'] = list('qw')

It looks like this:

Out[58]: 
   a  b    c    d    e
0  1  a  NaN  NaN  NaN
1  2  b  NaN  NaN    q
2  3  b    x    v    w
3  4  c  NaN    x  NaN
4  5  a    y    w  NaN
5  6  c    z  NaN  NaN

If you execute the code above on this dataframe, you get a dictionary with the following content:

0:    a  b  c  d  e
   2  3  b  x  v  w

1:    a  b  c  d    e
   4  5  a  y  w  NaN

2:    a  b    c    d    e
   1  2  b  NaN  NaN    q
   3  4  c  NaN    x  NaN
   5  6  c    z  NaN  NaN

3:    a  b    c    d    e
   0  1  a  NaN  NaN  NaN

The keys of the dictionary are the number of NaNs per row, and the values are dataframes containing only the rows with that number of NaNs.
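If the goal is for each resulting dataframe to carry only the columns it actually uses, one possible follow-up is to drop the columns that are entirely NaN within each group. The sketch below rebuilds the example dataframe with explicit None values; the name trimmed is hypothetical, and this step is an assumption about what the question wants rather than part of the original answer.

```python
import pandas as pd

# The example dataframe from above, built with explicit missing values.
df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5, 6],
    "b": list("abbcac"),
    "c": [None, None, "x", None, "y", "z"],
    "d": [None, None, "v", "x", "w", None],
    "e": [None, "q", "w", None, None, None],
})

# Number of missing values in each row.
num_counts = df.isna().sum(axis="columns")

# Group by that count, then drop columns that are all-NaN inside each group.
trimmed = {
    key: sub_df.dropna(axis="columns", how="all")
    for key, sub_df in df.groupby(num_counts)
}

for key, sub_df in trimmed.items():
    print(f"missing {key}: columns {list(sub_df.columns)}")
```

Note that rows with the same number of NaNs can still have them in different columns (as in the group with 2 NaNs here), in which case no column is entirely empty and the group keeps all of its columns.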

Answer 2

If I'm getting you right, what you want to do is split one existing dataframe with n columns into ceil(n/5) dataframes, each with 5 columns, with the last one holding whatever columns remain.

If that's the case, this will do the trick:

import pandas as pd
import math

max_cols = 5

dt = {"a": [1, 2, 3], "b": [6, 5, 3], "c": [8, 4, 2],
      "d": [8, 4, 0], "e": [1, 9, 5], "f": [9, 7, 9]}

df = pd.DataFrame(data=dt)

# take consecutive blocks of max_cols columns; the last block may be smaller
dfs = [df[df.columns[max_cols * i:max_cols * i + max_cols]]
       for i in range(math.ceil(len(df.columns) / max_cols))]

for el in dfs:
    print(el)

And the output:

   a  b  c  d  e
0  1  6  8  8  1
1  2  5  4  4  9
2  3  3  2  0  5
   f
0  9
1  7
2  9

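The same column-chunking idea can be wrapped in a small reusable function. This is only a sketch: the name split_columns and its signature are hypothetical, and it uses positional slicing with iloc and a stepped range instead of math.ceil.

```python
import pandas as pd


def split_columns(df, max_cols):
    """Split df into consecutive column blocks of at most max_cols columns."""
    if max_cols < 1:
        raise ValueError("max_cols must be at least 1")
    # Step through the column positions in strides of max_cols;
    # slicing past the last column simply yields a shorter final block.
    return [
        df.iloc[:, start:start + max_cols]
        for start in range(0, df.shape[1], max_cols)
    ]


df = pd.DataFrame({"a": [1, 2, 3], "b": [6, 5, 3], "c": [8, 4, 2],
                   "d": [8, 4, 0], "e": [1, 9, 5], "f": [9, 7, 9]})

parts = split_columns(df, 5)
for part in parts:
    print(part)
```

With the six-column dataframe above, this yields one block with columns a to e and a second block with only column f.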

Solution 1
1 2019-10-13 09:13:03

Solution 2
1 2019-10-13 14:19:50
