How to make a list of dataframes all equal in length

If I have a number of DataFrames inside a list, like this:

import pandas as pd

X = pd.DataFrame({"t":[1,2,3,4,5,6,7,8],"A":[34,12,78,84,26,84,26,34], "B":[54,87,35,25,82,35,25,82], "C":[56,78,0,14,13,0,14,13], "D":[0,23,72,56,14,72,56,14], "E":[78,12,31,0,34,31,0,34]})
Y = pd.DataFrame({"t":[1,2,3],"A":[45,24,65], "B":[45,87,65], "C":[98,52,32], "D":[0,23,1], "E":[24,12, 65]})
Z = pd.DataFrame({"t":[1,2,3,4,5],"A":[14,96,25,2,25], "B":[47,7,5,58,34], "C":[85,45,65,53,53], "D":[3,35,12,56,236], "E":[68,10,45,46,85]})

allFiles = [X, Y, Z]
list_ = []
for file_ in allFiles:
    # DataFrame.sort was removed from pandas; sort_values is its replacement
    df = file_.sort_values('t')
    list_.append(df)

The list then holds the three DataFrames, each sorted by t, with 8, 3 and 5 rows respectively.

How could I shorten every DataFrame to the length of the shortest one?

EDIT: Keep in mind that I would like to keep the list of DataFrames.

If the DataFrames contain no NaN values, you can use concat with dropna:

df = pd.concat(allFiles, keys=list('ABC'), axis=1).dropna()
print (df)
    A                        B                                  C              \
    A   B   C   D   E  t     A     B     C     D     E    t     A     B     C   
0  34  54  56   0  78  1  45.0  45.0  98.0   0.0  24.0  1.0  14.0  47.0  85.0   
1  12  87  78  23  12  2  24.0  87.0  52.0  23.0  12.0  2.0  96.0   7.0  45.0   
2  78  35   0  72  31  3  65.0  65.0  32.0   1.0  65.0  3.0  25.0   5.0  65.0   


      D     E    t  
0   3.0  68.0  1.0  
1  35.0  10.0  2.0  
2  12.0  45.0  3.0  
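
Why this works can be seen on a smaller case. The sketch below uses two made-up frames (long_df and short_df are illustrative names, not part of the question) and assumes both have the default 0..n-1 index. With axis=1, concat aligns rows on the index, pads the shorter frame with NaN, and dropna then discards every padded row:

```python
import pandas as pd

# Two small frames of different length (hypothetical data, default RangeIndex).
long_df = pd.DataFrame({"t": [1, 2, 3, 4], "A": [10, 20, 30, 40]})
short_df = pd.DataFrame({"t": [1, 2], "A": [5, 6]})

# axis=1 aligns on the row index; rows missing from short_df are filled with NaN.
wide = pd.concat([long_df, short_df], keys=["long", "short"], axis=1)
print(wide)

# dropna() removes every row that contains a NaN, i.e. every row
# that is not present in all of the frames.
trimmed = wide.dropna()
print(len(trimmed))
```

This is also the reason for the "no NaN values" condition: a NaN that was already in the data would cause its row to be dropped as well. The padding turns the integer columns of the shorter frames into floats, which is why the output above shows 45.0 instead of 45.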

Then build the new list with groupby in a list comprehension:

list_ = [g for i, g in df.groupby(level=0, axis=1, group_keys=False)]
print (list_)
[    A                   
    A   B   C   D   E  t
0  34  54  56   0  78  1
1  12  87  78  23  12  2
2  78  35   0  72  31  3,       B                             
      A     B     C     D     E    t
0  45.0  45.0  98.0   0.0  24.0  1.0
1  24.0  87.0  52.0  23.0  12.0  2.0
2  65.0  65.0  32.0   1.0  65.0  3.0,       C                             
      A     B     C     D     E    t
0  14.0  47.0  85.0   3.0  68.0  1.0
1  96.0   7.0  45.0  35.0  10.0  2.0
2  25.0   5.0  65.0  12.0  45.0  3.0]

The output still has MultiIndex columns, so save the first level with get_level_values, remove it from the columns with droplevel, and then group by the saved level:

df = pd.concat(allFiles, keys=list('ABC'), axis=1).dropna()
lvl = df.columns.get_level_values(0)
df.columns = df.columns.droplevel(0)
print (df)
    A   B   C   D   E  t     A     B     C     D     E    t     A     B     C  \
0  34  54  56   0  78  1  45.0  45.0  98.0   0.0  24.0  1.0  14.0  47.0  85.0   
1  12  87  78  23  12  2  24.0  87.0  52.0  23.0  12.0  2.0  96.0   7.0  45.0   
2  78  35   0  72  31  3  65.0  65.0  32.0   1.0  65.0  3.0  25.0   5.0  65.0   

      D     E    t  
0   3.0  68.0  1.0  
1  35.0  10.0  2.0  
2  12.0  45.0  3.0  
list_ = [g for i, g in df.groupby(lvl, axis=1)]

print (list_)

[    A   B   C   D   E  t
0  34  54  56   0  78  1
1  12  87  78  23  12  2
2  78  35   0  72  31  3,       A     B     C     D     E    t
0  45.0  45.0  98.0   0.0  24.0  1.0
1  24.0  87.0  52.0  23.0  12.0  2.0
2  65.0  65.0  32.0   1.0  65.0  3.0,       A     B     C     D     E    t
0  14.0  47.0  85.0   3.0  68.0  1.0
1  96.0   7.0  45.0  35.0  10.0  2.0
2  25.0   5.0  65.0  12.0  45.0  3.0]

print (list_[0])
    A   B   C   D   E  t
0  34  54  56   0  78  1
1  12  87  78  23  12  2
2  78  35   0  72  31  3
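
Recent pandas releases deprecate groupby with axis=1, so the split may need a different form there. A minimal sketch of one alternative, assuming the same concat/dropna step as above: selecting a top-level key from MultiIndex columns returns the sub-frame with that level already dropped. The shortened data, the keys "X" and "Y", and the cast back to the original dtypes are illustrative choices, not part of the original answer:

```python
import pandas as pd

X = pd.DataFrame({"t": [1, 2, 3, 4], "A": [34, 12, 78, 84]})
Y = pd.DataFrame({"t": [1, 2], "A": [45, 24]})
originals = [X, Y]
keys = ["X", "Y"]

# Same idea as above: align side by side, drop the rows padded with NaN.
wide = pd.concat(originals, keys=keys, axis=1).dropna()

# wide[key] selects one top-level column group and drops that level.
# The NaN padding made the shorter frames float, so cast back to the
# dtypes of the original frames.
frames = [
    wide[key].astype(orig.dtypes.to_dict())
    for key, orig in zip(keys, originals)
]
print(frames[1])
```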

Another, simpler solution:

import numpy as np

allFiles = [X, Y, Z]

min_len = np.min([len(df.index) for df in allFiles])
print (min_len)
3

print ([df.reindex(np.arange(min_len)) for df in allFiles])
[    A   B   C   D   E  t
0  34  54  56   0  78  1
1  12  87  78  23  12  2
2  78  35   0  72  31  3,     A   B   C   D   E  t
0  45  45  98   0  24  1
1  24  87  52  23  12  2
2  65  65  32   1  65  3,     A   B   C   D   E  t
0  14  47  85   3  68  1
1  96   7  45  35  10  2
2  25   5  65  12  45  3]
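
Note that reindex(np.arange(min_len)) selects rows by label, so it relies on every frame having the default 0..n-1 index. After sorting by t the labels can be out of order, and a positional slice may be safer. A minimal sketch with made-up, unsorted data (the names all_files and trimmed are illustrative):

```python
import pandas as pd

# Hypothetical frames whose rows are not yet ordered by t.
X = pd.DataFrame({"t": [3, 1, 2, 4], "A": [78, 34, 12, 84]})
Y = pd.DataFrame({"t": [2, 1], "A": [24, 45]})

# Sorting reorders the rows but keeps the original index labels.
all_files = [df.sort_values("t") for df in (X, Y)]

min_len = min(len(df) for df in all_files)

# iloc slices by position, so the first min_len rows in sorted order are kept
# regardless of their labels; reset_index gives each result a clean 0..n-1 index.
trimmed = [df.iloc[:min_len].reset_index(drop=True) for df in all_files]
print(trimmed[0])
```

The built-in min is enough here, so this version does not need numpy.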

EDIT1: Solution if t is the index and has unique values.

Get the shortest index, then use reindex in a list comprehension:

X = X.set_index('t')
Y = Y.set_index('t')
Z = Z.set_index('t')
allFiles = [X, Y, Z]

min_idx = min([df.index for df in allFiles], key=len)
print (min_idx)
Int64Index([1, 2, 3], dtype='int64', name='t')

print ([df.reindex(min_idx) for df in allFiles])
[    A   B   C   D   E
t                    
1  34  54  56   0  78
2  12  87  78  23  12
3  78  35   0  72  31,     A   B   C   D   E
t                    
1  45  45  98   0  24
2  24  87  52  23  12
3  65  65  32   1  65,     A   B   C   D   E
t                    
1  14  47  85   3  68
2  96   7  45  35  10
3  25   5  65  12  45]
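
Reindexing with the shortest index works here because its values 1, 2, 3 occur in every frame. If that is not guaranteed, reindex would introduce NaN rows for the missing labels. A hedged sketch of one way around this, assuming unique t values as above, is to keep only the labels common to all frames; the data and the names common and trimmed are made up for illustration:

```python
from functools import reduce

import pandas as pd

# Hypothetical frames indexed by t, where the t values only partly overlap.
X = pd.DataFrame({"A": [34, 12, 78, 84]}, index=pd.Index([1, 2, 3, 4], name="t"))
Y = pd.DataFrame({"A": [45, 24, 65]}, index=pd.Index([2, 3, 5], name="t"))
Z = pd.DataFrame({"A": [14, 96, 25]}, index=pd.Index([1, 2, 3], name="t"))
all_files = [X, Y, Z]

# Intersect the indexes pairwise to find the labels present in every frame.
common = reduce(
    lambda left, right: left.intersection(right),
    (df.index for df in all_files),
)

# Every label in common exists in each frame, so .loc cannot produce NaN rows.
trimmed = [df.loc[common] for df in all_files]
print(trimmed[1])
```

The result can be shorter than the shortest input, since only shared t values survive.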
