How to make the lengths of a list of DataFrames equal

Question

If I have a number of DataFrames inside a list like this:

import pandas as pd

X = pd.DataFrame({"t":[1,2,3,4,5,6,7,8],"A":[34,12,78,84,26,84,26,34], "B":[54,87,35,25,82,35,25,82], "C":[56,78,0,14,13,0,14,13], "D":[0,23,72,56,14,72,56,14], "E":[78,12,31,0,34,31,0,34]})
Y = pd.DataFrame({"t":[1,2,3],"A":[45,24,65], "B":[45,87,65], "C":[98,52,32], "D":[0,23,1], "E":[24,12, 65]})
Z = pd.DataFrame({"t":[1,2,3,4,5],"A":[14,96,25,2,25], "B":[47,7,5,58,34], "C":[85,45,65,53,53], "D":[3,35,12,56,236], "E":[68,10,45,46,85]})

allFiles = [X, Y, Z]
list_ = []
for file_ in allFiles:
    # DataFrame.sort was removed from pandas; sort_values is the current method
    df = file_.sort_values('t')
    list_.append(df)

The list then holds the three DataFrames, each sorted by t, with 8, 3, and 5 rows respectively.

How can I shorten every DataFrame to the length of the shortest one?

EDIT: Keep in mind that I would like to keep the DataFrames in a list.

Answer 1

You can use concat with dropna, provided the DataFrames contain no NaN values of their own:

df = pd.concat(allFiles, keys=list('ABC'), axis=1).dropna()
print (df)
    A                        B                                  C              \
    A   B   C   D   E  t     A     B     C     D     E    t     A     B     C   
0  34  54  56   0  78  1  45.0  45.0  98.0   0.0  24.0  1.0  14.0  47.0  85.0   
1  12  87  78  23  12  2  24.0  87.0  52.0  23.0  12.0  2.0  96.0   7.0  45.0   
2  78  35   0  72  31  3  65.0  65.0  32.0   1.0  65.0  3.0  25.0   5.0  65.0   


      D     E    t  
0   3.0  68.0  1.0  
1  35.0  10.0  2.0  
2  12.0  45.0  3.0
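
One side effect visible in the output above is that the shorter frames come back as floats, because the NaN padding upcasts their columns before dropna removes it. The following is a minimal sketch of casting the trimmed pieces back, using small made-up frames a and b rather than X, Y, Z; the keys and the restored dict are illustrative names only.

```python
import pandas as pd

# Hypothetical stand-in frames, not the X, Y, Z from the question.
a = pd.DataFrame({"t": [1, 2, 3, 4], "A": [10, 20, 30, 40]})
b = pd.DataFrame({"t": [1, 2], "A": [5, 6]})
keys = ["a", "b"]

# Side-by-side concat aligns on the row index; rows missing from the
# shorter frame are filled with NaN, which upcasts its columns to float.
wide = pd.concat([a, b], keys=keys, axis=1)

# dropna() keeps only the rows that are present in every frame.
trimmed = wide.dropna()

# Cast each trimmed piece back to the dtypes of the frame it came from.
restored = {
    key: trimmed[key].astype(frame.dtypes.to_dict())
    for key, frame in zip(keys, [a, b])
}
```

This only works safely when the original frames had no NaN values, which is the same precondition the dropna approach already requires.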

Then build the new list with groupby in a list comprehension:

list_ = [g for i, g in df.groupby(level=0, axis=1, group_keys=False)]
print (list_)
[    A                   
    A   B   C   D   E  t
0  34  54  56   0  78  1
1  12  87  78  23  12  2
2  78  35   0  72  31  3,       B                             
      A     B     C     D     E    t
0  45.0  45.0  98.0   0.0  24.0  1.0
1  24.0  87.0  52.0  23.0  12.0  2.0
2  65.0  65.0  32.0   1.0  65.0  3.0,       C                             
      A     B     C     D     E    t
0  14.0  47.0  85.0   3.0  68.0  1.0
1  96.0   7.0  45.0  35.0  10.0  2.0
2  25.0   5.0  65.0  12.0  45.0  3.0]

But the output has MultiIndex columns, so instead group by the first level, obtained with get_level_values, and remove that level with droplevel:

df = pd.concat(allFiles, keys=list('ABC'), axis=1).dropna()
lvl = df.columns.get_level_values(0)
df.columns = df.columns.droplevel(0)
print (df)
    A   B   C   D   E  t     A     B     C     D     E    t     A     B     C  \
0  34  54  56   0  78  1  45.0  45.0  98.0   0.0  24.0  1.0  14.0  47.0  85.0   
1  12  87  78  23  12  2  24.0  87.0  52.0  23.0  12.0  2.0  96.0   7.0  45.0   
2  78  35   0  72  31  3  65.0  65.0  32.0   1.0  65.0  3.0  25.0   5.0  65.0   

      D     E    t  
0   3.0  68.0  1.0  
1  35.0  10.0  2.0  
2  12.0  45.0  3.0

list_ = [g for i, g in df.groupby(lvl, axis=1)]

print (list_)

[    A   B   C   D   E  t
0  34  54  56   0  78  1
1  12  87  78  23  12  2
2  78  35   0  72  31  3,       A     B     C     D     E    t
0  45.0  45.0  98.0   0.0  24.0  1.0
1  24.0  87.0  52.0  23.0  12.0  2.0
2  65.0  65.0  32.0   1.0  65.0  3.0,       A     B     C     D     E    t
0  14.0  47.0  85.0   3.0  68.0  1.0
1  96.0   7.0  45.0  35.0  10.0  2.0
2  25.0   5.0  65.0  12.0  45.0  3.0]

print (list_[0])
    A   B   C   D   E  t
0  34  54  56   0  78  1
1  12  87  78  23  12  2
2  78  35   0  72  31  3
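
Recent pandas releases deprecate the axis argument of groupby, so the column-wise grouping above may warn or fail depending on the installed version. A possible alternative, sketched here with made-up frames a and b, is to select each first-level key directly; selecting a top-level key from MultiIndex columns returns that frame's columns with the outer level already dropped.

```python
import pandas as pd

# Hypothetical stand-in frames, not the X, Y, Z from the question.
a = pd.DataFrame({"t": [1, 2, 3, 4], "A": [10, 20, 30, 40]})
b = pd.DataFrame({"t": [1, 2], "A": [5, 6]})
keys = ["a", "b"]

# Same concat + dropna step as in the answer.
wide = pd.concat([a, b], keys=keys, axis=1).dropna()

# wide[key] picks one frame's columns and removes the outer level,
# so no groupby or droplevel call is needed.
parts = [wide[key] for key in keys]
```

The list keeps the order of keys, which matches the order of the input frames.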

Another, simpler solution:

import numpy as np

allFiles = [X, Y, Z]

min_len = np.min([len(df.index) for df in allFiles])
print (min_len)
3

print ([df.reindex(np.arange(min_len)) for df in allFiles])
[    A   B   C   D   E  t
0  34  54  56   0  78  1
1  12  87  78  23  12  2
2  78  35   0  72  31  3,     A   B   C   D   E  t
0  45  45  98   0  24  1
1  24  87  52  23  12  2
2  65  65  32   1  65  3,     A   B   C   D   E  t
0  14  47  85   3  68  1
1  96   7  45  35  10  2
2  25   5  65  12  45  3]
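
The reindex call above relies on every frame having the default 0..n-1 index. If that is not guaranteed, positional slicing is one way to get the same result; the sketch below uses made-up frames, one of which deliberately has a non-default index.

```python
import pandas as pd

# Hypothetical stand-in frames; the first has a non-default row index.
frames = [
    pd.DataFrame({"t": [1, 2, 3, 4], "A": [10, 20, 30, 40]}, index=[7, 8, 9, 10]),
    pd.DataFrame({"t": [1, 2], "A": [5, 6]}),
]

# Length of the shortest frame.
min_len = min(len(df) for df in frames)

# iloc slices by position, so the index labels do not matter
# and no NaN padding (or float upcasting) is introduced.
trimmed = [df.iloc[:min_len] for df in frames]
```

df.head(min_len) would behave the same way here; both keep the first min_len rows in their current order, so sort the frames first if the order matters.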

EDIT1: Solution if t is the index and has unique values.

Get the shortest index, then use reindex in a list comprehension:

X = X.set_index('t')
Y = Y.set_index('t')
Z = Z.set_index('t')
allFiles = [X, Y, Z]

min_idx = min([df.index for df in allFiles], key=len)
print (min_idx)
Int64Index([1, 2, 3], dtype='int64', name='t')

print ([df.reindex(min_idx) for df in allFiles])
[    A   B   C   D   E
t                    
1  34  54  56   0  78
2  12  87  78  23  12
3  78  35   0  72  31,     A   B   C   D   E
t                    
1  45  45  98   0  24
2  24  87  52  23  12
3  65  65  32   1  65,     A   B   C   D   E
t                    
1  14  47  85   3  68
2  96   7  45  35  10
3  25   5  65  12  45]
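
Reindexing to the shortest index works when its labels exist in every other frame, as they do here. If some t values might be missing from the longer frames, reindex would insert NaN rows; a variation that keeps only labels shared by all frames is sketched below, with made-up frames x, y, z that are not the ones from the question.

```python
from functools import reduce

import pandas as pd

# Hypothetical frames indexed by t; y lacks t=1 and has an extra t=5.
x = pd.DataFrame({"A": [1, 2, 3, 4]}, index=pd.Index([1, 2, 3, 4], name="t"))
y = pd.DataFrame({"A": [5, 6, 7]}, index=pd.Index([2, 3, 5], name="t"))
z = pd.DataFrame({"A": [8, 9, 10]}, index=pd.Index([1, 2, 3], name="t"))
frames = [x, y, z]

# Intersect all indexes pairwise to find the labels every frame has.
common = reduce(
    lambda left, right: left.intersection(right),
    (df.index for df in frames),
)

# Every label in common exists in every frame, so .loc cannot raise a KeyError.
aligned = [df.loc[common] for df in frames]
```

The resulting frames all have the same length, but that length can be shorter than the shortest input when the indexes only partly overlap.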

Answer 1 (3, accepted), 2016-12-08 11:45:57
