How to make a list of dataframes all equal in length
If I have a number of DataFrames inside a list like this:
import pandas as pd

X = pd.DataFrame({"t":[1,2,3,4,5,6,7,8],"A":[34,12,78,84,26,84,26,34], "B":[54,87,35,25,82,35,25,82], "C":[56,78,0,14,13,0,14,13], "D":[0,23,72,56,14,72,56,14], "E":[78,12,31,0,34,31,0,34]})
Y = pd.DataFrame({"t":[1,2,3],"A":[45,24,65], "B":[45,87,65], "C":[98,52,32], "D":[0,23,1], "E":[24,12, 65]})
Z = pd.DataFrame({"t":[1,2,3,4,5],"A":[14,96,25,2,25], "B":[47,7,5,58,34], "C":[85,45,65,53,53], "D":[3,35,12,56,236], "E":[68,10,45,46,85]})
allFiles = [X, Y, Z]
list_ = []
for file_ in allFiles:
    # DataFrame.sort() was removed from pandas; sort_values() replaces it
    df = file_.sort_values('t')
    list_.append(df)
The list then looks like this:
How can I shorten every DataFrame to the length of the shortest one?
EDIT: Keep in mind that I would like to keep the result as a list of DataFrames.
If the DataFrames contain no NaN values, you can use concat with dropna:
df = pd.concat(allFiles, keys=list('ABC'), axis=1).dropna()
print (df)
A B C \
A B C D E t A B C D E t A B C
0 34 54 56 0 78 1 45.0 45.0 98.0 0.0 24.0 1.0 14.0 47.0 85.0
1 12 87 78 23 12 2 24.0 87.0 52.0 23.0 12.0 2.0 96.0 7.0 45.0
2 78 35 0 72 31 3 65.0 65.0 32.0 1.0 65.0 3.0 25.0 5.0 65.0
D E t
0 3.0 68.0 1.0
1 35.0 10.0 2.0
2 12.0 45.0 3.0
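The dropna step relies on the NaN values that concat creates for the shorter frames, so any NaN already present in the data would remove extra rows. A minimal sketch of an alternative, assuming the default RangeIndex as in the question: pass join='inner' to concat so that only index labels shared by every frame are kept. The small frames here are made-up illustration data, with one deliberate NaN in Y.

```python
import numpy as np
import pandas as pd

# Made-up example frames; Y contains a genuine NaN that should survive
X = pd.DataFrame({"t": [1, 2, 3, 4], "A": [34, 12, 78, 84]})
Y = pd.DataFrame({"t": [1, 2, 3], "A": [45, np.nan, 65]})
Z = pd.DataFrame({"t": [1, 2, 3, 4, 5], "A": [14, 96, 25, 2, 25]})
allFiles = [X, Y, Z]

# join='inner' keeps only index labels present in every frame,
# so rows are never dropped because of NaN values in the data itself
df_inner = pd.concat(allFiles, keys=list("ABC"), axis=1, join="inner")

# dropna() also removes row 1 here, because Y has a NaN in that row
df_dropna = pd.concat(allFiles, keys=list("ABC"), axis=1).dropna()

print(len(df_inner))   # 3
print(len(df_dropna))  # 2
```

The inner join keeps rows 0, 1 and 2, while dropna leaves only rows 0 and 2.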
Then create a new list with groupby in a list comprehension:
list_ = [g for i, g in df.groupby(level=0, axis=1, group_keys=False)]
print (list_)
[ A
A B C D E t
0 34 54 56 0 78 1
1 12 87 78 23 12 2
2 78 35 0 72 31 3, B
A B C D E t
0 45.0 45.0 98.0 0.0 24.0 1.0
1 24.0 87.0 52.0 23.0 12.0 2.0
2 65.0 65.0 32.0 1.0 65.0 3.0, C
A B C D E t
0 14.0 47.0 85.0 3.0 68.0 1.0
1 96.0 7.0 45.0 35.0 10.0 2.0
2 25.0 5.0 65.0 12.0 45.0 3.0]
But the output has MultiIndex columns, so group by the first level, obtained with get_level_values, and then remove that level with droplevel:
df = pd.concat(allFiles, keys=list('ABC'), axis=1).dropna()
lvl = df.columns.get_level_values(0)
df.columns = df.columns.droplevel(0)
print (df)
A B C D E t A B C D E t A B C \
0 34 54 56 0 78 1 45.0 45.0 98.0 0.0 24.0 1.0 14.0 47.0 85.0
1 12 87 78 23 12 2 24.0 87.0 52.0 23.0 12.0 2.0 96.0 7.0 45.0
2 78 35 0 72 31 3 65.0 65.0 32.0 1.0 65.0 3.0 25.0 5.0 65.0
D E t
0 3.0 68.0 1.0
1 35.0 10.0 2.0
2 12.0 45.0 3.0
list_ = [g for i, g in df.groupby(lvl, axis=1)]
print (list_)
[ A B C D E t
0 34 54 56 0 78 1
1 12 87 78 23 12 2
2 78 35 0 72 31 3, A B C D E t
0 45.0 45.0 98.0 0.0 24.0 1.0
1 24.0 87.0 52.0 23.0 12.0 2.0
2 65.0 65.0 32.0 1.0 65.0 3.0, A B C D E t
0 14.0 47.0 85.0 3.0 68.0 1.0
1 96.0 7.0 45.0 35.0 10.0 2.0
2 25.0 5.0 65.0 12.0 45.0 3.0]
print (list_[0])
A B C D E t
0 34 54 56 0 78 1
1 12 87 78 23 12 2
2 78 35 0 72 31 3
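In recent pandas versions (2.1 and later) the axis=1 argument of groupby is deprecated. A minimal sketch that avoids it, assuming the same keys passed to concat: selecting a top-level key from MultiIndex columns already returns a DataFrame with that level removed, so neither groupby nor droplevel is needed. The frames are shortened illustration data, not the question's.

```python
import pandas as pd

# Shortened example frames (illustration data only)
X = pd.DataFrame({"t": [1, 2, 3, 4, 5], "A": [34, 12, 78, 84, 26]})
Y = pd.DataFrame({"t": [1, 2, 3], "A": [45, 24, 65]})
Z = pd.DataFrame({"t": [1, 2, 3, 4], "A": [14, 96, 25, 2]})
allFiles = [X, Y, Z]

keys = list("ABC")
df = pd.concat(allFiles, keys=keys, axis=1).dropna()

# df[k] selects one top-level key; the result's columns are the
# remaining (second) level, i.e. the original column names
list_ = [df[k] for k in keys]

print([frame.shape for frame in list_])  # [(3, 2), (3, 2), (3, 2)]
print(list(list_[0].columns))            # ['t', 'A']
```

Iterating over the known keys also keeps the list in the original order of allFiles.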
Another, simpler solution (it assumes the default RangeIndex 0, 1, 2, ...):
import numpy as np

allFiles = [X, Y, Z]
min_len = np.min([len(df.index) for df in allFiles])
print (min_len)
3
print ([df.reindex(np.arange(min_len)) for df in allFiles])
[ A B C D E t
0 34 54 56 0 78 1
1 12 87 78 23 12 2
2 78 35 0 72 31 3, A B C D E t
0 45 45 98 0 24 1
1 24 87 52 23 12 2
2 65 65 32 1 65 3, A B C D E t
0 14 47 85 3 68 1
1 96 7 45 35 10 2
2 25 5 65 12 45 3]
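Because reindex(np.arange(min_len)) looks rows up by label, it only truncates frames whose index is 0, 1, 2, .... A sketch of a positional variant, assuming you simply want the first min_len rows of each frame: head() (equivalently iloc[:min_len]) slices by position and works with any index. The frames below are hypothetical and use non-default index labels on purpose.

```python
import pandas as pd

# Hypothetical frames whose index labels are NOT 0..n-1
X = pd.DataFrame({"A": [34, 12, 78, 84]}, index=[10, 20, 30, 40])
Y = pd.DataFrame({"A": [45, 24, 65]}, index=[7, 8, 9])
allFiles = [X, Y]

min_len = min(len(df) for df in allFiles)

# head() slices by position, so it works regardless of the index labels
truncated = [df.head(min_len) for df in allFiles]

print([len(df) for df in truncated])  # [3, 3]
print(truncated[0]["A"].tolist())     # [34, 12, 78]
```

With these frames, X.reindex(range(min_len)) would return three rows of NaN, because labels 0, 1 and 2 do not exist.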
EDIT1: Solution if t is the index and has unique values.
Get the shortest index, then use reindex in a list comprehension:
X = X.set_index('t')
Y = Y.set_index('t')
Z = Z.set_index('t')
allFiles = [X, Y, Z]
min_idx = min([df.index for df in allFiles], key=len)
print (min_idx)
Int64Index([1, 2, 3], dtype='int64', name='t')
print ([df.reindex(min_idx) for df in allFiles])
[ A B C D E
t
1 34 54 56 0 78
2 12 87 78 23 12
3 78 35 0 72 31, A B C D E
t
1 45 45 98 0 24
2 24 87 52 23 12
3 65 65 32 1 65, A B C D E
t
1 14 47 85 3 68
2 96 7 45 35 10
3 25 5 65 12 45]
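reindex(min_idx) assumes every label in the shortest index also exists in the other frames; a missing label produces a row of NaN. A sketch of a swapped-in technique for that case, assuming you want only the t values common to all frames: intersect all indexes with functools.reduce and select with loc. The frames are hypothetical, with t=9 present only in Y.

```python
from functools import reduce

import pandas as pd

# Hypothetical frames indexed by 't'; the shortest index (Y) contains t=9,
# which does not appear in X
X = pd.DataFrame({"t": [1, 2, 3, 4], "A": [34, 12, 78, 84]}).set_index("t")
Y = pd.DataFrame({"t": [1, 2, 9], "A": [45, 24, 65]}).set_index("t")
allFiles = [X, Y]

# Index labels present in every frame
common = reduce(lambda left, right: left.intersection(right),
                (df.index for df in allFiles))

# loc never introduces NaN rows here, since every label in common exists
aligned = [df.loc[common] for df in allFiles]

print(list(common))              # [1, 2]
print(aligned[0]["A"].tolist())  # [34, 12]
```

Note that the frames may end up shorter than the shortest original frame, because only shared t values are kept.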