Multiple lists into a Pandas DataFrame

Question

I have three lists here:

[1,2,3,4,5]

[5,4,6,7,2]

[1,2,4,5,6,7,8,9,0]

I want this kind of output:

A     B    C
1     5    1
2     4    2
3     6    4
4     7    5
5     2    6
           7
           8
           9
           0

I tried one syntax, but it gave me the error "arrays must all be same length". Another attempt gave "Length of values does not match length of index".

Is there any way to get this kind of output?

Answer 1

This is not supported directly, but it can be done. Assuming your lists are A, B, and C, you can pass them as rows and transpose the result (DataFrame.from_dict with the "index" orient also works):

pd.DataFrame([A, B, C]).T

     0    1    2
0  1.0  5.0  1.0
1  2.0  4.0  2.0
2  3.0  6.0  4.0
3  4.0  7.0  5.0
4  5.0  2.0  6.0
5  NaN  NaN  7.0
6  NaN  NaN  8.0
7  NaN  NaN  9.0
8  NaN  NaN  0.0
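
The values print as floats because the missing cells are filled with NaN, which is a float. If integer display matters, one possible follow-up is sketched below. It assumes pandas 0.24 or later, where the nullable "Int64" dtype is available; the names df and df_int are only illustrative.

```python
import pandas as pd

A = [1, 2, 3, 4, 5]
B = [5, 4, 6, 7, 2]
C = [1, 2, 4, 5, 6, 7, 8, 9, 0]

# One row per list, then transpose so each list becomes a column.
df = pd.DataFrame([A, B, C]).T
df.columns = ['A', 'B', 'C']

# Assumption: pandas >= 0.24. "Int64" (capital I) is the nullable integer
# dtype, so the padded cells become <NA> instead of forcing floats.
df_int = df.astype('Int64')
print(df_int)
```

The padded cells are still missing values, so isna() and dropna() behave as before.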

Another option is using DataFrame.from_dict with orient='index', which treats each list as a row, followed by a transpose:

pd.DataFrame.from_dict({'A' : A, 'B' : B, 'C' : C}, orient='index').T

     A    B    C
0  1.0  5.0  1.0
1  2.0  4.0  2.0
2  3.0  6.0  4.0
3  4.0  7.0  5.0
4  5.0  2.0  6.0
5  NaN  NaN  7.0
6  NaN  NaN  8.0
7  NaN  NaN  9.0
8  NaN  NaN  0.0
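
To see why the transpose is needed, it may help to look at the intermediate frame. This is a small sketch using the same three lists; the names wide and tall are only illustrative.

```python
import pandas as pd

A = [1, 2, 3, 4, 5]
B = [5, 4, 6, 7, 2]
C = [1, 2, 4, 5, 6, 7, 8, 9, 0]

# With orient='index' the dict keys become the row labels,
# and the shorter rows are padded with NaN.
wide = pd.DataFrame.from_dict({'A': A, 'B': B, 'C': C}, orient='index')
print(wide.shape)          # (3, 9)

# Transposing turns the keys into column labels.
tall = wide.T
print(list(tall.columns))  # ['A', 'B', 'C']
```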

A third solution uses zip_longest and DataFrame.from_records:

from itertools import zip_longest
pd.DataFrame.from_records(zip_longest(A, B, C), columns=['A', 'B', 'C'])
# pd.DataFrame.from_records(list(zip_longest(A, B, C)), columns=['A', 'B', 'C'])

     A    B  C
0  1.0  5.0  1
1  2.0  4.0  2
2  3.0  6.0  4
3  4.0  7.0  5
4  5.0  2.0  6
5  NaN  NaN  7
6  NaN  NaN  8
7  NaN  NaN  9
8  NaN  NaN  0
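
For readers unfamiliar with zip_longest, here is a minimal sketch of what it produces on its own, using two short made-up lists. It pads the shorter inputs with None by default, and the fillvalue keyword changes the padding value.

```python
from itertools import zip_longest

A = [1, 2, 3]
B = [5]

# Default padding is None for the exhausted input.
rows = list(zip_longest(A, B))
print(rows)    # [(1, 5), (2, None), (3, None)]

# fillvalue replaces None with a value of your choice.
padded = list(zip_longest(A, B, fillvalue=0))
print(padded)  # [(1, 5), (2, 0), (3, 0)]
```

Each tuple becomes one row of the DataFrame, which is why no transpose is needed with this approach.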

Answer 2

An alternative is to build a Series from each list in a list comprehension and construct a df from these:

In[61]:
df = pd.DataFrame([pd.Series(x) for x in [A,B,C]], index=list('ABC')).T
df

Out[61]: 
     A    B    C
0  1.0  5.0  1.0
1  2.0  4.0  2.0
2  3.0  6.0  4.0
3  4.0  7.0  5.0
4  5.0  2.0  6.0
5  NaN  NaN  7.0
6  NaN  NaN  8.0
7  NaN  NaN  9.0
8  NaN  NaN  0.0
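
A related variation, offered as a sketch rather than as part of the answer above, is to pass a dict of Series straight to the DataFrame constructor. Series of different lengths are aligned on their index, so no transpose is needed.

```python
import pandas as pd

A = [1, 2, 3, 4, 5]
B = [5, 4, 6, 7, 2]
C = [1, 2, 4, 5, 6, 7, 8, 9, 0]

# Each Series gets a default 0..n-1 index; the constructor aligns them
# on the union of those indexes and fills the gaps with NaN.
df = pd.DataFrame({name: pd.Series(values)
                   for name, values in zip('ABC', [A, B, C])})
print(df)
```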

Timings:

%timeit pd.DataFrame([pd.Series(x) for x in [A,B,C]], index=list('ABC')).T
%timeit pd.DataFrame.from_dict({'A' : A, 'B' : B, 'C' : C}, orient='index').T
from itertools import zip_longest
%timeit pd.DataFrame.from_records(list(zip_longest(A, B, C)), columns=['A', 'B', 'C'])

1.23 ms ± 12 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
977 µs ± 1.63 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
545 µs ± 8.08 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

So the last method (zip_longest with from_records) is the fastest of the three on these small lists.
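
The %timeit lines only work inside IPython or Jupyter. A rough equivalent with the standard timeit module is sketched below for anyone who wants to repeat the comparison in a plain script. The dictionary keys are made-up labels, and the absolute numbers will differ by machine and pandas version.

```python
import timeit
from itertools import zip_longest

import pandas as pd

A = [1, 2, 3, 4, 5]
B = [5, 4, 6, 7, 2]
C = [1, 2, 4, 5, 6, 7, 8, 9, 0]

# The three approaches timed above, wrapped as zero-argument callables.
candidates = {
    'series_list': lambda: pd.DataFrame(
        [pd.Series(x) for x in [A, B, C]], index=list('ABC')).T,
    'from_dict': lambda: pd.DataFrame.from_dict(
        {'A': A, 'B': B, 'C': C}, orient='index').T,
    'zip_longest': lambda: pd.DataFrame.from_records(
        list(zip_longest(A, B, C)), columns=['A', 'B', 'C']),
}

results = {}
for name, func in candidates.items():
    # Best of 3 repeats of 100 calls each, reported per call in microseconds.
    best = min(timeit.repeat(func, number=100, repeat=3))
    results[name] = best / 100 * 1e6
    print(f'{name}: {results[name]:.0f} µs per call')
```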

Answer 3

An idea for a custom way.

Define a couple of functions to adjust the input data:

def longest(*lists):
    # Length of the longest input list
    return max(len(x) for x in lists)

def equalize(col, size):
    # Pad col with None until it has size elements
    delta = size - len(col)
    if delta == 0:
        return col
    return col + [None] * delta

These are then used to build the dataframe:

import pandas as pd

# col1, col2, col3: the three input lists from the question
size = longest(col1, col2, col3)
df = pd.DataFrame({'a': equalize(col1, size),
                   'b': equalize(col2, size),
                   'c': equalize(col3, size)})

Which returns:

     a    b  c
0  1.0  5.0  1
1  2.0  4.0  2
2  3.0  6.0  4
3  4.0  7.0  5
4  5.0  2.0  6
5  NaN  NaN  7
6  NaN  NaN  8
7  NaN  NaN  9
8  NaN  NaN  0
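
The same padding idea can be folded into a single function that accepts any number of named lists. This is only a sketch: pad_to_frame is a hypothetical helper name, not a pandas function.

```python
import pandas as pd

def pad_to_frame(columns):
    """Hypothetical helper: build a DataFrame from a dict of unequal-length lists."""
    # Target length is that of the longest list.
    size = max(len(values) for values in columns.values())
    # Copy each list and append None until it reaches the target length.
    padded = {name: list(values) + [None] * (size - len(values))
              for name, values in columns.items()}
    return pd.DataFrame(padded)

df = pad_to_frame({
    'a': [1, 2, 3, 4, 5],
    'b': [5, 4, 6, 7, 2],
    'c': [1, 2, 4, 5, 6, 7, 8, 9, 0],
})
print(df)
```

Copying with list(values) keeps the caller's lists unmodified and also lets the helper accept tuples.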
