Multiple lists to Pandas DataFrame
I have three lists here:
[1,2,3,4,5]
[5,4,6,7,2]
[1,2,4,5,6,7,8,9,0]
I want this kind of output:
A  B  C
1  5  1
2  4  2
3  6  4
4  7  5
5  2  6
      7
      8
      9
      0
I tried one syntax, but it gives me the error "arrays must all be same length". Another attempt gave the error "Length of values does not match length of index".
Is there any way to get this kind of output?
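For reference, the first error can be reproduced with a plain dict of lists. The question does not show the syntax that was tried, so this is only a guess at it, and the exact wording of the message differs between pandas versions:

```python
import pandas as pd

A = [1, 2, 3, 4, 5]
B = [5, 4, 6, 7, 2]
C = [1, 2, 4, 5, 6, 7, 8, 9, 0]

# Assumed reproduction: a dict of lists requires every list to have the same length
try:
    pd.DataFrame({'A': A, 'B': B, 'C': C})
    error = None
except ValueError as exc:
    error = exc
    print(type(exc).__name__, exc)
```

The dict constructor treats each list as a full column, so it has nothing to fill the shorter columns with. The answers below work around that by padding with missing values.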
This is not easily supported, but it can be done. One option is to pass the lists as rows to the DataFrame constructor and transpose the result. Assuming your lists are A, B, and C:
pd.DataFrame([A, B, C]).T
0 1 2
0 1.0 5.0 1.0
1 2.0 4.0 2.0
2 3.0 6.0 4.0
3 4.0 7.0 5.0
4 5.0 2.0 6.0
5 NaN NaN 7.0
6 NaN NaN 8.0
7 NaN NaN 9.0
8 NaN NaN 0.0
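The columns come out as floats because NaN is a float value. If integers are wanted, one possible follow-up is to cast to the nullable "Int64" dtype. This is a sketch that assumes a pandas version with nullable integer support; the column names and the variable names df and as_int are chosen here for illustration:

```python
import pandas as pd

A = [1, 2, 3, 4, 5]
B = [5, 4, 6, 7, 2]
C = [1, 2, 4, 5, 6, 7, 8, 9, 0]

# Build the padded frame as above, then name the columns
df = pd.DataFrame([A, B, C]).T
df.columns = ['A', 'B', 'C']

# Nullable integer dtype: missing cells become <NA> instead of NaN
as_int = df.astype('Int64')
print(as_int)
```

The missing cells are still shown (as <NA>), so this changes the dtype but not the shape of the result.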
Another option is using DataFrame.from_dict with the "index" orient, again transposing the result:
pd.DataFrame.from_dict({'A' : A, 'B' : B, 'C' : C}, orient='index').T
A B C
0 1.0 5.0 1.0
1 2.0 4.0 2.0
2 3.0 6.0 4.0
3 4.0 7.0 5.0
4 5.0 2.0 6.0
5 NaN NaN 7.0
6 NaN NaN 8.0
7 NaN NaN 9.0
8 NaN NaN 0.0
A third solution uses zip_longest and DataFrame.from_records:
from itertools import zip_longest
pd.DataFrame.from_records(zip_longest(A, B, C), columns=['A', 'B', 'C'])
# pd.DataFrame.from_records(list(zip_longest(A, B, C)), columns=['A', 'B', 'C'])
A B C
0 1.0 5.0 1
1 2.0 4.0 2
2 3.0 6.0 4
3 4.0 7.0 5
4 5.0 2.0 6
5 NaN NaN 7
6 NaN NaN 8
7 NaN NaN 9
8 NaN NaN 0
Another alternative is to build a Series from each list in a list comprehension and construct a DataFrame from them:
In[61]:
df = pd.DataFrame([pd.Series(x) for x in [A,B,C]], index=list('ABC')).T
df
Out[61]:
A B C
0 1.0 5.0 1.0
1 2.0 4.0 2.0
2 3.0 6.0 4.0
3 4.0 7.0 5.0
4 5.0 2.0 6.0
5 NaN NaN 7.0
6 NaN NaN 8.0
7 NaN NaN 9.0
8 NaN NaN 0.0
%timeit pd.DataFrame([pd.Series(x) for x in [A,B,C]], index=list('ABC')).T
%timeit pd.DataFrame.from_dict({'A' : A, 'B' : B, 'C' : C}, orient='index').T
from itertools import zip_longest
%timeit pd.DataFrame.from_records(list(zip_longest(A, B, C)), columns=['A', 'B', 'C'])
1.23 ms ± 12 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
977 µs ± 1.63 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
545 µs ± 8.08 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
So the last method (zip_longest with DataFrame.from_records) is the fastest of the three on these inputs.
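Before choosing on speed alone, it may be worth checking that the approaches in this thread produce the same values. A minimal sketch, comparing everything as floats since the dtypes differ between methods (the variable names here are made up for the check):

```python
from itertools import zip_longest

import numpy as np
import pandas as pd

A = [1, 2, 3, 4, 5]
B = [5, 4, 6, 7, 2]
C = [1, 2, 4, 5, 6, 7, 8, 9, 0]

by_rows = pd.DataFrame([A, B, C]).T
by_dict = pd.DataFrame.from_dict({'A': A, 'B': B, 'C': C}, orient='index').T
by_zip = pd.DataFrame.from_records(list(zip_longest(A, B, C)), columns=['A', 'B', 'C'])
by_series = pd.DataFrame([pd.Series(x) for x in [A, B, C]], index=list('ABC')).T

# Compare raw values as floats; equal_nan=True treats the NaN padding as equal
reference = by_rows.to_numpy(dtype=float)
same = all(
    np.array_equal(reference, other.to_numpy(dtype=float), equal_nan=True)
    for other in (by_dict, by_zip, by_series)
)
print(same)
```

This only compares values. Column labels and dtypes still differ between the methods, as the outputs above show.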
An idea for a custom approach.
Define a couple of functions to pad the input data:
def longest(*lists):
    # Length of the longest input list
    return max(len(x) for x in lists)

def equalize(col, size):
    # Pad col with None up to the given size
    delta = size - len(col)
    if delta == 0:
        return col
    return col + [None] * delta
Use them when building the DataFrame (col1, col2, and col3 are the three input lists):
import pandas as pd
size = longest(col1, col2, col3)
df = pd.DataFrame({'a':equalize(col1, size), 'b':equalize(col2, size), 'c':equalize(col3, size)})
Which returns:
a b c
0 1.0 5.0 1
1 2.0 4.0 2
2 3.0 6.0 4
3 4.0 7.0 5
4 5.0 2.0 6
5 NaN NaN 7
6 NaN NaN 8
7 NaN NaN 9
8 NaN NaN 0
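The same padding idea can be folded into a single helper that takes any number of named lists. This is just a sketch; frame_from_ragged is a hypothetical name, not a pandas function:

```python
import pandas as pd

def frame_from_ragged(**columns):
    # Hypothetical helper: pad every list with None to the longest length
    size = max(len(values) for values in columns.values())
    padded = {
        name: list(values) + [None] * (size - len(values))
        for name, values in columns.items()
    }
    return pd.DataFrame(padded)

df = frame_from_ragged(a=[1, 2, 3, 4, 5], b=[5, 4, 6, 7, 2], c=[1, 2, 4, 5, 6, 7, 8, 9, 0])
print(df)
```

Copying each input with list(values) leaves the original lists untouched, which the equalize function above also does whenever it has to pad.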