简体   繁体   English

Pandas DataFrame的多个列表

[英]Multiple lists to Pandas DataFrame

I have three list here 我这里有三个清单

[1,2,3,4,5]

[5,4,6,7,2]

[1,2,4,5,6,7,8,9,0]

I want this kind of output: 我想要这种输出:

A     B    C
1     5    1
2     4    2
3     6    4
4     7    5
5     2    6
           7
           8
           9
           0

I tried one syntax , but it gives me this error arrays must all be same length and another error was Length of values does not match length of index 我尝试了一种语法,但它给了我这个错误arrays must all be same length而另一个错误是Length of values does not match length of index

Is there any way to get this kind of output? 有没有办法获得这种输出?

This is not easily supported, but it can be done. 这不容易支持,但可以做到。 DataFrame.from_dict will with the "index" orient. DataFrame.from_dict将具有“索引”方向。 Assuming your lists are A , B , and C : 假设您的列表是ABC

pd.DataFrame([A, B, C]).T

     0    1    2
0  1.0  5.0  1.0
1  2.0  4.0  2.0
2  3.0  6.0  4.0
3  4.0  7.0  5.0
4  5.0  2.0  6.0
5  NaN  NaN  7.0
6  NaN  NaN  8.0
7  NaN  NaN  9.0
8  NaN  NaN  0.0

Another option is using DataFrame.from_dict : 另一种选择是使用DataFrame.from_dict

pd.DataFrame.from_dict({'A' : A, 'B' : B, 'C' : C}, orient='index').T

     A    B    C
0  1.0  5.0  1.0
1  2.0  4.0  2.0
2  3.0  6.0  4.0
3  4.0  7.0  5.0
4  5.0  2.0  6.0
5  NaN  NaN  7.0
6  NaN  NaN  8.0
7  NaN  NaN  9.0
8  NaN  NaN  0.0

A third solution with zip_longest and DataFrame.from_records : 第三个解决方案是zip_longestDataFrame.from_records

from itertools import zip_longest
pd.DataFrame.from_records(zip_longest(A, B, C), columns=['A', 'B', 'C'])
# pd.DataFrame.from_records(list(zip_longest(A, B, C)), columns=['A', 'B', 'C'])

     A    B  C
0  1.0  5.0  1
1  2.0  4.0  2
2  3.0  6.0  4
3  4.0  7.0  5
4  5.0  2.0  6
5  NaN  NaN  7
6  NaN  NaN  8
7  NaN  NaN  9
8  NaN  NaN  0

alternative is to perform a list comprehension of a Series of each list and construct a df from this: 另一种方法是对每个列表的Series执行列表理解,并从中构造一个df:

In[61]:
df = pd.DataFrame([pd.Series(x) for x in [A,B,C]], index=list('ABC')).T
df

Out[61]: 
     A    B    C
0  1.0  5.0  1.0
1  2.0  4.0  2.0
2  3.0  6.0  4.0
3  4.0  7.0  5.0
4  5.0  2.0  6.0
5  NaN  NaN  7.0
6  NaN  NaN  8.0
7  NaN  NaN  9.0
8  NaN  NaN  0.0

timings: 定时:

%timeit pd.DataFrame([pd.Series(x) for x in [A,B,C]], index=list('ABC')).T
%timeit pd.DataFrame.from_dict({'A' : A, 'B' : B, 'C' : C}, orient='index').T
from itertools import zip_longest
%timeit pd.DataFrame.from_records(list(zip_longest(A, B, C)), columns=['A', 'B', 'C'])

1.23 ms ± 12 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
977 µs ± 1.63 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
545 µs ± 8.08 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

So the last method is the fastest 所以最后一种方法是最快的

An idea for a custom way. 自定义方式的想法。

Define a couple of methods to adjust the input data: 定义几种方法来调整输入数据:

def longest(*lists):
  return max([ len(x) for x in lists])

def equalize(col, size):
  delta = size - len(col)
  if delta == 0: return col
  return col + [None for _ in range(delta)]

To be used building the dataframe: 要用于构建数据框:

import pandas as pd

size = longest(col1, col2, col3)
df = pd.DataFrame({'a':equalize(col1, size), 'b':equalize(col2, size), 'c':equalize(col3, size)})

Which returns 哪个回报

     a    b  c
0  1.0  5.0  1
1  2.0  4.0  2
2  3.0  6.0  4
3  4.0  7.0  5
4  5.0  2.0  6
5  NaN  NaN  7
6  NaN  NaN  8
7  NaN  NaN  9
8  NaN  NaN  0

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM