Two ways to join lists in dataframe: as rows and columns

I have two lists:

l1 = ['0a',22,44]
l2 = ['0b',25,55,66]

Now I join them so that each list becomes a column of a data frame:

import pandas as p
df1 = p.DataFrame(zip(l1,l2))
df1

I get a data frame with 3 rows and 2 columns (the value 66 from l2 is missing). This looks like the behavior documented for ndarray, which says: "all columns must have the same number of rows if ndarray is passed into dataframe". But I am not working with an ndarray!

If, however, I join the lists as rows of a data frame, then the value 66 is kept:

df2 = p.DataFrame([l1,l2])

Is there any way to pass lists into a dataframe as columns while keeping all of their values?

The zip function returns a list truncated to the length of the shortest argument sequence, so the result is:

In [1]: zip(l1,l2)
Out[1]: [('0a', '0b'), (22, 25), (44, 55)]

To keep the value 66, use izip_longest from itertools:

In [3]: p.DataFrame(list(itertools.izip_longest(l1, l2)))
Out[3]:
      0   1
0    0a  0b
1    22  25
2    44  55
3  None  66
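
The session above is Python 2. On Python 3 the same function is named zip_longest, so a minimal sketch of the equivalent call (assuming a reasonably recent pandas, imported here as pd rather than p) might look like this:

```python
from itertools import zip_longest  # Python 3 name for izip_longest

import pandas as pd

l1 = ['0a', 22, 44]
l2 = ['0b', 25, 55, 66]

# zip_longest pads the shorter list with fillvalue (None by default),
# so no value from the longer list is dropped.
rows = list(zip_longest(l1, l2))
df = pd.DataFrame(rows)
print(df)
```

Passing fillvalue=... to zip_longest would put a different placeholder in the padded cells.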

Or you can use map with None (map changed in Python 3.x, so this only works in Python 2.x):

In [4]: p.DataFrame(map(None, l1, l2))
Out[4]:
      0   1
0    0a  0b
1    22  25
2    44  55
3  None  66

The problem is actually with your zip statement:

>>> zip(l1,l2)
[('0a', '0b'), (22, 25), (44, 55)]

You can create a Series for each of your lists and then concatenate them to create your data frame. Here, I use a dictionary comprehension to create the series. concat requires an NDFrame object, so I first create a DataFrame from each of the Series.

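import pandas as pd

# Map each column name (first element) to the remaining values of its list.
series = {col_name: values
          for col_name, values in zip([l1[0], l2[0]],
                                      [l1[1:], l2[1:]])}

# One single-column DataFrame per list; .items() works on Python 2 and 3.
df = pd.concat([pd.DataFrame(s, columns=[col]) for col, s in series.items()], axis=1)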
>>> df
   0b  0a
0  25  22
1  55  44
2  66 NaN

Also, it appears that the first element in each list is actually the title of the Series, so I took the liberty of using the first element as the series name.
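
As a variation on the same idea, the lists can be wrapped in actual Series objects and concatenated directly. This is only a sketch, assuming a recent pandas release and keeping the assumption that the first element of each list is the column name:

```python
import pandas as pd

l1 = ['0a', 22, 44]
l2 = ['0b', 25, 55, 66]

# First element of each list is taken as the column name (an assumption),
# the remaining elements are the values.
columns = [pd.Series(lst[1:], name=lst[0]) for lst in (l1, l2)]

# concat aligns the Series on their index; the shorter one is padded with NaN.
df = pd.concat(columns, axis=1)
print(df)
```

Because the Series are passed as a list, the column order follows the list order rather than dictionary ordering.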
