
Failing to convert lists in a data frame to NumPy arrays with python-pandas

For a data frame df:

name       list1                    list2
a          [1, 3, 10, 12, 20..]     [2, 6, 23, 29...]
b          [2, 10, 14, 3]           [4, 7, 8, 13...]
c          []                       [98, 101, 200]
...

I want to convert list1 and list2 to np.array and then hstack them. Here is what I did:

df.pv = df.apply(lambda row: np.hstack((np.asarray(row.list1), np.asarray(row.list2))), axis=1)

And I got this error:

ValueError: Shape of passed values is (138493, 175), indices imply (138493, 4)

where 138493 == len(df).

Please note that some values in list1 and list2 are empty lists, []. The lengths of the lists also differ among rows. Do you know what the reason is and how I can fix the problem? Thanks in advance!

EDIT:

When I just try to convert one list to an array:

df.apply(lambda row: np.asarray(row.list1), axis=1)

An error also occurs:

ValueError: Empty data passed with indices specified.

Your apply function is almost correct. All you have to do is convert the output of the np.hstack() function back to a Python list.

df.apply(lambda row: list(np.hstack((np.asarray(row.list1), np.asarray(row.list2)))), axis=1)

The code is shown below (including the df creation):

import numpy as np
import pandas as pd

df = pd.DataFrame([('a',[1, 3, 10, 12, 20],[2, 6, 23, 29]),
                   ('b',[2, 10, 1.4, 3],[4, 7, 8, 13]),
                   ('c',[],[98, 101, 200])],
                   columns = ['name','list1','list2'])

# Stack both lists for each row, then convert the result back to a Python list
df['list3'] = df.apply(lambda row: list(np.hstack((np.asarray(row.list1), np.asarray(row.list2)))), axis=1)

print(df['list3'])

Output:

0              [1, 3, 10, 12, 20, 2, 6, 23, 29]
1    [2.0, 10.0, 1.4, 3.0, 4.0, 7.0, 8.0, 13.0]
2                          [98.0, 101.0, 200.0]
Name: list3, dtype: object
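A side note on the output above: row 2 comes back as floats even though list2 holds only integers. This is because np.asarray([]) defaults to float64, and np.hstack promotes the result to a common dtype. The following is a minimal sketch of that behaviour; the variable names are illustrative only, and passing an explicit dtype for the empty list is just one possible workaround, not something the answer itself prescribes.

```python
import numpy as np

values = np.asarray([98, 101, 200])   # integer dtype inferred from the data
empty_default = np.asarray([])        # an empty list defaults to float64

# Stacking a float64 array with an integer array promotes the result to float64
stacked_default = np.hstack((empty_default, values))
print(stacked_default.dtype)

# Assumption: every list is meant to hold integers, so the empty one can be
# given the same dtype explicitly to avoid the promotion
empty_int = np.asarray([], dtype=values.dtype)
stacked_int = np.hstack((empty_int, values))
print(stacked_int.dtype)
```

Row 1 is a different case: it contains 1.4, so a float result is expected there.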

If you want a numpy array, the only way I could get it to work is:

df['list3'] = df['list3'].apply(lambda x: np.array(x))

# .ix was removed in pandas 1.0; use .iloc for positional access
print(type(df['list3'].iloc[0]))
Out[] : numpy.ndarray
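As an alternative to the two-step approach above, here is a hedged sketch that skips DataFrame.apply altogether, so pandas never has to decide how to reshape the returned arrays. It builds the arrays in a plain list comprehension and stores them in an object-dtype column. The name combined is hypothetical, and the data frame is the small example from this answer rather than the asker's real data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [("a", [1, 3, 10, 12, 20], [2, 6, 23, 29]),
     ("b", [2, 10, 1.4, 3], [4, 7, 8, 13]),
     ("c", [], [98, 101, 200])],
    columns=["name", "list1", "list2"],
)

# Build one 1-D array per row outside of pandas
combined = [
    np.hstack((np.asarray(a), np.asarray(b)))
    for a, b in zip(df["list1"], df["list2"])
]

# An object-dtype Series keeps each array as a single cell value
df["list3"] = pd.Series(combined, index=df.index, dtype=object)

print(type(df["list3"].iloc[0]))
print([len(arr) for arr in df["list3"]])
```

Each cell in list3 is then a numpy.ndarray whose length is the sum of the two source lists for that row.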

