Append a column to a dataframe in Pandas
I am trying to append a numpy.ndarray to a dataframe with little success. The dataframe is called user2 and the numpy.ndarray is called CallTime.
I tried:
user2["CallTime"] = CallTime.values
but I get an error message:
Traceback (most recent call last):
File "<ipython-input-53-fa327550a3e0>", line 1, in <module>
user2["CallTime"] = CallTime.values
AttributeError: 'numpy.ndarray' object has no attribute 'values'
Then I tried:
user2["CallTime"] = user2.assign(CallTime = CallTime.values)
but I again get the same error message as above.
I also tried to use the merge command, but for some reason it was not recognized by Python even though I have imported pandas. In the example below CallTime is a dataframe:
user3 = merge(user2, CallTime)
Error message:
Traceback (most recent call last):
File "<ipython-input-56-0ebf65759df3>", line 1, in <module>
user3 = merge(user2, CallTime)
NameError: name 'merge' is not defined
Any ideas?
Thank you!
A pandas DataFrame is a 2-dimensional data structure, and each column of a DataFrame is a 1-dimensional Series. So if you want to add one column to a DataFrame, a reliable approach is to convert the data into a Series first. An np.ndarray can be multi-dimensional. From your code, I believe the shape of the np.ndarray CallTime should be n x 1 (n rows and 1 column), and it's easy to convert it to a Series. Here is an example:
df = pd.DataFrame(np.random.rand(5, 2), columns=['A', 'B'])
This creates a dataframe df with two columns 'A' and 'B', and 5 rows.
CallTime = np.random.rand(5,1)
Assume this is your np.ndarray data CallTime.
df['C'] = pd.Series(CallTime[:, 0])
This will add a new column to df. Here CallTime[:, 0] is used to select the first column of CallTime, so if you want to use a different column from the np.ndarray, change the index.
Please make sure that df and CallTime have the same number of rows.
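The row-count advice above can be sketched as follows. This is only an illustration under stated assumptions: the index values 10 to 14 and the column name C_misaligned are made up for the demo and do not come from the question. It also shows one thing to watch for: when a Series is assigned to a column, pandas aligns it on the index, so a Series built without index=df.index may not line up with a dataframe whose index is not 0, 1, 2, ...

```python
import numpy as np
import pandas as pd

# Hypothetical data: a DataFrame with a non-default index and an (n, 1) array.
df = pd.DataFrame(np.random.rand(5, 2), columns=['A', 'B'],
                  index=[10, 11, 12, 13, 14])
CallTime = np.random.rand(5, 1)

# Check the row counts before assigning.
if CallTime.shape[0] != len(df):
    raise ValueError("CallTime and df must have the same number of rows")

# A plain pd.Series gets the index 0..4, which shares no labels with
# df's index 10..14, so the assigned column ends up entirely NaN.
df['C_misaligned'] = pd.Series(CallTime[:, 0])

# Passing index=df.index keeps the values in row order.
df['C'] = pd.Series(CallTime[:, 0], index=df.index)

print(df)
```

With the default RangeIndex used in the example above, the plain pd.Series(CallTime[:, 0]) form lines up as expected; the explicit index only matters once the dataframe has been filtered, sorted, or re-indexed.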
Hope this helps.
Instead of providing only documentation, I will try to provide a sample:
import numpy as np
import pandas as pd
data = {'A': [2010, 2011, 2012],
'B': ['Bears', 'Bears', 'Bears'],
'C': [11, 8, 10],
'D': [5, 8, 6]}
user2 = pd.DataFrame(data, columns=['A', 'B', 'C', 'D'])
# create the array that will be appended to the pandas dataframe user2
CallTime = np.array([1, 2, 3])
# Convert the ndarray CallTime to a list. If CallTime is a matrix, then after converting to a list you can iterate over it, or you can convert it into a dataframe and append just the required column, or join the dataframes.
user2.loc[:,'CallTime'] = CallTime.tolist()
print(user2)
I think this will help. Check the numpy.ndarray.tolist documentation if you want to find out why a list is used and how the conversion works. There is also an example of how to create a dataframe from a numpy array, in case you need it: https://stackoverflow.com/a/35245297/2027457
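For the matrix case mentioned in the code comment above, here is a rough sketch of the "convert into a dataframe and join" idea. The 3 x 2 array and the column names Start and Duration are invented for illustration; they are not from the question. Note that merge is a pandas function, so with import pandas as pd it has to be called as pd.merge (or as the DataFrame method), which is probably why a bare merge(...) raises a NameError.

```python
import numpy as np
import pandas as pd

user2 = pd.DataFrame({'A': [2010, 2011, 2012],
                      'B': ['Bears', 'Bears', 'Bears']})

# Hypothetical 2-D array: 3 rows, 2 columns.
CallTime = np.array([[1, 10],
                     [2, 20],
                     [3, 30]])

# Option 1: append just one column of the matrix.
user2['CallTime'] = CallTime[:, 0]

# Option 2: wrap the matrix in a DataFrame that shares user2's index,
# then join the two frames row by row.
extra = pd.DataFrame(CallTime, columns=['Start', 'Duration'],
                     index=user2.index)
user3 = user2.join(extra)

# The same join written with pd.merge, matching rows on the index.
user4 = pd.merge(user2, extra, left_index=True, right_index=True)

print(user3)
```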
Here is a simple solution.
user2["CallTime"] = CallTime
The problem here is that CallTime is already an array, so you cannot use .values on it. The .values attribute is used to convert a dataframe (or one of its columns) to an array. For example,
df = pd.DataFrame(np.random.rand(10, 2), columns=['A', 'B'])
# The following are all valid
df.values
df['A'].values
df['B'].values
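To tie this back to the question, here is a small sketch assuming CallTime is a 1-D array with one value per row of user2 (the sample values and the column name CallTime2 are made up). It shows why .values fails on an array, a way to write the assignment that works whether CallTime is an array or a Series, and how assign is meant to be used: it returns a new dataframe, so its result should be bound to a dataframe name rather than to a single column.

```python
import numpy as np
import pandas as pd

user2 = pd.DataFrame({'A': [2010, 2011, 2012]})
CallTime = np.array([1, 2, 3])

# A numpy array has no .values attribute; a pandas Series does.
print(hasattr(CallTime, 'values'))
print(hasattr(user2['A'], 'values'))

# np.asarray accepts both an ndarray and a Series, so this line works
# either way without needing .values.
user2['CallTime'] = np.asarray(CallTime)

# assign returns a new DataFrame and leaves user2 unchanged.
user3 = user2.assign(CallTime2=CallTime)

print(user3)
```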