Applying NumPy functions to a pandas DataFrame
I have a NumPy array as follows:
array([[1, 2],
       [3, 4],
       [5, 6],
       [7, 8]])
The array is called myArray. I slice the 2D array in two ways and get the following results:
In[1]: a2 = myArray[1:]
a2
Out[1]:array([[3, 4],
[5, 6],
[7, 8]])
In[2]: a1 = myArray[:-1]
a1
Out[2]:array([[1, 2],
[3, 4],
[5, 6]])
Next, I use NumPy functions to compute the angle between each pair of consecutive rows:
In[]: theta = np.arccos((a1*a2).sum(axis= 1)/(np.sqrt((a1**2).sum(axis= 1)*(a2**2).sum(axis= 1))))
theta
Out[]: array([ 0.1798535 , 0.05123717, 0.02409172])
I then perform the same sequence of operations on an equivalent DataFrame:
In[]: df = pd.DataFrame(data = myArray, columns = ["x", "y"])
df
Out[]:
   x  y
0  1  2
1  3  4
2  5  6
3  7  8
In[]: b2 = df[["x", "y"]].iloc[1:]
b2
Out[]:
   x  y
1  3  4
2  5  6
3  7  8
In[]: b1 = df[["x", "y"]].iloc[:-1]
b1
Out[]:
   x  y
0  1  2
1  3  4
2  5  6
But when I try to compute theta for the DataFrame, I get only 0 and NaN values:
In[]: theta2 = np.arccos((b1*b2).sum(axis= 1)/(np.sqrt((b1**2).sum(axis= 1)*(b2**2).sum(axis= 1))))
theta2
Out[]:
0 NaN
1 0.0
2 0.0
3 NaN
dtype: float64
Is this the right way to apply NumPy functions to sliced DataFrames? How can I get the same theta result when working with the DataFrame?
UPDATE
As suggested below, using b1.values and b2.values works. But when I wrap the computation in a function and apply it to df, I keep getting a ValueError:
def theta(group):
    b2 = df[["x", "y"]].iloc[1:]
    b1 = df[["x", "y"]].iloc[:-1]
    t = np.arccos((b1.values*b2.values).sum(axis= 1)/
                  (np.sqrt((b1.values**2).sum(axis= 1)*(b2.values**2).sum(axis= 1))))
    return t
df2 = df.apply(theta)
This gives a ValueError:
ValueError: Shape of passed values is (2, 3), indices imply (2, 4)
Please let me know where I am going wrong.
Thanks in advance.
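The indexes of b1 and b2 are not aligned: b1 carries the labels 0 to 2, while b2 carries the labels 1 to 3. Arithmetic between two DataFrames matches rows by index label rather than by position, so label 1 of b1 is multiplied by label 1 of b2 (the same row), and labels that exist in only one frame produce NaN.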
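To make the alignment visible, here is a minimal sketch that rebuilds the frames from the question and prints the element-wise product. The variable name product is only for illustration and does not appear in the question.

```python
import numpy as np
import pandas as pd

# Rebuild the frames from the question.
df = pd.DataFrame(np.array([[1, 2], [3, 4], [5, 6], [7, 8]]), columns=["x", "y"])
b1 = df[["x", "y"]].iloc[:-1]   # index labels 0, 1, 2
b2 = df[["x", "y"]].iloc[1:]    # index labels 1, 2, 3

# pandas aligns on the union of the labels (0..3) before multiplying.
# Labels 0 and 3 exist in only one frame, so those rows become NaN;
# labels 1 and 2 pair each row with itself, not with its neighbour.
product = b1 * b2
print(product)
```

Because rows 1 and 2 are compared with themselves, their cosine is 1 and arccos returns 0, which would explain the 0.0 entries in theta2; the NaN rows come from the labels that have no partner.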
If you do:
b2.index=b1.index
np.arccos((b1*b2).sum(axis= 1)/(np.sqrt((b1**2).sum(axis= 1)*(b2**2).sum(axis= 1))))
the output should be:
Out[75]:
0 0.179853
1 0.051237
2 0.024092
dtype: float64
If you don't want to change the index, you can use .values explicitly, which drops the index and operates on the underlying NumPy arrays by position:
np.arccos((b1.values*b2.values).sum(axis= 1)/(np.sqrt((b1.values**2).sum(axis= 1)*(b2.values**2).sum(axis= 1))))
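The ValueError in the UPDATE is not addressed above. My reading, offered as an assumption rather than a confirmed diagnosis, is that df.apply calls the function once per column, and each call returns a length-3 array that pandas cannot fit onto the 4-row index. A sketch of one way around it is to call a function on the whole frame directly. The name angles_between_rows, its cols parameter, and the np.clip guard against rounding error are my own additions, not part of the original answer.

```python
import numpy as np
import pandas as pd

def angles_between_rows(frame, cols=("x", "y")):
    """Angle in radians between each row vector and the next one.

    Hypothetical helper: works on plain NumPy arrays, so no index
    alignment takes place.
    """
    v = frame[list(cols)].to_numpy(dtype=float)
    a1, a2 = v[:-1], v[1:]                      # consecutive row pairs
    cos = (a1 * a2).sum(axis=1) / np.sqrt(
        (a1 ** 2).sum(axis=1) * (a2 ** 2).sum(axis=1)
    )
    # Clip guards against values marginally outside [-1, 1] from rounding.
    angles = np.arccos(np.clip(cos, -1.0, 1.0))
    # One angle per pair, labelled by the first row of each pair.
    return pd.Series(angles, index=frame.index[:-1])

df = pd.DataFrame(np.array([[1, 2], [3, 4], [5, 6], [7, 8]]), columns=["x", "y"])
theta = angles_between_rows(df)   # called directly, not through df.apply
print(theta)
```

The result has one entry fewer than df has rows, which is why it is returned as its own Series instead of being written back through apply.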