Function works on each row of data frame, but not using df.apply
I have this pandas dataframe containing two samples, X and Y, in each row:
import pandas as pd
import numpy as np
df = pd.DataFrame({'X': [np.random.normal(0, 1, 10),
                         np.random.normal(0, 1, 10),
                         np.random.normal(0, 1, 10)],
                   'Y': [np.random.normal(0, 1, 10),
                         np.random.normal(0, 1, 10),
                         np.random.normal(0, 1, 10)]})
I want to use the function ttest_ind() (a statistical test taking two samples as input) on each row, and take the first element of the result (the function returns two elements).
If I do it for a given row, e.g. the first row, it works:
from scipy import stats
stats.ttest_ind(df['X'][0], df['Y'][0], equal_var = False)[0]  # Returns a float
However, if I use apply to do it on each row, I get an error:
df.apply(lambda x: stats.ttest_ind(x['X'], x['Y'], equal_var = False)[0])
# Throws the following error:
Traceback (most recent call last):
  File "pandas\_libs\index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 759, in pandas._libs.hashtable.Int64HashTable.get_item
TypeError: an integer is required

During handling of the above exception, another exception occurred:

...
KeyError: ('X', 'occurred at index X')
What am I doing wrong?
You just need to specify the axis along which you want to apply your function. Take a look at the relevant docs for apply(). In short, axis=1 says "apply the function to each row of my dataframe". The default is axis=0, which applies the function to each column instead. With the default, the lambda receives each column as a Series whose labels are the row numbers 0, 1, 2, so the lookup x['X'] finds no such label and raises the KeyError you see.
df.apply(lambda x: stats.ttest_ind(x['X'], x['Y'], equal_var = False)[0], axis=1)
0 0.985997
1 -0.197396
2 0.034277
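If it helps to see the difference directly, here is a minimal sketch that records which labels each Series carries when apply() hands it to your function. The seeded generator rng and the variable names (labels_axis0, labels_axis1, t_apply, t_zip) are my own choices for illustration, not part of the question. The last part shows a plain zip over the two columns as one possible alternative to apply(), which should give the same statistics.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Seeded generator so the sketch is reproducible (an assumption; the
# question uses np.random.normal without a seed).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'X': [rng.normal(0, 1, 10) for _ in range(3)],
    'Y': [rng.normal(0, 1, 10) for _ in range(3)],
})

# Show the index labels of the Series passed to the function in each mode.
# axis=0: one call per column, labels are the row numbers.
labels_axis0 = df.apply(lambda s: ', '.join(map(str, s.index)), axis=0)
# axis=1: one call per row, labels are the column names.
labels_axis1 = df.apply(lambda s: ', '.join(map(str, s.index)), axis=1)
print(labels_axis0)
print(labels_axis1)

# Row-wise t statistic with apply(axis=1), as in the answer above.
t_apply = df.apply(
    lambda row: stats.ttest_ind(row['X'], row['Y'], equal_var=False)[0],
    axis=1,
)

# Alternative without apply(): iterate over the two columns in parallel.
t_zip = pd.Series(
    [stats.ttest_ind(x, y, equal_var=False)[0]
     for x, y in zip(df['X'], df['Y'])],
    index=df.index,
)
print(t_apply)
```

Only with axis=1 do the labels 'X' and 'Y' exist on the Series your lambda receives, which is why x['X'] works there and fails with the default.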