I have a problem with doing a t-test in scipy that's driving me slowly crazy. It should be simple to resolve, but nothing I do works and there's no solution I can find through extensive searching. I'm using Spyder on the latest distribution of Anaconda.
Specifically: I want to compare means between two columns, 'Trait_A' and 'Trait_B', in a pandas dataframe that I've imported from a csv file. Some of the values in one of the columns are 'NaN' ('Not a Number'). The default setting of scipy's independent-samples t-test function doesn't accommodate 'NaN' values. However, setting the 'nan_policy' parameter to 'omit' should deal with this. Nevertheless, when I do, the test statistic and p-value come back as 'NaN'. When I restrict the range of values covered to actual numbers, the test works fine. My data and code are below; can anyone suggest what I'm doing wrong? Thanks!
Data:
Trait_A Trait_B
0 1.714286 0.000000
1 4.275862 4.000000
2 0.500000 4.625000
3 1.000000 0.000000
4 1.000000 4.000000
5 1.142857 1.000000
6 2.000000 1.000000
7 9.416667 1.956522
8 2.052632 0.571429
9 2.100000 0.166667
10 0.666667 0.000000
11 2.333333 1.705882
12 2.768145 NaN
13 0.000000 NaN
14 6.333333 NaN
15 0.928571 NaN
My code:
import pandas as pd
import scipy.stats as sp
data = pd.read_csv("filepath/Data2.csv")
print(sp.stats.ttest_ind(data['Trait_A'], data['Trait_B'], nan_policy='omit'))
My result:
Ttest_indResult(statistic=nan, pvalue=nan)
It seems like a bug. You can drop the NaNs before passing the columns to the t-test:
sp.stats.ttest_ind(data.dropna()['Trait_A'], data.dropna()['Trait_B'])
Ttest_indResult(statistic=0.88752464718609214, pvalue=0.38439692093551037)
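Note that data.dropna() removes every row containing a NaN, so 'Trait_A' also loses its last four values in the call above. If the aim is to keep all 16 'Trait_A' observations, which appears to be what nan_policy='omit' is meant to do, one option is to drop the NaNs per column. A minimal sketch, assuming the data from the question is typed in directly rather than read from the csv file:

```python
import numpy as np
import pandas as pd
from scipy import stats

# Data copied from the question; the last four Trait_B values are missing.
data = pd.DataFrame({
    "Trait_A": [1.714286, 4.275862, 0.5, 1.0, 1.0, 1.142857, 2.0, 9.416667,
                2.052632, 2.1, 0.666667, 2.333333, 2.768145, 0.0, 6.333333,
                0.928571],
    "Trait_B": [0.0, 4.0, 4.625, 0.0, 4.0, 1.0, 1.0, 1.956522,
                0.571429, 0.166667, 0.0, 1.705882] + [np.nan] * 4,
})

# Drop NaNs column by column, so a NaN in Trait_B does not remove
# the Trait_A value that happens to sit in the same row.
a = data["Trait_A"].dropna()
b = data["Trait_B"].dropna()

result = stats.ttest_ind(a, b)
print(len(a), len(b))
print(result)
```

The two samples now have different sizes, which is fine for an independent-samples test, so the statistic will differ from the row-wise dropna() result above.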
The bug is at line 3885 of scipy/scipy/stats/stats.py:
# check both a and b
contains_nan, nan_policy = (_contains_nan(a, nan_policy) or
_contains_nan(b, nan_policy))
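_contains_nan returns a tuple, and a non-empty tuple is always truthy, so the 'or' always yields the result for a and never looks at b. The check must be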
contains_nan = (_contains_nan(a, nan_policy)[0] or
_contains_nan(b, nan_policy)[0])
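Swapping 'Trait_A' and 'Trait_B' solves the problem in your case, because the column containing the NaNs is then the first argument, which is the only one that gets checked.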
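The short-circuit behaviour behind this can be reproduced without scipy. The sketch below uses contains_nan_stub, a hypothetical stand-in written for illustration (it is not scipy's _contains_nan); the only assumption is that it returns a (found_nan, nan_policy) tuple like the real helper does.

```python
import math


def contains_nan_stub(values, nan_policy):
    """Hypothetical stand-in: returns a (found_nan, nan_policy) tuple."""
    return any(math.isnan(v) for v in values), nan_policy


a = [1.0, 2.0, 3.0]           # no NaNs, like Trait_A
b = [1.0, float("nan")]       # has a NaN, like Trait_B

# Buggy pattern: the first call returns a non-empty tuple, which is truthy,
# so `or` returns it and the second call is never evaluated.
buggy, _ = contains_nan_stub(a, "omit") or contains_nan_stub(b, "omit")

# Fixed pattern: compare the boolean elements, not the tuples.
fixed = contains_nan_stub(a, "omit")[0] or contains_nan_stub(b, "omit")[0]

# Swapped arguments: the buggy pattern happens to work, because the
# sample with the NaN is now the one that gets checked.
swapped, _ = contains_nan_stub(b, "omit") or contains_nan_stub(a, "omit")

print(buggy, fixed, swapped)  # False True True
```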