
ANOVA in Python

My CSV file contains discrete and continuous variables, and I would like to find the "model equation" that explains my continuous variable (a) in terms of my discrete variables (x, y, z): a = f(x, y, z). The problem is that the code I am trying fails. When I print the result of [f_value, p_value] = stats.f_oneway(x, y, z), I get [nan, nan]. Extract from my code:

from numpy import (genfromtxt,hstack,arange)
# For linear regression
from scipy import stats
import scipy
# For ANOVA
from statsmodels.stats.multicomp import (pairwise_tukeyhsd,MultiComparison)

from pylab import savefig
from matplotlib.pyplot import (figure,setp)

fname="G:/table.csv"
my_data = genfromtxt(fname,delimiter=',')

#Transformation of file into table

x= my_data[:,3]
y= my_data[:,4]
z= my_data[:,6]


#one way anova
[f_value, p_value] = stats.f_oneway(x, y, z)

I want to estimate the model R = u + f(x, y, z) + e, where R is the continuous variable, u is a constant, and e is the measurement error. I want the coefficients that describe how the discrete variables affect my continuous outcome.
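For illustration, here is a minimal sketch of what fitting such a model could look like, assuming f(x, y, z) is a sum of per-level main effects with no interactions. The helper names (dummy_code, fit_main_effects) and the data are made up for this example and are not taken from the question's file; the first level of each factor is used as the reference level, so each coefficient is the shift relative to that level.

```python
import numpy as np

def dummy_code(values):
    """Return (levels, matrix) with one 0/1 column per level, skipping the first (reference) level."""
    levels = np.unique(values)
    columns = [(values == level).astype(float) for level in levels[1:]]
    if columns:
        return levels, np.column_stack(columns)
    return levels, np.empty((len(values), 0))

def fit_main_effects(a, factors):
    """Least-squares fit of a = u + (per-level effects of each factor) + e.

    factors maps a factor name to a 1-D array of discrete values.
    """
    names = ["u (intercept)"]
    blocks = [np.ones((len(a), 1))]
    for name, values in factors.items():
        levels, block = dummy_code(np.asarray(values))
        blocks.append(block)
        names += [f"{name}={level}" for level in levels[1:]]
    design = np.hstack(blocks)
    coef, *_ = np.linalg.lstsq(design, a, rcond=None)
    return dict(zip(names, coef))

# Made-up, noise-free data: every combination of x in {0,1}, y in {0,1,2}, z in {0,1}
x, y, z = (g.ravel() for g in np.meshgrid([0, 1], [0, 1, 2], [0, 1], indexing="ij"))
a = 10.0 + 2.0 * (x == 1) + 3.0 * (y == 1) - 1.0 * (y == 2) + 0.5 * (z == 1)

effects = fit_main_effects(a, {"x": x, "y": y, "z": z})
for name, value in effects.items():
    print(f"{name}: {value:.3f}")
```

Note that stats.f_oneway(x, y, z) treats its three arguments as three groups of measurements to compare, not as factors explaining a separate outcome, so it does not produce coefficients of this kind. If statsmodels is available, its formula interface (ols with C(...) terms) fits the same kind of model and also reports standard errors.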

My best guess is that your data contains NaN values or other bad values. You could try to detect them with the following piece of code:

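import numpy

# Grow the sample one row at a time; the first prefix that yields NaN ends at the suspect row.
# Start at 2 rows per group, since the F statistic is undefined for a single row
# (so a hit at i == 2 can come from row 0 or row 1).
for i in range(2, len(x) + 1):
    f_val, p_val = stats.f_oneway(x[:i], y[:i], z[:i])
    if numpy.isnan(f_val) or numpy.isnan(p_val):
        print(i - 1, x[i-1], y[i-1], z[i-1], f_val, p_val)
        break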
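A more direct check is to look for NaN entries in the columns themselves. The sketch below uses made-up arrays in place of the columns read from the file, and the helper names (nan_rows, drop_nan_rows) are hypothetical. One common source of NaN with genfromtxt is text it cannot parse as a number, such as a header row; whether your file has one is an assumption here.

```python
import numpy as np

def nan_rows(*columns):
    """Indices of rows where at least one column holds NaN."""
    stacked = np.column_stack(columns)
    return np.flatnonzero(np.isnan(stacked).any(axis=1))

def drop_nan_rows(*columns):
    """Return the columns with every row that contains a NaN removed."""
    stacked = np.column_stack(columns)
    keep = ~np.isnan(stacked).any(axis=1)
    return [np.asarray(col)[keep] for col in columns]

# Made-up values standing in for my_data[:, 3], my_data[:, 4], my_data[:, 6]
x = np.array([np.nan, 1.0, 2.0, 3.0])     # row 0 mimics a header parsed as NaN
y = np.array([np.nan, 2.0, np.nan, 4.0])  # row 2 mimics a missing cell
z = np.array([np.nan, 5.0, 6.0, 7.0])

bad = nan_rows(x, y, z)
print("rows containing NaN:", bad)

x_clean, y_clean, z_clean = drop_nan_rows(x, y, z)
print(x_clean, y_clean, z_clean)
```

If the only offending row turns out to be a header, passing skip_header=1 to genfromtxt avoids it at load time.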
