Why do I not get a p-value and F value from ANOVA in R?

I have been asked to determine, using an appropriate test, whether the internal concentration of isoleucine differs significantly between experiments. I constructed a data frame in R as follows:

dat <- data.frame(experiment = c('no glu', 'glu', 'gluFCCP', 'gluV', 'gluN', 'gluNV'),internal_concentration = c(2.38, 8.57, 3.42, 6.17, 4.58,3.51)) 
print(dat)
experiment     internal_concentration
1     no glu                   2.38
2        glu                   8.57
3    gluFCCP                   3.42
4       gluV                   6.17
5       gluN                   4.58
6      gluNV                   3.51

I chose to perform an ANOVA test:

model<-lm(dat$internal_concentration~dat$experiment)

anova(model)

Analysis of Variance Table

Response: dat$internal_concentration
               Df Sum Sq Mean Sq F value Pr(>F)
dat$experiment  5 25.558  5.1117
Residuals       0  0.000

I need the F and p values to determine whether there is a significant difference, but the table does not report them. How should I interpret this result, and how should I change the code or the test? I have just one concentration value for each experiment.

# Dataset (reproduced)

df <- data.frame(
  x = c('no glu', 'glu', 'gluFCCP', 'gluV', 'gluN', 'gluNV'),  # 6 groups
  y = c(2.38, 8.57, 3.42, 6.17, 4.58, 3.51)                    # 6 values
  )  

It appears you want to test for a difference among six experimental groups using only six observations, one for each group. You did not report a specific error message, but it is clear that you have no residual degrees of freedom left to estimate the error variance. Without that, you cannot compute standard errors, an $F$-statistic, or a $p$-value. Here is the summary() of your model estimated using the lm() function:

Call:
lm(formula = y ~ x, data = df)

Residuals:
ALL 6 residuals are 0: no residual degrees of freedom!

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)     8.57         NA      NA       NA
xgluFCCP       -5.15         NA      NA       NA
xgluN          -3.99         NA      NA       NA
xgluNV         -5.06         NA      NA       NA
xgluV          -2.40         NA      NA       NA
xno glu        -6.19         NA      NA       NA

Residual standard error: NaN on 0 degrees of freedom
Multiple R-squared:      1, Adjusted R-squared:    NaN 
F-statistic:   NaN on 5 and 0 DF,  p-value: NA

The summary output is a little more explicit. The number of groups equals the number of observations, so the model has as many parameters as data points, which is more than the data can support. Fitting the model with aov() instead of lm() does not change anything.

In general, the number of observations should exceed the number of parameters (i.e., groups).

Your model has as many parameters as measurements. This will always make the residuals zero.

As a consequence, the anova function cannot compute a $p$-value (nor can it compute an $F$-value).
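The parameter count can be checked directly in R. The sketch below refits the model from the question (the names fit, n_obs, n_params, and resid_df are chosen here for illustration) and shows that the residual degrees of freedom, which form the denominator of the F-statistic, are zero.

```r
# Same six measurements as in the question, one per condition
df <- data.frame(
  x = c("no glu", "glu", "gluFCCP", "gluV", "gluN", "gluNV"),
  y = c(2.38, 8.57, 3.42, 6.17, 4.58, 3.51)
)
fit <- lm(y ~ x, data = df)

n_obs    <- nrow(df)            # number of measurements: 6
n_params <- length(coef(fit))   # intercept + 5 group contrasts: 6
resid_df <- df.residual(fit)    # n_obs - n_params: 0

c(n_obs = n_obs, n_params = n_params, resid_df = resid_df)

# F = (SS_groups / df_groups) / (SS_resid / resid_df).
# With resid_df = 0 and SS_resid = 0, the denominator is 0 / 0,
# so neither F nor a p-value is defined.
max(abs(residuals(fit)))  # essentially zero: the fit reproduces every point
```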

Possibly you have multiple measurements for each of the 6 conditions. If you include all those measurements separately then the analysis will make sense.

ANOVA uses those additional measurements to estimate the variance within groups, that is, the variation that occurs even when the condition is the same. Without an estimate of that variance, you cannot know how the modeled differences compare with the random variation that would be present if there were no effect (even with no effect, your measurements would differ between groups because of measurement variation).
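To illustrate what replicates buy you, here is a minimal sketch using simulated data. The number of replicates (n_rep) and the within-group standard deviation (noise_sd) are invented for the example, and the observed values are used only as assumed group means. The numbers are not real measurements and the resulting p-value says nothing about the actual experiment.

```r
set.seed(1)  # arbitrary seed so the simulated numbers are reproducible

conditions  <- c("no glu", "glu", "gluFCCP", "gluV", "gluN", "gluNV")
group_means <- c(2.38, 8.57, 3.42, 6.17, 4.58, 3.51)  # observed values, used as assumed means
n_rep       <- 3     # hypothetical number of replicates per condition
noise_sd    <- 0.5   # hypothetical within-group standard deviation

# One row per simulated replicate
sim <- data.frame(
  experiment = factor(rep(conditions, each = n_rep)),
  internal_concentration = rep(group_means, each = n_rep) +
    rnorm(length(conditions) * n_rep, mean = 0, sd = noise_sd)
)

fit_rep <- lm(internal_concentration ~ experiment, data = sim)
tab <- anova(fit_rep)
tab      # the table now contains an F value and Pr(>F)
tab$Df   # 5 for experiment, 12 for Residuals (18 observations - 6 parameters)
```

With real data you would replace the simulated column with your actual replicate measurements, keeping one row per measurement.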

how to change the code/the test used? I just have one value for the concentration for each experiment.

You say each experimental condition was run only once, so, as others have said, you cannot estimate the variation around each condition and you cannot derive any p-value.

However, you may still have a reasonable guess for how much variation there is around each experimental condition. Perhaps it is accepted in the field that this concentration is measured quite reliably and your experimental setup is fairly reproducible. You may also have an expectation of, and an explanation for, the differences between conditions. If so, I would present the data as they are, adding an acceptable measure of uncertainty.

Let's say that, if you repeated each experiment many times, you would expect most of the results (say 95%) to be within 1 unit of what you have. Your results would then look like:

dat <- data.frame(experiment = c('no glu', 'glu', 'gluFCCP', 'gluV', 'gluN', 'gluNV'),
    internal_concentration = c(2.38, 8.57, 3.42, 6.17, 4.58,3.51)) 

# Sort by concentration and add the assumed +/- 1 unit uncertainty
dat <- dat[order(dat$internal_concentration), ]
dat$up <- dat$internal_concentration + 1
dat$down <- dat$internal_concentration - 1

# Bar plot with the assumed error bars drawn as vertical segments
b <- barplot(dat$internal_concentration, names.arg = dat$experiment,
    ylim = c(0, max(dat$up)), ylab = 'Internal concentration')
segments(x0 = b, y0 = dat$down, y1 = dat$up)

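[Figure: bar plot of internal concentration for each experiment, sorted in increasing order, with the assumed ±1 unit error bars]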

This suggests that you cannot say much about the differences between "no glu", gluFCCP, and gluNV, whereas the difference between "no glu" and glu is large enough to be worth discussing.

It's not ideal, and you have to make it very clear that those error bars are educated guesses, but it may be a fair way of using the data you have.
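The visual comparison can also be tabulated. The sketch below assumes the same ±1 unit uncertainty (half_width is a name chosen here, and the value is a guess, not an estimate) and marks which pairs of conditions have overlapping intervals. Overlap is only a rough guide under that assumption, not a formal test.

```r
dat <- data.frame(
  experiment = c("no glu", "glu", "gluFCCP", "gluV", "gluN", "gluNV"),
  internal_concentration = c(2.38, 8.57, 3.42, 6.17, 4.58, 3.51)
)
half_width <- 1  # assumed uncertainty: an educated guess, not estimated from data

# Two intervals of half-width h overlap when their centres differ by at most 2h
gap <- abs(outer(dat$internal_concentration, dat$internal_concentration, "-"))
overlap <- gap <= 2 * half_width
dimnames(overlap) <- list(dat$experiment, dat$experiment)

overlap  # TRUE where the assumed intervals overlap, FALSE where they are separated
```

For example, overlap["no glu", "gluFCCP"] is TRUE (the values differ by 1.04), while overlap["no glu", "glu"] is FALSE (they differ by 6.19).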
