
Displaying LM based on p-value

I'm displaying linear regression models in plots using the ggpmisc package. I only want the regression line, p-value and R² value to be shown in the plot if the p-value is less than 0.2.

Does anyone know if there is a way to choose whether to display these things based on the p-value?


Here's the code for the plot:

library(ggpmisc)
library(ggplot2)

formula <- y~x

ggplot(df, aes(carbon, 
               acetone, 
               fill=soil_type)) +
  geom_smooth(method = "lm",
              formula = formula, 
              color="black") +
  geom_point(aes(shape=soil_type, 
                 color=soil_type, 
                 size=soil_type)) +
  scale_fill_manual(values=c("green3", "brown")) + 
  scale_color_manual(values=c("black", "black")) + 
  scale_shape_manual(values=c(21, 24))+
  scale_size_manual(values=c(2.7, 2.0))+
  labs(shape="soil_type", 
       color="soil_type") +
  theme_bw() +
  facet_wrap(~days, 
             ncol = 2)+
  stat_poly_eq(
    aes(label = paste(stat(adj.rr.label),
                      stat(p.value.label), 
                      sep = "*\", \"*")),
    formula = formula, 
    rr.digits = 2, 
    p.digits = 1, 
    parse = TRUE,size=3.5)

Here's the dataset:

df <- structure(list(carbon = c(1.4, 0.8, 1.6, 0.1, 0.4, 0.4, 0.4, 
1.3, 0.4, 1.1, 0.2, 1, 0.4, 0.4, 0.5, 0.8, 0.1, 0.5, 0.4, 0.6, 
1.1, 0.6, 0.2, 0.2, 0.4, 0.1, 0.3, 0.5, 1.4, 0.3, 0.3, 1.1, 0.3, 
0.7, 0.4, 0.4, 1.1, 0.1, 0.6, 1.3, 0.1, 1.6, 0.4, 0.5, 0.5, 1.2, 
0.5, 0.5, 1.4, 0.8, 1.6, 0.1, 0.4, 0.4, 0.4, 1.3, 0.4, 1.1, 0.2, 
1, 0.4, 0.4, 0.5, 0.8, 0.1, 0.5, 0.4, 0.6, 1.1, 0.6, 0.2, 0.2, 
0.4, 0.1, 0.3, 0.5, 1.4, 0.3, 0.3, 1.1, 0.3, 0.7, 0.4, 0.4, 1.1, 
0.1, 0.6, 1.3, 0.1, 1.6, 0.4, 0.5, 0.5, 1.2, 0.5, 0.5), days = c(0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 10L, 
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L), soil_type = c("organic", 
"mineral", "organic", "mineral", "mineral", "mineral", "mineral", 
"organic", "mineral", "organic", "mineral", "mineral", "mineral", 
"mineral", "mineral", "mineral", "mineral", "mineral", "mineral", 
"mineral", "organic", "mineral", "mineral", "mineral", "mineral", 
"mineral", "mineral", "mineral", "organic", "mineral", "mineral", 
"organic", "mineral", "mineral", "mineral", "mineral", "organic", 
"mineral", "mineral", "organic", "mineral", "organic", "mineral", 
"mineral", "mineral", "organic", "mineral", "mineral", "organic", 
"mineral", "organic", "mineral", "mineral", "mineral", "mineral", 
"organic", "mineral", "organic", "mineral", "mineral", "mineral", 
"mineral", "mineral", "mineral", "mineral", "mineral", "mineral", 
"mineral", "organic", "mineral", "mineral", "mineral", "mineral", 
"mineral", "mineral", "mineral", "organic", "mineral", "mineral", 
"organic", "mineral", "mineral", "mineral", "mineral", "organic", 
"mineral", "mineral", "organic", "mineral", "organic", "mineral", 
"mineral", "mineral", "organic", "mineral", "mineral"), acetone = c(0.9, 
0.7, 0.5, 44.4, 44.2, 9.7, 66, 3.3, 8.6, 26.8, 111.4, 14.5, 40.7, 
23.2, 51.6, 3.5, 64.3, 9.8, 48.5, 39.4, 0.2, 24.2, 55.3, 30.2, 
28.9, 63.6, 80.7, 50.4, 0.9, 34.4, 102.2, 2.8, 16.5, 9.7, 32.1, 
124.4, 3.7, 56.8, 10.6, 0.7, 41.1, 1.3, 62.5, 1.1, 86.3, 0.1, 
2.7, 5, 0.1, 0.1, 0.1, 179.1, 60.9, 2.6, 65.3, 14.7, 0, 34.9, 
133.7, 0, 56, 36.2, 2, 0.2, 44.9, 24.5, 123.8, 26.5, 0.1, 0.2, 
23.6, 146.3, 0.3, 169.2, 164.4, 30, 0, 0, 123.1, 0.2, 3.1, 58, 
0, 0.1, 0, 44.9, 1, 0, 102.6, 3.9, 91.4, 1.3, 21.8, 0.1, 0.6, 
1.8)), row.names = c(NA, -96L), class = "data.frame")

New df:

new_df <- structure(list(log10_carbon_content_pct = c(1.37049458496569, 
0.832668550451795, 1.59213788019068, 0.145507171409663, 0.446381812222442, 
0.439569517147175, 0.422589839851482, 1.28870743057217, 0.400192488592576, 
1.09659720835789, 0.241297387109993, 0.961610908091281, 0.398026858883686, 
0.392257161341674, 0.453700473359772, 0.806451323247262, 0.11544408343624, 
0.495474955889315, 0.354492600589436, 0.61526599889915, 1.14182589451108, 
0.600537294364469, 0.160768561861128, 0.180699201296035, 0.447002898466162, 
0.104657791008796, 0.276806345628763, 0.530903734802764, 1.41408715182753, 
0.272305844402086, 0.250175948083925, 1.12073840554294, 0.323045735481701, 
0.652971172017589, 0.373463721632369, 0.378942698613437, 1.13800253645643, 
0.0874264570362855, 0.601027315144485, 1.34486364979713, 0.139721704815204, 
1.60809259256346, 0.379305517750582, 0.51215053692203, 0.466496903744401, 
1.23437806425139, 0.541766399511599, 0.471365065418019, 1.37049458496569, 
0.832668550451795, 1.59213788019068, 0.145507171409663, 0.446381812222442, 
0.439569517147175, 0.400192488592576, 1.09659720835789, 0.241297387109993, 
0.961610908091281, 0.398026858883686, 0.392257161341674, 0.453700473359772, 
0.806451323247262, 0.11544408343624, 0.495474955889315, 0.354492600589436, 
0.61526599889915, 1.14182589451108, 0.600537294364469, 0.160768561861128, 
0.180699201296035, 0.447002898466162, 0.104657791008796, 0.276806345628763, 
0.530903734802764, 1.41408715182753, 0.272305844402086, 0.250175948083925, 
1.12073840554294, 0.323045735481701, 0.652971172017589, 0.373463721632369, 
0.378942698613437, 1.13800253645643, 0.0874264570362855, 0.601027315144485, 
1.34486364979713, 0.139721704815204, 1.60809259256346, 0.379305517750582, 
1.23437806425139, 0.541766399511599, 0.471365065418019), daysincubated4 = c(0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 94L, 
94L, 94L, 94L, 94L, 94L, 94L, 94L, 94L, 94L, 94L, 94L, 94L, 94L, 
94L, 94L, 94L, 94L, 94L, 94L, 94L, 94L, 94L, 94L, 94L, 94L, 94L, 
94L, 94L, 94L, 94L, 94L, 94L, 94L, 94L, 94L, 94L, 94L, 94L, 94L, 
94L, 94L, 94L, 94L), soil_type = c("organic", "mineral", "organic", 
"mineral", "mineral", "mineral", "mineral", "organic", "mineral", 
"organic", "mineral", "mineral", "mineral", "mineral", "mineral", 
"mineral", "mineral", "mineral", "mineral", "mineral", "organic", 
"mineral", "mineral", "mineral", "mineral", "mineral", "mineral", 
"mineral", "organic", "mineral", "mineral", "organic", "mineral", 
"mineral", "mineral", "mineral", "organic", "mineral", "mineral", 
"organic", "mineral", "organic", "mineral", "mineral", "mineral", 
"organic", "mineral", "mineral", "organic", "mineral", "organic", 
"mineral", "mineral", "mineral", "mineral", "organic", "mineral", 
"mineral", "mineral", "mineral", "mineral", "mineral", "mineral", 
"mineral", "mineral", "mineral", "organic", "mineral", "mineral", 
"mineral", "mineral", "mineral", "mineral", "mineral", "organic", 
"mineral", "mineral", "organic", "mineral", "mineral", "mineral", 
"mineral", "organic", "mineral", "mineral", "organic", "mineral", 
"organic", "mineral", "organic", "mineral", "mineral"), log10_acetone_c = c(0.00926846768640111, 
0.00722297480690438, 0.00476160511452692, 0.444394072789671, 
0.442046700697262, 0.0969444813777115, 0.659755819077841, 0.0331353719785704, 
0.0860951658767628, 0.268398621135863, 1.11425178483619, 0.144553061208862, 
0.406832035925707, 0.23171358353469, 0.516127469481814, 0.0348431782930108, 
0.643065970146376, 0.0976949557752846, 0.485466415318889, 0.394413062169997, 
0.00215649305508422, 0.241895264091051, 0.553024436602299, 0.302071278712034, 
0.289062005904557, 0.63551683217124, 0.806576564543876, 0.504060450046605, 
0.00875624846772753, 0.343705138058693, 1.02196979128145, 0.0275595437157376, 
0.165301609757072, 0.0968005281885054, 0.321248914837189, 1.2444284170588, 
0.0365188178564554, 0.567626974656115, 0.10556180687771, 0.00690130440471077, 
0.411060711247439, 0.0126107339499284, 0.625415265123349, 0.0107804497649973, 
0.863015429856585, 0.000900918219072745, 0.0266090695624902, 
0.0503551827004673, 0.000268027005920481, 0.000469317124751776, 
3.95824821597101e-05, 0.00441758233902833, 0.00236289505353141, 
0.00113488982479906, 0, 0, 0, 0, 0.0531046200284991, 0, 0, 0.000214196723493331, 
0.00302534713027796, 0.000436083384348923, 0.000226218782648512, 
0.00292979624099701, 0.00124508843867096, 0, 0.0286531848530279, 
0.0134556110640359, 0, 0, 0, 0.0104783275343697, 0.0154919389302452, 
0.0197930173217508, 0, 0, 0, 0, 0, 0, 0.00051087017306838, 0.000186632771190318, 
0.00500797053508424, 0, 0.318561145793628, 0, 0, 0.0126117092437348, 
0.00699751577956711, 0)), row.names = c(NA, -92L), class = "data.frame")


names = c(0,0) #Create a starting point of a matrix for the group names

#For each group, run a lm to find if pvalue < 0.2
for(i in unique(new_df$daysincubated4)){
  for(j in unique(new_df$soil_type)){
    lm = summary(lm(log10_acetone_c~log10_carbon_content_pct, new_df[new_df$daysincubated4==i & new_df$soil_type==j,])) 
    p = pf(lm$fstatistic[1], lm$fstatistic[2], lm$fstatistic[3], lower.tail=FALSE)
    if(p < 0.2){names = rbind(names, c(i,j))} #Get the groups that pass
  }
}

names = names[-1, , drop = FALSE] #Remove starting point (drop = FALSE keeps a matrix if only one group passes)

new_df2 = new_df[new_df$daysincubated4%in%names[,1] & new_df$soil_type%in%names[,2],]

formula <- y~x

(acetone_c_vs_cc <- ggplot(new_df, 
                           aes(log10_carbon_content_pct, 
                               log10_acetone_c, 
                               fill=soil_type)) +
    geom_smooth(method = "lm",
                formula = formula, color="black", data = new_df2) +
    geom_point(aes(shape=soil_type, color=soil_type, size=soil_type)) +
    scale_fill_manual(values=c("#00AFBB", "brown")) + 
    scale_color_manual(values=c("black", "black")) + 
    scale_shape_manual(values=c(21, 24))+
    scale_size_manual(values=c(2.4, 1.7))+
    labs(shape="soil_type", color="soil_type") +
    labs(x = "Soil organic carbon (%)", 
         y = "Emission (umol/g dw SOC/h)", 
         title = "Acetone vs Carbon content", 
         subtitle = "Emission and carbon data has been log10 transformed") + 
    theme_bw() +
    facet_wrap(~daysincubated4, 
               ncol = 4)+
    stat_poly_eq(data = new_df2,
      aes(label = paste(stat(adj.rr.label),
                        stat(p.value.label), 
                        sep = "*\", \"*")),
      formula = formula, 
      rr.digits = 2, 
      p.digits = 1, 
      parse = TRUE,size=3.5))

I'm not sure if this is the best way, but you can run the lm's by hand to check your condition, then subset your df to use only the observations of the groups that pass:

First, to check the condition:

names = c(0,0) #Create a starting point of a matrix for the group names

#For each group, run a lm to find if pvalue < 0.2
for(i in unique(df$days)){
  for(j in unique(df$soil_type)){
    lm = summary(lm(acetone~carbon, df[df$days==i & df$soil_type==j,])) 
    p = pf(lm$fstatistic[1], lm$fstatistic[2], lm$fstatistic[3], lower.tail=FALSE)
    if(p < 0.2){names = rbind(names, c(i,j))} #Get the groups that pass
  }
}

names = names[-1, , drop = FALSE] #Remove starting point (drop = FALSE keeps a matrix if only one group passes)
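
As a rough alternative sketch of the same check, the per-group p-values can also be computed with split() and sapply() instead of nested loops. The toy data frame and the helper group_p below are made up for illustration (they are not the question's data); the 0.2 cutoff is the one from the question.

```r
# Toy data (hypothetical values): four day x soil groups of six points each
x     <- 1:6
trend <- c(1.1, 1.9, 3.2, 3.9, 5.1, 5.8)  # clearly linear in x
flat  <- c(2, 1, 3, 3, 1, 2)              # symmetric around the middle, so zero slope
toy <- data.frame(
  days      = rep(c(0L, 0L, 10L, 10L), each = 6),
  soil_type = rep(c("organic", "mineral", "organic", "mineral"), each = 6),
  carbon    = rep(x, times = 4),
  acetone   = c(trend, flat, flat, trend)
)

# Hypothetical helper: overall F-test p-value of a simple linear fit in one group
group_p <- function(d) {
  f <- summary(lm(acetone ~ carbon, data = d))$fstatistic
  unname(pf(f[1], f[2], f[3], lower.tail = FALSE))
}

# One p-value per days x soil_type combination, named like "0.organic"
groups  <- split(toy, list(toy$days, toy$soil_type), drop = TRUE)
p_vals  <- sapply(groups, group_p)
passing <- names(p_vals)[p_vals < 0.2]

print(p_vals)
print(passing)
```

Here the result is a character vector of group labels rather than the two-column names matrix used in the loop version, so it would need adapting before being plugged into the code that follows.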

Now, create a subset of the df, and pass it as the data argument to both geom_smooth and stat_poly_eq:

df2 = df[df$days%in%names[,1] & df$soil_type%in%names[,2],]

ggplot(df, aes(carbon, 
                   acetone, 
                   fill=soil_type)) +
  geom_smooth(method = "lm",
              formula = formula, 
              color = "black",
              data = df2) +
  geom_point(aes(shape=soil_type, 
                 color=soil_type, 
                 size=soil_type)) +
  scale_fill_manual(values=c("green3", "brown")) + 
  scale_color_manual(values=c("black", "black")) + 
  scale_shape_manual(values=c(21, 24))+
  scale_size_manual(values=c(2.7, 2.0))+
  labs(shape="soil_type", 
       color="soil_type") +
  theme_bw() +
  facet_wrap(~days, 
             ncol = 2)+
  stat_poly_eq(
    data = df2,
    aes(label = paste(stat(adj.rr.label),
                      stat(p.value.label), 
                      sep = "*\", \"*")),
    formula = formula, 
    rr.digits = 2, 
    p.digits = 1, 
    parse = TRUE,size=3.5)

Output:

[image]

EDIT 1

The part where I subset the df was wrong (it only worked because there was a group that didn't pass at all). The problem is that df$days%in%names[,1] & df$soil_type%in%names[,2] doesn't check pair by pair. So we actually need to do a loop:

#Create subset of df with groups that passed
new_df2 = numeric()
for(i in 1:nrow(names)){
  new_df2 = rbind(new_df2,
                  new_df[new_df$daysincubated4%in%names[i,1] & new_df$soil_type%in%names[i,2],])
}

Note: this rearranges the rows of the data frame, but I don't think this will be a problem for you.
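
To see why the column-wise %in% test is not pair-wise, and one possible loop-free way around it, here is a minimal sketch. The toy data frame and the two passing pairs are hypothetical. Matching on a pasted key is just one option, and it keeps the original row order.

```r
# Toy data (hypothetical): one row per day x soil combination
toy <- data.frame(
  days      = c(0L, 0L, 10L, 10L),
  soil_type = c("organic", "mineral", "organic", "mineral"),
  value     = c(1, 2, 3, 4)
)

# Suppose only these two (day, soil) pairs passed the p-value check
names <- rbind(c("0", "organic"), c("10", "mineral"))

# Column-wise %in% keeps every row, because each day and each soil type
# appears somewhere in the list of passing pairs
wrong <- toy[toy$days %in% names[, 1] & toy$soil_type %in% names[, 2], ]

# Matching on a combined key treats each (day, soil) pair as a unit
key_all  <- paste(toy$days, toy$soil_type)
key_pass <- paste(names[, 1], names[, 2])
right    <- toy[key_all %in% key_pass, ]

print(nrow(wrong))
print(right)
```

In this toy case the column-wise version keeps all four rows, while the keyed version keeps only the two passing groups.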

Output:

[image]

Note: the p-values displayed as 0.2 are actually rounded up from values below 0.2, so those groups do pass.
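
A tiny illustration of that rounding, assuming a hypothetical p-value of 0.17 and a label rounded to one decimal place (as p.digits = 1 suggests):

```r
p <- 0.17                      # hypothetical p-value, not taken from the data
passes <- p < 0.2              # the filter compares the unrounded value
shown  <- format(round(p, 1))  # what a one-decimal label would display

print(passes)
print(shown)
```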
