
Is there a way to correct for skewness caused by the control group in R?

I am working with a research data set in which we exposed groups of small fish to a stressor and then sampled them at different timepoints to see how their cortisol (stress hormone) levels changed over time. We sampled at time 0 as our control (before the stressor was introduced), and again at 15, 30, and 60 minutes after stressor exposure. Fish were grouped so that an entire group (subtank) was sampled at once to reduce confounding stressors on the fish, but control samples were taken from every group prior to exposure to establish a basal cortisol level.

The problem I am having is that our control groups (at time 0) have significantly lower cortisol values than all three of our treatment groups, which is skewing our data to the right. I have tried log, sqrt, reciprocal, and cube root transformations in R on the data with the controls included, and I have never been able to achieve normality at alpha = 0.05, although I have gotten close. Without the control data, our treatment data are normally distributed, so I feel like there should be a way to address this statistically without removing the controls.

Does anyone know of a parametric way to address this in a statistically sound manner in R? The end goal is to run an ANOVA, so if parametric methods won't work, any near-equivalent non-parametric recommendations would be appreciated!

Non-Parametric ANOVA approach:

If you want a non-parametric approach, you could run a Kruskal-Wallis test, the rank-based counterpart of a one-way ANOVA.

# Load libraries:
library(tidyverse)
library(rstatix)

# Run Kruskal on PlantGrowth dataset in R:
res.kruskal <- PlantGrowth %>%
  kruskal_test(weight ~ group)
res.kruskal

Printing res.kruskal gives this result:

  .y.        n statistic    df      p method        
* <chr>  <int>     <dbl> <int>  <dbl> <chr>         
1 weight    30      7.99     2 0.0184 Kruskal-Wallis
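A significant Kruskal-Wallis result only says that at least one group differs, so you would normally follow it with pairwise comparisons. Here is a minimal sketch of that follow-up using base R only, again on PlantGrowth rather than your cortisol data. The choice of pairwise Wilcoxon rank-sum tests and of the Holm correction is my assumption, not something your design dictates; other post hoc tests and adjustments are equally defensible.

```r
# Omnibus Kruskal-Wallis test with base R (stats package, loaded by default).
kw <- kruskal.test(weight ~ group, data = PlantGrowth)
kw

# Pairwise Wilcoxon rank-sum tests between all groups.
# p.adjust.method = "holm" corrects for multiple comparisons (an assumed choice).
# exact = FALSE requests the normal approximation, since ties rule out exact p-values.
pw <- pairwise.wilcox.test(PlantGrowth$weight, PlantGrowth$group,
                           p.adjust.method = "holm", exact = FALSE)

# Matrix of adjusted p-values, one cell per pair of groups.
pw$p.value
```

For your data you would swap in your own response and grouping columns, for example cortisol by timepoint.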

Transformation Approach:

There are other ways to correct skew in R if you want to try them, though I'm not familiar with your data, so I can't say whether they would work for it:

# Right skewed data:
x <- c(1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,3,4,5,6,7,8,9)

# Visualize data:
hist(x)


Moderate transformation:

# Right skew moderate transformations:
sqrt(x)

# Visualize mod transform:
hist(sqrt(x))


Larger transformation:

# Right skew greater transformation:
log10(x)

# Visualize greater transform:
hist(log10(x))


Extreme transformation:

# Right skew extreme transformation:
1/x

# Visualize extreme transform:
hist(1/x)
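Histograms are hard to compare by eye, so here is a rough sketch that puts a number on each transformation of the same example vector. The skewness() helper is something I wrote for illustration (a simple moment-based estimate, not a function from any package), and using shapiro.test() as the normality check is an assumption about what you are running. I would expect the skewness to shrink from raw to sqrt to log10, and the reciprocal to flip the direction of the skew, because 1/x reverses the ordering of the values.

```r
# Same right-skewed example vector as above.
x <- c(1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,3,4,5,6,7,8,9)

# Hypothetical helper: moment-based sample skewness.
# Positive values indicate a right tail, negative values a left tail.
skewness <- function(v) {
  d <- v - mean(v)
  mean(d^3) / mean(d^2)^1.5
}

# Candidate transformations, weakest to strongest.
candidates <- list(raw = x, sqrt = sqrt(x), log10 = log10(x), reciprocal = 1 / x)

# Skewness and Shapiro-Wilk p-value for each candidate.
skew <- sapply(candidates, skewness)
shapiro_p <- sapply(candidates, function(v) shapiro.test(v)$p.value)

round(cbind(skewness = skew, shapiro_p = shapiro_p), 3)
```

Keep in mind that log10() and 1/x need strictly positive values, and that after a reciprocal transformation larger transformed values correspond to smaller original values.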

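One more thing that may be worth checking before you settle on a transformation: as far as I know, the normality assumption of a one-way ANOVA applies to the model residuals (the values after each group's mean is subtracted), not to the pooled raw response. A control group with a much lower mean will make the pooled data look skewed or bimodal without necessarily violating that assumption. Below is a minimal sketch of the check on PlantGrowth, since I don't have your data; the Bartlett test and the Welch ANOVA fallback are my own suggestions, not requirements.

```r
# Fit the one-way ANOVA on the untransformed response.
fit <- aov(weight ~ group, data = PlantGrowth)
summary(fit)

# Test normality on the residuals rather than on the pooled raw values.
res <- residuals(fit)
sw <- shapiro.test(res)
sw

# Check homogeneity of variance across groups (assumed choice of test).
bt <- bartlett.test(weight ~ group, data = PlantGrowth)
bt

# If variances look unequal, a Welch one-way ANOVA is a parametric fallback.
welch <- oneway.test(weight ~ group, data = PlantGrowth, var.equal = FALSE)
welch
```

If the residuals from your own model (cortisol by timepoint, possibly on a log scale) look reasonable, a standard ANOVA may be fine with the controls left in.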
