Is there a way to correct for skewness caused by the control group in R?

I am working with a research data set where we exposed groups of small fish to a stressor and then sampled them at different timepoints to get an idea of how their cortisol (stress hormone) levels changed with time. We sampled at time 0 for our control (before the stressor was introduced), and also at 15, 30, and 60 minutes post-stressor exposure. Fish were grouped so that an entire group (subtank) was sampled at once to reduce confounding stressors on fish, but control samples were taken from every group prior to exposure to develop a basal cortisol level.

The problem I am having is that our control groups (at time 0) have significantly lower cortisol values than all three of our treatment groups, which is skewing our data to the right. I have tried log, square-root, reciprocal, and cube-root transformations in R on the data with the controls included, and although I have gotten close, the transformed data still fail a normality test at alpha = 0.05. Without the control data, our treatment data are normally distributed, so I feel like there should be a way to address this statistically without removing the controls?

Does anyone know of any parametric ways to address this in a statistically sound manner in R? The end goal is to run an ANOVA, so if parametric methods won't work, any near-equivalent non-parametric recommendations would be appreciated!

Non-Parametric ANOVA approach:

You could just run a Kruskal-Wallis test (the rank-based counterpart of a one-way ANOVA) if you want a non-parametric approach.

# Load libraries:
library(tidyverse)
library(rstatix)

# Run the Kruskal-Wallis test on R's built-in PlantGrowth dataset:
res.kruskal <- PlantGrowth %>%
  kruskal_test(weight ~ group)
res.kruskal

Printing res.kruskal gives this result:

  .y.        n statistic    df      p method        
* <chr>  <int>     <dbl> <int>  <dbl> <chr>         
1 weight    30      7.99     2 0.0184 Kruskal-Wallis
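
If you would rather not load tidyverse and rstatix, base R's kruskal.test() should give the same statistic, and pairwise.wilcox.test() is one possible follow-up to see which groups differ. This is only a sketch on the same PlantGrowth data; the Holm adjustment and exact = FALSE (used here because the data contain ties) are my own choices, not something your design requires:

```r
# Kruskal-Wallis test with base R (stats package, loaded by default)
kt <- kruskal.test(weight ~ group, data = PlantGrowth)
print(kt)

# One possible post-hoc: pairwise Wilcoxon rank-sum tests with Holm correction.
# exact = FALSE requests the normal approximation, since exact p-values
# cannot be computed when there are ties.
pw <- pairwise.wilcox.test(PlantGrowth$weight, PlantGrowth$group,
                           p.adjust.method = "holm", exact = FALSE)
print(pw)
```

With your data you would swap in your cortisol column and timepoint factor for weight and group.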

Transformation Approach:

There are other ways to reduce right skew in R if you want to try them, though I'm not familiar with your data, so I can't say whether these would work for it:

# Right skewed data:
x <- c(1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,3,4,5,6,7,8,9)

# Visualize data:
hist(x)

Moderate transformation:

# Right skew moderate transformations:
sqrt(x)

# Visualize mod transform:
hist(sqrt(x))

Larger transformation:

# Right skew greater transformation:
log10(x)

# Visualize greater transform:
hist(log10(x))

Extreme transformation:

# Right skew extreme transformation (note that 1/x reverses the order of the values):
1/x

# Visualize extreme transform:
hist(1/x)
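
To compare these transformations with numbers rather than by eye, something like the following could work. It is a rough sketch: skew() is a small helper I define here (not a base R function), and the last part uses the built-in PlantGrowth data as a stand-in for yours. Since the end goal is an ANOVA, it may also be worth checking the residuals of the fitted model rather than the pooled values, because groups with different means can make the pooled data look non-normal even when each group is fine:

```r
x <- c(1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,3,4,5,6,7,8,9)

# Hypothetical helper: moment-based skewness (0 for symmetric data)
skew <- function(v) {
  d <- v - mean(v)
  mean(d^3) / mean(d^2)^1.5
}

transformed <- list(raw = x, sqrt = sqrt(x), log10 = log10(x), reciprocal = 1 / x)

# Skewness and Shapiro-Wilk p-value for each version of the data.
# 1/x reverses the order of the values, so the sign of its skewness
# is not directly comparable to the others.
summary_tab <- data.frame(
  skewness  = sapply(transformed, skew),
  shapiro_p = sapply(transformed, function(v) shapiro.test(v)$p.value)
)
print(summary_tab)

# Stand-in example: check normality of the ANOVA residuals,
# not of the pooled response
fit <- aov(weight ~ group, data = PlantGrowth)
res_check <- shapiro.test(residuals(fit))
print(res_check)
```

If the residuals look acceptable on the raw or transformed scale, the usual ANOVA may be fine even though the pooled data fail a normality test.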
