
Nonparametric test to compare rows in different dataframes in R

This is my first post here.

I have 4 dataframes and would like to run a nonparametric test on each row in turn.


E.g. I would like to compare the values in each row of dataframe A with the values in the corresponding row of dataframe B.

I need a nonparametric test, e.g. Wilcoxon or something similar.

I thought of adding a new column with the median, but I am sure there is a better way.

Could you give me an idea of how to do this?

Thank you in advance!

Edit: Here are my example dataframes.

I want to compare the dataframes row-wise, e.g. run a nonparametric test for John between dataframes A and B, then for Dora, etc.

A <- data.frame("A" = c("John","Dora","Robert","Jim"), 
                "A1" = c(8,1,10,5), 
                "A2"= c(9,1,1,4))
B <- data.frame("B" = c("John","Dora","Robert","Jim"), 
                "B1" = c(1,1,1,5), 
                "B2"= c(3,2,1,5), 
                "B3"=c(4,3,1,5), 
                "B4"=c(6,8,8,1))

I think you are looking for the function wilcox.test (in the stats package).
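As a minimal sketch of what a single call looks like, the snippet below runs wilcox.test on John's values only. The vector names john_a and john_b are made up for illustration; the numbers are copied from the example dataframes in the question. Because the two samples have different sizes, this is the unpaired (rank-sum) form of the test, which is the default.

```r
# Values for John, copied from dataframes A and B in the question
john_a <- c(8, 9)
john_b <- c(1, 3, 4, 6)

# Unpaired, two-sided Wilcoxon rank-sum test (the default settings)
res <- wilcox.test(john_a, john_b)

# The result is a list; the p value is stored in the element "p.value"
res$p.value
```

The returned object also holds the test statistic (res$statistic), in case you need more than the p value.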

Solution 1: Using a for loop

One way to compare each row of A with the corresponding row of B (and extract the p value) is to write a for loop such as this:

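pval <- NULL
for (i in seq_len(nrow(A))) {
    # numeric values of row i in A (the first column holds the name)
    vec_a <- as.numeric(A[i, 2:ncol(A)])
    # numeric values of the row in B that has the same name
    vec_b <- as.numeric(B[B$B == A$A[i], 2:ncol(B)])

    p <- wilcox.test(vec_a, vec_b)
    pval <- c(pval, p$p.value)  # append the p value for this row
    print(p)
}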

At the end, you get a vector pval containing the p value for each row:

pval
[1] 0.1333333 0.2188194 0.5838824 1.0000000
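The same computation can also be written without growing pval inside a loop. The sketch below is one possible variant, not a replacement for the loop above: row_pvalue is a hypothetical helper name, match is used to look up the row of B by name, and suppressWarnings only hides the warning R emits when ties prevent an exact p value (it then falls back to a normal approximation).

```r
# Example dataframes from the question
A <- data.frame(A = c("John", "Dora", "Robert", "Jim"),
                A1 = c(8, 1, 10, 5),
                A2 = c(9, 1, 1, 4))
B <- data.frame(B = c("John", "Dora", "Robert", "Jim"),
                B1 = c(1, 1, 1, 5),
                B2 = c(3, 2, 1, 5),
                B3 = c(4, 3, 1, 5),
                B4 = c(6, 8, 8, 1))

# Hypothetical helper: p value for row i of A against the row of B
# that carries the same name
row_pvalue <- function(i) {
  vec_a <- as.numeric(A[i, -1])                   # drop the name column
  vec_b <- as.numeric(B[match(A$A[i], B$B), -1])  # look up the row by name
  # Rows with ties trigger a warning about exact p values; silence it here
  suppressWarnings(wilcox.test(vec_a, vec_b)$p.value)
}

# One p value per row of A, returned as a named numeric vector
pval <- vapply(seq_len(nrow(A)), row_pvalue, numeric(1))
names(pval) <- A$A
pval
```

Naming the vector makes it easier to see which p value belongs to which person.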

Solution 2: Using tidyverse

A more elegant solution is to use the tidyverse packages (in particular dplyr and tidyr) to combine your dataframes into a single one, and then compare the two groups for each name by passing a formula to wilcox.test.

First, we can merge your dataframes by name using the left_join function from dplyr:

library(dplyr)
DF <- left_join(A,B, by = c("A"="B"))

       A A1 A2 B1 B2 B3 B4
1   John  8  9  1  3  4  6
2   Dora  1  1  1  2  3  8
3 Robert 10  1  1  1  1  8
4    Jim  5  4  5  5  5  1
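If you would rather not load dplyr for this step, base R's merge can do a similar join. This is only a sketch of an alternative, not what the rest of this answer uses, and DF_base is a name made up here.

```r
# Example dataframes from the question
A <- data.frame(A = c("John", "Dora", "Robert", "Jim"),
                A1 = c(8, 1, 10, 5),
                A2 = c(9, 1, 1, 4))
B <- data.frame(B = c("John", "Dora", "Robert", "Jim"),
                B1 = c(1, 1, 1, 5),
                B2 = c(3, 2, 1, 5),
                B3 = c(4, 3, 1, 5),
                B4 = c(6, 8, 8, 1))

# Join on the name: column "A" of dataframe A against column "B" of dataframe B
DF_base <- merge(A, B, by.x = "A", by.y = "B")
DF_base
```

Note that merge sorts the rows by the key column by default, so the names come out in alphabetical order (Dora, Jim, John, Robert) rather than in the original row order.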

Then, using the dplyr and tidyr packages, you can reshape the dataframe into a longer format:

library(dplyr)
library(tidyr)
DF %>% pivot_longer(., -A, names_to = "var", values_to = "values") 

# A tibble: 24 x 3
   A     var   values
   <fct> <chr>  <dbl>
 1 John  A1         8
 2 John  A2         9
 3 John  B1         1
 4 John  B2         3
 5 John  B3         4
 6 John  B4         6
 7 Dora  A1         1
 8 Dora  A2         1
 9 Dora  B1         1
10 Dora  B2         2
# … with 14 more rows

We then create a new column "group" that indicates A or B depending on the values in the column var:

DF %>% pivot_longer(., -A, names_to = "var", values_to = "values") %>%
  mutate(group = gsub("\\d","",var))

# A tibble: 24 x 4
   A     var   values group
   <fct> <chr>  <dbl> <chr>
 1 John  A1         8 A    
 2 John  A2         9 A    
 3 John  B1         1 B    
 4 John  B2         3 B    
 5 John  B3         4 B    
 6 John  B4         6 B    
 7 Dora  A1         1 A    
 8 Dora  A2         1 A    
 9 Dora  B1         1 B    
10 Dora  B2         2 B    
# … with 14 more rows
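To see what the gsub call does on its own, here is a tiny standalone illustration. The vector var below simply mimics the column names from the example.

```r
# Column names as they appear in the "var" column after pivot_longer
var <- c("A1", "A2", "B1", "B2", "B3", "B4")

# "\\d" matches any single digit; replacing each digit with ""
# leaves only the letter prefix, i.e. the dataframe of origin
group <- gsub("\\d", "", var)
group
# [1] "A" "A" "B" "B" "B" "B"
```

This relies on the column names being a letter prefix followed by digits; names with a different pattern would need a different regular expression.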

Finally, we can group by A and summarise the dataframe to get the p value returned by wilcox.test when comparing the values of the two groups for each name:

DF %>% pivot_longer(., -A, names_to = "var", values_to = "values") %>%
  mutate(group = gsub("\\d","",var)) %>%
  group_by(A) %>%
  summarise(Pval = wilcox.test(values~group)$p.value)

# A tibble: 4 x 2
  A       Pval
  <fct>  <dbl>
1 Dora   0.219
2 Jim    1    
3 John   0.133
4 Robert 0.584

It looks longer (mainly because I explain each step), but in the end the final pipeline needs fewer lines than the first solution.

Does this answer your question?
