Running chi-square tests across all columns of a data_frame using dplyr

Question

There are several similar questions that grab chi-square results, but none that solves my problem. I'd like to calculate p.values from chi-square tests for all columns in a data_frame and store them in a column within the original data_frame. There will be duplicate values, which I'm fine with. Ultimately, I'd like to select all columns in a data_frame that have a p.value lower than x when tested against my variable of choice.

require(dplyr)

my_df <- data_frame(
  one_f = sample(LETTERS[1:5],100,T),
  two_f = sample(LETTERS[4:5],100,T),
  three_f = sample(LETTERS[5],100,T)
)
my_df %>% 
  head()

my_df %>% 
  summarise_all(funs(chisq.test(.,my_df$two_f)$p.value))

This gives me the following error:

Error in summarise_impl(.data, dots) : 
  Evaluation error: 'x' and 'y' must have at least 2 levels.


my_df %>% 
  mutate_if(n_distinct>1,fun(chisq.test(.,my_df$two_f)$p.value))

That gives me this error:

Error in n_distinct > 1 : 
  comparison (6) is possible only for atomic and list types

I'm looking for something like this:

my_df %>% 
      mutate(p.value = sample(c(0.043,0.87,0.00),nrow(.),T)) %>% 
      head()

Then I plan to use gather and filter, then spread, to get the variables that are significantly associated according to my chi-square test.

I suppose

my_df %>% filter(foo,bar >= 0.05)#function that finds p.values and filters by 
# alpha level

would be my ultimate goal.

Answer 1

require(dplyr)
require(tidyr)

my_df <- data_frame(
  one_f = sample(LETTERS[1:5],100,T),
  two_f = sample(LETTERS[4:5],100,T),
  three_f = sample(LETTERS[5],100,T)
)

# select the names of all columns that have more than one distinct value
my_df %>% 
  summarise_all(function(x) length(unique(x))) %>%
  gather() %>%
  filter(value > 1) %>%
  pull(key) -> list_cols

# apply function only to those columns
my_df %>% 
  select(list_cols) %>%
  summarise_all(funs(chisq.test(.,my_df$two_f)$p.value))

# # A tibble: 1 x 2
#     one_f                      two_f
#     <dbl>                      <dbl>
#   1 0.880 0.000000000000000000000120
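
In more recent dplyr releases (1.0.0 or later is assumed here), `funs()` and `summarise_all()` are superseded, and the two steps above can be collapsed into a single call with `across()` and `where()`. The following is a minimal sketch rather than the original answer's code. `set.seed()` is added only for reproducibility, and `suppressWarnings()` hides the small-expected-count warning that `chisq.test()` may emit.

```r
library(dplyr)

set.seed(42)  # assumption: added only so the sample is reproducible

my_df <- tibble(
  one_f   = sample(LETTERS[1:5], 100, TRUE),
  two_f   = sample(LETTERS[4:5], 100, TRUE),
  three_f = sample(LETTERS[5],   100, TRUE)
)

# One p-value per column that has more than one distinct value;
# constant columns such as three_f are skipped by the where() predicate.
p_values <- my_df %>%
  summarise(across(
    where(~ n_distinct(.x) > 1),
    ~ suppressWarnings(chisq.test(.x, my_df$two_f)$p.value)
  ))

p_values
```

The first error in the question comes from `three_f`, which has a single level, so `chisq.test()` refuses to run on it. The second comes from `n_distinct > 1`, which compares the function itself to a number instead of calling it on each column; the lambda inside `where()` is what performs that per-column call.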

Answer 1: accepted, 2018-01-24 18:40:56
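
For the final goal stated in the question (keeping only the columns whose chi-square p-value against a chosen variable falls below a threshold), one possible sketch follows. The helper name `select_associated()`, its arguments, the default `alpha = 0.05`, and the demo data are all hypothetical choices for illustration. dplyr 1.0.0 or later is assumed, and the chosen variable is assumed to have at least two levels.

```r
library(dplyr)

# Hypothetical helper: keep the columns of `df` whose chi-square test against
# the column named `target` gives a p-value below `alpha`.
select_associated <- function(df, target, alpha = 0.05) {
  y <- df[[target]]
  keep <- function(x) {
    # Constant columns cannot be tested, so they are dropped up front.
    if (n_distinct(x) < 2) return(FALSE)
    p <- suppressWarnings(chisq.test(x, y)$p.value)
    isTRUE(p < alpha)  # an NA or NaN p-value counts as "not significant"
  }
  df %>% select(where(keep))
}

set.seed(1)  # assumption: fixed seed for a reproducible illustration
n <- 200
two_f <- sample(c("D", "E"), n, TRUE)
demo_df <- tibble(
  two_f   = two_f,
  copy_f  = ifelse(two_f == "D", "X", "Y"),  # fully determined by two_f
  noise_f = sample(LETTERS[1:5], n, TRUE),   # drawn independently of two_f
  const_f = rep("E", n)                      # single level, never tested
)

selected <- names(select_associated(demo_df, "two_f"))
selected
```

The target column itself is always kept, because it is perfectly associated with itself. Running many tests at a fixed `alpha` will also let some unrelated columns through by chance, so a multiple-testing correction such as `p.adjust()` may be worth considering.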