t-test in R for different categories

I have a dataset with 26 variables and 4662 observations collected over one year. I want to analyse differences that may occur after a specific date. The variable time is 0 if an observation falls before that date and 1 if it falls after. Another variable, category, records the type of each observation.

I would like to test whether, within each category, there is a significant difference between the periods before and after that date. The values I want to compare are stored in another variable, number_trackers. c4 is a placeholder for all the other, unimportant variables that I won't need for this t.test.

Reproducible data frame:

Dataset <- data.frame(
  category = c("tools", "finance", "business", "education", "tools", "education"),
  number_trackers = c(10, 12, 1, 30, 7, 21),
  c4 = c("url1.com", "ur2.com", "url3.com", "url4.com", "url5.com", "url6.com"),
  time = c(1, 0, 0, 0, 1, 1)
)

Ideally, the output would be one t-test per category, comparing the two time periods.
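Before running one test per category, it may help to check how many observations fall into each category and time period, since a two-sample Welch t-test needs at least two observations in each group. The sketch below uses the sample data from the question; the names `counts` and `testable` are introduced here for illustration only.

```r
# Sample data from the question
Dataset <- data.frame(
  category = c("tools", "finance", "business", "education", "tools", "education"),
  number_trackers = c(10, 12, 1, 30, 7, 21),
  c4 = c("url1.com", "ur2.com", "url3.com", "url4.com", "url5.com", "url6.com"),
  time = c(1, 0, 0, 0, 1, 1)
)

# Observations per category (rows) and time period (columns)
counts <- table(Dataset$category, Dataset$time)
print(counts)

# Keep only categories with at least two observations in both periods
testable <- rownames(counts)[apply(counts >= 2, 1, all)]
print(testable)
```

On the six sample rows no category meets this condition, so a larger extract of the real data would be needed to run the tests.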

A loop over the categories might help:

# Take the list of unique categories
categories <- unique(Dataset$category)

# Create an empty list to hold the results
output_list <- list()

# Run the t-test for each category and store the result under the category name.
# The unpaired (Welch) test is t.test()'s default, so `paired` is not needed;
# recent R versions reject `paired` in the formula interface.
for (i in categories) {
  output_list[[i]] <- t.test(number_trackers ~ time,
                             data = Dataset[Dataset$category == i, ])
}
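The loop stops with an error as soon as one category has only one time period or too few observations. A minimal sketch of one way around that, assuming made-up data and a hypothetical helper `safe_t_test` (neither comes from the original post), wraps each test in `tryCatch` and uses `split` with `lapply` in place of the explicit loop:

```r
# Made-up data for illustration only (not the asker's data)
Dataset <- data.frame(
  category = c(rep("tools", 6), rep("finance", 6), rep("education", 2)),
  time = c(0, 0, 0, 1, 1, 1,  0, 0, 0, 1, 1, 1,  0, 0),
  number_trackers = c(10, 12, 9, 7, 6, 8,  12, 14, 11, 13, 12, 15,  30, 21)
)

# Hypothetical helper: run the test for one category's rows and
# return NULL instead of stopping when t.test() raises an error
safe_t_test <- function(d) {
  tryCatch(
    t.test(number_trackers ~ time, data = d),
    error = function(e) NULL
  )
}

# split() yields one data frame per category; lapply() tests each one
output_list <- lapply(split(Dataset, Dataset$category), safe_t_test)

# Categories that could not be tested ("education" has no time == 1 rows)
skipped <- names(output_list)[vapply(output_list, is.null, logical(1))]
print(skipped)
```

Returning `NULL` keeps the skipped categories visible in the result list, which may be preferable to dropping them silently.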

To see the test result for the first category:

output_list[[categories[1]]]

Edit:

To generate a summary table of the output:

sum_tab <- as.data.frame(matrix(nrow = length(categories), ncol = 7))
colnames(sum_tab) <- c("t", "df", "p.value", "ConfIntLower", 
                       "ConfIntUpper", "Mean in Gr 0", "Mean in Gr 1")
rownames(sum_tab) <- categories

for (i in categories) {
  sum_tab[i, ] <- with(output_list[[i]], 
                       c(statistic, parameter, p.value, conf.int, estimate))
}


write.csv(sum_tab, "Summary.csv", row.names = TRUE)

PS: Since the reproducible example does not contain enough observations, I couldn't run this to show the output.
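To try the approach end to end, one option is a made-up dataset with several observations per category and period. The numbers below are invented purely for illustration and are not the asker's data; the test call relies on `t.test()`'s default unpaired Welch test.

```r
# Made-up data for illustration only: 2 categories x 2 periods x 3 observations
Dataset <- data.frame(
  category = rep(c("tools", "finance"), each = 6),
  time = rep(c(0, 0, 0, 1, 1, 1), times = 2),
  number_trackers = c(10, 12, 9, 7, 6, 8, 12, 14, 11, 13, 12, 15)
)

categories <- unique(Dataset$category)

# One Welch t-test per category, stored under the category name
output_list <- list()
for (i in categories) {
  output_list[[i]] <- t.test(number_trackers ~ time,
                             data = Dataset[Dataset$category == i, ])
}

# Summary table: one row per category, one column per reported quantity
sum_tab <- as.data.frame(matrix(nrow = length(categories), ncol = 7))
colnames(sum_tab) <- c("t", "df", "p.value", "ConfIntLower",
                       "ConfIntUpper", "Mean in Gr 0", "Mean in Gr 1")
rownames(sum_tab) <- categories

for (i in categories) {
  sum_tab[i, ] <- with(output_list[[i]],
                       c(statistic, parameter, p.value, conf.int, estimate))
}

print(sum_tab)
```

With so few invented observations the p-values carry no meaning; the point is only to show the shape of the resulting table.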
