
How to correlate multiple subsets in R

How do I correlate 8 subsets separately against two different dependent variables? I keep getting the same correlation coefficient for the two different subsets (example below). Here is the input:

with(subset(mydata2, PARTYID_Strength = 1), cor.test(PARTYID_Strength,
                                                     mean.legit))

with(subset(mydata2, PARTYID_Strength = 1), cor.test(PARTYID_Strength,
                                                     mean.leegauthor))

with(subset(mydata2, PARTYID_Strength = 2), cor.test(PARTYID_Strength,
                                                     mean.legit))

with(subset(mydata2, PARTYID_Strength = 2), cor.test(PARTYID_Strength,
                                                     mean.leegauthor))

Output (I get this for both PARTYID_Strength = 1 and 2):

Pearson's product-moment correlation

data:  PARTYID_Strength and mean.legit
t = 3.1005, df = 607, p-value = 0.002022
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.0458644 0.2023031
sample estimates:
      cor 
0.1248597 

Pearson's product-moment correlation

data:  PARTYID_Strength and mean.leegauthor
t = 2.8474, df = 607, p-value = 0.004557
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.03568431 0.19250344
sample estimates:
      cor 
0.1148091 

Sample data:

> dput(head(mydata2, 10))
structure(list(PARTYID = c(1, 3, 1, 1, 1, 4, 3, 1, 1, 1), PARTYID_Other = c("NA", 
"NA", "NA", "NA", "NA", "Green", "NA", "NA", "NA", "NA"), PARTYID_Strength = c(1, 
7, 1, 2, 1, 8, 1, 6, 1, 1), PARTYID_Strength_Other = c("NA", 
"NA", "NA", "NA", "NA", "Green", "NA", "NA", "NA", "NA"), THERM_Dem = c(80, 
65, 85, 30, 76, 15, 55, 62, 90, 95), THERM_Rep = c(1, 45, 10, 
5, 14, 14, 0, 4, 10, 3), Gender = c("Female", "Male", "Male", 
"Female", "Female", "Male", "Male", "Female", "Female", "Male"
), `MEAN Age` = c(29.5, 49.5, 29.5, 39.5, 29.5, 21, 39.5, 39.5, 
29.5, 65), Age = c("25 - 34", "45 - 54", "25 - 34", "35 - 44", 
"25 - 34", "18 - 24", "35 - 44", "35 - 44", "25 - 34", "65+"), 
Ethnicity = c("White or Caucasian", "Asian or Asian American", 
"White or Caucasian", "White or Caucasian", "Hispanic or Latino", 
"White or Caucasian", "White or Caucasian", "White or Caucasian", 
"White or Caucasian", "White or Caucasian"), Ethnicity_Other = c("NA", 
"NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA"), States = c("Texas", 
"Texas", "Ohio", "Texas", "Puerto Rico", "New Hampshire", 
"South Carolina", "Texas", "Texas", "Texas"), Education = c("Master's degree", 
"Bachelor's degree in college (4-year)", "Bachelor's degree in college (4-year)", 
"Master's degree", "Master's degree", "Less than high school degree", 
"Some college but no degree", "Master's degree", "Master's degree", 
"Some college but no degree"), `MEAN Income` = c(30000, 140000, 
150000, 60000, 80000, 30000, 30000, 120000, 150000, 60000
), Income = c("Less than $30,000", "$130,001 to $150,000", 
"More than $150,000", "$50,001 to $70,000", "$70,001 to $90,000", 
"Less than $30,000", "Less than $30,000", "$110,001 to $130,000", 
"More than $150,000", "$50,001 to $70,000"), mean.partystrength = c(3.875, 
2.875, 2.375, 3.5, 2.625, 3.125, 3.375, 3.125, 3.25, 3.625
), mean.traitrep = c(2.5, 2.625, 2.25, 2.625, 2.75, 1.875, 
2.75, 2.875, 2.75, 3), mean.traitdem = c(2.25, 2.625, 2.375, 
2.75, 2.625, 2.125, 1.875, 3, 2, 2.5), mean.leegauthor = c(1, 
2, 2, 4, 1, 4, 1, 1, 1, 1), mean.legit = c(1.71428571428571, 
3.28571428571429, 2.42857142857143, 2.42857142857143, 2.14285714285714, 
1.28571428571429, 1.42857142857143, 1.14285714285714, 2.14285714285714, 
1.28571428571429)), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame"))

Thank you!
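
A note on why the outputs in the question match: in subset(mydata2, PARTYID_Strength = 1), the single = passes a named argument instead of testing a condition, so subset() most likely returns every row. That would be consistent with df = 607 appearing in both outputs. The sketch below illustrates the difference using only the first 10 rows from the dput() output, so the row counts apply to that sample and not to the full data. Depending on the R version, subset() may also warn that the extra argument is disregarded, which is why the call is wrapped in suppressWarnings() here.

```r
# First 10 rows of two relevant columns, copied from the dput() output in the question
mydata2 <- data.frame(
  PARTYID_Strength = c(1, 7, 1, 2, 1, 8, 1, 6, 1, 1),
  mean.leegauthor  = c(1, 2, 2, 4, 1, 4, 1, 1, 1, 1)
)

# `=` names an argument; subset() has no such parameter, so nothing is filtered
all_rows <- suppressWarnings(subset(mydata2, PARTYID_Strength = 1))
nrow(all_rows)    # 10: every row is still there

# `==` is the comparison operator, so only matching rows are kept
only_ones <- subset(mydata2, PARTYID_Strength == 1)
nrow(only_ones)   # 6

# Inside this subset PARTYID_Strength takes a single value,
# so it has no variance left to correlate with anything
length(unique(only_ones$PARTYID_Strength))   # 1
```

The last line points at a second issue: once the data are filtered to one value of PARTYID_Strength, that column is constant within the subset, so a correlation involving it is not defined. If the 8 subsets are meant to be the levels of PARTYID_Strength, the variables being correlated inside each subset probably need to be two other columns.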

To run the tests, create a vector of the columns of interest, then use sapply to apply an anonymous function to each of them.

fixed <- "PARTYID_Strength"
cols <- c("mean.leegauthor", "mean.legit")

cor_test_result <- sapply(cols, function(x){
  fmla <- paste(fixed, x, sep = "+")
  fmla <- as.formula(paste("~", fmla))
  cor.test(fmla, mydata2)
}, simplify = FALSE)

cor_test_result$mean.leegauthor
#
#        Pearson's product-moment correlation
#
#data:  PARTYID_Strength and mean.leegauthor
#t = 1.4804, df = 8, p-value = 0.177
#alternative hypothesis: true correlation is not equal to 0
#95 percent confidence interval:
# -0.2343269  0.8462610
#sample estimates:
#      cor 
#0.4637152 
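
Because simplify = FALSE keeps the result as a named list of htest objects, the individual pieces can be collected into one table for easier comparison. The following is a minimal sketch, not part of the original answer: it rebuilds the three relevant columns from the first 10 rows of the dput() output, and summary_table is a name introduced here for illustration.

```r
# First 10 rows of the three relevant columns, copied from the dput() output in the question
mydata2 <- data.frame(
  PARTYID_Strength = c(1, 7, 1, 2, 1, 8, 1, 6, 1, 1),
  mean.leegauthor  = c(1, 2, 2, 4, 1, 4, 1, 1, 1, 1),
  mean.legit       = c(1.71428571428571, 3.28571428571429, 2.42857142857143,
                       2.42857142857143, 2.14285714285714, 1.28571428571429,
                       1.42857142857143, 1.14285714285714, 2.14285714285714,
                       1.28571428571429)
)

fixed <- "PARTYID_Strength"
cols <- c("mean.leegauthor", "mean.legit")

# Same approach as above: one cor.test() per column, kept as a named list
cor_test_result <- sapply(cols, function(x) {
  fmla <- as.formula(paste("~", fixed, "+", x))
  cor.test(fmla, data = mydata2)
}, simplify = FALSE)

# Pull the estimate, test statistic, degrees of freedom and p-value
# out of each htest object and stack them, one row per variable
summary_table <- do.call(rbind, lapply(names(cor_test_result), function(nm) {
  res <- cor_test_result[[nm]]
  data.frame(
    variable = nm,
    cor      = unname(res$estimate),
    t        = unname(res$statistic),
    df       = unname(res$parameter),
    p.value  = res$p.value
  )
}))

summary_table
```

With only 10 rows the estimates are very noisy, so the table is mainly useful once the same code is run on the full data.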
