
Correlation ratio for two columns in data frame based on group in R

I am trying to find the correlation ratio (I think that's the correct term; I'm not too good at stats) between two columns in my data frame, computed separately for each unique value of another column. I am not sure whether I am using the correct function. I want the number highlighted in yellow below, but I can't seem to get what I am looking for. I would appreciate any help.

[Image: enter image description here]

Sample data:

test_df<-structure(list(stdate = c("2015-06-25", "2015-06-25", "2015-06-29", 
"2015-06-29", "2008-05-05", "2008-05-05", "2015-06-30", "2015-06-30", 
"2015-06-30", "2017-11-15", "2017-11-15", "2017-11-13", "2017-11-13", 
"2015-08-31", "2015-08-31", "2008-05-01", "2008-05-01", "2017-02-14", 
"2017-02-14", "2017-02-13"), sttime = c("10:30:00", "10:30:00", 
"09:45:00", "09:45:00", "11:50:00", "11:50:00", "10:45:00", "10:45:00", 
"09:00:00", "09:50:00", "09:50:00", "09:10:00", "09:10:00", "13:50:00", 
"13:50:00", "09:30:00", "09:30:00", "10:30:00", "10:30:00", "08:30:00"
), locid = c("USGS-01388500", "USGS-01388500", "USGS-01464585", 
"USGS-01464585", "USGS-01464515", "USGS-01464515", "USGS-01407330", 
"USGS-01407330", "USGS-01466500", "USGS-01387500", "USGS-01387500", 
"USGS-01395000", "USGS-01395000", "USGS-01400860", "USGS-01400860", 
"USGS-01377000", "USGS-01377000", "USGS-01367625", "USGS-01367625", 
"USGS-01398000"), Specific_conductance = c(525, 525, 184, 184, 
226, 226, 203, 203, 41, 674, 674, 466, 466, 312, 312, 540, 540, 
844, 844, 683), tds = c(294, 275, 119, 100, 155, 116, 155, 115, 
43, 403, 382, 286, 274, 177, 173, 328, 277, 435, 440, 347)), .Names = c("stdate", 
"sttime", "locid", "Specific_conductance", "tds"), row.names = c(NA, 
20L), class = "data.frame")

Code:

library(dplyr)  # provides %>%, group_by() and summarise()

correlation_df <- test_df %>%
  group_by(locid) %>%
  summarise(correl = cor(tds, Specific_conductance))

When I run this, I get a 1-by-1 data frame containing NA. I want a value for each locid.

Have you tried running that code with your full data? In your test_df, you've got at most two entries for each locid, and Specific_conductance is identical within each pair, so its standard deviation is zero and cor() returns NA for every group. If I make up a dummy data frame with more data, it works fine:

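library(dplyr)  # also re-exports tibble()

# Dummy data: four groups with 100 random observations each
test_df <- tibble(locid = rep(c("a", "b", "c", "d"), 100), tds = rnorm(400),
                  Specific_conductance = rnorm(400))

# One correlation coefficient per locid
correlation_df <- test_df %>%
  group_by(locid) %>%
  summarise(correl = cor(tds, Specific_conductance))
correlation_df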
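
To see why the sample data yields NA, here is a small base-R sketch of how cor() behaves with very few points. The tds and Specific_conductance values for the constant and single-row cases are taken from the sample data in the question; the first pair of vectors is made up purely for illustration.

```r
# Two points where both variables vary (made-up values):
# the correlation is -1 here, since y falls as x rises.
r_two_points <- cor(c(1, 2), c(5, 3))

# Two points where one variable is constant, as for locid USGS-01388500
# in the sample data (tds = 294, 275; Specific_conductance = 525, 525).
# The standard deviation is zero, so cor() returns NA with a warning.
r_constant <- suppressWarnings(cor(c(294, 275), c(525, 525)))

# A single observation, as for locid USGS-01466500 (tds = 43,
# Specific_conductance = 41): no variance can be estimated, so NA again.
r_single <- suppressWarnings(cor(43, 41))

print(c(two_points = r_two_points, constant = r_constant, single = r_single))
```

Even when two points do give a number, it is always +1 or -1, so a per-group correlation only becomes informative with more observations per locid.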

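If the full data also contains locids with too few or constant readings, one option is to guard the call so those groups return NA explicitly and the rest are still computed. This is only a sketch: safe_cor and demo_df are hypothetical names, the demo values are made up, and the threshold of three observations is an arbitrary assumption. The code calls dplyr::summarise with its namespace because a 1-by-1 result can also appear when another package (for example plyr, loaded after dplyr) masks summarise and ignores the grouping; whether that happened in the question is a guess.

```r
library(dplyr)

# Illustrative data (made up): "a" has three varying points, "b" has a
# constant Specific_conductance, and "c" has a single row.
demo_df <- tibble(
  locid = c("a", "a", "a", "b", "b", "c"),
  tds = c(100, 150, 210, 294, 275, 43),
  Specific_conductance = c(180, 260, 390, 525, 525, 41)
)

# Hypothetical helper: return NA unless there are at least 3 pairs
# and both variables actually vary within the group.
safe_cor <- function(x, y) {
  if (length(x) < 3 || sd(x) == 0 || sd(y) == 0) return(NA_real_)
  cor(x, y)
}

correlation_df <- demo_df %>%
  group_by(locid) %>%
  dplyr::summarise(
    n = n(),                                      # observations per locid
    correl = safe_cor(tds, Specific_conductance), # NA when not computable
    .groups = "drop"
  )

print(correlation_df)
```

Keeping the n column next to correl makes it easy to see which groups were skipped because they had too little data.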