I have a tibble/dataframe like this:
sample_id  condition  state
---------------------------------
sample1    case       val1
sample1    case       val2
sample1    case       val3
sample2    control    val1
sample2    control    val2
sample2    control    val3
The dataframe is generated within a for loop for different states. Hence, every dataframe has a different name for the state column.
I want to group the data by sample_id and calculate the median of the state column, so that every unique sample_id has a single median value. The output should look like this:
sample_id  condition  state
---------------------------------
sample1    case       median
sample2    control    median
I am trying the command below. It works if I give the column name directly, but I am not able to pass the name via the state character variable. I tried ensym(state) and !!ensym(state), but both throw errors.
ddply(dat_state, .(sample_id), summarize, condition=unique(condition), state_exp=median(ensym(state)))
As camille notes above, this is easier in dplyr. Basic syntax (not yet addressing your question):
my_df %>%
  group_by(sample_id, condition) %>%
  summarize(state = median(state))
Note that this syntax gives you one value for every unique sample_id-condition pair. That isn't an issue in your example, since every sample_id has a single condition, but it is something to be aware of.
On to your question... It's not quite clear to me how you're planning to pass the state name to your calculation, but there are a couple of ways to handle this. One is to use dplyr's "rename" function:
x <- "Massachusetts"
my_df %>%
  # all_of() tells rename() that x is a character vector holding a column name
  rename(state = all_of(x)) %>%
  group_by(sample_id, condition) %>%
  summarize(state = median(state))
The (probably more proper) way to do this is to write a function using dplyr's "tidyeval" syntax:
myfunc <- function(df, state_name) {
  df %>%
    group_by(sample_id, condition) %>%
    summarize(state = median({{state_name}}))
}
myfunc(my_df, Massachusetts) # Note: Unquoted state name
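Since the question has the column name in a character variable rather than as a bare name, here is a minimal sketch of one alternative: indexing dplyr's .data pronoun with a string. The toy data, the STAT1 column, and the helper name median_by_sample are made up for illustration, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical toy data; "STAT1" stands in for one of the state columns.
my_df <- tibble(
  sample_id = rep(c("sample1", "sample2"), each = 3),
  condition = rep(c("case", "control"), each = 3),
  STAT1     = c(1, 2, 3, 10, 20, 30)
)

# Hypothetical helper: state_name is a character string, not a bare name.
median_by_sample <- function(df, state_name) {
  df %>%
    group_by(sample_id, condition) %>%
    # .data[[...]] looks the column up by its name stored in a string
    summarize(state = median(.data[[state_name]]), .groups = "drop")
}

res <- median_by_sample(my_df, "STAT1")
print(res)
```

Because the helper takes a string, it can be called directly from a loop over a character vector of column names.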
Thank you all for putting effort into answering my question. With your suggestions, I have found the solution. Below is the code for what I was trying to achieve: grouping by sample_id and condition and passing state through a variable.
state_mark <- c("pPCLg2", "STAT1", "STAT5", "AKT")
for(state in state_mark){
  dat_state <- dat_clust_stim[,c("sample_id", "condition", state)]
  # I had to use !!ensym() to convert a character to a symbol.
  dat_med <- group_by(dat_state, sample_id, condition) %>%
    summarise(med = median(!!ensym(state)))
  dat_med <- ungroup(dat_med)
  x <- dat_med[dat_med$condition == "case", "med"]
  y <- dat_med[dat_med$condition == "control", "med"]
  t_test <- t.test(x$med, y$med)
}
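One thing to watch in the loop above: t_test is overwritten on every iteration, so only the result for the last state survives. A possible variation, sketched under assumptions, stores each result in a named list. The dat_clust_stim here is simulated stand-in data with only two of the state columns, t_tests is a name introduced for illustration, and .data[[state]] is used in place of !!ensym(state) as one alternative way to refer to a column by string.

```r
library(dplyr)

set.seed(1)
# Hypothetical stand-in for dat_clust_stim: 4 samples, 2 per condition.
dat_clust_stim <- tibble(
  sample_id = rep(paste0("sample", 1:4), each = 5),
  condition = rep(c("case", "case", "control", "control"), each = 5),
  STAT1     = rnorm(20),
  AKT       = rnorm(20)
)
state_mark <- c("STAT1", "AKT")

t_tests <- list()  # one t-test result per state, keyed by column name
for (state in state_mark) {
  dat_med <- dat_clust_stim %>%
    group_by(sample_id, condition) %>%
    summarise(med = median(.data[[state]]), .groups = "drop")
  x <- dat_med$med[dat_med$condition == "case"]
  y <- dat_med$med[dat_med$condition == "control"]
  t_tests[[state]] <- t.test(x, y)
}

names(t_tests)
```

After the loop, t_tests$STAT1$p.value (and likewise for the other states) gives the p-value for that state.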