I measured the inhibiting power of bacteria on viruses. I have a data matrix of n rows (individuals) and 4 columns (a, b, c, x). Depending on column x, I would like to classify the individuals as good or bad inhibitors. However, I am not sure how to set a threshold on column x that depends on the other measured columns (a, b, c). Is there an R function that could separate/group my data frame?
In dplyr there is group_by(), which works like this:
library(dplyr)
# df is assumed to be a data frame with a grouping column A and a numeric column C
df %>%
  group_by(A) %>%              # df is now grouped by column A
  summarise(Mean = mean(C))    # one row per group of A; columns not summarised are dropped

df %>%
  group_by(A) %>%
  mutate(Mean = mean(C))       # adds the group mean to every row; no rows or columns are dropped
If you summarise, you are done. After group_by() and mutate(), however, the result is still grouped, so you have to call ungroup() on your data frame at some point.
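To make the ungroup() step concrete, here is a minimal sketch. The data frame and its values are made up for illustration; only the column names A and C follow the snippet above.

```r
library(dplyr)

# Hypothetical data: A is a grouping column, C a numeric measurement
df <- data.frame(
  A = c("g1", "g1", "g2", "g2"),
  C = c(1, 3, 10, 20)
)

with_means <- df %>%
  group_by(A) %>%
  mutate(Mean = mean(C)) %>%  # group mean repeated on every row of the group
  ungroup()                   # drop the grouping so later steps see the whole table

# After ungroup(), mutate() computes over all rows at once
result <- with_means %>%
  mutate(OverallMean = mean(C))
```

Without the ungroup() call, the second mutate() would again compute the mean per group of A instead of over the whole table.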
A data.table example follows. The data contain 50 observations (a) spread across 5 groups (Group).
Data
library(data.table)
dt = data.table(
  a = runif(50),                                    # 50 uniform random values in [0, 1]
  Group = sample(LETTERS[1:5], 50, replace = TRUE)  # random group label A to E
)
Example 1
First, we can calculate the mean of a per Group and label it 'Good' if it is above 0.5 and 'Bad' otherwise. Note that this summary does not include the individual values of a.
dt1 = dt[, .(Mean = mean(a)), keyby = Group][, Label := ifelse(Mean > 0.5, 'Good', 'Bad')]
> dt1
Group Mean Label
1: A 0.2982229 Bad
2: B 0.4102181 Bad
3: C 0.6201973 Good
4: D 0.4841881 Bad
5: E 0.4443718 Bad
Example 2
Similarly to Fnguyen's answer, the following code does not summarise the data per group; it merely shows the Group Mean and Label next to each observation. Note that := modifies dt by reference, so dt itself also gains the Mean and Label columns.
dt2 = dt[, Mean := mean(a), by = Group][, Label := ifelse(Mean > 0.5, 'Good', 'Bad')]
> head(dt2)
a Group Mean Label
1: 0.4253110 E 0.4443718 Bad
2: 0.4217955 A 0.2982229 Bad
3: 0.7389260 E 0.4443718 Bad
4: 0.2499628 E 0.4443718 Bad
5: 0.3807705 C 0.6201973 Good
6: 0.2841950 E 0.4443718 Bad
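Because := in data.table updates the table in place, it can help to work on a copy when the original should stay untouched. The following is a small sketch with made-up values; copy() is the data.table function for creating an independent table.

```r
library(data.table)

# Hypothetical data with two groups
dt <- data.table(
  a = c(0.2, 0.4, 0.6, 0.9),
  Group = c("A", "A", "B", "B")
)

# copy() makes an independent table, so := below does not touch dt
dt2 <- copy(dt)[, Mean := mean(a), by = Group][, Label := ifelse(Mean > 0.5, "Good", "Bad")]

names(dt)   # still only a and Group
names(dt2)  # a, Group, Mean, Label
```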
Example 3
Lastly, we can of course apply a condition directly to create a new column, without first calculating a grouped variable. The following example tests a combined condition on columns a and b.
dt3 = data.table(a = runif(100), b = runif(100))
dt3[, abGrThan0.5 := a > 0.5 & b > 0.5]  # the comparison already returns TRUE/FALSE
> head(dt3)
a b abGrThan0.5
1: 0.5132690 0.02104807 FALSE
2: 0.8466798 0.96845916 TRUE
3: 0.5776331 0.79215074 TRUE
4: 0.9740055 0.59381244 TRUE
5: 0.4311248 0.07473373 FALSE
6: 0.2547600 0.09513784 FALSE
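The same pattern can be carried over to the columns from the question. This is only a sketch: the values, the cutoff of 0.5 on x, the extra condition on a, and the names inhib, x_cutoff and Quality are all assumptions for illustration, not something the question specifies.

```r
library(data.table)

# Hypothetical measurements; column names follow the question (a, b, c, x)
inhib <- data.table(
  a = c(0.9, 0.2, 0.7),
  b = c(0.8, 0.6, 0.1),
  c = c(0.3, 0.4, 0.5),
  x = c(0.75, 0.90, 0.30)
)

x_cutoff <- 0.5  # assumed threshold on x

# "Good" only when x exceeds the cutoff AND a is above 0.5 (assumed rule)
inhib[, Quality := ifelse(x > x_cutoff & a > 0.5, "Good", "Bad")]
```

Which cutoff and which conditions on a, b and c make biological sense has to come from the experiment; the code only shows how to encode such a rule once it is chosen.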