I'm having trouble dividing the rows of my dataset into tertiles based on one of its columns. For instance, how can I divide genes into three groups (low, medium, high) based on their expression level? The dataset has genes in one column and expression in another.
I was thinking of using this function:
sort(datasetname$expression)
So, this would sort the expression levels from lowest to highest (the default order for sort). But then, I'm not sure how to label each one as low, medium, or high, or how to make a new subset for each of these groups.
Thanks in advance!
Here is an example using the iris dataset that comes with R. The tertiles are based on the variable Petal.Length.
# Generate the tertile limits using the quantile function,
# with probabilities spaced from 0 to 1 at intervals of 1/3.
# These 4 values are the start and end points, in terms of Petal.Length,
# of the three tertiles.
tertile_limits <- quantile(iris$Petal.Length, seq(0, 1, 1/3), na.rm = TRUE)
# use the tertile start and end points (4 points, which creates 3 intervals)
# to create a new factor in the dataset
# The three tertiles are also explicitly labelled Low, Medium, and High, though this is optional.
iris$Petal.Length.Tertiles <- cut(iris$Petal.Length, tertile_limits, c('Low', 'Medium', 'High'), include.lowest = TRUE)
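The question also asks how to make a separate subset for each group. A minimal sketch of one way to do that, assuming the factor column created above: table() counts the rows per tertile and split() returns one data frame per level. The names iris_by_tertile, low_petal, and high_petal are only illustrative.

```r
# Recreate the tertile factor from the answer so this block runs on its own.
tertile_limits <- quantile(iris$Petal.Length, seq(0, 1, 1/3), na.rm = TRUE)
iris$Petal.Length.Tertiles <- cut(iris$Petal.Length, tertile_limits,
                                  labels = c('Low', 'Medium', 'High'),
                                  include.lowest = TRUE)

# Count the rows in each tertile; ties at a break point can make the groups unequal.
table(iris$Petal.Length.Tertiles)

# split() returns a named list with one data frame per factor level.
iris_by_tertile <- split(iris, iris$Petal.Length.Tertiles)

# Individual subsets can then be pulled out by name.
low_petal <- iris_by_tertile$Low
high_petal <- iris_by_tertile[['High']]
```

Keeping the subsets in a single list makes it easy to loop over them with lapply() instead of managing three separate objects.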
You can get the tertiles using the quantile function and then assign groups using the cut function. Here is an example using mtcars and mpg:
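cars <- mtcars
# Tertile boundaries: minimum, 1/3 and 2/3 quantiles, maximum
breaks <- quantile(cars$mpg, probs = c(0, 1/3, 2/3, 1))
labels <- c('low', 'medium', 'high')
# include.lowest = TRUE keeps the minimum value in the 'low' group
cuts <- cut(cars$mpg, breaks = breaks, labels = labels, include.lowest = TRUE)
cars <- cbind(cars, cuts)
head(cars)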
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb   cuts
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4 medium
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4 medium
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1   high
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1 medium
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2 medium
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1 medium
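The same quantile-then-cut approach carries over to a data frame shaped like the one in the question. The sketch below is only an illustration: the data frame dataset, its simulated expression values, and the new column name level are assumptions, since the real data is not shown.

```r
# Hypothetical data in the shape described in the question:
# one column of gene names and one of expression values (simulated here).
set.seed(1)
dataset <- data.frame(
  gene = paste0('gene', 1:30),
  expression = rlnorm(30, meanlog = 2, sdlog = 1)
)

# Tertile boundaries: minimum, 1/3 and 2/3 quantiles, maximum.
breaks <- quantile(dataset$expression, probs = c(0, 1/3, 2/3, 1), na.rm = TRUE)

# Label each gene; include.lowest = TRUE keeps the minimum value in 'low'.
dataset$level <- cut(dataset$expression, breaks = breaks,
                     labels = c('low', 'medium', 'high'),
                     include.lowest = TRUE)

# One subset per group.
low_genes <- subset(dataset, level == 'low')
medium_genes <- subset(dataset, level == 'medium')
high_genes <- subset(dataset, level == 'high')
```

Taking the lower bound from the data's own minimum, rather than a fixed 0, also keeps the approach valid when values can be zero or negative, as with log-transformed expression.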