
How to divide a column in a dataset into three groups (tertiles) based on another column in the dataset, using R?

I'm having trouble dividing a column in my dataset into tertiles based on another column in the dataset. For instance, how can I divide genes into three groups (low, medium, high) based on their expression level? The dataset has genes in one column and expression in another.

I was thinking of using this function:

sort(datasetname$expression)

This would sort the expression levels (from lowest to highest by default). But then I'm not sure how to label the values as low, medium, or high, or how to make a new subset for each group.

Thanks in advance!

Here is an example using the iris example dataset that comes with R. Here the tertiles will be based on the variable Petal Length.

# Generate the tertile limits using the quantile function,
# with probabilities from 0 to 1 in steps of 1/3.
# These 4 values are the start and end points, in terms of Petal.Length,
# of the three tertiles.
tertile_limits <- quantile(iris$Petal.Length, seq(0, 1, 1/3), na.rm = TRUE)

# use the tertile start and end points (4 points, which creates 3 intervals)
# to create a new factor in the dataset
# The three tertiles are also explicitly labelled Low, Medium, and High, though this is optional.
iris$Petal.Length.Tertiles <- cut(iris$Petal.Length, tertile_limits, c('Low', 'Medium', 'High'), include.lowest = TRUE)
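
If you also need a separate subset for each tertile, as the question asks, one option is split(), which returns a named list of data frames, one per factor level. This is a minimal sketch that repeats the steps above on a copy of iris so it runs on its own; the names iris2 and iris_by_tertile are just illustrative.

```r
# Recreate the tertile factor on a copy of iris so this runs on its own
iris2 <- iris
tertile_limits <- quantile(iris2$Petal.Length, seq(0, 1, 1/3), na.rm = TRUE)
iris2$Petal.Length.Tertiles <- cut(iris2$Petal.Length, tertile_limits,
                                   labels = c('Low', 'Medium', 'High'),
                                   include.lowest = TRUE)

# Count the rows in each tertile (ties can make the groups unequal in size)
table(iris2$Petal.Length.Tertiles)

# One data frame per tertile, returned as a named list
iris_by_tertile <- split(iris2, iris2$Petal.Length.Tertiles)
head(iris_by_tertile$Low)
```

Each element (iris_by_tertile$Low, iris_by_tertile$Medium, iris_by_tertile$High) can then be used like any other data frame.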

You can get the tertiles using the quantile function and then assign groups using the cut function. Here is an example using mtcars and mpg:

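cars <- mtcars
# Upper limits of the three groups: roughly the 1/3 and 2/3 quantiles, plus the maximum
breaks <- quantile(cars$mpg, c(.33, .67, 1))
# Add 0 as the lower limit; this works here because mpg is always positive
breaks <- c(0, breaks)
labels <- c('low', 'medium', 'high')
# Assign each car to a group according to its mpg
cuts <- cut(cars$mpg, breaks = breaks, labels = labels)
cars <- cbind(cars, cuts)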
head(cars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb   cuts
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4 medium
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4 medium
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1   high
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1 medium
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2 medium
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1 medium
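
To map this back to the question, here is a rough sketch on a made-up data frame with the two columns described there. The names expr_data, gene, expression, and level, and the simulated values, are assumptions for illustration only. It uses -Inf and Inf as the outer breaks instead of 0 and the maximum, so it would also work if the values could be zero or negative.

```r
set.seed(1)  # simulated values, for illustration only
expr_data <- data.frame(
  gene = paste0('gene', 1:30),
  expression = round(rexp(30, rate = 0.1), 2)
)

# Inner breaks at the 1/3 and 2/3 quantiles; outer breaks at -Inf and Inf
inner <- quantile(expr_data$expression, c(1/3, 2/3), na.rm = TRUE)
expr_data$level <- cut(expr_data$expression,
                       breaks = c(-Inf, inner, Inf),
                       labels = c('low', 'medium', 'high'))

# A separate subset for each group
low_genes    <- subset(expr_data, level == 'low')
medium_genes <- subset(expr_data, level == 'medium')
high_genes   <- subset(expr_data, level == 'high')
```

Note that cut() stops with an error if the breaks are not unique, which can happen when many rows share the same value (for example, lots of zero expression values).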


 