
How to divide a column in a dataset into three groups (tertiles) based on another column in the dataset? Using R

I'm having trouble dividing a column in my dataset into tertiles based on another column in the dataset. For instance, how can I divide genes into three groups (low, medium, high) based on their expression level? The dataset has the genes in one column and the expression values in another.

I was thinking of using this function:

sort(datasetname$expression)

This would sort the expression levels (in ascending order by default, i.e. from lowest to highest; decreasing = TRUE reverses that). But then I'm not sure how to label each value as low, medium, or high, or how to make a new subset for each of these groups.

Thanks in advance!
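As a minimal sketch of how the sort() idea above could be carried through in base R: rank the expression values from lowest to highest, then map the ranks onto three groups of (nearly) equal size. This is a rank-based split, not the quantile-plus-cut approach used in the answers below. The data frame `dataset` and its values are hypothetical, laid out as described (genes in one column, expression in another).

```r
# Hypothetical data: one column of gene names, one column of expression values
dataset <- data.frame(
  gene       = paste0('Gene', 1:9),
  expression = c(5.2, 1.1, 8.7, 3.3, 0.4, 6.1, 2.8, 9.5, 4.0)
)

# Rank expression from lowest (1) to highest (n); ties are broken by row order
r <- rank(dataset$expression, ties.method = 'first')

# Map ranks 1..n onto groups 1, 2, 3 of (nearly) equal size
group <- ceiling(3 * r / nrow(dataset))
dataset$level <- factor(group, levels = 1:3, labels = c('low', 'medium', 'high'))

# One subset (data frame) per group
low    <- dataset[dataset$level == 'low', ]
medium <- dataset[dataset$level == 'medium', ]
high   <- dataset[dataset$level == 'high', ]
```

Because ties are broken by row order, genes with identical expression can end up in different groups; the quantile-based approach instead keeps tied values together, at the cost of possibly unequal group sizes.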

Here is an example using the iris dataset that comes with R. Here the tertiles are based on the variable Petal.Length.

# Generate the tertile limits with the quantile function,
# using probabilities from 0 to 1 in steps of 1/3 (0, 1/3, 2/3, 1).
# These 4 values are the start and end points, in terms of Petal.Length,
# of the three tertiles.
tertile_limits <- quantile(iris$Petal.Length, seq(0, 1, 1/3), na.rm = TRUE)

# Use the tertile start and end points (4 points, which create 3 intervals)
# to create a new factor in the dataset.
# The three tertiles are explicitly labelled Low, Medium, and High, though this is optional.
# include.lowest = TRUE keeps the minimum value in the Low group instead of returning NA.
iris$Petal.Length.Tertiles <- cut(iris$Petal.Length, breaks = tertile_limits, labels = c('Low', 'Medium', 'High'), include.lowest = TRUE)
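The question also asks how to make a separate subset for each group. A minimal sketch, assuming the tertile factor built above: split() returns a named list with one data frame per factor level, and subset() pulls out a single group. The object names (`iris_by_tertile`, `low_petals`, etc.) are illustrative.

```r
# Recreate the tertile factor from the iris example
tertile_limits <- quantile(iris$Petal.Length, probs = seq(0, 1, 1/3), na.rm = TRUE)
iris$Petal.Length.Tertiles <- cut(iris$Petal.Length, breaks = tertile_limits,
                                  labels = c('Low', 'Medium', 'High'),
                                  include.lowest = TRUE)

# split() gives a named list: one data frame per tertile
iris_by_tertile <- split(iris, iris$Petal.Length.Tertiles)
low_petals    <- iris_by_tertile$Low
medium_petals <- iris_by_tertile$Medium
high_petals   <- iris_by_tertile$High

# Alternatively, subset() extracts a single group
high_only <- subset(iris, Petal.Length.Tertiles == 'High')

# Group sizes; ties in the data can make them slightly unequal
table(iris$Petal.Length.Tertiles)
```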

You can get the tertiles using the quantile function and then assign groups using the cut function. Here is an example using mtcars and mpg:

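cars <- mtcars
# Tertile boundaries: minimum, 1/3 and 2/3 quantiles, and maximum
# (this avoids a fixed lower bound of 0, which would drop values <= 0)
breaks <- quantile(cars$mpg, probs = c(0, 1/3, 2/3, 1))
labels <- c('low', 'medium', 'high')
# include.lowest = TRUE keeps the minimum mpg in the 'low' group
cuts <- cut(cars$mpg, breaks = breaks, labels = labels, include.lowest = TRUE)
cars <- cbind(cars, cuts)
head(cars)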
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb   cuts
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4 medium
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4 medium
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1   high
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1 medium
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2 medium
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1 medium
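Gene expression values are often on a log scale and can be negative, so a lower break of 0 would leave those genes unassigned (NA). The sketch below wraps the quantile-plus-cut approach in a hypothetical helper, `add_tertiles()`, and applies it to a small hypothetical `genes` table; it also checks that the breaks are unique, since cut() fails when heavy ties make two quantiles equal.

```r
# Hypothetical gene expression table: gene names plus log2 expression values
genes <- data.frame(
  gene       = c('GeneA', 'GeneB', 'GeneC', 'GeneD', 'GeneE', 'GeneF'),
  expression = c(-1.2, 0.4, 2.5, -0.3, 1.1, 3.0)
)

# Hypothetical helper: label each row of df by the tertile of column col
add_tertiles <- function(df, col, labels = c('low', 'medium', 'high')) {
  x <- df[[col]]
  breaks <- quantile(x, probs = c(0, 1/3, 2/3, 1), na.rm = TRUE)
  # cut() needs strictly increasing breaks; many tied values can break this
  if (anyDuplicated(breaks)) stop('tertile breaks are not unique (too many ties)')
  df$tertile <- cut(x, breaks = breaks, labels = labels, include.lowest = TRUE)
  df
}

genes <- add_tertiles(genes, 'expression')
genes[order(genes$expression), ]

# For comparison: a fixed lower bound of 0 leaves the negative values as NA
naive <- cut(genes$expression,
             breaks = c(0, quantile(genes$expression, c(1/3, 2/3, 1))))
sum(is.na(naive))
```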

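Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.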