How do I split a data frame based on range of column values in R?

I have a data set like this:

Users   Age
1        2
2        7
3        10
4        3
5        8
6        20

How do I split this data set into 3 data sets, where the first consists of all users with ages between 0–5, the second is 6–10, and the third is 11–15?

You can combine split with cut to do this in a single line of code, avoiding the need to subset with a bunch of different expressions for different data ranges:

# dat holds the question's data:
# dat <- data.frame(Users = 1:6, Age = c(2, 7, 10, 3, 8, 20))
split(dat, cut(dat$Age, c(0, 5, 10, 15), include.lowest = TRUE))
# $`[0,5]`
#   Users Age
# 1     1   2
# 4     4   3
# 
# $`(5,10]`
#   Users Age
# 2     2   7
# 3     3  10
# 5     5   8
# 
# $`(10,15]`
# [1] Users Age  
# <0 rows> (or 0-length row.names)

cut assigns each value to an interval defined by the specified break points, and split splits a data frame into a list according to the provided categories. If you store the result of this computation in a list called l, you can access the smaller data frames with l[[1]], l[[2]], and l[[3]], or, more verbosely, by name:

l$`[0,5]`
l$`(5,10]`
l$`(10,15]`
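
To make the access pattern above concrete, here is a minimal sketch that rebuilds the question's data under the name dat and stores the result in l. It also shows one thing worth knowing: age 20 falls outside the break points, so cut returns NA for it and split drops that row. The extra Inf break at the end is my own assumption about how you might want to keep such rows; it is not part of the original answer.

```r
# Rebuild the question's data (the answer above calls it `dat`).
dat <- data.frame(Users = 1:6, Age = c(2, 7, 10, 3, 8, 20))

# Store the split result so the pieces can be reused.
breaks <- c(0, 5, 10, 15)
l <- split(dat, cut(dat$Age, breaks, include.lowest = TRUE))

# The list names are the interval labels produced by cut().
names(l)

# Positional and name-based access return the same data frame.
identical(l[[2]], l$`(5,10]`)

# Age 20 lies outside the breaks, so cut() returns NA for it
# and split() drops that row: 5 rows remain out of 6.
sum(sapply(l, nrow))

# Assumption: if out-of-range ages should be kept, add an open-ended bin.
l_all <- split(dat, cut(dat$Age, c(breaks, Inf), include.lowest = TRUE))
length(l_all)
```

Note that the names contain no spaces, so l$`(10, 15]` (with a space) would return NULL rather than the empty data frame.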

First, here's your dataset for my purposes: foo <- data.frame(Users = 1:6, Age = c(2, 7, 10, 3, 8, 20))

Here's your first dataset with ages 0–5: subset(foo,Age<=5&Age>=0)

  Users Age
1     1   2
4     4   3

Here's your second with ages 6–10: subset(foo,Age<=10&Age>=6)

  Users Age
2     2   7
3     3  10
5     5   8

Your third (using subset(foo,Age<=15&Age>=11)) is empty, because your last Age observation is over 15.

Note also that fractional ages between 5 and 6 or between 10 and 11 (e.g., 5.1, 10.5) would be excluded, as this code matches your question very literally. If you'd want someone with an age less than 6 to go in the first group, just amend that code to subset(foo,Age<6&Age>=0). If you'd prefer a hypothetical person with Age=5.1 in the second group, that group's code would be subset(foo,Age<=10&Age>5).
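
To see the boundary issue in action, here is a small sketch. The two extra rows with ages 5.1 and 10.5 are hypothetical and added only for illustration, and I use bracket indexing instead of subset so the conditions can sit inside a function; the names foo2, literal, and contiguous are my own.

```r
foo <- data.frame(Users = 1:6, Age = c(2, 7, 10, 3, 8, 20))

# Hypothetical extra rows with fractional ages, for illustration only.
foo2 <- rbind(foo, data.frame(Users = 7:8, Age = c(5.1, 10.5)))

# Literal reading of the question: closed ranges 0-5, 6-10, 11-15.
lower <- c(0, 6, 11)
upper <- c(5, 10, 15)
literal <- Map(function(lo, hi) foo2[foo2$Age >= lo & foo2$Age <= hi, ],
               lower, upper)

# Contiguous reading: each group starts just above the previous upper
# bound, so 5.1 lands in the second group and 10.5 in the third.
contiguous <- list(
  foo2[foo2$Age >= 0  & foo2$Age <= 5, ],
  foo2[foo2$Age > 5   & foo2$Age <= 10, ],
  foo2[foo2$Age > 10  & foo2$Age <= 15, ]
)

sapply(literal, nrow)     # 2 3 0: the fractional ages fall in the gaps
sapply(contiguous, nrow)  # 2 4 1: the fractional ages are picked up
```

Either way, the user with Age 20 is in none of the groups.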

We could also use the between function from the data.table package.

# Load data.table, which provides setDT() and between()
library(data.table)

# Create a data frame
dat <- data.frame(Users = 1:7, Age = c(2, 7, 10, 3, 8, 12, 15))

# Convert the data frame to data table by reference
# (data.table is also a data.frame)
setDT(dat)

# Define a list with the cut pairs
cuts <- list(c(0, 5), c(6, 10), c(11, 15))

# Cycle through dat and cut it into list of data tables by the values in Age
# matching the defined cuts
lapply(X = cuts, function(i) {
  dat[between(x = dat[ , Age], lower = i[1], upper = i[2])]
})

Output:

[[1]]
   Users Age
1:     1   2
2:     4   3

[[2]]
   Users Age
1:     2   7
2:     3  10
3:     5   8

[[3]]
   Users Age
1:     6  12
2:     7  15

Many other things are possible, including doing it by group; data.table is rather flexible.
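
As a rough sketch of what "by group" could look like (my own illustration, not part of the answer above): label each row with its age bin using cut, then either summarise per bin or split on that column. The column name AgeBin and the summary columns are made up for this example.

```r
library(data.table)

dat <- data.table(Users = 1:7, Age = c(2, 7, 10, 3, 8, 12, 15))

# Label each row with its age bin; `AgeBin` is a made-up column name.
dat[, AgeBin := cut(Age, breaks = c(0, 5, 10, 15), include.lowest = TRUE)]

# Per-group summary without creating separate tables.
counts <- dat[, .(N = .N, MeanAge = mean(Age)), by = AgeBin]
counts

# If separate tables are still wanted, split() on a data.table accepts `by`.
pieces <- split(dat, by = "AgeBin")
names(pieces)
```

The grouped form is often enough on its own, since most follow-up computations can be expressed with by instead of looping over a list of tables.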


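Notice: The technical posts on this site follow the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.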
