Merge groups based on sample size in R

Question

I have a table in the following format. I simplified it to illustrate the problem (the sample counts are made up; in my data they add up to 10000, but the structure is the same).

# 0-5    5-10    10-15    15-20    20-25    25-30    30-35    35-40    40-45    45-50
# 700    1000    1400     1700     1900     1500     1000      300       50      1

The groups are created dynamically based on the min and max values of my input; y refers to my input random sample. I created this table using the following code.

groups <- seq(0, 50, (50-0) / 10)
assoc <- cut(sr$y, groups, include.lowest = TRUE)
tab <- tabulate(assoc, nbins = length(groups) -1 )

Now my goal is to merge a column (and its samples) with the next one if it does not fulfill a condition of, e.g., 100 samples. I got as far as checking with which:

sn <- which(tab < 100) + 1  # index of the bin that follows each bin below the 100-sample threshold

And now I am stuck on merging the columns and their sample data. I would really appreciate some help.

Answer 1

One solution can be achieved using gather, separate, unite and spread from the tidyr package.

The approach is:

Use gather and separate to get the data in row-wise (long) format with from and to columns.
Assign a group by merging each row that has fewer than 100 samples with the next row.
Calculate the min of from, the max of to and the sum of samples per group.
Finally, unite and spread to get the data.frame back in the original format.

Solution #1

library(dplyr)
library(tidyr)

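gather(df, key, samples) %>%   # wide -> long: one row per bin
  # convert = TRUE makes from/to integers, so min/max compare numerically
  separate(key, c("from", "to"), sep = "-", convert = TRUE) %>%
  # a bin below 100 takes the id of the next row; the row after such a bin keeps its own id
  group_by(grp = ifelse(samples >= 100 | lag(samples) < 100, row_number(), row_number() + 1)) %>%
  summarise(from = min(from), to = max(to), samples = sum(samples)) %>%
  select(-grp) %>%
  mutate(from = sprintf("%2s", from)) %>%   # pad so the column names sort in bin order
  unite("key", from, to, sep = "-") %>%
  spread(key, samples) %>% as.data.frame()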
#    0-5  5-10 10-15 15-20 20-25 25-30 30-35 35-40 40-50
# 1  700  1000  1400  1700  1900  1500  1000   300    51
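
To see what the group_by expression assigns, the group ids can be computed on their own. The following is only a sketch in base R: `prev` and `idx` are stand-ins introduced here for `lag(samples)` and `row_number()`, and the counts are the ones from the example table.

```r
# Counts from the example table, one value per bin
samples <- c(700, 1000, 1400, 1700, 1900, 1500, 1000, 300, 50, 1)

prev <- c(NA, head(samples, -1))  # assumed base-R stand-in for dplyr::lag(samples)
idx  <- seq_along(samples)        # assumed stand-in for row_number()

# Same condition as in the pipeline: a small bin borrows the id of the next row
grp <- ifelse(samples >= 100 | prev < 100, idx, idx + 1)
grp
# group ids: 1 2 3 4 5 6 7 8 10 10
```

The last two bins share id 10, which is why they end up summed into a single 40-50 column. Each small bin is only paired with its immediate successor here, so a run of several small bins is not accumulated by this rule.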

Solution #2:

If the OP's intention is to keep grouping columns until a target number of samples (e.g. 100) is reached, then we need a custom function to create the groups. The function is:

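findGroup <- function(x, targetVal = 100){
  grp <- seq_along(x)                  # start with every bin in its own group
  for(i in seq_along(x[-length(x)])){  # all bins except the last
    if(x[i] < targetVal){
      x[i+1] <- x[i+1] + x[i]          # carry the running total into the next bin
      grp[i+1] <- grp[i]               # and give the next bin the same group id
    }
  }
  grp
}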

# Use findGroup to build the groups; the rest of the pipeline follows Solution #1.
gather(df, key, samples) %>%
  # convert = TRUE makes from/to integers, so min/max compare numerically
  separate(key, c("from", "to"), sep = "-", convert = TRUE) %>%
  group_by(grp = findGroup(samples)) %>%
  summarise(from = min(from), to = max(to), samples = sum(samples)) %>%
  select(-grp) %>%
  mutate(from = sprintf("%2s", from)) %>%   # pad so the column names sort in bin order
  unite("key", from, to, sep = "-") %>%
  spread(key, samples) %>% as.data.frame()

#    0-5  5-10 10-15 15-20 20-25 25-30 30-35 35-40 40-50
# 1  700  1000  1400  1700  1900  1500  1000   300    51
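
Since the question starts from a count vector produced by tabulate rather than from a data.frame, the same forward-merging idea can also be applied to the breaks and counts directly. This is only a sketch under stated assumptions: `tab` is filled with the example counts instead of real data, and `merge_ids`, `merged_counts`, `lower` and `upper` are hypothetical names introduced here (`merge_ids` restates the findGroup logic so the block runs on its own).

```r
# Breaks as in the question; `tab` holds the example counts (assumed, not real data)
groups <- seq(0, 50, by = 5)
tab <- c(700, 1000, 1400, 1700, 1900, 1500, 1000, 300, 50, 1)

# Hypothetical helper restating findGroup: a bin below `target` is folded
# into the following bin, and the running total is carried forward
merge_ids <- function(x, target = 100) {
  grp <- seq_along(x)
  for (i in seq_len(max(length(x) - 1, 0))) {
    if (x[i] < target) {
      x[i + 1] <- x[i + 1] + x[i]
      grp[i + 1] <- grp[i]
    }
  }
  grp
}

grp <- merge_ids(tab)

# Sum the counts per group and take the outer bounds of each merged bin
merged_counts <- as.vector(tapply(tab, grp, sum))
lower <- as.vector(tapply(head(groups, -1), grp, min))
upper <- as.vector(tapply(tail(groups, -1), grp, max))
names(merged_counts) <- paste(lower, upper, sep = "-")
merged_counts
```

For the example counts this gives nine bins, the last one being 40-50 with 51 samples. Because merging only runs forward, the final bin can still fall below the target, as it does here.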

Data

df <- structure(list(`0-5` = 700L, `5-10` = 1000L,  `10-15` = 1400L, `15-20` = 1700L, 
                     `20-25` = 1900L, `25-30` = 1500L, `30-35` = 1000L, `35-40` = 300L, 
                     `40-45` = 50L, `45-50` = 1L), .Names = c("0-5", "5-10",
                     "10-15", "15-20", "20-25", "25-30", "30-35", "35-40", "40-45", 
                     "45-50"), class = "data.frame", row.names = 1L)
df
  #   0-5 5-10 10-15 15-20 20-25 25-30 30-35 35-40 40-45 45-50
  # 1 700 1000  1400  1700  1900  1500  1000   300    50     1
