Restructuring values in unequal buckets by aggregating on a column in R
I have a dataset that looks like this:
| Id | Name | Date_diff |
|----|:-----:|----------:|
| 50 | David | 0 |
| 50 | David | -16 |
| 50 | David | -4 |
| 50 | David | -1 |
| 50 | David | 0 |
| 50 | David | -2 |
| 84 | Ron | -11 |
| 84 | Ron | -12 |
| 84 | Ron | -168 |
| 84 | Ron | -8 |
| 84 | Ron | 16 |
| 84 | Ron | NA |
The reproducible code is:
df= data.frame(Id= c('50','84'), Name= c('David','Ron'))
df=df[rep(seq_len(nrow(df)),each=6),]
Date_diff= c(0,-16,-4,-1,0,-2,-11,-12,-168,-8,16,'NA')
df=data.frame(df,Date_diff)
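Now, for each Id, I need to create a set of unequal bucket columns that hold counts of the values in the 'Date_diff' column. The bucket ranges need to be 'NA', '>0', '0', '-1', '-2 to -3', '-4 to -6', '-7 to -12' and '>-12' (that is, values below -12). There will also be an additional 'Total' column that holds the sum of the counts across the buckets.
For example, for Id=50 the value '0' occurs twice and falls in bucket '0', the value '-16' occurs once and falls in bucket '>-12', the value -4 occurs once and falls in the '-4 to -6' range, and so on. The final table should look like this: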
| Id | Name | NA | >0 | 0 | -1 | -2 to -3 | -4 to -6 | -7 to -12 | >-12 | Total |
|----|:-----:|---:|----|---|----|----------|----------|-----------|------|-------|
| 50 | David | 0 | 0 | 2 | 1 | 1 | 1 | 0 | 1 | 6 |
| 84 | Ron | 1 | 1 | 0 | 0 | 0 | 0 | 3 | 1 | 6 |
I initially tried creating a new column that classifies the values in 'Date_diff', but the values I supplied in breaks may be wrong. Here is what I tried:
df <- transform(df, group=cut(Date_diff, breaks=c(-Inf,-13,-7,-4,-2,-1,Inf),
labels=c('<-12', '-7 to -12','-4 to -6','-2 to -3', '-1','>0')))
Can someone let me know how to get the expected result?
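A quick way to test the suspicion about the breaks is to run cut on one probe value per intended bucket. The sketch below uses a made-up `probe` vector (not part of the question's data) together with the breaks and labels from the attempt. With cut's default right-closed intervals, the negative ranges land where intended, while 0 is labelled '>0' and NA stays NA, so those two buckets need separate handling.

```r
# Hypothetical probe values covering the edges of each intended bucket
probe <- c(-168, -13, -12, -7, -6, -4, -3, -2, -1, 0, 16, NA)

# Same breaks and labels as in the attempt above
attempt <- cut(probe,
               breaks = c(-Inf, -13, -7, -4, -2, -1, Inf),
               labels = c('<-12', '-7 to -12', '-4 to -6', '-2 to -3', '-1', '>0'))

# Pair each probe value with the label it received
check <- data.frame(value = probe,
                    bucket = as.character(attempt),
                    stringsAsFactors = FALSE)
print(check)
```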
One of the problems is that 'NA' is given as a string rather than NA. Here is a solution:
df <- data.frame(
  Id = rep(c('50', '84'), each = 6),
  Name = rep(c('David', 'Ron'), each = 6),
  Date_diff = c(0, -16, -4, -1, 0, -2, -11, -12, -168, -8, 16, NA)
)
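library(dplyr)
library(tidyr)

df %>%
  mutate(
    group = cut(
      Date_diff,
      # the extra break at 0 gives 0 its own bucket (whole-number values assumed)
      breaks = c(-Inf, -13, -7, -4, -2, -1, 0, Inf),
      labels = c('<-12', '-7 to -12', '-4 to -6', '-2 to -3', '-1', '0', '>0')
    ),
    # cut() leaves missing values as NA; give them an explicit "NA" bucket
    group = if_else(is.na(group), "NA", as.character(group))
  ) %>%
  group_by(Id, Name, group) %>%
  summarise(n = n(), .groups = "drop_last") %>%
  # still grouped by Id and Name here, so Total is the row count per Id
  mutate(Total = sum(n, na.rm = TRUE)) %>%
  ungroup() %>%
  pivot_wider(names_from = group, values_from = n, values_fill = 0)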
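The point about 'NA' being a string can be checked directly. The sketch below rebuilds the question's vector under the illustrative names `as_given` and `fixed` (both are assumptions, not names from the post) and shows that a single quoted element turns the whole vector into character, which cut refuses to bin.

```r
# The vector as written in the question: quoting NA makes every element a string
as_given <- c(0, -16, -4, -1, 0, -2, -11, -12, -168, -8, 16, 'NA')
class(as_given)   # "character"

# cut() only accepts numeric input, so this call fails on the character vector
cut_fails <- inherits(try(cut(as_given, breaks = c(-Inf, 0, Inf)), silent = TRUE),
                      "try-error")

# Converting back to numeric turns the string "NA" into a real missing value
fixed <- as.numeric(as_given)
class(fixed)        # "numeric"
sum(is.na(fixed))   # 1
```

If the data already exists in this form, converting the column with as.numeric is an alternative to retyping it with an unquoted NA.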
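Consider keeping the original cut assignment, with the help of within to reassign the special groups, table to count values by cut group, and reshape to convert the long format to wide.
cut + within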
# CREATE CUT COLUMN
df <- within(df, {
  Group <- as.character(cut(Date_diff,
                            breaks=c(-Inf,-13,-7,-4,-2,-1,Inf),
                            labels=c('<-12','-7 to -12','-4 to -6',
                                     '-2 to -3','-1','>0')))
  # ADJUST FOR ZERO GROUPING (cut ALREADY LABELS -1; NA ROWS STAY NA)
  Group <- ifelse(Date_diff == 0, "0", Group)
})
# TABULATE COUNTS
tbl_df <- transform(data.frame(table(Name=df$Name, Group = df$Group, useNA = "ifany"),
                               stringsAsFactors = FALSE),
                    Group = ifelse(is.na(Group), "NA", as.character(Group)))
# RESHAPE
final_df <- reshape(tbl_df, v.names = "Freq", timevar = "Group", idvar = "Name",
                    direction = "wide", sep = "_")
# REORDER AND RENAME COLUMNS
cols <- c("NA", ">0", "0", "-1", "-2 to -3", "-4 to -6", "-7 to -12", "<-12")
final_df <- setNames(final_df[c("Name", paste0("Freq_", cols))],
                     c("Name", gsub("Freq_", "", cols)))
# ADD TOTAL AND ID COLUMNS
final_df$Total <- rowSums(final_df[-1])
final_df <- merge(unique(df[c("Id", "Name")]), final_df, by="Name")[c("Id", "Name", cols, "Total")]
Output
final_df
#   Id  Name NA >0 0 -1 -2 to -3 -4 to -6 -7 to -12 <-12 Total
# 1 50 David  0  0 2  1        1        1         0    1     6
# 2 84   Ron  1  1 0  0        0        0         3    1     6
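One caveat with subsetting by paste0("Freq_", cols): as.character(cut(...)) only yields the buckets that actually occur in the data, so a bucket with no rows at all produces no column and the subsetting step would fail. A possible variation, sketched here under the assumption that Date_diff holds whole numbers and a real NA, converts the groups to a factor with fixed levels so that table reports empty buckets as zero. The extra break at 0 and the helper names `grp`, `counts` and `wide` are illustrative choices, not part of the answer above.

```r
# Assumed input: the question's data with a real NA in Date_diff
df <- data.frame(
  Id        = rep(c('50', '84'), each = 6),
  Name      = rep(c('David', 'Ron'), each = 6),
  Date_diff = c(0, -16, -4, -1, 0, -2, -11, -12, -168, -8, 16, NA),
  stringsAsFactors = FALSE
)

# Bucket labels in the column order wanted for the result
cols <- c("NA", ">0", "0", "-1", "-2 to -3", "-4 to -6", "-7 to -12", "<-12")

# An extra break at 0 separates '0' from '>0' (assumes whole-number differences)
grp <- as.character(cut(df$Date_diff,
                        breaks = c(-Inf, -13, -7, -4, -2, -1, 0, Inf),
                        labels = c("<-12", "-7 to -12", "-4 to -6",
                                   "-2 to -3", "-1", "0", ">0")))
grp[is.na(grp)] <- "NA"   # missing values get their own explicit bucket

# Fixed factor levels keep empty buckets as zero-count columns
counts <- table(Id = df$Id, Group = factor(grp, levels = cols))

# Turn the Id x Group table into a wide data frame and add the row total
wide <- as.data.frame.matrix(counts)
wide$Total <- rowSums(wide)
wide <- cbind(Id = rownames(wide), wide)
rownames(wide) <- NULL

# Bring Name back in; the columns end up as Id, Name, buckets, Total
wide <- merge(unique(df[c("Id", "Name")]), wide, by = "Id")
print(wide)
```

Because the levels are fixed up front, the same code keeps working when a bucket happens to be empty for every Id.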