R: Function to combine columns with identical data
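In my previous post (see the link above), I asked how to combine columns that contain identical data and rename the combined column so that its name reflects the range of days. I took the function produced in that answer: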
library(dplyr)
library(flextable)
library(stringr)
library(tidyverse)
Shop_fcn <- function(data){
  data %>%
    group_by(Day) %>%
    mutate(N_nam = n_distinct(Names)) %>%
    group_by(Names, Day, N_nam, Store, Item) %>%
    summarize(n_item = n()) %>%
    group_by(Day, N_nam, Store, Item) %>%
    summarize(n_nam = n(),
              n_item = sum(n_item)) %>%
    mutate(pct = round(n_nam/N_nam*100, digits = 1),
           txt = paste0(n_nam, " (", pct, "%)"),
           Day_n = paste0("Day ", Day, " (N=", N_nam, ")")) %>%
    ungroup %>%
    select(Day_n, Store, Item, txt) %>%
    group_by(Store, Item, txt) %>%
    summarise(Day_n = if (n() > 1)
                sprintf('Day %s %s',
                        paste(range(readr::parse_number(unique(Day_n))), collapse = ' - '),
                        str_remove(first(Day_n), '^[^(]+'))
              else Day_n) %>%
    pivot_wider(values_from = txt, names_from = Day_n) %>%
    mutate_at(vars(starts_with(c("Day"))), ~if_else(is.na(.), "", .)) %>%
    arrange(Store, Item) %>%
    group_by(store2 = Store) %>%
    mutate(Store = if_else(row_number() != 1, "", Store)) %>%
    ungroup() %>%
    select(Store, Item, str_sort(names(.)[-(1:2)], numeric = TRUE), -store2)
}
and applied it to a larger dataset:
Names <- as.character(c('Adam','Morticia','Morticia','Morticia','Morticia','Morticia','Morticia','Morticia',
'Morticia','Morticia','Morticia','Morticia','Mickey','Minnie','Minnie','Minnie','Minnie','Minnie',
'Lucy', 'Lucy','Lucy','Morticia','Morticia','Morticia','Adam','Gomez','Olive','Olive','Olive',
'Ricky','Morticia','Adam','Eve','Ricky','Morticia','Morticia','Minnie','Adam','Lucy','Ricky',
'Ricky','Ricky','Ricky','Ricky','Minnie','Adam','Adam', 'Morticia', 'Adam', 'Adam', 'Adam', 'Adam',
'Adam','Lucy','Olive','Eve','Gomez','Morticia','Mickey','Olive'))
Day <- as.numeric(c(1,1,2,3,6,8,9,10,11,12,13,14,1,1,2,5,6,14,1,1,14,4,4,4,2,1,1,1,14,1,5,2,
1,1,4,5,3,2,1,1,14,14,14,14,4,2,2,4,2,2,2,2,14,1,1,14,14,7,14,1))
Store <- as.character(c('None','None','None','None','None','None','None','None','None','None',
'None','None','None','None','None','None','None','None','ACE','ACE','ACE','ACE','Amazon','Amazon',
'Best Buy','CVS','Hobby Lobby','Hobby Lobby','Hobby Lobby','Home Depot','Home Depot',
'Ikea','Ikea','Ikea','Ikea','Ikea','Ikea','Lowes','Lowes','Petco','Petco','Petco','Petco',
'Petco','Petco','Target','Target','Target','Walgreens','Walgreens','Walgreens','Walgreens',
'Walgreens','Walgreens','Walgreens','Walmart','Walgreens','Walgreens','Walgreens','Walgreens'))
Item <- as.character(c('None','None','None','None','None','None','None','None','None','None','None','None',
'None','None', 'None','None','None','None', 'Hammer','Nails','Plywood', 'Bricks','Frame','Batteries','TV','Advil',
'Brush','Paint','Paint','Level','Wrench','Pillow', 'Blanket','Lamp','Vase','Table','Chair','Screwdriver','Plunger','Cat food',
'Cat litter','Goldfish','Dog food','Dog treat','Hamster','Rug','Vacuum',
'Gloves','Tylenol','Napkins','Benadryl','Soap','Soap','Shampoo','Conditioner','Lotion',
'Lotion','Foil','Lotion','Foil'))
Shop_list <- as.data.frame(cbind(Names, Day, Store, Item), stringsAsFactors=FALSE)
Shop_day <- Shop_list %>%
  bind_rows() %>%
  Shop_fcn()
flextable(Shop_day)
Days 1-14 and Days 3-5 should not have been combined.
Applying my original function gets me closer to the output I want:
Shop_fcn <- function(data){
  data %>%
    group_by(Day) %>%
    mutate(N_nam = n_distinct(Names)) %>%
    group_by(Names, Day, N_nam, Store, Item) %>%
    summarize(n_item = n()) %>%
    group_by(Day, N_nam, Store, Item) %>%
    summarize(n_nam = n(),
              n_item = sum(n_item)) %>%
    mutate(pct = round(n_nam/N_nam*100, digits = 1),
           txt = paste0(n_nam, " (", pct, "%)"),
           Day_n = paste0("Day ", Day, " (N=", N_nam, ")")) %>%
    ungroup %>%
    select(Day_n, Store, Item, txt) %>%
    pivot_wider(values_from = txt, names_from = Day_n) %>%
    mutate_at(vars(starts_with(c("Day"))), ~if_else(is.na(.), "", .)) %>%
    arrange(Store, Item) %>%
    group_by(store2 = Store) %>%
    mutate(Store = if_else(row_number() != 1, "", Store)) %>%
    ungroup() %>%
    select(-store2)
}
Shop_day <- Shop_list %>%
  bind_rows() %>%
  Shop_fcn()
flextable(Shop_day)
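However, I still have the original problem of combining identical days (specifically Days 8-13), plus a new problem: the Day 1-14 columns are not in order.
I'm not sure whether the best solution is to modify the function, or to apply a new function to the flextable that combines the columns and their column names.
I tried removing the duplicate columns, but I still can't work out how to keep the names of the duplicated columns as a range, or how to get the columns into the correct order.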
Shop_nodup <- Shop_day[!duplicated(as.list(Shop_day))]
flextable(Shop_nodup)
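For comparison: duplicated() drops a column whenever an identical column appears anywhere to its left, so non-adjacent days (such as Day 1 and Day 14) are collapsed as well and the range in the name is lost. Below is a minimal base-R sketch of the "post-process the wide table" idea, which only merges runs of adjacent identical columns and relabels each run as a range. It is not from the original post: the helper name merge_adjacent_days is hypothetical, and it assumes the day columns are already in numeric order and named like "Day 8 (N=1)", as Shop_fcn produces them.

```r
# Hypothetical helper (not from the original post): merge runs of adjacent,
# identical "Day" columns and relabel each run as a range. Assumes the day
# columns are already in numeric order and named like "Day 8 (N=1)".
merge_adjacent_days <- function(wide) {
  day_cols <- grep("^Day ", names(wide), value = TRUE)
  other    <- setdiff(names(wide), day_cols)

  # TRUE when a column is identical to the column immediately to its left
  same_as_prev <- vapply(seq_along(day_cols), function(i) {
    i > 1 && identical(wide[[day_cols[i]]], wide[[day_cols[i - 1]]])
  }, logical(1))
  run <- cumsum(!same_as_prev)  # a new run starts at every change

  out <- wide[other]
  for (r in unique(run)) {
    cols      <- day_cols[run == r]
    first_col <- cols[1]
    label     <- first_col
    if (length(cols) > 1) {
      from   <- sub("^Day ([0-9]+).*$", "\\1", first_col)
      to     <- sub("^Day ([0-9]+).*$", "\\1", cols[length(cols)])
      suffix <- sub("^[^(]*", "", first_col)  # keep "(N=...)" of the first day
      label  <- sprintf("Day %s - %s %s", from, to, suffix)
    }
    out[[label]] <- wide[[first_col]]
  }
  out
}

# Small made-up example: Days 2 and 3 are adjacent and identical, so they
# merge; Days 1 and 4 are identical but not adjacent, so they stay apart.
wide <- data.frame(
  Item          = c("Soap", "Foil"),
  `Day 1 (N=2)` = c("1 (50%)", ""),
  `Day 2 (N=1)` = c("", "1 (100%)"),
  `Day 3 (N=1)` = c("", "1 (100%)"),
  `Day 4 (N=2)` = c("1 (50%)", ""),
  check.names = FALSE, stringsAsFactors = FALSE
)
res <- merge_adjacent_days(wide)
names(res)
```

The label keeps the "(N=...)" suffix of the first day in each run, mirroring what the question's first function does with str_remove(first(Day_n), '^[^(]+').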
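The column names are not in order because the Day column is of type character rather than numeric: as strings, "10" sorts before "2". Converting it to numeric puts the columns in the desired order. The numbers become characters because your data-generation code uses as.data.frame(cbind(....)). cbind converts the data to a matrix, and since a matrix can hold data of only one type, it converts the numbers to characters. Use data.frame(....) instead, which keeps each column's type unchanged.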
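A minimal base-R illustration of this coercion, using a few made-up values (the variable names name and day are only for the example):

```r
# Made-up values, just to show the coercion.
name <- c("Adam", "Lucy", "Eve")
day  <- c(1, 2, 14)

# cbind() builds a matrix first; a matrix holds a single type, so day becomes character
via_cbind <- as.data.frame(cbind(name, day), stringsAsFactors = FALSE)
# data.frame() keeps each column's own type
via_df    <- data.frame(name, day, stringsAsFactors = FALSE)

class(via_cbind$day)  # "character"
class(via_df$day)     # "numeric"

# Character values sort lexicographically, which is what scrambles the day columns
sort(unique(via_cbind$day))  # "1"  "14" "2"
sort(unique(via_df$day))     # 1 2 14
```

For the question's data, the equivalent change would be data.frame(Names, Day, Store, Item, stringsAsFactors = FALSE) in place of as.data.frame(cbind(...)).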
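To combine the day columns that have identical values, I first build a key for each day by pasting all of that day's rows into one string, and then number the days with rleid from data.table. rleid gives the same id only to consecutive days with the same key, so identical days that are not adjacent stay in separate columns.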
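The grouping idea can be sketched on its own with made-up keys. The helper run_id below is a hypothetical base-R stand-in that behaves like data.table::rleid() for a single vector (the id increases each time the key changes), and label_run is likewise illustrative; the answer's function itself uses rleid directly.

```r
# Made-up per-day keys; in the answer the key is the pasted content of each day.
day <- c(1, 2, 8, 9, 10, 14)
key <- c("A", "B", "C", "C", "C", "A")

# Hypothetical base-R equivalent of data.table::rleid() for one vector:
# start a new id whenever the key differs from the previous element.
run_id <- function(x) cumsum(c(TRUE, x[-1] != x[-length(x)]))

grp <- run_id(key)
grp  # 1 2 3 3 3 4  (Days 1 and 14 share a key but are not adjacent)

# One label per run: a single day, or "first - last" for a run of days
label_run <- function(d) {
  if (length(d) > 1) sprintf("Day %s - %s", min(d), max(d)) else sprintf("Day %s", d)
}
labels <- unname(vapply(split(day, grp), label_run, character(1)))
labels  # "Day 1" "Day 2" "Day 8 - 10" "Day 14"
```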
You can use the following function:
library(tidyverse)
library(data.table)
library(flextable)

Shop_fcn <- function(data){
  # Per day and item: number of distinct shoppers (n_nam) and purchases (n_item)
  data %>%
    group_by(Day = as.numeric(Day)) %>%
    mutate(N_nam = n_distinct(Names)) %>%
    group_by(Names, Day, N_nam, Store, Item) %>%
    summarize(n_item = n()) %>%
    group_by(Day, N_nam, Store, Item) %>%
    summarize(n_nam = n(),
              n_item = sum(n_item)) %>%
    ungroup -> tmp

  tmp %>%
    # One key string per day summarising all of that day's rows (N_nam is
    # included so the percentages also match); rleid() gives consecutive
    # days with the same key the same group id
    group_by(Day) %>%
    summarise(txt = paste(N_nam, n_nam, n_item, Store, Item, sep = '-', collapse = ',')) %>%
    mutate(grp = rleid(txt)) %>%
    select(-txt) %>%
    left_join(tmp, by = 'Day') %>%
    # Label each group as a single day or as a range of days
    group_by(grp) %>%
    mutate(pct = round(n_nam/N_nam*100, digits = 1),
           txt = paste0(n_nam, " (", pct, "%)"),
           Day_n = if (n_distinct(Day) > 1)
             sprintf('Day %s - %s (N=%s)', first(Day), last(Day), N_nam)
           else sprintf('Day %s (N=%s)', Day, N_nam)) %>%
    ungroup %>%
    select(Day_n, Store, Item, txt) %>%
    pivot_wider(values_from = txt, names_from = Day_n, values_fn = first, values_fill = '') %>%
    arrange(Store, Item) %>%
    # Show each store name only on its first row
    group_by(Store) %>%
    mutate(Store = if_else(row_number() != 1, "", Store)) %>%
    ungroup()
}
For the data in your previous post, this returns:
Shop_day <- Shop_list %>% Shop_fcn()
flextable(Shop_day)
For the data in this post, it returns: