R: Function to combine columns with identical data

Link to original post

In my previous post (see link above), I wanted to know how to combine columns that have the same data and change the column name to reflect the range. I started with a function that produced this: [image]

and the accepted answer produced my desired output: [image]

I applied that same function

library(dplyr)
library(flextable)
library(stringr)
library(tidyverse)

Shop_fcn <- function(data){
  data %>%
    group_by(Day) %>%
    mutate(N_nam = n_distinct(Names)) %>%
    group_by(Names, Day, N_nam, Store, Item) %>%
    summarize(n_item = n()) %>%
    group_by(Day, N_nam, Store, Item) %>%
    summarize(n_nam = n(),
              n_item = sum(n_item))%>%
    mutate(pct = round(n_nam/N_nam*100,digits = 1),
           txt = paste0( n_nam, " (", pct, "%)"),
           Day_n = (paste0("Day ", Day," (N=",  N_nam, ")")))%>%
    ungroup %>% 
    select(Day_n , Store, Item, txt) %>%
    group_by(Store, Item, txt) %>%
    summarise(Day_n = if(n() > 1) 
      sprintf('Day %s %s', paste(range(readr::parse_number(unique(Day_n))), 
                                 collapse=' - '), 
              str_remove(first(Day_n), '^[^(]+')) else Day_n) %>%
    pivot_wider(values_from = txt, names_from = Day_n) %>%
    mutate_at(vars(starts_with(c("Day"))), ~if_else(is.na(.), "", .)) %>%
    arrange(Store, Item) %>% 
    group_by(store2 = Store) %>% 
    mutate(Store = if_else(row_number() != 1, "", Store))%>%
    ungroup() %>%
    select(Store, Item, str_sort(names(.)[-(1:2)], numeric = TRUE), -store2)
  
}


to a larger data set

Names <- as.character(c('Adam','Morticia','Morticia','Morticia','Morticia','Morticia','Morticia','Morticia',
'Morticia','Morticia','Morticia','Morticia','Mickey','Minnie','Minnie','Minnie','Minnie','Minnie',
'Lucy', 'Lucy','Lucy','Morticia','Morticia','Morticia','Adam','Gomez','Olive','Olive','Olive',  
'Ricky','Morticia','Adam','Eve','Ricky','Morticia','Morticia','Minnie','Adam','Lucy','Ricky',
'Ricky','Ricky','Ricky','Ricky','Minnie','Adam','Adam', 'Morticia', 'Adam', 'Adam', 'Adam', 'Adam', 
'Adam','Lucy','Olive','Eve','Gomez','Morticia','Mickey','Olive'))

Day <- as.numeric(c(1,1,2,3,6,8,9,10,11,12,13,14,1,1,2,5,6,14,1,1,14,4,4,4,2,1,1,1,14,1,5,2,    
       1,1,4,5,3,2,1,1,14,14,14,14,4,2,2,4,2,2,2,2,14,1,1,14,14,7,14,1))

Store <- as.character(c('None','None','None','None','None','None','None','None','None','None',
'None','None','None','None','None','None','None','None','ACE','ACE','ACE','ACE','Amazon','Amazon',
'Best Buy','CVS','Hobby Lobby','Hobby Lobby','Hobby Lobby','Home Depot','Home Depot',   
'Ikea','Ikea','Ikea','Ikea','Ikea','Ikea','Lowes','Lowes','Petco','Petco','Petco','Petco',
'Petco','Petco','Target','Target','Target','Walgreens','Walgreens','Walgreens','Walgreens',
'Walgreens','Walgreens','Walgreens','Walmart','Walgreens','Walgreens','Walgreens','Walgreens'))

Item <- as.character(c('None','None','None','None','None','None','None','None','None','None','None','None',
'None','None', 'None','None','None','None', 'Hammer','Nails','Plywood', 'Bricks','Frame','Batteries','TV','Advil',
'Brush','Paint','Paint','Level','Wrench','Pillow',  'Blanket','Lamp','Vase','Table','Chair','Screwdriver','Plunger','Cat food',  
'Cat litter','Goldfish','Dog food','Dog treat','Hamster','Rug','Vacuum',
 'Gloves','Tylenol','Napkins','Benadryl','Soap','Soap','Shampoo','Conditioner','Lotion',    
'Lotion','Foil','Lotion','Foil'))


Shop_list <- as.data.frame(cbind(Names, Day, Store, Item), stringsAsFactors=FALSE)
Shop_day<- Shop_list %>%
  bind_rows() %>%
  Shop_fcn ()

flextable(Shop_day)

and got the following: [image]

Days 1 - 14 and Days 3 - 5 should not have been combined.

Applying my original function gets me closer to my desired output:


Shop_fcn <- function(data){
  data %>%
    group_by(Day) %>%
    mutate(N_nam = n_distinct(Names)) %>%
    group_by(Names, Day, N_nam, Store, Item) %>%
    summarize(n_item = n()) %>%
    group_by(Day, N_nam, Store, Item) %>%
    summarize(n_nam = n(),
              n_item = sum(n_item))%>%
    mutate(pct = round(n_nam/N_nam*100,digits = 1),
           txt = paste0( n_nam, " (", pct, "%)"),
           Day_n = (paste0("Day ", Day," (N=",  N_nam, ")")))%>%
    ungroup %>% select(Day_n , Store, Item, txt) %>%
    pivot_wider(values_from = txt, names_from = Day_n) %>%
    mutate_at(vars(starts_with(c("Day"))), ~if_else(is.na(.), "", .)) %>%
    arrange(Store, Item) %>% 
    group_by(store2 = Store) %>% 
    mutate(Store = if_else(row_number() != 1, "", Store))%>%
    ungroup() %>% select(-store2)
}
Shop_day<- Shop_list %>%
  bind_rows() %>%
  Shop_fcn ()

flextable(Shop_day)

[image]

However, I'm now stuck with the same problem of combining identical days (specifically, the columns for Days 8-13), plus a new issue: the Day columns are not ordered 1-14.

I'm not sure whether the best solution is to modify the function, or to apply a new function to the flextable that combines the columns and their respective column names.

I tried removing the duplicate columns, but I still couldn't work out how to make the names of the duplicated columns appear as a range, or how to get the columns in the proper order.

Shop_nodup <- Shop_day[!duplicated(as.list(Shop_day))]
flextable(Shop_nodup)
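
To make the limitation of this attempt concrete, here is a minimal sketch on a hypothetical toy table (the column names and values are invented for illustration, not taken from the data above). It suggests that `duplicated(as.list(...))` keeps only the first column of each identical set, so the later day names are lost, and that it also drops identical columns that are not adjacent.

```r
# Hypothetical toy table: Day 3 repeats Day 2, and Day 4 repeats Day 1.
toy <- data.frame(
  Item    = c("Soap", "Foil"),
  `Day 1` = c("1", ""),
  `Day 2` = c("", "1"),
  `Day 3` = c("", "1"),
  `Day 4` = c("1", ""),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# TRUE marks a column whose contents already appeared in an earlier column.
dup <- duplicated(as.list(toy))
dup
# FALSE FALSE FALSE TRUE TRUE

# Only the first name of each identical set survives, so "Day 3" cannot be
# turned into a "Day 2 - 3" label, and the non-adjacent "Day 4" is dropped too.
kept <- names(toy)[!dup]
kept
# "Item" "Day 1" "Day 2"
```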

[image]

  • The column names are out of order because the Day column is of type character instead of numeric. Converting it to numeric puts the columns in the required order. The numbers became characters because your data generation code uses as.data.frame(cbind(....)): cbind converts the data to a matrix, and since a matrix can hold only one type of data, the numbers are coerced to character. Using data.frame(....) instead would have kept each column's class intact.

  • To combine day columns that hold the same values, I use rleid after creating a unique key from the values in each day.

You can use the following function:
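
The first point can be checked with a short sketch. The vectors below are hypothetical toy data, not the data from the question; the point is only to compare the column classes produced by the two construction styles.

```r
# Hypothetical toy vectors, used only to compare the two construction styles.
Names <- c("Adam", "Eve", "Lucy")
Day   <- c(10, 2, 1)

# cbind() builds a character matrix first, so Day is coerced to character.
via_cbind <- as.data.frame(cbind(Names, Day), stringsAsFactors = FALSE)

# data.frame() keeps each vector's own type.
via_df <- data.frame(Names, Day, stringsAsFactors = FALSE)

class(via_cbind$Day)  # "character"
class(via_df$Day)     # "numeric"

# Numeric sorting gives the expected day order.
sort(via_df$Day)      # 1 2 10
```

With a character Day, "10" sorts before "2", which is why the day columns came out in an unexpected order.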

library(tidyverse)
library(data.table)
library(flextable)

Shop_fcn <- function(data){
  data %>%
    group_by(Day = as.numeric(Day)) %>%
    mutate(N_nam = n_distinct(Names)) %>%
    group_by(Names, Day, N_nam, Store, Item) %>%
    summarize(n_item = n()) %>%
    group_by(Day, N_nam, Store, Item) %>%
    summarize(n_nam = n(),
              n_item = sum(n_item)) %>%
    ungroup -> tmp
  
  tmp %>%
    group_by(Day) %>%
    summarise(txt = paste(n_nam, n_item, Store, Item, sep = '-', collapse = ',')) %>%
    mutate(grp = rleid(txt)) %>%
    select(-txt) %>%
    left_join(tmp, by = 'Day') %>%
    group_by(grp) %>%
    mutate(pct = round(n_nam/N_nam*100,digits = 1),
           txt = paste0( n_nam, " (", pct, "%)"),
           Day_n = if(n_distinct(Day) > 1) sprintf('Day %s - %s (N = %s)', first(Day), last(Day), N_nam) else sprintf('Day %s (N=%s)', Day, N_nam)) %>% 
    ungroup %>% 
    select(Day_n, Store, Item, txt) %>%
    pivot_wider(values_from = txt, names_from = Day_n, values_fn = first, values_fill = '') %>%
    arrange(Store, Item) %>% 
    group_by(Store) %>% 
    mutate(Store = if_else(row_number() != 1, "", Store)) %>%
    ungroup()
}
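
The run-grouping idea behind the rleid step can be illustrated in isolation. This is a sketch under stated assumptions: the per-day keys are hypothetical, and base R's rle is used to build the run ids so the example has no package dependencies; data.table::rleid(key) would give the same ids.

```r
# Hypothetical per-day keys: days whose contents are identical share a key.
Day <- 1:6
key <- c("a", "b", "b", "b", "c", "b")

# Give each run of consecutive identical keys its own id.
# data.table::rleid(key) yields the same ids.
runs <- rle(key)
grp  <- rep(seq_along(runs$lengths), runs$lengths)
grp
# 1 2 2 2 3 4

# Build one label per run: a range for runs of several days, a single day otherwise.
labels <- unname(vapply(split(Day, grp), function(d) {
  if (length(d) > 1) sprintf("Day %s - %s", min(d), max(d)) else sprintf("Day %s", d)
}, character(1)))
labels
# "Day 1" "Day 2 - 4" "Day 5" "Day 6"
```

Because the ids follow consecutive runs, Day 6 is not merged with Days 2 - 4 even though its key matches. That is the behavior wanted here: only adjacent identical days should collapse into a range.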

For the data in your previous post, Shop_fcn returns:

Shop_day<- Shop_list %>% Shop_fcn
flextable(Shop_day)

[image]

For the data in this post, it returns:

[image]
