
Combine row values into character vector by condition

I have a data.frame in which values are repeated in col1.

col1 <- c("A", "A", "B", "B", "C")
col2 <- c(1995, 1997, 1999, 2000, 2005)

df <- data.frame(col1, col2)

I want to combine the values in col2 that correspond to the same letter in col1 into one cell, so that col2 shows the range of values for each letter in col1. I do this by splitting the data.frame by col1, applying fun to each piece, and binding the pieces back together.

library(tidyverse)

split_df <- split(df, df$col1)

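fun <- function(df) {
  # Only collapse groups that contain more than one distinct year
  if (length(unique(df$col2)) > 1) {
    # Overwrite col2 with "min-max" on every row of the group
    df$col2 <- paste(min(df$col2), max(df$col2), sep = "-")
    # The rows of the group are now identical, so keep a single one
    df <- distinct(df)
  }
  return(df)
}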

split_df <- lapply(split_df, fun)

df <- do.call(rbind, split_df)

This works, but is there a more intuitive or more efficient solution?

A base R approach using aggregate:

aggregate(col2 ~ col1, df, function(x) paste0(unique(range(x)), collapse = '-'))

#  col1      col2
#1    A 1995-1997
#2    B 1999-2000
#3    C      2005
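The reason no explicit check for single-year groups is needed can be seen by looking at the anonymous function on its own. The sketch below is only an illustration in base R; the name `collapse_range` is a hypothetical helper and is not part of the answer above.

```r
# Hypothetical helper (illustrative name, not from the original answer)
collapse_range <- function(x) {
  # range() returns c(min, max); unique() drops the duplicate when min == max,
  # so a single-year group yields one value and no "-" is inserted
  paste0(unique(range(x)), collapse = "-")
}

single  <- collapse_range(2005)
several <- collapse_range(c(1999, 2000, 1997))
```

Because the function works on the minimum and maximum, the order of the years within a group does not matter.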

The same can be written with dplyr:

library(dplyr)

df %>% 
  group_by(col1) %>%
  summarise(col2 = paste0(unique(range(col2)), collapse = '-'))

One option would be the tidyverse, where you can accomplish this a little more succinctly. The basic idea is the same:

library(tidyverse)

new.result <- df %>% 
  group_by(col1) %>% 
  summarize(
    # The condition is a single TRUE/FALSE per group, so a plain if/else is enough
    col2 = if (n() == 1) as.character(col2) else paste(min(col2), max(col2), sep = '-')
  )

  col1  col2     
  <chr> <chr>    
1 A     1995-1997
2 B     1999-2000
3 C     2005     

A different (but possibly overcomplicated) approach assumes that you have at most two years per group and that the rows within each group are in chronological order. We can pivot the start and end years into their own columns and then paste them together directly. This requires a little more data transformation but avoids having to check explicitly for groups with one year:

df %>% 
  group_by(col1) %>% 
  mutate(n = row_number()) %>% 
  pivot_wider(names_from = n, values_from = col2) %>% 
  rowwise() %>% 
  mutate(
    vec = list(c(`1`, `2`)),
    col2 = paste(vec[!is.na(vec)], collapse = '-')
  ) %>% 
  select(col1, col2)
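If a group can contain more than two years, pasting only the columns `1` and `2` would no longer give the full range. As a rough sketch of how the range-based idea extends to that case, here is a base R version using tapply. The data frame `df3` and the names `ranges` and `result` are made up for illustration.

```r
# Hypothetical data: group "A" has three years, group "B" has one
df3 <- data.frame(
  col1 = c("A", "A", "A", "B"),
  col2 = c(1995, 1997, 2001, 1999)
)

# tapply() applies the function to the col2 values within each level of col1
ranges <- tapply(df3$col2, df3$col1, function(x) {
  paste(unique(range(x)), collapse = "-")
})

# Turn the named result back into a two-column data.frame
result <- data.frame(col1 = names(ranges), col2 = as.vector(ranges))
```

This keeps one row per group regardless of how many years the group has, at the cost of going through a named vector before rebuilding the data.frame.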

