
How to collapse multiple rows with a condition into one row using dplyr in R?

I'll use an example to illustrate my problem.

Sample data:

df <- data.frame(ID = 1:5, Description = c("'foo' is a dog", "'bar' is a dog", "'foo' is a cat", "'foo' is not a cat", "'bar' is a fish"), Category = c("A", "A", "B", "B", "C"))

> df
  ID        Description Category
1  1     'foo' is a dog        A
2  2     'bar' is a dog        A
3  3     'foo' is a cat        B
4  4 'foo' is not a cat        B
5  5    'bar' is a fish        C

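What I want is to collapse rows in the same Category whose descriptions are identical apart from the single-quoted word, combining their IDs and quoted words. Expected output: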

 ID  Category          Description
1 3     B        ‘foo’ is a cat    
2 1,2   A        ‘foo,bar’ is a dog
3 5     C        ‘bar’ is a fish   
4 4     B        ‘foo’ is not a cat

I'd like to do this with dplyr, but I can't quite figure out how. Can anyone help?

library(dplyr)

df %>% 
  ## Text outside the single quotes, e.g. " is a dog". Rows in the same
  ## Category with the same text here should be collapsed into one row;
  ## all other rows should be left as they are.
  mutate(outside = gsub("^'.*?'(.*)", "\\1", Description)) %>%
  group_by(Category, outside) %>%
  ## then collapse (still need to rebuild the description as 'foo,bar' is a dog)
  summarise(new_description = paste(Description, collapse = ","))
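The thread relies on two regular expressions: `gsub("^'(.*?)'.*", "\\1", x)` keeps the word inside the single quotes, while `gsub("^'.*?'(.*)", "\\1", x)` keeps the text after the closing quote. A minimal sketch to check this on two sample strings from `df` (the names `x`, `inside` and `outside` are only illustrative):

```r
# Two sample descriptions taken from df$Description
x <- c("'foo' is a dog", "'foo' is not a cat")

# Capture group 1 = the word between the single quotes
inside <- gsub("^'(.*?)'.*", "\\1", x)

# Capture group 1 = everything after the closing quote
outside <- gsub("^'.*?'(.*)", "\\1", x)

print(inside)   # "foo" "foo"
print(outside)  # " is a dog" " is not a cat"
```

Because each description contains exactly two quotes, the lazy `.*?` and a greedy `.*` would match the same text here.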

Just figured it out; feel free to suggest a better solution:

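library(dplyr)
library(stringr)

df %>% 
  # sec = text after the quoted word (e.g. " is a dog"),
  # content = the quoted word itself (e.g. "foo")
  mutate(sec = gsub("^'.*?'(.*)", "\\1", Description),
         content = gsub("^'(.*?)'.*", "\\1", Description)) %>% 
  group_by(sec, Category) %>%
  # collapse the IDs and quoted words within each group
  summarise(
    ID=str_c(unique(ID), collapse=","),
    content=str_c(unique(content), collapse=",")) %>%
  # sQuote() adds the quotes back; they are curly (‘foo’) in a UTF-8 session with the default useFancyQuotes = TRUE
  mutate(Description=str_c(sQuote(content), sec)) %>%
  ungroup() %>%
  dplyr::select(ID, Category, Description)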

Here is another way to get the output:

library(tidyverse)

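df %>%
  # pull out the quoted word (e.g. "'foo'") and keep the rest of the sentence
  mutate(value = str_extract(Description, "'\\w+'"), 
         Description = trimws(str_remove(Description, value))) %>%
  # rows sharing the same remaining text and Category form one group
  group_by(Description, Category) %>%
  # join the IDs and the bare words, then wrap the words in quotes again
  summarise(ID = toString(ID), 
            value = sprintf("'%s'", toString(gsub("'", "", value)))) %>%
  # put the quoted words back in front of the remaining text
  unite(Description, value, Description, sep = ' ')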

#  Description         Category ID   
#  <chr>               <chr>    <chr>
#1 'foo' is a cat      B        3    
#2 'foo, bar' is a dog A        1, 2 
#3 'bar' is a fish     C        5    
#4 'foo' is not a cat  B        4    
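This second approach prints `'foo, bar'` and `1, 2` because `toString()` always joins with `", "`, whereas the expected output uses a bare comma (`'foo,bar'`, `1,2`). A minimal comparison, assuming the bare-comma form is the one wanted (the variable names are only illustrative):

```r
ids <- c(1, 2)
words <- c("foo", "bar")

# toString() joins with ", " (comma followed by a space)
joined_tostring <- toString(ids)

# paste(collapse = ",") joins with a bare comma
joined_paste <- paste(ids, collapse = ",")

# Re-quote the collapsed words as the second answer does, but without the space
quoted <- sprintf("'%s'", paste(words, collapse = ","))

print(joined_tostring)  # "1, 2"
print(joined_paste)     # "1,2"
print(quoted)           # "'foo,bar'"
```

Swapping `toString(...)` for `paste(..., collapse = ",")` inside `summarise()` should therefore reproduce the separators of the expected output.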


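Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.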

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM