
How can I collapse multiple rows into one based on empty rows in another column in R?

I have the following data frame, which comes from a text file:

Account_No            Title                                      Date

52683    DESIGN IN THE TERRAIN OF WATER / MATHUR,ANURADHA.      8/03/2019
6224     KABIR IN MALWA                                        29/04/2015
25801    A VILLAGE IS A BUSY PLACE / GEETHA,V.                  5/06/2020
11439    KABIR IN AMERICA                                      29/04/2015
25802    A VILLAGE IS A BUSY PLACE / GEETHA,V.                  5/06/2020
7843     IN EVERY BODY KABIR                                   29/04/2015
13013    MOBY-DICK : A POP-UP BOOK / ITA,SAM.                  22/01/2020
38110    DEVELOPING SUBJECT KNOWLEDGE IN DESIGN AND            29/11/2010
         TECHNOLOGY : STRUCTURES / OWEN-JACKSON, GWYNETH &
         MYERSO.
38118    SCIENCE COMMUNICATION IN THEORY AND PRACTICE           6/12/2010
          / STOCKLMAYER, SUSAN M Et al (ED.
7844     KABIR IN THUMRI                                       29/04/2015
25042    TRAVELLING CONCEPTS IN THE HUMANITIES - A ROUGH       13/04/2018
         GUIDE / BAL,MIEKE.
         001.3 BAL

In Excel, I used Text to Columns (on the Data tab) with the fixed-width option to create the following table:

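Account_No  Title                                              Date
52683       DESIGN IN THE TERRAIN OF WATER / MATHUR,ANURADHA.  08-03-2019
6224        KABIR IN MALWA                                     29-04-2015
25801       A VILLAGE IS A BUSY PLACE / GEETHA,V.              05-06-2020
11439       KABIR IN AMERICA                                   29-04-2015
25802       A VILLAGE IS A BUSY PLACE / GEETHA,V.              05-06-2020
7843        IN EVERY BODY KABIR                                29-04-2015
13013       MOBY-DICK : A POP-UP BOOK / ITA,SAM.               22-01-2020
38110       DEVELOPING SUBJECT KNOWLEDGE IN DESIGN AND         29-11-2010
            TECHNOLOGY : STRUCTURES / OWEN-JACKSON, GWYNETH &
            MYERSO.
38118       SCIENCE COMMUNICATION IN THEORY AND PRACTICE       06-12-2010
            / STOCKLMAYER, SUSAN M Et al (ED.
7844        KABIR IN THUMRI                                    29-04-2015
25042       TRAVELLING CONCEPTS IN THE HUMANITIES - A ROUGH    13-04-2018
            GUIDE / BAL,MIEKE.
            001.3 BAL
24655       EVALUTING RESEARCH : METHODOLOGY FOR PEOPLE WHO    05-11-2019
            NEED TO READ RESEARCH / DANE,FRANCIS C.
            001.4 DAN
30170       CASE STUDY RESEARCH METHODS / GILLHAM,BILL.        03-11-2011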

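The problem is that some book titles are spread over several rows (for example "DEVELOPING SUBJECT KNOWLEDGE IN DESIGN AND...", where Account_No is 38110), whereas each title should sit on a single row together with its Account_No.

How can I achieve this?

Output of dput() for the data: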

structure(list(Account_No = c("52683", "6224", "25801", "11439", 
"25802", "7843", "13013", "38110", "", "", "38118", "", "7844", 
"25042", "", "", "24655", "", "", "30170"), Title = c("DESIGN IN THE TERRAIN OF WATER / MATHUR,ANURADHA.", 
"KABIR IN MALWA", "A VILLAGE IS A BUSY PLACE / GEETHA,V.", "KABIR IN AMERICA", 
"A VILLAGE IS A BUSY PLACE / GEETHA,V.", "IN EVERY BODY KABIR", 
"MOBY-DICK : A POP-UP BOOK / ITA,SAM.", "DEVELOPING SUBJECT KNOWLEDGE IN DESIGN AND", 
"TECHNOLOGY : STRUCTURES / OWEN-JACKSON, GWYNETH &", "MYERSO.", 
"SCIENCE COMMUNICATION IN THEORY AND PRACTICE", "/ STOCKLMAYER, SUSAN M Et al (ED.", 
"KABIR IN THUMRI", "TRAVELLING CONCEPTS IN THE HUMANITIES - A ROUGH", 
"GUIDE / BAL,MIEKE.", "001.3 BAL", "EVALUTING RESEARCH : METHODOLOGY FOR PEOPLE WHO", 
"NEED TO READ RESEARCH / DANE,FRANCIS C.", "001.4 DAN", "CASE STUDY RESEARCH METHODS / GILLHAM,BILL."
), Date = c("08-03-2019", "29-04-2015", "05-06-2020", "29-04-2015", 
"05-06-2020", "29-04-2015", "22-01-2020", "29-11-2010", "", "", 
"06-12-2010", "", "29-04-2015", "13-04-2018", "", "", "05-11-2019", 
"", "", "03-11-2011")), row.names = c(NA, 20L), class = "data.frame")

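There may well be nicer ways to do this, but this works. I am using a for loop; perhaps someone can help me replace it with something more efficient.

The logic is as follows. First I convert the empty strings "" to NA. The column that needs fixing here is the second one (Title). I then run a for loop from the last row (nrow gives the last row number) up to the second row, i.e. I work backwards from the bottom. For each row, say row 10, I look at column n-1 of that row, which here is 2 - 1 = 1. If the first column is NA in that row, I check the second column of the same row. If that is not NA, the text is a continuation of the title in the row above, so I paste it onto the second column of the previous row and set the current cell to NA. I repeat this until I reach the second row.

It is a little tricky for me to explain in words, but the concept is simple.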

library(tidyverse)
df <- structure(list(
  Account_No = c("52683", "6224", "25801", "11439", 
    "25802", "7843", "13013", "38110", "", "", "38118", "", "7844", 
    "25042", "", "", "24655", "", "", "30170"),
  Title = c("DESIGN IN THE TERRAIN OF WATER / MATHUR,ANURADHA.", 
    "KABIR IN MALWA", "A VILLAGE IS A BUSY PLACE / GEETHA,V.", "KABIR IN AMERICA", 
    "A VILLAGE IS A BUSY PLACE / GEETHA,V.", "IN EVERY BODY KABIR", 
    "MOBY-DICK : A POP-UP BOOK / ITA,SAM.", "DEVELOPING SUBJECT KNOWLEDGE IN DESIGN AND", 
    "TECHNOLOGY : STRUCTURES / OWEN-JACKSON, GWYNETH &", "MYERSO.", 
    "SCIENCE COMMUNICATION IN THEORY AND PRACTICE", "/ STOCKLMAYER, SUSAN M Et al (ED.", 
    "KABIR IN THUMRI", "TRAVELLING CONCEPTS IN THE HUMANITIES - A ROUGH", 
    "GUIDE / BAL,MIEKE.", "001.3 BAL", "EVALUTING RESEARCH : METHODOLOGY FOR PEOPLE WHO", 
    "NEED TO READ RESEARCH / DANE,FRANCIS C.", "001.4 DAN", "CASE STUDY RESEARCH METHODS / GILLHAM,BILL."),
  Date = c("08-03-2019", "29-04-2015", "05-06-2020", "29-04-2015", 
    "05-06-2020", "29-04-2015", "22-01-2020", "29-11-2010", "", "", 
    "06-12-2010", "", "29-04-2015", "13-04-2018", "", "", "05-11-2019", 
    "", "", "03-11-2011")
), row.names = c(NA, 20L), class = "data.frame")


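# Merge continuation rows into the row above them.
# df:       data in which empty cells have already been converted to NA
# col_numb: index of the text column to merge; the column to its left
#           (col_numb - 1) is the key column that is NA on continuation rows
roller_coaster <- function(df, col_numb){
  # Walk from the last row up to the second so multi-line titles chain upwards
  for(i in nrow(df):2){
    if(is.na(df[i, col_numb - 1])){
      if(!is.na(df[i, col_numb])){
        df[i - 1, col_numb] <- paste(df[i - 1, col_numb], df[i, col_numb], sep = ' ')
        df[i, col_numb] <- NA
      }
    }
  }
  df
}

df %>% 
  as_tibble() %>% 
  # "" -> NA in every column (lambda form; passing extra arguments
  # through across() is deprecated in recent dplyr versions)
  mutate(across(everything(), ~ na_if(.x, ""))) %>% 
  roller_coaster(2) %>% 
  # Continuation rows are now entirely NA, so drop them
  drop_na()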
#> # A tibble: 13 x 3
#>    Account_No Title                                                     Date    
#>    <chr>      <chr>                                                     <chr>   
#>  1 52683      DESIGN IN THE TERRAIN OF WATER / MATHUR,ANURADHA.         08-03-2…
#>  2 6224       KABIR IN MALWA                                            29-04-2…
#>  3 25801      A VILLAGE IS A BUSY PLACE / GEETHA,V.                     05-06-2…
#>  4 11439      KABIR IN AMERICA                                          29-04-2…
#>  5 25802      A VILLAGE IS A BUSY PLACE / GEETHA,V.                     05-06-2…
#>  6 7843       IN EVERY BODY KABIR                                       29-04-2…
#>  7 13013      MOBY-DICK : A POP-UP BOOK / ITA,SAM.                      22-01-2…
#>  8 38110      DEVELOPING SUBJECT KNOWLEDGE IN DESIGN AND TECHNOLOGY : … 29-11-2…
#>  9 38118      SCIENCE COMMUNICATION IN THEORY AND PRACTICE / STOCKLMAY… 06-12-2…
#> 10 7844       KABIR IN THUMRI                                           29-04-2…
#> 11 25042      TRAVELLING CONCEPTS IN THE HUMANITIES - A ROUGH GUIDE / … 13-04-2…
#> 12 24655      EVALUTING RESEARCH : METHODOLOGY FOR PEOPLE WHO NEED TO … 05-11-2…
#> 13 30170      CASE STUDY RESEARCH METHODS / GILLHAM,BILL.               03-11-2…

Created on 2021-02-11 by the reprex package (v0.3.0)
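
As one possible loop-free alternative (a minimal sketch, not taken from the answer above), each row with a non-empty Account_No can be treated as the start of a record, and cumsum() over that flag numbers the records so the Title fragments can be pasted together per record. The function name collapse_continuations, its arguments, and the toy data are hypothetical and only for illustration; the sketch assumes the first row always starts a record.

```r
# Hypothetical helper: collapse continuation rows without an explicit loop.
# Assumes continuation rows have "" or NA in key_col and that row 1 starts a record.
collapse_continuations <- function(df, key_col = "Account_No", text_col = "Title") {
  key <- df[[key_col]]
  is_start <- !is.na(key) & key != ""
  stopifnot(nrow(df) > 0, is_start[1])

  # 1, 1, 1, 2, ... : every row gets the number of the record it belongs to
  group_id <- cumsum(is_start)

  # Paste the text fragments of each record into one string, in record order
  merged <- tapply(df[[text_col]], group_id, paste, collapse = " ")

  # Keep only the starting rows (they carry the key and the date)
  out <- df[is_start, , drop = FALSE]
  out[[text_col]] <- unname(as.vector(merged))
  rownames(out) <- NULL
  out
}

# Made-up toy data with the same shape as the question's data frame
toy <- data.frame(
  Account_No = c("1", "", "", "2"),
  Title      = c("FIRST PART", "SECOND PART", "THIRD PART", "SINGLE LINE"),
  Date       = c("01-01-2020", "", "", "02-01-2020"),
  stringsAsFactors = FALSE
)

result <- collapse_continuations(toy)
print(result)
```

Because only the starting rows are kept, the original row order is preserved and no NA conversion or drop_na() step is needed in this variant.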

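Here is how to do it with the tidyverse. Explanation: first change the blank cells of Account_No to NA, then fill each NA with the value above it until the next value, then group by Account_No and combine the rows of Title into one row, and finally join back to the original data to pick up Date and select the columns needed for the desired output.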

library(tidyverse)

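df1 <- df %>% 
  # Blank Account_No cells mark continuation rows; turn them into NA
  mutate(Account_No = na_if(Account_No, "")) %>% 
  # Carry the last seen Account_No down onto the continuation rows
  fill(Account_No) %>% 
  group_by(Account_No) %>% 
  # One row per account, with the title fragments joined by a space
  summarise(Title = paste(Title, collapse = " ")) %>% 
  ungroup() %>% 
  # Join back to the original data to recover Date
  # (the original call also passed df1, which does not exist yet at this point)
  left_join(df, by = "Account_No") %>% 
  select(Account_No, Title = Title.x, Date)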
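
To make the fill-then-group idea easier to follow, here is a small trace on made-up data (a sketch only; the toy data frame and the names filled and collapsed are hypothetical). It shows the intermediate result after fill() and the result after collapsing.

```r
library(dplyr)
library(tidyr)

# Made-up toy data: the second row is a continuation of the first
toy <- data.frame(
  Account_No = c("1", "", "2"),
  Title      = c("A", "B", "C"),
  stringsAsFactors = FALSE
)

# Step 1: "" -> NA, then carry the previous Account_No downwards
filled <- toy %>% 
  mutate(Account_No = na_if(Account_No, "")) %>% 
  fill(Account_No)
print(filled)

# Step 2: one row per Account_No, title fragments joined by a space
collapsed <- filled %>% 
  group_by(Account_No) %>% 
  summarise(Title = paste(Title, collapse = " "), .groups = "drop")
print(collapsed)
```

Note that summarise() returns the groups sorted by Account_No (as character strings here), so on the real data the row order may differ from the input order.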

