How can I collapse multiple rows into one based on empty rows in another column in R?
I have the following data frame, which comes from a text file:
Account_No Title Date
52683 DESIGN IN THE TERRAIN OF WATER / MATHUR,ANURADHA. 8/03/2019
6224 KABIR IN MALWA 29/04/2015
25801 A VILLAGE IS A BUSY PLACE / GEETHA,V. 5/06/2020
11439 KABIR IN AMERICA 29/04/2015
25802 A VILLAGE IS A BUSY PLACE / GEETHA,V. 5/06/2020
7843 IN EVERY BODY KABIR 29/04/2015
13013 MOBY-DICK : A POP-UP BOOK / ITA,SAM. 22/01/2020
38110 DEVELOPING SUBJECT KNOWLEDGE IN DESIGN AND 29/11/2010
TECHNOLOGY : STRUCTURES / OWEN-JACKSON, GWYNETH &
MYERSO.
38118 SCIENCE COMMUNICATION IN THEORY AND PRACTICE 6/12/2010
/ STOCKLMAYER, SUSAN M Et al (ED.
7844 KABIR IN THUMRI 29/04/2015
25042 TRAVELLING CONCEPTS IN THE HUMANITIES - A ROUGH 13/04/2018
GUIDE / BAL,MIEKE.
001.3 BAL
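In Excel, I used Data to Columns with the fixed-width delimiter option to create the following table:
| Account_No | Title | Date |
|---|---|---|
| 52683 | DESIGN IN THE TERRAIN OF WATER / MATHUR,ANURADHA. | 08-03-2019 |
| 6224 | KABIR IN MALWA | 29-04-2015 |
| 25801 | A VILLAGE IS A BUSY PLACE / GEETHA,V. | 05-06-2020 |
| 11439 | KABIR IN AMERICA | 29-04-2015 |
| 25802 | A VILLAGE IS A BUSY PLACE / GEETHA,V. | 05-06-2020 |
| 7843 | IN EVERY BODY KABIR | 29-04-2015 |
| 13013 | MOBY-DICK : A POP-UP BOOK / ITA,SAM. | 22-01-2020 |
| 38110 | DEVELOPING SUBJECT KNOWLEDGE IN DESIGN AND | 29-11-2010 |
|  | TECHNOLOGY : STRUCTURES / OWEN-JACKSON, GWYNETH & |  |
|  | MYERSO. |  |
| 38118 | SCIENCE COMMUNICATION IN THEORY AND PRACTICE | 06-12-2010 |
|  | / STOCKLMAYER, SUSAN M Et al (ED. |  |
| 7844 | KABIR IN THUMRI | 29-04-2015 |
| 25042 | TRAVELLING CONCEPTS IN THE HUMANITIES - A ROUGH | 13-04-2018 |
|  | GUIDE / BAL,MIEKE. |  |
|  | 001.3 BAL |  |
| 24655 | EVALUTING RESEARCH : METHODOLOGY FOR PEOPLE WHO | 05-11-2019 |
|  | NEED TO READ RESEARCH / DANE,FRANCIS C. |  |
|  | 001.4 DAN |  |
| 30170 | CASE STUDY RESEARCH METHODS / GILLHAM,BILL. | 03-11-2011 |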
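The problem is that some book titles are spread over several rows (for example "DEVELOPING SUBJECT KNOWLEDGE IN DESIGN AND...", where the Account_No is 38110), whereas each title should sit on a single row together with its Account_No.
How can I do this?
Data as dput output: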
structure(list(Account_No = c("52683", "6224", "25801", "11439",
"25802", "7843", "13013", "38110", "", "", "38118", "", "7844",
"25042", "", "", "24655", "", "", "30170"), Title = c("DESIGN IN THE TERRAIN OF WATER / MATHUR,ANURADHA.",
"KABIR IN MALWA", "A VILLAGE IS A BUSY PLACE / GEETHA,V.", "KABIR IN AMERICA",
"A VILLAGE IS A BUSY PLACE / GEETHA,V.", "IN EVERY BODY KABIR",
"MOBY-DICK : A POP-UP BOOK / ITA,SAM.", "DEVELOPING SUBJECT KNOWLEDGE IN DESIGN AND",
"TECHNOLOGY : STRUCTURES / OWEN-JACKSON, GWYNETH &", "MYERSO.",
"SCIENCE COMMUNICATION IN THEORY AND PRACTICE", "/ STOCKLMAYER, SUSAN M Et al (ED.",
"KABIR IN THUMRI", "TRAVELLING CONCEPTS IN THE HUMANITIES - A ROUGH",
"GUIDE / BAL,MIEKE.", "001.3 BAL", "EVALUTING RESEARCH : METHODOLOGY FOR PEOPLE WHO",
"NEED TO READ RESEARCH / DANE,FRANCIS C.", "001.4 DAN", "CASE STUDY RESEARCH METHODS / GILLHAM,BILL."
), Date = c("08-03-2019", "29-04-2015", "05-06-2020", "29-04-2015",
"05-06-2020", "29-04-2015", "22-01-2020", "29-11-2010", "", "",
"06-12-2010", "", "29-04-2015", "13-04-2018", "", "", "05-11-2019",
"", "", "03-11-2011")), row.names = c(NA, 20L), class = "data.frame")
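There may be nicer ways to do this, but this works. I am running a for loop; perhaps someone can help me replace it with a more efficient approach.
The logic is as follows. First I convert the empty strings "" to NA. The column that needs to be modified here is the second column, so I run a for loop from the last row (nrow gives the last row number) down to the second row, that is, I work backwards from the last row. Say the current row is number 10: I look at the same row in column n-1, which here is 2-1, i.e. 1. If the first column is NA in that row, I check the second column of the same row. If that is not NA, the text is a continuation line, so I append it to the second column of the previous row. I repeat this until I reach the second row.
It is a bit tricky for me to explain in words, but the concept is simple.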
library(tidyverse)
df <- structure(list(Account_No = c("52683", "6224", "25801", "11439",
"25802", "7843", "13013", "38110", "", "", "38118", "", "7844",
"25042", "", "", "24655", "", "", "30170"), Title = c("DESIGN IN THE TERRAIN OF WATER / MATHUR,ANURADHA.",
"KABIR IN MALWA", "A VILLAGE IS A BUSY PLACE / GEETHA,V.", "KABIR IN AMERICA",
"A VILLAGE IS A BUSY PLACE / GEETHA,V.", "IN EVERY BODY KABIR",
"MOBY-DICK : A POP-UP BOOK / ITA,SAM.", "DEVELOPING SUBJECT KNOWLEDGE IN DESIGN AND",
"TECHNOLOGY : STRUCTURES / OWEN-JACKSON, GWYNETH &", "MYERSO.",
"SCIENCE COMMUNICATION IN THEORY AND PRACTICE", "/ STOCKLMAYER, SUSAN M Et al (ED.",
"KABIR IN THUMRI", "TRAVELLING CONCEPTS IN THE HUMANITIES - A ROUGH",
"GUIDE / BAL,MIEKE.", "001.3 BAL", "EVALUTING RESEARCH : METHODOLOGY FOR PEOPLE WHO",
"NEED TO READ RESEARCH / DANE,FRANCIS C.", "001.4 DAN", "CASE STUDY RESEARCH METHODS / GILLHAM,BILL."
), Date = c("08-03-2019", "29-04-2015", "05-06-2020", "29-04-2015",
"05-06-2020", "29-04-2015", "22-01-2020", "29-11-2010", "", "",
"06-12-2010", "", "29-04-2015", "13-04-2018", "", "", "05-11-2019",
"", "", "03-11-2011")), row.names = c(NA, 20L), class = "data.frame")
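roller_coaster <- function(df, col_numb) {
  # Walk backwards from the last row to the second row so that chains of
  # continuation rows are folded into the row above them one at a time.
  for (i in nrow(df):2) {
    # A continuation row has NA in the column to the left (Account_No)
    # but still has text in the target column (Title).
    if (is.na(df[[col_numb - 1]][i]) && !is.na(df[[col_numb]][i])) {
      df[[col_numb]][i - 1] <- paste(df[[col_numb]][i - 1], df[[col_numb]][i])
      df[[col_numb]][i] <- NA
    }
  }
  df
}

df %>%
  as_tibble() %>%
  mutate(across(everything(), ~ na_if(.x, ""))) %>%  # empty strings become NA
  roller_coaster(2) %>%
  drop_na()                                           # drop the emptied continuation rows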
#> # A tibble: 13 x 3
#> Account_No Title Date
#> <chr> <chr> <chr>
#> 1 52683 DESIGN IN THE TERRAIN OF WATER / MATHUR,ANURADHA. 08-03-2…
#> 2 6224 KABIR IN MALWA 29-04-2…
#> 3 25801 A VILLAGE IS A BUSY PLACE / GEETHA,V. 05-06-2…
#> 4 11439 KABIR IN AMERICA 29-04-2…
#> 5 25802 A VILLAGE IS A BUSY PLACE / GEETHA,V. 05-06-2…
#> 6 7843 IN EVERY BODY KABIR 29-04-2…
#> 7 13013 MOBY-DICK : A POP-UP BOOK / ITA,SAM. 22-01-2…
#> 8 38110 DEVELOPING SUBJECT KNOWLEDGE IN DESIGN AND TECHNOLOGY : … 29-11-2…
#> 9 38118 SCIENCE COMMUNICATION IN THEORY AND PRACTICE / STOCKLMAY… 06-12-2…
#> 10 7844 KABIR IN THUMRI 29-04-2…
#> 11 25042 TRAVELLING CONCEPTS IN THE HUMANITIES - A ROUGH GUIDE / … 13-04-2…
#> 12 24655 EVALUTING RESEARCH : METHODOLOGY FOR PEOPLE WHO NEED TO … 05-11-2…
#> 13 30170 CASE STUDY RESEARCH METHODS / GILLHAM,BILL. 03-11-2…
Created on 2021-02-11 by the reprex package (v0.3.0)
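The answer above asks for a way to replace the for loop. One possible loop-free sketch in base R is shown below. It assumes that a continuation row is exactly a row whose Account_No is blank and that the first row always starts a record. The helper name `collapse_rows` and the small `toy` data frame are made up for illustration and are not part of the original answer.

```r
# Toy data in the same shape as the question: a blank Account_No marks a continuation row.
toy <- data.frame(
  Account_No = c("38110", "", "", "7844", "25042", ""),
  Title      = c("DEVELOPING SUBJECT KNOWLEDGE IN DESIGN AND",
                 "TECHNOLOGY : STRUCTURES / OWEN-JACKSON, GWYNETH &",
                 "MYERSO.",
                 "KABIR IN THUMRI",
                 "TRAVELLING CONCEPTS IN THE HUMANITIES - A ROUGH",
                 "GUIDE / BAL,MIEKE."),
  Date       = c("29-11-2010", "", "", "29-04-2015", "13-04-2018", ""),
  stringsAsFactors = FALSE
)

# collapse_rows() is a hypothetical helper name chosen for this sketch.
collapse_rows <- function(df) {
  is_start <- df$Account_No != ""      # TRUE where a new record begins
  grp <- cumsum(is_start)              # running record id; continuation rows inherit it
  out <- df[is_start, , drop = FALSE]  # one row per record, in the original order
  # Join all title fragments of each record with a single space.
  out$Title <- as.vector(tapply(df$Title, grp, paste, collapse = " "))
  rownames(out) <- NULL
  out
}

res <- collapse_rows(toy)
res
```

Because the record id comes from `cumsum()`, the rows keep their original order and the Date of the first row of each record is retained. If the file could begin with a blank Account_No, the assumption breaks and that case would need to be handled first.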
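Here is how to do it with the tidyverse. Explanation: first change the blank cells of Account_No to NA, then fill each NA with the value above it until the next value, then group by Account_No, then combine the Title rows of each group into one row, and finally tidy up the columns to get the desired output.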
library(tidyverse)
df1 <- df %>%
  # Blank Account_No cells become NA so that fill() can carry values down
  mutate(Account_No = na_if(Account_No, "")) %>%
  fill(Account_No) %>%
  group_by(Account_No) %>%
  # One row per Account_No, with its title fragments joined by a space
  summarise(Title = paste(Title, collapse = " ")) %>%
  ungroup() %>%
  # Join back to the original data to recover Date; blank Account_No rows in df find no match
  left_join(df, by = "Account_No") %>%
  select(Account_No, Title = Title.x, Date)
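To make the fill-then-collapse steps described above easier to follow, here is a small sketch on a made-up `toy` tibble. It is a variation rather than the answer's exact code: instead of joining back to the original data, it takes the Date from the first row of each group, which assumes each Account_No identifies exactly one record.

```r
library(dplyr)
library(tidyr)

# Toy data: the blank Account_No in row 3 marks a continuation of row 2.
toy <- tibble(
  Account_No = c("7844", "25042", "", "30170"),
  Title      = c("KABIR IN THUMRI",
                 "TRAVELLING CONCEPTS IN THE HUMANITIES - A ROUGH",
                 "GUIDE / BAL,MIEKE.",
                 "CASE STUDY RESEARCH METHODS / GILLHAM,BILL."),
  Date       = c("29-04-2015", "13-04-2018", "", "03-11-2011")
)

# Steps 1 and 2: turn "" into NA, then carry the last seen Account_No downwards.
filled <- toy %>%
  mutate(Account_No = na_if(Account_No, "")) %>%
  fill(Account_No)

# Steps 3 and 4: one row per Account_No, title fragments joined by a space.
collapsed <- filled %>%
  group_by(Account_No) %>%
  summarise(Title = paste(Title, collapse = " "),
            Date  = first(Date),   # the record's own date sits on its first row
            .groups = "drop")

collapsed
```

Note that `summarise()` returns the groups sorted by Account_No (as character strings), so the original row order is not preserved. If the order matters, a row index could be added before grouping and used to sort afterwards.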