[英]Quicker way? Remove rows in book1, take row 4 values as column name, set some column name same as book2
下面是我要刪除前 3 行的第一個數據框:
book1 <- structure(list(Instructions..xyz = c("Note: abc", "", "Set1",
"id", "632592651", "633322173", "634703802", "634927873", "635812953",
"636004739", "636101211", "636157799", "636263106", "636752420"
), X = c("", "", "", "title", "asdf", "cat", "dog", "mouse",
"elephant", "goose", "rat", "mice", "kitty", "kitten"), X.1 = c("",
"", "", "hazard", "y", "y", "y", "n", "n", "y", "y", "n", "n",
"y"), X.2 = c("", "", "Set2", "id", "632592651", "633322173",
"634703802", "634927873", "635812953", "636004739", "636101211",
"636157799", "636263106", "636752420"), X.3 = c("", "", "", "title",
"asdf2", "cat2", "dog2", "mouse2", "elephant2", "goose2", "rat2",
"mice2", "kitty2", "kitten2"), X.4 = c("", "", "", "index", "0.664883807",
"0.20089779", "0.752228086", "0.124729276", "0.626285086", "0.134537909",
"0.612526768", "0.769622463", "0.682532524", "0.819015658")), class = "data.frame", row.names = c(NA,
-14L))
我做了book1 <- book1[-c(1:3),]
但我不確定如何將 id、title、hazard、id、title、index 作為列名而不是 Instructions..xyz 等。請參閱下圖以獲得所需的輸出
然后對於第二個數據幀,
book2 <- structure(list(identity = c(632592651L, 633322173L, 634703802L,
634927873L, 635812953L, 636004739L, 636101211L, 636157799L, 636263106L,
636752420L, 636809222L, 2004722036L, 2004894388L, 2005045755L,
2005535472L, 2005630542L, 2005788781L, 2005809679L, 2005838317L,
2005866692L), text = c("asdf_xyz", "cat", "dog", "mouse", "elephant",
"goose", "rat", "mice", "kitty", "kitten", "tiger_xyz", "lion",
"leopard", "ostrich", "kangaroo", "platypus", "fish", "reptile",
"mammals", "amphibians_xyz"), volume = c(1234L, 432L, 324L, 333L,
2223L, 412346L, 7456L, 3456L, 2345L, 2345L, 6L, 345L, 23L, 2L,
4778L, 234L, 8675L, 3459L, 8L, 9L)), class = "data.frame", row.names = c(NA,
-20L))
然后,我重命名 book2 中的第 1 列和第 2 列,使其與 book1 的names(book2)[1:2] <- c('id','title')
匹配,我稍后可以在其中執行 inner_join。 所需的輸出如下圖所示
library(dplyr)
book1 %>%
inner_join(book2, by = c("id", "title"))
這需要很多步驟,並且想知道是否有簡化版本?
像這樣的東西?
# split the data by columns
book2a <- book1[-(1:4), 1:3]
book2b <- book1[-(1:4), 4:6]
# take care of names
names(book2a) <- book1[4, 1:3, drop = TRUE]
names(book2b) <- book1[4, 4:6, drop = TRUE]
# book2b needs processing
book2b$title <- sub("2", "", book2b$title)
book2b$index <- as.numeric(book2b$index)
# join both data sets and clean-up
book2 <- merge(book2a, book2b, all = TRUE)
rm(book2a, book2b)
book2
#> id title hazard index
#> 1 632592651 asdf y 0.6648838
#> 2 633322173 cat y 0.2008978
#> 3 634703802 dog y 0.7522281
#> 4 634927873 mouse n 0.1247293
#> 5 635812953 elephant n 0.6262851
#> 6 636004739 goose y 0.1345379
#> 7 636101211 rat y 0.6125268
#> 8 636157799 mice n 0.7696225
#> 9 636263106 kitty n 0.6825325
#> 10 636752420 kitten y 0.8190157
由reprex 包於 2022-06-25 創建 (v2.0.1)
找到第一個問題的解決方案
library(janitor)
book1 <- row_to_names(dat=book1, row_number=4, remove_row = TRUE, remove_rows_above = TRUE)
我申請了
names(book1)[4:5] <- c('id1','title1')
獲得唯一的列名,然后按照前面的建議嘗試 inner_join 但出現錯誤,發現 book1$id 是 book2$id 是 int 的字符,所以我做了
book1$id <- as.integer(book1$id)
最后它適用於
Yeah <- book1 %>%
inner_join(book2, by = c("id", "title"))
下面的輸出:
id title hazard id1 title1 index volume
1 633322173 cat y 633322173 cat2 0.20089779 432
2 634703802 dog y 634703802 dog2 0.752228086 324
3 634927873 mouse n 634927873 mouse2 0.124729276 333
4 635812953 elephant n 635812953 elephant2 0.626285086 2223
5 636004739 goose y 636004739 goose2 0.134537909 412346
6 636101211 rat y 636101211 rat2 0.612526768 7456
7 636157799 mice n 636157799 mice2 0.769622463 3456
8 636263106 kitty n 636263106 kitty2 0.682532524 2345
9 636752420 kitten y 636752420 kitten2 0.819015658 2345
還在想有沒有更快的方法?
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.