A faster way? Drop rows in book1, use the values in row 4 as column names, and set some column names to match book2

Question

Here is my first data frame, from which I want to drop the first 3 rows:

book1 <- structure(list(
  Instructions..xyz = c("Note: abc", "", "Set1", "id", "632592651",
                        "633322173", "634703802", "634927873", "635812953",
                        "636004739", "636101211", "636157799", "636263106",
                        "636752420"),
  X = c("", "", "", "title", "asdf", "cat", "dog", "mouse", "elephant",
        "goose", "rat", "mice", "kitty", "kitten"),
  X.1 = c("", "", "", "hazard", "y", "y", "y", "n", "n", "y", "y", "n", "n",
          "y"),
  X.2 = c("", "", "Set2", "id", "632592651", "633322173", "634703802",
          "634927873", "635812953", "636004739", "636101211", "636157799",
          "636263106", "636752420"),
  X.3 = c("", "", "", "title", "asdf2", "cat2", "dog2", "mouse2",
          "elephant2", "goose2", "rat2", "mice2", "kitty2", "kitten2"),
  X.4 = c("", "", "", "index", "0.664883807", "0.20089779", "0.752228086",
          "0.124729276", "0.626285086", "0.134537909", "0.612526768",
          "0.769622463", "0.682532524", "0.819015658")
), class = "data.frame", row.names = c(NA, -14L))

I ran book1 <- book1[-c(1:3),], but I am not sure how to make id, title, hazard, id, title, index the column names in place of Instructions..xyz and so on. See the image below for the desired output.
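A minimal base R sketch of promoting a row to the header, in case it helps frame the question. The data frame `raw` and the names `header_row`, `new_names` and `clean` are hypothetical stand-ins, not the real book1; `make.unique()` is used here only as one possible way to handle the repeated id column.

```r
# Hypothetical stand-in for book1: junk rows, then a header row, then data
raw <- data.frame(
  V1 = c("Note: abc", "Set1", "id", "1", "2"),
  V2 = c("", "", "title", "cat", "dog"),
  V3 = c("", "Set2", "id", "1", "2"),
  stringsAsFactors = FALSE
)

header_row <- 3  # row that holds the real column names

# Pull the header row out as a plain character vector
new_names <- unlist(raw[header_row, ], use.names = FALSE)

# Duplicated names get a numeric suffix: "id", "title", "id1"
new_names <- make.unique(new_names, sep = "")

# Drop the header row and everything above it, then apply the names
clean <- raw[-seq_len(header_row), , drop = FALSE]
names(clean) <- new_names
rownames(clean) <- NULL  # renumber rows from 1
```

Dropping the rows and renaming in one pass avoids having to remember whether the header is still in row 4 or has already moved to row 1.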

Then, for the second data frame,

book2 <- structure(list(
  identity = c(632592651L, 633322173L, 634703802L, 634927873L, 635812953L,
               636004739L, 636101211L, 636157799L, 636263106L, 636752420L,
               636809222L, 2004722036L, 2004894388L, 2005045755L,
               2005535472L, 2005630542L, 2005788781L, 2005809679L,
               2005838317L, 2005866692L),
  text = c("asdf_xyz", "cat", "dog", "mouse", "elephant", "goose", "rat",
           "mice", "kitty", "kitten", "tiger_xyz", "lion", "leopard",
           "ostrich", "kangaroo", "platypus", "fish", "reptile", "mammals",
           "amphibians_xyz"),
  volume = c(1234L, 432L, 324L, 333L, 2223L, 412346L, 7456L, 3456L, 2345L,
             2345L, 6L, 345L, 23L, 2L, 4778L, 234L, 8675L, 3459L, 8L, 9L)
), class = "data.frame", row.names = c(NA, -20L))

Then I rename columns 1 and 2 of book2 so that they match book1, using names(book2)[1:2] <- c('id','title'), so that I can run an inner_join afterwards. The desired output is shown in the image below.

library(dplyr)
book1 %>%
  inner_join(book2, by = c("id", "title"))
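As a possible way to skip the renaming step, base R's merge() accepts different key names on each side through by.x and by.y. The sketch below uses small hypothetical data frames `x` and `y` rather than the real book1 and book2.

```r
# Hypothetical stand-ins: same keys, different column names on each side
x <- data.frame(
  id     = c(1L, 2L, 3L),
  title  = c("cat", "dog", "rat"),
  hazard = c("y", "n", "y")
)
y <- data.frame(
  identity = c(1L, 2L, 4L),
  text     = c("cat", "dog", "lion"),
  volume   = c(432L, 324L, 345L)
)

# all = FALSE (the default) keeps only rows whose keys match in both
# inputs, which is the inner-join behaviour; the result uses x's key names
joined <- merge(x, y,
                by.x = c("id", "title"),
                by.y = c("identity", "text"))
```

Only the rows with ids 1 and 2 survive here, because id 3 has no partner in `y` and identity 4 has none in `x`.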

This takes a lot of steps, and I would like to know whether there is a simpler version.

Answer 1

Something like this?

# split the data by columns
book2a <- book1[-(1:4), 1:3]
book2b <- book1[-(1:4), 4:6]

# take care of names
names(book2a) <- book1[4, 1:3, drop = TRUE]
names(book2b) <- book1[4, 4:6, drop = TRUE]

# book2b needs processing
book2b$title <- sub("2", "", book2b$title)
book2b$index <- as.numeric(book2b$index)

# join both data sets and clean-up
book2 <- merge(book2a, book2b, all = TRUE)
rm(book2a, book2b)

book2
#>           id    title hazard     index
#> 1  632592651     asdf      y 0.6648838
#> 2  633322173      cat      y 0.2008978
#> 3  634703802      dog      y 0.7522281
#> 4  634927873    mouse      n 0.1247293
#> 5  635812953 elephant      n 0.6262851
#> 6  636004739    goose      y 0.1345379
#> 7  636101211      rat      y 0.6125268
#> 8  636157799     mice      n 0.7696225
#> 9  636263106    kitty      n 0.6825325
#> 10 636752420   kitten      y 0.8190157

Created on 2022-06-25 by the reprex package (v2.0.1)
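The merge() call above names no key columns, so it relies on merge()'s default of joining on every column name the two inputs share. A small sketch with hypothetical data frames `left` and `right` makes that default visible:

```r
# Hypothetical inputs that share the columns id and title
left <- data.frame(
  id     = c(1L, 2L),
  title  = c("cat", "dog"),
  hazard = c("y", "n")
)
right <- data.frame(
  id    = c(1L, 2L),
  title = c("cat", "dog"),
  index = c(0.2, 0.75)
)

# The shared names are the keys merge() falls back to when `by` is omitted
shared_keys <- intersect(names(left), names(right))

# all = TRUE keeps unmatched rows from both sides (a full outer join)
merged <- merge(left, right, all = TRUE)
```

This is why the "2" suffix has to be stripped from title first: with the suffix left in place, no (id, title) pair would match and all = TRUE would return the rows of both halves unpaired.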

Answer 2

I found a solution to the first problem:

library(janitor)
book1 <- row_to_names(dat=book1, row_number=4, remove_row = TRUE, remove_rows_above = TRUE)

I then applied

names(book1)[4:5] <- c('id1','title1')

to get unique column names, then tried inner_join as suggested earlier, but it raised an error. It turned out that book1$id is character while book2$id is integer, so I ran

book1$id <- as.integer(book1$id)
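When the header row was read in as data, every column ends up as character, not just id. One possible shortcut, sketched here on a hypothetical data frame `clean`, is base R's type.convert(), which re-guesses the type of each column in one call.

```r
# Hypothetical data: all columns are character because the header row
# was originally mixed in with the values
clean <- data.frame(
  id    = c("632592651", "633322173"),
  title = c("asdf", "cat"),
  index = c("0.664883807", "0.20089779"),
  stringsAsFactors = FALSE
)

# as.is = TRUE keeps text columns as character instead of making factors
typed <- type.convert(clean, as.is = TRUE)

# Inspect the resulting column classes
sapply(typed, class)
```

Checking the classes before joining should surface a character-versus-integer key mismatch early, rather than at the join itself.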

and finally it worked with

Yeah <- book1 %>%
  inner_join(book2, by = c("id", "title"))

Output below:

         id    title hazard       id1    title1       index volume
1 633322173      cat      y 633322173      cat2  0.20089779    432
2 634703802      dog      y 634703802      dog2 0.752228086    324
3 634927873    mouse      n 634927873    mouse2 0.124729276    333
4 635812953 elephant      n 635812953 elephant2 0.626285086   2223
5 636004739    goose      y 636004739    goose2 0.134537909 412346
6 636101211      rat      y 636101211      rat2 0.612526768   7456
7 636157799     mice      n 636157799     mice2 0.769622463   3456
8 636263106    kitty      n 636263106    kitty2 0.682532524   2345
9 636752420   kitten      y 636752420   kitten2 0.819015658   2345

I am still wondering whether there is a faster way.

Answer 1
0 2022-06-25 11:18:48

Answer 2
0 accepted 2022-06-25 11:23:13