Merging two data frames in R on a key column whose values differ only in case

Question

I have two data frames, but the values in the merge "by" column differ in case, for example sn1capx1e0001 and SN1CAPX1E0001.

authors <- data.frame(
  surname = I(c("Tukey", "Venables", "Tierney", "Ripley", "McNeil")),
  nationality = c("US", "Australia", "US", "UK", "Australia"),
  deceased = c("yes", rep("no", 4)))

books <- data.frame(
  name = I(c("tukey", "venables", "tierney",
             "tipley", "ripley", "McNeil", "R Core")),
  title = c("Exploratory Data Analysis",
            "Modern Applied Statistics ...",
            "LISP-STAT",
            "Spatial Statistics", "Stochastic Simulation",
            "Interactive Data Analysis",
            "An Introduction to R"),
  other.author = c(NA, "Ripley", NA, NA, NA, NA,
                   "Venables & Smith"))

m1 <- merge(authors, books, by.x = "surname", by.y = "name")

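This gives only the one row whose key matches exactly:

  surname nationality deceased                     title other.author
1  McNeil   Australia       no Interactive Data Analysis         <NA>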

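So I want to merge them case-insensitively. I could not get this to work with merge or join.

I have seen that the values can be matched by looping over them with regular expressions.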

Answer 1

Why not convert both columns so that they have the same form?

library(stringr)

authors <- data.frame(
  surname = I(c("Tukey", "Venables", "Tierney", "Ripley", "McNeil")),
  nationality = c("US", "Australia", "US", "UK", "Australia"),
  deceased = c("yes", rep("no", 4)))

books <- data.frame(
  name = I(c("tukey", "venables", "tierney",
             "tipley", "ripley", "McNeil", "R Core")),
  title = c("Exploratory Data Analysis",
            "Modern Applied Statistics ...",
            "LISP-STAT",
            "Spatial Statistics", "Stochastic Simulation",
            "Interactive Data Analysis",
            "An Introduction to R"),
  other.author = c(NA, "Ripley", NA, NA, NA, NA,
                   "Venables & Smith"))

authors$surname <- str_to_title(authors$surname)
books$name <- str_to_title(books$name)

m1 <- merge(authors, books, by.x = "surname", by.y = "name")

This gives:

   surname nationality deceased                         title other.author
1   Mcneil   Australia       no     Interactive Data Analysis         <NA>
2   Ripley          UK       no         Stochastic Simulation         <NA>
3  Tierney          US       no                     LISP-STAT         <NA>
4    Tukey          US      yes     Exploratory Data Analysis         <NA>
5 Venables   Australia       no Modern Applied Statistics ...       Ripley
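
Note that str_to_title() also rewrites values that were already mixed-case: "McNeil" becomes "Mcneil" in the output above. A minimal sketch, assuming you would rather keep the original spelling from authors, is to merge on a temporary lowercase key instead. The column name key and the result name m2 are hypothetical helpers introduced here for illustration; they are not part of the original answer.

```r
authors <- data.frame(
  surname = c("Tukey", "Venables", "Tierney", "Ripley", "McNeil"),
  nationality = c("US", "Australia", "US", "UK", "Australia"),
  deceased = c("yes", rep("no", 4)),
  stringsAsFactors = FALSE)

books <- data.frame(
  name = c("tukey", "venables", "tierney",
           "tipley", "ripley", "McNeil", "R Core"),
  title = c("Exploratory Data Analysis",
            "Modern Applied Statistics ...",
            "LISP-STAT",
            "Spatial Statistics", "Stochastic Simulation",
            "Interactive Data Analysis",
            "An Introduction to R"),
  other.author = c(NA, "Ripley", NA, NA, NA, NA,
                   "Venables & Smith"),
  stringsAsFactors = FALSE)

# Add a lowercase helper key (hypothetical column name) to both frames;
# the original surname and name columns are left untouched.
authors$key <- tolower(authors$surname)
books$key <- tolower(books$name)

# Merge on the helper key, then drop the helper and the duplicate name column.
m2 <- merge(authors, books, by = "key")
m2$key <- NULL
m2$name <- NULL
```

With this approach m2$surname keeps the spelling used in authors, so "McNeil" is not changed by the merge.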

Answer 2

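I found this to be quite simple: convert both key columns with toupper() before merging.

authors$surname <- toupper(authors$surname)
books$name <- toupper(books$name)

Simple.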
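
As a rough illustration of the same idea on identifiers like the ones in the question, the sketch below upper-cases the key in both frames and then merges. The data frames a and b, their columns x and y, and every id other than the two quoted in the question are made up for this example.

```r
# Hypothetical example data: the ids differ only in case between the frames.
a <- data.frame(id = c("sn1capx1e0001", "sn1capx1e0002"),
                x = c(1, 2),
                stringsAsFactors = FALSE)
b <- data.frame(id = c("SN1CAPX1E0001", "SN1CAPX1E0003"),
                y = c("a", "b"),
                stringsAsFactors = FALSE)

# Normalise the key to upper case in BOTH frames before merging.
a$id <- toupper(a$id)
b$id <- toupper(b$id)

# Inner join: keeps only ids present in both frames.
m <- merge(a, b, by = "id")

# Left join: keeps every row of a, with NA where b has no match.
m_left <- merge(a, b, by = "id", all.x = TRUE)
```

Converting only one of the two columns is enough only when the other column is already entirely in that case; converting both is the safer default.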
