Merging two dataframes with left_join produces NAs in 'right' columns
left_join produces NAs when key has spaces
I'm getting an unexpected pattern of NAs from a left join. The data comes from this week's Tidy Tuesday.
library(tidyverse)
breed_traits <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-02-01/breed_traits.csv') %>%
select(Breed, `Affectionate With Family`)
# A tibble: 195 × 2
Breed `Affectionate With Family`
<chr> <dbl>
1 Retrievers (Labrador) 5
2 French Bulldogs 5
3 German Shepherd Dogs 5
4 Retrievers (Golden) 5
5 Bulldogs 4
6 Poodles 5
7 Beagles 3
8 Rottweilers 5
9 Pointers (German Shorthaired) 5
10 Dachshunds 5
breed_rank_all <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-02-01/breed_rank.csv') %>%
select(Breed, `2013 Rank`)
# A tibble: 195 × 2
Breed `2013 Rank`
<chr> <dbl>
1 Retrievers (Labrador) 1
2 French Bulldogs 11
3 German Shepherd Dogs 2
4 Retrievers (Golden) 3
5 Bulldogs 5
6 Poodles 8
7 Beagles 4
8 Rottweilers 9
9 Pointers (German Shorthaired) 13
10 Dachshunds 10
I figured out the issue. On a hunch, I looked into the whitespace.
# space that isn't a space (like non-breaking space?)
utf8::utf8_print(breed_traits$Breed[1], utf8 = FALSE)
# [1] "Retrievers\u00a0(Labrador)"
# this is a non-breaking space
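The same check can be extended from one value to a whole column. This is a minimal sketch, assuming stringr is available; the `breeds` vector is made-up illustration data, not the Tidy Tuesday file.

```r
library(stringr)

# Hypothetical keys: the first two contain a non-breaking space (U+00A0),
# the third contains only ordinary spaces.
breeds <- c("Retrievers\u00a0(Labrador)", "French\u00a0Bulldogs", "German Shepherd Dogs")

# TRUE where the key contains at least one non-breaking space
has_nbsp <- str_detect(breeds, fixed("\u00a0"))
has_nbsp

# Number of affected keys
sum(has_nbsp)
```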
You can replace the non-breaking space with a regular expression.
(replSp = str_replace_all(string = breed_traits$Breed[1],
pattern = "[[:space:]]",
replacement = " "))
# [1] "Retrievers (Labrador)"
breed_rank_all$Breed[[1]] == replSp
# [1] TRUE
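To fix the join itself, the replacement can be applied to the whole key column before joining. The following is only a sketch under stated assumptions: `traits` and `ranks` are small hypothetical stand-ins for the two tables, and it targets U+00A0 specifically rather than all whitespace.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-ins for the two tables; only `traits` has U+00A0 in its keys.
traits <- tibble(
  Breed = c("Retrievers\u00a0(Labrador)", "French\u00a0Bulldogs"),
  `Affectionate With Family` = c(5, 5)
)
ranks <- tibble(
  Breed = c("Retrievers (Labrador)", "French Bulldogs"),
  `2013 Rank` = c(1, 11)
)

# Replace every non-breaking space in the key with an ordinary space
traits_clean <- traits %>%
  mutate(Breed = str_replace_all(Breed, fixed("\u00a0"), " "))

# With matching keys, the join fills the right-hand column instead of returning NA
joined <- left_join(ranks, traits_clean, by = "Breed")
joined
```

Cleaning the key once, right after reading the data, keeps later joins and filters on `Breed` consistent.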
I could not reproduce the behavior. I have added the data frames I used; perhaps the original data contains unusual characters.
library(dplyr)
left_join(breed_rank_all, breed_traits, "Breed")
Breed 2013 Rank Affectionate With Family
1 Retrievers (Labrador) 1 5
2 French Bulldogs 11 5
3 German Shepherd Dogs 2 5
4 Retrievers (Golden) 3 5
5 Bulldogs 5 4
6 Poodles 8 5
7 Beagles 4 3
8 Rottweilers 9 5
9 Pointers (German Shorthaired) 13 5
breed_traits <- structure(list(Breed = c(" Retrievers (Labrador)", " French Bulldogs",
" German Shepherd Dogs", " Retrievers (Golden)", " Bulldogs",
" Poodles", " Beagles", " Rottweilers", " Pointers (German Shorthaired)",
"Dachshunds"), `Affectionate With Family` = c(5L, 5L, 5L, 5L,
4L, 5L, 3L, 5L, 5L, 5L)), class = "data.frame", row.names = c(NA,
-10L))
breed_rank_all <- structure(list(Breed = c(" Retrievers (Labrador)", " French Bulldogs",
" German Shepherd Dogs", " Retrievers (Golden)", " Bulldogs",
" Poodles", " Beagles", " Rottweilers", " Pointers (German Shorthaired)"
), `2013 Rank` = c(1L, 11L, 2L, 3L, 5L, 8L, 4L, 9L, 13L)), class = "data.frame", row.names = c(NA,
-9L))
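One way to reproduce the NAs with hand-built data like this is to swap the ordinary spaces in one table's keys for non-breaking spaces. A small sketch, assuming dplyr is loaded; the two-row tables and the column names `rank` and `affection` are hypothetical.

```r
library(dplyr)

# Hypothetical two-row tables with ordinary spaces in the keys
ranks <- data.frame(Breed = c("French Bulldogs", "Poodles"), rank = c(11L, 8L))
traits <- data.frame(Breed = c("French Bulldogs", "Poodles"), affection = c(5L, 5L))

# Swap ordinary spaces for non-breaking spaces (U+00A0) in one table only
traits_nbsp <- traits
traits_nbsp$Breed <- gsub(" ", "\u00a0", traits_nbsp$Breed, fixed = TRUE)

# "Poodles" has no space and still matches; "French Bulldogs" no longer does
res <- left_join(ranks, traits_nbsp, by = "Breed")
is.na(res$affection)
```

Keys without any space still match, so in this sketch only multi-word keys come back as NA.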