
left_join produces NAs when key has spaces

I am getting an unexpected pattern of NAs from a left join. The data come from this week's Tidy Tuesday.

library(tidyverse)

breed_traits <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-02-01/breed_traits.csv') %>%
  select(Breed, `Affectionate With Family`)

# A tibble: 195 × 2
   Breed                         `Affectionate With Family`
   <chr>                                              <dbl>
 1 Retrievers (Labrador)                                  5
 2 French Bulldogs                                        5
 3 German Shepherd Dogs                                   5
 4 Retrievers (Golden)                                    5
 5 Bulldogs                                               4
 6 Poodles                                                5
 7 Beagles                                                3
 8 Rottweilers                                            5
 9 Pointers (German Shorthaired)                          5
10 Dachshunds                                             5     

breed_rank_all <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-02-01/breed_rank.csv') %>%
  select(Breed, `2013 Rank`)

# A tibble: 195 × 2
   Breed                         `2013 Rank`
   <chr>                               <dbl>
 1 Retrievers (Labrador)                   1
 2 French Bulldogs                        11
 3 German Shepherd Dogs                    2
 4 Retrievers (Golden)                     3
 5 Bulldogs                                5
 6 Poodles                                 8
 7 Beagles                                 4
 8 Rottweilers                             9
 9 Pointers (German Shorthaired)          13
10 Dachshunds                             10  

I figured out the problem. On a hunch, I looked into the whitespace.

# space that isn't a space (like non-breaking space?)
utf8::utf8_print(breed_traits$Breed[1], utf8 = FALSE)
# [1] "Retrievers\u00a0(Labrador)"
# this is a non-breaking space
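
The same check can be run over a whole key column rather than one element. The following is a minimal sketch in base R; the vector `keys` is made-up illustration data, not the downloaded file. It flags entries that contain U+00A0 and looks at the code points of one of them.

```r
# Hypothetical key vector: the first element contains U+00A0 (non-breaking space),
# the second contains an ordinary ASCII space.
keys <- c("Retrievers\u00a0(Labrador)", "Retrievers (Golden)")

# Flag every entry that contains a non-breaking space.
has_nbsp <- grepl("\u00a0", keys, fixed = TRUE)

# Inspect the code points of the first key:
# 160 is U+00A0, whereas a regular space would be 32.
code_points <- utf8ToInt(keys[1])
code_points[11]
```

If `has_nbsp` is TRUE for one table's keys but FALSE for the other's, the two columns look identical when printed and still will not match in a join.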

You can replace the non-breaking space with a regular expression.

(replSp = str_replace_all(string = breed_traits$Breed[1],
                pattern = "[[:space:]]",
                replacement = " ")) 
# [1] "Retrievers (Labrador)" 

breed_rank_all$Breed[[1]] == replSp
# [1] TRUE 
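
Applied to the whole key column before joining, this might look like the sketch below. The two small tibbles and the helper `normalize_key()` are hypothetical stand-ins for the real data, and the column names `affectionate` and `rank_2013` are invented for brevity.

```r
library(dplyr)
library(stringr)

# Hypothetical miniature versions of the two tables:
# `traits` uses non-breaking spaces in the key, `ranks` uses ordinary spaces.
traits <- tibble(
  Breed = c("Retrievers\u00a0(Labrador)", "French\u00a0Bulldogs"),
  affectionate = c(5, 5)
)
ranks <- tibble(
  Breed = c("Retrievers (Labrador)", "French Bulldogs"),
  rank_2013 = c(1, 11)
)

# Hypothetical helper: turn any whitespace character into a plain space,
# then trim the ends and collapse repeated spaces.
normalize_key <- function(x) {
  str_squish(str_replace_all(x, "[[:space:]]", " "))
}

# Joining on the raw keys leaves the trait column as NA.
joined_raw <- left_join(ranks, traits, by = "Breed")

# Joining after normalizing the keys in both tables fills it in.
joined_clean <- left_join(
  mutate(ranks, Breed = normalize_key(Breed)),
  mutate(traits, Breed = normalize_key(Breed)),
  by = "Breed"
)
```

Normalizing both sides, rather than only the table known to be affected, keeps the join robust if the other file later picks up the same kind of character.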

I cannot reproduce this behavior. I have added the data frames I used below; the original data may contain odd characters.

library(dplyr)

left_join(breed_rank_all, breed_traits, "Breed")
                           Breed 2013 Rank Affectionate With Family
1          Retrievers (Labrador)         1                        5
2                French Bulldogs        11                        5
3           German Shepherd Dogs         2                        5
4            Retrievers (Golden)         3                        5
5                       Bulldogs         5                        4
6                        Poodles         8                        5
7                        Beagles         4                        3
8                    Rottweilers         9                        5
9  Pointers (German Shorthaired)        13                        5

Data

breed_traits <- structure(list(Breed = c(" Retrievers (Labrador)", " French Bulldogs",
" German Shepherd Dogs", " Retrievers (Golden)", " Bulldogs",
" Poodles", " Beagles", " Rottweilers", " Pointers (German Shorthaired)",
"Dachshunds"), `Affectionate With Family` = c(5L, 5L, 5L, 5L,
4L, 5L, 3L, 5L, 5L, 5L)), class = "data.frame", row.names = c(NA,
-10L))

breed_rank_all <- structure(list(Breed = c(" Retrievers (Labrador)", " French Bulldogs",
" German Shepherd Dogs", " Retrievers (Golden)", " Bulldogs",
" Poodles", " Beagles", " Rottweilers", " Pointers (German Shorthaired)"
), `2013 Rank` = c(1L, 11L, 2L, 3L, 5L, 8L, 4L, 9L, 13L)), class = "data.frame", row.names = c(NA,
-9L))
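
When a join works on pasted data but not on the downloaded files, one way to find the offending keys is to list the rows that have no match and then expose any non-ASCII bytes in them. This is only a sketch: the two data frames here are small hypothetical examples with invented column names, not the real files.

```r
library(dplyr)

# Hypothetical tables: one key in `traits` hides a non-breaking space.
ranks <- data.frame(
  Breed = c("Retrievers (Labrador)", "Beagles"),
  rank_2013 = c(1L, 4L)
)
traits <- data.frame(
  Breed = c("Retrievers\u00a0(Labrador)", "Beagles"),
  affectionate = c(5L, 3L)
)

# Rows of `ranks` whose key has no partner in `traits`.
unmatched <- anti_join(ranks, traits, by = "Breed")

# Show non-ASCII bytes explicitly: anything that cannot be represented
# in ASCII is written out as <xx> hex bytes instead of being displayed as-is.
visible <- iconv(traits$Breed, from = "UTF-8", to = "ASCII", sub = "byte")
```

Here `unmatched` contains only the Labrador row, and `visible` makes the hidden character in the corresponding `traits` key stand out.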
