
R joins returning all NA

I'm having a new issue where all my attempts to join data frames in R fill the joined columns with NA. I'm reasonably sure this is because the values in my join columns differ somehow, but I can't see how.

The data come from CSV files. The join column was originally a factor, but I've also tried converting it to character before joining.

Samples of the data and what I've tried are below.

str(nst)
'data.frame':   890 obs. of  33 variables:
 $ X               : logi  NA NA NA NA NA NA ...
 $ Player          : chr  "Connor McDavid" "Claude Giroux" "Nikita Kucherov" "Evgeni Malkin" ...
 $ Team            : Factor w/ 88 levels "ANA","ANA, MTL",..: 42 73 82 74 32 60 49 74 87 74 ...
 $ Position        : Factor w/ 7 levels "C","C, L","C, R",..: 1 1 7 1 1 5 1 7 7 1 ...
 $ GP              : int  82 82 80 78 74 76 82 82 81 82 ...
 $ TOI             : num  1767 1670 1586 1481 1473 ...
 $ Goals           : int  41 34 39 42 39 39 35 34 23 29 ...

str(hockey_ref)
'data.frame':   1035 obs. of  28 variables:
 $ Rk    : int  1 2 2 2 3 4 5 6 7 7 ...
 $ Player: chr  "Justin Abdelkader" "Pontus Aberg" "Pontus Aberg" "Pontus Aberg" ...
 $ Age   : int  30 24 24 24 26 25 20 21 26 26 ...
 $ Pos   : Factor w/ 5 levels "C","D","LW","RW",..: 3 3 3 3 1

What I've tried:

merge1 <- merge(hockey_ref,nst,by.x='Player',by.y='Player',all=TRUE)

creates

   head(merge1)
        Player GP PIM       TOI  Rk Age  Pos   Tm  G  A PTS X... EV PP SH GW EV.1 PP.1 SH.1   S  S.  ATOI BLK HIT FOW FOL  FO.  PS  X Team
1   A.J. Greer 17  29  126.0000 315  21   LW  COL  0  3   3    2  0  0  0  0    3    0    0  13 0.0  7:24   5  30   1   2 33.3 0.2 NA <NA>
2   A.J. Greer 17  29  125.6833  NA  NA <NA> <NA> NA NA  NA   NA NA NA NA NA   NA   NA   NA  NA  NA  <NA>  NA  NA  NA  NA   NA  NA NA  COL
3 Aaron Ekblad 82  71 1918.0000 227  21    D  FLA 16 22  38    9 11  5  0  4   16    6    0 189 8.5 23:23 121  69   0   0   NA 7.8 NA <NA>
4 Aaron Ekblad 82  71 1917.9000  NA  NA <NA> <NA> NA NA  NA   NA NA NA NA NA   NA   NA   NA  NA  NA  <NA>  NA  NA  NA  NA   NA  NA NA  FLA

merge1 <- left_join(hockey_ref, nst, by = c("Player"="Player"))

creates

head(merge1)
  Rk            Player Age Pos  Tm GP.x  G  A PTS X... PIM.x EV PP SH GW EV.1 PP.1 SH.1   S   S. TOI.x  ATOI BLK HIT FOW FOL  FO.  PS  X Team
1  1 Justin Abdelkader  30  LW DET   75 13 22  35  -11    78  9  4  0  0   17    5    0 110 11.8  1241 16:33  40 174  47  50 48.5 2.5 NA <NA>
2  2      Pontus Aberg  24  LW TOT   53  4 12  16    9    10  4  0  0  3   11    1    0  70  5.7   645 12:10   8  24   4   8 33.3 1.3 NA <NA>
3  2      Pontus Aberg  24  LW NSH   37  2  6   8    8     8  2  0  0  2    6    0    0  39  5.1   411 11:06   7  16   4   6 40.0 0.6 NA <NA>
4  2      Pontus Aberg  24  LW EDM   16  2  6   8    1     2  2  0  0  1    5    1    0  31  6.5   234 14:38   1   8   0   2  0.0 0.7 NA <NA>
5  3      Noel Acciari  26   C BOS   60 10  1  11   -6     9  9  0  1  0    1    0    0  66 15.2   775 12:55  41 152  42  51 45.2 0.6 NA <NA>
6  4    Kenny Agostino  25  LW BOS    5  0  1   1   -1     4  0  0  0  0    0    1    0  11  0.0    60 12:03   1   4   0   1  0.0 0.0 NA <NA>
  Position GP.y TOI.y Goals Total.Assists First.Assists Second.Assists Total.Points Shots SH. iCF iFF iSCF iHDCF Rush.Attempts
1     <NA>   NA    NA    NA            NA            NA             NA           NA    NA  NA  NA  NA   NA    NA            NA
2     <NA>   NA    NA    NA            NA            NA             NA           NA    NA  NA  NA  NA   NA    NA            NA
3     <NA>   NA    NA    NA            NA            NA             NA           NA    NA  NA  NA  NA   NA    NA            NA
4     <NA>   NA    NA    NA            NA            NA             NA           NA    NA  NA  NA  NA   NA    NA            NA
5     <NA>   NA    NA    NA            NA            NA             NA           NA    NA  NA  NA  NA   NA    NA            NA
6     <NA>   NA    NA    NA            NA            NA             NA           NA    NA  NA  NA  NA   NA    NA            NA
  Rebounds.Created PIM.y Total.Penalties Minor Major Misconduct Penalties.Drawn Giveaways Takeaways Hits Hits.Taken Shots.Blocked Faceoffs.Won
1               NA    NA              NA    NA    NA         NA              NA        NA        NA   NA         NA            NA           NA
2               NA    NA              NA    NA    NA         NA              NA        NA        NA   NA         NA            NA           NA
3               NA    NA              NA    NA    NA         NA              NA        NA        NA   NA         NA            NA           NA
4               NA    NA              NA    NA    NA         NA              NA        NA        NA   NA         NA            NA           NA
5               NA    NA              NA    NA    NA         NA              NA        NA        NA   NA         NA            NA           NA
6               NA    NA              NA    NA    NA         NA              NA        NA        NA   NA         NA            NA           NA
  Faceoffs.Lost Faceoffs..
1            NA       <NA>
2            NA       <NA>
3            NA       <NA>
4            NA       <NA>
5            NA       <NA>
6            NA       <NA>

and so on.

I'm at my wits' end here. Does anyone have any idea why R won't recognize these values as the same?

OK, so as @MichaelChirico guessed, the whitespace characters were encoded differently. I found this by calling charToRaw() on two values that looked the same, for example charToRaw(nst[720,2]) for A.J. Greer, as mentioned. I fixed this by running:

nst[,2] <- gsub("\u00A0", " ", nst[,2], fixed = TRUE)

which replaced the non-breaking spaces (U+00A0) with ordinary spaces and let me merge. Thanks to Michael for giving me the guideposts to find the problem!
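
For anyone hitting the same thing, here is a minimal sketch of the diagnosis and the fix on toy data. The data frames `ref` and `stats` and their contents are made up for illustration; they are not the real `hockey_ref` and `nst`. It assumes the offending character is a non-breaking space (U+00A0), as it was in my case. In UTF-8 that character shows up in charToRaw() as the bytes c2 a0, where an ordinary space is 20.

```r
# Toy data (hypothetical): the two Player values print identically,
# but the second one contains a non-breaking space (U+00A0).
ref   <- data.frame(Player = "A.J. Greer",      Age = 21, stringsAsFactors = FALSE)
stats <- data.frame(Player = "A.J.\u00A0Greer", GP  = 17, stringsAsFactors = FALSE)

# The keys look the same but do not compare equal.
keys_equal <- ref$Player == stats$Player

# Inspect the raw bytes of each value to see where they differ.
raw_ref   <- charToRaw(enc2utf8(ref$Player))
raw_stats <- charToRaw(enc2utf8(stats$Player))

# Code points present in one string but not the other (160 is U+00A0).
odd_points <- setdiff(utf8ToInt(enc2utf8(stats$Player)),
                      utf8ToInt(enc2utf8(ref$Player)))

# Full join before the fix: the keys do not match, so each row stays separate.
merged_before <- merge(ref, stats, by = "Player", all = TRUE)

# Replace non-breaking spaces with ordinary spaces, then join again.
stats$Player <- gsub("\u00A0", " ", stats$Player, fixed = TRUE)
merged_after <- merge(ref, stats, by = "Player", all = TRUE)

nrow(merged_before)  # unmatched rows are kept apart
nrow(merged_after)   # rows now line up on Player
```

If the stray character is something other than U+00A0, the same utf8ToInt() comparison should point to its code point, and the gsub() pattern would need to change to match.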


 