I'm running into a new issue where every attempt to join data tables in R fills the joined columns with NA. I'm reasonably sure this is because the join columns differ in some way, but I can't see how.
The data comes from CSVs. The join column was originally a factor, but I've also tried converting it to character before joining.
Samples of the data and what I've tried are below.
str(nst)
'data.frame': 890 obs. of 33 variables:
$ X : logi NA NA NA NA NA NA ...
$ Player : chr "Connor McDavid" "Claude Giroux" "Nikita Kucherov" "Evgeni Malkin" ...
$ Team : Factor w/ 88 levels "ANA","ANA, MTL",..: 42 73 82 74 32 60 49 74 87 74 ...
$ Position : Factor w/ 7 levels "C","C, L","C, R",..: 1 1 7 1 1 5 1 7 7 1 ...
$ GP : int 82 82 80 78 74 76 82 82 81 82 ...
$ TOI : num 1767 1670 1586 1481 1473 ...
$ Goals : int 41 34 39 42 39 39 35 34 23 29 ...
str(hockey_ref)
'data.frame': 1035 obs. of 28 variables:
$ Rk : int 1 2 2 2 3 4 5 6 7 7 ...
$ Player: chr "Justin Abdelkader" "Pontus Aberg" "Pontus Aberg" "Pontus Aberg" ...
$ Age : int 30 24 24 24 26 25 20 21 26 26 ...
$ Pos : Factor w/ 5 levels "C","D","LW","RW",..: 3 3 3 3 1
What I've tried:
merge1 <- merge(hockey_ref,nst,by.x='Player',by.y='Player',all=TRUE)
creates
head(merge1)
Player GP PIM TOI Rk Age Pos Tm G A PTS X... EV PP SH GW EV.1 PP.1 SH.1 S S. ATOI BLK HIT FOW FOL FO. PS X Team
1 A.J. Greer 17 29 126.0000 315 21 LW COL 0 3 3 2 0 0 0 0 3 0 0 13 0.0 7:24 5 30 1 2 33.3 0.2 NA <NA>
2 A.J. Greer 17 29 125.6833 NA NA <NA> <NA> NA NA NA NA NA NA NA NA NA NA NA NA NA <NA> NA NA NA NA NA NA NA COL
3 Aaron Ekblad 82 71 1918.0000 227 21 D FLA 16 22 38 9 11 5 0 4 16 6 0 189 8.5 23:23 121 69 0 0 NA 7.8 NA <NA>
4 Aaron Ekblad 82 71 1917.9000 NA NA <NA> <NA> NA NA NA NA NA NA NA NA NA NA NA NA NA <NA> NA NA NA NA NA NA NA FLA
merge1 <- left_join(hockey_ref, nst, by = c("Player"="Player"))
creates
head(merge1)
Rk Player Age Pos Tm GP.x G A PTS X... PIM.x EV PP SH GW EV.1 PP.1 SH.1 S S. TOI.x ATOI BLK HIT FOW FOL FO. PS X Team
1 1 Justin Abdelkader 30 LW DET 75 13 22 35 -11 78 9 4 0 0 17 5 0 110 11.8 1241 16:33 40 174 47 50 48.5 2.5 NA <NA>
2 2 Pontus Aberg 24 LW TOT 53 4 12 16 9 10 4 0 0 3 11 1 0 70 5.7 645 12:10 8 24 4 8 33.3 1.3 NA <NA>
3 2 Pontus Aberg 24 LW NSH 37 2 6 8 8 8 2 0 0 2 6 0 0 39 5.1 411 11:06 7 16 4 6 40.0 0.6 NA <NA>
4 2 Pontus Aberg 24 LW EDM 16 2 6 8 1 2 2 0 0 1 5 1 0 31 6.5 234 14:38 1 8 0 2 0.0 0.7 NA <NA>
5 3 Noel Acciari 26 C BOS 60 10 1 11 -6 9 9 0 1 0 1 0 0 66 15.2 775 12:55 41 152 42 51 45.2 0.6 NA <NA>
6 4 Kenny Agostino 25 LW BOS 5 0 1 1 -1 4 0 0 0 0 0 1 0 11 0.0 60 12:03 1 4 0 1 0.0 0.0 NA <NA>
Position GP.y TOI.y Goals Total.Assists First.Assists Second.Assists Total.Points Shots SH. iCF iFF iSCF iHDCF Rush.Attempts
1 <NA> NA NA NA NA NA NA NA NA NA NA NA NA NA NA
2 <NA> NA NA NA NA NA NA NA NA NA NA NA NA NA NA
3 <NA> NA NA NA NA NA NA NA NA NA NA NA NA NA NA
4 <NA> NA NA NA NA NA NA NA NA NA NA NA NA NA NA
5 <NA> NA NA NA NA NA NA NA NA NA NA NA NA NA NA
6 <NA> NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Rebounds.Created PIM.y Total.Penalties Minor Major Misconduct Penalties.Drawn Giveaways Takeaways Hits Hits.Taken Shots.Blocked Faceoffs.Won
1 NA NA NA NA NA NA NA NA NA NA NA NA NA
2 NA NA NA NA NA NA NA NA NA NA NA NA NA
3 NA NA NA NA NA NA NA NA NA NA NA NA NA
4 NA NA NA NA NA NA NA NA NA NA NA NA NA
5 NA NA NA NA NA NA NA NA NA NA NA NA NA
6 NA NA NA NA NA NA NA NA NA NA NA NA NA
Faceoffs.Lost Faceoffs..
1 NA <NA>
2 NA <NA>
3 NA <NA>
4 NA <NA>
5 NA <NA>
6 NA <NA>
and so on.
I'm at my wits' end here. Does anyone have any idea why R won't recognize these values as the same?
OK, so as @MichaelChirico guessed, the whitespace characters were encoded differently. I found this by calling charToRaw() on two values that looked the same, e.g. charToRaw(nst[720,2]) for A.J. Greer, as mentioned. I fixed it by running:
nst[,2] <- gsub("\u00A0", " ", nst[,2], fixed = TRUE)
which replaced the non-breaking spaces (U+00A0) with regular spaces and let me merge. Thanks to Michael for giving me the guideposts to find the problem!
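For anyone hitting the same thing, here is a minimal sketch of the diagnosis and fix on toy data. The data frames `ref_demo` and `nst_demo` are made up for illustration (they are not the real tables), and the sketch assumes the mismatch is a non-breaking space, as it was in my case.

```r
# Toy stand-ins for the two tables (hypothetical data, not the real CSVs).
# The names in nst_demo contain a non-breaking space (U+00A0) instead of " ".
ref_demo <- data.frame(Player = c("A.J. Greer", "Aaron Ekblad"),
                       Age = c(21L, 21L),
                       stringsAsFactors = FALSE)
nst_demo <- data.frame(Player = c("A.J.\u00A0Greer", "Aaron\u00A0Ekblad"),
                       Team = c("COL", "FLA"),
                       stringsAsFactors = FALSE)

# The names print alike, but the comparison shows they are not equal.
print(ref_demo$Player[1] == nst_demo$Player[1])

# Byte-level view: a regular space is the byte 20, while a non-breaking
# space encoded as UTF-8 is the byte pair c2 a0.
print(charToRaw(ref_demo$Player[1]))
print(charToRaw(enc2utf8(nst_demo$Player[1])))

# Keys that exist in one table but have no exact match in the other.
print(setdiff(ref_demo$Player, nst_demo$Player))

# Replace the non-breaking spaces with regular spaces, then merge.
nst_demo$Player <- gsub("\u00A0", " ", nst_demo$Player, fixed = TRUE)
merged_demo <- merge(ref_demo, nst_demo, by = "Player")
print(merged_demo)
```

Running setdiff() on the key columns before merging is a quick way to see whether the keys really match; if it returns names that look identical to ones in the other table, charToRaw() on a pair of them should show where the bytes differ.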