Trouble with character column from a file read in with read.csv in R
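On the website, there is an option at the bottom of the page to download the data as a CSV. I downloaded the file, renamed it "Team Season Totals - Natural Stat Trick 2007-2008 5 vs 5 (Counts).csv", and put it in my working directory.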
I read the file in successfully with read.csv:
teams <- read.csv(file = "Team Season Totals - Natural Stat Trick 2007-2008 5 vs 5 (Counts).csv", stringsAsFactors = FALSE)
head(teams)
ï.. Team GP TOI W L OTL ROW CF CA CF. FF FA FF. SF SA SF. GF GA GF. SCF SCA SCF. SCGF SCGA SCGF. SCSH.
1 1 Atlanta Thrashers 82 3539.050 34 40 8 25 2638 3512 42.89 2002 2717 42.42 1505 2052 42.31 125 172 42.09 1195 1500 44.34 83 126 39.71 6.95
2 2 Pittsburgh Penguins 82 3435.417 47 27 8 40 2820 3380 45.48 2192 2542 46.30 1580 1812 46.58 142 122 53.79 1343 1374 49.43 112 90 55.45 8.34
3 3 Los Angeles Kings 82 3502.333 32 43 7 27 3008 3576 45.69 2306 2787 45.28 1649 1961 45.68 137 174 44.05 1049 1286 44.93 63 80 44.06 6.01
4 4 Montreal Canadiens 82 3475.183 47 25 10 42 3089 3601 46.17 2266 2603 46.54 1617 1863 46.47 144 138 51.06 1156 1221 48.63 62 61 50.41 5.36
5 5 Edmonton Oilers 82 3442.633 41 35 6 26 2958 3424 46.35 2255 2585 46.59 1601 1830 46.66 143 166 46.28 1334 1398 48.83 104 116 47.27 7.80
6 6 Philadelphia Flyers 82 3374.800 42 29 11 39 2902 3343 46.47 2188 2505 46.62 1609 1857 46.42 125 137 47.71 919 1028 47.20 61 68 47.29 6.64
SCSV. HDCF HDCA HDCF. HDGF HDGA HDGF. HDSH. HDSV. SH. SV. PDO
1 91.60 388 468 45.33 51 82 38.35 13.14 82.48 8.31 91.62 0.999
2 93.45 503 444 53.12 79 49 61.72 15.71 88.96 8.99 93.27 1.023
3 93.78 270 356 43.13 29 36 44.62 10.74 89.89 8.31 91.13 0.994
4 95.00 271 322 45.70 25 31 44.64 9.23 90.37 8.91 92.59 1.015
5 91.70 443 452 49.50 57 61 48.31 12.87 86.50 8.93 90.93 0.999
6 93.39 257 266 49.14 24 24 50.00 9.34 90.98 7.77 92.62 1.004
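The `ï..` at the start of the first column name is a common sign that the file begins with a UTF-8 byte-order mark (bytes EF BB BF) that R decoded as Latin-1/Windows-1252; the stray `Â` in the team names plausibly has the same cause. Assuming the Natural Stat Trick export really is UTF-8 (not confirmed in the post), a minimal sketch of declaring the encoding at read time is below. The temporary file and its two columns are a hypothetical stand-in for the real download.

```r
# Hypothetical stand-in for the downloaded CSV: a UTF-8 byte-order mark,
# a header, and one row whose space is a non-breaking space (U+00A0 = bytes C2 A0)
path <- tempfile(fileext = ".csv")
bytes <- c(
  as.raw(c(0xEF, 0xBB, 0xBF)),   # BOM: misread as Latin-1 it becomes the "ï.." prefix
  charToRaw("Team,GP\nAtlanta"),
  as.raw(c(0xC2, 0xA0)),         # NBSP: misread as Latin-1 it prints as "Â" plus a space
  charToRaw("Thrashers,82\n")
)
writeBin(bytes, path)

# "UTF-8-BOM" tells R the file is UTF-8 and strips the byte-order mark
teams <- read.csv(path, fileEncoding = "UTF-8-BOM", stringsAsFactors = FALSE)

print(names(teams))   # clean column names, no "ï.." prefix
print(teams$Team)     # no "Â"; the separator is still U+00A0, not an ordinary space
```

Reading this way removes the `Â` at the source, but the name still contains U+00A0, so it would need normalising to an ordinary space before comparing it with a typed-in string.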
One thing I noticed is that the Team column contains accent characters:
teams$Team
[1] "Atlanta Thrashers" "Pittsburgh Penguins" "Los Angeles Kings" "Montreal Canadiens" "Edmonton Oilers" "Philadelphia Flyers"
[7] "St Louis Blues" "Colorado Avalanche" "Vancouver Canucks" "Minnesota Wild" "Florida Panthers" "Phoenix Coyotes"
[13] "Tampa Bay Lightning" "Buffalo Sabres" "Chicago Blackhawks" "New York Islanders" "Nashville Predators" "Anaheim Ducks"
[19] "Boston Bruins" "Ottawa Senators" "Dallas Stars" "Toronto Maple Leafs" "Carolina Hurricanes" "Columbus Blue Jackets"
[25] "New Jersey Devils" "Calgary Flames" "San Jose Sharks" "New York Rangers" "Washington Capitals" "Detroit Red Wings"
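The `Â` is a clue to what the file actually contains. One plausible explanation, assumed here because the raw file is not shown, is that the spaces in the team names are non-breaking spaces (U+00A0), which UTF-8 stores as the two bytes C2 A0; decoded as Latin-1, those bytes display as `Â` followed by something that looks like a space. A short sketch for inspecting the bytes and code points of a suspect string, using a hypothetical value:

```r
# Hypothetical team name whose separator is a non-breaking space (U+00A0)
nbsp_name <- "Atlanta\u00a0Thrashers"

# Raw bytes: the "space" is stored as c2 a0 rather than 20
print(charToRaw(nbsp_name))

# Code points: position 8 is 160 (U+00A0), not 32 (an ordinary space)
print(utf8ToInt(nbsp_name))

# Decoding the same bytes as Latin-1 reproduces the stray "Â"
misread <- iconv(nbsp_name, from = "latin1", to = "UTF-8")
print(misread == "Atlanta\u00c2\u00a0Thrashers")
```

Running `charToRaw()` or `utf8ToInt()` on `teams$Team[1]` itself is a quick way to check whether the real data matches this assumption.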
To remove the accents:
# gsub() rather than sub(): sub() removes only the first "Â" in each name
teams$Team <- gsub(pattern = "Â", replacement = "", teams$Team)
teams$Team[1]
[1] "Atlanta Thrashers"
Now, when I try to subset the data by team, every comparison returns FALSE:
teams$Team[1]
[1] "Atlanta Thrashers"
teams$Team[1] == "Atlanta Thrashers"
[1] FALSE
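A plausible cause, assuming the team names use non-breaking spaces (U+00A0, stored in UTF-8 as bytes C2 A0) that were decoded as Latin-1: the `Â` is only the first of two mis-decoded characters, and deleting it leaves U+00A0, which prints like a space but is not the ordinary space (U+0020) that `==` compares against. A minimal sketch with a hypothetical misread value:

```r
# Hypothetical value as a Latin-1 misread would produce it: "Â" followed by U+00A0
misread <- "Atlanta\u00c2\u00a0Thrashers"

# Removing "Â" leaves the non-breaking space behind
stripped <- sub("\u00c2", "", misread, fixed = TRUE)

print(stripped == "Atlanta Thrashers")       # ordinary space: no match
print(stripped == "Atlanta\u00a0Thrashers")  # non-breaking space: match
```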
dplyr::filter(teams, Team == "Atlanta Thrashers")
[1] ï.. Team GP TOI W L OTL ROW CF CA CF. FF FA FF. SF SA SF. GF GA GF. SCF SCA SCF. SCGF SCGA
[26] SCGF. SCSH. SCSV. HDCF HDCA HDCF. HDGF HDGA HDGF. HDSH. HDSV. SH. SV. PDO
<0 rows> (or 0-length row.names)
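`dplyr::filter()` evaluates the same `==` comparison, so it returns zero rows for the same reason. One possible fix, assuming the leftover separator is a non-breaking space (U+00A0) and the `Â` has already been removed, is to replace every U+00A0 with an ordinary space before subsetting. A sketch on a hypothetical two-row stand-in for `teams`, using base R indexing so it runs without dplyr:

```r
# Hypothetical stand-in for teams: names separated by U+00A0 instead of spaces
teams <- data.frame(
  Team = c("Atlanta\u00a0Thrashers", "Pittsburgh\u00a0Penguins"),
  GP = c(82L, 82L),
  stringsAsFactors = FALSE
)

# Replace every non-breaking space with an ordinary space
teams$Team <- gsub("\u00a0", " ", teams$Team, fixed = TRUE)

# Subsetting by name now matches
atlanta <- teams[teams$Team == "Atlanta Thrashers", ]
print(nrow(atlanta))
```

After the same `gsub()` on the real data, `dplyr::filter(teams, Team == "Atlanta Thrashers")` should likewise return one row.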
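Every team returns FALSE, and I don't understand why. Is there a problem with the way I removed the accents? Does it have something to do with the encoding, e.g. UTF-8? I'd appreciate any help. Thanks.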
I figured it out. It had to do with the accents. I used:
teams$Team <- iconv(teams$Team, from = "UTF-8", to = "UTF-8", sub = " ")
iconv(teams$Team, "UTF-8", "UTF-8",sub=' ')[1] == "Atlanta Thrashers"
[1] TRUE
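A likely reason the `iconv()` call works, assuming R was running with a Latin-1/Windows-1252 native encoding (typical on Windows, but not stated in the post): after the `Â` (byte C2) was removed, each name still held a lone byte A0. On its own, A0 is not valid UTF-8, so converting "from UTF-8 to UTF-8" hits an invalid byte, and `sub = " "` replaces it with an ordinary space. A small sketch that builds such a string from raw bytes:

```r
# Hypothetical leftover after removing "Â": "Atlanta" + a lone 0xA0 byte + "Thrashers"
x <- rawToChar(c(charToRaw("Atlanta"), as.raw(0xA0), charToRaw("Thrashers")))

print(x == "Atlanta Thrashers")   # the lone byte is not a space

# 0xA0 by itself is invalid UTF-8, so iconv() substitutes sub = " " for it
repaired <- iconv(x, from = "UTF-8", to = "UTF-8", sub = " ")
print(repaired == "Atlanta Thrashers")
```

Two caveats: `sub` replaces every byte that is invalid UTF-8, so a genuinely accented name stored in the native encoding would be altered as well; and the trick depends on the `Â` having been removed first, because the untouched pair C2 A0 is valid UTF-8 (U+00A0) and `iconv()` leaves it in place.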
This has never happened to me before, and I have no experience with encodings or UTF-8.