
Trouble with character column from a file read in with read.csv in r

On the website:

http://naturalstattrick.com/teamtable.php?season=20172018&stype=2&sit=pp&score=all&rate=n&vs=all&loc=B&gpf=82&fd=2017-10-04&td=2018-04-07

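At the bottom of the page there is an option to download a csv. I downloaded the csv file and renamed it to Team Season Totals - Natural Stat Trick 2007-2008 5 vs 5 (Counts).csv. I also placed the csv file in my working directory.

I read the file in successfully with read.csv.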

teams <- read.csv(file = "Team Season Totals - Natural Stat Trick 2007-2008 5 vs 5 (Counts).csv", stringsAsFactors = FALSE)

head(teams)
  ï..                 Team GP      TOI  W  L OTL ROW   CF   CA   CF.   FF   FA   FF.   SF   SA   SF.  GF  GA   GF.  SCF  SCA  SCF. SCGF SCGA SCGF. SCSH.
1   1   Atlanta Thrashers 82 3539.050 34 40   8  25 2638 3512 42.89 2002 2717 42.42 1505 2052 42.31 125 172 42.09 1195 1500 44.34   83  126 39.71  6.95
2   2 Pittsburgh Penguins 82 3435.417 47 27   8  40 2820 3380 45.48 2192 2542 46.30 1580 1812 46.58 142 122 53.79 1343 1374 49.43  112   90 55.45  8.34
3   3   Los Angeles Kings 82 3502.333 32 43   7  27 3008 3576 45.69 2306 2787 45.28 1649 1961 45.68 137 174 44.05 1049 1286 44.93   63   80 44.06  6.01
4   4  Montreal Canadiens 82 3475.183 47 25  10  42 3089 3601 46.17 2266 2603 46.54 1617 1863 46.47 144 138 51.06 1156 1221 48.63   62   61 50.41  5.36
5   5     Edmonton Oilers 82 3442.633 41 35   6  26 2958 3424 46.35 2255 2585 46.59 1601 1830 46.66 143 166 46.28 1334 1398 48.83  104  116 47.27  7.80
6   6 Philadelphia Flyers 82 3374.800 42 29  11  39 2902 3343 46.47 2188 2505 46.62 1609 1857 46.42 125 137 47.71  919 1028 47.20   61   68 47.29  6.64
  SCSV. HDCF HDCA HDCF. HDGF HDGA HDGF. HDSH. HDSV.  SH.   SV.   PDO
1 91.60  388  468 45.33   51   82 38.35 13.14 82.48 8.31 91.62 0.999
2 93.45  503  444 53.12   79   49 61.72 15.71 88.96 8.99 93.27 1.023
3 93.78  270  356 43.13   29   36 44.62 10.74 89.89 8.31 91.13 0.994
4 95.00  271  322 45.70   25   31 44.64  9.23 90.37 8.91 92.59 1.015
5 91.70  443  452 49.50   57   61 48.31 12.87 86.50 8.93 90.93 0.999
6 93.39  257  266 49.14   24   24 50.00  9.34 90.98 7.77 92.62 1.004

One thing I noticed is that the Team column has accents:

teams$Team

[1] "Atlanta Thrashers"     "Pittsburgh Penguins"   "Los Angeles Kings"     "Montreal Canadiens"    "Edmonton Oilers"       "Philadelphia Flyers"  
 [7] "St Louis Blues"        "Colorado Avalanche"    "Vancouver Canucks"     "Minnesota Wild"        "Florida Panthers"      "Phoenix Coyotes"      
[13] "Tampa Bay Lightning"   "Buffalo Sabres"        "Chicago Blackhawks"    "New York Islanders"    "Nashville Predators"   "Anaheim Ducks"        
[19] "Boston Bruins"         "Ottawa Senators"       "Dallas Stars"          "Toronto Maple Leafs"   "Carolina Hurricanes"   "Columbus Blue Jackets"
[25] "New Jersey Devils"     "Calgary Flames"        "San Jose Sharks"       "New York Rangers"      "Washington Capitals"   "Detroit Red Wings"

To remove the accent:

teams$Team <- sub(pattern = "Â", replacement = "", teams$Team)
teams$Team[1]
[1] "Atlanta Thrashers"

Now, when I try to subset the data by team, every comparison returns FALSE:

teams$Team[1]
[1] "Atlanta Thrashers"
teams$Team[1] == "Atlanta Thrashers"
[1] FALSE

dplyr::filter(teams, Team == "Atlanta Thrashers")

 [1] ï..   Team  GP    TOI   W     L     OTL   ROW   CF    CA    CF.   FF    FA    FF.   SF    SA    SF.   GF    GA    GF.   SCF   SCA   SCF.  SCGF  SCGA 
[26] SCGF. SCSH. SCSV. HDCF  HDCA  HDCF. HDGF  HDGA  HDGF. HDSH. HDSV. SH.   SV.   PDO  
<0 rows> (or 0-length row.names)

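Every team returns FALSE and I don't understand why. Is there something wrong with the way I removed the accent? Does something need to be done about the encoding, such as utf-8? If anyone can help, I would greatly appreciate it. Thanks.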
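One way to check whether a hidden character is behind the FALSE results is to look at the code points of a value instead of its printed form. The sketch below is only an illustration: `team` is a made-up stand-in for one value of teams$Team, and it assumes the gap between the two words is a non-breaking space (U+00A0), which is a guess based on the stray "Â" and not something confirmed from the file.

```r
# Hypothetical stand-in for one value of teams$Team: the gap between the
# words is assumed to be a non-breaking space (U+00A0), not a plain space.
team <- "Atlanta\u00a0Thrashers"

# The two strings print alike but are not equal.
team == "Atlanta Thrashers"

# utf8ToInt() lists the code points; position 8 is the gap between the words.
# A plain space would be 32, a non-breaking space is 160.
codes <- utf8ToInt(team)
codes[8]

# Replacing the non-breaking space with a plain space restores equality.
fixed <- gsub("\u00a0", " ", team, fixed = TRUE)
fixed == "Atlanta Thrashers"
```

If the real column shows 32 at that position, the cause is something else and this assumption does not apply.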

I figured it out. It had to do with the accent. I used:

iconv(teams$Team, "UTF-8", "UTF-8", sub=' ')

iconv(teams$Team, "UTF-8", "UTF-8",sub=' ')[1] == "Atlanta Thrashers"

[1] TRUE

This has never happened to me before, and I have no experience with encodings or utf-8.
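A possible reading of why the iconv call helps, sketched under assumptions: if the file stored a non-breaking space as the two UTF-8 bytes C2 A0, and removing "Â" stripped only the C2 byte, a lone A0 byte would remain between the words. That byte sequence is not valid UTF-8, and iconv replaces bytes it cannot convert with the `sub` string. The value `damaged` below is a hypothetical reconstruction, not data taken from the actual file.

```r
# Hypothetical reconstruction of one damaged value: a lone A0 byte sits
# between the two words, as it would after byte C2 ("Â") is removed.
damaged <- rawToChar(c(charToRaw("Atlanta"), as.raw(0xA0), charToRaw("Thrashers")))

# A lone A0 byte is not valid UTF-8.
validUTF8(damaged)

# iconv() replaces each byte it cannot convert with the `sub` string,
# so the stray byte becomes a plain space.
repaired <- iconv(damaged, from = "UTF-8", to = "UTF-8", sub = " ")
repaired == "Atlanta Thrashers"
```

Note that the call in the post does not assign its result, so the column itself would only change if the output were stored back, for example with teams$Team <- iconv(...).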
