
Encoding strings with Sklearn pre-processing is giving an error

I'm trying to use the scikit-learn library to encode strings so that I can run a regression analysis and predict the winner, but the encoding of Toss_Winner comes out wrong: in the attached output, the toss winner is coded as 12 while the two competing teams are coded as 6 and 11.

I'm using a public IPL dataset and am a newbie in data science, so I need your help and would appreciate simple answers that explain what is going on :)

Code used:

from sklearn import preprocessing
encoder= preprocessing.LabelEncoder()
matchdf["Team1"]=encoder.fit_transform(matchdf["Team1"])
matchdf["Team2"]=encoder.fit_transform(matchdf["Team2"])
matchdf["match_winner"]=encoder.fit_transform(matchdf["match_winner"])
matchdf["Toss_Winner"]=encoder.fit_transform(matchdf["Toss_Winner"])  

The intent is then to relate Team1 and Team2 to the other columns, as in the code below, and then to build, train, and test the model:

matchdf.loc[matchdf["match_winner"]==matchdf["Team1"],"Team1_winning"]=1
matchdf.loc[matchdf["match_winner"]!=matchdf["Team1"],"Team1_winning"]=0

#outcome variable team1_toss_win as a value of team1 winning the toss
matchdf.loc[matchdf["Toss_Winner"]==matchdf["Team1"],"Team1_toss_winning"]=1
matchdf.loc[matchdf["Toss_Winner"]!=matchdf["Team1"],"Team1_toss_winning"]=0

I don't quite follow your use of LabelEncoder's fit_transform method: each call to fit (and therefore to fit_transform) discards the previously learned labels and learns a new mapping from the column it is given, so the same team can receive a different code in each column. It is also possible that your input already has the problem you describe, that is, the toss winner or match winner is not one of the two participants, or that some strings are ill-formatted (with trailing spaces, for example).
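To illustrate the refitting behaviour, here is a minimal sketch on made-up data (the team abbreviations and the small DataFrame are assumptions for illustration, not taken from the IPL dataset). It reuses one encoder with fit_transform on two columns and prints the labels learned each time:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data, not the real IPL dataset
df = pd.DataFrame({
    "Team1": ["CSK", "MI", "RCB"],
    "Toss_Winner": ["MI", "MI", "RCB"],
})

encoder = LabelEncoder()

# First call: the encoder learns the labels found in Team1
team1_codes = encoder.fit_transform(df["Team1"])
team1_classes = list(encoder.classes_)

# Second call: the encoder is refitted on Toss_Winner only,
# so the mapping learned from Team1 is replaced
toss_codes = encoder.fit_transform(df["Toss_Winner"])
toss_classes = list(encoder.classes_)

print(team1_classes, list(team1_codes))
print(toss_classes, list(toss_codes))
```

Because "CSK" never appears in Toss_Winner here, the second fit learns a shorter label list and "MI" ends up with a different code in each column, which is the kind of mismatch that makes an equality test between encoded columns unreliable.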

So I propose to first fit the LabelEncoder on all possible labels and then transform the columns:

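import pandas as pd
from sklearn import preprocessing

encoder = preprocessing.LabelEncoder()
# Collect every team name that appears as a participant
team_values = matchdf[["Team1", "Team2"]].values.ravel()
unique_team_values = pd.unique(team_values)
# Fit once so that all columns share the same label-to-code mapping
encoder.fit(unique_team_values)
# transform raises ValueError if a column contains a label not seen during fit
matchdf["Team1"] = encoder.transform(matchdf["Team1"].values)
matchdf["Team2"] = encoder.transform(matchdf["Team2"].values)
matchdf["match_winner"] = encoder.transform(matchdf["match_winner"].values)
matchdf["Toss_Winner"] = encoder.transform(matchdf["Toss_Winner"].values)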
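As a self-contained check of this approach, the sketch below runs it on a small made-up DataFrame (the rows and team abbreviations are assumptions, and the unseen-label check is an addition of mine rather than part of the question). It also derives the two flag columns from the question with a boolean comparison instead of two .loc assignments:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data standing in for the IPL matches
matchdf = pd.DataFrame({
    "Team1": ["CSK", "MI", "RCB"],
    "Team2": ["MI", "RCB", "CSK"],
    "match_winner": ["CSK", "RCB", "RCB"],
    "Toss_Winner": ["MI", "MI", "RCB"],
})

# Fit a single encoder on every participant name
encoder = LabelEncoder()
encoder.fit(pd.unique(matchdf[["Team1", "Team2"]].values.ravel()))

# Report labels the encoder has never seen (e.g. stray spaces or
# missing results) before transform fails on them
known = set(encoder.classes_)
for col in ["match_winner", "Toss_Winner"]:
    unseen = set(matchdf[col].unique()) - known
    if unseen:
        raise ValueError(f"{col} contains unknown labels: {unseen}")

# Every column now uses the same label-to-code mapping
for col in ["Team1", "Team2", "match_winner", "Toss_Winner"]:
    matchdf[col] = encoder.transform(matchdf[col])

# Flags from the question, computed in one step each
matchdf["Team1_winning"] = (matchdf["match_winner"] == matchdf["Team1"]).astype(int)
matchdf["Team1_toss_winning"] = (matchdf["Toss_Winner"] == matchdf["Team1"]).astype(int)

print(matchdf)
```

If the real data raises the ValueError above, cleaning the strings first (for instance with matchdf[col].str.strip()) or deciding how to handle matches without a winner would be the next thing to look at.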
