I currently have a dataframe with several columns, one of which looks like this:
print(df.WIN_COUNTRY_CODE[180:200])
WIN_COUNTRY_CODE
180 IT
181 IT
182 ES
183 DE---UK---UK---UK---UK
184 UK---UK---UK---UK
185 DE---UK---UK---UK
186 UK---UK---DE---UK---UK
187 SI
188 UK
189 FR
Each cell of the column contains country codes, and there can be more than one per record. Since I would like to convert the country codes from 2-letter to 3-letter ISO codes and also calculate how often each country appears, I apply this code:
df['WIN_COUNTRY_CODE_2'] = df['WIN_COUNTRY_CODE'].str.split("---")
This results in the column looking like this:
print(df.WIN_COUNTRY_CODE[180:200])
WIN_COUNTRY_CODE
180 ['IT']
181 ['IT']
182 ['ES']
183 ['DE', 'UK', 'UK', 'UK', 'UK']
184 ['UK', 'UK', 'UK', 'UK']
185 ['DE', 'UK', 'UK', 'UK']
186 ['UK', 'UK', 'DE', 'UK', 'UK']
187 ['SI']
188 ['UK']
189 ['FR']
catdict= dict([(iso2,iso3) for iso2,iso3 in zip(cattable['iso_2_codes'], cattable['iso_3_codes'])])
df.assign(mapped=[[catdict[k] for k in row if catdict.get(k)] for row in df.WIN_COUNTRY_CODE_2])
However, whenever I apply the mapping, it always returns this error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-13-df7aad8ca868> in <module>
1 cattable = pd.ExcelFile('D:/ROBERT LIBRARIES/Documents/ISD - LKPP Project/vardesc2.xlsx').parse('WIN_COUNTRY_CODE')
2 catdict= dict([(catnum,catdesc) for catnum,catdesc in zip(cattable['WIN_COUNTRY_CODE'], cattable['Description'])])
----> 3 df.assign(mapped=[[catdict[k] for k in row if catdict.get(k)] for row in df.WIN_COUNTRY_CODE])
<ipython-input-13-df7aad8ca868> in <listcomp>(.0)
1 cattable = pd.ExcelFile('D:/ROBERT LIBRARIES/Documents/ISD - LKPP Project/vardesc2.xlsx').parse('WIN_COUNTRY_CODE')
2 catdict= dict([(catnum,catdesc) for catnum,catdesc in zip(cattable['WIN_COUNTRY_CODE'], cattable['Description'])])
----> 3 df.assign(mapped=[[catdict[k] for k in row if catdict.get(k)] for row in df.WIN_COUNTRY_CODE])
TypeError: 'float' object is not iterable
It seems likely that the code raises an error because the entries in the WIN_COUNTRY_CODE column are still plain strings rather than lists of strings. I learned this after inspecting an element of one of the lists with this code:
df.WIN_COUNTRY_CODE_2[183][0]
It always returns a single character instead of the 2-letter code as a string:
'['
whereas I expect it to return 'DE'.
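Getting '[' back is what one would see if each cell held the text of a list (for example after a round trip through a CSV file) rather than an actual list object. A minimal sketch of how such cells could be parsed back, assuming that is the cause here; the sample values in `s` are made up for illustration:

```python
import ast

import pandas as pd

# Hypothetical cells that hold the *text* of a list, not a list object.
s = pd.Series(["['IT']", "['DE', 'UK', 'UK']"])

# Indexing a string returns a single character, hence the '['.
first_char = s[1][0]

# Parse each string back into a real Python list; leave non-strings untouched.
parsed = s.map(lambda x: ast.literal_eval(x) if isinstance(x, str) else x)

print(first_char)
print(parsed[1][0])
```

`ast.literal_eval` only evaluates Python literals, so it is a safer choice than `eval` for this kind of repair.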
How can I convert the WIN_COUNTRY_CODE column from a column of strings into a column of lists? And how can I find the most frequent country in the entire column? Thank you.
This might help.
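# str.split already returns [x] when "---" is absent; non-strings such as NaN become empty lists
df['new_WIN_COUNTRY_CODE'] = df['WIN_COUNTRY_CODE'].map(lambda x: x.split("---") if isinstance(x, str) else [])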
print(df)
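Building on the split above, here is a minimal sketch of the other two steps from the question: mapping to 3-letter codes and finding the most frequent country. It assumes the `TypeError: 'float' object is not iterable` comes from missing values in the column (NaN is a float), which the traceback suggests but does not prove. The small `df` and `catdict` below are made-up stand-ins for the real dataframe and the lookup built from `cattable`, and `explode` needs pandas 0.25 or newer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the real dataframe.
df = pd.DataFrame(
    {"WIN_COUNTRY_CODE": ["IT", "DE---UK---UK", np.nan, "UK---UK---DE", "FR"]}
)

# Hypothetical lookup; in the question this is built from cattable.
catdict = {"IT": "ITA", "DE": "DEU", "UK": "GBR", "FR": "FRA"}

# Split only real strings; missing values (NaN floats) become empty lists.
df["WIN_COUNTRY_CODE_2"] = df["WIN_COUNTRY_CODE"].map(
    lambda x: x.split("---") if isinstance(x, str) else []
)

# Map each 2-letter code to its 3-letter code, skipping codes not in the lookup.
df["mapped"] = df["WIN_COUNTRY_CODE_2"].map(
    lambda codes: [catdict[c] for c in codes if c in catdict]
)

# One row per code, then count occurrences across the whole column.
counts = df["mapped"].explode().dropna().value_counts()
most_frequent = counts.idxmax()

print(counts)
print(most_frequent)
```

Each repeated code inside a record is counted here. To count a country at most once per record, one option would be to deduplicate each list (for example with `set`) before calling `explode`.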