Pandas DataFrame: Converting Column of String into Column of Lists

Question

I currently have a dataframe which contains several columns like this below:

print(df.WIN_COUNTRY_CODE[180:200])

           WIN_COUNTRY_CODE
180                        IT
181                        IT
182                        ES
183    DE---UK---UK---UK---UK
184         UK---UK---UK---UK
185         DE---UK---UK---UK
186    UK---UK---DE---UK---UK
187                        SI
188                        UK
189                        FR

Each cell of the column contains one or more country codes. Since I would like to convert the country codes from 2-letter to 3-letter ISO codes and also count how often each country appears, I applied this code:

1. I split the string on the triple dash that separates the country codes, to convert it from a string to a list:

df['WIN_COUNTRY_CODE_2'] = df['WIN_COUNTRY_CODE'].str.split("---")

This results in a column like this:

print(df.WIN_COUNTRY_CODE[180:200])

           WIN_COUNTRY_CODE
180                            ['IT']
181                            ['IT']
182                            ['ES']
183    ['DE', 'UK', 'UK', 'UK', 'UK']
184          ['UK', 'UK', 'UK', 'UK']
185          ['DE', 'UK', 'UK', 'UK']
186    ['UK', 'UK', 'DE', 'UK', 'UK']
187                            ['SI']
188                            ['UK']
189                            ['FR']
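
The split step can be reproduced on a few of the sample values. This is a minimal sketch with a hypothetical stand-in frame (not the real data): after str.split each cell holds a real Python list, so indexing its first element gives the whole two-letter code rather than a single character.

```python
import pandas as pd

# Hypothetical stand-in for a few rows of the original frame
df = pd.DataFrame(
    {"WIN_COUNTRY_CODE": ["IT", "DE---UK---UK---UK---UK", "UK---UK---UK---UK"]},
    index=[180, 183, 184],
)

# str.split returns a list for every string cell, even when there is no separator
df["WIN_COUNTRY_CODE_2"] = df["WIN_COUNTRY_CODE"].str.split("---")

cell = df.loc[183, "WIN_COUNTRY_CODE_2"]
first_code = cell[0]
print(type(cell).__name__)  # list
print(first_code)  # DE
```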

2. I applied a mapping to convert the 2-letter codes to 3-letter codes, using a conversion table (cattable) that I turned into a dictionary (catdict):

catdict= dict([(iso2,iso3) for iso2,iso3 in zip(cattable['iso_2_codes'], cattable['iso_3_codes'])])
df.assign(mapped=[[catdict[k] for k in row if catdict.get(k)] for row in df.WIN_COUNTRY_CODE_2])
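
A minimal sketch of the mapping step on clean data, assuming a tiny hypothetical conversion table that reuses the column names above (iso_2_codes, iso_3_codes); the real cattable is read from an Excel file. dict(zip(...)) builds the same dictionary as the list comprehension in the question, and the result of assign is stored back because DataFrame.assign returns a new DataFrame instead of modifying df in place.

```python
import pandas as pd

# Hypothetical conversion table; the real one is read from Excel
cattable = pd.DataFrame(
    {"iso_2_codes": ["DE", "UK", "IT"], "iso_3_codes": ["DEU", "GBR", "ITA"]}
)
catdict = dict(zip(cattable["iso_2_codes"], cattable["iso_3_codes"]))

df = pd.DataFrame({"WIN_COUNTRY_CODE": ["IT", "DE---UK---UK", "FR"]})
df["WIN_COUNTRY_CODE_2"] = df["WIN_COUNTRY_CODE"].str.split("---")

# assign() returns a new DataFrame, so keep the result;
# codes missing from catdict (here "FR") are skipped
df = df.assign(
    mapped=[[catdict[k] for k in row if k in catdict] for row in df["WIN_COUNTRY_CODE_2"]]
)
print(df["mapped"].tolist())  # [['ITA'], ['DEU', 'GBR', 'GBR'], []]
```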

However, whenever I apply the mapping, it always returns this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-13-df7aad8ca868> in <module>
      1 cattable = pd.ExcelFile('D:/ROBERT LIBRARIES/Documents/ISD - LKPP Project/vardesc2.xlsx').parse('WIN_COUNTRY_CODE')
      2 catdict= dict([(catnum,catdesc) for catnum,catdesc in zip(cattable['WIN_COUNTRY_CODE'], cattable['Description'])])
----> 3 df.assign(mapped=[[catdict[k] for k in row if catdict.get(k)] for row in df.WIN_COUNTRY_CODE])

<ipython-input-13-df7aad8ca868> in <listcomp>(.0)
      1 cattable = pd.ExcelFile('D:/ROBERT LIBRARIES/Documents/ISD - LKPP Project/vardesc2.xlsx').parse('WIN_COUNTRY_CODE')
      2 catdict= dict([(catnum,catdesc) for catnum,catdesc in zip(cattable['WIN_COUNTRY_CODE'], cattable['Description'])])
----> 3 df.assign(mapped=[[catdict[k] for k in row if catdict.get(k)] for row in df.WIN_COUNTRY_CODE])

TypeError: 'float' object is not iterable
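
For reference, Python raises 'float' object is not iterable when the comprehension tries to loop over a cell that is a float. In pandas a missing value is stored as NaN, which is a float, and str.split passes NaN through unchanged, so empty cells in the column are a likely cause. A minimal sketch reproducing this with a hypothetical missing row, plus one possible guard that treats non-list cells as empty (the empty-list choice is an assumption for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical subset of the conversion table
catdict = {"DE": "DEU", "UK": "GBR"}

# Hypothetical data with one missing value (NaN is stored as a float)
df = pd.DataFrame({"WIN_COUNTRY_CODE": ["DE---UK", np.nan]})
df["WIN_COUNTRY_CODE_2"] = df["WIN_COUNTRY_CODE"].str.split("---")

# The comprehension from the question fails when it reaches the NaN cell
error_message = ""
try:
    [[catdict[k] for k in row if catdict.get(k)] for row in df["WIN_COUNTRY_CODE_2"]]
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # 'float' object is not iterable

# Guard: only iterate over real lists and use an empty list for anything else
mapped = [
    [catdict[k] for k in row if k in catdict] if isinstance(row, list) else []
    for row in df["WIN_COUNTRY_CODE_2"]
]
df = df.assign(mapped=mapped)
print(df["mapped"].tolist())  # [['DEU', 'GBR'], []]
```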

I suspect the error occurs because the entries in the WIN_COUNTRY_CODE column are still strings rather than lists of strings. I concluded this after inspecting the elements of the list with this code:

df.WIN_COUNTRY_CODE_2[183][0]

it always returns a single character instead of the 2-letter code as a string:

'['

whereas I expected it to return 'DE'.
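
Getting '[' back from [0] means the cell holds a string that only looks like a list, for example "['DE', 'UK']"; indexing a string returns its first character. One common way this happens is writing a list column to CSV or Excel and reading it back (an assumption about how this frame was produced). A minimal sketch showing the symptom and one way to turn such strings back into real lists with ast.literal_eval:

```python
import ast

import pandas as pd

# Hypothetical column whose cells are the *text* of a list, not a list
s = pd.Series(["['DE', 'UK', 'UK']", "['IT']"])
print(s[0][0])  # [  (first character of the string)

# literal_eval safely parses a Python literal from its string form
as_lists = s.map(ast.literal_eval)
print(as_lists[0][0])  # DE
```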

Question:

How can I convert the WIN_COUNTRY_CODE column from a column of strings into a column of lists? And how can I find the most frequent country across the entire column? Thank you.

Answer 1

df1=df.copy()
df1["WIN_COUNTRY_CODE"]=df['WIN_COUNTRY_CODE'].str.split('---')
df1["Max_code"]=df1["WIN_COUNTRY_CODE"].apply(lambda x: max(set(x), key = x.count))
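
Note that the Max_code line finds the most frequent code within each row. For the second part of the question (the most frequent country across the entire column), one option, assuming pandas 0.25 or later for Series.explode, is to flatten the lists into one code per row and count them with value_counts. A minimal sketch on the first few sample values:

```python
import pandas as pd

df = pd.DataFrame(
    {"WIN_COUNTRY_CODE": ["IT", "IT", "ES", "DE---UK---UK---UK---UK", "UK---UK---UK---UK"]}
)
lists = df["WIN_COUNTRY_CODE"].str.split("---")

# Most frequent code within each row (same idea as Max_code above)
per_row = lists.apply(lambda x: max(set(x), key=x.count))

# Most frequent code across the whole column: one row per code, then count
counts = lists.explode().value_counts()
most_common = counts.idxmax()
print(counts)
print(most_common)  # UK
```

value_counts drops NaN by default, so rows with missing codes do not affect the overall count.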


Answer 2

This might help.

df['new_WIN_COUNTRY_CODE']=df['WIN_COUNTRY_CODE'].map(lambda x: x.split("---") if "---" in x else [x])

print(df)
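
Since str.split already returns a one-element list when the separator is absent, the "---" in x check is not strictly needed. However, this lambda fails on missing values, because "---" in x raises TypeError when x is NaN (a float). A minimal sketch of a NaN-tolerant variant of this map, using an empty list for missing cells (a choice made here for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"WIN_COUNTRY_CODE": ["IT", "DE---UK---UK", np.nan]})

# Split real strings; give non-string cells (e.g. NaN) an empty list
df["new_WIN_COUNTRY_CODE"] = df["WIN_COUNTRY_CODE"].map(
    lambda x: x.split("---") if isinstance(x, str) else []
)
print(df["new_WIN_COUNTRY_CODE"].tolist())  # [['IT'], ['DE', 'UK', 'UK'], []]
```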

Answer 1: score 1, accepted, posted 2020-01-03 13:50:34
Answer 2: score 0, posted 2020-01-03 13:41:44
