I have a dataframe that looks like this:
I need to replace "European Union" with its member countries and split (explode) the row into one row per country, as in the following example:
I tried replacing "European Union" using a dictionary that maps it to a string of its members, and then splitting that string with the following code:
test_disc['countryname'] = test_disc['countryname'].replace({'European Union': 'Austria, Belgium, Bulgaria, Croatia, Cyprus, Czechia, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Ireland,Italy, Latvia, Lithuania, Luxembourg, Malta, Netherlands,Poland, Portugal, Romania, Slovakia, Slovenia, Spain, Sweden'})
test_disc[['iso_2', 'iso_3', 'countryname', 'país afetado','year',
'SPS emergenciais', 'SPS regulares']].astype(str).apply(lambda x:
x.str.split(',').explode()).reset_index()
However, I have been getting the following error: "ValueError: cannot reindex from a duplicate axis"
When you use explode, convert only the target column(s) to lists, not every column.
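To see why this matters, here is a minimal sketch on a made-up two-row Series (the values are illustrative only, not the asker's data). After explode, the original index label is repeated once per list element. When every column is split and exploded separately, the columns can end up with different lengths under the same repeated labels, which is the kind of situation in which pandas can no longer align them.

```python
import pandas as pd

# Illustrative data: one plain value and one comma-separated value.
s = pd.Series(["JP", "Austria, Belgium"], name="countryname")

# Split on a comma plus optional whitespace, then give each element its own row.
exploded = s.str.split(r",\s*").explode()

# The label 1 now appears twice, once per element of the split list.
print(exploded.index.tolist())
print(exploded.tolist())
```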
Demo data:
import pandas as pd

data = [
    {'iso_2': 0, 'iso_3': 'NaN', 'countryname': 'JP', 'país afetado': 'US', 'year': 2015, 'SPS emergenciais': 0, 'SPS regulares': 0},
    {'iso_2': 1, 'iso_3': 'NaN', 'countryname': 'European Union', 'país afetado': 'China', 'year': 2015, 'SPS emergenciais': 0, 'SPS regulares': 0},
    {'iso_2': 2, 'iso_3': 'NaN', 'countryname': 'US', 'país afetado': 'European Union', 'year': 2015, 'SPS emergenciais': 0, 'SPS regulares': 0},
    {'iso_2': 3, 'iso_3': 'NaN', 'countryname': 'European Union', 'país afetado': 'European Union', 'year': 2015, 'SPS emergenciais': 0, 'SPS regulares': 0},
]
df = pd.DataFrame(data)
df
   iso_2 iso_3     countryname    país afetado  year  SPS emergenciais  SPS regulares
0      0   NaN              JP              US  2015                 0              0
1      1   NaN  European Union           China  2015                 0              0
2      2   NaN              US  European Union  2015                 0              0
3      3   NaN  European Union  European Union  2015                 0              0
process:
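for col in ['país afetado', 'countryname']:
    df[col] = df[col].replace({'European Union': 'Austria, Belgium, Netherlands,Poland'})
    # split on a comma followed by optional whitespace so each cell becomes a list
    df[col] = df[col].str.split(r',\s*')
# explode one column at a time; rows with lists in both columns yield every combination
df_result = df.explode('countryname').explode('país afetado')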
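The same idea can be wrapped in a small function. The sketch below is one possible variant, not the only way to do it: `expand_group` and `EU_MEMBERS` are hypothetical names, the member list is shortened as in the demo, and the group is mapped straight to a list so no string splitting is needed. It also renumbers the rows, since explode repeats the original index labels.

```python
import pandas as pd

# Shortened, illustrative member list; substitute the full list in practice.
EU_MEMBERS = ["Austria", "Belgium", "Netherlands", "Poland"]


def expand_group(df, columns, group="European Union", members=EU_MEMBERS):
    """Hypothetical helper: copy df and expand `group` into one row per member."""
    out = df.copy()
    for col in columns:
        # Wrap every cell in a list so explode treats all rows uniformly.
        out[col] = out[col].map(lambda v: list(members) if v == group else [v])
    for col in columns:
        out = out.explode(col)
    # explode repeats the original index labels, so renumber the rows.
    return out.reset_index(drop=True)


demo = pd.DataFrame({
    "countryname": ["JP", "European Union", "US", "European Union"],
    "país afetado": ["US", "China", "European Union", "European Union"],
    "year": [2015, 2015, 2015, 2015],
})
result = expand_group(demo, ["countryname", "país afetado"])
print(len(result))  # 1 + 4 + 4 + 16 rows
```

Rows that hold the group in both columns expand to every pairing of members, including a country paired with itself, so filter those afterwards if they are not wanted.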
result:
   iso_2 iso_3  countryname  país afetado  year  SPS emergenciais
0      0   NaN           JP            US  2015                 0
1      1   NaN      Austria         China  2015                 0
1      1   NaN      Belgium         China  2015                 0
1      1   NaN  Netherlands         China  2015                 0
1      1   NaN       Poland         China  2015                 0
2      2   NaN           US       Austria  2015                 0
2      2   NaN           US       Belgium  2015                 0
2      2   NaN           US   Netherlands  2015                 0
2      2   NaN           US        Poland  2015                 0
3      3   NaN      Austria       Austria  2015                 0
3      3   NaN      Austria       Belgium  2015                 0
3      3   NaN      Austria   Netherlands  2015                 0
3      3   NaN      Austria        Poland  2015                 0
3      3   NaN      Belgium       Austria  2015                 0
3      3   NaN      Belgium       Belgium  2015                 0
3      3   NaN      Belgium   Netherlands  2015                 0
3      3   NaN      Belgium        Poland  2015                 0
3      3   NaN  Netherlands       Austria  2015                 0
3      3   NaN  Netherlands       Belgium  2015                 0
3      3   NaN  Netherlands   Netherlands  2015                 0
3      3   NaN  Netherlands        Poland  2015                 0
3      3   NaN       Poland       Austria  2015                 0
3      3   NaN       Poland       Belgium  2015                 0
3      3   NaN       Poland   Netherlands  2015                 0
3      3   NaN       Poland        Poland  2015                 0