
Split (explode) a pandas DataFrame string entry into separate rows across multiple columns

I have a dataframe that looks like this: [image: the first five rows of my dataframe]

I need to replace "European Union" and split (explode) it into the countries that are its members, as in the following example: [image: how the dataframe should look]

I have tried replacing "European Union" with a string listing its members (via a dictionary passed to replace) and then splitting it with the following code:

test_disc['countryname'] = test_disc['countryname'].replace({'European Union': 'Austria, Belgium, Bulgaria, Croatia, Cyprus, Czechia, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Ireland,Italy, Latvia, Lithuania, Luxembourg, Malta, Netherlands,Poland, Portugal, Romania, Slovakia, Slovenia, Spain, Sweden'})

test_disc[['iso_2', 'iso_3', 'countryname', 'país afetado','year',
       'SPS emergenciais', 'SPS regulares']].astype(str).apply(lambda x: 
       x.str.split(',').explode()).reset_index()

However, I have been getting the following error: "ValueError: cannot reindex from a duplicate axis"

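When you use explode, convert only the target columns to lists, not all columns. In your code, apply splits and explodes every column separately. The exploded columns end up with different lengths and repeated index labels, so pandas cannot align them back into a single DataFrame and raises "cannot reindex from a duplicate axis".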
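Here is a minimal sketch of the idea on a hypothetical two-row frame. Only one column is turned into lists, and explode repeats the remaining columns for each list element. The exploded row keeps its original index label more than once, which is the kind of duplicate label the all-columns apply call tripped over. reset_index(drop=True) renumbers the rows when a unique index is needed.

```python
import pandas as pd

# Hypothetical two-row frame; only 'countryname' holds a comma-separated string
df = pd.DataFrame({"countryname": ["JP", "Austria, Belgium"], "year": [2015, 2016]})

# Convert only the target column to lists; 'year' stays a plain scalar column
df["countryname"] = df["countryname"].str.split(r",\s*")

# One output row per list element; 'year' is repeated and index label 1 appears twice
out = df.explode("countryname")
print(out.index.tolist())  # [0, 1, 1]

# Renumber the rows when a unique index is needed
print(out.reset_index(drop=True))
```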


Demo data:

import pandas as pd

data = [{'iso_2': 0, 'iso_3': 'NaN', 'countryname': 'JP', 'país afetado': 'US', 'year': 2015, 'SPS emergenciais': 0, 'SPS regulares': 0}, {'iso_2': 1, 'iso_3': 'NaN', 'countryname': 'European Union', 'país afetado': 'China', 'year': 2015, 'SPS emergenciais': 0, 'SPS regulares': 0}, {'iso_2': 2, 'iso_3': 'NaN', 'countryname': 'US', 'país afetado': 'European Union', 'year': 2015, 'SPS emergenciais': 0, 'SPS regulares': 0}, {'iso_2': 3, 'iso_3': 'NaN', 'countryname': 'European Union', 'país afetado': 'European Union', 'year': 2015, 'SPS emergenciais': 0, 'SPS regulares': 0}]
df = pd.DataFrame(data)
df

   iso_2 iso_3     countryname    país afetado  year  SPS emergenciais  SPS regulares
0      0   NaN              JP              US  2015                 0              0
1      1   NaN  European Union           China  2015                 0              0
2      2   NaN              US  European Union  2015                 0              0
3      3   NaN  European Union  European Union  2015                 0              0

Processing:

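# Abbreviated member list for the demo; use the full list in practice
for col in ['país afetado', 'countryname']:
    # Replace the EU label with a comma-separated string of its members
    df[col] = df[col].replace({'European Union': 'Austria, Belgium, Netherlands,Poland'})
    # Split on a comma plus optional whitespace, turning each cell into a list
    df[col] = df[col].str.split(r',\s*')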

df_result = df.explode('countryname').explode('país afetado')

Result:

   iso_2 iso_3  countryname país afetado  year  SPS emergenciais  
0      0   NaN           JP           US  2015                 0   
1      1   NaN      Austria        China  2015                 0   
1      1   NaN      Belgium        China  2015                 0   
1      1   NaN  Netherlands        China  2015                 0   
1      1   NaN       Poland        China  2015                 0   
2      2   NaN           US      Austria  2015                 0   
2      2   NaN           US      Belgium  2015                 0   
2      2   NaN           US  Netherlands  2015                 0   
2      2   NaN           US       Poland  2015                 0   
3      3   NaN      Austria      Austria  2015                 0   
3      3   NaN      Austria      Belgium  2015                 0   
3      3   NaN      Austria  Netherlands  2015                 0   
3      3   NaN      Austria       Poland  2015                 0   
3      3   NaN      Belgium      Austria  2015                 0   
3      3   NaN      Belgium      Belgium  2015                 0   
3      3   NaN      Belgium  Netherlands  2015                 0   
3      3   NaN      Belgium       Poland  2015                 0   
3      3   NaN  Netherlands      Austria  2015                 0   
3      3   NaN  Netherlands      Belgium  2015                 0   
3      3   NaN  Netherlands  Netherlands  2015                 0   
3      3   NaN  Netherlands       Poland  2015                 0   
3      3   NaN       Poland      Austria  2015                 0   
3      3   NaN       Poland      Belgium  2015                 0   
3      3   NaN       Poland  Netherlands  2015                 0   
3      3   NaN       Poland       Poland  2015                 0  
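In this result the original index labels repeat (label 3 appears 16 times). Because the two explodes are chained, a row with "European Union" in both columns becomes every pairing of members (4 × 4 = 16 rows). Whether that cross product is the layout you want depends on your target table.

The sketch below is a variant under some assumptions. It maps "European Union" straight to a Python list, using the hypothetical names EU_MEMBERS and expand_eu and the same four demo members. This avoids building and re-splitting a long string, where a missing space like "Ireland,Italy" is easy to miss. It also resets the index.

For comparison, it explodes both columns in one call. That call needs pandas 1.3 or later and pairs elements position by position. Each row's lists must have the same length, otherwise pandas raises a ValueError.

```python
import pandas as pd

# Hypothetical constant: shortened to the four members used in the demo above
EU_MEMBERS = ["Austria", "Belgium", "Netherlands", "Poland"]

def expand_eu(value):
    """Hypothetical helper: the member list for 'European Union', else a one-item list."""
    return list(EU_MEMBERS) if value == "European Union" else [value]

df = pd.DataFrame({
    "countryname": ["JP", "European Union", "US", "European Union"],
    "país afetado": ["US", "China", "European Union", "European Union"],
    "year": [2015, 2015, 2015, 2015],
})

# Turn every cell of the two target columns into a list; other columns stay scalar
for col in ["countryname", "país afetado"]:
    df[col] = df[col].map(expand_eu)

# Chained explodes give every (countryname, país afetado) combination per row;
# reset_index(drop=True) replaces the repeated labels with 0..n-1
chained = df.explode("countryname").explode("país afetado").reset_index(drop=True)

# Exploding both columns in one call (pandas >= 1.3) pairs elements by position
# and needs equal-length lists per row, so use only the row where both columns
# hold the full member list (original row 3)
both_eu = df.iloc[[3]]
paired = both_eu.explode(["countryname", "país afetado"]).reset_index(drop=True)

print(len(chained))  # 25 rows: 1 + 4 + 4 + 16
print(len(paired))   # 4 rows: Austria/Austria, Belgium/Belgium, ...
```

Use the chained form when you want every combination, as in the table above. Use the single-call form when the two columns should stay aligned element by element.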

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint one, please credit the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 