
DataFrame Pandas - Flatten column of lists to multiple columns

Here's my problem. I have a dataframe with x columns and y rows. Some columns contain lists. I want to split each of those columns into multiple columns that hold single values.

An example speaks for itself:

My dataframe:

            ans_length ans_unigram_numbers  ...  levenshtein_dist  que_entropy
0             [19, 14]             [12, 8]  ...              9.00     3.189898
1                 [19]                [12]  ...              4.00     3.189898
2                  [0]                 [0]  ...            170.00     4.299996
3                  [0]                 [0]  ...            170.00     4.303341
4                  [0]                 [0]  ...            170.00     4.304335
5                  [0]                 [0]  ...            170.00     4.311820
28                [56]                [23]  ...             24.00     4.110291
29                 [0]                 [0]  ...             56.00     4.181720
...                ...                 ...  ...               ...          ...
1976              [24]                [11]  ...             24.00     3.084963
1977              [24]                [11]  ...             24.00     3.084963
1992  [31, 24, 32, 28]    [14, 15, 17, 11]  ...             18.75     3.292770
1993  [31, 24, 32, 28]    [14, 15, 17, 11]  ...             18.75     3.292770

[1998 rows x 9 columns]

What I expect:

    ans_length_0    ans_length_1    ans_length_2    ans_length_3    \
0             19              14            
1             19                
2              0                
3              0                
4              0                
5              0                
28            56                
29             0                
1976          24                
1977          24                
1992          31              24               32             28    
1993          31              24               32             28    

ans_unigram_numbers_0   ans_unigram_numbers_1   ans_unigram_numbers_2   ans_unigram_numbers_3   \
                   12                       8           
                   12               
                   0                
                   0                
                   0                
                   0                
                   23               
                   0                
                   11               
                   11               
                   14                      15                      17                      11   
                   14                      15                      17                      11   

levenshtein_dist    que_entropy
               9       3.189898
               4       3.189898
             170       4.299996
             170       4.303341
             170       4.304335
             170        4.31182
              24       4.110291
              56        4.18172
              24       3.084963
              24       3.084963
            18.75       3.29277
            18.75       3.29277

The newly generated columns should take the name of the old one, adding an index at the end of it.

I think you can use:

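import pandas as pd

cols = ['ans_length', 'ans_unigram_numbers']

# Expand each list column; keep df's index so rows align, and add '_' to match the expected names
df1 = pd.concat([pd.DataFrame(df[x].values.tolist(), index=df.index).add_prefix(x + '_') for x in cols], axis=1)
df = pd.concat([df1, df.drop(cols, axis=1)], axis=1)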
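One detail worth checking when the index is not contiguous, as in the question (0, 1, ..., 28, 29, ...): a frame built from a list of lists gets a fresh 0..n-1 index unless `index=` is passed, and `pd.concat(..., axis=1)` aligns rows on index labels. A minimal sketch of the difference, using made-up toy data rather than the asker's frame:

```python
import pandas as pd

# Toy frame with a non-contiguous index (hypothetical data, not from the question)
df = pd.DataFrame(
    {"ans_length": [[19, 14], [56]], "que_entropy": [3.19, 4.11]},
    index=[0, 28],
)

# Without index=, the expanded frame is labelled 0, 1
no_index = pd.DataFrame(df["ans_length"].tolist())
# With index=, it keeps the original labels 0, 28
with_index = pd.DataFrame(df["ans_length"].tolist(), index=df.index)

# concat(axis=1) aligns on labels, so mismatched labels produce extra rows
misaligned = pd.concat([no_index, df[["que_entropy"]]], axis=1)
aligned = pd.concat([with_index, df[["que_entropy"]]], axis=1)

print(misaligned.shape)  # (3, 3): union of labels 0, 1, 28
print(aligned.shape)     # (2, 3)
```

If the index carries no meaning, calling `df.reset_index(drop=True)` first is another way to avoid the mismatch.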

Based on @jezrael's answer, I wrote a function that does this for a given dataframe and a given list of columns:

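def flattencolumns(df1, cols):
    # Expand each list column into <name>_0, <name>_1, ...; reuse df1's index so rows stay aligned
    df = pd.concat([pd.DataFrame(df1[x].values.tolist(), index=df1.index).add_prefix(x + '_') for x in cols], axis=1)
    return pd.concat([df, df1.drop(cols, axis=1)], axis=1)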
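For a quick check, here is a self-contained sketch of the same idea on a small frame. The name `flatten_columns` and the sample values are illustrative assumptions, not the asker's real data. This variant uses an underscore separator and preserves the original index, so the output names follow the `ans_length_0` pattern from the question:

```python
import pandas as pd

def flatten_columns(df, cols):
    """Expand each list-valued column in `cols` into `<name>_0`, `<name>_1`, ..."""
    expanded = [
        # index=df.index keeps row labels, add_prefix builds the new column names
        pd.DataFrame(df[c].tolist(), index=df.index).add_prefix(f"{c}_")
        for c in cols
    ]
    # Put the expanded columns first, then the untouched ones
    return pd.concat(expanded + [df.drop(columns=cols)], axis=1)

# Hypothetical sample with lists of different lengths and a gap in the index
sample = pd.DataFrame(
    {
        "ans_length": [[19, 14], [19], [31, 24, 32, 28]],
        "ans_unigram_numbers": [[12, 8], [12], [14, 15, 17, 11]],
        "levenshtein_dist": [9.0, 4.0, 18.75],
    },
    index=[0, 1, 1992],
)

flat = flatten_columns(sample, ["ans_length", "ans_unigram_numbers"])
print(flat.columns.tolist())
```

Rows whose lists are shorter than the longest one get NaN in the unused positions, which corresponds to the blank cells in the expected output.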
