DataFrame Pandas - Flatten column of lists to multiple columns

Here's my problem. I have a dataframe with x columns and y rows. Some columns actually contain lists. I want to transform those columns into multiple columns, each containing single values.

An example speaks for itself:

My dataframe:

            ans_length ans_unigram_numbers  ...  levenshtein_dist  que_entropy
0             [19, 14]             [12, 8]  ...              9.00     3.189898
1                 [19]                [12]  ...              4.00     3.189898
2                  [0]                 [0]  ...            170.00     4.299996
3                  [0]                 [0]  ...            170.00     4.303341
4                  [0]                 [0]  ...            170.00     4.304335
5                  [0]                 [0]  ...            170.00     4.311820
28                [56]                [23]  ...             24.00     4.110291
29                 [0]                 [0]  ...             56.00     4.181720
...                ...                 ...  ...               ...          ...
1976              [24]                [11]  ...             24.00     3.084963
1977              [24]                [11]  ...             24.00     3.084963
1992  [31, 24, 32, 28]    [14, 15, 17, 11]  ...             18.75     3.292770
1993  [31, 24, 32, 28]    [14, 15, 17, 11]  ...             18.75     3.292770

[1998 rows x 9 columns]

What I expect:

    ans_length_0    ans_length_1    ans_length_2    ans_length_3    \
0             19              14            
1             19                
2              0                
3              0                
4              0                
5              0                
28            56                
29             0                
1976          24                
1977          24                
1992          31              24               32             28    
1993          31              24               32             28    

ans_unigram_numbers_0   ans_unigram_numbers_1   ans_unigram_numbers_2   ans_unigram_numbers_3   \
                   12                       8           
                   12               
                   0                
                   0                
                   0                
                   0                
                   23               
                   0                
                   11               
                   11               
                   14                      15                      17                      11   
                   14                      15                      17                      11   

levenshtein_dist    que_entropy
               9       3.189898
               4       3.189898
             170       4.299996
             170       4.303341
             170       4.304335
             170        4.31182
              24       4.110291
              56        4.18172
              24       3.084963
              24       3.084963
            18.75       3.29277
            18.75       3.29277

Each newly generated column should take the name of the original column, with an index appended to it (for example, ans_length_0).

I think you can use:

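import pandas as pd
cols = ['ans_length','ans_unigram_numbers']

# index=df.index keeps rows aligned when the index is not 0..n-1; the '_' gives names like ans_length_0
df1 = pd.concat([pd.DataFrame(df[x].values.tolist(), index=df.index).add_prefix(x + '_') for x in cols], axis=1)
df = pd.concat([df1, df.drop(cols, axis=1)], axis=1)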
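
To make the idea concrete, here is a minimal, self-contained sketch on a small made-up frame (the sample values and the index [0, 1, 28] are assumptions chosen to resemble the question, not the real data). It assumes two details the question seems to need: the expanded frames are built with index=df.index so that rows still line up when the index has gaps, and the prefix ends with an underscore so the names come out as ans_length_0, ans_length_1, and so on.

```python
import pandas as pd

# Made-up sample data; the index deliberately skips values, as in the question.
df = pd.DataFrame(
    {
        "ans_length": [[19, 14], [19], [56]],
        "ans_unigram_numbers": [[12, 8], [12], [23]],
        "levenshtein_dist": [9.0, 4.0, 24.0],
    },
    index=[0, 1, 28],
)

cols = ["ans_length", "ans_unigram_numbers"]

# Expand each list column into its own frame, reusing the original index
# so that the later concat aligns row by row.
expanded = [
    pd.DataFrame(df[c].tolist(), index=df.index).add_prefix(f"{c}_")
    for c in cols
]

# New columns first, then the untouched original columns.
flat = pd.concat(expanded + [df.drop(columns=cols)], axis=1)
print(flat)
```

Rows whose lists are shorter than the longest one get NaN in the extra columns rather than blanks.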

Based on @jezrael's answer, I created a function that does what is asked, given a dataframe and a list of columns:

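def flattencolumns(df1, cols):
    # index=df1.index keeps rows aligned; the '_' yields names like ans_length_0
    df = pd.concat([pd.DataFrame(df1[x].values.tolist(), index=df1.index).add_prefix(x + '_') for x in cols], axis=1)
    return pd.concat([df, df1.drop(cols, axis=1)], axis=1)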
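
One side effect worth knowing: when the lists have different lengths, the missing positions become NaN, so the affected columns are stored as floats (14.0 instead of 14). If integer display matters, a possible follow-up is to cast the expanded columns to pandas' nullable Int64 dtype. The sketch below is only an illustration under that assumption; flatten_list_column is a hypothetical helper name and the sample data is made up.

```python
import pandas as pd


def flatten_list_column(df, col):
    """Hypothetical helper: expand one list column into nullable-integer columns."""
    wide = pd.DataFrame(df[col].tolist(), index=df.index).add_prefix(f"{col}_")
    # Int64 (capital I) is pandas' nullable integer dtype; missing slots become <NA>.
    return wide.astype("Int64")


# Made-up sample with lists of unequal length and a non-contiguous index.
sample = pd.DataFrame({"ans_length": [[19, 14], [19], [0]]}, index=[0, 1, 28])
result = flatten_list_column(sample, "ans_length")
print(result)
```

This assumes every list element is an integer; with non-integer values the cast to Int64 would fail, and the float columns would have to be kept.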
