Pandas DataFrame - Flatten columns of lists into multiple columns
Here's my problem. I have a DataFrame with x columns and y rows. Some of the columns actually contain lists. I want to transform each of those columns into multiple columns that each hold a single value.
An example speaks for itself:
My DataFrame:
            ans_length  ans_unigram_numbers  ...  levenshtein_dist  que_entropy
0             [19, 14]              [12, 8]  ...              9.00     3.189898
1                 [19]                 [12]  ...              4.00     3.189898
2                  [0]                  [0]  ...            170.00     4.299996
3                  [0]                  [0]  ...            170.00     4.303341
4                  [0]                  [0]  ...            170.00     4.304335
5                  [0]                  [0]  ...            170.00     4.311820
28                [56]                 [23]  ...             24.00     4.110291
29                 [0]                  [0]  ...             56.00     4.181720
...                ...                  ...  ...               ...          ...
1976              [24]                 [11]  ...             24.00     3.084963
1977              [24]                 [11]  ...             24.00     3.084963
1992  [31, 24, 32, 28]     [14, 15, 17, 11]  ...             18.75     3.292770
1993  [31, 24, 32, 28]     [14, 15, 17, 11]  ...             18.75     3.292770

[1998 rows x 9 columns]
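To reproduce the problem, a few of the rows above can be rebuilt as a small DataFrame. This is a sketch under stated assumptions: only the four visible columns are included (the five columns hidden behind "..." are not shown in the question, so they are omitted), and the original index labels are kept because they are not 0..n-1.

```python
import pandas as pd

# A handful of rows copied from the example; the columns hidden behind "..." are omitted.
df = pd.DataFrame(
    {
        "ans_length": [[19, 14], [19], [0], [31, 24, 32, 28]],
        "ans_unigram_numbers": [[12, 8], [12], [0], [14, 15, 17, 11]],
        "levenshtein_dist": [9.00, 4.00, 170.00, 18.75],
        "que_entropy": [3.189898, 3.189898, 4.299996, 3.292770],
    },
    index=[0, 1, 2, 1992],  # original, non-contiguous index labels
)
print(df)
```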
What I expect:
      ans_length_0  ans_length_1  ans_length_2  ans_length_3  \
0               19            14
1               19
2                0
3                0
4                0
5                0
28              56
29               0
1976            24
1977            24
1992            31            24            32            28
1993            31            24            32            28

      ans_unigram_numbers_0  ans_unigram_numbers_1  ans_unigram_numbers_2  ans_unigram_numbers_3  \
0                        12                      8
1                        12
2                         0
3                         0
4                         0
5                         0
28                       23
29                        0
1976                     11
1977                     11
1992                     14                     15                     17                     11
1993                     14                     15                     17                     11

      levenshtein_dist  que_entropy
0                 9.00     3.189898
1                 4.00     3.189898
2               170.00     4.299996
3               170.00     4.303341
4               170.00     4.304335
5               170.00     4.311820
28               24.00     4.110291
29               56.00     4.181720
1976             24.00     3.084963
1977             24.00     3.084963
1992             18.75     3.292770
1993             18.75     3.292770
Each newly generated column should take the name of the original column, with an index appended to the end (for example ans_length_0, ans_length_1, ...).
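To see how such names can be produced: `pd.DataFrame` built from a list of unequal-length lists pads the shorter rows with missing values and labels the columns 0, 1, 2, ...; `add_prefix` with a trailing underscore then turns those labels into the requested names. A minimal sketch on a single column, using values from the example (the variable names `lists` and `wide` are only illustrative):

```python
import pandas as pd

lists = pd.Series([[19, 14], [19], [31, 24, 32, 28]], index=[0, 1, 1992])

# Shorter rows are padded with NaN; the new columns are labelled 0..3.
wide = pd.DataFrame(lists.tolist(), index=lists.index)
# add_prefix turns the integer labels into "ans_length_0", "ans_length_1", ...
wide = wide.add_prefix("ans_length_")
print(wide)
```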
I think you can use:
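import pandas as pd
cols = ['ans_length', 'ans_unigram_numbers']
# index=df.index keeps rows aligned (df's index is not 0..n-1); x + '_' yields ans_length_0, ...
df1 = pd.concat([pd.DataFrame(df[x].tolist(), index=df.index).add_prefix(x + '_') for x in cols], axis=1)
df = pd.concat([df1, df.drop(cols, axis=1)], axis=1)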
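Why passing the original index matters: `pd.DataFrame(df[x].tolist())` on its own gets a fresh 0..n-1 index, while the question's frame has labels such as 28, 1976 and 1992. A `concat` along axis=1 aligns on index labels, so without the original index the expanded values end up on the wrong rows and extra rows appear. A small self-contained check, using a few rows from the question with the columns trimmed for brevity:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "ans_length": [[19, 14], [19], [31, 24, 32, 28]],
        "ans_unigram_numbers": [[12, 8], [12], [14, 15, 17, 11]],
        "que_entropy": [3.189898, 3.189898, 3.292770],
    },
    index=[0, 1, 1992],
)
cols = ["ans_length", "ans_unigram_numbers"]

# Without index=df.index the expanded block is labelled 0, 1, 2 and misaligns.
bad = pd.concat([pd.DataFrame(df[x].tolist()).add_prefix(x + "_") for x in cols], axis=1)
bad = pd.concat([bad, df.drop(cols, axis=1)], axis=1)

# Reusing the original index makes the rows line up again.
good = pd.concat(
    [pd.DataFrame(df[x].tolist(), index=df.index).add_prefix(x + "_") for x in cols], axis=1
)
good = pd.concat([good, df.drop(cols, axis=1)], axis=1)

print(len(bad), len(good))  # prints "4 3": the misaligned version gains a row
print(good)
```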
Based on @jezrael's answer, I wrote a function that does what was asked for a given DataFrame and a given list of columns:
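import pandas as pd

def flattencolumns(df1, cols):
    # Expand each list column, reusing df1's index so the rows line up in concat
    df = pd.concat([pd.DataFrame(df1[x].tolist(), index=df1.index).add_prefix(x + '_') for x in cols], axis=1)
    return pd.concat([df, df1.drop(cols, axis=1)], axis=1)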
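The expanded columns typically come back as float64 because padded entries are NaN, so 19 is displayed as 19.0 rather than 19 as in the expected output. If integer display matters, one option, assuming pandas 1.0 or later for the nullable `Int64` dtype and that every list element is an integer, is to cast each expanded block. `flatten_columns_int` is a hypothetical variant for illustration, not part of either answer:

```python
import pandas as pd


def flatten_columns_int(df1, cols):
    """Hypothetical variant of flattencolumns that keeps integer values.

    Assumes every list element is an integer; padded cells become <NA>."""
    parts = [
        pd.DataFrame(df1[x].tolist(), index=df1.index).add_prefix(x + "_").astype("Int64")
        for x in cols
    ]
    return pd.concat(parts + [df1.drop(cols, axis=1)], axis=1)


df = pd.DataFrame(
    {
        "ans_length": [[19, 14], [19], [0]],
        "levenshtein_dist": [9.00, 4.00, 170.00],
    },
    index=[0, 1, 1992],
)
out = flatten_columns_int(df, ["ans_length"])
print(out)
```

The nullable dtype keeps missing cells as `<NA>` instead of forcing the whole column to float.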