Split lists in a Pandas dataframe column into multiple columns

Question

I am working with movie data and have a dataframe column for movie genre. Currently the column contains a list of movie genres for each movie (as most movies are assigned to multiple genres), but for the purpose of this analysis, I would like to parse the list and create a new dataframe column for each genre. So instead of having genre=['Drama','Thriller'] for a given movie, I would have two columns, something like genre1='Drama' and genre2='Thriller'.

Here is a snippet of my data:

{'color': {0: [u'Color::(Technicolor)'],
  1: [u'Color::(Technicolor)'],
  2: [u'Color::(Technicolor)'],
  3: [u'Color::(Technicolor)'],
  4: [u'Black and White']},
 'country': {0: [u'USA'],
  1: [u'USA'],
  2: [u'USA'],
  3: [u'USA', u'UK'],
  4: [u'USA']},
 'genre': {0: [u'Crime', u'Drama'],
  1: [u'Crime', u'Drama'],
  2: [u'Crime', u'Drama'],
  3: [u'Action', u'Crime', u'Drama', u'Thriller'],
  4: [u'Crime', u'Drama']},
 'language': {0: [u'English'],
  1: [u'English', u'Italian', u'Latin'],
  2: [u'English', u'Italian', u'Spanish', u'Latin', u'Sicilian'],
  3: [u'English', u'Mandarin'],
  4: [u'English']},
 'rating': {0: 9.3, 1: 9.2, 2: 9.0, 3: 9.0, 4: 8.9},
 'runtime': {0: [u'142'],
  1: [u'175'],
  2: [u'202', u'220::(The Godfather Trilogy 1901-1980 VHS Special Edition)'],
  3: [u'152'],
  4: [u'96']},
 'title': {0: u'The Shawshank Redemption',
  1: u'The Godfather',
  2: u'The Godfather: Part II',
  3: u'The Dark Knight',
  4: u'12 Angry Men'},
 'votes': {0: 1793199, 1: 1224249, 2: 842044, 3: 1774083, 4: 484061},
 'year': {0: 1994, 1: 1972, 2: 1974, 3: 2008, 4: 1957}}

Any help would be greatly appreciated! Thanks!

Answer 1

I think you need the DataFrame constructor to expand the lists, add_prefix to name the new columns, and finally concat to join them back to the original:

df1 = pd.DataFrame(df.genre.values.tolist()).add_prefix('genre_')
df = pd.concat([df.drop('genre',axis=1), df1], axis=1)

Timings:

# d is the dictionary shown in the question
df = pd.DataFrame(d)
print(df)
# repeat the 5 sample rows 1000 times to get 5000 rows
df = pd.concat([df]*1000).reset_index(drop=True)

In [394]: %timeit (pd.concat([df.drop('genre',axis=1), pd.DataFrame(df.genre.values.tolist()).add_prefix('genre_')], axis=1))
100 loops, best of 3: 3.4 ms per loop

In [395]: %timeit (pd.concat([df.drop(['genre'],axis=1),df['genre'].apply(pd.Series).rename(columns={0:'genre_0',1:'genre_1',2:'genre_2',3:'genre_3'})],axis=1))
1 loop, best of 3: 757 ms per loop
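If you are not in IPython, a rough way to repeat this comparison is the standard timeit module. The sketch below is only an approximation of the setup above: the two-row base frame, the 1000-row size, and the helper names via_constructor and via_apply are made up for illustration, and the absolute numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

# Hypothetical two-row base frame, repeated to 1000 rows.
base = pd.DataFrame({
    "genre": [["Crime", "Drama"], ["Action", "Crime", "Drama", "Thriller"]],
})
df = pd.concat([base] * 500).reset_index(drop=True)


def via_constructor(frame):
    # Expand the lists in one go with the DataFrame constructor.
    return pd.DataFrame(frame["genre"].tolist(), index=frame.index).add_prefix("genre_")


def via_apply(frame):
    # Expand the lists by building one Series per row.
    return frame["genre"].apply(pd.Series).add_prefix("genre_")


# Total time for three runs of each approach.
t_constructor = timeit.timeit(lambda: via_constructor(df), number=3)
t_apply = timeit.timeit(lambda: via_apply(df), number=3)
print(f"constructor: {t_constructor:.4f}s  apply: {t_apply:.4f}s")
```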

Answer 2

This should work for you:

pd.concat([df.drop(['genre'],axis=1),df['genre'].apply(pd.Series).rename(columns={0:'genre_0',1:'genre_1',2:'genre_2',3:'genre_3'})],axis=1)
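The rename mapping above is hard-coded for at most four genres. As a possible variation (my suggestion, not part of the original answer), add_prefix can name however many columns apply(pd.Series) produces. A small sketch on a made-up two-row frame:

```python
import pandas as pd

# Made-up two-row stand-in for the question's data.
df = pd.DataFrame({
    "title": ["The Godfather", "The Dark Knight"],
    "genre": [["Crime", "Drama"], ["Action", "Crime", "Drama", "Thriller"]],
})

# apply(pd.Series) turns each list into a row of a new frame with columns 0, 1, 2, ...
# add_prefix then names them genre_0, genre_1, ... without listing them by hand.
expanded = df["genre"].apply(pd.Series).add_prefix("genre_")

result = pd.concat([df.drop(["genre"], axis=1), expanded], axis=1)
print(result)
```

This keeps the approach of this answer but no longer needs editing when a movie with a fifth genre shows up.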

Answer 1
2 2017-04-04 14:19:46

Answer 2
0 Accepted 2017-04-04 14:21:54
