Create pandas dataframe from list of lists, but there are different separators
I have a list of lists:
[['1', 'Toy Story (1995)', "Animation|Children's|Comedy"],
 ['2', 'Jumanji (1995)', "Adventure|Children's|Fantasy"],
 ['3', 'Grumpier Old Men (1995)', 'Comedy|Romance']]
I would like to end up with a pandas DataFrame with these columns:
cols = ['MovieID', 'Name', 'Year', 'Adventure', 'Children', 'Comedy', 'Fantasy', 'Romance']
For the 'Adventure', 'Children', 'Comedy', 'Fantasy' and 'Romance' columns, the data will be 1 or 0.
I have tried:
for row in movies_list:
    for element in row:
        if '|' in element:
            element = element.split('|')
However, nothing happens to the original list. I am completely stumped here.
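The loop appears to do nothing because `element = element.split('|')` only rebinds the local name `element`; it never writes the result back into `movies_list`. A minimal sketch of that behaviour and of one way around it, building a new list instead (`unchanged` and `split_rows` are hypothetical names used only for illustration):

```python
movies_list = [
    ['1', 'Toy Story (1995)', "Animation|Children's|Comedy"],
    ['2', 'Jumanji (1995)', "Adventure|Children's|Fantasy"],
    ['3', 'Grumpier Old Men (1995)', 'Comedy|Romance'],
]

# Rebinding the loop variable does not modify the list it came from.
for row in movies_list:
    for element in row:
        if '|' in element:
            element = element.split('|')  # only the local name changes

unchanged = movies_list[0][2]  # still the original '|'-joined string

# Build new rows instead, splitting any field that contains '|'.
split_rows = [
    [field.split('|') if '|' in field else field for field in row]
    for row in movies_list
]
print(split_rows[0])
```

This only explains the symptom; it does not yet produce the 0/1 columns asked for.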
Use the DataFrame constructor together with str.get_dummies:
import pandas as pd

L = [['1', 'Toy Story (1995)', "Animation|Children's|Comedy"],
     ['2', 'Jumanji (1995)', "Adventure|Children's|Fantasy"],
     ['3', 'Grumpier Old Men (1995)', 'Comedy|Romance']]
df = pd.DataFrame(L, columns=['MovieID', 'Name', 'Data'])
# Split the Data column on '|' (the default separator) into 0/1 indicator columns
df1 = df['Data'].str.get_dummies()
print(df1)
   Adventure  Animation  Children's  Comedy  Fantasy  Romance
0          0          1           1       1        0        0
1          1          0           1       0        1        0
2          0          0           0       1        0        1
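The indicator columns above include Animation and keep the name Children's, while the question asks for exactly 'Adventure', 'Children', 'Comedy', 'Fantasy' and 'Romance'. A possible sketch of lining the two up, assuming that renaming Children's to Children and dropping genres that were not requested is acceptable (`genres` and `dummies` are names chosen here for illustration):

```python
import pandas as pd

L = [['1', 'Toy Story (1995)', "Animation|Children's|Comedy"],
     ['2', 'Jumanji (1995)', "Adventure|Children's|Fantasy"],
     ['3', 'Grumpier Old Men (1995)', 'Comedy|Romance']]
df = pd.DataFrame(L, columns=['MovieID', 'Name', 'Data'])

# sep='|' is the default of Series.str.get_dummies; it is spelled out for clarity.
dummies = df['Data'].str.get_dummies(sep='|')

# Assumption: only these genre columns are wanted, under these exact names.
genres = ['Adventure', 'Children', 'Comedy', 'Fantasy', 'Romance']
dummies = dummies.rename(columns={"Children's": 'Children'})
# reindex keeps the requested columns in order and fills any absent genre with 0.
dummies = dummies.reindex(columns=genres, fill_value=0)
print(dummies)
```

Using reindex rather than plain column selection means a requested genre that never occurs in the data still shows up as a column of zeros instead of raising an error.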
For the Name and Year columns, use split, then rstrip to remove the trailing ), and convert Year to int:
# Raw string so that \s and \( are passed to the regex engine unchanged
df[['Name', 'Year']] = df['Name'].str.split(r'\s\(', expand=True)
df['Year'] = df['Year'].str.rstrip(')').astype(int)
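The split works when each title contains exactly one " (". A title with an extra parenthesised part would split into more than two pieces, and the two-column assignment would then fail. As an alternative sketch (not part of the answer itself), Series.str.extract with named groups can pull out the name and a trailing four-digit year in one step; the pattern below is an assumption about how the titles are formatted:

```python
import pandas as pd

names = pd.Series(['Toy Story (1995)',
                   'Jumanji (1995)',
                   'Grumpier Old Men (1995)'])

# Named groups become the column names of the resulting DataFrame.
# Assumed format: "<name> (<4-digit year>)" with the year at the end of the string.
parts = names.str.extract(r'^(?P<Name>.*?)\s*\((?P<Year>\d{4})\)\s*$')
parts['Year'] = parts['Year'].astype(int)
print(parts)
```

Rows that do not match the pattern come back as NaN, so the astype(int) call would raise on them; that can be a useful signal that a title has an unexpected shape.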
Finally, drop the Data column and add df1 to the original with join:
df = df.drop('Data', axis=1).join(df1)
print(df)
  MovieID              Name  Year  Adventure  Animation  Children's  Comedy  \
0       1         Toy Story  1995          0          1           1       1
1       2           Jumanji  1995          1          0           1       0
2       3  Grumpier Old Men  1995          0          0           0       1

   Fantasy  Romance
0        0        0
1        1        0
2        0        1
Here is my version. It is not good enough to be a one-line answer, but I hope it helps!
import pandas as pd

data = [['1', 'Toy Story (1995)', "Animation|Children's|Comedy"],
        ['2', 'Jumanji (1995)', "Adventure|Children's|Fantasy"],
        ['3', 'Grumpier Old Men (1995)', 'Comedy|Romance']]
cols = ['MovieID', 'Name', 'Year', 'Adventure', 'Children', 'Comedy', 'Fantasy', 'Romance']
final = []
for x in data:
    output = []
    output.append(x[0])                          # MovieID
    output.append(x[1].split("(")[0].strip())    # name: text before the "("
    output.append(x[1].split("(")[1][:4])        # year: four characters after the "("
    for h in ['Adventure', 'Children', 'Comedy', 'Fantasy', 'Romance']:
        # Substring test, so 'Children' also matches "Children's"
        output.append(h in x[2])
    final.append(output)
df = pd.DataFrame(final, columns=cols)
print(df)
OUTPUT:
  MovieID              Name  Year  Adventure  Children  Comedy  Fantasy  \
0       1         Toy Story  1995      False      True    True    False
1       2           Jumanji  1995       True      True   False     True
2       3  Grumpier Old Men  1995      False     False    True    False

   Romance
0    False
1    False
2     True
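The loop version above yields True/False values and a string Year, and it relies on a substring test. A variation that produces the 1/0 values the question asks for might look like the sketch below. It assumes that stripping a trailing "'s" is an acceptable way to map Children's to Children, and the names `genres`, `rows` and `tags` are chosen here for illustration:

```python
import pandas as pd

data = [['1', 'Toy Story (1995)', "Animation|Children's|Comedy"],
        ['2', 'Jumanji (1995)', "Adventure|Children's|Fantasy"],
        ['3', 'Grumpier Old Men (1995)', 'Comedy|Romance']]
genres = ['Adventure', 'Children', 'Comedy', 'Fantasy', 'Romance']

rows = []
for movie_id, title, genre_field in data:
    # Split at the last "(" so the year is whatever follows it.
    name, _, rest = title.rpartition('(')
    # Assumption: dropping "'s" normalises "Children's" to "Children".
    tags = {g.replace("'s", '') for g in genre_field.split('|')}
    # Exact set membership instead of a substring test, stored as 1 or 0.
    flags = [int(g in tags) for g in genres]
    rows.append([movie_id, name.strip(), int(rest[:4])] + flags)

df = pd.DataFrame(rows, columns=['MovieID', 'Name', 'Year'] + genres)
print(df)
```

Testing membership in a set of whole genre names avoids accidental matches if one genre name happens to be contained in another.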