
Creating new columns from another pandas column

I would like to create new columns from the genres column. The genres column contains one or more genres per row, and I would like to create a column for each genre name. Then I would like to fill each column with 1 or 0 depending on whether the row has that genre.

[image]

The DataFrame should look like the image below.

[image]

I don't have any clue how to do this.

Using a one-hot encoder or the pandas get_dummies function straight away didn't work, as I got something like this:

[image]

I don't need something like this.

It looks like the values in the Genre column were one-hot encoded. One-hot encoding is also referred to as creating dummy variables.

Pandas has a function, pd.get_dummies(), that should enable you to one-hot encode the Genre column. Pass in your data frame and use the columns parameter to select the Genre column.
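As a rough illustration of the columns parameter, here is a minimal sketch. The sample data and the title column are made up for the example, and it assumes each row holds exactly one genre; dtype=int is used only so the output shows 1 and 0 rather than booleans.

```python
import pandas as pd

# Hypothetical sample data; each row holds exactly one genre
df = pd.DataFrame({
    "title": ["A", "B", "C"],
    "Genre": ["Drama", "Comedy", "Drama"],
})

# Encode only the Genre column; the other columns pass through unchanged
encoded = pd.get_dummies(df, columns=["Genre"], dtype=int)
print(encoded)
```

The new columns are named Genre_<value>, one per distinct value found in the column.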

See the function documentation and other options here: https://pandas.pydata.org/docs/reference/api/pandas.get_dummies.html
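If a single cell holds several genres, pd.get_dummies() treats the whole cell as one category, which may be what the question ran into. One possible way around that is Series.str.get_dummies(), sketched below. The sample data is made up, and the sketch assumes the genres are stored as comma-separated strings, which the question does not confirm.

```python
import pandas as pd

# Hypothetical sample data; assumes genres are comma-separated strings
df = pd.DataFrame({
    "title": ["A", "B", "C"],
    "genres": ["Action,Comedy", "Drama", "Comedy,Drama"],
})

# Split each string on "," and build one 0/1 column per distinct genre
genre_flags = df["genres"].str.get_dummies(sep=",")

# Attach the indicator columns to the original frame
result = pd.concat([df, genre_flags], axis=1)
print(result)
```

If the real data uses a different separator, pass that as sep instead.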

You can use CategoricalDtype as below:

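import pandas as pd
# CategoricalDtype is imported here but is not required for this basic call
from pandas.api.types import CategoricalDtype

# Sample data with one column to encode
df = pd.DataFrame({'country': ['Brazil', 'Australia', 'Canada', 'Brazil', 'Germany']})

# One indicator column per distinct value, named country_<value>
dummies = pd.get_dummies(df, prefix=['country'])
print(dummies)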
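The snippet above never actually uses the CategoricalDtype import. One plausible use for it, shown here as a sketch rather than what the answer necessarily intended, is to declare the full set of categories up front so that get_dummies emits a column even for values that do not occur in the data. The category list, including "France", is hypothetical.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

df = pd.DataFrame({"country": ["Brazil", "Australia", "Canada", "Brazil", "Germany"]})

# Hypothetical full list of categories; "France" never occurs in the data
country_type = CategoricalDtype(
    categories=["Australia", "Brazil", "Canada", "France", "Germany"]
)
df["country"] = df["country"].astype(country_type)

# get_dummies emits one column per declared category, including unseen ones
dummies = pd.get_dummies(df, prefix=["country"], dtype=int)
print(dummies)
```

This keeps the output columns stable across data sets that contain different subsets of the categories.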
