
Optimal way to split the values of a column into separate 1/0 columns (one-hot encoding)

I have a pandas column named coverage. Its values can be:

'DAMAGE', 'DAMAGE-THEFT', 'DAMAGE-THEFT-WARRANTY_EXTENSION', 'DAMAGE-FRAUDULENT_USE', etc.

What is the best way to get a column named DAMAGE, another named THEFT, another named WARRANTY_EXTENSION and another named FRAUDULENT_USE, with each row set to 1 or 0 depending on whether it has that type of coverage?

I thought about creating a lambda function, but I think I would need to repeat it for every column:

df['DAMAGE'] = df.apply (lambda row: my_function_to_split(row), axis=1)
df['THEFT'] = df.apply (lambda row: my_function_to_split(row), axis=1)
etc...

Thanks in advance.

I think the method you're looking for is pd.get_dummies.

So, if you have a DataFrame with multiple columns and want to apply this method to only some of them, you can do:

import pandas as pd
names = ["a", "b", "a", "c"]
df = pd.DataFrame({"name": names, "value": list(range(len(names)))})
pd.get_dummies(df, columns=["name"])
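As a rough illustration of what that call produces, the sketch below reuses the same toy data and inspects the result. The dtype=int argument is an addition of mine, not part of the original answer: recent pandas versions return boolean indicators by default, and the question asks for 1 or 0.

```python
import pandas as pd

names = ["a", "b", "a", "c"]
df = pd.DataFrame({"name": names, "value": list(range(len(names)))})

# dtype=int is an assumption added here so the indicators are 1/0
# rather than True/False.
encoded = pd.get_dummies(df, columns=["name"], dtype=int)

# Untouched columns are kept; each distinct value of "name" becomes
# its own column, prefixed with the source column name.
print(encoded.columns.tolist())
print(encoded)
```

Note that each distinct string is treated as one category, so a value such as 'DAMAGE-THEFT' would get a single column of its own rather than being split.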

[EDIT]

The question is trickier than that, because each value can contain several categories joined by "-", but you can solve it like this:

import pandas as pd
df = pd.DataFrame({"name": ["a-b", "b-c", "a", "a-c", "c"]})
df["name"].str.get_dummies(sep="-")
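Applied to the coverage column from the question, the same idea might look like the sketch below. The sample rows are taken from the values listed in the question; the indicators variable name and the join step are my own assumptions about how the result would be attached to the original frame.

```python
import pandas as pd

# Sample data modeled on the values listed in the question.
df = pd.DataFrame({
    "coverage": [
        "DAMAGE",
        "DAMAGE-THEFT",
        "DAMAGE-THEFT-WARRANTY_EXTENSION",
        "DAMAGE-FRAUDULENT_USE",
    ]
})

# One 1/0 indicator column per distinct token found between "-" separators.
indicators = df["coverage"].str.get_dummies(sep="-")

# Attach the indicator columns to the original frame (hypothetical step;
# the index is shared, so a plain join lines the rows up).
df = df.join(indicators)
print(df)
```

This avoids writing one apply call per coverage type, and any new coverage type that shows up in the data gets its own column automatically.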
