
Does LinearSVC take in qualitative data?

I am trying to predict whether or not a song is played using the open version, using Python, scikit-learn, and the LinearSVC method.

My input data:

[image]

I already encoded the product column as 1s and 0s (1 if open, 0 if not).

Things like context will have an impact on the product type. I was wondering if I need to make all of the categorical variables numerical for LinearSVC to handle them.

In general, turning categorical features into continuous features (for example, mapping each category to an arbitrary number) is a sub-optimal solution, because it imposes an ordering that the categories do not have.

When using a support vector machine as a classifier (or even logistic regression), there should be no issue with handling categorical features that are 0-1 encoded. In cases where you have categorical features that cannot be converted to binary (e.g., your "context" column), I would recommend one-hot-encoding the data (see here first).
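As a rough illustration of the one-hot-encoding suggestion, here is a minimal sketch using scikit-learn's ColumnTransformer and OneHotEncoder in front of LinearSVC. The toy data is made up: only the "context" and "product" column names come from the question, while the "is_mobile" column and all of the values are hypothetical stand-ins for the real input.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.svm import LinearSVC

# Toy data; "is_mobile" and all values are hypothetical.
df = pd.DataFrame({
    "context": ["playlist", "album", "search", "playlist", "album", "search"],
    "is_mobile": [1, 0, 1, 1, 0, 0],  # already 0-1 encoded, passed through unchanged
    "product": [1, 0, 1, 1, 0, 0],    # target: 1 if open, 0 if not
})

X = df[["context", "is_mobile"]]
y = df["product"]

# One-hot-encode the multi-valued categorical column; leave the rest as-is.
preprocess = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), ["context"])],
    remainder="passthrough",
)

model = make_pipeline(preprocess, LinearSVC())
model.fit(X, y)

# Inspect what the classifier actually sees: 3 context columns + 1 passthrough.
encoded = model.named_steps["columntransformer"].transform(X)
predictions = model.predict(X)
print(encoded.shape)
print(predictions)
```

Setting handle_unknown="ignore" means a context value that was not seen during fitting is encoded as all zeros at prediction time instead of raising an error. Whether that is the right behaviour depends on your data.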

There might be a problem if there are too many unique entries for a particular feature. In that case, the one-hot-encoding will produce as many features as there are unique entries, which could be computationally expensive.
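To make the cost concrete, the sketch below one-hot-encodes a made-up high-cardinality column (random integer IDs, purely hypothetical) and checks that the number of output columns equals the number of unique entries. It relies on OneHotEncoder returning a SciPy sparse matrix by default, which stores only the non-zero entries.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)

# Hypothetical high-cardinality feature: 10,000 rows of IDs drawn from 5,000 values.
ids = rng.integers(0, 5000, size=(10_000, 1))

encoder = OneHotEncoder(handle_unknown="ignore")
encoded = encoder.fit_transform(ids)  # sparse matrix by default

n_unique = len(np.unique(ids))
print(encoded.shape)             # one column per unique ID
print(sparse.issparse(encoded))  # stored sparsely
print(encoded.nnz)               # one stored value per row
```

LinearSVC accepts sparse input, so keeping the encoded matrix sparse avoids materialising the full dense array. The number of features (and hence model coefficients) still grows with the number of unique entries.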
