
Will removing a column having same values for all observations affect my model?

One of the columns in my dataset has the same value for all observations/rows. Should I remove that column while building a machine learning model?

Will removing this column affect my model/performance metric?

If I replace all the values with a different constant value, will it change the model/performance metric?

If one of the columns in your dataset has the same value in every row, you can drop it: it does nothing to help your model differentiate between two different labels, and on the other hand it can even negatively affect your model by creating a bias in the data.

For example: consider two different fruits, a green apple and a guava. Both have the same color, i.e. "green", so you cannot differentiate between them on the basis of color. Had they been two differently colored fruits, you could have used this feature to tell them apart.
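In pandas, finding and dropping such a column could look roughly like the sketch below. The helper name drop_constant_columns and the toy fruit data (including the weight_g column) are made up for illustration and are not part of the question.

```python
import pandas as pd


def drop_constant_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df without columns that hold a single distinct value."""
    # dropna=False counts NaN as a value, so an all-NaN column is also
    # treated as constant here (an assumption; adjust if NaN means "missing").
    n_distinct = df.nunique(dropna=False)
    constant_cols = n_distinct[n_distinct <= 1].index
    return df.drop(columns=constant_cols)


# Hypothetical data mirroring the fruit example: every row is "green".
fruits = pd.DataFrame({
    "color": ["green", "green", "green", "green"],
    "weight_g": [150, 170, 120, 110],
    "label": ["apple", "apple", "guava", "guava"],
})

cleaned = drop_constant_columns(fruits)
print(list(cleaned.columns))
```

The color column is removed because it has only one distinct value, while weight_g and label are kept.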

Hope this helps clarify what you should do with a column that has the same value for all observations.

Thanks.

A machine learning model is nothing but a mathematical equation, i.e.

y = f(x)

in which

y = target/dependent variable

x = independent variables (in our case, a DataFrame containing the train/test data)

So technically, an ML model quantifies and estimates, for a given value of x, what the probable output y will be.

Assume a single whole column is constant. Then a relationship between y and f(x = constant) is meaningless, because whatever the value of y, that x stays the same. No mathematical relationship is possible unless y is also a constant, which we can safely assume isn't the case: otherwise, why would you build a model to get a constant value?
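For the linear case, one way to make this precise is the sample covariance between the constant column and the target. In the sketch below, c (the constant value), n (the number of rows) and the index i are notation introduced here, not taken from the answer.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = c,
\qquad
\operatorname{Cov}(x, y)
  = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
  = \frac{1}{n}\sum_{i=1}^{n} (c - c)(y_i - \bar{y})
  = 0 .
```

Every term of the sum vanishes regardless of the values of y, so the column carries no linear information about the target.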

Hence, we can safely drop any constant column, since it adds no variation to the DataFrame; dropping it saves computational time, and the column won't affect y in any sense.
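A small numerical check of this claim, as a sketch only: it assumes ordinary least squares with an explicit intercept, fitted with NumPy on synthetic data. The data, the helper fitted_values and the constants 5.0 and -7.0 are all invented for illustration.

```python
import numpy as np

# Synthetic regression data (hypothetical): two informative features plus noise.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.0 + rng.normal(scale=0.1, size=n)


def fitted_values(features: np.ndarray) -> np.ndarray:
    """Fit least squares with an intercept and return in-sample predictions."""
    design = np.column_stack([np.ones(len(features)), features])
    # lstsq copes with the rank-deficient design that a constant column creates.
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return design @ coef


without_const = fitted_values(X)
with_const_5 = fitted_values(np.column_stack([X, np.full(n, 5.0)]))
with_const_minus7 = fitted_values(np.column_stack([X, np.full(n, -7.0)]))

# Predictions agree whether the constant column is absent, 5.0 or -7.0.
print(np.allclose(without_const, with_const_5))
print(np.allclose(without_const, with_const_minus7))
```

In this setup the constant column is collinear with the intercept, so the predictions do not change when it is dropped or replaced by another constant. One caveat: in a linear model fitted without an intercept, a nonzero constant column effectively plays the role of the intercept, so dropping it there can change the fit.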

