
Differences between QuantileTransformer and PowerTransformer

In sklearn, the documentation of QuantileTransformer says:

This method transforms the features to follow a uniform or a normal distribution

The documentation of PowerTransformer says:

Apply a power transform featurewise to make data more Gaussian-like

It seems both of them can transform features to a Gaussian/normal distribution. What are the differences in this respect, and when should each be used?

The terminology they use is confusing, because "Gaussian" and "normal" refer to the SAME distribution.

QuantileTransformer and PowerTransformer are both non-linear.

To answer your question about what exactly the difference is, according to https://scikit-learn.org :

"QuantileTransformer provides non-linear transformations in which distances between marginal outliers and inliers are shrunk. PowerTransformer provides non-linear transformations in which data is mapped to a normal distribution to stabilize variance and minimize skewness."

Source and more info here: https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html#:~:text=QuantileTransformer%20provides%20non%2Dlinear%20transformations,stabilize%20variance%20and%20minimize%20skewness
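To see the difference concretely, here is a minimal sketch: both transformers are fitted on the same heavily right-skewed (log-normal) feature, and the skewness of each output is compared. The data and parameter values are illustrative, not from the original post.

```python
import numpy as np
from scipy.stats import skew
from sklearn.preprocessing import PowerTransformer, QuantileTransformer

rng = np.random.RandomState(0)
# A heavily right-skewed feature, shape (n_samples, n_features)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(1000, 1))

# Parametric: fits a single Yeo-Johnson lambda per feature
pt = PowerTransformer(method="yeo-johnson")
# Non-parametric: maps empirical quantiles onto a normal distribution
qt = QuantileTransformer(output_distribution="normal",
                         n_quantiles=1000, random_state=0)

X_pt = pt.fit_transform(X)
X_qt = qt.fit_transform(X)

print("raw skew:                ", skew(X.ravel()))
print("PowerTransformer skew:   ", skew(X_pt.ravel()))
print("QuantileTransformer skew:", skew(X_qt.ravel()))
```

On data like this, both outputs are far less skewed than the input, and the quantile-based output is typically the closer of the two to an exact Gaussian, since it is not constrained to a single-parameter family of transforms.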

The main difference is that PowerTransformer() is parametric while QuantileTransformer() is non-parametric. Box-Cox or Yeo-Johnson will make your data look more "normal" (i.e. less skewed and more centered), but it is often still far from a perfect Gaussian. QuantileTransformer(output_distribution='normal') results usually look much closer to Gaussian, at the cost of distorting linear relationships somewhat more. I believe there is no rule of thumb to decide which one will work better in a given case, but it is worth noting that you can select the optimal scaler in a pipeline when doing e.g. GridSearchCV().
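Selecting the scaler via GridSearchCV, as suggested above, can be sketched like this: the pipeline step named "scale" is treated as a hyperparameter and swapped between candidate transformers. The synthetic data, Ridge model, and parameter choices are illustrative assumptions, not part of the original answer.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import (PowerTransformer, QuantileTransformer,
                                   StandardScaler)

X, y = make_regression(n_samples=200, n_features=5, noise=10.0,
                       random_state=0)

pipe = Pipeline([("scale", StandardScaler()),
                 ("model", Ridge())])

# The whole "scale" step is a search dimension: each candidate
# transformer is cross-validated as part of the pipeline.
param_grid = {
    "scale": [
        StandardScaler(),
        PowerTransformer(),
        QuantileTransformer(output_distribution="normal", n_quantiles=100),
    ],
}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)
print("best scaler:", search.best_params_["scale"])
```

Note that n_quantiles is lowered to 100 here so that it does not exceed the number of samples in each cross-validation training fold.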

