How can I install the XGBoost package in Python on Windows?
I tried to install the XGBoost package in Python. I am using 64-bit Windows. This is what I have found so far.
The package directory states that xgboost is unstable on Windows and that installation is disabled: "pip installation on windows is currently disabled for further investigation, please install from github." https://pypi.python.org/pypi/xgboost/
I am not well versed in Visual Studio and am having problems building XGBoost. I am missing opportunities to use the xgboost package in data science.
Please guide me so that I can import the XGBoost package in Python.
Thanks
If you are using Anaconda (or Miniconda), you can use the following:
conda install -c anaconda py-xgboost
UPDATED 2019-09-20
Check the install by running:
conda list
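Besides conda list, the install can also be checked from Python itself. The following is a minimal sketch rather than part of the original answer: the helper name is_importable is hypothetical, and it relies only on the standard library. It reports whether Python can locate the xgboost module, without importing it.

```python
import importlib.util


def is_importable(module_name):
    """Return True if Python can locate the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "xgboost" is the import name used elsewhere in this thread.
    if is_importable("xgboost"):
        print("xgboost can be imported from this interpreter")
    else:
        print("xgboost was not found by this interpreter")
```

Run it with the same interpreter (or inside the same conda environment) that you intend to use for your work; a different interpreter may give a different answer.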
To activate an environment:
On Windows, in your Anaconda Prompt, run (assumes your environment is named myenv):
activate myenv
On macOS and Linux, in your terminal window, run (assumes your environment is named myenv):
source activate myenv
Conda prepends the path name myenv onto your system command.
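If it is unclear which environment is actually active, a short Python check can help. This is only a sketch under stated assumptions: the function name describe_environment is hypothetical, and CONDA_DEFAULT_ENV is the environment variable that conda sets on activation (it is simply absent when no conda environment has been activated).

```python
import os
import sys


def describe_environment():
    """Collect a few facts about the interpreter that is currently running."""
    return {
        # Path of the Python executable in use
        "executable": sys.executable,
        # Root directory of the active installation or environment
        "prefix": sys.prefix,
        # Set by conda on activation; None if conda has not activated anything
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print("{}: {}".format(key, value))
```

If the printed prefix does not point into the environment named myenv, packages installed there will not be visible to this interpreter.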
Build it from here:
You first need to build the library with "make"; then you can install it using Anaconda Prompt (if you want it in Anaconda) or Git Bash (if you use plain Python only).
First follow the official guide with the following procedure (in Git Bash on Windows):
git clone --recursive https://github.com/dmlc/xgboost
git submodule init
git submodule update
Then install TDM-GCC and run the following in Git Bash:
alias make='mingw32-make'
cp make/mingw64.mk config.mk; make -j4
Finally, run the following in Anaconda Prompt or Git Bash:
cd xgboost\python-package
python setup.py install
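Before starting the build steps above, it may help to confirm that the required commands are reachable from your shell. The sketch below is an illustration, not part of the original answer: the helper name missing_tools is hypothetical, and the list of commands is taken from the steps in this answer.

```python
import shutil


def missing_tools(tools):
    """Return the subset of command names that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # Command names used in the build steps of this answer.
    required = ["git", "mingw32-make", "python"]
    missing = missing_tools(required)
    print("Missing:", ", ".join(missing) if missing else "none")
```

Any command reported as missing usually means the tool is not installed or its folder has not been added to PATH.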
Also refer to these great resources:
Installing Xgboost on Windows
Installing XGBoost For Anaconda on Windows
You can pip install catboost. It is a recently open-sourced gradient boosting library, which is in most cases more accurate and faster than XGBoost, and it supports categorical features. Here is the site of the library: https://catboost.ai
The following command should work:
conda install -c conda-forge xgboost
If you have a problem with this command, first activate your environment. Assuming your environment is named <MY_ENV>, run this in the conda terminal:
activate <MY_ENV>
and then
pip install xgboost
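One common reason pip installs appear to "not work" is that pip belongs to a different Python than the one you run. A hedged sketch of one way to avoid this: build the pip command from sys.executable so it is bound to the current interpreter. The function name pip_install_command is hypothetical; the script only prints the command and does not install anything.

```python
import sys


def pip_install_command(package):
    """Build a pip command bound to the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", package]


if __name__ == "__main__":
    cmd = pip_install_command("xgboost")
    # Printed only; pass the list to subprocess.run(cmd, check=True) to execute it.
    print(" ".join(cmd))
```

Running the printed command from the activated environment installs the package into that environment's interpreter rather than into whichever pip happens to be first on PATH.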
On macOS, the following command worked:
conda install -c conda-forge xgboost
Before that, though, I had read some other articles and had therefore already installed gcc using brew.
pip install xgboost
also works with Python 3.8, whereas the other options mentioned above did not work for me.
I have installed xgboost on Windows by following the above resources, since it is not yet available through pip. However, I tried the following code to tune the parameters with cross-validation:
# Import libraries
import pandas as pd
import numpy as np
import xgboost as xgb
from xgboost.sklearn import XGBClassifier
from sklearn import metrics  # Additional sklearn functions
from sklearn.model_selection import GridSearchCV  # Performing grid search
import matplotlib.pylab as plt
# Jupyter/IPython magic; remove this line when running as a plain script
%matplotlib inline
from matplotlib.pylab import rcParams
rcParams['figure.figsize'] = 12, 4
train = pd.read_csv('train_data.csv')
target = 'target_value'
IDcol = 'ID'
A function is created to find the optimum parameters and display the output in visual form.
def modelfit(alg, dtrain, predictors, useTrainCV=True, cv_folds=5, early_stopping_rounds=50):
    if useTrainCV:
        # Use cross-validation with early stopping to pick the number of boosting rounds
        xgb_param = alg.get_xgb_params()
        xgtrain = xgb.DMatrix(dtrain[predictors].values, label=dtrain[target].values)
        cvresult = xgb.cv(xgb_param, xgtrain, num_boost_round=alg.get_params()['n_estimators'], nfold=cv_folds,
                          metrics='auc', early_stopping_rounds=early_stopping_rounds, verbose_eval=False)
        alg.set_params(n_estimators=cvresult.shape[0])
    # Fit the algorithm on the data
    alg.fit(dtrain[predictors], dtrain[target], eval_metric='auc')
    # Predict training set
    dtrain_predictions = alg.predict(dtrain[predictors])
    dtrain_predprob = alg.predict_proba(dtrain[predictors])[:, 1]
    # Print model report
    print("\nModel Report")
    print("Accuracy : %.4g" % metrics.accuracy_score(dtrain[target].values, dtrain_predictions))
    print("AUC Score (Train): %f" % metrics.roc_auc_score(dtrain[target], dtrain_predprob))
    # Plot feature importances, highest first
    feat_imp = pd.Series(alg.get_booster().get_fscore()).sort_values(ascending=False)
    feat_imp.plot(kind='bar', title='Feature Importances')
    plt.ylabel('Feature Importance Score')
Now, when the function is called to get the optimum parameters:
# Choose all predictors except target & ID columns
predictors = [x for x in train.columns if x not in [target, IDcol]]
# Named xgb1 so that it does not shadow the xgboost module imported as xgb
xgb1 = XGBClassifier(
    learning_rate=0.1,
    n_estimators=1000,
    max_depth=5,
    min_child_weight=1,
    gamma=0,
    subsample=0.7,
    colsample_bytree=0.7,
    objective='binary:logistic',
    nthread=4,
    scale_pos_weight=1,
    seed=198)
modelfit(xgb1, train, predictors)
Although the feature importance chart is displayed, the parameter info in the red box at the top of the chart is missing. I consulted people who use Linux/macOS and have xgboost installed, and they do get that info. I was wondering whether this is due to the specific implementation that I built and installed on Windows, and how I can get the parameter info displayed above the chart. As of now, I get the chart but not the red box and the info within it. Thanks.
Besides what is already on the developers' GitHub, which is building from source (creating a C++ environment, etc.), I have found an easier way to do it, which I explained here in detail. Basically, you go to a website hosted by UC Irvine, download a .whl file, then cd to the folder and install xgboost with pip.
XGBoost is used in applied machine learning and is known for its gradient boosting algorithm. It is available as a Python library but has to be compiled using cmake.
Alternatively, from this link you can download the pre-compiled C library and install it using the pip install <FILE-NAME.whl> command. Ensure that the file you download is compatible with your Python version.
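To pick a wheel that matches your interpreter, it helps to know the Python version tag and whether the interpreter is 32-bit or 64-bit. The following is a small sketch, not part of the original answer: the function name interpreter_tags is hypothetical, and the cpXY tag format assumes the standard CPython interpreter.

```python
import struct
import sys


def interpreter_tags():
    """Return the CPython-style version tag and pointer width of this interpreter."""
    version_tag = "cp{}{}".format(sys.version_info.major, sys.version_info.minor)
    bits = struct.calcsize("P") * 8  # size of a C pointer in bits: 32 or 64
    return version_tag, bits


if __name__ == "__main__":
    tag, bits = interpreter_tags()
    print("Python tag:", tag)
    print("Interpreter is {}-bit".format(bits))
```

On Windows, wheel file names normally contain this tag together with win_amd64 for 64-bit interpreters or win32 for 32-bit ones, so the downloaded file name should agree with both printed values.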
I experienced this problem while using it in Anaconda (Spyder). Just restart the kernel and the error will go away.