How to deal with "Do not support non-ASCII characters in feature name" error when I use lightGBM?

I want to do machine learning with lightGBM in Python.
I'm using a pandas.DataFrame with Japanese column names as the training input.
Until recently, training ran without any errors.

However, I recently reinstalled anaconda and, at the same time, installed lightGBM using conda.
Since then, the following error has appeared.

LightGBMError: Do not support non-ASCII characters in feature name.

When I replaced the column names with integers starting from 0, training worked as usual.
So the cause is probably the Japanese column names, as the error indicates.
(This error occurs both when training with train() and when fitting with fit().)

I would like to know the following two points.

  • Why can't I use Japanese column names as before?

  • Is there a way to use Japanese column names as before?

The environment I am using is as follows.

OS: Windows 10 home  
Coding environment: Jupyter notebook  
python version: 3.7.6  
lightGBM version: 2.3.1  

If you know the answer to my question, please tell me.
Sorry for my poor English.

Recently, code that used to work stopped running. I think I upgraded lightgbm somewhere along the way, and the error started appearing after that. After rolling back to 2.2.3, everything works normally again.

You can clean up the column names with a single statement:

import re
# Strip every character other than ASCII letters, digits, and underscores from each column name.
df = df.rename(columns=lambda x: re.sub(r'[^A-Za-z0-9_]+', '', x))
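One caveat with stripping characters this way: a column name made up entirely of non-ASCII characters (such as a Japanese name) collapses to an empty string, and two different names can collapse to the same result. The sketch below is one possible way around that, not something from the original answer. It builds a mapping from each original name to a unique ASCII-only name and keeps the mapping so the original labels can be restored later. The helper name `make_ascii_names`, the `col` placeholder prefix, and the sample column names are all hypothetical, and the original column names are assumed to be unique.

```python
import re

def make_ascii_names(columns):
    """Map each original column name to a unique ASCII-only name (hypothetical helper)."""
    mapping = {}
    used = set()
    for index, name in enumerate(columns):
        # Keep any ASCII letters, digits, and underscores already in the name.
        cleaned = re.sub(r'[^A-Za-z0-9_]+', '', str(name))
        base = cleaned or 'col'  # placeholder prefix when nothing survives
        candidate = cleaned
        suffix = index
        # Append a numeric suffix until the name is non-empty and unused.
        while not candidate or candidate in used:
            candidate = f'{base}_{suffix}'
            suffix += 1
        used.add(candidate)
        mapping[name] = candidate
    return mapping

# Illustrative column names only; substitute the columns of your own DataFrame.
columns = ['年齢', '収入', 'F_零', 'F_一', 'score']
mapping = make_ascii_names(columns)
# Reverse lookup, e.g. for labelling feature-importance plots with the original names.
inverse = {ascii_name: original for original, ascii_name in mapping.items()}
print(mapping)
```

With pandas, the mapping could then be applied as `df = df.rename(columns=mapping)`, and `inverse` used to translate feature names back after training.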

lightgbm 3.0.0 (August 2020) restored support for non-ASCII feature names.

Upgrade to at least lightgbm 3.0.0 (the newest version at the time of writing is 3.1.0).

pip install --upgrade 'lightgbm>=3.0.0'
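To confirm that the upgrade took effect in the environment your notebook is actually using, you can compare the installed version against 3.0.0. The following is a minimal sketch: the helper names `version_tuple` and `meets_3_0_0` are hypothetical, and the parser only reads the leading numeric components, so a pre-release suffix is ignored.

```python
import re

def version_tuple(version):
    """Extract the leading numeric components, e.g. '3.1.0' -> (3, 1, 0)."""
    match = re.match(r'(\d+)\.(\d+)(?:\.(\d+))?', version)
    if match is None:
        raise ValueError(f'unrecognized version string: {version!r}')
    # A missing patch component is treated as 0.
    return tuple(int(part or 0) for part in match.groups())

def meets_3_0_0(version):
    """Return True when the version is 3.0.0 or later."""
    return version_tuple(version) >= (3, 0, 0)

print(meets_3_0_0('2.3.1'))  # the version from the question
print(meets_3_0_0('3.1.0'))
```

In a notebook you would pass `lightgbm.__version__` to `meets_3_0_0` instead of a hard-coded string.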

You can test with the example code below, which was originally provided in microsoft/LightGBM#2976. In the future, please provide a small, reproducible code sample in your question if possible.

import lightgbm
import numpy
from matplotlib import pyplot

numpy.random.seed(42)

X = numpy.random.normal(size=(1000, 3))
y = numpy.random.random(1000)

train_lgb = lightgbm.Dataset(X, y)

feature_names = ['F_零', 'F_一', 'F_二']

params = {
    'boosting_type': 'gbdt',
    'objective': 'regression',
    'metric': 'l2',
    'num_leaves': 31,
    'verbose': 0,
}

print('Starting training...')
gbm = lightgbm.train(
    params,
    train_lgb,
    num_boost_round=10,
    feature_name=feature_names,
)

print('Plotting feature importances...')
ax = lightgbm.plot_importance(gbm, ignore_zero=False)
pyplot.show()
