I want to do machine learning with lightGBM in python.
I'm using pandas.DataFrame with column names in Japanese as input for learning.
Until the other day, I was able to learn without any error.
However, I had the opportunity to reinstall Anaconda, and at the same time installed lightGBM using conda.
Since then, the following error has appeared.
LightGBMError: Do not support non-ASCII characters in feature name.
When I renamed the columns to integers starting from 0, training ran as usual.
So the cause is probably the Japanese column names, as the error message indicates.
(This error occurs both for training with train() and learning with fit().)
I would like to know the following two points.
Why can't I use Japanese column names as before?
Is there a way to use Japanese column names as before?
The environment I am using is as follows.
OS: Windows 10 home
Coding environment: Jupyter notebook
python version: 3.7.6
lightGBM version: 2.3.1
If you know the answer to my question, please tell me.
Sorry for my poor English.
I ran into the same thing recently: code that used to work stopped running. I think I upgraded lightgbm at some point and the error started after that. After rolling back to 2.2.3, everything works again.
You can clean up column names with a single statement:
import re
df = df.rename(columns=lambda x: re.sub('[^A-Za-z0-9_]+', '', x))
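Note that the regular expression above deletes every character outside A-Z, a-z, 0-9 and _, so a name written entirely in Japanese collapses to an empty string and several columns can end up with the same name. A minimal sketch of an alternative, assuming you only need ASCII names while training: map each column to a unique placeholder and keep the reverse mapping for reporting. The helper name ascii_safe_mapping and the prefix "f" are hypothetical, not part of pandas or LightGBM.

```python
import re


def ascii_safe_mapping(columns, prefix="f"):
    """Map each original column name to a unique ASCII placeholder (f0, f1, ...)."""
    return {name: f"{prefix}{index}" for index, name in enumerate(columns)}


columns = ["年齢", "身長", "weight"]

# What the regex-based cleanup does to these names: Japanese names vanish.
stripped = [re.sub("[^A-Za-z0-9_]+", "", name) for name in columns]

# Placeholder approach: unique ASCII names plus a way back to the originals.
to_ascii = ascii_safe_mapping(columns)
to_original = {ascii_name: name for name, ascii_name in to_ascii.items()}

print(stripped)
print(to_ascii)
```

With a DataFrame, df.rename(columns=to_ascii) applies the placeholders before training, and to_original translates feature names back when you read importances or plots.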
lightgbm 3.0.0 (August 2020) added support for non-ASCII feature names back to LightGBM.
Upgrade to at least lightgbm 3.0.0 (the newest version is 3.1.0).
pip install --upgrade 'lightgbm>=3.0.0'
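After upgrading, it can help to confirm which version the notebook kernel actually imports, since conda and pip environments are easy to mix up. The sketch below is only an illustration: the helper names version_tuple and has_non_ascii_support are hypothetical, and the 3.0.0 threshold is taken from the statement above rather than checked against LightGBM itself.

```python
import re


def version_tuple(version):
    """Extract the leading numeric components, e.g. '3.1.0' -> (3, 1, 0)."""
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])


def has_non_ascii_support(version):
    """True if the version is at least 3.0.0 (threshold assumed from this answer)."""
    return version_tuple(version) >= (3, 0, 0)


# Version strings taken from this thread.
for version in ["2.3.1", "3.0.0", "3.1.0"]:
    print(version, has_non_ascii_support(version))
```

In a real session you would pass lightgbm.__version__ instead of the literal strings.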
You can test with this example code I've provided below, which was originally provided in microsoft/LightGBM#2976 . In the future, please provide a small, reproducible code sample in your question if possible.
import lightgbm
import numpy
from matplotlib import pyplot

numpy.random.seed(42)
X = numpy.random.normal(size=(1000, 3))
y = numpy.random.random(1000)

train_lgb = lightgbm.Dataset(X, y)
feature_names = ['F_零', 'F_一', 'F_二']

params = {
    'boosting_type': 'gbdt',
    'objective': 'regression',
    'metric': 'l2',
    'num_leaves': 31,
    'verbose': 0,
}

print('Starting training...')
gbm = lightgbm.train(
    params,
    train_lgb,
    num_boost_round=10,
    feature_name=feature_names,
)

print('Plotting feature importances...')
ax = lightgbm.plot_importance(gbm, ignore_zero=False)
pyplot.show()