scikit-learn GridSearchCV() fit() performance improvement
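I am using GridSearchCV() and its fit() method to build a model. It currently works, but I want to improve the model's accuracy by training on more images. Right now, fit() takes over an hour for 500 images, and the processing time grows exponentially each time I double the number of images. Eventually I want to train on thousands of images and extend my proof of concept with categories beyond these two. I have tried several ways to improve performance but could not solve it. The only thing that reduces the processing time is significantly lowering train_size / test_size in train_test_split(), but that defeats the purpose of training on a larger dataset. I'm a bit stumped on this one. The code I'm using is below for reference. Thank you.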
import os
import numpy as np
import pandas as pd
from skimage.io import imread            # assuming scikit-image, as in the answer below
from skimage.transform import resize
from sklearn import svm
from sklearn.model_selection import GridSearchCV, train_test_split

categories = ['Cat', 'Dog']
flat_data_arr = []
target_arr = []
datadir = 'C:\\Users\\Name\\Python\\images'
for i in categories:
    path = os.path.join(datadir, i)
    for image in os.listdir(path):
        image_array = imread(os.path.join(path, image))
        image_resized = resize(image_array, (150, 150, 3))
        flat_data_arr.append(image_resized.flatten())
        target_arr.append(categories.index(i))
flat_data = np.array(flat_data_arr)
target = np.array(target_arr)
df = pd.DataFrame(flat_data)
df['Target'] = target
x = df.iloc[:, :-1]
y = df.iloc[:, -1]
x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=0.75, test_size=0.25, shuffle=True, stratify=y)
param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [0.0001, 0.001, 0.1, 1], 'kernel': ['rbf', 'poly']}
svc = svm.SVC(probability=True)
model = GridSearchCV(svc, param_grid)
model.fit(x_train, y_train)  # this takes hours depending on number of images
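For a sense of where the time goes, here is a rough back-of-the-envelope count for the grid above. It assumes GridSearchCV's default of 5-fold cross-validation (the default when `cv` is not passed in scikit-learn 0.22 and later) and the default `refit=True`. The names `cv_folds`, `n_candidates`, `n_fits` and `n_features` are only illustrative.

```python
from math import prod

# Same grid as in the question.
param_grid = {
    "C": [0.1, 1, 10, 100],
    "gamma": [0.0001, 0.001, 0.1, 1],
    "kernel": ["rbf", "poly"],
}
cv_folds = 5  # assumption: GridSearchCV default when cv is not given

# Number of hyperparameter combinations the search evaluates.
n_candidates = prod(len(values) for values in param_grid.values())

# Each combination is fitted once per fold, plus one final refit on all training data.
n_fits = n_candidates * cv_folds + 1

# Every flattened 150 x 150 x 3 image becomes one row with this many columns.
n_features = 150 * 150 * 3

print(n_candidates, n_fits, n_features)
```

So a single call to fit() trains the SVC 161 times on rows with 67,500 features each. In addition, `probability=True` makes every SVC fit run an internal cross-validation for probability calibration, so dropping it during the search is likely to help if class probabilities are not needed.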
It is probably best to use TensorFlow, Keras, or PyTorch for computer vision, ideally with a GPU on top, where this kind of training runs in milliseconds to seconds. Even without a GPU you should see a significant speedup.
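However, if you decide to continue with this approach, you can try the following (basically, reduce the dimensionality and add features):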
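import os
import numpy as np
from skimage.io import imread
from skimage.transform import resize
from skimage.feature import hog
from skimage.color import rgb2gray  # rgb2grey was removed in newer scikit-image releases
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# For each image (path and image come from the loop in the question).
# Resizing as in the question keeps every feature row the same length.
img = resize(imread(os.path.join(path, image)), (150, 150, 3))
grey_scaled = rgb2gray(img)
hog_features = hog(grey_scaled, block_norm='L2-Hys', pixels_per_cell=(10, 10))
color_features = img.flatten()
final_features = np.hstack((color_features, hog_features))

# matrix: one row of final_features per image, stacked into a 2-D array
standard_sc = StandardScaler()
matrix_scaled = standard_sc.fit_transform(matrix)

# Read up on how to select the number of components;
# there are methods to help you with that.
# Note: n_components cannot exceed the number of images in matrix.
pca = PCA(n_components=300)
matrix_scaled_pca = pca.fit_transform(matrix_scaled)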
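One way to wire the scaling and PCA steps into the grid search itself is a scikit-learn Pipeline, sketched below on synthetic data rather than real images. This is an illustration under assumptions, not the answerer's exact method: the random `X` and `y`, the small grid, and `cv=3` are placeholders, `PCA(n_components=0.95)` is one of the available ways to pick the number of components (keep enough to explain 95% of the variance), and `n_jobs=-1` asks GridSearchCV to run the fits on all CPU cores.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the image feature matrix: 60 samples, 200 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 200))
y = np.repeat([0, 1], 30)
X[y == 1] += 0.5  # shift class 1 so the classes are separable

# Scaling and PCA are refitted inside every CV fold, which avoids leaking
# information from the validation fold into the preprocessing.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=0.95, svd_solver="full")),  # keep 95% of the variance
    ("svc", SVC()),  # probability=True left off to keep each fit cheap
])

# Parameters of a pipeline step are addressed as <step name>__<parameter>.
param_grid = {"svc__C": [0.1, 1, 10], "svc__gamma": [0.0001, 0.001]}

search = GridSearchCV(pipe, param_grid, cv=3, n_jobs=-1)  # n_jobs=-1: use all cores
search.fit(X, y)

n_kept = search.best_estimator_.named_steps["pca"].n_components_
print(search.best_params_, n_kept)
```

With real images, `X` would be the matrix of per-image features and `y` the category indices from the question. The SVC then trains on a few hundred PCA components instead of tens of thousands of raw pixel values.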