Is there a faster way to get the Local Binary Pattern of the MNIST dataset?
I need to know if there's a faster way to get the LBP and the resulting histograms of the MNIST dataset. This will be used for handwritten text recognition, through a model that I haven't decided on yet.
I've loaded the MNIST dataset and split it into its x, y training sets and x, y test sets based on the tensorflow tutorials.
I've then used cv2 to invert the images.
From there I've defined a function using skimage to get the LBP and the corresponding histogram of an input image.
I finally used a classic for loop to iterate through the images, get their histograms, store these in a separate list, and return the new list and the unaltered label list of both the training and test sets.
Here is the function to load the MNIST dataset:
def loadDataset():
    mnist = tf.keras.datasets.mnist
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    # should I invert it or not?
    x_train = cv2.bitwise_not(x_train)
    x_test = cv2.bitwise_not(x_test)
    return (x_train, y_train), (x_test, y_test)
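As an aside on the inversion question in the comment above: for uint8 arrays, a bitwise NOT flips every bit, which is the same as subtracting each pixel from 255, so a NumPy-only sketch behaves identically to cv2.bitwise_not and avoids the cv2 dependency:

```python
import numpy as np

def invert(imgs):
    # For uint8 data, bitwise NOT is equivalent to 255 - pixel value,
    # so np.bitwise_not matches cv2.bitwise_not here.
    return np.bitwise_not(imgs)

x = np.array([[0, 128, 255]], dtype=np.uint8)
print(invert(x))  # [[255 127   0]]
```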
Here is the function to get the LBP and the corresponding histogram:
def getLocalBinaryPattern(img, points, radius):
    lbp = feature.local_binary_pattern(img, points, radius, method="uniform")
    hist, _ = np.histogram(lbp.ravel(),
                           bins=np.arange(0, points + 3),
                           range=(0, points + 2))
    return lbp, hist
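For reference, a quick sanity check of what this function returns, using a synthetic 28x28 array in place of an MNIST digit (the image content doesn't affect the shapes): with method="uniform" and points=8 there are points + 2 = 10 possible pattern labels, so the histogram has 10 bins whose counts sum to the number of pixels.

```python
import numpy as np
from skimage import feature

def getLocalBinaryPattern(img, points, radius):
    lbp = feature.local_binary_pattern(img, points, radius, method="uniform")
    hist, _ = np.histogram(lbp.ravel(),
                           bins=np.arange(0, points + 3),
                           range=(0, points + 2))
    return lbp, hist

# Synthetic stand-in for a 28x28 MNIST digit.
img = np.random.default_rng(0).integers(0, 256, (28, 28)).astype(np.uint8)
lbp, hist = getLocalBinaryPattern(img, 8, 1)
print(len(hist))   # 10 bins: points + 2 uniform-pattern labels
print(hist.sum())  # 784, one count per pixel
```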
And lastly, here's the function to iterate over the images:
def formatDataset(dataset):
    (x_train, y_train), (x_test, y_test) = dataset

    x_train_hst = []
    for i in range(len(x_train)):
        _, hst = getLocalBinaryPattern(x_train[i], 8, 1)
        print("Computing LBP for training set: {}/{}".format(i, len(x_train)))
        x_train_hst.append(hst)
    print("Done computing LBP for training set!")

    x_test_hst = []
    for i in range(len(x_test)):
        _, hst = getLocalBinaryPattern(x_test[i], 8, 1)
        print("Computing LBP for test set: {}/{}".format(i, len(x_test)))
        x_test_hst.append(hst)
    print("Done computing LBP for test set!")

    print("Done!")
    return (x_train_hst, y_train), (x_test_hst, y_test)
I know it'll be slow, and indeed, it is slow. So I'm looking for more ways to speed it up, or for an existing version of the dataset that already includes this information.
I don't think there's a straightforward way to speed up the iteration over the images. One might expect that using NumPy's vectorize or apply_along_axis would improve performance, but these solutions are actually slower than a for loop (or a list comprehension).
Different alternatives for iterating through the images:
def compr(imgs):
    hists = [getLocalBinaryPattern(img, 8, 1)[1] for img in imgs]
    return hists

def vect(imgs):
    lbp81riu2 = lambda img: getLocalBinaryPattern(img, 8, 1)[1]
    vec_lbp81riu2 = np.vectorize(lbp81riu2, signature='(m,n)->(k)')
    hists = vec_lbp81riu2(imgs)
    return hists

def app(imgs):
    lbp81riu2 = lambda img: getLocalBinaryPattern(img.reshape(28, 28), 8, 1)[1]
    pixels = np.reshape(imgs, (len(imgs), -1))
    hists = np.apply_along_axis(lbp81riu2, 1, pixels)
    return hists
Results:
In [112]: (x_train, y_train), (x_test, y_test) = loadDataset()
In [113]: %timeit -r 3 compr(x_train)
1 loop, best of 3: 14.2 s per loop
In [114]: %timeit -r 3 vect(x_train)
1 loop, best of 3: 17.1 s per loop
In [115]: %timeit -r 3 app(x_train)
1 loop, best of 3: 14.3 s per loop
In [116]: np.array_equal(compr(x_train), vect(x_train))
Out[116]: True
In [117]: np.array_equal(compr(x_train), app(x_train))
Out[117]: True