Understanding handwritten digits by computer
I would like to ask one question: I want to implement code that classifies a picture of a digit written by hand (with a pen). Consider an image like this:
It is written with a blue pen and should be converted to a grayscale image using the following code:
import cv2
import matplotlib.pyplot as plt
from PIL import Image

user_test = filename  # path to the captured image
col = Image.open(user_test)
gray = col.convert('L')
bw = gray.point(lambda x: 0 if x < 100 else 255, '1')  # binarize at 100
bw.save("bw_image.jpg")

img_array = cv2.imread("bw_image.jpg", cv2.IMREAD_GRAYSCALE)
img_array = cv2.bitwise_not(img_array)  # invert: digit bright, background dark
print(img_array.size)
plt.imshow(img_array, cmap=plt.cm.binary)
plt.show()

img_size = 28
new_array = cv2.resize(img_array, (img_size, img_size))
plt.imshow(new_array, cmap=plt.cm.binary)
plt.show()
The idea is that I take the image directly from the camera, but it loses the structure of the digit and comes out as an empty, all-black picture, like this:
As a result, the computer cannot tell which digit it is, and the neural network fails to predict its label correctly. Could you please tell me which transformation I should apply so that this image is detected much more precisely?
Edit:
I have applied the following code:
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

user_test = filename
col = Image.open(user_test)
gray = col.convert('L')
img_array = np.asarray(gray)  # the histogram needs an array, not a PIL image
plt.hist(img_array.flatten())
plt.show()
You have several issues here, and you can address them methodically. First of all, you have a problem with thresholding properly.
As I suggested in earlier comments, you can easily see why your original thresholding was unsuccessful by plotting the intensity histogram:
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
from matplotlib import cm

im = Image.open('whatever_path_you_choose.jpg').convert("L")
im = np.asarray(im)
plt.hist(im.flatten(), bins=np.arange(256))  # one bin per intensity level
plt.show()
Looking at the histogram of the image you gave:
Clearly the threshold should be somewhere between 100 and 200, not at the 100 used in your original code. Also note that this distribution isn't very bimodal, so I'm not sure Otsu's method would work well here.
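If you want to check that intuition rather than trust the histogram, you can compute Otsu's threshold explicitly and see where it lands. A sketch using synthetic data as a stand-in for the grayscale array `im` (mostly bright paper pixels with a minority of darker ink pixels):

```python
import numpy as np
from skimage.filters import threshold_otsu

# Synthetic stand-in for the grayscale array `im`: bright paper (~200)
# with roughly 20% darker ink pixels (~80)
rng = np.random.default_rng(0)
paper = rng.normal(200, 10, size=(28, 28))
ink = rng.normal(80, 15, size=(28, 28))
im = np.clip(np.where(rng.random((28, 28)) < 0.2, ink, paper), 0, 255)

t = threshold_otsu(im)
print("Otsu threshold:", t)  # lands between the two modes for clean bimodal data
```

On a real scan with a weakly bimodal histogram, the value Otsu picks may sit well away from the visually best threshold, which is the concern raised above.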
If we eyeball it (this can be tuned), we can see that thresholding at around 145 gives decent results in terms of segmentation:
im_thresh = (im >= 145)
plt.imshow(im_thresh, cmap=cm.gray)
Now you might have an additional issue: the horizontal ruled lines. You can address this by writing on blank paper, as suggested. This wasn't exactly your question, but I will try to address it anyway (in a naive fashion). You can try a naive solution of using a Sobel filter (think of it as the derivative of the image, which picks up the lines), followed by a median filter to get the approximately most common pixel intensity; the filter size might have to vary for different digits, though. This should clear up some of the lines. For a more rigorous approach, try reading up on the Hough line transform for detecting horizontal lines and try to whiten them out.
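As a rough sketch of that Hough-based idea (the parameter values here are hypothetical and would need tuning on a real scan), skimage's `probabilistic_hough_line` returns line segments, and the nearly horizontal ones can then be whitened out:

```python
import numpy as np
from skimage.transform import probabilistic_hough_line

# Synthetic binary image with two horizontal ruled lines standing in for the scan
img = np.zeros((50, 50), dtype=bool)
img[15, :] = True
img[35, :] = True

# Detect line segments (threshold/line_length/line_gap are illustrative values)
lines = probabilistic_hough_line(img, threshold=10, line_length=20, line_gap=2)
for (x0, y0), (x1, y1) in lines:
    if abs(y1 - y0) <= 1:  # keep only near-horizontal segments
        img[min(y0, y1):max(y0, y1) + 1, min(x0, x1):max(x0, x1) + 1] = False

print(len(lines), img.sum())  # segments found; far fewer line pixels remain
```

On a real image you would run this on an edge map (e.g. Canny) of the thresholded scan rather than on the raw binary image.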
This is my very naive approach:
from skimage.filters import sobel
from scipy.ndimage import median_filter
#Sobel filter reverses intensities so subtracting the result from 1.0 turns it back to the original
plt.imshow(1.0 - median_filter(sobel(im_thresh), [10, 3]), cmap=cm.gray)
You can try cropping automatically afterwards. Honestly, I think most neural networks that can recognize MNIST-like digits could also recognize the result I posted at the end.
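For the automatic cropping, a minimal sketch (assuming a boolean array in which the digit pixels are True) is just the bounding box of the nonzero pixels:

```python
import numpy as np

def crop_to_content(mask):
    """Crop a boolean mask to the bounding box of its True pixels."""
    rows = np.any(mask, axis=1)
    cols = np.any(mask, axis=0)
    r0, r1 = np.where(rows)[0][[0, -1]]
    c0, c1 = np.where(cols)[0][[0, -1]]
    return mask[r0:r1 + 1, c0:c1 + 1]

# Tiny synthetic digit mask: a 4x2 blob inside a 10x10 canvas
digit = np.zeros((10, 10), dtype=bool)
digit[3:7, 4:6] = True
print(crop_to_content(digit).shape)  # (4, 2)
```

In practice you would add a few pixels of margin before resizing to 28×28, since MNIST digits are centered with padding.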
Try using the skimage package like this. It has built-in functions for image processing:
from skimage import io
from skimage.restoration import denoise_tv_chambolle
from skimage.filters import threshold_otsu

image = io.imread('path/to/your/image', as_gray=True)

# Denoising (the image is grayscale, so no multichannel argument is needed)
denoised_image = denoise_tv_chambolle(image, weight=0.1)

# Thresholding with Otsu's method
threshold = threshold_otsu(denoised_image)
thresholded_image = denoised_image > threshold
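To actually feed `thresholded_image` into an MNIST-style network, you would typically still invert it (MNIST digits are bright on a dark background) and resize to 28×28. A sketch with a synthetic stand-in for the thresholded result:

```python
import numpy as np
from skimage.transform import resize

# Synthetic stand-in for `thresholded_image` (True = paper, False = ink)
thresholded_image = np.ones((100, 100), dtype=bool)
thresholded_image[30:70, 45:55] = False  # a crude vertical stroke

inverted = ~thresholded_image  # digit becomes True (bright) on a dark background
small = resize(inverted.astype(float), (28, 28), anti_aliasing=True)
print(small.shape)  # (28, 28), ready to batch for the network
```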