Deblurring an image with text for OCR recognition

Question

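I have a blurred image: [image]
It is part of a business card, taken from one of the frames captured by a camera that was not properly focused.

The sharp image looks like this: [image]

I am looking for a method that gives me a better-quality image so that OCR can recognize it, but it should also be reasonably fast. The image is not blurred too much (I think), but it is not good enough for OCR. I have tried:

different kinds of high-pass filters (HPF),
the Laplacian,
the Canny detector,
combinations of morphological operations (opening, closing).

I have also tried:

deconvolution with a Wiener filter,
deconvolution with the Lucy-Richardson method.

However, finding a suitable PSF (point spread function) is not easy. These methods are considered effective, but they are not fast enough. I also tried an FFT followed by an IFFT with a Gaussian mask, but the results were not satisfactory. I am looking for a general way to deblur images containing text, not just this one image. Can anyone help me with this? I would appreciate any advice. I am using OpenCV 3 (C++, sometimes Python).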

Answer 1

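Are you aware of blind deconvolution?

Blind deconvolution is a well-known technique for restoring astronomical images. It is especially useful in applications where the PSF is hard to find.

Here is a C++ implementation of the technique. This paper is also very relevant to what you are looking for. Here is a sample output of their algorithm: [image]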
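As a much cruder stand-in for true blind deconvolution, and not the linked C++ implementation, one can assume the defocus blur is roughly Gaussian, run a Wiener filter for a few candidate PSF widths, and keep whichever result the OCR engine reads best. The sketch below uses only NumPy; the function names, the candidate `sigmas`, and the `nsr` (noise-to-signal ratio) value are illustrative assumptions, and the input is assumed to be a 2-D grayscale array.

```python
import numpy as np

def gaussian_psf(shape, sigma):
    """Centered, normalized Gaussian PSF with the same shape as the image."""
    h, w = shape
    y = np.arange(h) - h // 2
    x = np.arange(w) - w // 2
    g = np.exp(-(y[:, None] ** 2 + x[None, :] ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()

def wiener_deconvolve(img, psf, nsr=1e-2):
    """Frequency-domain Wiener filter; nsr is an assumed noise-to-signal ratio."""
    H = np.fft.fft2(np.fft.ifftshift(psf))   # transfer function of the blur
    G = np.fft.fft2(img)                     # spectrum of the blurred image
    F = np.conj(H) / (np.abs(H) ** 2 + nsr) * G
    return np.real(np.fft.ifft2(F))

def deblur_candidates(img, sigmas=(1.0, 1.5, 2.0, 3.0), nsr=1e-2):
    """Return {sigma: restored image} for each assumed Gaussian blur width."""
    img = np.asarray(img, dtype=np.float64)
    return {s: wiener_deconvolve(img, gaussian_psf(img.shape, s), nsr)
            for s in sigmas}
```

Each candidate costs two FFTs, so scanning a handful of widths stays fast. The FFT treats the image as periodic, so some ringing near the borders is expected, and a larger `nsr` trades sharpness for less noise amplification.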

Answer 2

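I recently ran into this problem too and asked a similar question with more details and more recent approaches. So far it appears to be an unsolved problem. Several recent research works try to tackle it with deep learning. Unfortunately, none of them meets our expectations. Still, I am sharing the information in case it helps anyone.

1. Scene Text Image Super-Resolution in the Wild

In our case, this may be our last option; comparatively, it performs well enough. This is a recent research work (TSRN) that focuses mainly on such cases. Its main idea is to introduce super-resolution (SR) techniques as a pre-processing step. So far, this implementation looks the most promising. Here is an illustration of their results, turning blurry images into clean ones: [image]

2. Neural Enhance

Judging from the demo in their repo, it may also have some potential for improving blurry text. However, the author has apparently not maintained the repo for about 4 years.

3. Blind Motion Deblurring with GAN

The attractive part is its blind motion deblurring mechanism, named DeblurGAN. It looks very promising.

4. Real-World Super-Resolution via Kernel Estimation and Noise Injection

An interesting fact about their work is that, unlike other works in the literature, they first design a novel degradation framework for real-world images by estimating various blur kernels as well as real noise distributions. Based on that, they acquire LR images that share a common domain with real-world images. They then propose a real-world super-resolution model aimed at better perception. From their article: [image]

However, in my own tests I could not get the expected results. I raised an issue on GitHub and have not received any response so far.

Convolutional Neural Networks for Direct Text Deblurring

The paper shared by @Ali looks very interesting, and the results are extremely good. It is nice that they have shared the pretrained weights of their trained model along with Python scripts for easier use. However, they experimented with the Caffe library. I would prefer to convert the model to PyTorch for better control. Below is the provided Python script with the Caffe imports. Please note that I have not been able to port it completely so far because of my limited Caffe knowledge; please correct me if you know better.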

from __future__ import print_function
import numpy as np
import os, sys, argparse, time, cv2, caffe

# Some helper functions
def getCutout(image, x1, y1, x2, y2, border):
    assert(x1 >= 0 and y1 >= 0)
    assert(x2 > x1 and y2 >y1)
    assert(border >= 0)
    return cv2.getRectSubPix(image, (y2-y1 + 2*border, x2-x1 + 2*border), (((y2-1)+y1) / 2.0, ((x2-1)+x1) / 2.0))

def fillRndData(data, net):
    inputLayer = 'data'
    randomChannels = net.blobs[inputLayer].data.shape[1]
    rndData = np.random.randn(data.shape[0], randomChannels, data.shape[2], data.shape[3]).astype(np.float32) * 0.2
    rndData[:,0:1,:,:] = data
    net.blobs[inputLayer].data[...] = rndData[:,0:1,:,:]

def mkdirp(directory):
    if not os.path.isdir(directory):
        os.makedirs(directory)

The main function starts here:

def main(argv):
    pycaffe_dir = os.path.dirname(__file__)

    parser = argparse.ArgumentParser()
    # Optional arguments.
    parser.add_argument(
        "--model_def",
        help="Model definition file.",
        required=True
    )
    parser.add_argument(
        "--pretrained_model",
        help="Trained model weights file.",
        required=True
    )
    parser.add_argument(
        "--out_scale",
        help="Scale of the output image.",
        default=1.0,
        type=float
    )
    parser.add_argument(
        "--output_path",
        help="Output path.",
        default=''
    )
    parser.add_argument(
        "--tile_resolution",
        help="Resolution of processing tile.",
        required=True,
        type=int
    )
    parser.add_argument(
        "--suffix",
        help="Suffix of the output file.",
        default="-deblur",
    )
    parser.add_argument(
        "--gpu",
        action='store_true',
        help="Switch for gpu computation."
    )
    parser.add_argument(
        "--grey_mean",
        action='store_true',
        help="Use grey mean RGB=127. Default is the VGG mean."
    )
    parser.add_argument(
        "--use_mean",
        action='store_true',
        help="Use mean."
    )
    parser.add_argument(
        "--adversarial",
        action='store_true',
        help="Use the adversarial model (fills extra input channels with random noise)."
    )
    args = parser.parse_args()

    mkdirp(args.output_path)

    if hasattr(caffe, 'set_mode_gpu'):
        if args.gpu:
            print('GPU mode', file=sys.stderr)
            caffe.set_mode_gpu()
        net = caffe.Net(args.model_def, args.pretrained_model, caffe.TEST)
    else:
        if args.gpu:
            print('GPU mode', file=sys.stderr)
        net = caffe.Net(args.model_def, args.pretrained_model, gpu=args.gpu)


    inputs = [line.strip() for line in sys.stdin]

    print("Classifying %d inputs." % len(inputs), file=sys.stderr)


    inputBlob = list(net.blobs.keys())[0]  # name of the first (input) blob
    outputBlob = list(net.blobs.keys())[-1]  # name of the last (output) blob

    print( inputBlob, outputBlob)
    channelCount = net.blobs[inputBlob].data.shape[1]
    net.blobs[inputBlob].reshape(1, channelCount, args.tile_resolution, args.tile_resolution)
    net.reshape()

    if channelCount == 1 or channelCount > 3:
        color = 0
    else:
        color = 1

    outResolution = net.blobs[outputBlob].data.shape[2]
    inResolution = int(outResolution / args.out_scale)
    boundary = (net.blobs[inputBlob].data.shape[2] - inResolution) // 2

    for fileName in inputs:
        img = cv2.imread(fileName, flags=color).astype(np.float32)
        original = np.copy(img)
        img = img.reshape(img.shape[0], img.shape[1], -1)
        if args.use_mean:
            if args.grey_mean or channelCount == 1:
                img -= 127
            else:
                img[:,:,0] -= 103.939
                img[:,:,1] -= 116.779
                img[:,:,2] -= 123.68
        img *= 0.004

        outShape = [int(img.shape[0] * args.out_scale) ,
                    int(img.shape[1] * args.out_scale) ,
                    net.blobs[outputBlob].channels]
        imgOut = np.zeros(outShape)

        imageStartTime = time.time()
        for x, xOut in zip(range(0, img.shape[0], inResolution), range(0, imgOut.shape[0], outResolution)):
            for y, yOut in zip(range(0, img.shape[1], inResolution), range(0, imgOut.shape[1], outResolution)):

                start = time.time()

                region = getCutout(img, x, y, x+inResolution, y+inResolution, boundary)
                region = region.reshape(region.shape[0], region.shape[1], -1)
                data = region.transpose([2, 0, 1]).reshape(1, -1, region.shape[0], region.shape[1])

                if args.adversarial:
                    fillRndData(data, net)
                    out = net.forward()
                else:
                    out = net.forward_all(data=data)

                out = out[outputBlob].reshape(out[outputBlob].shape[1], out[outputBlob].shape[2], out[outputBlob].shape[3]).transpose(1, 2, 0)

                if imgOut.shape[2] == 3 or imgOut.shape[2] == 1:
                    out /= 0.004
                    if args.use_mean:
                        if args.grey_mean:
                            out += 127
                        else:
                            out[:,:,0] += 103.939
                            out[:,:,1] += 116.779
                            out[:,:,2] += 123.68

                if out.shape[0] != outResolution:
                    print("Warning: size of net output is %d px and it is expected to be %d px" % (out.shape[0], outResolution))
                if out.shape[0] < outResolution:
                    print("Error: size of net output is %d px and it is expected to be %d px" % (out.shape[0], outResolution))
                    sys.exit(1)

                xRange = min((outResolution, imgOut.shape[0] - xOut))
                yRange = min((outResolution, imgOut.shape[1] - yOut))

                imgOut[xOut:xOut+xRange, yOut:yOut+yRange, :] = out[0:xRange, 0:yRange, :]

                print(".", end="", file=sys.stderr)
                sys.stderr.flush()


        print(imgOut.min(), imgOut.max())
        print("IMAGE DONE %s" % (time.time() - imageStartTime))
        basename = os.path.basename(fileName)
        name = os.path.join(args.output_path, basename + args.suffix)
        print(name, imgOut.shape)
        cv2.imwrite( name, imgOut)

if __name__ == '__main__':
    main(sys.argv)

Run the program:

cat fileListToProcess.txt | python processWholeImage.py --model_def ./BMVC_nets/S14_19_200.deploy --pretrained_model ./BMVC_nets/S14_19_FQ_178000.model --output_path ./out/ --tile_resolution 300 --suffix _out.png --use_mean --gpu

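The weight files and the script above can be downloaded from here (BMVC_net). However, you may want to convert the model from Caffe to PyTorch with caffemodel2pytorch. To do that, here is a basic starting point:

Install proto-lens
Clone caffemodel2pytorch

Next,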

import torch
import caffemodel2pytorch  # from the cloned caffemodel2pytorch repo

# BMVC_net: you need to download it from the authors' website (link above)
model = caffemodel2pytorch.Net(
    prototxt = './BMVC_net/S14_19_200.deploy', 
    weights = './BMVC_net/S14_19_FQ_178000.model',
    caffe_proto = 'https://raw.githubusercontent.com/BVLC/caffe/master/src/caffe/proto/caffe.proto'
)

model.cuda()
model.eval()
torch.set_grad_enabled(False)

Run it on a demo tensor:

# make sure to have right procedure of image normalization and channel reordering
image = torch.Tensor(8, 3, 98, 98).cuda()

# outputs dict of PyTorch Variables
# in this example the dict contains the only key "prob"
#output_dict = model(data = image)

# you can remove unneeded layers:
#del model.prob
#del model.fc8

# a single input variable is interpreted as an input blob named "data"
# in this example the dict contains the only key "fc7"
output_dict = model(image)
# print(output_dict)
print(output_dict.keys())

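Note that there are some basic things to consider: the network expects text at a DPI of 120-150, a reasonable orientation, and reasonable black and white levels. The network expects the mean [103.9, 116.8, 123.7] to be subtracted from the input. The input should then be multiplied by 0.004.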
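The normalization described above can be sketched in NumPy as follows. This is only an illustration of the arithmetic the Caffe script performs; the helper names `preprocess` and `postprocess` are hypothetical. The means are applied in OpenCV's BGR channel order, as in the script, and the layout is converted to the 1x3xHxW shape a PyTorch model would take.

```python
import numpy as np

# Per-channel means in BGR order (as subtracted in the Caffe script)
# and the input scale factor mentioned in the answer.
MEAN_BGR = np.array([103.939, 116.779, 123.68], dtype=np.float32)
SCALE = 0.004

def preprocess(img_bgr):
    """uint8 HxWx3 BGR image -> float32 1x3xHxW network input."""
    x = (img_bgr.astype(np.float32) - MEAN_BGR) * SCALE
    return x.transpose(2, 0, 1)[np.newaxis]

def postprocess(out):
    """float32 1x3xHxW network output -> uint8 HxWx3 BGR image."""
    y = out[0].transpose(1, 2, 0) / SCALE + MEAN_BGR
    return np.clip(np.rint(y), 0, 255).astype(np.uint8)
```

With these helpers, an image loaded by `cv2.imread` would go through `preprocess`, be wrapped with `torch.from_numpy(...)` before the forward pass, and the result would be passed through `postprocess` before saving.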

Answer 1
16 2020-03-06 18:22:09

Answer 2
9 2020-11-15 09:51:27
