
Artificial neural network image transformation

I have pairs of images (input-output), but I don't know the transformation that goes from A (input) to B (output). I want to record image A and get image B. Physically I can change the setup to get either A or B, but I want to do it in software.

If I understood correctly, a trained Artificial Neural Network should be able to do that: given an input, it can produce the corresponding output. Is that right? Is there any software/ANN that, just by "training" it on a number of input-output pairs, will be able to provide the correct output when the input is a new (but similar to the others) image?

Thanks

If you have a reasonable number of image pairs (input/output pairs) and you don't know the transformation between input and output, you could train an ANN on that training set to imitate the unknown transformation. You will only be able to train your ANN well if you have a sufficient number of training image pairs, and it can become practically impossible when the unknown transformation is complicated.

For example, if the transformation simply increases the intensity values of the pixels in the input image by a given amount, an ANN will learn to imitate that behavior very quickly. But if the unknown transformation is a complicated convolution, several convolutions in series, or something even more complex, it will be very hard, nearly impossible, to train an ANN to imitate it. So a more complex transformation will need a bigger training set and a more complex ANN design.
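To make the simple case concrete, here is a minimal sketch (not part of the original answer) of training a small network on synthetic input/output pairs related by a made-up constant intensity offset. It assumes NumPy and scikit-learn are available; the image size and offset are placeholders.

```python
# Minimal sketch: learn a simple per-pixel transformation (constant intensity
# offset) from example input/output pairs. Sizes and the offset are made up.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n_pairs, h, w = 200, 8, 8          # tiny 8x8 "images" keep the example fast

# Synthetic training set: B = A + 0.2, clipped to [0, 1]
A = rng.random((n_pairs, h * w))
B = np.clip(A + 0.2, 0.0, 1.0)

model = MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000, random_state=0)
model.fit(A, B)                    # learn the "unknown" A -> B mapping from examples

# Apply the learned mapping to a new, unseen image
A_new = rng.random((1, h * w))
B_pred = model.predict(A_new)
print("mean absolute error:", np.abs(B_pred - np.clip(A_new + 0.2, 0, 1)).mean())
```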

There are plenty of free, open-source ANN libraries implemented in many languages. You could start, for example, with this tutorial: http://www.codeproject.com/Articles/13091/Artificial-Neural-Networks-made-easy-with-the-FANN

What you are asking is possible in principle -- in theory, an ANN with sufficiently many hidden units can learn an arbitrary function to map inputs to outputs. However, as the comments and other answers have mentioned, there may be many technical issues with your particular problem that could make it impractical. I would classify these problems as (a) mapping complexity, (b) model complexity, (c) scaling complexity, and (d) implementation complexity. They are all somewhat related, but hopefully this is a useful way to break things down.

Mapping complexity

As mentioned by Springfield762, there are many possible functions that map from one image to another image. If the relationship between your input images and your output images is relatively simple -- like increasing the intensity of each pixel by a constant amount -- then an ANN would be able to learn this mapping without much difficulty. There are probably many more transformations that would be similarly easy to learn, such as skewing, flipping, rotating, or translating an image -- basically any affine transformation would be easy to learn. Other, nonlinear transformations could also be feasible, such as squaring the intensity of each pixel.

As a general rule, the more complicated the relationship between your input and output images, the more difficult it will be to get a model to learn this mapping for you.
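As an illustration of the simple end of this spectrum (my own sketch, not from the answer): a translation of the image contents is a linear operation on the flattened pixel vector, so even a plain linear model can recover it from enough example pairs. The circular shift below is a stand-in for whatever simple mapping you might have.

```python
# Sketch: a shift is a linear map on flattened pixels, so a linear model
# fitted on example pairs can recover it essentially exactly.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n_pairs, h, w = 300, 6, 6
A_imgs = rng.random((n_pairs, h, w))
B_imgs = np.roll(A_imgs, shift=1, axis=2)      # "unknown" transform: circular shift right by 1 pixel

A = A_imgs.reshape(n_pairs, -1)
B = B_imgs.reshape(n_pairs, -1)

lin = LinearRegression().fit(A, B)
A_test = rng.random((1, h, w))
B_pred = lin.predict(A_test.reshape(1, -1)).reshape(h, w)
print(np.allclose(B_pred, np.roll(A_test[0], 1, axis=1), atol=1e-6))   # ~True
```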

Model complexity

The more complex the mapping from inputs to outputs, the more complex your ANN model will need to be to capture that mapping. Models with many hidden layers have been shown in the past 10 years to perform quite well on tasks that people had previously thought impossible, but often these state-of-the-art models have millions or even billions of parameters and take weeks to train on GPU hardware. A simple model can capture many simple mappings, but if you have a complex input-output map to learn, you'll need a large, complex model.

Scaling complexity

Yves mentioned in the comments that it can be difficult to scale models up to typical image sizes. If your images are relatively small (currently the state of the art is to model images on the order of 100x100 pixels), then you can probably just throw a bunch of raw pixel data at an ANN model and see what happens. But if you're using 6000x4000 images from your shiny Nikon DSLR, it's going to be quite difficult to process those in a reasonable amount of time. You'd be better off compressing your image data somehow (PCA is a common technique) and then trying to learn the mapping in the compressed space.
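A hedged sketch of that compress-then-learn workflow (my assumption about how it could be wired up, not code from the answer) might look like this with scikit-learn; the component count and the synthetic "unknown" transformation are placeholders.

```python
# Sketch: compress large images with PCA, learn the mapping between the
# compressed representations, then decode the prediction back to pixel space.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)
n_pairs, n_pixels, n_components = 500, 4096, 50   # e.g. 64x64 images, 50 PCA dims

A = rng.random((n_pairs, n_pixels))               # stand-ins for your real image pairs
B = np.clip(1.5 * A - 0.2, 0.0, 1.0)              # made-up "unknown" transformation

pca_in, pca_out = PCA(n_components), PCA(n_components)
A_z = pca_in.fit_transform(A)                     # compressed inputs
B_z = pca_out.fit_transform(B)                    # compressed targets

reg = MLPRegressor(hidden_layer_sizes=(100,), max_iter=3000, random_state=0)
reg.fit(A_z, B_z)                                 # learn the mapping in the small space

A_new = rng.random((1, n_pixels))
B_pred = pca_out.inverse_transform(reg.predict(pca_in.transform(A_new)))
print(B_pred.shape)                               # (1, 4096): a full-size predicted image
```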

In addition, larger images will have a larger space of possible mappings between them, so you'll need more of your larger images as training data than you would if you had small images.

Springfield762 also mentioned this: if the mapping between your input and output images is simple, then you'll only need a few examples to learn the mapping successfully. But if you have a complicated mapping, then you'll need much more training data to have a chance at learning the mapping properly.

Implementation complexity

It's unlikely that a tool already exists that would let you just throw image data into an ANN model and have a mapping appear. Most likely you'll need, at a minimum, to implement some code that will pre-process your image data. In addition, if you have lots of large images you'll probably need to write code to handle loading data from disk, etc. (There are a lot of "big data" tools for things like this, but they all require some amount of work to get set up.)
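For example, the pre-processing and loading step could be as simple as the sketch below. It assumes (my assumption, not the answer's) that paired files share names across two folders such as inputs/ and outputs/, and that Pillow and NumPy are installed.

```python
# Load paired images from two folders, convert to grayscale, downsize, and
# stack into NumPy arrays ready for training. Folder layout is assumed.
import os
import numpy as np
from PIL import Image

def load_pairs(input_dir, output_dir, size=(64, 64)):
    names = sorted(os.listdir(input_dir))
    A, B = [], []
    for name in names:
        a = Image.open(os.path.join(input_dir, name)).convert("L").resize(size)
        b = Image.open(os.path.join(output_dir, name)).convert("L").resize(size)
        A.append(np.asarray(a, dtype=np.float32) / 255.0)   # scale pixels to [0, 1]
        B.append(np.asarray(b, dtype=np.float32) / 255.0)
    return np.stack(A), np.stack(B)

# A, B = load_pairs("inputs", "outputs")   # shapes: (n_images, 64, 64)
```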

There are many, many open source ANN toolkits out there nowadays. FANN (already mentioned) is a popular one in C++ with bindings in other languages. Caffe is quite popular, and is also implemented in C++ with bindings. There seem to be many toolkits that use Python and Theano or some other GPU acceleration library -- Keras, Lasagne, Hebel, Pylearn2, neon, and Theanets (I wrote this one). Many people use Torch, written in Lua. Matlab has at least one neural network toolbox. I'm less familiar with other ecosystems, but Java seems to have Deeplearning4j, C# has Accord, and even R has darch.

But with any of these neural network toolkits, you're going to have to write some code to load the data, process it into the appropriate input format, construct (or load) a network model, train the model, etc.
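As an illustration of what those remaining steps might look like with one of the toolkits listed above (Keras here, purely as an example; the layer sizes, epochs, and placeholder data are assumptions), a small convolutional image-to-image model could be built and trained like this:

```python
# Sketch: build a tiny convolutional model that maps an input image to an
# output image of the same size, then train it on paired arrays A and B.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

h, w = 64, 64
model = keras.Sequential([
    keras.Input(shape=(h, w, 1)),
    layers.Conv2D(16, 3, padding="same", activation="relu"),
    layers.Conv2D(16, 3, padding="same", activation="relu"),
    layers.Conv2D(1, 3, padding="same", activation="sigmoid"),  # predict the output image
])
model.compile(optimizer="adam", loss="mse")

# A, B: arrays of shape (n_images, 64, 64), e.g. from the loading step above
A = np.random.rand(32, h, w).astype("float32")   # placeholder data
B = np.clip(A + 0.1, 0, 1)
model.fit(A[..., None], B[..., None], epochs=5, batch_size=8)

B_pred = model.predict(A[:1, ..., None])         # apply the learned mapping to a new image
```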

The problem you're trying to solve is a canonical classification problem that neural networks can help you solve. You treat the B images as a set of labels that you match to A, and once trained, the neural network will be able to match the B images to new input based on where the network locates new input in a high-dimensional vector space. I assume you'd use some combination of convolutional networks to create your features, and softmax for multinomial classification on the output layer. More here: http://deeplearning4j.org/convolutionalnets.html
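A rough sketch of that classification framing (my own illustration, written in Keras rather than Deeplearning4j): each distinct B image is treated as a class label, and a small convolutional network with a softmax output is trained to pick the matching label for a given A image. The class count, layer sizes, and placeholder data are all assumptions.

```python
# Sketch: convolutional features plus a softmax output layer, where each class
# index corresponds to one of the distinct B images.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

h, w, num_classes = 64, 64, 10        # assume 10 distinct B images / labels

model = keras.Sequential([
    keras.Input(shape=(h, w, 1)),
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.Flatten(),
    layers.Dense(num_classes, activation="softmax"),   # multinomial classification
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

A = np.random.rand(100, h, w, 1).astype("float32")     # placeholder A images
labels = np.random.randint(0, num_classes, size=100)   # index of the matching B image
model.fit(A, labels, epochs=3, batch_size=16)
```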

Since this was written, a lot of work has been done in the area of cGANs (conditional generative adversarial networks); see: https://arxiv.org/pdf/1611.07004.pdf

