
Image Registration and affine transformation in Python

I have been reading Programming Computer Vision with Python by Jan Erik Solem, which is a pretty good book; however, I haven't been able to clarify a question regarding image registration.

Basically, we have a bunch of images (faces) that need to be aligned a bit, so the first thing needed is to perform a rigid transformation via a similarity transformation:

x' = | sR t | x
     | 0  1 |

where x is the vector (a set of coordinates in this case) to be transformed into x' via a rotation R, a translation t and maybe a scaling s.
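To make the notation concrete, here is a minimal sketch of that similarity transformation in homogeneous coordinates. The helper name `similarity_matrix` and the sample values are assumptions chosen for illustration; they are not taken from the book.

```python
import numpy as np

def similarity_matrix(s, theta, t):
    """Build the 3x3 matrix [[s*R, t], [0, 1]] for a 2D rotation by theta (hypothetical helper)."""
    c, sn = np.cos(theta), np.sin(theta)
    R = np.array([[c, -sn], [sn, c]])
    H = np.eye(3)
    H[:2, :2] = s * R   # scaled rotation block
    H[:2, 2] = t        # translation column
    return H

# Three sample points, stored as columns (x, y) and lifted to (x, y, 1).
pts = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]).T
pts_h = np.vstack([pts, np.ones(pts.shape[1])])

# Scale by 2, rotate by 90 degrees, translate by (10, 20).
H = similarity_matrix(s=2.0, theta=np.pi / 2, t=(10.0, 20.0))
mapped = (H @ pts_h)[:2]
```

Each column of `mapped` is the transformed x' for the corresponding column of `pts`.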

Solem calculates this rigid transformation for each image, which returns the rotation matrix R and a translation vector as tx and ty:

R,tx,ty = compute_rigid_transform(refpoints, points)

However, he reorders the elements of R for some reason:

T = array([[R[1][1], R[1][0]], [R[0][1], R[0][0]]])

and later he performs an affine transformation:

im2[:,:,i] = ndimage.affine_transform(im[:,:,i],linalg.inv(T),offset=[-ty,-tx])

In this example, the affine transformation is performed on each channel, but that's not relevant. im[:,:,i] is the image to be processed, and this procedure returns another image.

What is T and why are we inverting that matrix in the affine transformation? And what are the usual steps to achieve image registration?

Update

You can find the relevant part of this code in Google Books. It starts at the bottom of page 67.

It looks like an error in the code to me. T appears to just be the transpose of R, which for a rotation matrix is the same as the inverse. Then he takes the inverse (again) in the call to ndimage.affine_transform. I think it should be either T or linalg.inv(R) passed to that function.
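The transpose observation can be checked numerically. The sketch below assumes R is a pure rotation (no scale), so that R[0][0] equals R[1][1]; the angle is an arbitrary value chosen for illustration.

```python
import numpy as np

theta = np.deg2rad(25.0)  # arbitrary test angle
c, sn = np.cos(theta), np.sin(theta)
R = np.array([[c, -sn], [sn, c]])  # pure rotation acting on (x, y)

# The reordering used in the book's code.
T = np.array([[R[1][1], R[1][0]], [R[0][1], R[0][0]]])

same_as_transpose = np.allclose(T, R.T)
same_as_inverse = np.allclose(T, np.linalg.inv(R))
double_inverse_is_R = np.allclose(np.linalg.inv(T), R)
```

All three flags come out True under this assumption, so linalg.inv(T) is numerically R itself. The first equality depends on the diagonal entries of R being equal, so it need not hold for a general 2x2 matrix.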

I will try to answer your question and point out what looks like a mistake in the book. (1) Why use T = array([[R[1][1], R[1][0]], [R[0][1], R[0][0]]])? Because R,tx,ty = compute_rigid_transform(refpoints, points) computes the rotation matrix and translation in the form:

|x'| = s|R[0][0] R[0][1]||x| + |tx|             Equation (1)
|y'|    |R[1][0] R[1][1]||y|   |ty|

However, OUT = ndimage.affine_transform(IN,A,b) expects coordinates in (y,x) order, not (x,y) order. So Equation (1) above becomes

|y'| = s|R[1][1] R[1][0]||y| + |ty| = T|y| + |ty|        Equation (2)
|x'|    |R[0][1] R[0][0]||x|   |tx|    |x|   |tx|

Therefore, the matrix passed to ndimage.affine_transform() should be linalg.inv(T), not linalg.inv(R).
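The reordering from (x,y) to (y,x) can be sketched as a swap of both axes. In the snippet below, the permutation matrix `P` and the sample point are assumptions introduced for illustration; the scale s is taken as 1 and the translation is left out.

```python
import numpy as np

theta = 0.3  # arbitrary test angle in radians
c, sn = np.cos(theta), np.sin(theta)
R = np.array([[c, -sn], [sn, c]])  # acts on (x, y) vectors

# The reordering from the book, which acts on (y, x) vectors.
T = np.array([[R[1][1], R[1][0]], [R[0][1], R[0][0]]])

# P swaps the two components of a vector: (x, y) <-> (y, x).
P = np.array([[0.0, 1.0], [1.0, 0.0]])

p_xy = np.array([2.0, 5.0])   # sample point in (x, y) order
q_xy = R @ p_xy               # rotated point in (x, y) order
q_yx = T @ p_xy[::-1]         # same rotation computed in (y, x) order
```

Here T equals P @ R @ P, and `q_yx` is `q_xy` with its two components swapped, which is the relationship between Equation (1) and Equation (2).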

(2) The affine transform OUT = ndimage.affine_transform(IN,A,b) in fact maps each output coordinate to an input coordinate: A*OUT + b => IN. Solving Equation (2) for (y, x) gives

|y| = inv(T)|y'| - inv(T)|ty|
|x|         |x'|         |tx|

So the offset passed to ndimage.affine_transform() should be inv(T)[-ty, -tx], not [-ty, -tx]. I think this is a bug in the original code.
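A toy check of the offset argument, assuming s = 1, a 90 degree rotation, and a 7x7 image with a single bright pixel (all values chosen for illustration, not taken from the book). Nearest-neighbour sampling (order=0) keeps the result exact.

```python
import numpy as np
from scipy import ndimage

# 90-degree rotation acting on (x, y), and the reordered matrix for (y, x).
R = np.array([[0.0, -1.0], [1.0, 0.0]])
T = np.array([[R[1][1], R[1][0]], [R[0][1], R[0][0]]])
ty, tx = 1.0, 6.0

# Input image with one bright pixel at (row, col) = (1, 2).
im = np.zeros((7, 7))
im[1, 2] = 1.0

# Equation (2) predicts where that pixel should land, in (y, x) order.
expected = T @ np.array([1.0, 2.0]) + np.array([ty, tx])

# affine_transform maps output coordinates to input coordinates, so it needs
# the inverse matrix and the offset -inv(T) @ [ty, tx].
A = np.linalg.inv(T)
out = ndimage.affine_transform(im, A, offset=A @ np.array([-ty, -tx]), order=0)

# For comparison, the offset as written in the book's call.
book_out = ndimage.affine_transform(im, A, offset=[-ty, -tx], order=0)
```

In this toy case the bright pixel lands at `expected`, which is (3, 5), when the offset is inv(T)[-ty, -tx], while the [-ty, -tx] offset samples only outside the image and leaves `book_out` empty. The two offsets coincide when T is the identity, so the difference only shows up once a rotation is involved.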
