SLAM (Simultaneous Localization and Mapping) localizes the sensor and perceives the environment based on observed data. The camera is the most accessible and widely used sensor, so it is important to understand how the real world is projected onto the image through the camera.
Pinhole Camera Model
The pinhole camera model is a simple mathematical model used to describe how a camera captures an image of a scene. It is based on the principle of perspective projection: light rays from the scene pass through a small hole (the camera lens) and form an image on a plane.
We will work in the camera coordinate system, in which the lens of the camera is the origin, as shown in Figure 2. The figure shows how the point \(P = [X,Y,Z]^{T}\) is projected onto the image plane at the point \(P^{\prime}\). By the similarity of the triangles:
$$ \frac{Z}{f} = -\frac{X}{X^{\prime}} = -\frac{Y}{Y^{\prime}} $$
As light passes through the hole, the image is projected upside down, and the negative sign indicates this inversion. However, since modern cameras do not deliver an inverted image, we remove the negative sign by assuming that the image is formed in front of the lens rather than behind it.
$$ \frac{Z}{f} = \frac{X}{X^{\prime}} = \frac{Y}{Y^{\prime}} $$
Thus, the point \(P^{\prime} = [X^{\prime},Y^{\prime}]^{T}\) can be computed as:
$$ X^{\prime} = f\frac{X}{Z}, \qquad Y^{\prime} = f\frac{Y}{Z} $$
When we represent the image in pixel coordinates, the origin of the pixel coordinate system is placed at the upper-left corner of the image, as shown in Figure 4. Thus, the pixel coordinate \([u,v]^{T}\) is:
$$ \left\{\begin{matrix}
u = f\frac{X}{Z} + c_{x}\\ v = f\frac{Y}{Z} + c_{y}
\end{matrix}\right. $$
where \((c_{x}, c_{y})\) is the principal point, the point where the optical axis intersects the image plane; it is typically close to, but not exactly at, the center of the image. To convert from metric units on the imaging plane to pixels, the coordinates are additionally scaled by \(\alpha\) on the \(u\) axis and \(\beta\) on the \(v\) axis. Defining \(f_{x} = \alpha f\) and \(f_{y} = \beta f\) gives:
$$\left\{\begin{matrix}
u = f_{x}\frac{X}{Z}+c_{x} \\
v = f_{y}\frac{Y}{Z}+c_{y}
\end{matrix}\right.$$
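For example, with \(f_{x} = f_{y} = 500\), \(c_{x} = 320\), \(c_{y} = 240\) (made-up values) and a point \(P = [1, 2, 10]^{T}\) in the camera frame:

$$ u = 500\cdot\frac{1}{10} + 320 = 370, \qquad v = 500\cdot\frac{2}{10} + 240 = 340 $$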
So, when the camera observes a point \(P\) in the camera coordinate system, the 3D point is projected onto the image plane as:
$$\begin{pmatrix}u \\v\\1\end{pmatrix} = \frac{1}{Z}\begin{pmatrix}f_x & 0 & c_x \\0 & f_y & c_y \\0 & 0 & 1\end{pmatrix}\begin{pmatrix}X \\Y\\Z\end{pmatrix} = \frac{1}{Z}KP $$
The factor \(\frac{1}{Z}\) normalizes the point onto the plane \(z = 1\), so we obtain the projection of the point \(P\) on the normalized plane. If the point \(P\) is given not in the camera coordinate system but in the world coordinate system, it should first be converted into the camera coordinate system using the current pose of the camera. The conversion between the two coordinate systems was covered in the previous chapter: 2023.02.14 - [SLAM] - SLAMBOOK Chapter2: 3D Rigid Body Motion.
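The projection above is easy to verify in a few lines of code. Below is a minimal sketch in Python/NumPy; the intrinsics and the camera pose \(R\), \(t\) are made-up example values, not parameters from the book.

```python
import numpy as np

# Pinhole projection u = (1/Z) K P, with made-up example intrinsics.
fx, fy, cx, cy = 500.0, 500.0, 320.0, 240.0
K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])

def project(P_c, K):
    """Project a point in the camera frame onto the pixel plane."""
    X, Y, Z = P_c
    p_norm = np.array([X / Z, Y / Z, 1.0])  # point on the normalized plane z = 1
    u, v, _ = K @ p_norm                    # pixel coordinates
    return np.array([u, v])

# If P is given in the world frame, convert it first: P_c = R * P_w + t
# (an identity pose is used here purely as a placeholder).
R, t = np.eye(3), np.zeros(3)
P_w = np.array([1.0, 2.0, 10.0])
P_c = R @ P_w + t

print(project(P_c, K))  # [370. 340.], matching the worked example above
```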
Distortion
We now understand how 3D points are projected onto the image, but the image itself may be distorted for several reasons, which can affect the quality and accuracy of the information extracted from it. For example, the shape of the lens or the mechanical assembly may not be perfect.
To undistort the image, there are mathematical models that describe the distortion caused by the shape of the lens. The distortions fall into two main categories: barrel distortion and pincushion distortion.
Barrel distortion causes the magnification to decrease as the distance from the optical axis increases, whereas pincushion distortion has the opposite effect. In both cases, however, straight lines passing through the image center and the optical axis keep their shape. Apart from the lens shape, which leads to radial distortion, tangential distortion occurs during camera assembly because the lens and the imaging surface cannot be made exactly parallel. For a point \([x, y]^{T}\) on the normalized plane, with \(r^{2} = x^{2} + y^{2}\), the radial distortion is computed as:
$$ \left\{\begin{matrix}
x_{radial} = x(1+k_{1}r^2+k_{2}r^4+k_{3}r^6) \\
y_{radial} = y(1+k_{1}r^2+k_{2}r^4+k_{3}r^6)
\end{matrix}\right.$$
and the tangential distortion:
$$\left\{\begin{matrix}
x_{tangential} = x+2p_{1}xy+p_{2}(r^2+2x^2) \\
y_{tangential} = y+p_{1}(r^2+2y^2)+2p_{2}xy
\end{matrix}\right.$$
Putting the above two expressions together, we get a joint model with five distortion coefficients \(k_{1}, k_{2}, k_{3}, p_{1}, p_{2}\):
$$\left\{\begin{matrix}
x_{distorted} = x(1+k_{1}r^2+k_{2}r^4+k_{3}r^6)+2p_{1}xy+p_{2}(r^2+2x^2) \\
y_{distorted} = y(1+k_{1}r^2+k_{2}r^4+k_{3}r^6)+p_{1}(r^2+2y^2)+2p_{2}xy
\end{matrix}\right.$$
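The distortion is applied to the coordinates on the normalized plane, and the distorted point is then mapped to pixels with the intrinsics, i.e. \(u = f_{x}x_{distorted} + c_{x}\), \(v = f_{y}y_{distorted} + c_{y}\). The sketch below illustrates this in Python/NumPy; the distortion coefficients and intrinsics are made-up example values.

```python
import numpy as np

# Made-up example distortion coefficients and intrinsics.
k1, k2, k3 = -0.28, 0.07, 0.0
p1, p2 = 1e-4, -2e-4
fx, fy, cx, cy = 500.0, 500.0, 320.0, 240.0

def distort_and_project(P_c):
    """Apply radial + tangential distortion on the normalized plane, then map to pixels."""
    X, Y, Z = P_c
    x, y = X / Z, Y / Z                      # normalized coordinates
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    x_d = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    y_d = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    return np.array([fx * x_d + cx, fy * y_d + cy])

print(distort_and_project(np.array([1.0, 2.0, 10.0])))
```

If calibrated parameters are available (e.g. from OpenCV, which orders them as \((k_{1}, k_{2}, p_{1}, p_{2}, k_{3})\)), they can be plugged into such a model directly.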