Imaging solutions with Free Software & Open Hardware


Subpixel Registration and Distortion Measurement

Thu, 06/13/2013 - 13:46
Motivation

While working on the second generation of the Eyesis panoramic cameras, we decided to try to go from capturing series of individual panoramic images to 3d reconstruction. There are multiple successful implementations of such a process; we just plan to achieve higher precision of capturing the 3d world by using Elphel's ability to design and build hardware specific to this purpose. While most projects are designed to work with standard off-the-shelf cameras, we are building the cameras together with the devices and methods for calibrating them. In order to precisely determine the 3-d locations of the features registered with the cameras, we plan to first go as far as possible in precisely mapping each pixel of each sub-camera (of the composite camera) image to a ray in space. That requires at least two distinct steps: map each pixel of each sub-image to a ray in the composite camera frame system, and measure the composite camera egomotion to find the absolute pixels-to-rays mapping. The first step can be done by calibrating the camera in the laboratory, measuring the position and orientation of each sub-camera relative to the composite camera frame and simultaneously measuring the lens distortions. To implement the second step we plan to combine inertial data from a high resolution/high bandwidth inertial measurement unit (IMU) with optical data from the camera itself. This post is dedicated to the first part, the static pixel mapping of the composite camera. The complete camera has 26 sub-camera modules, and we are building a goniometer-type camera calibration machine that will rotate the finished camera around two axes, exposing the large calibration pattern (3.0m by 2.6m) to different areas of each of the sub-camera sensors. The registered images will also be used to correct each lens aberration as described in the earlier blog posts [1].

Registering the pattern grid

We first designed the grid pattern [2] for measuring (and correcting) the lens aberrations - it has fewer aliasing artifacts and a more uniform spectrum than the plain checkerboard pattern normally used for distortion measurement, while still being easy to detect and register. Processing starts from the Bayer mosaic image acquired from the camera, and for distortion measurements only the green channel is used. Figure 1a shows the grid occupying most of the image sensor area; other images may have the pattern in smaller areas, so the pattern registration starts with detecting the pattern in at least some areas. For that, smaller (normally 128x128 pixels) square areas of green pixels are extracted, as shown on Fig. 1b; the image is rotated 45 degrees from the original on Fig. 1a because the green channel uses half of the sensor pixels, located in, say, the "black" cells of the checkerboard pixel grid.

Fig. 1. Pattern grid registration: a) registered image; b) 128x128 pixel region of green channel pixels; c) power spectrum (gamma=0.2); d) clusters around first maximums in 2 directions; e) 64x64 region centered around the estimated grid node; f) simulated grid with window function applied; g) mixed cross/phase correlation of registered and simulated arrays; h) combination of correlations with 4 half-pixel shifted models; i) resulting pixel X coordinate for each of the registered grid nodes; j) same for the pixel Y coordinate. [3]
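Green pixels of a Bayer mosaic occupy the "black" squares of a checkerboard, so sampling them along the two diagonal directions produces a square array rotated 45° relative to the sensor. Below is a minimal sketch of such an extraction (illustrative only, not the actual ImageJ plugin code; the Bayer phase, where green cells have an even x+y sum, is an assumption):

```java
/**
 * Extract an N x N tile of green Bayer pixels around (cx, cy), producing an
 * array that is rotated 45 degrees relative to the sensor grid.
 * Assumes a mosaic where green pixels occupy cells with (x + y) even,
 * so (cx + cy) should be even as well.
 */
public static double[][] extractGreenTile(int[][] bayer, int cx, int cy, int n) {
    double[][] tile = new double[n][n];
    for (int v = 0; v < n; v++) {
        for (int u = 0; u < n; u++) {
            // Map tile coordinates (u, v) to sensor coordinates along the two
            // diagonal directions, so consecutive tile pixels are adjacent greens.
            int x = cx + (u - n / 2) - (v - n / 2);
            int y = cy + (u - n / 2) + (v - n / 2);
            if ((x >= 0) && (y >= 0) && (y < bayer.length) && (x < bayer[0].length)) {
                tile[v][u] = bayer[y][x];
            }
        }
    }
    return tile;
}
```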
The next step is to calculate the spectrum of this pattern: Fig. 1c shows the power spectrum raised to the power of 0.2 to somewhat equalize the harmonics' amplitudes (it worked better than a logarithm function). Pixel clusters around the first several maximums along the two pattern axes are extracted and used for the initial measurement of the two pattern translation vectors (Fig. 1d). If no pattern is detected, the scanning continues; instead of sequential scanning I used bit-reversed order to reduce the number of trials before hitting an area with the pattern. After the program detects the first area with the pattern, it continues with a more precise measurement of the correlation between the acquired image and the simulated pattern, calculated for the same grid translation vectors. Fig. 1e shows the 64x64 pixel fragment of the registered image centered around the estimated grid node (before multiplying by a window function) and Fig. 1f shows the matching simulated one with the window applied. Correlation is performed through direct and inverse Fourier transforms; a run-time parameter allows changing from cross-correlation to phase correlation. Fig. 1g shows the result of such an operation applied to the images on Fig. 1e and 1f. While there is only one copy of the acquired image around the grid node shown on Fig. 1e, it is possible to generate the simulated image (Fig. 1f) with different sub-pixel shifts; combining the results of their correlation with the same source image can reduce the effects of aliasing and increase the resolution of the correlation result. The next image (Fig. 1h) shows the result of combining four 64x64 pixel correlations, where the simulated images are generated with 0 and 0.5 pixel shifts in each of the two directions, using the center 64x64 of the combined 128x128 image and applying a low-pass filter. This data is then used to calculate the coordinates of the grid node with one of several methods: fitting a quadratic function, or finding the centroid of the top of the maximum cut at a specified threshold or area.

The whole process of calculating the locations of the grid nodes is run twice. On the first pass the program starts from the first node detected and uses the grid parameters to estimate the locations of the neighbor nodes, and uses this estimation to generate the local pattern simulation used for correlation. This process generates a wave from the initially detected point that propagates until all the connected grid nodes are processed and the correlation contrast (calculated as the ratio of the average signal inside a center circle to that in a ring that falls between the expected maximums) falls below the specified threshold. The program uses a quadratic approximation for the simulated pattern generation to compensate for the local lens distortions - while not critical for the lens used in this work, that was important to fit the grid to the fisheye lens used in the first Eyesis camera. The second pass refines the locations of the grid nodes detected in the first pass - this time the already known locations of the neighbors are used for the local pattern grid generation. The result of this procedure is an array of grid points organized by the grid coordinates (u,v), with half of them (even sum of u+v) being "positive" and the other half (odd sum of u+v) being inverted. This data is stored as a multi-slice TIFF image; two of the slices, the sensor pixel X coordinate and the sensor pixel Y coordinate, are shown on Fig. 1i and Fig. 1j, respectively.
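The cross/phase correlation mix and the sub-pixel maximum location can be sketched as follows. This is not the plugin source, just the general technique under stated assumptions: the forward FFTs of the registered and simulated tiles are assumed to be available as flattened real/imaginary arrays, and an external FFT/IFFT utility is assumed for the transforms themselves.

```java
/**
 * Frequency-domain mixing between cross-correlation (phaseFrac = 0) and
 * phase correlation (phaseFrac = 1): each bin of the cross-power spectrum
 * F1 * conj(F2) is divided by its magnitude raised to phaseFrac.
 */
public static void mixCrossPhase(double[] f1r, double[] f1i,
                                 double[] f2r, double[] f2i,
                                 double phaseFrac, double[] outR, double[] outI) {
    for (int i = 0; i < f1r.length; i++) {
        double re = f1r[i] * f2r[i] + f1i[i] * f2i[i]; // F1 * conj(F2), real part
        double im = f1i[i] * f2r[i] - f1r[i] * f2i[i]; // F1 * conj(F2), imaginary part
        double mag = Math.pow(Math.hypot(re, im) + 1e-12, phaseFrac);
        outR[i] = re / mag;
        outI[i] = im / mag;
    }
}

/**
 * Sub-pixel location of the correlation maximum: centroid of the samples
 * exceeding a given fraction of the peak value (one of the methods mentioned above).
 */
public static double[] centroidOfMaximum(double[][] corr, double threshold) {
    double max = Double.NEGATIVE_INFINITY;
    for (double[] row : corr) for (double v : row) max = Math.max(max, v);
    double cut = max * threshold, sum = 0.0, sx = 0.0, sy = 0.0;
    for (int y = 0; y < corr.length; y++) {
        for (int x = 0; x < corr[0].length; x++) {
            double w = corr[y][x] - cut;
            if (w > 0) { sum += w; sx += w * x; sy += w * y; }
        }
    }
    return new double[]{sx / sum, sy / sum}; // pixel X, Y of the maximum
}
```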
[caption id="attachment_3538" align="alignleft" width="230" caption="Fig.2a Grid curvature - distance in pixels from the registered pattern grid node location and the average of it 8 neighbors (full scale 0.2 pixels). Visible center seam of the two pattern panels."] [4][/caption] Evaluating results of the pattern registration When several methods of pattern grid registration were implemented I needed some way to estimate the precision of the results, to be able to compare the influence of the different variable parameters (correlation size, window function, cross/phase correlation ratio) on this "registration quality". Most interesting for us was how repetitive the registration results are so how their invariants would differ for the images acquired in at least approximately the same conditions. [caption id="attachment_3539" align="alignright" width="270" caption="Fig. 2c Grid curvature - distance in pixels from the registered pattern grid node location and the average of it 8 neighbors. One half (corresponding to top of the Fig.2a) is shown."] [5][/caption] [caption id="attachment_3683" align="alignleft" width="230" caption="Fig. 2b Grid curvature calculated with equalization of the pattern brightness"] [6][/caption] I used the following intra-frame estimate of the registration errors. First the program calculated the grid curvature as a difference between the selected grid node location (pixel X and Y values) and the average value calculated from the node neighbors. This value (shown on Fig. 2a-c) is dependent on the actual curvature of the registered grid (caused by the lens distortion), the errors of the physical pattern attached to the wall and the variations of the results of the registration method. All of these three contributing factors reveal themselves on these images. The circular structures are related the optical distortions, the bright column in the middle of the Fig. 2a is caused by the small misalignment of the 2 halves of the physical pattern - it consists of the two 1.5m by 2.6m plastic panels attached to the wall. Each of the panels maybe (and actually is as we'll see later) slightly distorted from the perfectly rectangular shape, and there is some error in alignment of the panels to each other. [caption id="attachment_3691" align="alignright" width="324" caption="Fig. 2d Pattern grid flat field correction"] [7][/caption] There is also some "noise" visible on Fig. 2a (has to be smoothed out for 3d visualization on Fig.2c), and checker-board artifacts mostly visible near the corners of the images. These artifacts are likely caused by vignetting which has more effect on white parts of the pattern and so biases the registered location when the the overall brightness gradient coincides  with the direction connecting the two opposite white quadrants of the pattern around the node.  Vignetting correction that we'll incorporate in the same measurement cycle will reduce this effect and improve the pattern grid node registration. Update: yes, these artifacts were in fact caused by the lens vignetting and uneven illumination of the pattern. When the pattern images were equalized (see Fig. 2d) the artifacts were successfully eliminated (Fig. 2b). At this stage it is insignificant what causes the brightness gradients - lens or the light source, and the "flat-field" correction is only needed for the areas of the image where the test pattern is detected. The equalization is implemented after the first pass of of measuring the grid node coordinates. 
The program calculates the average value of the pixels around these already known coordinates, using the same window function as for the correlation. This method works as is, but it can be improved to probe the average pixel values around the estimated centers of the white and black cells; in that case the result will depend less on the initial precision, so it will work for lenses with higher distortions (such as fisheye ones). When the average intensity is calculated for every known grid node, the border node values are removed because they are influenced by the pixels outside of the pattern area. Next, the average intensity map is expanded by extrapolating the measured intensity data outwards from the known cells to estimate the average intensity values near the pattern edges and beyond, so the low-pass filter does not skew the result. The average intensity map is then applied to the original image before refining the pattern grid node locations during the second pass (the black area on Fig. 2d is not a privacy filter, it is just outside of the extrapolation area).

Fig. 3a. Registration errors as distances from the registered grid nodes to the locations calculated from the neighbors, assuming constant curvature. Full scale is 0.2 pixels. [8]

The high frequency variations on Fig. 2a are caused by the registration errors - measurements described later did not show any detectable node-to-node differences on the pattern, only gradual changes. To separate the gradual variations caused by the lens and pattern distortions, I subtracted the grid node location estimated from the neighbor nodes (assuming constant local curvature) from the registered one; the results are shown on Fig. 3a and 3b.

Fig. 3b. Registration errors as distances from the registered grid nodes to the locations calculated from the neighbors, assuming constant curvature. [9]

The pattern discontinuity in the center is still present, but in the other areas the effects of the gradual variations are compensated. The checkerboard artifacts visible on Fig. 2a are hidden on Fig. 3a, because the variations from the estimated (constant curvature) locations in these areas are opposite in direction but equal in length, so these artifacts just add to the average error value in the corners. The root mean square of these differences was used to compare different registration methods. We plan to make more measurements of the registration errors with the complete multi-channel cameras when the precise rotation machine is built; currently I used just a single camera mounted on a tripod.

Absolute mapping of the pattern grid with the laser pointers

The pattern grid registration procedure described above resulted in arrays of the pixel coordinates of the pattern grid nodes, but there was no information about the absolute position of the measured pattern cells relative to the pattern as a whole. In some cases, when a corner of the whole pattern is visible and the approximate camera orientation is known (so one of the 4 corners can be identified), it is possible to make an absolute mapping, but we decided to supplement the pattern with four laser diodes pointed at known cells in the four quadrants of the pattern. With the lasers controlled by the software it was easy to detect the location of the laser spots by subtracting a pair of images (with and without the laser spot).
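The spot detection amounts to differencing two otherwise identical frames and picking the brightest blob. A minimal sketch of that idea (illustrative only; in practice a threshold and a blob-size check would be added):

```java
/**
 * Locate a laser spot by subtracting an image taken with the laser off
 * from one taken with the laser on and returning the brightest difference pixel.
 * Both images are expected to be the same size and taken from the same position.
 */
public static int[] findLaserSpot(float[][] laserOn, float[][] laserOff) {
    int bestX = -1, bestY = -1;
    float best = Float.NEGATIVE_INFINITY;
    for (int y = 0; y < laserOn.length; y++) {
        for (int x = 0; x < laserOn[0].length; x++) {
            float diff = laserOn[y][x] - laserOff[y][x]; // only the spot should differ
            if (diff > best) {
                best = diff;
                bestX = x;
                bestY = y;
            }
        }
    }
    return new int[]{bestX, bestY};
}
```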
The precision of the laser spot location is not critical as long as the spots safely identify known white cells of the pattern grid. Absolute mapping of the pattern grid nodes is needed both for finding the extrinsic parameters of each sub-camera (position and orientation relative to the composite camera frame) and for accounting for the non-ideal pattern grid when measuring the camera lens distortions.

Composite camera and the camera rotation machine combined model

Starting calibration of the composite camera, I created a model that describes the location and orientation of each of the camera's 26 sub-cameras relative to the composite camera frame, the rotation of the composite camera in the calibration machine we are building (fixed horizontal rotation axis and moving camera vertical axis), the deviations of the machine (distance between the axes, difference from the 90 degree angle between the axes, two angles determining the actual orientation of the horizontal rotation axis) and the location of the machine reference point (the point on the horizontal rotation axis closest to the camera rotation axis) relative to the frame attached to the target. Each of the sub-cameras is described by a set of 7 intrinsic parameters based on a radial distortion model (focal length, pixel coordinates of the lens axis and 4 coefficients describing the distortion polynomial) and a pixel-mapped array representing the difference from the radial distortion model. The pattern grid is represented as a two-dimensional array (U,V) of 3-d points. For the first experiments, where all the images were acquired from approximately the same point, the grid was considered flat (z=0), but a provision was made to store the full 3-d data for each pattern grid node to account for the non-flatness of the physical pattern. The composite camera is designed to capture all images at the same time, and each of the resulting images is uniquely identified by a timestamp (common to all sub-cameras) and the sub-camera number. While only the sub-camera intrinsic parameters (7 with the selected radial distortion) and the 6 extrinsic ones (3 rotations and 3 translations) are sufficient to project each of the pattern grid points onto the sensors, I used the complete camera kinematic model to be able to impose constraints on groups of images, such as: all the relative positions/orientations of the sub-cameras with respect to the composite camera are common; all the parameters describing the calibration machine except the 2 variable rotations are also common; and these two variable angles are common for the images acquired from different sub-cameras that share the same timestamp. Such a model makes each image dependent on 15 extrinsic parameters, and the conversion from 15 to 6 is common for every pattern node projection in the image.

Fig. 4. Images registered from the same camera position (9 of 15 shown). [10]

The fitting of the parameters of the model to the series of pattern grid images is based on the Levenberg-Marquardt algorithm (LMA), and the deviations to minimize are the differences in the X and Y directions between the calculated locations of the pattern node projections on the sensor and their registered locations. So each grid node on each registered image being processed constitutes two separate samples for the algorithm. The algorithm involves calculating the Jacobian matrix of partial derivatives. Most of the partial derivatives with respect to the minimal set of 7+6=13 parameters are different for each pattern node of each image, but we do not need to calculate them for each of the 22 parameters, because the conversion from 22 to 13 (both values and partial derivatives) can be calculated once per image, and then the Jacobian calculation for the 22 parameters can be done by multiplying the Jacobian for the 13 parameters by the 22x13 matrix.
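In other words, the per-sample Jacobian is expanded through the chain rule: if J13 holds the derivatives of the projected pixel coordinates with respect to the 13 minimal parameters and M holds the derivatives of those 13 parameters with respect to the 22 per-image parameters (computed once per image; this is the post's "22x13 matrix", transposed here to a row-per-minimal-parameter convention), the full Jacobian is their product. A generic sketch under those assumptions, using plain arrays rather than the plugin's actual classes:

```java
/**
 * Chain rule for the Jacobian: j13 is (numSamples x 13), the derivatives of the
 * projected pixel coordinates with respect to the minimal parameter set, and
 * m is (13 x 22), the derivatives of the minimal parameters with respect to the
 * full per-image parameter set (computed once per image). The result is
 * (numSamples x 22).
 */
public static double[][] expandJacobian(double[][] j13, double[][] m) {
    int samples = j13.length, minimal = m.length, full = m[0].length;
    double[][] j22 = new double[samples][full];
    for (int s = 0; s < samples; s++) {
        for (int p = 0; p < full; p++) {
            double acc = 0.0;
            for (int k = 0; k < minimal; k++) {
                acc += j13[s][k] * m[k][p]; // d(sample)/d(min_k) * d(min_k)/d(full_p)
            }
            j22[s][p] = acc;
        }
    }
    return j22;
}
```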
The implementation of the LMA is designed to run sequences of programmable strategies, each specifying a subset of the grid images to process, the set of parameters to adjust, and whether those parameters are individual for each image, shared by all images, or shared by groups of images. The program can run strategies automatically or stop after each iteration step or after one strategy series.

Fig. 5a. Reprojection error of the pattern grid after fitting the camera extrinsic parameters and the intrinsic ones assuming polynomial radial distortion, shown separately for the X and Y directions. Full scale is +/- 2mm. [11]

The experiment used a set of 15 images with a single camera mounted on a tripod at 2.3m from the pattern and rotated around an approximately vertical axis to five positions in the range of ±30°, for three different tilts in the range of ±15°. Nine of the 15 images are shown on Fig. 4. The acquired images were already converted to grid arrays and stored as multi-slice TIFF images as described above, so each new run of the program did not require re-running the pattern registration process. In addition to the two slices shown on Fig. 1i and 1j (pixel X and pixel Y coordinates), those files had additional slices of integer values U and V representing the absolute grid coordinates of the same pattern nodes, available after the absolute mapping with the laser pointers.

Measuring the pattern geometry

The radial distortion model provides a rather accurate projection of the pattern grid nodes to the image sensor pixel coordinates, so I expected the physical pattern distortions (non-flatness and stretching of the material) to be the next source of mismatch between the model and the registered image; we've already seen them in the registration process as illustrated on Fig. 2 and 3. After I ran the LMA on the set of 15 images and it converged on a parameter vector, I mapped the reprojection errors (differences between calculated and measured pixel coordinates) to the grid coordinates. The nine images of Fig. 5a (animated GIF) and Fig. 5b (color) match the acquired images shown on Fig. 4, i.e. the top left (of nine) image shows the top left portion of the pattern.

Fig. 5b. Reprojection errors of the pattern grid in pseudo-colors. Different colors indicate different directions, brightness - the absolute value of the difference. [12]

Each image frame on Fig. 5a and 5b represents the minimal straight rectangle that includes the full pattern; each sub-image covers a different but overlapping part of the whole pattern. The rectangular physical pattern itself, with its sides vertical and horizontal, is represented here as tilted, because it is tilted in pattern UV coordinates: the pattern rows and columns are tilted by 5° to reduce aliasing artifacts. The animated images on Fig. 5a alternate between the vertical and horizontal deviations of the measured grid node coordinates from the ideally periodic ones used when the LMA was initially run.
The color image on Fig. 5b combines the vector components into a single image where the direction is represented by the color and the absolute value by the intensity. These partial images demonstrated a good match in the overlapping areas, and the full set of images was acquired with good overlapping coverage of the full pattern area. So the next step was to calculate smooth masks to reduce the influence of the sensor-cropped grids and combine the individual images into one covering the full pattern. The individual images, multiplied by the weight function of the camera mapped to the pattern coordinates, were added together and then per-pixel divided by the sum of those mapped weight functions. The result was close to the images shown on Fig. 6a-6c, but contained slightly visible borders between the overlapping areas.

Fig. 6a. Pattern grid correction, combined from the individual measurements shown on Fig. 5a-5b. Full scale is +/- 2mm. [13]

Re-running the LMA showed some improvement, so I repeated the grid re-calculation followed by the LMA several times until the decrement in the calculated root mean square of the residual vector fell below the specified threshold - it took about 10 iterations.

Fig. 6c. Reprojection errors after fitting the camera extrinsic parameters and the intrinsic ones assuming polynomial radial distortion, combined from the individual measurements shown on Fig. 5a-5b. Alternating X and Y differences. [14]

Simultaneously with reducing the RMS of the residual vector, this procedure made all the seams caused by the overlapping of individual images fade out, and the pattern grid coordinate array converged to the one shown on Fig. 6a-6c.

Fig. 6b. Pattern grid correction in pseudo-colors. Full scale is +/- 2mm. [15]

Fig. 6a and Fig. 6b represent the combined pattern grid corrections similarly to Fig. 5a and Fig. 5b for the partial grid images - the first with alternating horizontal and vertical corrections, the second combining both directions as a pseudo-color image. Fig. 6c shows the alternating images of the pattern grid correction as 3d plots. The coordinate correction applied to the pattern grid reduced the root mean square of the residual vector from 0.35 to 0.14 pixels and made it possible to come back to measuring the lens distortions.

Measuring and compensating the residual lens distortion

As the grid correction converged to an array and, at the same time, the radial distortion parameters adjusted by the LMA reached their final values, I mapped the residual differences of the pattern grid node locations acquired from the set of images to the sensor coordinates, fixed relative to the lens.

Fig. 7. Residual lens correction after the radial distortion model, alternating for the X and Y directions. [16]

That allowed me to combine such data from multiple images and measure the difference between the radial lens distortion model used and the registered pattern. Similarly to combining the grid corrections, the program calculated per-image weight functions and then combined the residuals from the 15 images.
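The combination step is a per-node weighted average: each image's measurement is multiplied by that image's weight at the node, the products are summed, and the sum is divided by the total weight. A minimal sketch of that accumulation (hypothetical array layout; the actual plugin operates on its own grid classes):

```java
/**
 * Combine per-image measurements into one array covering the full pattern.
 * values[i][v][u] is the measurement of image i at grid node (u,v) (NaN where
 * the node is not covered), weights[i][v][u] is that image's smooth mask.
 */
public static double[][] combineWeighted(double[][][] values, double[][][] weights) {
    int h = values[0].length, w = values[0][0].length;
    double[][] num = new double[h][w];
    double[][] den = new double[h][w];
    for (int i = 0; i < values.length; i++) {
        for (int v = 0; v < h; v++) {
            for (int u = 0; u < w; u++) {
                if (!Double.isNaN(values[i][v][u])) {
                    num[v][u] += weights[i][v][u] * values[i][v][u];
                    den[v][u] += weights[i][v][u];
                }
            }
        }
    }
    double[][] combined = new double[h][w];
    for (int v = 0; v < h; v++) {
        for (int u = 0; u < w; u++) {
            combined[v][u] = (den[v][u] > 0) ? (num[v][u] / den[v][u]) : Double.NaN;
        }
    }
    return combined;
}
```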
When I applied the additional lens correction to each image, I had to freeze the lens distortion coefficients used by the LMA, as the fitting could otherwise become unstable: the correction array can be calculated for any combination of radial distortion parameters. The averaged correction additionally reduced the RMS of the residual vector to 0.075 pixels. Repetition of the grid correction, LMA (with fixed radial distortion parameters) and calculation of the additional lens correction was stable for the set of images, but the result did not improve much, and the final RMS reached 0.069 pixels for the set of 15 images. This final distortion correction, applied "on top" of the radial distortion correction, is shown in the last column of Fig. 8; the RMS indicated there (0.14px) refers to the correction amount. The same final correction is also shown on Fig. 7 as alternating X and Y corrections, in pixels.

Fig. 8. Residual lens distortion, measured by combining individual images and compensating for the pattern grid errors. The columns correspond to different degrees of the radial distortion polynomial; the rows show the same residual at different scales. [17]

In addition to applying a correction to the lens radial distortion model after all the camera parameters were determined for the best fit, I measured the required correction after radial polynomial models with a reduced number of terms. That helped me find out how much these terms add to the precision of the model for the particular type of lens we are using in the Eyesis cameras, i.e. what the optimal number of polynomial coefficients is in our case. The results of this comparison are shown in the columns of Fig. 8, where the rows show the same functions at different full scales - from 0.375 pixels in the top row to 12 pixels in the bottom row. No radial distortion (pinhole camera model) required an additional correction with RMS=3.44 pix; a single term (Rd/Rs term proportional to radius) reduced the RMS to 1.96 pix; two terms - to 1.05 pix; and the third term (commonly used, e.g. in Hugin/Panotools [18]) reduced the RMS in our case four times - down to 0.25 pix. Adding one more term reduced the residual correction further, but at this step the image almost lost its so far prominent radial structure, so just adding more terms to the radial polynomial would not improve the approximation much.
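For reference, the kind of radial model referred to here, in the Hugin/PanoTools convention cited above (shown as a sketch; the exact parametrization used by the calibration software may differ), scales the ideal radius by a polynomial in the radius itself:

```java
/**
 * Radial distortion in the Hugin/PanoTools style: the distorted (source image)
 * radius is rSrc = rDest * (a*rDest^3 + b*rDest^2 + c*rDest + d), with radii
 * normalized to half of the smaller image dimension. With a=b=c=0, d=1 this is
 * a pinhole model (no distortion); each extra non-zero coefficient adds one term.
 */
public static double distortRadius(double rDest, double a, double b, double c, double d) {
    return rDest * (((a * rDest + b) * rDest + c) * rDest + d);
}
```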
Measuring the 3D locations of the pattern grid nodes

So far I used a two-dimensional model of the pattern grid, even though I could see that the pattern on the wall was not flat - there is a rather thick plate that connects the two halves of the pattern, and similar reinforcement near the top and wall molding near the floor. The calibration machine will rotate the camera around the center of its optical head, so most of the camera lenses (except the two specifically designed for distance measurements) will see the pattern with very little parallax, and the 5-10mm non-flatness would not lead to significant errors (the camera will be located 6 meters from the target pattern).

Fig. 9a. 3D pattern grid correction (full scale 2mm for X and Y, 8mm for Z corrections). [19]

On the other hand, as there is still a long way ahead before we'll be able to create 3d models of the nice scenery of Southern Utah, I was tempted to do a simple 3-d reconstruction of the pattern. And we'll still need the precise 3-d model for calibrating the 2 bottom camera lenses located 1 meter below the other 24.

Fig. 9b. X and Y corrections to the pattern grid. [20]

For that purpose I added more captured images to the initial set; this time some were acquired from the tripod located some 0.8m to the left of the initial center location, and some to the right, for a total of 32 simultaneously processed images. Next I modified the grid correction procedure to work with 3 coordinates while each image provides only two. I was looking for a robust method that would still work for the original set of images acquired approximately from the same point and would not try to move the points far in the Z direction. The first simple modification of the 2-d target grid correction algorithm was to make each image "vote" for a correction not in the target XY plane, but in the direction perpendicular to the line connecting the center of the lens with the grid point being corrected - the minimal-distance correction that for each camera is equivalent to the required one. When working with the set containing images from multiple points, the resulting correction still reduces the error in the Z direction, but at a much slower rate than a precise calculation. This method added just a few lines of code and is still used as a fall-back in the case of small parallax between the images, which results in a small determinant of the matrix used for the direct calculation of the correction. The second method minimizes the sum of squared distances from the corrected 3-d location to the lines connecting each lens location with the 2-d point on the target plane that the corresponding camera votes for.

Fig. 9c. Non-flatness of the pattern. Pattern dimensions are 3022mm by 2667mm. [21]

Both methods are followed by a normalization of the correction to reduce the dependence between the effects of correcting the grid node coordinates and moving the camera. Corrections in the X and Y directions have their average values subtracted from each point; the correction in the Z direction is additionally compensated for tilt, equivalent to a rotation of the whole pattern around the vertical and horizontal axes. When the 3-d correction was applied, most of the differences between the final grid point coordinates and the initial ideal pattern (with equally spaced nodes located in the Z=0 plane) were in the Z direction. That means that the plastic sheets on which the pattern is printed are bent rather than stretched. This "redistribution" of the difference from the X and Y to the Z direction is visible when comparing Fig. 6a-c and Fig. 9a-c. A large part of the residual errors in the X and Y directions is caused by the mismatch of the otherwise uniform pattern halves - a shift reveals itself as a step between the halves, a rotation as a different tilt of the halves on both the X and Y difference graphs. It was not easy to independently verify the measured shape of the pattern (we plan to do more measurements later), so I only measured a single profile in the center of the pattern. For that I used a laser beam parallel to the pattern surface and marked the spots on paper touching the pattern at different locations along the middle horizontal line. The precision of such a quick test was about 1 mm, and it did match the data from the cross section of Fig. 9c.
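The second method described above is the classic least-squares "closest point to a bundle of rays" problem: for each ray with origin o_i (the lens center) and unit direction d_i (towards the point the camera votes for), minimizing the sum of squared perpendicular distances leads to the 3x3 linear system sum(I - d_i d_i^T) x = sum(I - d_i d_i^T) o_i. A self-contained sketch of the general technique (not the plugin's actual code), using Cramer's rule for the 3x3 solve:

```java
/**
 * Least-squares intersection of rays: returns the 3-d point minimizing the sum
 * of squared distances to the lines defined by origins[i] and unit directions dirs[i].
 * Solves sum(I - d*d^T) x = sum(I - d*d^T) o using Cramer's rule.
 */
public static double[] closestPointToRays(double[][] origins, double[][] dirs) {
    double[][] a = new double[3][3];
    double[] b = new double[3];
    for (int i = 0; i < origins.length; i++) {
        double[] d = dirs[i], o = origins[i];
        for (int r = 0; r < 3; r++) {
            for (int c = 0; c < 3; c++) {
                double m = ((r == c) ? 1.0 : 0.0) - d[r] * d[c]; // element of (I - d d^T)
                a[r][c] += m;
                b[r] += m * o[c];
            }
        }
    }
    double det = det3(a); // small |det| corresponds to the small-parallax case,
                          // where the simpler fall-back method is used instead
    double[] x = new double[3];
    for (int col = 0; col < 3; col++) {          // Cramer's rule: replace column 'col' with b
        double[][] ac = {a[0].clone(), a[1].clone(), a[2].clone()};
        for (int r = 0; r < 3; r++) ac[r][col] = b[r];
        x[col] = det3(ac) / det;
    }
    return x;
}

private static double det3(double[][] m) {
    return m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
         - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
         + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
}
```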
One more experiment: camera location from the images

While for the camera calibration we are mostly interested in measuring the mapping from the pixels acquired by the camera at a known location to the lines in space, I made a separate, reverse experiment to test the measurements. It is similar to a rather common task: finding the camera location in space from an available image. Contrary to the usual setups, where the camera distortion or even the lens focal length are either completely unknown or known only approximately and the shape of the captured objects has to be guessed, in our case the visible object (the pattern grid) was precisely measured and the camera lens was calibrated for distortions - first using the radial distortion model and then applying the additional non-radial correction. During the previous measurements the model was constrained as much as possible: not only were the camera intrinsic parameters and the fixed distances and angles in the kinematic model of the camera (including the tripod) set to be common for all the images, but the location of the tripod as a whole was set not to change for the images belonging to the same group (center, right, left). The program calculated the unknown parameters of the model as well as the coordinates of the pattern grid locations while these constraints were imposed.

Table 1. Measured camera locations

                        Center     Right      Left
Number of images        13         6          13
X position (mm)         -22.55     854.00     -675.39
Y position (mm)         67.83      64.10      64.89
Z position (mm)         2322.57    2361.53    2448.79
X std. dev. (mm)        0.62       0.46       0.62
Y std. dev. (mm)        0.57       0.51       0.52
Z std. dev. (mm)        0.21       0.33       0.42
Full std. dev. (mm)     0.87       0.76       0.91

Then I re-ran the fitting program while "freezing" the camera intrinsic parameters and the pattern grid correction, but allowing the LMA to adjust each of the 3 coordinates of the tripod position and the 3 angles defining the camera orientation (the 2 that were actually changed in the tripod head plus the camera roll) for each of the 32 images independently. That means that each (calibrated) camera location was determined from a single image. Ideally the results of these calculations should give exactly the same tripod position for all the images of the same group, but there is some discrepancy between these coordinates. Table 1 summarizes the results; each column's average and standard deviation are calculated from the corresponding subset of the images. The "full" standard deviation in the last row is calculated as the square root of the sum of squared deviations in each of the 3 orthogonal directions.

Fig. 10a. Derivatives of the pixel coordinates with respect to yaw, pitch and distance (animated). [22]

The uncertainty of the camera location along the Z axis is smaller than in the X and Y directions, because camera movement along that axis causes more apparent changes in the registered image than when the camera is moved perpendicular to the axis (while simultaneously rotating so the optical axis crosses the target plane at the same point).

Fig. 10b. Derivatives of the pixel coordinates with respect to yaw, pitch and distance. [23]

This can be demonstrated by considering the partial derivatives of the pixel coordinates with respect to the two rotations of the camera around the fixed point on the target (which correspond to X and Y movements) and the movement to/from the camera (Z axis).
These derivatives for the center image are shown on Fig. 10 a,b. The calculated average absolute value (rms) of the reprojection error for horizontal rotation (yaw) around the vertical axis on the target is 6.2 pix/degree, for rotation around the horizontal axis - 4.0 pix/degree (smaller as the camera is in landscape orientation), and moving the camera along the Z axis causes an average error of 0.36 pix/mm. Substituting the distance to the camera (2322 mm) results in the following average errors per mm of camera movement: X - 0.15 pix/mm, Y - 0.1 pix/mm and Z (unchanged) - 0.36 pix/mm. Multiplying these values by the measured standard deviations in the Table 1 column for the center images results in X - 0.093 pix, Y - 0.057 pix and Z - 0.076 pix - the registration precision in pixels needed to produce the measured variations in the camera coordinates.

Results

When we use the calibration machine to calibrate the cameras, both the location of the machine reference point and the coordinates of the pattern grid points will already be known - they can be measured from a larger number of evenly overlapping images, and the distance from the camera to the target will be almost 3 times larger than now, so the angular errors caused by linear uncertainty will be proportionally smaller. According to the results of the simultaneous fitting of the camera parameters and the target pattern geometry, it will be possible to make angular measurements in the camera field of view with a precision better than 0.2 pixels over the camera field of view - the current constrained fitting of 32 images provided a root mean square of the combined X and Y mismatch on the sensor of 0.14 pixels (or 0.1 pixels in each direction). With the sensor's 2.2 μm pixel period and 4.45 mm focal length, 1 pixel corresponds to 1.7 angular minutes, so 0.2 pixels correspond to a 0.35' angle. The precision of measurements of the angles between features in real-life images will be limited by other factors - registration of features that have to have enough pixels and high spatial frequency components. The precise calculation of the camera motion using the logged IMU data to compensate for the rolling shutter effect is also a project to be completed, but at least this stage - static pixel mapping of the individual camera - is working and provides the data to combine the mappings of the sub-cameras into the composite camera while still keeping the mapping precision in the sub-pixel range. We will work on minimizing the accumulation of angular errors during the "stitching" of the individual pixel maps (in each camera orientation the current pattern will be visible in about 10% of the sensor area) by using the precise rotation data and possibly adding additional targets in other directions - at shorter range but covering a larger angle. All the software described in this post is written in Java as a plugin for ImageJ [24] and is available [25] on our Sourceforge project page under the GNU GPLv3 license.
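The angular figures above follow directly from the pixel pitch, the focal length and the quoted sensitivities; a small sketch of that arithmetic, using only the numbers given in the text:

```java
public class AngularPrecision {
    public static void main(String[] args) {
        double pixelPitchMm = 0.0022;   // 2.2 um sensor pixel period
        double focalLengthMm = 4.45;    // lens focal length
        double arcminPerPixel = Math.toDegrees(Math.atan(pixelPitchMm / focalLengthMm)) * 60.0;
        System.out.printf("1 pixel   = %.2f arc minutes%n", arcminPerPixel);        // ~1.70'
        System.out.printf("0.2 pixel = %.2f arc minutes%n", 0.2 * arcminPerPixel);  // ~0.34'

        // Converting the Table 1 standard deviations (center column) to the
        // equivalent registration error in pixels, using the quoted derivatives.
        double distanceMm = 2322.0;
        double yawPixPerDeg = 6.2, pitchPixPerDeg = 4.0, zPixPerMm = 0.36;
        double xPixPerMm = yawPixPerDeg / (distanceMm * Math.PI / 180.0);   // ~0.15 pix/mm
        double yPixPerMm = pitchPixPerDeg / (distanceMm * Math.PI / 180.0); // ~0.10 pix/mm
        System.out.printf("X: %.3f pix, Y: %.3f pix, Z: %.3f pix%n",
                xPixPerMm * 0.62, yPixPerMm * 0.57, zPixPerMm * 0.21);      // ~0.093, 0.057, 0.076
    }
}
```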
[1] http://blog.elphel.com/2010/12/zoom-in-now-enhance-results/
[2] http://blogs.elphel.com/2010/11/zoom-in-now-enhance/
[3] http://blog.elphel.com/wp-content/uploads/2011/10/pattern-measurement.png
[4] http://blog.elphel.com/wp-content/uploads/2011/10/curv-56A-rms0036-curvature-half.png
[5] http://blog.elphel.com/wp-content/uploads/2011/10/Surface_Plot_of_curv-56A-rms0036-curvature-half.png
[6] http://blog.elphel.com/wp-content/uploads/2011/10/curv-rms00317_curvature_half.png
[7] http://blog.elphel.com/wp-content/uploads/2011/10/1317925285_579597-00.tiff-flat-quater.gif
[8] http://blog.elphel.com/wp-content/uploads/2011/10/curv-56A-rms0036-errors.png
[9] http://blog.elphel.com/wp-content/uploads/2011/10/Surface_Plot_of_curv-56A-rms0036.png
[10] http://blog.elphel.com/wp-content/uploads/2011/10/nine-sm.jpeg
[11] http://blog.elphel.com/wp-content/uploads/2011/10/grid-first-pass.gif
[12] http://blog.elphel.com/wp-content/uploads/2011/10/grid-corr-first-pass-color.png
[13] http://blog.elphel.com/wp-content/uploads/2011/10/grid-diff.gif
[14] http://blog.elphel.com/wp-content/uploads/2011/10/Surface_Plot_of_a20-grid-difference-1pix-1mm.gif
[15] http://blog.elphel.com/wp-content/uploads/2011/10/grid-diff-color.png
[16] http://blog.elphel.com/wp-content/uploads/2011/10/Surface_Plot_of_Cumulative_chn_0_corrections.gif
[17] http://blog.elphel.com/wp-content/uploads/2011/10/distortion-orders.png
[18] http://hugin.sourceforge.net/docs/manual/Lens_correction_model.html
[19] http://blog.elphel.com/wp-content/uploads/2011/10/grid44-minus-default.gif
[20] http://blog.elphel.com/wp-content/uploads/2011/10/Surface_Plot_of_grid44-minus-default.gif
[21] http://blog.elphel.com/wp-content/uploads/2011/10/Surface_Plot_of_grid44-minus-default-Z.png
[22] http://blog.elphel.com/wp-content/uploads/2011/10/derivatives-5.gif
[23] http://blog.elphel.com/wp-content/uploads/2011/10/derivatives-5.png
[24] http://rsbweb.nih.gov/ij/
[25] http://elphel.git.sourceforge.net/git/gitweb.cgi?p=elphel/ImageJ-Elphel

Elphel new camera calibration facility

Wed, 06/05/2013 - 16:54

Fig.1. Elphel new calibration pattern

Elphel moved to a new calibration facility in May 2013. The new office is designed with the calibration room as its most important space, expandable when needed to the size of the whole office with the use of a wide garage door. The back wall of the new calibration room is covered with a large, 7m x 3m pattern, illuminated with bright fluorescent lights. The length of the room allows positioning the calibration machine 7.5 meters away from the pattern. The long space and large pattern will allow calibrating Eyesis4π positioned far enough from the pattern to be within the depth of field of its lenses focused for infinity, while still keeping a wide angular size, which is preferred for measurement accuracy.

We had already hit the precision limits with the previous, smaller 2.7m x 3.0m pattern. While the software was designed to accommodate a pattern where each of the nodes has an individually corrected position (relative to the flat uniform grid), the process assumed that the 3d coordinates of the nodes do not change between measurements.

The main problem with the old pattern was that the material it was printed on was attached to the wall along the top edge only and still had the freedom to move slightly perpendicular to the wall. We noticed this while combining measurements made at different times, as most of our cameras need to be calibrated at several “stations” – positions relative to the target (rotation around 2 axes is performed automatically). We ran the calibration during night time to reduce variations caused by vibrations in the building, so the measurements at the next station were performed on different dates. The modified software was able to deal with variations in the Z (perpendicular to the surface) direction between station measurements (which actually did help in the overall adjustment of variables), but the shape of the target pattern could change if the temperature in the building changed during measurements. The PVC material has a high thermal expansion coefficient, and a small expansion in the X,Y directions can cause much larger variations perpendicular to the surface when the target is attached at multiple points to a wall with a lower thermal expansion coefficient.
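To see why in-plane expansion translates into much larger out-of-plane motion, consider a simple, admittedly idealized model: a panel of length L pinned at both ends that gains excess length δ = α·L·ΔT bows into roughly a half sine wave of height h ≈ (2/π)·√(δ·L). The sketch below is only an order-of-magnitude illustration with assumed material numbers, not a measurement of our pattern:

```java
public class ThermalBuckling {
    public static void main(String[] args) {
        double alpha = 60e-6;   // assumed thermal expansion of PVC, 1/K
        double length = 3.0;    // panel length constrained at both ends, m
        double deltaT = 5.0;    // assumed temperature swing, K
        double excess = alpha * length * deltaT;                  // in-plane excess length, m
        double bow = 2.0 / Math.PI * Math.sqrt(excess * length);  // half-sine bow height, m
        System.out.printf("in-plane expansion: %.2f mm%n", excess * 1000.0); // ~0.9 mm
        System.out.printf("out-of-plane bow:   %.1f mm%n", bow * 1000.0);    // tens of mm
    }
}
```

Under this model, even a sub-millimeter in-plane change can produce out-of-plane deviations on the order of centimeters if the material is free to bow, which is why the new target is bonded to the wall over its whole surface.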

Fig. 2. Floor plan

Calibration Room

The new space is designed to accommodate various camera calibration procedures.

  • First of all, we made the pattern as large as possible – it is 7.01m x 3.07m – we even raised the ceiling near the target.
  • The target itself is now printed on film attached to the wall like wallpaper, so there is no movement relative to the wall, and the thermal expansion is defined by the lower coefficient of the drywall. We also provided air channels inside the wall to make it possible to implement thermal stabilization of the wall.
  • The calibration room allows moving the camera under test up to 7.5m away from the pattern. The room is separated from the rest of the facility with the wide “garage” door, so changes in the lighting conditions outside of the room do not influence the calibration.
  • The other rooms are designed in such a way that the camera can be moved up to 24 meters from the target (with the garage door open) and still have an unobstructed view of virtually the full pattern – that may be needed for long focal length lenses.

Fig. 3. Pattern wall during construction

Preparing the wall for the target pattern

During construction of the new facility we were carefully watching the progress, as our temporary space was located just on the next floor, and we were mostly concerned about the quality of the target wall. Yes, software can accommodate the non-flatness of the wall, but it is better to start with good “hardware” – to achieve subpixel precision the software averages correlation over rather large areas of the image (currently 64×64 pixels), so sharp variations would produce different measurements from different distances or viewing angles. When we first measured the wall flatness, we noticed large steps between the gypsum board panels, so the construction people promised to make it a level 5 finish and flatten the surface. They put “mud” all over the wall and sanded it, and that removed all of the sharp discontinuities on the target surface, but still left some smooth ones of up to ±3mm, as we measured later with the camera.

When the wall was made flat it had to be prepared for application of the self-adhesive vinyl film, so that the wall finish would not make it bubble later. Ideally we wanted it to be able to withstand peeling the film off if we ever have to do that. When we searched the Internet about applying vinyl film to a painted wall, we found that most fresh paint needs some 60(!) days to cure before the film can be applied. So we decided to go with a two-component epoxy paint that requires only one week before the film can be applied. When we inspected the epoxy-painted wall (the paint was applied with regular rollers), it did not look flat. Well, it was just a roller-painted wall, so it had those small bumps, and we were concerned that the vinyl film would conform to these bumps, and if it did, the position “noise” would be higher than what the cameras can resolve. So we got more epoxy paint and started a long process of wet-sanding and application of new paint coats. We have compressed air (used for blowing during optical and mechanical assembly), so we thought we would just spray the paint instead of rolling it, to avoid those bumps that were left even after professional work. Unfortunately, without the needed experience in spray-painting, we set the pressure too high, and probably as much as half of our first coat ended up somewhere other than on the sprayed wall – the paint droplets were too small. The next coat was better, and in several days we had a wall that seemed to be covered with hard plastic laminate, not just painted.

Installing the pattern

Our next concern was how to install the vinyl film. We wanted to have a very good match between the individual panels, as it is not possible to have the target printed on a single piece, the maximal width of which is just over 1.5m. We hesitated to order professional installation because for regular applications (like vehicle wraps) such sub-millimeter precision is not required. For a truly seamless result (compared to the precision of the calibration) we would need a better than 0.1mm match, but it is possible to just mask out the grid nodes around the seams and disregard them during calibration data processing, so we planned to get to about a 0.5mm match.

Fig.4. Pattern Z-deviations (perpendicular to the target plane)

Fig. 5. Pattern deviations in X,Y plane

Fig. 6. Pattern deviation from the "ideal" grid (horizontal profile)

We knew people are doing this, but it still seemed very difficult to apply 1.5m wide by 3m long “stickers” without wrinkles and bubbles. A web search provided multiple recommendations, but the main thing was to use the “wet” method that none of us knew before. It involves spraying the wall (and the film on the adhesive side) with “application fluid” (basically water with a small addition of soap and alcohol). When the sticky film is applied to the wet surface, the adhesive is temporarily inhibited and it is possible to reposition (slide) the film to achieve the required match. Then the water is squeezed out with squeegee tools, and if done properly, there should be no bubbles left.

Geometric properties of the pattern

The Z-deviations on Fig. 4 show the wall non-flatness; the gypsum panel borders are still visible (even with the “level 5” finish), and the horizontal discontinuity near the top is where the wall was extended to accommodate the increased ceiling height. The positive Z direction is away from the camera, so lighter areas are concave areas of the wall and darker ones are bumps extending out from the wall.

Fig. 5 illustrates the mismatch and stretching of the applied vinyl panels. The red/green color difference corresponds to the horizontal shift, while blue/green corresponds to the vertical one.

Figure 6 contains a horizontal profile at half-height and provides numerical values of the deviations. The Diff. Error plot indicates areas around the panel boundaries that should be avoided during reprojection error minimization and when measuring point spread functions (PSF) for aberration correction.

Illuminating the target pattern

We use the same pattern for different parts of the camera calibration. Correction of aberrations and distortions does not impose strict requirements on the illumination of the pattern, but we use the same images to measure (and compensate) lens vignetting and the color variations of the camera sensitivity caused, among other reasons, by the multilayer infrared cutoff filter and the angular variations of the pixel color sensitivity. This method works for the low-frequency part of the flat-field correction and does not deal with the pixel fixed-pattern noise which, if present, should be corrected by other means.

Fig. 7. pattern brightness for station 2 view 0 (top channels)

Fig. 8. pattern brightness for station 2 view 0 (top), specular component

Fig. 9. pattern brightness for station 2 view 1 (bottom channels)

Fig. 10. pattern brightness for station 2 view 1 (bottom channels), specular

By acquiring thousands of images made by different channels of the camera capturing the same target, it is possible to perform simultaneous relative photometric calibration of the pattern and the sensors, provided that each element of the pattern preserves the same brightness in each image where it is captured. This may be true when the target is observed from the same point, but when we calibrate the Eyesis4π camera, with 2 sensors attached far from the other ones, these sensors travel significantly while capturing the target, and this assumption does not hold. The same pattern element has different brightness depending on the lens position when the image is acquired. This is because even a matte pattern material is not perfectly diffusive; there is some specular (reflective) component.

In the earlier setup we used photographic lamps with large umbrellas, but these umbrellas were still small when placed at a distance that kept them out of the camera view. The specular component was still visible when the diffusive part was subtracted. When designing the new calibration target we decided to use bright linear fluorescent lamps along the floor and the ceiling and keep them spatially compact, without any diffusers or umbrellas; we only used mirrors behind the lamps to effectively double their output. Such a light source was expected to produce specular reflections on the target, but these reflections occupy just a small portion of the target surface, and the rest of it is close to purely diffusive. That allowed us to locate the positions of the specular reflections for each camera station/viewpoint by subtracting the average (over all stations/viewpoints) pattern brightness from each individual station/view of the pattern and then masking out these areas of the pattern during the flat-field calculations.
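In code form, the masking is a per-element comparison of each station/view brightness map against the cross-station average. The sketch below only illustrates that idea (the array layout and threshold are assumptions, not the actual calibration code):

```java
/**
 * Build a per-view mask of suspected specular reflections: a pattern element is
 * masked out when its brightness in this view exceeds the average over all
 * views by more than relativeThreshold (e.g. 0.05 for 5%).
 */
public static boolean[][] specularMask(double[][] viewBrightness,
                                       double[][] averageBrightness,
                                       double relativeThreshold) {
    int h = viewBrightness.length, w = viewBrightness[0].length;
    boolean[][] useForFlatField = new boolean[h][w];
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            double excess = viewBrightness[y][x] - averageBrightness[y][x];
            // keep the element only if it is not noticeably brighter than the average
            useForFlatField[y][x] = excess <= relativeThreshold * averageBrightness[y][x];
        }
    }
    return useForFlatField;
}
```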

The images on Fig. 7-10 were made for camera station 2 – 3.3m from the target and 1.55m to the right of the target center, which caused the lamp reflections to be shifted to the left. View 0 (Fig. 7-8) corresponds to the camera head, which is at the center of rotations. View 1 (Fig. 9-10) is captured by the camera's 2 bottom sensors mounted 820 mm below the camera head, so they moved significantly between the images – that caused the visible curvature of the top lamps' reflection.

Virtual tour of Elphel calibration facility

You may walk through our calibration facility using our WebGL viewer/editor. The images were captured with a newly calibrated Eyesis4π camera; there is no 3-d parallax correction – these are just raw panoramas stitched for infinity, and most close objects are out of the depth of field of the lenses. We hope you'll still enjoy this snapshot of the new facility where we plan to develop and precisely calibrate many new cameras.

Sensor+Lens Tool

Wed, 12/19/2012 - 14:39
There are a number of online lens calculators already, and this one is not conceptually different – the focus is on the current sensor we use, and the main feature is the visualization done in HTML canvas using jCanvas.


It might help to figure out what lens is needed for a particular application where certain parameters can be important, e.g.:

  • Field of view for a lens of the given format
  • Depth of field at a fixed distance and f-stop
  • Aperture size (f-stop) at which the resolution starts to degrade due to diffraction limiting
The tool covers the following (a small sketch of the underlying formulas follows the list):

  • Different sensor formats (also compared to the “full frame” format)
  • Circle of confusion formulae (affects hyperfocal distance and depth of field):
    • 1px – for machine vision applications
    • d/1730 & d/1000 – “Zeiss formula” for photography
  • Distance to in-focus plane
  • Lens focal length
  • Field of view
  • Diffraction limit for the aperture size (calculated for red light of 690nm, with the Airy disk size equal to 1px)
  • Depth of field
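A hedged sketch of the formulas such a calculator is typically built on (standard thin-lens and photographic approximations; the tool's exact conventions may differ, and the numeric example values below are illustrative assumptions rather than the tool's defaults):

```java
public class LensCalc {
    // Angular field of view (radians) for one sensor dimension and focal length, both in mm.
    static double fieldOfView(double sensorSizeMm, double focalMm) {
        return 2.0 * Math.atan(sensorSizeMm / (2.0 * focalMm));
    }

    // Hyperfocal distance (mm) for focal length f, f-number n and circle of confusion c (all mm).
    static double hyperfocal(double f, double n, double c) {
        return f * f / (n * c) + f;
    }

    // Near and far limits of the depth of field (mm) for subject distance s (mm).
    static double[] depthOfField(double f, double n, double c, double s) {
        double h = hyperfocal(f, n, c);
        double near = s * (h - f) / (h + s - 2.0 * f);
        double far = (s < h) ? s * (h - f) / (h - s) : Double.POSITIVE_INFINITY;
        return new double[]{near, far};
    }

    // F-number at which the Airy disk diameter (2.44 * lambda * N) reaches one pixel.
    static double diffractionLimitedFNumber(double pixelMm, double lambdaMm) {
        return pixelMm / (2.44 * lambdaMm);
    }

    public static void main(String[] args) {
        double pixel = 0.0022; // 2.2 um pixel, used here as a 1 px circle of confusion
        double[] dof = depthOfField(4.45, 2.0, pixel, 2000.0);
        System.out.printf("FoV: %.1f deg%n", Math.toDegrees(fieldOfView(4.8, 4.45)));
        System.out.printf("DoF at 2 m, f/2: %.0f..%.0f mm%n", dof[0], dof[1]);
        System.out.printf("Diffraction-limited f-number (690 nm): f/%.1f%n",
                diffractionLimitedFNumber(pixel, 690e-6));
    }
}
```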


Links

Sensor+Lens Tool

Heptaclops camera and the 393

Thu, 10/25/2012 - 13:56
“Temporary diversion” that lasted for three years

For the last few years we have been working on the multi-sensor cameras and the optical parts of the cameras. It all started as a temporary diversion from the development of the model 373 cameras that we planned to use instead of our current model 353 cameras based on the discontinued Axis CPU. The problem with the 373 design was that while the prototype was assembled and successfully tested (together with two new add-on boards), I did not like the bandwidth between the FPGA and the CPU – even though I used as many connection channels between them as possible. So while the Texas Instruments DaVinci processor was a significant upgrade to the camera CPU power, the camera design did not seem to me able to stay current for the next 3-5 years and to accommodate new emerging (not yet available) sensors with increased resolution and frame rate. This is why we decided to put that design on hold, being ready to start production if the number of our stored Axis CPUs fell dangerously low, and meanwhile wait for better CPU/FPGA integration options to appear while focusing on the development of the other parts of the system that are really important.

Now the wait for the processor is nearly over, and it seems to be just in time – we still have enough stock to be able to provide NC353 cameras until the replacement is ready. I’ll get to this later in the post, and first tell where we got during these 3 years.

Optical measurements, mechanical design, image processing and cameras calibration

Up until 2009 we did not really bother with the optics of the cameras we made – the cameras have a standard CS-mount that can accommodate C- and CS-mount lenses available from many suppliers. We provided the electronics and software, but it was up to our users to deal with the rest. Yes, we did offer cameras with color and monochrome sensors, with or without IR cutoff filters, and stocked some basic varifocal lenses – but that was virtually all. When we started to develop panoramic cameras ourselves, we quickly recognized that the lenses we need just do not exist. The C/CS-mount format lenses are too big to allow a compact layout of the camera (it not only becomes big itself, but the large distance between the lenses causes large parallax that makes panorama stitching more difficult). The smaller M12 mount lenses (also called “S-mount” or “board lenses”) are mostly designed for small security cameras and, being cost-sensitive, are not usually designed for top performance.

We also realized that putting together multiple individual cameras to cover a panorama is not enough. All camera lenses have their best resolution in the center, while closer to the corners it degrades. In many lenses, especially small ones, the corners are substantially darker due to vignetting. And while we have gotten used to this when making photographs – in many cases it could even be considered a useful feature, focusing on the object in the center and blurring and fading out the periphery – in stitched panoramas it is a disaster, as the peripheral areas of the individual lenses are mapped to the middle areas of the composite panorama image.

Not being lens manufacturers ourselves, we went the path of correcting the lens aberrations by software post-processing (“Zoom in. Now… enhance.” and later posts) – that allowed us to effectively double the number of lens “megapixels”. Later we used the same pattern we developed for aberration correction to precisely correct the lens distortions. This process of camera calibration for the spherical view camera is described in my previous blogs (such as Building and Calibrating Eyesis4π) – we started doing it for precise panorama stitching, but later worked on making it suitable for stereo photogrammetry and 3d reconstruction.

So now we have what we believe is the highest performance camera of its kind – the one that we demonstrated at SIGGRAPH-2012. We also now have a precise thermally-compensated sensor front end that can be used in other applications – in an individual camera or in multi-camera setups.

One such application is

Shallow depth of field and cinema cameras

For many years now Elphel has been cooperating with a group of enthusiasts who tried to adapt our cameras for cinema applications – and that fits very well into our vision: take our cameras and use them as clay to form something you (not us) envision. But eventually they got tired of waiting for our next model 373 camera (which they needed for a higher frame rate and a larger image sensor), so they decided to develop a new camera themselves.

One of the main camera features they (and others interested in cinematographic applications) needed was a physically large sensor. Such sensors allow capturing images with “shallow” depth of field (DoF) and can be used to shoot video where some objects are in focus, while others (farther or closer) need to be blurred. With single-lens systems the scale of distances where you can play with DoF depends on the physical size of the sensor, and the small sensors we use (and those used in camera-phones) are approximately 5 times (linearly) smaller than the 35-mm film frame. So what you can achieve with a 35mm camera in the 5-10 meter range is only possible in the 1-2 meter range with the small 1/2.5″ (~7mm diagonal) sensor – instead of human actors you would have to make animation with dolls. There are even special optical adapters that use a 35mm format lens to focus the image on a diffusing screen (made of wax, or even of a fast rotating disk to make the diffusing grains smaller) and then transfer the image from that screen to the small format sensor of an inexpensive camcorder. But such a system still has limited resolution and loses a lot of light, dramatically reducing the camera sensitivity.
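
Just to illustrate that scaling, here is a minimal sketch using thin-lens approximations; the focal lengths, f-number and distances below are made-up example values, not measurements of any particular camera:

from math import tan, radians

def background_blur_fraction(sensor_width_mm, hfov_deg, f_number, subject_dist_m):
    # Diameter of the blur disc of a point at infinity, as a fraction of the sensor
    # width, when the lens is focused on a subject at subject_dist_m (b ~ f^2/(N*d))
    f = sensor_width_mm / (2 * tan(radians(hfov_deg) / 2))  # focal length for this framing
    blur_mm = f * f / (f_number * subject_dist_m * 1000.0)
    return blur_mm / sensor_width_mm

# Same framing (50 deg horizontal FOV) and f/2.0 aperture on both sensors:
print(background_blur_fraction(36.0, 50, 2.0, 7.0))      # 35-mm frame, subject at 7 m
print(background_blur_fraction(36.0 / 5, 50, 2.0, 1.4))  # ~5x smaller sensor, subject at 1.4 m
# The two relative blur values come out nearly equal - the small sensor needs ~5x shorter distances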

Three-camera setup for controlled depth of field capturing

The DoF first came as a feature inherent to the physical camera, to the process of capturing the three-dimensional world on a two-dimensional medium (film or image sensor). But in the artist’s hands it became a tool to focus the viewer’s attention on the intended objects and also to show the 3-d nature of the actual world. With modern computer animation there are no physical cameras or lenses involved, but the depth of field is still present (like in this Sintel gallery). That means that the “shallow” DoF can be synthesized when 3d information about the scene is present, and such information can be captured by means other than a large format sensor, with the result image then rendered with synthetic depth of field. In some cases even a stereo-camera setup (a pair of synchronized cameras) can be used. Such a setup is generally sufficient if the in-focus objects are in the foreground and there is nothing closer to the camera that occludes the target. But if such a system is used to capture, say, an image of a human behind tree branches, then a single horizontal branch can block the view of the human eye from both camera lenses. So regardless of how you blur the foreground objects (tree branches in this case) you will not be able to reconstruct a sharp image of the human face – the information about the color of the eye is completely missing from both camera images. Using more cameras in the setup helps to provide more information about the objects – in our last example the third camera, shifted vertically from the first two, will have the information about the eye that was missing from the images of the first two cameras.

Building the 3-d model of the scene from multiple images is not an easy task. The precision of the depth measurements is much lower than that of measuring distances in the direction orthogonal to the line of view. And often portions of the scene have no fine details, so there is nothing to match to find out the distance to that object. On the other hand, when the 3-d reconstruction is needed just for synthesizing DoF, the distance precision needed to simulate the DoF of a real large-diameter lens is about the same as you can get from cameras separated by that lens diameter. And the areas that have no details, where it is impossible to measure distance, would look the same on the final image even if you blur them with the wrong sigma (or do not blur them at all).

HTML5 demo

A section of the screenshot - click on the image to open the actual HTML5 demo page

We do not yet have a seven-camera setup, or “heptaclops”; we used a smaller “triclops” configuration. When we had built and calibrated the new camera (using the target pattern data measured earlier with Eyesis) we looked for a way to demonstrate it. First the images were processed with the known calibration, and each of the raw images was mapped to a common projection plane – each pixel with ~0.15 pix accuracy – a process that compensates for the lens distortions and the mis-alignment of the individual sub-cameras. These images can be used as the input data for the 3-d reconstruction. We do not have finished 3-d processing software yet, so Oleg Dzhimiev made a small HTML5 application that illustrates the information from the camera triplet.

This web application overlays the triplet of corrected images acquired simultaneously by the 3 sub-cameras and applies transparency to two of them, so the visible superposition has equal weight for each image of the set. Then each image is shifted by the value of the disparity that matches the distance from the camera to the selected image plane – the amount of disparity is controlled by a slider or by rotating the mouse scroll wheel. The objects in or near the selected image plane coincide in all three images, while the objects closer to or farther from the camera are shifted relative to each other. When the shift is small it looks like a blur, but more distant mismatches look as they actually are – as individual images. While these separate images spoil the illusion of out-of-focus blurring (though they still look more realistic than the double images in old rangefinder cameras), they illustrate the raw data. Using more parallel cameras would improve the illusion of focusing in such a quick demo, provide more data for the actual reconstruction, and reduce ambiguity when finding the disparity (and so the distance) at each pixel. Additionally, combining the data from multiple individual sensors would increase the signal-to-noise ratio of the result image, and so the dynamic range, even if all channels use the same exposure/gain settings. And it is possible to program some channels with different exposures and run the whole system in HDR mode.
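
For readers who prefer code to clicking, below is a minimal offline imitation of what the demo does. This is only a sketch: the file names and the per-camera baseline factors are placeholders, the real application works on the corrected triplet in the browser, and the third sub-camera is actually offset vertically rather than horizontally.

import numpy as np
from PIL import Image

def refocus(images, baseline_factors, disparity_px):
    # Shift each image proportionally to its baseline and average with equal weights;
    # objects whose true disparity matches disparity_px coincide, others appear doubled
    acc = np.zeros_like(images[0], dtype=np.float32)
    for img, factor in zip(images, baseline_factors):
        acc += np.roll(img.astype(np.float32), int(round(factor * disparity_px)), axis=1)
    return (acc / len(images)).clip(0, 255).astype(np.uint8)

triplet = [np.asarray(Image.open(name)) for name in ("cam0.png", "cam1.png", "cam2.png")]
Image.fromarray(refocus(triplet, (0.0, 1.0, 2.0), disparity_px=12)).save("plane_12px.png")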

The same application can be useful for the 3-d processing too. Instead of the 3 images that are just aberration- and distortion-corrected originals acquired from the different sensors, we can generate multiple close views and feed them to the same program – just shifting multiple images (or videos) is much less computationally demanding than correct 3-d rendering of the scene with the selected image plane and DoF, so such an application can be used as a preview for the artist to dynamically adjust those parameters (distance and DoF) before running the final rendering (where it is also possible to add the desired bokeh).

Back to the Model NC373 camera status

Model NC393 CAD rendering with M12 (S-mount) lens and thermally compensated sensor front end

We decided to drop the idea of building the already designed and prototyped model NC373 camera. While the next camera will share some parts with the 373, the changes are too big to call it just a revision “C” of the 10373 system board, so it will be model NC393. The camera system board will have a Xilinx Zynq that combines an FPGA and a dual-core ARM processor on the same chip, so my main concern about the FPGA-CPU bandwidth does not apply here.

When the new Xilinx device was announced, I thought it was a good candidate for the next camera design. In the spring of last year we had a Xilinx seminar in Salt Lake City, where I was told that these new devices would be supported by the zero-cost development software.

Model NC393 CAD rendering with a C/CS-mount lens

That feature is very important for us, because while the cost of the tools is not high for a manufacturer, it is higher than the cost of a camera. We strive to make our products highly customizable by the users: each camera contains the source code needed to compile the executables (including the FPGA code). Making our customers pay a high price to be able to modify even a single line of the FPGA code is not acceptable to us, so we use in our designs only those FPGA devices that are supported by software our users can download at zero cost. Of course, ideally we would love to use free (FLOSS) development tools (as we do for the FPGA functional simulation), not just zero-cost ones, but in the real world it is not possible yet, so we develop and share our free (licensed under GNU GPLv3) code using the non-free closed-source tools.

The news that came from the Xilinx reps later last year was really disappointing – none of the Zynq devices (even the smallest one) would be supported by the zero-price software tools. Only this year did it finally become official that 3 of the 4 devices are going to be supported, so we can use them. Xilinx did have some production delays and the availability schedule slipped to later dates, but I’m crossing my fingers that the needed part/package combination will actually be in production by the end of Q1 2013.

Model NC393 CAD rendering with three M12 lenses and thermally compensated sensor front ends

While the NC393 design is far from being finished, some features are already settled and are likely to remain unchanged in the final product.

  • The camera will be compatible with both parallel-output sensors (such as the Aptina MT9P001/MT9P031/MT9P006 that we currently use) and multi-lane serial sensors (such as those with a MIPI interface). The connectors will not change, and the sensors used with the NC353 will fit directly to the NC393 camera
  • The camera system board is being designed for the multi-sensor operation. It will accommodate three sensors without the need to use multiplexer boards (like 10359 needed for NC353). Multiplexer boards will likely still be used in some cases, but the system board itself will have 3 identical sensor board connectors
  • Physical dimensions of the camera and the mounting holes location on the system board will remain the same as on the previous camera models
  • Camera will have a single GigE port as a main communication channel
  • One serial console port with an internal USB converter, so a microUSB cable will be sufficient to use the system console for software development.
  • Firmware installation and updates will be done by booting from a microSD card accessible without opening the camera. It will be possible to use the same card slot for data storage during normal operation.
  • 512MB NAND flash as the main storage for the firmware and the boot source for normal camera operation.
  • 1GB of system memory made of two 256M×16 DDR3 chips.
  • 512MB of dedicated video memory (not shared with the CPU) – one 256M×16 DDR3 chip, the same as used for the system memory.
  • USB2 (host): one external micro-USB and 2 internal flex cable connectors with USB, additional 3.3VDC power, I2C and FPGA general purpose I/O compatible with the add-on boards for the NC353
  • 30-pin board-to-board connector with 12 differential /24 single-ended FPGA I/O for add-on boards.
  • A pair of 2.5mm audio connectors on the back panel for camera synchronization – from an external trigger and/or from other cameras
  • 2-port SATA controller based on the free (GNU/GPL) implementation. Camera will have eSATA/USB external connector (so capable of running external SATA device without additional power supply) and internal mSATA SSD that fits inside the camera.

Model NC393 CAD rendering with three M12 lenses for capturing panorama images


The NC393 camera will have significantly higher performance than the 353, and it will inherit the openness and flexibility of its predecessors. Elphel does not take orders for custom designs; rather, we try to do our best to make sure our users can do the customization themselves. The same policy will apply to the NC393 – we will offer some camera options and add-ons, and in most cases it will be up to the camera users to build the camera of their dreams.

Elphel will use the new system board in the Eyesis cameras. It will allow us to make the overall design more compact by reducing the number of boards inside, to increase the network and SSD bandwidth, and to increase the frame rate. We also plan to increase the camera resolution by switching to a sensor of the same format but with smaller pixels while reusing the same optical-mechanical design – that would definitely be too much for the current system, which is limited to the currently used sensors.

And of course we will continue to build “small” cameras based on the new design – with the universal C/CS-mount and with the M12 one, including precisely calibrated fixed-lens systems. And as the camera is designed for multi-sensor operation, we will offer several typical configurations for robotic (parallel sensors for stereo vision) and panoramic applications, as shown in the images above.

All the camera hardware documentation (circuit diagrams, parts lists, PCB layout and mechanical CAD files) will be released under the CERN OHL license when the design is finished and we start the actual production of the cameras (add-on documentation will be released when it becomes available). All the firmware and FPGA code will, as before, be released under the GNU GPL and maintained in Sourceforge repositories.

Building and Calibrating Eyesis4π

Mon, 09/24/2012 - 17:22

This is a long overdue post describing our work on the Eyesis4π camera, an attempt to catch up with the developments of the last half a year. The design of the camera started a year before that, and I described the planned changes from the previous model in the Eyesis4πi post. Oleg wrote about the assembly progress, and since that post we have not published any updates.

Sensor front end challenges

Working on the first camera of this series we had to solve several technical problems – and that pushed us back behind our schedule. The first problem was with the use of UV-curing adhesive to fix the sensor relative to the lens. In the first Eyesis we incorporated some elements of the sensor adjustment into each SFE (sensor front end); in the current system we decided to follow a more traditional approach, adjusting the sensor on a specialized device and then fixing its position with the adhesive – that allowed us to make the SFE more compact and, we hoped, simpler too. In the new design I tried to reduce the thickness of the UV-curable adhesive and make the system self-compensating for the glue shrinkage during curing and for its thermal expansion when the camera is used. The solution used 3 pins in 3 holes with the glue between the pins and the walls of the holes, so expansion/contraction of the adhesive would not lead to significant movement of the pins. Unfortunately the illumination of the glue with UV radiation proved to be insufficient (some shadow areas remained), and the UV LEDs were on the same side of the glue where it contacted the air, so the most illuminated areas suffered from “oxygen inhibition”. We tried several small modifications but still could not achieve the reliable and strong bonding we needed. So we decided to use just low-shrinkage epoxy instead of the UV glue for the first camera and leave the more radical redesign for a later time. With epoxy we could make only 2 SFE in 24 hours, because curing took much longer than with the UV glue, and we could not use fast-setting epoxy as the adjustment took some time. That method was slow but it worked. Worked until we decided to measure the temperature dependence of the focusing and realized that just maintaining the SFE “in focus” over the intended temperature range is not sufficient for our application, where we compensate for the lens aberrations with post-processing. The measured temperature coefficient was about 0.2μm/°C – that corresponds to the expansion of 10 mm of aluminum, the material used in most of the SFE.
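
A quick sanity check of that number, using handbook expansion coefficients rather than our measured values:

ALUMINUM_CTE = 23e-6   # 1/degC, typical handbook value
INVAR_CTE = 1.2e-6     # 1/degC, typical handbook value
path_um = 10.0 * 1000  # ~10 mm of aluminum between the lens and the sensor, in micrometers
print(path_um * ALUMINUM_CTE)  # ~0.23 um/degC - close to the measured 0.2 um/degC
print(path_um * INVAR_CTE)     # ~0.012 um/degC if the same path were made of invar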

Thermally compensated SFE design

Section of the SFE used in Eyesis4π


We could not think of any quick fix to that problem, so we decided to go through a complete redesign of the sensor front ends used in Eyesis4π cameras, add thermal compensation and improve the bonding process. Some elements of the SFE are made of invar – a nearly zero-expansion material – for the thermal compensation, and the bonding is split into two separate stages: fast UV bonding and final bonding using low-shrink epoxy. Additionally we modified the 10338D sensor front end PCB (the new version has revision “E”) to include a temperature sensor. Luckily for us we just had to replace a single chip – instead of the serial EEPROM the new board uses a combination of an EEPROM and a temperature sensor in the same size package and pinout (such chips are used in computer memory modules to store module parameters and monitor temperature). The new board simplifies temperature dependence measurements of each SFE during manufacturing, and it also makes it possible to perform additional thermal correction of the acquired images – the SFE temperature during acquisition is embedded in the Exif header of each of them.

The 0353-07-25 SFE has two major parts – the base with the attached lens and the movable (during adjustment) plate to which the sensor PCB is attached. These two parts are connected with 3 invar rods, each press-fit (and then flared) into the base. Only the very bottom part of each rod is press-fit; most of it is loose, so the thermal expansion of the aluminum base is isolated from the rod. The base has 3 arms that are partially cut through to allow some bending; these arms support the invar rods laterally while allowing the axial movement caused by thermal expansion. The top of each invar rod has an aluminum cap pressed on and flared; these caps fit (with sufficient clearance to guarantee no contact during the adjustment process) inside the holes in the sensor plate and are later bonded with the epoxy compound. Each of the 3 arms that provide lateral support to the invar rods additionally has 3 through holes that are temporarily plugged at the bottom with transparent adhesive tape to hold the UV-curable adhesive. The sensor plate has 3 thin-wall stainless steel tubes pressed into it; these tubes are immersed in the adhesive and bonded to the base arms when irradiated with UV from the bottom during curing. The SFE is mounted in the adjustment machine with the lens pointed down; a mirror mounted at 45 degrees reflects the target pattern located on the vertical wall. The same mirror reflects the UV radiation during the curing process after the adjustment is finished. The 2.8mm invar spacer ring (for expansion it is in series with the rods) is designed to slightly over-compensate the thermal expansion of the aluminum parts, so it can be made of a different material (or a combination of 2 washers made of different materials) to fine-tune the overall expansion. This design allowed us to reduce the thermal variation of the distance between the sensor and the focal plane of the lens by nearly an order of magnitude – the measured value falls in the ±0.03 μm/°C range.

An SFE compensated for the purpose of aberration correction, maintaining the same position of the lens focal plane relative to the sensor surface, still has some magnification variation caused, among other factors, by the expansion of the sensor itself. It is not large – until we upgrade the camera to higher resolution sensors the change for 10°C is only 0.08 pixels at the diagonal corners of the image – and this effect can be easily compensated when the temperature during acquisition is known.

Camera calibration machine

Goniometer with Eyesis4π camera

Camera calibration involves the following procedures:

  • measuring the point spread function (PSF) for each area of the field of view of each sensor to be able to compensate for the aberration during post-processing of the acquired images
  • measuring distortions of each lens and precise orientation and position of each lens in the camera assembly so the result images have the pixels precisely mapped to the lines in space
  • measuring the vignetting of each lens including variations of color reproduction over the area of each sensor
  • logging the inertial measurement unit (IMU) data

All the optical measurements (the first three) are made with the same target pattern described in the earlier post. When performing the distortion measurements the camera can be located rather close to the pattern, but for the aberration measurement and correction it should be within the depth-of-field range from infinity – the distance at which the camera will operate; in our case it is 6 meters. With the individual sub-camera FOV of 45°x60° the target pattern would have to be 5m (horizontal) by 7m (vertical) to fill the sensor completely. As it is not easy to make and use such a large target, we developed software to combine PSF data from multiple overlapping images of a smaller pattern – we used one of 3022mm by 2667mm that fits on the wall in our office.
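
The 5m by 7m figure follows directly from the geometry – a simple pinhole estimate that ignores distortion and any margins:

from math import tan, radians
distance_mm = 6000                                  # camera-to-target distance
width_mm = 2 * distance_mm * tan(radians(45 / 2))   # ~4970 mm for the 45 deg horizontal FOV
height_mm = 2 * distance_mm * tan(radians(60 / 2))  # ~6930 mm for the 60 deg vertical FOV
print(width_mm, height_mm)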

Calibration pattern

When calibrating the earlier Eyesis model that had just 9 sensors, we manually rotated the camera on a photographic tripod and made at least 12 shots for each sensor. For the Eyesis4π, with its full-sphere FOV and a long tube body that can not be detached during calibration (it carries essential electronic boards and the two bottom sensors), a regular tripod would not work. So we built a special device that allows rotation of the camera around two axes – a horizontal one (it goes approximately through the center of the camera optical head) and the vertical axis of the camera. As the camera has to view nadir (along the tube body), the camera rotates in polyurethane rollers that do not block the view of the target along the tube.

When the PSF are calculated during post-processing it does not matter which part of the pattern is visible – the ideal pattern is locally distorted for the best fit with the acquired images and then used in deconvolution to calculate the aberration correction kernels; minor geometric errors in the pattern and non-flatness of the pattern surface are not critical. But the same is not true when we perform the distortion measurements and precise pixel mapping – in that case stretching of the pattern panels and non-flatness would cause significant errors. So here the pattern is treated as a 3-d mesh of pattern cells with arbitrary coordinates of each node; these coordinates are determined during bundle adjustment together with the camera parameters. The post-processing in this case should not just fit the ideal pattern to the measured images, but establish an absolute match (same cell to same cell) between the wall pattern and the acquired images.

There are several methods to achieve such matching. One is to add special marks to the pattern, or just some non-periodic elements, that would allow unambiguously determining which part of the whole pattern is visible. That would also work for the PSF measurement – if the pattern marks are recognized they can be included in the simulated pattern used for deconvolution. We used a different approach – projecting spots with 4 red diode lasers onto white pattern cells at some distance from the corners. These lasers are under software control, so multiple images with different states of the lasers are recorded and used for absolute matching of the actual and acquired pattern; the final image is made with all lasers off, so the pattern is not influenced by them.
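
The detection itself can be as simple as differencing the exposures with the lasers on and off – a sketch of the idea only, not our actual detection code, and the threshold is arbitrary:

import numpy as np

def detect_pointer(frames_on, frames_off, threshold=20.0):
    # Average several exposures with the laser on and off, subtract, and return the
    # coordinates of the brightest difference (averaging improves the signal-to-noise ratio)
    diff = np.mean(np.stack(frames_on), axis=0) - np.mean(np.stack(frames_off), axis=0)
    if diff.max() < threshold:
        return None                                  # pointer not visible in this sub-camera
    y, x = np.unravel_index(np.argmax(diff), diff.shape)
    return x, y                                      # later referenced to the nearest white pattern cell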

The distortion calibration for individual sensors is described in an earlier post – Subpixel Registration and Distortion Measurement – it uses the Levenberg-Marquardt algorithm (LMA) to simultaneously fit the whole camera orientation/position as well as the individual lens/sensor parameters. The calibration machine allows acquisition of multiple sets of 26 simultaneous images; for the full calibration we record about 450 sets to have good overlap – each area of each sensor has the target visible in at least 4 images, and after filtering out images that did not capture a view of the target pattern there are about 1500 images to process. It is essential that, while the overlap between the FOV of different sensors is small (under 10%), the same target pattern is visible by multiple sensors in many image sets – this allows determining the mutual location/orientation of the sub-cameras and finally finding the coordinates of each lens in the camera coordinate system.
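
In terms of a generic least-squares library the fit looks roughly like the sketch below. It is heavily simplified: the real model has hundreds of parameters, per-set rotation angles and a radial distortion polynomial, and project() here stands in for that whole camera model (it is an assumed helper, not an existing function):

import numpy as np
from scipy.optimize import least_squares

def reprojection_residuals(params, target_xyz, measured_px, project):
    # project() maps 3-d target grid nodes to pixel coordinates for a given parameter vector
    predicted_px = project(params, target_xyz)
    return (predicted_px - measured_px).ravel()      # stacked x and y errors in pixels

# method='lm' selects the Levenberg-Marquardt algorithm mentioned above:
# fit = least_squares(reprojection_residuals, initial_params,
#                     args=(target_xyz, measured_px, project), method='lm')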

Before the image data is processed further, these images are converted to arrays of pattern grid pixel coordinates, using absolute grid cell numbers if laser pointers were detected or just relative ones if the pointers are not available. Images without laser pointer data are still useful – they are processed when the program has enough information (from other images) to predict where the pattern nodes are expected.

The calibration measurement takes about 10 hours – the laser pointers are detected from 6 images (to increase the signal-to-noise ratio), those images are then discarded and only a single image with laser pointer metadata is preserved; this processing accounts for most of the 10-hour procedure. We perform it overnight to reduce the requirement to completely block the daylight and to avoid disturbances from shaking the floor. Still, the processing discovers a small number of images that do not fit with the others (usually by under 0.3 pixels) – that is most likely caused by semi-trucks going over the speed bump right by our building. Luckily such disturbances are present in very few images, and it is easier to use software to detect and remove them than to provide a completely vibration-free calibration environment.

Parameters that are determined during fitting with LMA include:

  • position and orientation of the calibration machine relative to the target,
  • distance and angle between the two camera rotation axes,
  • locations and orientations of the individual lenses relative to the camera coordinate system
  • lens (distortion) parameters for each channel – focal length, lens center coordinates, radial distortion polynomial coefficients and
  • the two rotation angles of the calibration machine

All the parameters but the last ones (the two rotation angles) are assumed to be the same during the calibration process; the last ones are individual for each calibration set. Overall there are up to 1500 simultaneously optimized variables, using five to six million data points of the reprojection error – the differences (measured in pixels) between the pattern grid nodes on the images and the ones calculated from the actual target node coordinates and the camera model. When the algorithm converges to a set of parameters, we calibrate the target pattern itself. This is needed because the calibration pattern is printed on a material that can stretch and is not perfectly flat. Target calibration involves measuring and recording the 3d coordinates of each cell; this is done by multiple iterations of referencing the reprojection errors from multiple images to the individual pattern cells, calculating and applying those corrections and then repeating the LMA. After several iterations the root mean square (RMS) of the reprojection error reaches 0.3-0.5 pixels. At this stage the lens focal length, center and radial distortion coefficients (a fifth-degree polynomial) are frozen, and the program encodes the residual differences as arrays of X and Y corrections over the area of each sensor. We repeat this procedure several times, interleaving it with LMA runs that exclude the “frozen” lens parameters. This additionally reduces the RMS error down to 0.07-0.09 pixels.
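
Applying such a stored correction later is straightforward – a sketch assuming quarter-resolution correction arrays, as in the calibration file description further below; border handling is omitted:

import numpy as np

def corrected_xy(px, py, dx_map, dy_map, scale=4):
    # Bilinearly interpolate the reduced-resolution X/Y correction arrays at a pixel
    # position and return the corrected pixel coordinates
    u, v = px / scale, py / scale
    j0, i0 = int(u), int(v)
    fu, fv = u - j0, v - i0
    def bilin(m):
        return ((1 - fv) * ((1 - fu) * m[i0, j0] + fu * m[i0, j0 + 1]) +
                fv * ((1 - fu) * m[i0 + 1, j0] + fu * m[i0 + 1, j0 + 1]))
    return px + bilin(dx_map), py + bilin(dy_map)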

Flat-field data for each sensor is measured to compensate for the lens vignetting and minor color variations caused by the sensor mosaic filter; it does not include individual pixel differences. Such data is measured with the same calibration pattern as the aberrations and distortions. With the camera rotation steps we use, the pattern is visible in each of the sensors in some 30-40 individual shots, each centered on a different area of the target. Assuming constant illumination intensity between measurements, this allows calibrating the relative illumination (and color variations) of the target cells and then using this data (averaged over all sensors) to determine each sub-camera’s sensitivity over the FOV.
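
Using that data during raw image correction is then a per-pixel division – a minimal sketch, with layer names following the file description below; demosaicing and white balance are left out:

import numpy as np

def flat_field_correct(raw_plane, sensitivity_plane, eps=1e-6):
    # Divide each color plane of the raw image by the measured sensitivity of that channel
    return raw_plane.astype(np.float32) / np.maximum(sensitivity_plane, eps)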

Results of the camera calibration are stored separately for each of the 26 individual sub-cameras as multi-layer TIFF files with text metadata that includes parameter values and descriptions. These files are later used during raw image correction for precise pixel mapping and flat-field correction. These files include:

Lens residual distortion, x-coordinate

  • short text description of each parameter
  • sub-camera (channel) number
  • position and orientation in the camera coordinate system
  • optical parameters
    • Focal length
    • Coordinates of the lens axis
    • Radial distortion coefficients
  • and the following six 2-dimensional arrays stored as image layers (1/4 resolution of the sensor):
    • Residual horizontal (X) correction in pixels (shown in the illustration)
    • Residual vertical (Y) correction in pixels
    • Image mask
    • Red color channel sensitivity (divide raw picture by these values for correction)
    • Green color channel sensitivity
    • Blue color channel sensitivity

FOV of sub-cameras in Eyesis4π, colors show relative time of the pixel acquisition (from red to blue). Same colors designate simultaneous capture.

The image mask is used to specify which parts of the sensor provide useful data; the sensors covering the areas around zenith and nadir acquire only triangular segments of the full rectangular pixel array, as shown in the picture to the left.

This earlier article explains why using only 50% of the area of those sensors is not a waste but helps to avoid the stitching problems caused by fast camera movement, visible in some high-resolution footage from car-mounted panoramic cameras that use sensors with an electronic rolling shutter similar to Eyesis.

The rolling shutter can still cause image distortions in Eyesis, but such a design guarantees that there will be no duplication or, even worse, gaps in the areas where images from different sensors are merged together. When the imagery is used just for rendering panoramas, those residual distortions are not visible (unless the camera was shaken really violently during image capturing). If the same image sets are intended for photogrammetric applications, the rolling shutter effect has to be dealt with to keep the total error in the subpixel range, comparable with that of the static camera calibration. Such correction relies on measuring the camera egomotion with the embedded inertial measurement unit and applying the camera position/orientation at the moment each pixel was acquired to the static pixel mapping.
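
Schematically such a correction could look like the sketch below. It relies on assumptions: pose_at() interpolating the IMU-derived camera pose and static_pixel_to_ray() from the calibration data are hypothetical helpers, and line_time is sensor-specific.

import numpy as np

def pixel_to_world_ray(px, py, frame_start, line_time, pose_at, static_pixel_to_ray):
    # Each sensor row is exposed at its own time, so use the camera pose at that moment
    # to turn the statically calibrated per-pixel ray into a ray in world coordinates
    t = frame_start + py * line_time          # acquisition time of this row
    rotation, translation = pose_at(t)        # egomotion from the IMU (3x3 matrix, 3-vector)
    direction = rotation @ np.asarray(static_pixel_to_ray(px, py))
    return translation, direction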

The last chance to see us at SIGGRAPH’12

Thu, 08/09/2012 - 03:25
Thanks to everyone who visited us, learned about Eyesis4Pi and suggested some new applications. We hope you enjoyed our discussions as much as we did.
We are glad to see so much interest in the Eyesis4π panoramic applications we demonstrated, and we continue to look for collaboration on 3D reconstruction based on our camera calibrated for photogrammetry.

More images from the show:

Elphel at SIGGRAPH 2012

Thu, 07/19/2012 - 15:33
Tuesday, August 7th – Thursday, August 9th, Los Angeles Convention Center, Main Hall, Booth 1058

Elphel will present Eyesis4Pi – a high resolution full-sphere stereophotogrammetric camera – at SIGGRAPH 2012, together with its calibration machine. We will demonstrate the full calibration process: compensation for optical aberrations, which allows preserving the full sensor resolution over the camera FOV, and for distortions – for precise pixel mapping for photogrammetry and 3D reconstruction.

All Elphel camera users, current and prospective, are welcome, as well as parties interested in Eyesis4Pi. Here (booth 1058 – see plan) you can talk to the camera developers, see the calibration process and touch the actual working hardware. There are a number of exhibition-only passes available. Please contact Olga Filippova if you would like to receive one.

Run ImageJ plugins from the command line in Ubuntu

Wed, 02/22/2012 - 20:11

1. Get X Virtual FrameBuffer
sudo apt-get install xvfb

2. Launch ImageJ (“cd” to the ij.jar directory):
Xvfb :15 &
DISPLAY=:15 java -Xmx12288m -jar ij.jar -run "TestIJ Plugin"

Comments:

  • TestIJ Plugin is the name of the compiled plugin in the ImageJ menu. No need to specify a subfolder.
  • :15 is an example display number; any unused X display number can be used.

Links that helped:

  1. Source 1
  2. Source 2
  3. Source 3

HomeSide 720° – A helmet mounted panoramic camera

Wed, 02/22/2012 - 11:21

Seeing the impressive images of the Elphel Eyesis 4pi camera, I thought it was time to tell you about the HomeSide 720°. Like the Eyesis, its purpose is to capture panorama frames at a frame rate of 5fps. The major difference is that the HomeSide 720° is mounted on a helmet. To keep the weight acceptable it consists of only two Elphel 353 cameras instead of eight, delivering one fourth of the resolution the 4pi does. Thus the camera is able to record 30MPix frames before stitching. Additionally, it is reconfigurable to enable HDR panorama frames.

Probably more interesting is the purpose it was built for. We created the assembly for indoor virtual tours. After several setbacks we finally have an approach that works very well. We do auto leveling, auto stabilization and path extraction by image analysis only. Furthermore, we recognize crossing points where the user can decide where to go when the tour is shown in the player.

This is not so easy since we have neither GPS nor IMU data. Nevertheless it is possible.

All this information goes into our new web player, which reassembles the images into a virtual tour.

Have a look at the HomeSide 720° Virtual Tour
Click into the player and use the cursor keys to navigate. You may also click and drag to change the point of view. This tour was recorded with 10MPix, i.e. one Elphel 353 with two sensors.

Important: the pi symbol marks a rendered tour, not one recorded by the camera

At the moment we are improving the image quality. We are also looking for a partner to drive the development even faster to create stunning indoor virtual tours.

Introducing the River View Web Player & Other News from River Studies

Fri, 02/10/2012 - 06:27

It has been a long while since my last blog entry regarding river view panoramas. In the meantime the recording setup has been running basically stable (putting aside minor problems with loose connectors) even under rough conditions (see also the “Making Of” gallery at the end of this post).

I just came back from artist-in-residency stays in Varanasi/Benares and Guwahati in India, which enabled me to have a few extensive recording sessions on various vessels like house boats, motor boats and rowing boats on the Ganges River – for one, the river most sacred to Hindus and probably the most worshipped river on the planet, while also being one of the most polluted rivers of the world – and on the Brahmaputra River in Assam.

Many thanks go to Kriti Gallery in Varanasi and the Periferry project in Guwahati for hosting me and helping me to get onboard.

Web Player

Meanwhile I also just finished the “River View Web Player” as a public online beta version. It runs on OpenLayers and GeoDjango and features the growing archive and collection of river views (up to now including views of the Ganges, Brahmaputra, Danube and Nile rivers). There are still plenty of things to polish up, and some image material to be retouched and/or uploaded, but see for yourself here:

River View Web Player

Using Elphel

I am using an Elphel 353 equipped with 8 or 16mm movie lenses from the seventies to acquire this imagery. The camera delivers a Window-of-Interest video stream of 2592 x 48 pixels with a variable frame rate between 1 and 300fps, changed on the fly according to the speed of the vessel and the distance to the object of focus. That is done manually and under visual control: similar to an auto-focus mechanism, it is also a question of choice and decision and therefore not easily automatable.
The video stream is then processed and saved by custom software.
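
A rough way to estimate a suitable rate is sketched below, with made-up example numbers rather than the actual recording settings:

from math import tan, radians

def line_rate(speed_m_s, distance_m, fov_deg=60.0, line_px=2592):
    # Lines per second so that the along-track sampling roughly matches the 2592-pixel
    # cross-track resolution at a given vessel speed and distance to the river bank
    ground_m = 2 * distance_m * tan(radians(fov_deg) / 2)  # extent covered by one image line
    return speed_m_s / (ground_m / line_px)                # lines/s for roughly square pixels

print(line_rate(speed_m_s=2.0, distance_m=50.0))           # ~90 lines/s in this example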

I am not using Elphel’s internal linescan/photofinish mode for now, since it still behaves rather unstably and unpredictably when changing parameters like TRIGGER, VIRT_KEEP and VIRT_HEIGHT on the fly. Also, having a line per frame allows me more fine-grained control and preview, as long as enough frames are delivered.

Elphel’s internal linescan mode, however, works quite well and stably if it runs as fast as it can or once you have a fixed setup – as some testing on Austrian highways proves (find more here):

Elphel 353 in Linescan mode on Austrian Highway A1 - approx. 120km/h, 2000 lps

Sources

The source code of both my recording software malisca and the web app and player is open on my github repository.

Making Of / Gallery

A brief guided tour to Ganges and Brahmaputra river recordings in pictures:

On Ganges

Elphel on Ganges River

river recording setup: Thinkpad, Elphel 353, battery pack and USB-GPS-receiver

another shot of recording setup (including an improvised tent)

.. staring into the lens ..

.. obstacles (pontoon bridge) ..

lunch break on Ganges River

Elphel on Brahmaputra River

Not yet, but I would love this setup to be powered by solar energy ....
