JP4
Example file
Revision as of 01:00, 10 February 2018 (by Oleg): added an Example file subsection with a sample download link, and changed the example file link from Example JP4 to Example JP46. The updated content:

===Example file===
Download [https://community.elphel.com/pictures/test.jp4 sample]
[[File:Jp4 example rotated asjpeg.jpeg|200px|thumb|JP4 opened as JPEG, rotated 90°]]
* If opened as a normal image the macro blocks will be displayed as 8x32.

===Online===
* [https://community.elphel.com/jp4viewer/?width=800&quality=1 Online JP4 viewer]

Example files:
* [http://community.elphel.com/files/jp4/example_JP46.jpeg Example JP46]
* [http://community.elphel.com/files/jp4/example_flipped.dng Example DNG]
* [http://community.elphel.com/files/jp4/example_converted.jpg Example JPG (converted)]

02/09/18 [linux-elphel][rocko] by Oleg Dzhimiev: filled table for pages
02/09/18 [x393][CMV300] by Oleg Dzhimiev: frequency and voltage for 14MPx
02/09/18 [x393][CMV300] by AndreyFilippov: Merge branch 'master' of git.elphel.com:Elphel/x393
02/09/18 [x393][CMV300] by AndreyFilippov: generated hispi bitstream with 15.3
02/09/18 [x393][CMV300] by Oleg Dzhimiev: init for 14MPx
02/08/18 [linux-elphel][rocko] by Oleg Dzhimiev: initial changes to dynamic page registering and sensors with 16 bit register addresses
02/07/18 [x393][CMV300] by AndreyFilippov: Increased latency in lens_flat393 to fix timing for hispi, generated parallel 039300f4 and hispi 039300f9 bitstreams
02/06/18 [x393][CMV300] by AndreyFilippov: Updated bitstream, version 0x039300f3
02/06/18 [x393][CMV300] by AndreyFilippov: Merge branch 'master' of git@git.elphel.com:Elphel/x393.git
02/06/18 [x393][CMV300] by AndreyFilippov: Added reporting of the memory channel last transferred frame in a buffer number
High Resolution Multi-View Stereo: Tile Processor and Convolutional Neural Network
Figure 1. Multi-board setup for the TP+CNN prototype
This article describes our next steps in the year-long research on high resolution multi-view stereo for long distance ranging and 3D reconstruction. We plan to fuse the methods of high resolution image calibration and processing, the already emulated functionality of the Tile Processor (TP), the developed RTL code for its implementation, and a Convolutional Neural Network (CNN). Compared to a CNN alone, this approach promises more than a hundredfold reduction in the number of input features without sacrificing the universality of the end-to-end processing. The TP part of the system handles the high resolution aspects of image acquisition (such as optical aberration correction and image rectification) and preserves deep sub-pixel super-resolution using an efficient implementation of the 2-D linear transforms. The Tile Processor is independent of any training: only a few hyperparameters define its operation, and all application-specific processing and “decision making” is delegated to the CNN.
Machine Learning for 3D Scene Reconstruction

Machine learning is an active development area, and its applications to 3D scene reconstruction, stimulated by autonomous vehicles including self-driving cars, are no exception. Use of CNNs to extract surfaces from random-dot stereograms was published as early as 1992 [1]. Most modern research uses standard image sets: the Middlebury stereo data set [2] for high resolution near objects and KITTI [3] for longer range applications. KITTI images are acquired from a moving car and come with ground truth data captured by a LIDAR. This image set uses binocular pairs and has relatively low resolution (1.4 MPix) compared to modern image sensors, and most of the CNN architectures still require from seconds to thousands of seconds even when implemented on GPU devices, so they are not yet suitable for real-time applications.
Most of the CNNs [4] input raw pixel data and perform unary feature extraction in parallel subnets (one for each image in a stereo set), then merge the features and perform additional processing of the resulting 3D data. This is the so-called “siamese” network architecture that benefits from sharing parameters between the identical subnetworks. It is common to put most of the resources into the unary part of the processing, truncating the common stage so that it may consist of just a single layer (the Fast Architecture in [4]). The efficient implementation in [5] limits CNN processing to generating the DSI and uses traditional methods of DSI enhancement such as semi-global matching [6]; other architectures split the network after exchanging features and generate depth maps for each of the stereo images individually [7].
Figure 2. 2D MDCT basis functions (¼ of all MCLT ones for N=8)
Convolutional Neural Networks and the Frequency Domain Processing

Early layers of various CNNs (and of the eye retina, too) are very general and even resemble the basis functions (Figure 2) of the two-dimensional Fourier (DFT), cosine/sine (DCT and DST) and wavelet (DWT) transforms, so it is no surprise that there are works exploring combinations of such transforms and neural networks. Some of them [8, 9] exploit the energy concentration property of these transforms that makes popular image compression such as JPEG possible. Others [10, 11] evaluate the efficiency of the available Fast Fourier Transform implementations to speed up convolutions by converting image data to the frequency domain and then applying pointwise multiplication according to the convolution-multiplication property. The improvement is modest, as frequency domain calculations are most efficient for large windows, while most modern CNNs use small ones, such as 3×3, where the Winograd algorithm is more efficient.
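As an aside, the convolution-multiplication property mentioned above can be illustrated with a few lines of numpy; this is a generic sketch, not code from any of the referenced implementations. Circular convolution with an impulse kernel, computed as a pointwise product of the spectra, simply rolls the image:

import numpy as np

rng = np.random.default_rng(0)
image = rng.random((64, 64))

# an impulse kernel at (3, 5): circular convolution with it should shift the image
kernel = np.zeros((64, 64))
kernel[3, 5] = 1.0

# pointwise multiplication of the spectra <=> circular convolution in the pixel domain
conv = np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(kernel)).real
assert np.allclose(conv, np.roll(image, (3, 5), axis=(0, 1)))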
Tile Processor and the High Resolution Multi-View Camera

Multi-view high resolution cameras present a special case where frequency domain processing may reduce the number of CNN input features by two orders of magnitude compared to raw pixel input. As described in the earlier post, the Tile Processor uses an efficient Modulated Complex Lapped Transform (MCLT) to convert the Bayer mosaic (color) high resolution image data to the frequency domain, simultaneously providing the subpixel shifts needed for image rectification. Frequency domain processing includes space-variant optical aberration correction (required for high resolution small format image sensors), phase correlation of image pairs, and texture processing when it is required in addition to the distance (disparity) measurement. After the channels/pairs are merged, the frequency domain data is converted back to the pixel domain, where the tile correlation maximums are located and measured. Each overlapping tile is 16×16 pixels, and the tile period (corresponding to the stride in CNN terms) is 8×8 pixels. In the case when only the disparity value and confidence are output from each tile, this makes 2 features. If raw pixels were input, 4 channels would give 4*8*8 = 256 features, or 128 times more. It may be useful to increase the number of features and supplement the average disparity over all 4 pairs of the quad camera with separate horizontal and vertical pairs to improve foreground-background separation; in that case the feature reduction would still be over 40.
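The feature-count figures quoted in this and the following paragraphs can be checked with a few lines of arithmetic (a sketch only; the tile grid size and feature counts are taken from the text, the variable names are ours):

# numbers from the text: 18 MPix sensors, 8x8 tile period, quad camera
sensor_w, sensor_h, tile_period, channels = 4912, 3684, 8, 4

tiles_x, tiles_y = sensor_w // tile_period, sensor_h // tile_period
print(tiles_x, tiles_y)                                  # 614 x 460 DSI grid

raw_features_per_tile = channels * tile_period ** 2      # 4*8*8 = 256
print(raw_features_per_tile // 2)                        # 128x reduction for (disparity, confidence)
print(raw_features_per_tile / 6)                         # ~42x when horizontal/vertical pairs are added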
Conversion of the raw pixels to the Disparity Space Image (DSI) involves a significant reduction of the (X,Y) resolution: when using four of the 18 MPix (4912×3684) imagers, the DSI resolution is just 614×460. Isn't that a waste of the sensor resolution? No, it is not:
- deep sub-pixel resolution of the disparity measurement, needed for long-distance ranging, requires matching of large image areas anyway;
- most of the image area in most real-world images corresponds to smooth 3-D surfaces, where the assumption of a common disparity value for a tile is reasonable;
- the initial image resolution is preserved by the TP when the source images are converted to textures (simultaneously improving quality, as the data from 4 rectified images is averaged);
- a pixel-accurate distance map may be restored by extra processing of the pixel data for selected tiles where a depth discontinuity is detected, then assigning each pixel to one of the available surfaces.
The significant (42 to 128 times) reduction of input features is not the only advantage of the TP+CNN combination over a CNN alone. Being “convolutional”, CNNs depend on translation symmetry: groups of related pixels are treated the same way regardless of their location in the image. That is only an approximation, especially when dealing with high resolution images and extracting subpixel disparity values. This divergence from the strict convolution model is caused by the optical aberrations and distortions, and it requires either using a space-variant convolution instead or performing complete aberration correction and image rectification before the images are fed to the network. Otherwise both the complexity of the network and the amount of training data would increase dramatically. Image rectification with pixel (or slightly better) precision is a common task in stereo processing. It involves interpolation and re-sampling of the pixel data, a process that introduces phase noise, which is especially harmful when deep super-resolution of the matched images is required. The Tile Processor combines multiple operations (fractional pixel image shifts, optical aberration correction, phase correlation of the matched pairs) and avoids re-sampling of the image from the sensor pixel grid by replacing it with a phase rotation in the frequency domain.
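A minimal sketch of the phase-rotation shift mentioned above, using a plain FFT as a stand-in for the MCLT that the TP actually uses (the tile size and shift values are arbitrary):

import numpy as np

def subpixel_shift(tile, dy, dx):
    # Fourier shift theorem: a fractional-pixel shift is a linear phase rotation
    # of the spectrum, so no pixel-grid re-sampling (and no phase noise) is needed
    h, w = tile.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    phase = np.exp(-2j * np.pi * (fy * dy + fx * dx))
    return np.fft.ifft2(np.fft.fft2(tile) * phase).real

# sanity check: a whole-pixel shift reproduces an ordinary circular shift exactly
t = np.random.default_rng(1).random((16, 16))
assert np.allclose(subpixel_shift(t, 1.0, 0.0), np.roll(t, 1, axis=0))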
The final step that reduces the number of features sent from the TP to the CNN is the extraction of the disparity value by calculating the argmax of the phase correlation data. This function has to be calculated with subpixel resolution for data defined on the integer pixel grid. Certain biases are possible, and the TP implementation offers a trade-off between speed and accuracy. The resulting disparity value is the sum of the pre-applied disparity (implemented as a phase rotation in the frequency domain on top of the integer pixel shift) and the argmax value (the offset of the correlation maximum from zero). When higher accuracy is required, a second iteration may be performed by applying the full disparity from the first iteration; the residual argmax offset will then be close to zero and less subject to bias.
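One common way to compute such a subpixel argmax is a parabolic fit around the integer peak; the sketch below is a generic 1-D illustration of the idea and of the residual bias, not the TP's actual implementation:

import numpy as np

def argmax_subpixel(corr):
    # fit a parabola through the peak sample and its two neighbours
    i = int(np.argmax(corr))
    if i == 0 or i == len(corr) - 1:
        return float(i)                      # peak at the edge: no refinement
    ym, y0, yp = corr[i - 1], corr[i], corr[i + 1]
    denom = ym - 2.0 * y0 + yp
    return i + (0.5 * (ym - yp) / denom if denom != 0 else 0.0)

# a sampled correlation peak located between integer positions
x = np.arange(32)
corr = np.exp(-0.5 * ((x - 13.3) / 2.0) ** 2)
print(argmax_subpixel(corr))                 # close to 13.3; the small residual is the bias discussed above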
System Performance Estimation and the Prototype Setup

An optimal system for real-time high resolution 3-D scene reconstruction and ranging would require the development of an application-specific SoC. If used with a set of four 18 MPix image sensors (such as the ON Semiconductor AR1820HS) and a single ×16 1600 MHz DDR4 memory device in a 16 nm technology process, the TP subsystem will be capable of 10 Hz operation covering the full 4912×3684 frames, while reserving half of the memory bandwidth for operations other than the TP.
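A rough feasibility check of these numbers (interpreting the ×16 1600 MHz DDR4 device as 3200 MT/s and expressing the result as a per-pixel memory budget are our assumptions, not figures from the design):

sensors, width, height, fps = 4, 4912, 3684, 10
pixel_rate = sensors * width * height * fps      # ~724 MPix/s entering the TP

ddr4_peak = 1600e6 * 2 * 2                       # x16 device, double data rate: ~6.4 GB/s peak
tp_budget = ddr4_peak / 2                        # half reserved for non-TP operations
print(tp_budget / pixel_rate)                    # ~4.4 bytes of DDR4 traffic available per input pixel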
We plan to emulate such a system using available NC393 camera electronic and optical-mechanical components, including multiple 10393 system boards based on the Xilinx Zynq 7030 SoC. Each such board has a GigE port and four identical sensor ports routed directly to the FPGA I/O pads, allowing flexible assignment of pin functions. Typical applications include up to 8 differential LVDS pairs, a clock pair, I²C and a clock input. The same connectors can be used for high-speed communication between the 10393 boards. Partitioning the system into multiple boards will make it possible to fit the required TP functionality into the smaller FPGAs and then send the resulting features (614×460×6) over GigE to a workstation with a GPU for experiments with different CNN implementations. The system bandwidth will be lower than that of the application-specific SoC: 10 Hz operation will be possible with 5 MPix sensors (2.5 Hz with 18 MPix).
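The feature stream itself easily fits into the GigE link; assuming 4 bytes per feature value (an assumption, the actual encoding is not specified here):

tiles_x, tiles_y, features, bytes_per_value = 614, 460, 6, 4
frame_bytes = tiles_x * tiles_y * features * bytes_per_value
for fps in (10, 2.5):
    print(fps, "Hz:", frame_bytes * fps * 8 / 1e6, "Mbit/s")   # ~542 and ~136 Mbit/s, well under 1 Gbit/s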
Inter-board connections are shown in Figure 1 (just the connections; the actual prototype camera will look more like Figure 3, but with a wider body). Five to seven of the 10393 boards are arranged in 2 layers. The four layer 1 boards use one of their sensor ports to receive image data from the attached sensor, perform image conditioning and flat-field correction, and store the data in the dedicated DDR3 memory. They later read the data as overlapping 16×16 pixel tiles, calculating the tile centers from the calibration data and the requested location and nominal disparity received over GigE. Each tile is transformed to the frequency domain and the data is subject to the space-variant aberration correction. The resulting frequency domain tiles are output through the three remaining sensor ports, which are reconfigured to be LVDS transmitters. The layer 2 boards simultaneously receive frequency domain data through all 4 of their sensor ports from layer 1 and perform phase correlation (pointwise multiplication followed by normalization) on the image pairs. There may be just a single layer 2 board, or up to 3 (limited by the available layer 1 ports) to perform different types of correlations in parallel (all 4 pairs combined, 2 vertical pairs, and separately 2 horizontal pairs for better foreground/background separation). The results of the frequency domain calculations are then transformed to the pixel domain and the argmax is calculated. The argmax value is used to calculate the full tile disparity, and the corresponding correlation value serves as the disparity confidence. The (disparity, confidence) pair for each tile is then sent over GigE to the CNN implemented on a workstation computer.
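The per-pair operation performed by the layer 2 boards ("pointwise multiplication followed by normalization") is ordinary phase correlation; a minimal whole-tile sketch with a plain FFT standing in for the MCLT:

import numpy as np

def phase_correlate(tile_a, tile_b, eps=1e-9):
    # pointwise multiplication of the spectra (one conjugated), magnitude
    # normalization, inverse transform; the maximum sits at the relative shift
    fa, fb = np.fft.fft2(tile_a), np.fft.fft2(tile_b)
    cross = fa * np.conj(fb)
    cross /= np.abs(cross) + eps
    return np.fft.ifft2(cross).real

rng = np.random.default_rng(2)
a = rng.random((16, 16))
b = np.roll(a, (2, 3), axis=(0, 1))                    # tile b is tile a shifted by (2, 3)
corr = phase_correlate(a, b)
print(np.unravel_index(np.argmax(corr), corr.shape))   # (14, 13): a shift of (-2, -3) modulo the 16x16 tile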
Figure 3. Quad sensor camera for image sets acquisition
Image Sets for Training and Testing

While the TP functionality is already tested with the software emulation and its efficient implementation is developed, more research is needed for the CNN part of the system. Available image sets such as KITTI [3] have insufficient resolution (1.4 MPix) and use a different spatial arrangement of the cameras. We plan to capture high resolution quad camera image sets using the available NC393-based cameras, which will be upgraded from 5 MPix to 18 MPix sensors of the same 1/2.3″ format, so the optical-mechanical design will remain the same. As we are primarily interested in long distance ranging (a few hundred to thousands of meters), using LIDARs to capture ground truth data is not practical. Instead we plan to mount a pair of identical quad cameras (each with a baseline of 150 mm) on a car 1500 mm apart, pointed in the same direction; when the 3D measurements from these two quad cameras are fused, the accuracy of the composite distance data will be about ten times better, because the effective baseline will be 1500 mm. Of course this method has limitations (it will not help to improve data from poorly textured objects), but it will provide a higher absolute distance resolution that can be used for the loss function during CNN training. Data from the individual quad cameras will be used for training and testing of the network.
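The expected gain from fusing the two 150 mm quad cameras into a 1500 mm stereo pair follows from the usual disparity relation d = f·B/Z, so the range error scales as Z²·σd/(f·B); the focal length and disparity precision below are assumed values for illustration only:

f_pix, sigma_d, Z = 3600.0, 0.05, 500.0      # focal length [pix], disparity precision [pix], distance [m] (assumed)
for B in (0.15, 1.5):                        # single quad camera vs fused pair baseline [m]
    sigma_z = Z * Z * sigma_d / (f_pix * B)  # range error following from d = f*B/Z
    print(B, "m baseline:", round(sigma_z, 1), "m error at", Z, "m")   # ~23 m vs ~2.3 m, a 10x improvement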
All acquired images, related calibration data and software will be available online under GNU GPL.
References

[1] Becker, Suzanna, and Geoffrey E. Hinton. “Self-organizing neural network that discovers surfaces in random-dot stereograms.” Nature 355.6356 (1992): 161.
[2] Scharstein, Daniel, et al. “High-resolution stereo datasets with subpixel-accurate ground truth.” German Conference on Pattern Recognition. Springer, Cham, 2014.
[3] Menze, Moritz, and Andreas Geiger. “Object scene flow for autonomous vehicles.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
[4] J. Zbontar and Y. LeCun, “Stereo matching by training a convolutional neural network to compare image patches,” Journal of Machine Learning Research, vol. 17, no. 1-32, p. 2, 2016.
[5] W. Luo, A. G. Schwing, and R. Urtasun, “Efficient deep learning for stereo matching,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5695–5703, 2016.
[6] H. Hirschmuller, “Accurate and efficient stereo processing by semi-global matching and mutual information,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), vol. 2, pp. 807–814, IEEE, 2005.
[7] A. Kendall, H. Martirosyan, S. Dasgupta, P. Henry, R. Kennedy, A. Bachrach, and A. Bry, “End-to-end learning of geometry and context for deep stereo regression,” arXiv preprint arXiv:1703.04309, 2017.
[8] Sihag, Saurabh, and Pranab Kumar Dutta. “Faster method for Deep Belief Network based Object classification using DWT.” arXiv preprint arXiv:1511.06276 (2015).
[9] Ulicny, Matej, and Rozenn Dahyot. “On using CNN with DCT based Image Data.” Proceedings of the 19th Irish Machine Vision and Image Processing Conference (IMVIP), 2017.
[10] Vasilache, Nicolas, et al. “Fast convolutional nets with fbfft: A GPU performance evaluation.” arXiv preprint arXiv:1412.7580 (2014).
[11] Lavin, Andrew, and Scott Gray. “Fast algorithms for convolutional neural networks.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
02/04/18 [x393][CMV300] by AndreyFilippov: debugging upgrade from 15.3 to 17.4
JP4
python
Revision as of 19:10, 2 February 2018 (by Oleg): rewrote the python example, replacing the slow per-pixel version. The new code (0.5s for a 2592x1936 GRBG image), with the missing cv2 import and Python 3 integer division fixed, and PIL used for reading in place of the now-removed scipy.misc.imread:

import numpy as np
from PIL import Image
import cv2

# sample JP4 file, see the "Example file" section above
filename = 'test.jp4'

# first open the JP4 as an ordinary grayscale JPEG (original used scipy.misc.imread)
I = np.array(Image.open(filename).convert('L'))

H, W = I.shape

# 16x16 output block
block = np.zeros((16, 16), dtype=I.dtype)

# split the image into 16-row stripes: (H, W) -> (H/16, 16, W)
I = np.reshape(I, (H // 16, 16, W))
# e.g. (1936, 2592) -> (121, 16, 2592)

for i in range(I.shape[0]):
    # work on a copy of the 16xW stripe while rewriting it in place
    stripe = np.copy(I[i])
    for j in range(0, W, 16):
        # in JP4 each 16x16 macroblock is stored as 8x32 (GRBG);
        # the 1st line of 8x32 blocks holds the left half of the image,
        # the 2nd line holds the right half
        if j < W // 2:
            k = 0
            l = 2 * j
        else:
            k = 8
            l = 2 * (j - W // 2)

        # reassemble the Bayer pattern: gr r b gb
        block[0::2, 0::2] = stripe[k:k + 8, l +  0:l +  8]
        block[0::2, 1::2] = stripe[k:k + 8, l +  8:l + 16]
        block[1::2, 0::2] = stripe[k:k + 8, l + 16:l + 24]
        block[1::2, 1::2] = stripe[k:k + 8, l + 24:l + 32]

        I[i, 0:16, j:j + 16] = block

I = np.reshape(I, (H, W))
# pixels are now ordered as in the Bayer pattern; the image still needs demosaicing
cv2.imwrite("result.png", I)

01/31/18 [linux-elphel][rocko] by Oleg Dzhimiev: added mt9f002 registers addresses
01/31/18 [linux-elphel][rocko] by Oleg Dzhimiev: minor edits
103891
Tmp manual
Changelog
Revision as of 23:30, 30 January 2018 (by Oleg): added a Changelog section:

===Changelog===
==20180130==
* added photo finish demo
* fixed fps limit calcs for triggered mode
==20180118==
* raw.py & raw.php, see wiki.elphel.com
* added python3 and python3-opencv
==20180116==
* raw pixel data downloading through membridge
* added gcc, make
==20180109==
* fixed autoexposure
==20171228==
* + strace, ltrace, dmsetup
* added to drivers: register devs to sysfs - nodes are then created by udev
==20171226==
* kernel updated to 4.9 (from 4.0)
* lots of drivers updated to newer versions
* + dm-crypt and cryptsetup
==20171120==
* bugfix - incorrect displaying of TRIG_PERIOD at init
==20171115==
* Fixed autocampars to let 10393 work with the mux board - 10359, see wiki.elphel.com for docs
==20170823==
* Fixed autocampars for multiple sensors getting desynced at init
* Fixed Garmin GPS 18x USB driver
==20170802==
* fixed a page for taking snapshots - works in chrome and firefox for bigger images (>2MB)
* enabled "Access-Control-Expose-Headers: Content-Disposition" and CORS
==20170627==
* viewing decoded jp4 format in camvc
==20170623==
* fixed incorrect default setting of the master channel

Photo-finish
Line scan with Elphel
Revision as of 22:40, 30 January 2018 (by Oleg): added another sample image to the photo finish examples:

[[File:Photofinish example 2.jpeg|thumb|500px|Another sample]]

Pages
