10393 manual
Add a program or a shell command to autostart on boot
(Revision as of 01:10, 23 October 2018, by Oleg)

====Notes====
* Boot time: ~30s
* The default boot is from the on-board NAND flash. [[Boot_options_393|More information]] on available boot options and recovery boot.
* For a production system with rare changes to the file system it is recommended to boot from NAND flash.
* For development, one can boot from the μSD recovery card and use it.

===Add a program or a shell command to autostart on boot===
There is a way to do this using [https://unix.stackexchange.com/questions/56957/how-to-start-an-application-automatically-on-boot cron or init.d], but it might run before the sensors are initialized by init_elphel393.py. The recommended way is the following:
* Edit the init script with '''nano''' or '''vi''':
<font size='2'>
 ssh root@192.168.0.9
 ''pass''
 root@elphel393:~# nano /etc/elphel393/init_elphel393.py
 ''edit - save''
</font>
Example: to launch the RTSP streamer, add '''shout("/usr/bin/str")''' to the end of the file. The shout() function is just a wrapper for '''subprocess.call(cmd, shell=True)'''.
Then:
<font size='2'>
 # booted from NAND flash?
 root@elphel393:~# overlay_sync 1
 root@elphel393:~# shutdown -hP now
 # power cycle
 
 # booted from the card?
 root@elphel393:~# sync
 root@elphel393:~# reboot -f
</font>
10/22/18 [x3domlet][] by Oleg Dzhimiev: updated help
10/22/18 [x3domlet][] by Oleg Dzhimiev: 1. click on icon 2. disabled autopan for popup
Trigger 393
10389 trigger testing
(Revision as of 16:46, 16 October 2018, by Oleg)

===Sync example: from external to internal (chaining)===
* The simplest synchronization is to make a custom audio cable that syncs all cameras in the system.
* If there are multiple cameras in the same enclosure, it might make sense to sync them by chaining 4-pin flex cables.
{|
|valign='top'|[[File:10389 sync example 1.jpeg|thumb|400px]]
|}

Trigger 393
Description
(Revision as of 22:41, 12 October 2018, by Oleg)

==<font color="blue">Description</font>==
The 10389 board is required (and possibly the 103891).

Use cases:
* Trigger a single camera from an external source
* Synchronize multiple cameras to a master camera or an external trigger source
* Program the frame rate

The trigger conditions can be generated either internally or externally:
* For external triggering, a [[103891]] board and a 4-conductor 2.5mm audio plug with cable (example: [http://www.digikey.com/products/en?keywords=839-1029-ND digikey]) are required.
* For internal triggering, use the J4-J6 connectors with a 4-pin flex cable: J4 is the internal trigger source (it also transmits timestamps that can be decoded by other cameras), J5-J6 receive the trigger.

To program the trigger go to: '''http://192.168.0.9/parsedit.php => External Trigger Controls'''

In the external connector test:
 - connect a sensor to port 0
 - http://192.168.0.9/parsedit.php?sensor_port=0&immediate&TRIG_CONDITION=0&TRIG_OUT=0x66555&TRIG_PERIOD=25000000&TRIG=4
 visual: the LED is blinking
 software: the frame counter is running

10389
Links
(Revision as of 22:23, 12 October 2018, by Oleg)

* [[Media:10389b.pdf|10389 Circuit Diagram, Parts List, PCB layout]]
* [[Media:10389b gerber.tar.gz|10389 Gerber files]]
* [[Trigger 393|External/internal synchronization & triggering and frame rate control for single/multi-camera systems]]

10393 manual
External/internal trigger and FPS control
(Revision as of 22:18, 12 October 2018, by Oleg)

===External/internal synchronization & triggering and FPS control for single/multi-camera systems===
* [[Trigger_393]]

Trigger 393
internal connectors
(Revision as of 21:50, 12 October 2018, by Oleg)

==<font color="blue">10389 trigger testing</font>==

===External connector===
In this test the camera triggers itself via an audio cable. For cable wiring see [[103891]].
* To test:
 http://192.168.0.9/parsedit.php?sensor_port=0&immediate&TRIG_CONDITION=0&TRIG_OUT=0x66555&TRIG_PERIOD=25000000&TRIG=4
 visual: the LED is blinking
 software: the frame counter is running
* It is easy to modify the cable so that the camera triggers both itself and multiple other cameras.
{|
|[[File:10389 extsync selftest.jpeg|thumb|400px]]
|}

===Internal connectors===
In this test the camera triggers itself via a 4-pin flex cable.
* Connect as displayed in the pictures below.
** J4 - trigger output (master port)
** J5-J6 - trigger input (slave ports)
* To test:
 - connect a sensor to port 0
 - http://192.168.0.9/parsedit.php?sensor_port=0&immediate&TRIG_CONDITION=0x8000&TRIG_OUT=0x66555&TRIG_PERIOD=25000000&TRIG=4
 software: the frame counter is running
* To sync multiple cameras, for example:
 cable 1: camera1 (J4) -> camera2 (J6)
 cable 2: camera2 (J5) -> camera3 (J6)
 cable 3: camera3 (J5) -> camera4 (J6)
 etc.
{|
|valign='top'|[[File:10389 intsync selftest.jpeg|thumb|400px]]
|valign='top'|[[File:10389 intsync selftest closeup.jpeg|thumb|400px]]
|}

GPU Implementation of the Tile Processor
After we coupled the Tile Processor (TP), which performs quad camera image conditioning and produces 2D phase correlation in space-invariant form, with the neural network[1], the TP remained the bottleneck of the tandem. While the inferred network uses the GPU and produces disparity output in 0.5 sec (more than 80% of this time is spent on data transfer), the TP required tens of seconds to run on the CPU as a multithreaded Java application. When converted to run on the GPU, the same operation takes just 0.087 seconds for four 5 MPix images, and it is likely possible to optimize the code further, as this is our first experience with Nvidia® CUDA™.
Implementation

Starting with CUDA™ and JCUDA

Before starting development of the GPU code, we verified that GPU acceleration is possible for the main program written in Java by evaluating a demo ImageJ plugin[2]. The next step was to get into development with Nvidia® CUDA™ using the Nsight[3] plugin for the Eclipse IDE. Nsight offered an option to import sample projects, and I started to learn CUDA™ with one of them, dct8x8. That project uses the "Runtime API" and consists of a mixture of C/C++ code for the CPU and for the GPU. This mixed mode is not directly compatible with JCUDA[4], but if the "kernel" code (executed by the GPU) is kept separate from the CPU code, it is possible to develop, debug and test the program in the Nsight IDE and then use the same file containing just the GPU kernel(s) with JCUDA. What needs to be changed is the portion of the non-GPU C/C++ code that transfers data between the computer system (CPU) memory and the dedicated GPU memory physically located on the graphics card.
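To illustrate what that host-side portion looks like on the Java side, here is a minimal sketch (ours, with illustrative names, not the project's actual code) of allocating GPU memory and copying a float array there and back with JCuda's driver API:

    import static jcuda.driver.JCudaDriver.*;
    import jcuda.Pointer;
    import jcuda.Sizeof;
    import jcuda.driver.*;

    // Sketch of the host-side code that replaces the Runtime-API C/C++
    // transfer code: allocate device memory, copy a float array to the
    // GPU and back. Kernel launches would go between the two copies.
    public class CopySketch {
        public static float[] roundTrip(float[] host) {
            cuInit(0);
            CUdevice device = new CUdevice();
            cuDeviceGet(device, 0);
            CUcontext context = new CUcontext();
            cuCtxCreate(context, 0, device);

            CUdeviceptr dev = new CUdeviceptr();
            cuMemAlloc(dev, (long) host.length * Sizeof.FLOAT);
            cuMemcpyHtoD(dev, Pointer.to(host), (long) host.length * Sizeof.FLOAT);
            // ... kernel launches would go here ...
            float[] back = new float[host.length];
            cuMemcpyDtoH(Pointer.to(back), dev, (long) host.length * Sizeof.FLOAT);
            cuMemFree(dev);
            cuCtxDestroy(context);
            return back;
        }
    }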
Guided by the Nvidia® dct8x8 sample project, I first implemented the similar DCT-IV[5] and DST-IV needed for the Complex Lapped Transform (CLT) used by the TP for conversion to the frequency domain and back (the sample project contained code only for the DCT-II (direct) and DCT-III (inverse) used in JPEG and related applications). After several iterations I made those programs run almost as fast as the highly optimized code in the Nvidia® sample, and the next step was to implement the complete CLT and aberration correction for the Bayer mosaic images, following the approach we used for the RTL[6]. Debugging was simplified by the fact that the same algorithm was already tested in Java, and the intermediate data arrays could be compared between the Java and CUDA™ outputs.
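For reference, here is an unoptimized O(N²) Java sketch of the DCT-IV and DST-IV definitions (our illustration of the underlying math only; the actual GPU kernels use fast factorizations and are not reproduced here):

    // Reference DCT-IV and DST-IV, directly from the definitions:
    //   DCT-IV: X[k] = sum_n x[n] * cos(pi/N * (n + 1/2) * (k + 1/2))
    //   DST-IV: X[k] = sum_n x[n] * sin(pi/N * (n + 1/2) * (k + 1/2))
    public class Dct4Ref {
        public static double[] dctIV(double[] x) {
            int n = x.length; // 8 for the Tile Processor tiles
            double[] y = new double[n];
            for (int k = 0; k < n; k++) {
                double s = 0.0;
                for (int j = 0; j < n; j++) {
                    s += x[j] * Math.cos(Math.PI / n * (j + 0.5) * (k + 0.5));
                }
                y[k] = s;
            }
            return y;
        }

        public static double[] dstIV(double[] x) {
            int n = x.length;
            double[] y = new double[n];
            for (int k = 0; k < n; k++) {
                double s = 0.0;
                for (int j = 0; j < n; j++) {
                    s += x[j] * Math.sin(Math.PI / n * (j + 0.5) * (k + 0.5));
                }
                y[k] = s;
            }
            return y;
        }
    }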
Kernel for the Complex Lapped Transform of the Bayer Mosaic Images

The first implemented kernel takes the following inputs:
- Four 5 MPix Bayer mosaic images.
- Four per-camera arrays of the space-variant (stride 16) CLT deconvolution kernels, already converted to the frequency domain (this data is reused for multiple image sets).
- Additional kernel fractional pixel x,y offsets and their derivatives, used to interpolate the required image shifts between the grid nodes of the space-variant kernels.
- A list of the tiles to process, containing each tile's location on the tile grid and the fractional pixel X,Y shift calculated externally from the required disparity and the known radial lens distortions. This list is used because not all tiles need to be processed on each pass: when building the depth map only some tiles need to be processed, and some tiles require multiple iterations with different disparity values.

The output of the first kernel is a set of per-camera (4), per-tile (324×242 for the 2592×1936 images), per-color-component (3) 4×8×8 arrays representing the CLT frequency domain transformation of each of the 16×16 (stride 8) pixel tiles of the source images. The transformation sequence is described in the earlier post[6]; it consists of the following stages:
- Finding the closest deconvolution kernel for the specified tile index and offsets.
- Calculating the full horizontal and vertical offset that combines the requested image tile center and the interpolated kernel data.
- Splitting the X,Y offsets into rounded integer and fractional pixel shifts. The integer part is used to select the position of the 16×16 window in the source image; the fractional one is applied later in the form of a phase shift in the frequency domain, and is also applied to the window function.
- Folding the 16×16 image tiles into 8×8 pixel ones for the 2D DCT-IV/DST-IV conversions, using the shifted 2D half-sine window function.
- Calculating the CLT layers: DCT-IV/DCT-IV, DST-IV/DCT-IV, DCT-IV/DST-IV and DST-IV/DST-IV (horizontal pass/vertical pass). For the Bayer mosaic input, only 4 of the 12 transforms (3 colors by 4 CLT layers) are actually needed; the other 8 are restored using the symmetry of transforms of sparse inputs (1 in 4 pixels non-zero for red and blue, 1 in 2 for green).
- Element-wise multiplication of the converted image tile by the kernel tile, equivalent to pixel-domain convolution.
- Applying the residual fractional pixel shift (in the range of ±0.5 pix in each direction), implemented as a phase rotation (see the sketch after this list). Such a shift does not involve re-sampling as a space domain shift would, so it does not introduce any related quantization noise.
- Applying an optional 2D low-pass filter that can be combined with the convolution kernels.
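A minimal sketch of the phase-rotation idea, shown for a plain complex 2D DFT tile for clarity (ours; the actual kernel applies the equivalent rotation across the four DCT-IV/DST-IV CLT layers):

    // Fractional-pixel shift (dx, dy) applied as a phase rotation of
    // frequency-domain coefficients: multiply F(u, v) by
    // e^{-2*pi*i*(u*dx + v*dy)/n}, using signed (unaliased) frequencies.
    public class PhaseShiftSketch {
        public static void phaseShift(double[][] re, double[][] im, double dx, double dy) {
            int n = re.length;
            for (int v = 0; v < n; v++) {
                for (int u = 0; u < n; u++) {
                    int su = (u <= n / 2) ? u : u - n; // signed frequency
                    int sv = (v <= n / 2) ? v : v - n;
                    double phase = -2.0 * Math.PI * (su * dx + sv * dy) / n;
                    double c = Math.cos(phase), s = Math.sin(phase);
                    double r0 = re[v][u], i0 = im[v][u];
                    re[v][u] = r0 * c - i0 * s; // complex multiply
                    im[v][u] = r0 * s + i0 * c;
                }
            }
        }
    }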
When I went through the whole processing pipeline and made sure the results match the Java code output, I measured the execution time and was disappointed: ~5 seconds per set of 4 images was not what I expected. Using the profiler I soon realized that my understanding of CUDA™ was wrong: all the participating threads have to execute exactly the same code, unlike multi-threaded CPU code or multiple simultaneously operating modules in RTL. Re-writing the code to eliminate divergence reduced the execution time to 0.9 s. Not too exciting, but still a significant gain over the CPU alone. Another kernel, for the inverse transform that converts from the frequency domain back to images, added 0.6 seconds. The inverse MCLT produces overlapping 16×16 (stride 8) tiles that need to be added together to produce the full picture; the implementation uses 4 passes over the image (stride 2 in each direction), with each pass free of any overlaps between tiles, allowing asynchronous parallel execution (sketched below).
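A hedged host-side view of that 4-pass schedule (illustrative names, not the actual kernel code): each pass covers one row/column parity class of the stride-8 tile grid, so no two 16×16 tiles within a pass overlap and all of a pass's tiles can be accumulated concurrently without atomics.

    // Four passes over the tile grid; within a pass, tiles are 2 apart
    // in each direction, so their 16x16 (stride 8) footprints never
    // overlap and can be written in parallel.
    public class ImcltPassSketch {
        interface TileAccumulator { void accumulate(int ty, int tx); }

        public static void runPasses(int tilesY, int tilesX, TileAccumulator acc) {
            for (int dy = 0; dy < 2; dy++) {
                for (int dx = 0; dx < 2; dx++) {
                    // one pass: in the GPU version this is a single kernel
                    // launch whose threads process these tiles concurrently
                    for (int ty = dy; ty < tilesY; ty += 2) {
                        for (int tx = dx; tx < tilesX; tx += 2) {
                            acc.accumulate(ty, tx);
                        }
                    }
                }
            }
        }
    }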
After spending about two weeks troubleshooting the kernel code as part of the C++ application, I was expecting to encounter harder-to-pinpoint problems when adding these GPU kernels to the Java application: there are multiple data arrays to be fed to the GPU, with no convenience of having the debugger at hand.
In reality it was much easier: JCUDA[4] does a wonderful job, and it took me just a few hours to convert the code to run as a GPU accelerator for the Java program. I already had the Java code to convert images and convolution kernels to the data arrays that were passed to the C++ test application via binary files; all that remained was to flatten the multi-dimensional arrays and to replace an array of structs that had a mixture of integer (tile indices) and float (offsets) members (see the sketch below). With that done, there were a couple of bugs left, but they were clearly reported in the Java stack trace output.
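A hedged sketch of that conversion; the TileTask fields and names are illustrative, not the actual project classes:

    import java.util.List;

    // Split a list of mixed int/float tile descriptors into flat
    // primitive arrays that JCUDA can copy to the GPU, instead of a
    // Java array of structs.
    public class TileTaskFlattener {
        static class TileTask {
            int tileX, tileY; // tile indices on the tile grid
            float dx, dy;     // fractional-pixel X,Y shifts
        }

        public static void flatten(List<TileTask> tasks, int[] indices, float[] offsets) {
            for (int i = 0; i < tasks.size(); i++) {
                TileTask t = tasks.get(i);
                indices[2 * i]     = t.tileX;
                indices[2 * i + 1] = t.tileY;
                offsets[2 * i]     = t.dx;
                offsets[2 * i + 1] = t.dy;
            }
        }
    }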
I do not know if it is possible to use #include directives in the code compiled from JCUDA, but as the source code is in any case first read from a file into a Java String before it is sent to the compiler, I just concatenated all the needed files (currently just 2) as strings. I also added "#define JCUDA" to the concatenated string before the file contents, as well as some other numeric defines (like image dimensions) shared between the Java and the GPU code. I enclosed all the includes and the duplicate parameter defines within #ifndef JCUDA in the GPU kernel source files; that made the same source files work both in the Nsight IDE and in the Java application with nvrtcCreateProgram() (a sketch follows). Soon I got the complete output images and verified they are the same as those generated by the Java code.
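A minimal sketch of this arrangement using JCuda's NVRTC bindings, assuming illustrative file and macro names (the 2592×1936 dimensions are from this project; the IMG_WIDTH/IMG_HEIGHT names are ours):

    import java.nio.file.Files;
    import java.nio.file.Paths;
    import jcuda.nvrtc.JNvrtc;
    import jcuda.nvrtc.nvrtcProgram;

    // Concatenate kernel sources into one String, prepend the JCUDA
    // marker and the shared numeric defines, then compile with NVRTC.
    public class KernelCompileSketch {
        public static String compileToPtx(String... kernelFiles) throws java.io.IOException {
            StringBuilder src = new StringBuilder();
            src.append("#define JCUDA\n");          // skips the #ifndef JCUDA block
            src.append("#define IMG_WIDTH 2592\n"); // defines shared with the Java code
            src.append("#define IMG_HEIGHT 1936\n");
            for (String f : kernelFiles) {          // concatenate the kernel files
                src.append(new String(Files.readAllBytes(Paths.get(f)))).append("\n");
            }
            nvrtcProgram program = new nvrtcProgram();
            JNvrtc.nvrtcCreateProgram(program, src.toString(), null, 0, null, null);
            JNvrtc.nvrtcCompileProgram(program, 0, null);
            String[] ptx = new String[1];
            JNvrtc.nvrtcGetPTX(program, ptx); // PTX to load with cuModuleLoadData()
            JNvrtc.nvrtcDestroyProgram(program);
            return ptx[0];
        }
    }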
And in the end I got an unexpected and wonderful "reward". When stepping over the critical GPU-related Java code lines, I was expecting a 1.5 second delay for the execution of the two kernels, but could not notice any. I enclosed each GPU kernel call with an extra cuCtxSynchronize(), thinking it did not wait for completion; still no visible delay. Then I put the kernels in a loop to run 100 times and got ~6 seconds for the first kernel. Something was obviously wrong, but the output images were correct. I went back to the IDE and made sure I had "Release" (not "Debug") selected in seemingly every relevant place, but it was still running slow. Then I launched the built binaries from the command line and discovered that while the "Debug" version is as slow as when run from the IDE, the "Release" version is 20.1 times faster, the same performance as with JCUDA.
    Kernel description                                   Debug mode   Release mode
    Convert 4 images to FD and deconvolve with kernels   1073.4 ms    57.0 ms
    Convert 4 images from FD to RGB slices                665.3 ms    29.5 ms
    Total                                                1738.7 ms    86.5 ms

These run times were measured on a GeForce GTX 1050 Ti (compute capability 6.1, 4 GB memory).
Results

The GPU code implemented and tested so far does not include the 2D phase correlation kernel needed for depth map generation; I started with just the rectified images, as the pictures usually show if something is wrong in the processing. The phase correlation kernel will be easy to implement (the first kernel will stay the same) and its execution time will be approximately the same. Then we will work on coupling this code with the TensorFlow inferred network and feeding the 2D correlation data directly from the GPU device memory. That will result in near real-time performance of the whole system.
Links to the source code are provided below[7],[8],[9] (the code is also mirrored at GitHub). It needs to be cleaned up and may eventually be released as a library after more functionality is added. We believe this code will be useful for a variety of imaging applications, both coupled with ML systems and traditional ones. The phase correlation can be used for multiple-view camera systems (as in our case) or for optical flow processing when matching images acquired by the same camera in subsequent frames.
Links

[1] “Neural network doubled effective baseline of the stereo camera”
[2] “How to create an ImageJ Plugin using JCuda”
[3] Nvidia® Nsight™ Eclipse Edition
[4] Java bindings for Nvidia® CUDA™
[5] Wikipedia article: “Discrete Cosine Transform”
[6] “Efficient Complex Lapped Transform Implementation for the Space-Variant Frequency Domain Calculations of the Bayer Mosaic Color Images”
[7] Source code of the 8×8 DCT-II, DST-II, DCT-IV and DST-IV for GPU
[8] Source code of the Tile Processor GPU implementation
[9] Source code of the Java class to integrate GPU acceleration with JCUDA
10/09/18 [imagej-elphel][master] by Andrey Filippov: Update .gitattributes
10/09/18 [imagej-elphel][master] by Andrey Filippov: Update .gitattributes
10/09/18 [imagej-elphel][master] by Andrey Filippov: highlight syntax for CUDA source files
10/08/18 [imagej-elphel][gpu] by AndreyFilippov: added variable LPF through GPU constants memory
10/08/18 [imagej-elphel][gpu] by AndreyFilippov: Got 4 images converted, corrected, and converted back with JCUDA!
10/07/18 [imagej-elphel][gpu] by AndreyFilippov: Fixed several bugs, 4-image aberration correction and IMCLT matches Java output
