Zero-Copy in DekTec Matrix API® 2.0
Turbocharge 4K processing by saving CPU and memory bandwidth
DekTec's Matrix API® 2.0 is an easy-to-use API to create software-based video-processing solutions that receive and transmit SDI streams through our PCIe cards. The new Zero-Copy feature reduces the overhead of transferring video between card and PC and makes it as efficient as possible. This article explains the inner workings of Zero-Copy.
Matrix API for SDI Applications
The Matrix API is a set of C++ classes that allows real-time processing of audio and video in software. It is widely used in PC-based professional A/V applications, such as video encoders and decoders, that are software-based yet require real-time operation and low latency. Especially when interfacing via DekTec's SDI PCIe cards, the Matrix API makes writing applications much easier by completely hiding the complexities of the SDI layer.
However, when pushing the boundaries, e.g. all-software solutions for encoding UHD or multichannel video, there are a couple of system-level challenges to overcome:
In practice, memory bandwidth can prove to be a bigger bottleneck than CPU power. To help reduce memory bandwidth, the latest Matrix API version includes a new feature called Zero-Copy. In the rest of this article we'll explain how this feature works.
Inner Workings of the Matrix API
The diagram below shows the standard flow of SDI data when using the Matrix API with the DTA-2174B (as an example).
Figure 1. Standard Matrix API flow: Video, audio and ANC data are stored in the frame.
From left to right:
Note: The Matrix API flow for SDI output (e.g. in a decoder) is similar, but in the opposite direction.
The Cost of Convenience
In the flow described above, the user application reads (or in the case of output, writes) the video data in the pixel format most convenient for further processing. The downside of this convenience is that the pixel conversion routine effectively copies the video, while transforming the pixels, from the DMA buffer to an intermediate video buffer. This “copy” in the Matrix API results in additional traffic to and from memory, increasing the memory bandwidth used.
For applications that already require high memory bandwidth, such as video encoders and decoders, every avoidable copy is worth a lot. Enter Zero-Copy.
Zero-Copy to the Rescue
The Zero-Copy feature eliminates the copy from the driver's DMA buffer to a video buffer (or vice versa for output). It does so by giving the user application direct access to the video data in the DMA buffer. This avoids the copy, but it does mean that the video is only available to the user application in its native 10-bit UYVY format.
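To give a feel for what working with packed 10-bit data involves, here is a minimal sketch of unpacking 10-bit samples into 16-bit values. It assumes the samples are packed back to back, four samples in five bytes, least-significant bits first. The article does not specify the exact bit layout of the DMA buffer, so treat this layout as an assumption and check it against the DTAPI documentation. The function name Unpack10Bit is hypothetical and not part of the Matrix API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Unpacks tightly packed 10-bit samples into 16-bit values.
// ASSUMPTION: four samples occupy five bytes, least-significant bits first.
// This is an illustrative layout, not a confirmed description of the DMA buffer.
std::vector<uint16_t> Unpack10Bit(const uint8_t* packed, std::size_t numSamples)
{
    assert(numSamples % 4 == 0);  // This sketch only handles whole 5-byte groups
    std::vector<uint16_t> out(numSamples);
    for (std::size_t i = 0; i < numSamples; i += 4, packed += 5)
    {
        out[i + 0] = static_cast<uint16_t>( packed[0]       | ((packed[1] & 0x03) << 8));
        out[i + 1] = static_cast<uint16_t>((packed[1] >> 2) | ((packed[2] & 0x0F) << 6));
        out[i + 2] = static_cast<uint16_t>((packed[2] >> 4) | ((packed[3] & 0x3F) << 4));
        out[i + 3] = static_cast<uint16_t>((packed[3] >> 6) | ( packed[4]         << 2));
    }
    return out;
}
```

In UYVY ordering the four samples of each group would be U, Y0, V and Y1, so one 5-byte group describes two pixels.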
For a high-performance encoder implementation, it is more efficient to combine the pixel conversion directly with the encoding process itself than to let the Matrix API convert the pixel format before the video is encoded. The diagram below shows the flow of data for an encoder-type application again, but this time using Zero-Copy for video.
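The idea of folding the pixel conversion into the processing step can be sketched as follows. Instead of first converting a line into an intermediate buffer and then processing that buffer, the routine below extracts each luma sample from the packed data and consumes it immediately (here it simply sums them, as a stand-in for the first stage of an encoder). It assumes UYVY sample order (U, Y0, V, Y1) and a packing of four 10-bit samples per five bytes, least-significant bits first. Both the layout and the name SumLumaPacked are assumptions for illustration, not part of the Matrix API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sums the luma samples of one packed 10-bit UYVY line in a single pass,
// without writing an intermediate, unpacked copy of the line to memory.
// ASSUMPTIONS: sample order U, Y0, V, Y1; four 10-bit samples per five
// bytes, least-significant bits first.
uint64_t SumLumaPacked(const uint8_t* line, std::size_t numPixels)
{
    assert(numPixels % 2 == 0);  // 4:2:2 video carries pixels in pairs
    uint64_t sum = 0;
    for (std::size_t i = 0; i < numPixels; i += 2, line += 5)
    {
        // Y0 is the second sample of the group, Y1 the fourth
        const uint32_t y0 = (line[1] >> 2) | ((line[2] & 0x0Fu) << 6);
        const uint32_t y1 = (line[3] >> 6) | (static_cast<uint32_t>(line[4]) << 2);
        sum += y0 + y1;
    }
    return sum;
}
```

Each input byte is read once and nothing is written back, which is the property that makes the fused approach cheaper in memory bandwidth than a separate conversion pass.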
Figure 2. Zero-Copy for video: The user reads video directly from the DMA buffer via line pointers.
The core idea is that video data is not transferred from the DMA buffer to a frame buffer, but the frame structure is given a list of pointers to the individual video lines in the DMA buffer. This way, the user callback function can iterate over the pointers, read the video lines directly from the DMA buffer, and feed the video data to the pixel processing pipeline.
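The line-pointer approach described above can be sketched as follows. The ZeroCopyFrame structure, the LineSink callback type and the ProcessFrame function are hypothetical stand-ins invented for this illustration; the actual Matrix API frame structure and callback signature will differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// HYPOTHETICAL stand-in for the frame structure; not the Matrix API type.
struct ZeroCopyFrame
{
    std::vector<const uint8_t*> linePtrs;  // One pointer per video line in the DMA buffer
    std::size_t bytesPerLine;              // Number of valid bytes in each line
};

// HYPOTHETICAL pixel-processing stage: consumes one line of raw video.
using LineSink = void (*)(const uint8_t* line, std::size_t numBytes, void* ctx);

// Walks the line pointers and hands each line to the sink in place.
// No video data is copied; returns the number of bytes handed to the sink.
std::size_t ProcessFrame(const ZeroCopyFrame& frame, LineSink sink, void* ctx)
{
    std::size_t total = 0;
    for (const uint8_t* line : frame.linePtrs)
    {
        sink(line, frame.bytesPerLine, ctx);
        total += frame.bytesPerLine;
    }
    return total;
}
```

Using one pointer per line, rather than a single base pointer, means the application does not need to know how the lines are laid out in the DMA buffer or whether there is padding between them.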
Note: For an application with SDI output, the Zero-Copy concept works the same, but the other way around: the application writes the video directly to the DMA buffer, at the location indicated by the line pointers.
When reading or writing 4K UHD video with the Matrix API, the new Zero-Copy feature saves about 20 Gbit/s of memory bandwidth. This is a significant portion of the total available PC memory bandwidth, which can now be put to good use for other purposes. As a result, multi-core systems in particular will run much more smoothly.
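One plausible way to arrive at a figure of this size is the back-of-the-envelope calculation below. It assumes 3840x2160 video at 60 frames per second (the frame rate is not stated in this article), 20 bits per pixel for 10-bit 4:2:2 video, and a copy that reads the data once and writes the same amount once. A copy that also converts to a wider or narrower pixel format would write a different amount.

```cpp
#include <cstdint>

// Bits per second needed to stream the video through memory once
constexpr uint64_t StreamBitsPerSecond(uint64_t width, uint64_t height,
                                       uint64_t bitsPerPixel, uint64_t fps)
{
    return width * height * bitsPerPixel * fps;
}

// 10-bit 4:2:2 video: one luma plus, on average, one chroma sample per pixel
// ASSUMPTION: 60 frames per second
constexpr uint64_t kUhdStream = StreamBitsPerSecond(3840, 2160, 20, 60);

// A copy reads the data once and writes it once
constexpr uint64_t kCopyTraffic = 2 * kUhdStream;

static_assert(kUhdStream == 9953280000ULL, "about 10 Gbit/s per pass");
static_assert(kCopyTraffic == 19906560000ULL, "about 20 Gbit/s for one copy");
```

Under these assumptions, removing a single copy saves roughly 19.9 Gbit/s of memory traffic, which is consistent with the figure quoted above.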