
Zynq OpenCV Acceleration: Hardware Image Processing Tutorial

After implementing image processing pipelines on various platforms, I can say that Zynq OpenCV acceleration offers something unique: the ability to prototype algorithms in Python, then push compute-intensive operations to hardware without rewriting everything in HDL. This tutorial walks through practical approaches to hardware-accelerated image processing on Zynq, from capturing frames via Zynq MIPI interfaces to running Zynq Python algorithms through the PYNQ framework.

Why Hardware Acceleration Matters for Image Processing

Image processing demands enormous computational throughput. A single 1080p frame contains over 2 million pixels, each requiring multiple operations per algorithm stage. Running a Sobel edge detector at 60 fps means processing 124 million pixels per second. Software alone struggles to keep up.
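A quick back-of-the-envelope check confirms the throughput figure:

```python
# Pixel throughput required for 1080p at 60 fps
width, height, fps = 1920, 1080, 60
pixels_per_frame = width * height            # 2,073,600 pixels
pixels_per_second = pixels_per_frame * fps   # 124,416,000 pixels/s
print(f"{pixels_per_second / 1e6:.1f} Mpixel/s")  # ~124.4 Mpixel/s
```

And that is per algorithm stage; a multi-stage pipeline multiplies the demand.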

Zynq OpenCV Performance Comparison

| Implementation | 1080p Sobel Edge Detection | Power Consumption |
|---|---|---|
| ARM Cortex-A9 (Software) | 5-8 fps | ~2 W |
| ARM + NEON Optimization | 12-18 fps | ~2.5 W |
| PL Hardware Acceleration | 60+ fps | ~3 W |
| Full Pipeline in PL | 120+ fps | ~4 W |

The programmable logic handles pixel-level parallelism naturally. While the ARM core processes one pixel, the FPGA fabric processes thousands simultaneously. That’s the fundamental advantage of Zynq OpenCV acceleration.

Understanding the Zynq Image Processing Architecture

The Zynq architecture splits image processing responsibilities between the Processing System (PS) and Programmable Logic (PL). Getting the partition right determines whether your system achieves real-time performance.

Typical Processing Pipeline

| Stage | Best Location | Reason |
|---|---|---|
| Image Capture (MIPI/HDMI) | PL | High-speed serial interfaces |
| Color Space Conversion | PL | Pixel-parallel operations |
| Filtering (Blur, Sharpen) | PL | Convolution benefits from parallelism |
| Feature Detection | PL or PS | Depends on algorithm complexity |
| Object Classification | PS | Complex decision logic |
| Display Output | PL | Timing-critical video signals |

The key insight is streaming data through PL processing blocks while keeping high-level decisions in software. AXI Stream interfaces connect processing stages, enabling data to flow without CPU intervention.

Setting Up Zynq MIPI Camera Interfaces

Modern image sensors use Zynq MIPI CSI-2 interfaces for high-bandwidth data transfer. Implementing MIPI on Zynq requires understanding both the physical layer (D-PHY) and protocol layer (CSI-2).

Zynq MIPI Implementation Options

| Zynq Family | D-PHY Support | Implementation Method |
|---|---|---|
| Zynq-7000 | External PHY required | Resistor network or dedicated PHY IC |
| Zynq UltraScale+ | Native I/O support | Direct connection to HP bank I/Os |

For Zynq-7000 designs, the external resistor network approach works for data rates up to roughly 800 Mbps per lane. Higher speeds require a dedicated D-PHY chip such as the Meticom MC20901.

MIPI CSI-2 IP Core Configuration

The Xilinx MIPI CSI-2 RX Subsystem IP (license-free since Vivado 2020.1) handles protocol decoding. Key configuration parameters:

| Parameter | Typical Value | Notes |
|---|---|---|
| Number of Lanes | 2 or 4 | Match sensor configuration |
| Line Rate | 800-1500 Mbps | Per-lane speed |
| Pixel Format | RAW10, RAW12 | Bayer pattern from sensor |
| Pixels Per Clock | 2 or 4 | Higher = more resources, higher throughput |

The IP outputs AXI4-Stream video data ready for downstream processing blocks.

Camera Sensor Initialization

Most MIPI sensors require I2C configuration before streaming. The Zynq I2C controller connects to the sensor’s CCI (Camera Control Interface):

```c
// Example: send an initialization sequence to a Sony IMX219 sensor.
// Iic is an already-initialized XIicPs instance; init_sequence holds
// the register address/value pairs to write.
XIicPs_MasterSend(&Iic, init_sequence, sizeof(init_sequence), SENSOR_I2C_ADDR);
```

Sensor initialization sequences are typically sensor-specific and available in manufacturer datasheets or Linux kernel drivers.
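The same kind of CCI write can also be issued from Linux user space on the Zynq. Below is a minimal sketch using the smbus2 package; the I2C bus number, sensor address, and the mode-select register are assumptions to verify against your board's device tree and the sensor datasheet:

```python
# Minimal sketch: configure a MIPI sensor over I2C from Linux user space.
# Assumes the sensor's CCI bus appears as /dev/i2c-1 and the sensor
# answers at address 0x10 (both board-specific).
from smbus2 import SMBus, i2c_msg

SENSOR_ADDR = 0x10

def write_reg16(bus, reg, value):
    """Write one byte to a 16-bit register address (CCI convention)."""
    msg = i2c_msg.write(SENSOR_ADDR, [reg >> 8, reg & 0xFF, value])
    bus.i2c_rdwr(msg)

with SMBus(1) as bus:
    # 0x0100 is the IMX219 mode-select register; writing 1 starts streaming.
    # Full init sequences come from the datasheet or the Linux kernel driver.
    write_reg16(bus, 0x0100, 0x01)
```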


Zynq Python Development with PYNQ

PYNQ (Python + Zynq) revolutionizes how we develop Zynq OpenCV applications. Instead of writing C code and cross-compiling, you write Zynq Python directly on the board using Jupyter notebooks.

PYNQ Architecture Overview

| Component | Function |
|---|---|
| Linux OS | Base operating system on ARM |
| Jupyter Notebook | Browser-based Python IDE |
| PYNQ Libraries | Python wrappers for hardware control |
| Overlays | Pre-built FPGA bitstreams |
| OpenCV | Standard computer vision library |

The overlay concept is powerful. Hardware designs are packaged as overlays that Python code can load dynamically. Switch between different hardware configurations without rebooting.
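Loading an overlay and inspecting what it exposes takes only a few lines; `base.bit` here is the standard PYNQ base overlay shipped with the board image:

```python
from pynq import Overlay

# Loading a bitstream reprograms the PL on the spot; no reboot needed
overlay = Overlay("base.bit")

# List the IP cores the overlay exposes to Python
print(list(overlay.ip_dict.keys()))
```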

Installing OpenCV on PYNQ

OpenCV comes pre-installed on recent PYNQ images, but you may need to update it:

```python
# Check the installed OpenCV version
import cv2
print(cv2.__version__)

# If an update is needed, run in a terminal:
# pip3 install --upgrade opencv-python
```

Basic Zynq Python Image Processing

Here’s a simple motion detection example using Zynq Python and OpenCV:

```python
import cv2
import numpy as np
from pynq.overlays.base import BaseOverlay

# Load the base overlay
base = BaseOverlay("base.bit")

# Initialize video capture
cap = cv2.VideoCapture(0)  # USB camera

# Read the reference frame
ret, frame1 = cap.read()
gray1 = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)
gray1 = cv2.GaussianBlur(gray1, (21, 21), 0)

while True:
    ret, frame2 = cap.read()
    if not ret:
        break
    gray2 = cv2.cvtColor(frame2, cv2.COLOR_BGR2GRAY)
    gray2 = cv2.GaussianBlur(gray2, (21, 21), 0)

    # Difference against the previous frame
    diff = cv2.absdiff(gray1, gray2)
    thresh = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)[1]

    # Box any region large enough to count as motion
    contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    for contour in contours:
        if cv2.contourArea(contour) > 500:
            x, y, w, h = cv2.boundingRect(contour)
            cv2.rectangle(frame2, (x, y), (x + w, y + h), (0, 255, 0), 2)

    gray1 = gray2

cap.release()
```

This software-only implementation achieves around 8-12 fps. Let’s accelerate it.

Hardware Acceleration with Vitis Vision Library

The Vitis Vision Library (formerly xfOpenCV) provides HLS-synthesizable versions of OpenCV functions. These compile directly to FPGA hardware.

Supported Zynq OpenCV Functions

| Category | Functions |
|---|---|
| Filters | GaussianBlur, Sobel, Median, Bilateral |
| Transforms | Resize, Remap, WarpAffine, WarpPerspective |
| Feature Detection | Harris, FAST, Canny, HoughLines |
| Color Conversion | BGR2Gray, BGR2HSV, Bayer2RGB |
| Arithmetic | AbsDiff, Add, Subtract, Threshold |
| Morphology | Erode, Dilate, Open, Close |

Creating an Accelerated Filter in Vitis HLS

Here’s how to create a hardware-accelerated Gaussian blur:

```cpp
#include "hls_stream.h"
#include "ap_axi_sdata.h"
#include "common/xf_common.hpp"
#include "common/xf_infra.hpp"
#include "imgproc/xf_gaussian_filter.hpp"

#define MAX_HEIGHT 1080
#define MAX_WIDTH  1920

void gaussian_accel(
    hls::stream<ap_axiu<24, 1, 1, 1>> &src,
    hls::stream<ap_axiu<24, 1, 1, 1>> &dst,
    int rows, int cols)
{
#pragma HLS INTERFACE axis port=src
#pragma HLS INTERFACE axis port=dst
#pragma HLS INTERFACE s_axilite port=rows
#pragma HLS INTERFACE s_axilite port=cols
#pragma HLS INTERFACE s_axilite port=return

    xf::cv::Mat<XF_8UC3, MAX_HEIGHT, MAX_WIDTH, XF_NPPC1> imgInput(rows, cols);
    xf::cv::Mat<XF_8UC3, MAX_HEIGHT, MAX_WIDTH, XF_NPPC1> imgOutput(rows, cols);

#pragma HLS DATAFLOW
    // AXI Stream in -> 5x5 Gaussian blur -> AXI Stream out
    xf::cv::AXIvideo2xfMat(src, imgInput);
    xf::cv::GaussianBlur<5, XF_BORDER_CONSTANT, XF_8UC3,
                         MAX_HEIGHT, MAX_WIDTH, XF_NPPC1>(imgInput, imgOutput, 1.0f);
    xf::cv::xfMat2AXIvideo(imgOutput, dst);
}
```

The pragmas tell HLS how to synthesize the design:

  • INTERFACE axis: Creates AXI Stream ports
  • INTERFACE s_axilite: Creates register interface for control
  • DATAFLOW: Enables pipelining between functions

Integration with PYNQ Overlays

After synthesizing in Vivado, package the design as a PYNQ overlay:

```python
from pynq import Overlay, allocate
import numpy as np

# Load the custom overlay
ol = Overlay("gaussian_accel.bit")

# Get a handle to the accelerator's AXI-Lite control interface
gaussian = ol.gaussian_accel_0

# Allocate physically contiguous DMA buffers
in_buffer = allocate(shape=(1080, 1920, 3), dtype=np.uint8)
out_buffer = allocate(shape=(1080, 1920, 3), dtype=np.uint8)

# Copy the input image into the buffer
in_buffer[:] = input_image

# Configure and start the accelerator
# (register offsets come from the HLS-generated s_axilite address map)
gaussian.write(0x10, 1080)  # rows
gaussian.write(0x18, 1920)  # cols
gaussian.write(0x00, 0x01)  # ap_start

# Wait for completion (ap_done is bit 1 of the control register)
while (gaussian.read(0x00) & 0x02) == 0:
    pass

# Read back the result
output_image = np.array(out_buffer)
```
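One caveat: the accelerator's src and dst ports are AXI Stream, so the allocated buffers have to be moved through a DMA engine rather than read by the IP directly. Here is a sketch of that step, assuming the block design includes an AXI DMA instance named `axi_dma_0` wired to the accelerator's stream ports:

```python
# Hypothetical sketch: stream the buffers through an AXI DMA.
# Assumes the block design contains an AXI DMA named axi_dma_0
# connected to the accelerator's src/dst AXI Stream ports.
dma = ol.axi_dma_0

gaussian.write(0x10, 1080)  # rows
gaussian.write(0x18, 1920)  # cols
gaussian.write(0x00, 0x01)  # ap_start

dma.sendchannel.transfer(in_buffer)    # PS memory -> accelerator
dma.recvchannel.transfer(out_buffer)   # accelerator -> PS memory
dma.sendchannel.wait()
dma.recvchannel.wait()
```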

Building a Complete Vision Pipeline

Let’s put together a complete hardware-accelerated pipeline for edge detection with Zynq OpenCV.

Pipeline Architecture

| Stage | Implementation | Interface |
|---|---|---|
| HDMI Input | Video DMA IP | AXI Stream |
| Color Convert | HLS RGB2Gray | AXI Stream |
| Gaussian Blur | HLS GaussianBlur | AXI Stream |
| Sobel Filter | HLS Sobel | AXI Stream |
| Threshold | HLS Threshold | AXI Stream |
| HDMI Output | Video DMA IP | AXI Stream |

Each stage connects via AXI Stream, creating a pixel-streaming pipeline. The VDMA handles frame buffering in DDR memory.
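On PYNQ boards, the HDMI ends of such a pipeline can be driven from Python. A minimal pass-through sketch, assuming the PYNQ-Z2 base overlay (the processing stages would sit between the read and the write):

```python
from pynq.overlays.base import BaseOverlay
from pynq.lib.video import PIXEL_RGB

base = BaseOverlay("base.bit")
hdmi_in = base.video.hdmi_in
hdmi_out = base.video.hdmi_out

# Negotiate the input mode, mirror it on the output, start both ends
hdmi_in.configure(PIXEL_RGB)
hdmi_out.configure(hdmi_in.mode)
hdmi_in.start()
hdmi_out.start()

# Pass frames straight through for ~10 seconds at 60 fps
for _ in range(600):
    frame = hdmi_in.readframe()
    hdmi_out.writeframe(frame)

hdmi_in.close()
hdmi_out.close()
```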


Performance Results

| Configuration | Frame Rate | Latency |
|---|---|---|
| Software Only | 8 fps | 125 ms |
| HDMI In + SW Processing | 15 fps | 67 ms |
| Full HW Pipeline | 60 fps | 16.7 ms |
| Optimized (2 PPC) | 120 fps | 8.3 ms |

The hardware pipeline achieves real-time 60 fps with minimal CPU involvement. The ARM cores remain available for higher-level tasks like object classification.

Essential Resources for Zynq OpenCV Development

Official Documentation

| Resource | Description |
|---|---|
| UG1233 | Vitis Vision Library User Guide |
| UG902 | Vivado HLS User Guide |
| PG232 | MIPI CSI-2 RX Subsystem Product Guide |
| PG238 | MIPI DSI TX Subsystem Product Guide |

Download Links

Vitis Vision Library: https://github.com/Xilinx/Vitis_Libraries/tree/main/vision

PYNQ Framework: http://www.pynq.io/

PYNQ SD Card Images: https://github.com/Xilinx/PYNQ/releases

OpenCV for Zynq: https://github.com/Xilinx/Vitis-AI/tree/master/examples

Xilinx Embedded Software: https://github.com/Xilinx/embeddedsw

Adam Taylor’s PYNQ OpenCV Project: https://github.com/ATaylorCEngworking/pynq_cv

Frequently Asked Questions

Can I use standard OpenCV code with Zynq acceleration?

Not directly. Standard OpenCV functions run on the ARM processor as software. To accelerate them, you need to use the Vitis Vision Library equivalents, which are written for HLS synthesis. The good news is that the API is similar, so porting algorithms isn’t too difficult. You typically keep the algorithm structure the same but replace OpenCV function calls with xf::cv equivalents. The Vitis Vision Library covers most commonly used OpenCV functions for filtering, transforms, and feature detection.

What’s the difference between PYNQ overlays and Vitis acceleration?

PYNQ overlays are pre-built FPGA bitstreams that you load at runtime using Python. They’re convenient for rapid prototyping because someone else has already done the hardware design. Vitis acceleration involves creating custom hardware accelerators using HLS and integrating them into your own design. It offers more flexibility but requires more development effort. Many developers start with PYNQ overlays to validate their algorithms, then create custom Vitis accelerators for production designs where they need specific optimizations.

How do I choose between Zynq MIPI and HDMI for camera input?

Zynq MIPI CSI-2 is the native interface for most modern image sensors and provides the best integration for custom camera designs. It’s compact, low-power, and supports high bandwidth. HDMI input is better when you’re working with standard video sources like cameras with HDMI output, capture cards, or development/testing scenarios where you want flexibility in video sources. MIPI requires more hardware design effort (especially on Zynq-7000 which needs external PHY components), while HDMI interfaces are well-supported by existing IP cores and development boards.

What frame rates can I achieve with Zynq OpenCV hardware acceleration?

Frame rates depend on resolution, algorithm complexity, and how well your pipeline is optimized. For 1080p video with typical filtering operations (color conversion, blur, edge detection), you can achieve 60 fps with a single-pixel-per-clock design and 120+ fps with multi-pixel-per-clock implementations. More complex algorithms like optical flow or stereo vision may be limited to 30-60 fps at 1080p. The Zynq UltraScale+ devices with larger PL resources can handle 4K60 processing for many algorithms. Always profile your specific pipeline to understand bottlenecks.

Do I need to know Verilog or VHDL for Zynq OpenCV acceleration?

Not necessarily. High-Level Synthesis (HLS) lets you write C/C++ code that synthesizes to hardware. The Vitis Vision Library provides ready-to-use functions that you can integrate with minimal HDL knowledge. However, understanding basic FPGA concepts helps tremendously when debugging timing issues, optimizing resource usage, or integrating IP blocks in Vivado. For simple projects using PYNQ and existing overlays, you can work entirely in Python without touching any HDL. For custom high-performance designs, some Vivado block design experience is beneficial even if you don’t write RTL code directly.

Moving Forward with Zynq OpenCV

Hardware-accelerated image processing on Zynq opens possibilities that pure software implementations can’t match. The combination of ARM processors for flexibility and FPGA fabric for raw throughput creates a platform suitable for everything from industrial inspection systems to autonomous robots.

Start with PYNQ if you’re new to the platform. The Jupyter notebook environment lets you experiment with Zynq Python and OpenCV without complex toolchain setup. As your projects mature, move critical processing stages to hardware using the Vitis Vision Library.

The Zynq MIPI interfaces connect directly to modern image sensors, while HDMI provides convenient development and testing options. Whether you’re building a simple edge detector or a complex multi-camera system, the architectural patterns remain similar: capture in hardware, process through streaming pipelines, and make decisions in software.

Recommended Development Boards for Zynq OpenCV

Choosing the right development board accelerates your Zynq OpenCV projects. Here are boards I’ve worked with that offer good video capabilities:

Entry-Level Boards

| Board | Zynq Device | Video Interfaces | Price Range |
|---|---|---|---|
| PYNQ-Z2 | XC7Z020 | HDMI In/Out | $120-150 |
| Arty Z7-20 | XC7Z020 | HDMI Out, Pmod | $130-160 |
| Zybo Z7-20 | XC7Z020 | HDMI In/Out, Pcam | $200-250 |

Professional Boards

| Board | Zynq Device | Video Interfaces | Price Range |
|---|---|---|---|
| ZCU104 | XCZU7EV | HDMI, DisplayPort, FMC | $1,200-1,500 |
| Kria KV260 | XCK26 | MIPI CSI, DisplayPort | $250-300 |
| Ultra96-V2 | XCZU3EG | MIPI CSI, DisplayPort | $250-300 |

The Kria KV260 deserves special mention for vision applications. It includes the Raspberry Pi camera connector, making it easy to interface common MIPI camera modules. The included reference designs demonstrate Zynq OpenCV acceleration out of the box.

Optimization Tips for Real-Time Performance

After building many vision systems, I’ve collected these practical optimization strategies:

Memory Bandwidth Management

Video processing consumes enormous memory bandwidth. A 1080p60 RGB stream requires 373 MB/s just for raw pixel data. Add processing stages that read and write intermediate results, and bandwidth demands multiply quickly.

| Optimization | Bandwidth Impact |
|---|---|
| Process in streaming mode | Eliminates intermediate frame buffers |
| Use on-chip line buffers | Reduces DDR access for filter kernels |
| Increase pixels per clock | Reduces transaction overhead |
| Enable AXI burst transfers | Improves DDR efficiency |
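The 373 MB/s figure, and how quickly DDR round-trips multiply it, is easy to verify:

```python
# Raw bandwidth of a 1080p60 RGB stream
width, height, fps, bytes_per_pixel = 1920, 1080, 60, 3
stream_mb_s = width * height * bytes_per_pixel * fps / 1e6
print(f"One stream: {stream_mb_s:.0f} MB/s")  # ~373 MB/s

# Each stage that writes an intermediate frame to DDR and reads it
# back adds a full write plus a full read of the stream
stages_through_ddr = 3
total = stream_mb_s * (1 + 2 * stages_through_ddr)
print(f"With {stages_through_ddr} DDR round-trips: {total:.0f} MB/s")
```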

Clock Domain Planning

Video pipelines often involve multiple clock domains: the pixel clock from MIPI/HDMI, the PL fabric clock, and the AXI interconnect clock. Proper FIFO placement prevents data corruption at domain crossings.

Resource Utilization Balance

The Zynq-7020 (common on PYNQ boards) provides 53,200 LUTs and 220 DSP slices. A single hardware Gaussian blur uses approximately 2,000 LUTs and 0 DSPs. A Sobel filter needs around 1,500 LUTs and 4 DSPs. Plan your pipeline based on available resources, leaving headroom for timing closure.
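A rough budget check takes a few lines of Python; the Gaussian and Sobel figures come from above, while the threshold-stage cost is an illustrative assumption:

```python
# Rough PL budget check for a Zynq-7020 vision pipeline
TOTAL_LUTS, TOTAL_DSPS = 53200, 220

stages = {                        # (LUTs, DSPs) per stage
    "gaussian_blur": (2000, 0),
    "sobel":         (1500, 4),
    "threshold":     (300, 0),    # assumed figure
}

used_luts = sum(luts for luts, _ in stages.values())
used_dsps = sum(dsps for _, dsps in stages.values())
print(f"LUTs: {used_luts}/{TOTAL_LUTS} ({100 * used_luts / TOTAL_LUTS:.1f}%)")
print(f"DSPs: {used_dsps}/{TOTAL_DSPS} ({100 * used_dsps / TOTAL_DSPS:.1f}%)")
```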

Debug and Verification Strategies

Hardware image processing introduces debugging challenges that software developers don’t typically encounter. Here are approaches that have saved me countless hours:

Simulation with Test Images

Always simulate your HLS designs with real image data before synthesis. The Vitis Vision Library includes testbenches that read standard image formats:

```cpp
// Testbench: read a test image with standard OpenCV, convert it to
// the HLS stream format, and run C simulation against the kernel
cv::Mat src = cv::imread("test_image.png");
```

ILA Integration for Runtime Debug

Xilinx Integrated Logic Analyzer (ILA) cores capture signals in the running hardware. Insert ILA probes at AXI Stream interfaces to verify pixel data flows correctly through your pipeline.

Frame Buffer Inspection

When debugging display issues, dump frame buffer contents to files and examine them offline. Incorrect pixel formats, byte ordering issues, and timing glitches become obvious when you can compare expected versus actual image data.
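A minimal sketch of the dump step, assuming the HDMI input has been configured and started as in the earlier pass-through example:

```python
import cv2
import numpy as np

# Grab one frame from the running pipeline and save it for offline
# inspection; wrong pixel formats and byte-ordering bugs show up
# immediately when compared against a known-good reference image
frame = hdmi_in.readframe()
cv2.imwrite("framebuffer_dump.png", np.asarray(frame))
```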
