
Zynq OpenCV Acceleration: Hardware Image Processing Tutorial

After implementing image processing pipelines on various platforms, I can say that Zynq OpenCV acceleration offers something unique: the ability to prototype algorithms in Python, then push compute-intensive operations to hardware without rewriting everything in HDL. This tutorial walks through practical approaches to hardware-accelerated image processing on Zynq, from capturing frames via Zynq MIPI interfaces to running Zynq Python algorithms through the PYNQ framework.

Why Hardware Acceleration Matters for Image Processing

Image processing demands enormous computational throughput. A single 1080p frame contains over 2 million pixels, each requiring multiple operations per algorithm stage. Running a Sobel edge detector at 60 fps means processing 124 million pixels per second. Software alone struggles to keep up.
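A quick back-of-the-envelope check confirms the throughput figure:

```python
# Pixel throughput required for 1080p at 60 fps
width, height, fps = 1920, 1080, 60
pixels_per_frame = width * height            # 2,073,600 pixels
pixels_per_second = pixels_per_frame * fps   # 124,416,000 pixels/s
print(f"{pixels_per_second / 1e6:.1f} Mpixel/s")  # ~124.4 Mpixel/s
```

And that is per algorithm stage; a multi-stage pipeline multiplies the demand.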

Zynq OpenCV Performance Comparison

| Implementation | 1080p Sobel Edge Detection | Power Consumption |
|---|---|---|
| ARM Cortex-A9 (Software) | 5-8 fps | ~2 W |
| ARM + NEON Optimization | 12-18 fps | ~2.5 W |
| PL Hardware Acceleration | 60+ fps | ~3 W |
| Full Pipeline in PL | 120+ fps | ~4 W |

The programmable logic handles pixel-level parallelism naturally. While the ARM core processes one pixel, the FPGA fabric processes thousands simultaneously. That’s the fundamental advantage of Zynq OpenCV acceleration.

Understanding the Zynq Image Processing Architecture

The Zynq architecture splits image processing responsibilities between the Processing System (PS) and Programmable Logic (PL). Getting the partition right determines whether your system achieves real-time performance.

Typical Processing Pipeline

| Stage | Best Location | Reason |
|---|---|---|
| Image Capture (MIPI/HDMI) | PL | High-speed serial interfaces |
| Color Space Conversion | PL | Pixel-parallel operations |
| Filtering (Blur, Sharpen) | PL | Convolution benefits from parallelism |
| Feature Detection | PL or PS | Depends on algorithm complexity |
| Object Classification | PS | Complex decision logic |
| Display Output | PL | Timing-critical video signals |

The key insight is streaming data through PL processing blocks while keeping high-level decisions in software. AXI Stream interfaces connect processing stages, enabling data to flow without CPU intervention.

Setting Up Zynq MIPI Camera Interfaces

Modern image sensors use Zynq MIPI CSI-2 interfaces for high-bandwidth data transfer. Implementing MIPI on Zynq requires understanding both the physical layer (D-PHY) and protocol layer (CSI-2).

Zynq MIPI Implementation Options

| Zynq Family | D-PHY Support | Implementation Method |
|---|---|---|
| Zynq-7000 | External PHY required | Resistor network or dedicated PHY IC |
| Zynq UltraScale+ | Native I/O support | Direct connection to HP bank I/Os |

For Zynq-7000 designs, the external resistor network approach works for data rates up to roughly 800 Mbps per lane. Higher speeds require a dedicated D-PHY chip such as the Meticom MC20901.

MIPI CSI-2 IP Core Configuration

The Xilinx MIPI CSI-2 RX Subsystem IP (license-free since Vivado 2020.1) handles protocol decoding. Key configuration parameters:

| Parameter | Typical Value | Notes |
|---|---|---|
| Number of Lanes | 2 or 4 | Match sensor configuration |
| Line Rate | 800-1500 Mbps | Per-lane speed |
| Pixel Format | RAW10, RAW12 | Bayer pattern from sensor |
| Pixels Per Clock | 2 or 4 | Higher = more resources, higher throughput |

The IP outputs AXI4-Stream video data ready for downstream processing blocks.

Camera Sensor Initialization

Most MIPI sensors require I2C configuration before streaming. The Zynq I2C controller connects to the sensor’s CCI (Camera Control Interface):

```c
// Example: send an initialization sequence to a Sony IMX219 sensor.
// Iic is an already-initialized XIicPs instance; init_sequence holds
// the register address/value pairs to write.
XIicPs_MasterSend(&Iic, init_sequence, sizeof(init_sequence), SENSOR_I2C_ADDR);
```

Sensor initialization sequences are typically sensor-specific and available in manufacturer datasheets or Linux kernel drivers.
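The same kind of CCI write can also be issued from Linux user space on the Zynq. Below is a minimal sketch using the smbus2 package; the I2C bus number, sensor address, and the mode-select register are assumptions to verify against your board's device tree and the sensor datasheet:

```python
# Minimal sketch: configure a MIPI sensor over I2C from Linux user space.
# Assumes the sensor's CCI bus appears as /dev/i2c-1 and the sensor
# answers at address 0x10 (both board-specific).
from smbus2 import SMBus, i2c_msg

SENSOR_ADDR = 0x10

def write_reg16(bus, reg, value):
    """Write one byte to a 16-bit register address (CCI convention)."""
    msg = i2c_msg.write(SENSOR_ADDR, [reg >> 8, reg & 0xFF, value])
    bus.i2c_rdwr(msg)

with SMBus(1) as bus:
    # 0x0100 is the IMX219 mode-select register; writing 1 starts streaming.
    # Full init sequences come from the datasheet or the Linux kernel driver.
    write_reg16(bus, 0x0100, 0x01)
```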


Zynq Python Development with PYNQ

PYNQ (Python + Zynq) revolutionizes how we develop Zynq OpenCV applications. Instead of writing C code and cross-compiling, you write Zynq Python directly on the board using Jupyter notebooks.

PYNQ Architecture Overview

| Component | Function |
|---|---|
| Linux OS | Base operating system on ARM |
| Jupyter Notebook | Browser-based Python IDE |
| PYNQ Libraries | Python wrappers for hardware control |
| Overlays | Pre-built FPGA bitstreams |
| OpenCV | Standard computer vision library |

The overlay concept is powerful. Hardware designs are packaged as overlays that Python code can load dynamically. Switch between different hardware configurations without rebooting.
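Loading an overlay and inspecting what it exposes takes only a few lines; `base.bit` here is the standard PYNQ base overlay shipped with the board image:

```python
from pynq import Overlay

# Loading a bitstream reprograms the PL on the spot; no reboot needed
overlay = Overlay("base.bit")

# List the IP cores the overlay exposes to Python
print(list(overlay.ip_dict.keys()))
```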

Installing OpenCV on PYNQ

OpenCV comes pre-installed on recent PYNQ images, but you may need to update it:

```python
# Check the installed OpenCV version
import cv2
print(cv2.__version__)

# If an update is needed, run in a terminal:
# pip3 install --upgrade opencv-python
```

Basic Zynq Python Image Processing

Here’s a simple motion detection example using Zynq Python and OpenCV:

```python
import cv2
import numpy as np
from pynq.overlays.base import BaseOverlay

# Load the base overlay
base = BaseOverlay("base.bit")

# Initialize video capture
cap = cv2.VideoCapture(0)  # USB camera

# Read the reference frame
ret, frame1 = cap.read()
gray1 = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)
gray1 = cv2.GaussianBlur(gray1, (21, 21), 0)

while True:
    ret, frame2 = cap.read()
    if not ret:
        break
    gray2 = cv2.cvtColor(frame2, cv2.COLOR_BGR2GRAY)
    gray2 = cv2.GaussianBlur(gray2, (21, 21), 0)

    # Difference against the previous frame
    diff = cv2.absdiff(gray1, gray2)
    thresh = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)[1]

    # Box any region large enough to count as motion
    contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    for contour in contours:
        if cv2.contourArea(contour) > 500:
            x, y, w, h = cv2.boundingRect(contour)
            cv2.rectangle(frame2, (x, y), (x + w, y + h), (0, 255, 0), 2)

    gray1 = gray2

cap.release()
```

This software-only implementation achieves around 8-12 fps. Let’s accelerate it.

Hardware Acceleration with Vitis Vision Library

The Vitis Vision Library (formerly xfOpenCV) provides HLS-synthesizable versions of OpenCV functions. These compile directly to FPGA hardware.

Supported Zynq OpenCV Functions

| Category | Functions |
|---|---|
| Filters | GaussianBlur, Sobel, Median, Bilateral |
| Transforms | Resize, Remap, WarpAffine, WarpPerspective |
| Feature Detection | Harris, FAST, Canny, HoughLines |
| Color Conversion | BGR2Gray, BGR2HSV, Bayer2RGB |
| Arithmetic | AbsDiff, Add, Subtract, Threshold |
| Morphology | Erode, Dilate, Open, Close |

Creating an Accelerated Filter in Vitis HLS

Here’s how to create a hardware-accelerated Gaussian blur:

```cpp
#include "hls_stream.h"
#include "ap_axi_sdata.h"
#include "common/xf_common.hpp"
#include "common/xf_infra.hpp"
#include "imgproc/xf_gaussian_filter.hpp"

#define MAX_HEIGHT 1080
#define MAX_WIDTH  1920

void gaussian_accel(
    hls::stream<ap_axiu<24, 1, 1, 1>> &src,
    hls::stream<ap_axiu<24, 1, 1, 1>> &dst,
    int rows, int cols)
{
#pragma HLS INTERFACE axis port=src
#pragma HLS INTERFACE axis port=dst
#pragma HLS INTERFACE s_axilite port=rows
#pragma HLS INTERFACE s_axilite port=cols
#pragma HLS INTERFACE s_axilite port=return

    xf::cv::Mat<XF_8UC3, MAX_HEIGHT, MAX_WIDTH, XF_NPPC1> imgInput(rows, cols);
    xf::cv::Mat<XF_8UC3, MAX_HEIGHT, MAX_WIDTH, XF_NPPC1> imgOutput(rows, cols);

#pragma HLS DATAFLOW
    // AXI Stream in -> 5x5 Gaussian blur -> AXI Stream out
    xf::cv::AXIvideo2xfMat(src, imgInput);
    xf::cv::GaussianBlur<5, XF_BORDER_CONSTANT, XF_8UC3,
                         MAX_HEIGHT, MAX_WIDTH, XF_NPPC1>(imgInput, imgOutput, 1.0f);
    xf::cv::xfMat2AXIvideo(imgOutput, dst);
}
```

The pragmas tell HLS how to synthesize the design:

  • INTERFACE axis: Creates AXI Stream ports
  • INTERFACE s_axilite: Creates register interface for control
  • DATAFLOW: Enables pipelining between functions

Integration with PYNQ Overlays

After synthesizing in Vivado, package the design as a PYNQ overlay:

```python
from pynq import Overlay, allocate
import numpy as np

# Load the custom overlay
ol = Overlay("gaussian_accel.bit")

# Get a handle to the accelerator's AXI-Lite control interface
gaussian = ol.gaussian_accel_0

# Allocate physically contiguous DMA buffers
in_buffer = allocate(shape=(1080, 1920, 3), dtype=np.uint8)
out_buffer = allocate(shape=(1080, 1920, 3), dtype=np.uint8)

# Copy the input image into the buffer
in_buffer[:] = input_image

# Configure and start the accelerator
# (register offsets come from the HLS-generated s_axilite address map)
gaussian.write(0x10, 1080)  # rows
gaussian.write(0x18, 1920)  # cols
gaussian.write(0x00, 0x01)  # ap_start

# Wait for completion (ap_done is bit 1 of the control register)
while (gaussian.read(0x00) & 0x02) == 0:
    pass

# Read back the result
output_image = np.array(out_buffer)
```
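One caveat: the accelerator's src and dst ports are AXI Stream, so the allocated buffers have to be moved through a DMA engine rather than read by the IP directly. Here is a sketch of that step, assuming the block design includes an AXI DMA instance named `axi_dma_0` wired to the accelerator's stream ports:

```python
# Hypothetical sketch: stream the buffers through an AXI DMA.
# Assumes the block design contains an AXI DMA named axi_dma_0
# connected to the accelerator's src/dst AXI Stream ports.
dma = ol.axi_dma_0

gaussian.write(0x10, 1080)  # rows
gaussian.write(0x18, 1920)  # cols
gaussian.write(0x00, 0x01)  # ap_start

dma.sendchannel.transfer(in_buffer)    # PS memory -> accelerator
dma.recvchannel.transfer(out_buffer)   # accelerator -> PS memory
dma.sendchannel.wait()
dma.recvchannel.wait()
```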

Building a Complete Vision Pipeline

Let’s put together a complete hardware-accelerated pipeline for edge detection with Zynq OpenCV.

Pipeline Architecture

| Stage | Implementation | Interface |
|---|---|---|
| HDMI Input | Video DMA IP | AXI Stream |
| Color Convert | HLS RGB2Gray | AXI Stream |
| Gaussian Blur | HLS GaussianBlur | AXI Stream |
| Sobel Filter | HLS Sobel | AXI Stream |
| Threshold | HLS Threshold | AXI Stream |
| HDMI Output | Video DMA IP | AXI Stream |

Each stage connects via AXI Stream, creating a pixel-streaming pipeline. The VDMA handles frame buffering in DDR memory.
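On PYNQ boards, the HDMI ends of such a pipeline can be driven from Python. A minimal pass-through sketch, assuming the PYNQ-Z2 base overlay (the processing stages would sit between the read and the write):

```python
from pynq.overlays.base import BaseOverlay
from pynq.lib.video import PIXEL_RGB

base = BaseOverlay("base.bit")
hdmi_in = base.video.hdmi_in
hdmi_out = base.video.hdmi_out

# Negotiate the input mode, mirror it on the output, start both ends
hdmi_in.configure(PIXEL_RGB)
hdmi_out.configure(hdmi_in.mode)
hdmi_in.start()
hdmi_out.start()

# Pass frames straight through for ~10 seconds at 60 fps
for _ in range(600):
    frame = hdmi_in.readframe()
    hdmi_out.writeframe(frame)

hdmi_in.close()
hdmi_out.close()
```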


Performance Results

| Configuration | Frame Rate | Latency |
|---|---|---|
| Software Only | 8 fps | 125 ms |
| HDMI In + SW Processing | 15 fps | 67 ms |
| Full HW Pipeline | 60 fps | 16.7 ms |
| Optimized (2 PPC) | 120 fps | 8.3 ms |

The hardware pipeline achieves real-time 60 fps with minimal CPU involvement. The ARM cores remain available for higher-level tasks like object classification.

Essential Resources for Zynq OpenCV Development

Official Documentation

| Resource | Description |
|---|---|
| UG1233 | Vitis Vision Library User Guide |
| UG902 | Vivado HLS User Guide |
| PG232 | MIPI CSI-2 RX Subsystem Product Guide |
| PG238 | MIPI DSI TX Subsystem Product Guide |

Download Links

Vitis Vision Library: https://github.com/Xilinx/Vitis_Libraries/tree/main/vision

PYNQ Framework: http://www.pynq.io/

PYNQ SD Card Images: https://github.com/Xilinx/PYNQ/releases

OpenCV for Zynq: https://github.com/Xilinx/Vitis-AI/tree/master/examples

Xilinx Embedded Software: https://github.com/Xilinx/embeddedsw

Adam Taylor’s PYNQ OpenCV Project: https://github.com/ATaylorCEngworking/pynq_cv

Frequently Asked Questions

Can I use standard OpenCV code with Zynq acceleration?

Not directly. Standard OpenCV functions run on the ARM processor as software. To accelerate them, you need to use the Vitis Vision Library equivalents, which are written for HLS synthesis. The good news is that the API is similar, so porting algorithms isn’t too difficult. You typically keep the algorithm structure the same but replace OpenCV function calls with xf::cv equivalents. The Vitis Vision Library covers most commonly used OpenCV functions for filtering, transforms, and feature detection.

What’s the difference between PYNQ overlays and Vitis acceleration?

PYNQ overlays are pre-built FPGA bitstreams that you load at runtime using Python. They’re convenient for rapid prototyping because someone else has already done the hardware design. Vitis acceleration involves creating custom hardware accelerators using HLS and integrating them into your own design. It offers more flexibility but requires more development effort. Many developers start with PYNQ overlays to validate their algorithms, then create custom Vitis accelerators for production designs where they need specific optimizations.

How do I choose between Zynq MIPI and HDMI for camera input?

Zynq MIPI CSI-2 is the native interface for most modern image sensors and provides the best integration for custom camera designs. It’s compact, low-power, and supports high bandwidth. HDMI input is better when you’re working with standard video sources like cameras with HDMI output, capture cards, or development/testing scenarios where you want flexibility in video sources. MIPI requires more hardware design effort (especially on Zynq-7000 which needs external PHY components), while HDMI interfaces are well-supported by existing IP cores and development boards.

What frame rates can I achieve with Zynq OpenCV hardware acceleration?

Frame rates depend on resolution, algorithm complexity, and how well your pipeline is optimized. For 1080p video with typical filtering operations (color conversion, blur, edge detection), you can achieve 60 fps with a single-pixel-per-clock design and 120+ fps with multi-pixel-per-clock implementations. More complex algorithms like optical flow or stereo vision may be limited to 30-60 fps at 1080p. The Zynq UltraScale+ devices with larger PL resources can handle 4K60 processing for many algorithms. Always profile your specific pipeline to understand bottlenecks.

Do I need to know Verilog or VHDL for Zynq OpenCV acceleration?

Not necessarily. High-Level Synthesis (HLS) lets you write C/C++ code that synthesizes to hardware. The Vitis Vision Library provides ready-to-use functions that you can integrate with minimal HDL knowledge. However, understanding basic FPGA concepts helps tremendously when debugging timing issues, optimizing resource usage, or integrating IP blocks in Vivado. For simple projects using PYNQ and existing overlays, you can work entirely in Python without touching any HDL. For custom high-performance designs, some Vivado block design experience is beneficial even if you don’t write RTL code directly.

Moving Forward with Zynq OpenCV

Hardware-accelerated image processing on Zynq opens possibilities that pure software implementations can’t match. The combination of ARM processors for flexibility and FPGA fabric for raw throughput creates a platform suitable for everything from industrial inspection systems to autonomous robots.

Start with PYNQ if you’re new to the platform. The Jupyter notebook environment lets you experiment with Zynq Python and OpenCV without complex toolchain setup. As your projects mature, move critical processing stages to hardware using the Vitis Vision Library.

The Zynq MIPI interfaces connect directly to modern image sensors, while HDMI provides convenient development and testing options. Whether you’re building a simple edge detector or a complex multi-camera system, the architectural patterns remain similar: capture in hardware, process through streaming pipelines, and make decisions in software.

Recommended Development Boards for Zynq OpenCV

Choosing the right development board accelerates your Zynq OpenCV projects. Here are boards I’ve worked with that offer good video capabilities:

Entry-Level Boards

| Board | Zynq Device | Video Interfaces | Price Range |
|---|---|---|---|
| PYNQ-Z2 | XC7Z020 | HDMI In/Out | $120-150 |
| Arty Z7-20 | XC7Z020 | HDMI Out, Pmod | $130-160 |
| Zybo Z7-20 | XC7Z020 | HDMI In/Out, Pcam | $200-250 |

Professional Boards

| Board | Zynq Device | Video Interfaces | Price Range |
|---|---|---|---|
| ZCU104 | XCZU7EV | HDMI, DisplayPort, FMC | $1,200-1,500 |
| Kria KV260 | XCK26 | MIPI CSI, DisplayPort | $250-300 |
| Ultra96-V2 | XCZU3EG | MIPI CSI, DisplayPort | $250-300 |

The Kria KV260 deserves special mention for vision applications. It includes the Raspberry Pi camera connector, making it easy to interface common MIPI camera modules. The included reference designs demonstrate Zynq OpenCV acceleration out of the box.

Optimization Tips for Real-Time Performance

After building many vision systems, I’ve collected these practical optimization strategies:

Memory Bandwidth Management

Video processing consumes enormous memory bandwidth. A 1080p60 RGB stream requires 373 MB/s just for raw pixel data. Add processing stages that read and write intermediate results, and bandwidth demands multiply quickly.

| Optimization | Bandwidth Impact |
|---|---|
| Process in streaming mode | Eliminates intermediate frame buffers |
| Use on-chip line buffers | Reduces DDR access for filter kernels |
| Increase pixels per clock | Reduces transaction overhead |
| Enable AXI burst transfers | Improves DDR efficiency |
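The 373 MB/s figure, and how quickly DDR round-trips multiply it, is easy to verify:

```python
# Raw bandwidth of a 1080p60 RGB stream
width, height, fps, bytes_per_pixel = 1920, 1080, 60, 3
stream_mb_s = width * height * bytes_per_pixel * fps / 1e6
print(f"One stream: {stream_mb_s:.0f} MB/s")  # ~373 MB/s

# Each stage that writes an intermediate frame to DDR and reads it
# back adds a full write plus a full read of the stream
stages_through_ddr = 3
total = stream_mb_s * (1 + 2 * stages_through_ddr)
print(f"With {stages_through_ddr} DDR round-trips: {total:.0f} MB/s")
```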

Clock Domain Planning

Video pipelines often involve multiple clock domains: the pixel clock from MIPI/HDMI, the PL fabric clock, and the AXI interconnect clock. Proper FIFO placement prevents data corruption at domain crossings.

Resource Utilization Balance

The Zynq-7020 (common on PYNQ boards) provides 53,200 LUTs and 220 DSP slices. A single hardware Gaussian blur uses approximately 2,000 LUTs and 0 DSPs. A Sobel filter needs around 1,500 LUTs and 4 DSPs. Plan your pipeline based on available resources, leaving headroom for timing closure.
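A rough budget check takes a few lines of Python; the Gaussian and Sobel figures come from above, while the threshold-stage cost is an illustrative assumption:

```python
# Rough PL budget check for a Zynq-7020 vision pipeline
TOTAL_LUTS, TOTAL_DSPS = 53200, 220

stages = {                        # (LUTs, DSPs) per stage
    "gaussian_blur": (2000, 0),
    "sobel":         (1500, 4),
    "threshold":     (300, 0),    # assumed figure
}

used_luts = sum(luts for luts, _ in stages.values())
used_dsps = sum(dsps for _, dsps in stages.values())
print(f"LUTs: {used_luts}/{TOTAL_LUTS} ({100 * used_luts / TOTAL_LUTS:.1f}%)")
print(f"DSPs: {used_dsps}/{TOTAL_DSPS} ({100 * used_dsps / TOTAL_DSPS:.1f}%)")
```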

Debug and Verification Strategies

Hardware image processing introduces debugging challenges that software developers don’t typically encounter. Here are approaches that have saved me countless hours:

Simulation with Test Images

Always simulate your HLS designs with real image data before synthesis. The Vitis Vision Library includes testbenches that read standard image formats:

```cpp
// Testbench: read a test image with standard OpenCV, convert it to
// the HLS stream format, and run C simulation against the kernel
cv::Mat src = cv::imread("test_image.png");
```

ILA Integration for Runtime Debug

Xilinx Integrated Logic Analyzer (ILA) cores capture signals in the running hardware. Insert ILA probes at AXI Stream interfaces to verify pixel data flows correctly through your pipeline.

Frame Buffer Inspection

When debugging display issues, dump frame buffer contents to files and examine them offline. Incorrect pixel formats, byte ordering issues, and timing glitches become obvious when you can compare expected versus actual image data.
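A minimal sketch of the dump step, assuming the HDMI input has been configured and started as in the earlier pass-through example:

```python
import cv2
import numpy as np

# Grab one frame from the running pipeline and save it for offline
# inspection; wrong pixel formats and byte-ordering bugs show up
# immediately when compared against a known-good reference image
frame = hdmi_in.readframe()
cv2.imwrite("framebuffer_dump.png", np.asarray(frame))
```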
