Inquire: Call 0086-755-23203480, or reach out via the form below/your sales contact to discuss our design, manufacturing, and assembly capabilities.
Quote: Email your PCB files to Sales@pcbsync.com (preferred for large files) or submit online. We will contact you promptly. Please ensure your email address is correct.
Notes: For PCB fabrication, we require PCB design files in Gerber RS-274X format (most preferred), *.PCB/*.DDB (Protel; please state your program version), or *.BRD (Eagle) format. For PCB assembly, we also require the drilling file and BOM; click to download the BOM template. To avoid missing files, please place all files in one folder and compress it into a .zip or .rar archive.
ZCU104: AI/ML Focused Zynq UltraScale+ Development Board
The ZCU104 has earned its place as the go-to development platform for engineers building AI-powered embedded vision systems. Having deployed multiple machine learning applications on this platform, I’ve come to appreciate why AMD designed it specifically for edge inference workloads.
This comprehensive guide covers everything engineers need to know about the Xilinx ZCU104—from hardware specifications and video codec capabilities to deploying neural networks with Vitis AI. Whether you’re evaluating the ZCU104 price against alternatives or ready to start your first DPU project, this article provides the technical foundation for successful development.
The ZCU104 stands apart from other Zynq UltraScale+ boards because AMD designed it from the ground up for embedded vision and machine learning applications. The XCZU7EV device at its core includes features specifically optimized for these workloads.
The EV Advantage: Hardware Video Codec
Unlike EG-variant devices found on boards like the ZCU102, the Zynq ZCU104 uses an EV-variant device with integrated H.264/H.265 video codec. This hardware Video Codec Unit (VCU) handles video encoding and decoding without consuming FPGA resources or processor cycles.
| VCU Capability | Specification |
| --- | --- |
| Encode/Decode | Simultaneous |
| Maximum Resolution | 4K @ 60 fps |
| Supported Standards | H.264 (AVC), H.265 (HEVC) |
| Multi-Stream | Up to 8× 1080p @ 30 fps |
| Bit Depth | 8-bit and 10-bit |
| Chroma | 4:2:0, 4:2:2 |
For surveillance cameras, drones, and ADAS applications, this hardware codec is invaluable. Processing 4K video streams in software would consume significant ARM processor resources, leaving little headroom for machine learning inference. The VCU frees both the processors and FPGA fabric for AI acceleration.
UltraRAM: Essential for Vision Processing
EV devices include UltraRAM, which the ZCU104 FPGA uses extensively for image processing pipelines:
| Resource | XCZU7EV Capacity | Use Case |
| --- | --- | --- |
| Block RAM | 11.0 Mb | Line buffers, small FIFOs |
| UltraRAM | 27.0 Mb | Frame buffers, feature maps |
| DSP Slices | 1,728 | Convolution operations |
UltraRAM blocks provide 288 Kb each—8× denser than 36 Kb Block RAM. For DPU implementations storing activation maps and feature data, this additional on-chip memory significantly improves inference performance by reducing external memory bandwidth requirements.
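These density figures are easy to sanity-check. The back-of-envelope sketch below (pure Python; the 56×56×64 INT8 activation shape is an illustrative example, not a measured DPU allocation) counts how many UltraRAM versus Block RAM blocks a single feature map would occupy:

```python
# Back-of-envelope sizing: how many on-chip RAM blocks one DPU feature map
# needs. Figures from the text: UltraRAM = 288 Kb/block, Block RAM = 36 Kb/block.
KB = 1024  # bits per kilobit

def blocks_needed(feature_map_bits, block_kb):
    """Round up to whole memory blocks of block_kb kilobits each."""
    block_bits = block_kb * KB
    return -(-feature_map_bits // block_bits)  # ceiling division

# Example: a 56x56x64 INT8 activation map (a typical early-layer shape).
fmap_bits = 56 * 56 * 64 * 8
print(blocks_needed(fmap_bits, 288))  # UltraRAM blocks -> 6
print(blocks_needed(fmap_bits, 36))   # Block RAM blocks -> 44
```

Six UltraRAM blocks versus forty-four Block RAMs for the same buffer illustrates why EV-class devices suit DPU designs.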
ZCU104 Hardware Specifications
Understanding the complete ZCU104 hardware capabilities helps determine suitability for your application.
Processing Architecture
The XCZU7EV-2FFVC1156 MPSoC combines multiple processing domains:
| Processing Element | Specifications |
| --- | --- |
| Application Processor | Quad-core ARM Cortex-A53 @ 1.5 GHz |
| Real-Time Processor | Dual-core ARM Cortex-R5F @ 600 MHz |
| Graphics Processor | Mali-400 MP2 GPU |
| Video Codec Unit | 4Kp60 H.264/H.265 |
| FPGA Fabric | 504K logic cells |
| DSP Slices | 1,728 |
| Block RAM | 11.0 Mb |
| UltraRAM | 27.0 Mb |
The quad-core Cortex-A53 handles Linux, application logic, and pre/post-processing for ML inference. The Cortex-R5F cores provide deterministic real-time processing when needed. The Mali-400 GPU accelerates display rendering and basic image operations.
Memory Configuration
The Xilinx ZCU104 provides substantial memory for AI workloads:
| Memory Interface | Capacity | Width | Speed |
| --- | --- | --- | --- |
| PS DDR4 Component | 2 GB | 64-bit | 2400 MT/s |
| PL DDR4 SODIMM | Expandable | 64-bit | 2400 MT/s |
| Quad-SPI Flash | 128 MB | x8 | – |
The PS-side 2 GB DDR4 connects directly to the processing system for Linux operation, application code, and inference model weights. The PL-side SODIMM socket allows adding memory dedicated to FPGA fabric operations—useful for applications requiring high-bandwidth video frame buffers.
Display and Video Interfaces
Video connectivity on the ZCU104 supports complete vision pipelines:
| Interface | Direction | Specifications |
| --- | --- | --- |
| DisplayPort 1.2a | Output | Up to 4K @ 30 fps |
| HDMI 2.0 | Input/Output | Via 3× GTH transceivers |
| USB 3.0 | Input | Includes USB camera |
| MIPI CSI-2 | Input | Via FMC expansion |
The included 1080p60 USB 3.0 camera enables immediate prototyping without additional hardware purchases—a thoughtful inclusion for AI/ML development.
Expansion and Connectivity
| Interface | Specifications |
| --- | --- |
| FMC LPC | 1× GTH transceiver, 68 user I/O |
| PMOD | 3× 12-pin headers |
| M.2 (SATA) | For SSD storage |
| Ethernet | Gigabit RJ45 |
| USB | USB 3.0 + USB 2.0 |
The FMC LPC connector provides expansion for camera interfaces, additional sensors, or custom I/O. While limited to one GTH transceiver (versus 16 on ZCU102), this tradeoff keeps costs manageable for vision-focused applications that don’t require high-speed serial connectivity.
When evaluating the ZCU104 price against alternatives, consider the complete value proposition.
Current Pricing
| Source | Approximate Price |
| --- | --- |
| AMD Direct | $1,895 MSRP |
| DigiKey | ~$1,899 |
| Mouser | ~$1,899 |
| Avnet | ~$1,895 |
Prices fluctuate based on availability and region. The ZCU104 price includes Vivado Design Suite: Design Edition license (node-locked, device-locked) plus access to SDSoC/Vitis development environments.
Comparing ZCU104 vs ZCU102 vs ZCU106
| Feature | ZCU104 | ZCU102 | ZCU106 |
| --- | --- | --- | --- |
| Device | XCZU7EV | XCZU9EG | XCZU7EV |
| Video Codec | Yes (H.264/H.265) | No | Yes (H.264/H.265) |
| Logic Cells | 504K | 600K | 504K |
| UltraRAM | 27.0 Mb | 0 | 27.0 Mb |
| FMC GTH Lanes | 1 | 16 | 7 |
| SFP+ Ports | 0 | 4 | 2 |
| USB Camera Included | Yes | No | No |
| Price (approx.) | $1,899 | $2,995 | $3,570 |
Choose the ZCU104 when:

- Hardware video codec is required
- Cost optimization is important
- Limited high-speed serial I/O is acceptable
- AI/ML inference is the primary use case

Choose alternatives when:

- Maximum GTH transceivers are needed (ZCU102)
- Both video codec and substantial high-speed I/O are required (ZCU106)
- Larger FPGA fabric is essential (ZCU102)
The ZCU104 offers the best value for embedded vision and AI applications where the hardware codec justifies the tradeoff in high-speed connectivity.
Machine Learning with Vitis AI on the ZCU104
The ZCU104 FPGA serves as a primary development platform for AMD’s Vitis AI machine learning toolchain.
Understanding the DPU Architecture
Vitis AI deploys a Deep Learning Processing Unit (DPU) in the FPGA fabric. The DPU is a soft IP core optimized for CNN inference:
| DPU Feature | ZCU104 Configuration |
| --- | --- |
| Architecture | DPUCZDX8G |
| Recommended Config | B4096 |
| Clock Speed | Up to 300 MHz |
| Peak Performance | ~1,200 GOP/s per core |
| Cores Supported | 1–2 |
| Precision | INT8 |
The B4096 architecture provides 4096 operations per clock cycle. At 300 MHz, a single DPU core delivers approximately 1200 GOP/s peak performance. The Zynq ZCU104 can support dual DPU cores for applications requiring higher throughput.
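The arithmetic behind those figures is straightforward. This quick check (pure Python; the ~7.7 GOPs cost of one ResNet-50 inference is an assumed, commonly cited approximation) shows where the peak number comes from and why the measured ~150 fps sits below the ideal ceiling:

```python
# Sanity-check the peak-throughput claim: B4096 performs 4096 ops per cycle.
ops_per_cycle = 4096
clock_hz = 300e6
peak_gops = ops_per_cycle * clock_hz / 1e9
print(peak_gops)  # 1228.8, i.e. the "~1200 GOP/s per core" figure

# Idealized fps ceiling for ResNet-50, assuming ~7.7 GOPs per inference
# and 100% DPU utilization (real pipelines never reach this):
resnet50_gops = 7.7
print(round(peak_gops / resnet50_gops))  # ~160 fps theoretical upper bound
```

The gap between this 160 fps ceiling and the ~150 fps benchmark reflects scheduling, memory traffic, and layer-mix overheads.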
Typical Inference Performance
Real-world performance depends on network architecture and efficiency:
| Network | Single DPU (fps) | Dual DPU (fps) |
| --- | --- | --- |
| ResNet-50 | ~150 | ~280 |
| MobileNet-V2 | ~300 | ~550 |
| YOLOv4 | ~25 | ~45 |
| SSD-MobileNet | ~120 | ~220 |
These numbers represent typical measurements; actual performance varies based on implementation details, batch size, and pre/post-processing overhead.
Vitis AI Development Workflow
The development flow for deploying models on the ZCU104 follows these stages:
| Stage | Tool | Output |
| --- | --- | --- |
| Model Training | PyTorch/TensorFlow/Caffe | Float32 model |
| Quantization | Vitis AI Quantizer | INT8 model |
| Compilation | Vitis AI Compiler | .xmodel file |
| Deployment | VART Runtime | Executable inference |
Quantization converts 32-bit floating-point weights to 8-bit integers, reducing model size and enabling efficient DPU execution. The Vitis AI Quantizer performs calibration using a representative dataset to minimize accuracy loss.
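To make the idea concrete, here is an illustrative pure-Python sketch of symmetric INT8 quantization — a simplification of what the Vitis AI Quantizer does (the real tool also calibrates activations and targets power-of-two scales for the DPU):

```python
# Illustrative only: symmetric INT8 quantization with a single scale factor.
def quantize(values, num_bits=8):
    """Map floats to signed INT8 codes plus a shared scale factor."""
    qmax = 2 ** (num_bits - 1) - 1          # 127 for INT8
    scale = max(abs(v) for v in values) / qmax
    q = [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Reconstruct approximate float values from integer codes."""
    return [v * scale for v in q]

weights = [0.51, -1.27, 0.003, 0.9]
q, scale = quantize(weights)
print(q)  # [51, -127, 0, 90]
print(dequantize(q, scale))  # close to the originals, small rounding error
```

The reconstruction error per weight is bounded by half a quantization step, which is why representative calibration data matters: it determines the dynamic range the scale must cover.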
Compilation translates the quantized model into DPU instructions specific to the target architecture (B4096 for ZCU104). The output .xmodel file contains optimized instructions for the DPU.
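The compiler step itself is a one-line CLI call inside the Vitis AI Docker container. This small helper assembles the `vai_c_xir` invocation (the file names below are placeholders, not real project paths):

```python
# Hypothetical helper: build the Vitis AI compiler (vai_c_xir) command line
# targeting the ZCU104's DPUCZDX8G B4096 configuration.
def compile_command(xmodel, arch_json, out_dir, net_name):
    """Return vai_c_xir argv; run it inside the Vitis AI Docker container."""
    return ["vai_c_xir",
            "-x", xmodel,     # quantized model from the Vitis AI Quantizer
            "-a", arch_json,  # DPU arch file describing the B4096 target
            "-o", out_dir,    # output directory for the compiled .xmodel
            "-n", net_name]   # name for the compiled network

cmd = compile_command("quantized.xmodel", "arch.json", "compiled", "resnet50")
print(" ".join(cmd))
```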
Supported Frameworks and Models
| Framework | Version Support |
| --- | --- |
| TensorFlow | 1.15, 2.x |
| PyTorch | 1.x, 2.x |
| Caffe | 1.0 |
| ONNX | Via conversion |
The Vitis AI Model Zoo provides pre-trained and pre-compiled models ready for deployment:
| Category | Example Models |
| --- | --- |
| Classification | ResNet, MobileNet, VGG, Inception |
| Detection | YOLO, SSD, RetinaNet, RefineDet |
| Segmentation | UNet, FPN, DeepLabV3 |
| Pose Estimation | OpenPose, HRNet |
Pre-compiled models for ZCU104 can be downloaded directly, enabling immediate inference without the full compilation workflow.
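A minimal VART inference sketch looks like the following. The `run_inference` function is board-only code (it assumes the `vart`, `xir`, and `numpy` packages from a Vitis AI board image, a compiled model at a placeholder path, and omits fix-point output scaling for brevity); the softmax post-processing helper is plain Python:

```python
import math

def softmax(logits):
    """CPU post-processing: convert raw DPU output scores to probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def run_inference(xmodel_path, input_array):
    """Board-only sketch; simplified relative to production VART code."""
    import vart, xir
    import numpy as np
    graph = xir.Graph.deserialize(xmodel_path)
    # Pick the subgraph mapped to the DPU.
    dpu_sg = [s for s in graph.get_root_subgraph().toposort_child_subgraph()
              if s.has_attr("device") and s.get_attr("device") == "DPU"][0]
    runner = vart.Runner.create_runner(dpu_sg, "run")
    out_dims = tuple(runner.get_output_tensors()[0].dims)
    out_buf = np.zeros(out_dims, dtype=np.int8)
    job = runner.execute_async([input_array], [out_buf])  # queue one DPU job
    runner.wait(job)
    return softmax(out_buf.flatten().tolist())
```

The `execute_async`/`wait` pair is the heart of VART: jobs queue on the DPU while the ARM cores remain free for pre/post-processing.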
Embedded Vision Applications
The Xilinx ZCU104 targets specific embedded vision use cases.
Surveillance and Security
The combination of hardware video codec and DPU makes the ZCU104 ideal for intelligent video analytics:
- Multi-stream encoding (up to 8× 1080p)
- Real-time object detection
- Face recognition
- Behavior analysis
- Edge-based analytics reducing network bandwidth
Advanced Driver Assistance Systems (ADAS)
Automotive vision applications benefit from:
- Low-latency inference for safety-critical decisions
- Hardware codec for dashcam/DVR functionality
- Multiple camera input processing
- Sensor fusion capabilities
- Real-time lane detection and object classification
Medical Imaging
Healthcare applications leverage:
- High-resolution image processing
- AI-assisted diagnosis
- Real-time ultrasound enhancement
- Endoscopy video processing
- HIPAA-compliant edge processing
Drones and Robotics
Autonomous systems utilize:
- Lightweight inference for battery-powered applications
- Real-time obstacle detection
- Visual SLAM processing
- Target tracking
- Payload video encoding
Getting Started with the ZCU104
Setting up the ZCU104 for AI development requires both hardware and software preparation.
Frequently Asked Questions

What is the difference between the ZCU104 and ZCU102 for machine learning?
The ZCU104 uses an EV-variant device (XCZU7EV) with integrated H.264/H.265 video codec and 27 Mb UltraRAM, while the ZCU102 uses an EG-variant (XCZU9EG) with larger FPGA fabric but no hardware codec or UltraRAM. For ML applications involving video streams, the ZCU104’s hardware codec frees FPGA resources for the DPU. For applications requiring maximum logic cells or extensive high-speed serial I/O, the ZCU102 may be preferred.
What is the ZCU104 price and what does it include?
The ZCU104 price is approximately $1,895-1,899 USD depending on distributor and region. The kit includes the evaluation board, 1080p60 USB camera, USB 3.0 hub, power supply, Ethernet cable, and licenses for Vivado Design Suite: Design Edition (node-locked) plus SDSoC development environment access. This represents significant value compared to purchasing components separately.
How many frames per second can the ZCU104 FPGA achieve for object detection?
Performance varies by network architecture. With a B4096 DPU configuration, the ZCU104 FPGA typically achieves 120+ fps for SSD-MobileNet, 25-45 fps for YOLOv4 (single vs dual DPU), and 150+ fps for ResNet-50 classification. Actual performance depends on input resolution, batch size, and pre/post-processing implementation. The Vitis AI Model Zoo provides benchmarks for specific models.
Can I use PYNQ with the Xilinx ZCU104?
Yes, PYNQ supports the Xilinx ZCU104 with available overlays for DPU inference. The DPU-PYNQ package provides pre-built bitstreams and Python libraries for deploying neural networks via Jupyter notebooks. This approach simplifies development for users preferring Python over C++ for application development, though maximum performance may require native C++ implementations.
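In practice the PYNQ route looks like the sketch below (board-only; the bitstream and model file names are placeholders). Only the small ranking helper is plain Python:

```python
def top_k(scores, k=5):
    """Return (class_index, score) pairs for the k highest scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order[:k]]

def classify_on_board(bitstream="dpu.bit", model="resnet50.xmodel"):
    """Board-only sketch using the pynq-dpu package."""
    from pynq_dpu import DpuOverlay
    overlay = DpuOverlay(bitstream)  # programs the PL with the DPU design
    overlay.load_model(model)        # loads the compiled .xmodel
    runner = overlay.runner          # a VART runner, same API as native code
    # ...prepare input/output numpy buffers, then:
    # job = runner.execute_async(inputs, outputs); runner.wait(job)
    return runner
```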
What camera interfaces does the ZCU104 support?
The Zynq ZCU104 supports USB 3.0 cameras (included 1080p60 camera), MIPI CSI-2 cameras via FMC expansion cards, and HDMI video input. The USB camera enables immediate prototyping, while production systems typically use MIPI CSI-2 interfaces for direct sensor connection. The hardware codec processes video from any source for encoding, streaming, or storage.
Building AI-Powered Vision Systems
The ZCU104 represents AMD’s strategic platform for edge AI development. Its combination of hardware video codec, substantial FPGA fabric for DPU implementation, and comprehensive software support through Vitis AI creates a capable development environment for embedded vision applications.
Development Best Practices
Successfully deploying ML models on the ZCU104 FPGA requires attention to several practical considerations.
Model Selection: Not all neural network architectures are equally suited for DPU acceleration. Networks with standard convolution, pooling, and activation layers achieve high DPU efficiency. Custom or exotic layers may require CPU fallback, reducing performance. Before committing to a network architecture, verify DPU support in the Vitis AI documentation.
Quantization Strategy: INT8 quantization is mandatory for DPU deployment. The quantization process requires representative calibration data—typically 100-1000 images from your actual application domain. Poor calibration data leads to accuracy degradation. Always validate quantized model accuracy before deployment.
Memory Management: The Zynq ZCU104 has 2 GB PS DDR4 shared between Linux, applications, and DPU buffers. Large models or multi-stream processing can exhaust available memory. Monitor memory usage during development and consider the PL DDR4 SODIMM expansion for demanding applications.
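A quick budgeting sketch shows how fast raw frame buffers eat into 2 GB (the stream count and buffer depth below are illustrative assumptions, not fixed system parameters):

```python
# Rough DDR4 budgeting for multi-stream video: Linux, application code,
# model weights, and these buffers all share the same 2 GB PS memory.
def stream_bytes(width, height, channels=3, frames_buffered=4):
    """Raw frame-buffer footprint for one video stream, in bytes."""
    return width * height * channels * frames_buffered

streams = 8
per_stream = stream_bytes(1920, 1080, 3, 4)     # 4 buffered BGR 1080p frames
total_mb = streams * per_stream / 2**20
print(round(total_mb))  # ~190 MB consumed by frame buffers alone
```

Nearly 200 MB before counting the OS, model, or intermediate tensors — which is why the PL SODIMM expansion matters for dense multi-stream designs.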
Debugging and Profiling
Vitis AI provides profiling tools essential for optimization:
| Tool | Purpose |
| --- | --- |
| Vitis Analyzer | Visualize execution timeline |
| DExplorer | DPU status and configuration |
| DDump | Model analysis and debugging |
| Power Profiler | Energy consumption measurement |
The Vitis Analyzer shows exactly where time is spent—DPU compute, data transfer, or CPU processing. This visibility identifies bottlenecks and guides optimization efforts.
Power Considerations
For battery-powered or thermally constrained applications:
| Configuration | Typical Power |
| --- | --- |
| Idle (Linux booted) | ~8 W |
| Single DPU active | ~12 W |
| Dual DPU active | ~18 W |
| VCU encoding 4K | ~5 W additional |
Power consumption scales with DPU utilization and clock speed. Reducing clock frequency or using smaller DPU configurations (B2048 vs B4096) trades performance for power efficiency.
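These figures translate directly into energy per inference, the metric that matters for battery budgets. A one-line check combining the table's ~12 W single-DPU figure with the ~150 fps ResNet-50 throughput from the earlier benchmark table:

```python
# Energy per inference = power / throughput, converted to millijoules.
def mj_per_inference(watts, fps):
    """Millijoules consumed per frame at a given power draw and frame rate."""
    return watts / fps * 1000

# Single DPU (~12 W) running ResNet-50 at ~150 fps:
print(round(mj_per_inference(12, 150)))  # ~80 mJ per inference
```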
From Development to Production
The ZCU104 serves as a development platform; production deployment typically uses:
| Platform | Use Case |
| --- | --- |
| Kria SOM | Production modules |
| Custom Board | High-volume products |
| ZCU104 | Prototyping, low-volume products |
Design for production early by using portable VART APIs and avoiding ZCU104-specific dependencies. The Kria KV260 provides a production-oriented platform with similar capabilities.
For engineers evaluating development platforms, the ZCU104 price delivers excellent value given the included camera, software licenses, and vision-optimized hardware. The hardware codec differentiates it from EG-variant alternatives for applications involving video streams.
Success with the Xilinx ZCU104 requires understanding both the hardware capabilities and the Vitis AI software ecosystem. Start with pre-built images and Model Zoo examples, then progressively customize for your specific application requirements. The combination of powerful hardware and mature software tools enables deploying sophisticated AI systems at the edge with confidence.