Inquire: Call 0086-755-23203480, or reach out via the form below/your sales contact to discuss our design, manufacturing, and assembly capabilities.
Quote: Email your PCB files to Sales@pcbsync.com (preferred for large files) or submit online. We will contact you promptly. Please ensure your email address is correct.
Notes: For PCB fabrication, we require PCB design files in Gerber RS-274X format (most preferred), *.PCB/*.DDB (Protel; please state your program version), or *.BRD (Eagle) format. For PCB assembly, we also require the drilling file and BOM; click to download the BOM template. To avoid missing files, please place all files in one folder and compress it into a .zip or .rar archive.
ZCU104: AI/ML Focused Zynq UltraScale+ Development Board
The ZCU104 has earned its place as the go-to development platform for engineers building AI-powered embedded vision systems. Having deployed multiple machine learning applications on this platform, I’ve come to appreciate why AMD designed it specifically for edge inference workloads.
This comprehensive guide covers everything engineers need to know about the Xilinx ZCU104—from hardware specifications and video codec capabilities to deploying neural networks with Vitis AI. Whether you’re evaluating the ZCU104 price against alternatives or ready to start your first DPU project, this article provides the technical foundation for successful development.
The ZCU104 stands apart from other Zynq UltraScale+ boards because AMD designed it from the ground up for embedded vision and machine learning applications. The XCZU7EV device at its core includes features specifically optimized for these workloads.
The EV Advantage: Hardware Video Codec
Unlike EG-variant devices found on boards like the ZCU102, the Zynq ZCU104 uses an EV-variant device with integrated H.264/H.265 video codec. This hardware Video Codec Unit (VCU) handles video encoding and decoding without consuming FPGA resources or processor cycles.
| VCU Capability | Specification |
| --- | --- |
| Encode/Decode | Simultaneous |
| Maximum Resolution | 4K @ 60 fps |
| Supported Standards | H.264 (AVC), H.265 (HEVC) |
| Multi-Stream | Up to 8× 1080p @ 30 fps |
| Bit Depth | 8-bit and 10-bit |
| Chroma | 4:2:0, 4:2:2 |
For surveillance cameras, drones, and ADAS applications, this hardware codec is invaluable. Processing 4K video streams in software would consume significant ARM processor resources, leaving little headroom for machine learning inference. The VCU frees both the processors and FPGA fabric for AI acceleration.
UltraRAM: Essential for Vision Processing
EV devices include UltraRAM, which the ZCU104 FPGA uses extensively for image processing pipelines:
| Resource | XCZU7EV Capacity | Use Case |
| --- | --- | --- |
| Block RAM | 11.0 Mb | Line buffers, small FIFOs |
| UltraRAM | 27.0 Mb | Frame buffers, feature maps |
| DSP Slices | 1,728 | Convolution operations |
UltraRAM blocks provide 288 Kb each—8× denser than 36 Kb Block RAM. For DPU implementations storing activation maps and feature data, this additional on-chip memory significantly improves inference performance by reducing external memory bandwidth requirements.
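These density figures are easy to sanity-check. The back-of-envelope sketch below (pure Python; the 56×56×64 INT8 activation shape is an illustrative example, not a measured DPU allocation) counts how many UltraRAM versus Block RAM blocks a single feature map would occupy:

```python
# Back-of-envelope sizing: how many on-chip RAM blocks one DPU feature map
# needs. Figures from the text: UltraRAM = 288 Kb/block, Block RAM = 36 Kb/block.
KB = 1024  # bits per kilobit

def blocks_needed(feature_map_bits, block_kb):
    """Round up to whole memory blocks of block_kb kilobits each."""
    block_bits = block_kb * KB
    return -(-feature_map_bits // block_bits)  # ceiling division

# Example: a 56x56x64 INT8 activation map (a typical early-layer shape).
fmap_bits = 56 * 56 * 64 * 8
print(blocks_needed(fmap_bits, 288))  # UltraRAM blocks -> 6
print(blocks_needed(fmap_bits, 36))   # Block RAM blocks -> 44
```

Six UltraRAM blocks versus forty-four Block RAMs for the same buffer illustrates why EV-class devices suit DPU designs.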
ZCU104 Hardware Specifications
Understanding the complete ZCU104 hardware capabilities helps determine suitability for your application.
Processing Architecture
The XCZU7EV-2FFVC1156 MPSoC combines multiple processing domains:
| Processing Element | Specifications |
| --- | --- |
| Application Processor | Quad-core ARM Cortex-A53 @ 1.5 GHz |
| Real-Time Processor | Dual-core ARM Cortex-R5F @ 600 MHz |
| Graphics Processor | Mali-400 MP2 GPU |
| Video Codec Unit | 4Kp60 H.264/H.265 |
| FPGA Fabric | 504K logic cells |
| DSP Slices | 1,728 |
| Block RAM | 11.0 Mb |
| UltraRAM | 27.0 Mb |
The quad-core Cortex-A53 handles Linux, application logic, and pre/post-processing for ML inference. The Cortex-R5F cores provide deterministic real-time processing when needed. The Mali-400 GPU accelerates display rendering and basic image operations.
Memory Configuration
The Xilinx ZCU104 provides substantial memory for AI workloads:
| Memory Interface | Capacity | Width | Speed |
| --- | --- | --- | --- |
| PS DDR4 Component | 2 GB | 64-bit | 2400 MT/s |
| PL DDR4 SODIMM | Expandable | 64-bit | 2400 MT/s |
| Quad-SPI Flash | 128 MB | x8 | – |
The PS-side 2 GB DDR4 connects directly to the processing system for Linux operation, application code, and inference model weights. The PL-side SODIMM socket allows adding memory dedicated to FPGA fabric operations—useful for applications requiring high-bandwidth video frame buffers.
Display and Video Interfaces
Video connectivity on the ZCU104 supports complete vision pipelines:
| Interface | Direction | Specifications |
| --- | --- | --- |
| DisplayPort 1.2a | Output | Up to 4K @ 30 fps |
| HDMI 2.0 | Input/Output | Via 3× GTH transceivers |
| USB 3.0 | Input | Includes USB camera |
| MIPI CSI-2 | Input | Via FMC expansion |
The included 1080p60 USB 3.0 camera enables immediate prototyping without additional hardware purchases—a thoughtful inclusion for AI/ML development.
Expansion and Connectivity
| Interface | Specifications |
| --- | --- |
| FMC LPC | 1× GTH transceiver, 68 user I/O |
| PMOD | 3× 12-pin headers |
| M.2 (SATA) | For SSD storage |
| Ethernet | Gigabit RJ45 |
| USB | USB 3.0 + USB 2.0 |
The FMC LPC connector provides expansion for camera interfaces, additional sensors, or custom I/O. While limited to one GTH transceiver (versus 16 on ZCU102), this tradeoff keeps costs manageable for vision-focused applications that don’t require high-speed serial connectivity.
When evaluating the ZCU104 price against alternatives, consider the complete value proposition.
Current Pricing
| Source | Approximate Price |
| --- | --- |
| AMD Direct | $1,895 MSRP |
| DigiKey | ~$1,899 |
| Mouser | ~$1,899 |
| Avnet | ~$1,895 |
Prices fluctuate based on availability and region. The ZCU104 price includes Vivado Design Suite: Design Edition license (node-locked, device-locked) plus access to SDSoC/Vitis development environments.
Comparing ZCU104 vs ZCU102 vs ZCU106
| Feature | ZCU104 | ZCU102 | ZCU106 |
| --- | --- | --- | --- |
| Device | XCZU7EV | XCZU9EG | XCZU7EV |
| Video Codec | Yes (H.264/H.265) | No | Yes (H.264/H.265) |
| Logic Cells | 504K | 600K | 504K |
| UltraRAM | 27.0 Mb | 0 | 27.0 Mb |
| FMC GTH Lanes | 1 | 16 | 7 |
| SFP+ Ports | 0 | 4 | 2 |
| USB Camera Included | Yes | No | No |
| Price (approx.) | $1,899 | $2,995 | $3,570 |
Choose the ZCU104 when:

- Hardware video codec is required
- Cost optimization is important
- Limited high-speed serial I/O is acceptable
- AI/ML inference is the primary use case

Choose alternatives when:

- Maximum GTH transceivers are needed (ZCU102)
- Both video codec and substantial high-speed I/O are required (ZCU106)
- Larger FPGA fabric is essential (ZCU102)
The ZCU104 offers the best value for embedded vision and AI applications where the hardware codec justifies the tradeoff in high-speed connectivity.
Machine Learning with Vitis AI on the ZCU104
The ZCU104 FPGA serves as a primary development platform for AMD’s Vitis AI machine learning toolchain.
Understanding the DPU Architecture
Vitis AI deploys a Deep Learning Processing Unit (DPU) in the FPGA fabric. The DPU is a soft IP core optimized for CNN inference:
| DPU Feature | ZCU104 Configuration |
| --- | --- |
| Architecture | DPUCZDX8G |
| Recommended Config | B4096 |
| Clock Speed | Up to 300 MHz |
| Peak Performance | ~1,200 GOP/s per core |
| Cores Supported | 1–2 |
| Precision | INT8 |
The B4096 architecture provides 4096 operations per clock cycle. At 300 MHz, a single DPU core delivers approximately 1200 GOP/s peak performance. The Zynq ZCU104 can support dual DPU cores for applications requiring higher throughput.
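The arithmetic behind those figures is straightforward. This quick check (pure Python; the ~7.7 GOPs cost of one ResNet-50 inference is an assumed, commonly cited approximation) shows where the peak number comes from and why the measured ~150 fps sits below the ideal ceiling:

```python
# Sanity-check the peak-throughput claim: B4096 performs 4096 ops per cycle.
ops_per_cycle = 4096
clock_hz = 300e6
peak_gops = ops_per_cycle * clock_hz / 1e9
print(peak_gops)  # 1228.8, i.e. the "~1200 GOP/s per core" figure

# Idealized fps ceiling for ResNet-50, assuming ~7.7 GOPs per inference
# and 100% DPU utilization (real pipelines never reach this):
resnet50_gops = 7.7
print(round(peak_gops / resnet50_gops))  # ~160 fps theoretical upper bound
```

The gap between this 160 fps ceiling and the ~150 fps benchmark reflects scheduling, memory traffic, and layer-mix overheads.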
Typical Inference Performance
Real-world performance depends on network architecture and efficiency:
| Network | Single DPU (fps) | Dual DPU (fps) |
| --- | --- | --- |
| ResNet-50 | ~150 | ~280 |
| MobileNet-V2 | ~300 | ~550 |
| YOLOv4 | ~25 | ~45 |
| SSD-MobileNet | ~120 | ~220 |
These numbers represent typical measurements; actual performance varies based on implementation details, batch size, and pre/post-processing overhead.
Vitis AI Development Workflow
The development flow for deploying models on the ZCU104 follows these stages:
| Stage | Tool | Output |
| --- | --- | --- |
| Model Training | PyTorch/TensorFlow/Caffe | Float32 model |
| Quantization | Vitis AI Quantizer | INT8 model |
| Compilation | Vitis AI Compiler | .xmodel file |
| Deployment | VART Runtime | Executable inference |
Quantization converts 32-bit floating-point weights to 8-bit integers, reducing model size and enabling efficient DPU execution. The Vitis AI Quantizer performs calibration using a representative dataset to minimize accuracy loss.
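To make the idea concrete, here is an illustrative pure-Python sketch of symmetric INT8 quantization — a simplification of what the Vitis AI Quantizer does (the real tool also calibrates activations and targets power-of-two scales for the DPU):

```python
# Illustrative only: symmetric INT8 quantization with a single scale factor.
def quantize(values, num_bits=8):
    """Map floats to signed INT8 codes plus a shared scale factor."""
    qmax = 2 ** (num_bits - 1) - 1          # 127 for INT8
    scale = max(abs(v) for v in values) / qmax
    q = [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Reconstruct approximate float values from integer codes."""
    return [v * scale for v in q]

weights = [0.51, -1.27, 0.003, 0.9]
q, scale = quantize(weights)
print(q)  # [51, -127, 0, 90]
print(dequantize(q, scale))  # close to the originals, small rounding error
```

The reconstruction error per weight is bounded by half a quantization step, which is why representative calibration data matters: it determines the dynamic range the scale must cover.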
Compilation translates the quantized model into DPU instructions specific to the target architecture (B4096 for ZCU104). The output .xmodel file contains optimized instructions for the DPU.
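The compiler step itself is a one-line CLI call inside the Vitis AI Docker container. This small helper assembles the `vai_c_xir` invocation (the file names below are placeholders, not real project paths):

```python
# Hypothetical helper: build the Vitis AI compiler (vai_c_xir) command line
# targeting the ZCU104's DPUCZDX8G B4096 configuration.
def compile_command(xmodel, arch_json, out_dir, net_name):
    """Return vai_c_xir argv; run it inside the Vitis AI Docker container."""
    return ["vai_c_xir",
            "-x", xmodel,     # quantized model from the Vitis AI Quantizer
            "-a", arch_json,  # DPU arch file describing the B4096 target
            "-o", out_dir,    # output directory for the compiled .xmodel
            "-n", net_name]   # name for the compiled network

cmd = compile_command("quantized.xmodel", "arch.json", "compiled", "resnet50")
print(" ".join(cmd))
```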
Supported Frameworks and Models
| Framework | Version Support |
| --- | --- |
| TensorFlow | 1.15, 2.x |
| PyTorch | 1.x, 2.x |
| Caffe | 1.0 |
| ONNX | Via conversion |
The Vitis AI Model Zoo provides pre-trained and pre-compiled models ready for deployment:
| Category | Example Models |
| --- | --- |
| Classification | ResNet, MobileNet, VGG, Inception |
| Detection | YOLO, SSD, RetinaNet, RefineDet |
| Segmentation | UNet, FPN, DeepLabV3 |
| Pose Estimation | OpenPose, HRNet |
Pre-compiled models for ZCU104 can be downloaded directly, enabling immediate inference without the full compilation workflow.
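A minimal VART inference sketch looks like the following. The `run_inference` function is board-only code (it assumes the `vart`, `xir`, and `numpy` packages from a Vitis AI board image, a compiled model at a placeholder path, and omits fix-point output scaling for brevity); the softmax post-processing helper is plain Python:

```python
import math

def softmax(logits):
    """CPU post-processing: convert raw DPU output scores to probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def run_inference(xmodel_path, input_array):
    """Board-only sketch; simplified relative to production VART code."""
    import vart, xir
    import numpy as np
    graph = xir.Graph.deserialize(xmodel_path)
    # Pick the subgraph mapped to the DPU.
    dpu_sg = [s for s in graph.get_root_subgraph().toposort_child_subgraph()
              if s.has_attr("device") and s.get_attr("device") == "DPU"][0]
    runner = vart.Runner.create_runner(dpu_sg, "run")
    out_dims = tuple(runner.get_output_tensors()[0].dims)
    out_buf = np.zeros(out_dims, dtype=np.int8)
    job = runner.execute_async([input_array], [out_buf])  # queue one DPU job
    runner.wait(job)
    return softmax(out_buf.flatten().tolist())
```

The `execute_async`/`wait` pair is the heart of VART: jobs queue on the DPU while the ARM cores remain free for pre/post-processing.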
Embedded Vision Applications
The Xilinx ZCU104 targets specific embedded vision use cases.
Surveillance and Security
The combination of hardware video codec and DPU makes the ZCU104 ideal for intelligent video analytics:
- Multi-stream encoding (up to 8× 1080p)
- Real-time object detection
- Face recognition
- Behavior analysis
- Edge-based analytics reducing network bandwidth
Advanced Driver Assistance Systems (ADAS)
Automotive vision applications benefit from:
- Low-latency inference for safety-critical decisions
- Hardware codec for dashcam/DVR functionality
- Multiple camera input processing
- Sensor fusion capabilities
- Real-time lane detection and object classification
Medical Imaging
Healthcare applications leverage:
- High-resolution image processing
- AI-assisted diagnosis
- Real-time ultrasound enhancement
- Endoscopy video processing
- HIPAA-compliant edge processing
Drones and Robotics
Autonomous systems utilize:
- Lightweight inference for battery-powered applications
- Real-time obstacle detection
- Visual SLAM processing
- Target tracking
- Payload video encoding
Getting Started with the ZCU104
Setting up the ZCU104 for AI development requires both hardware and software preparation.
Frequently Asked Questions

What is the difference between the ZCU104 and ZCU102 for machine learning?
The ZCU104 uses an EV-variant device (XCZU7EV) with integrated H.264/H.265 video codec and 27 Mb UltraRAM, while the ZCU102 uses an EG-variant (XCZU9EG) with larger FPGA fabric but no hardware codec or UltraRAM. For ML applications involving video streams, the ZCU104’s hardware codec frees FPGA resources for the DPU. For applications requiring maximum logic cells or extensive high-speed serial I/O, the ZCU102 may be preferred.
What is the ZCU104 price and what does it include?
The ZCU104 price is approximately $1,895-1,899 USD depending on distributor and region. The kit includes the evaluation board, 1080p60 USB camera, USB 3.0 hub, power supply, Ethernet cable, and licenses for Vivado Design Suite: Design Edition (node-locked) plus SDSoC development environment access. This represents significant value compared to purchasing components separately.
How many frames per second can the ZCU104 FPGA achieve for object detection?
Performance varies by network architecture. With a B4096 DPU configuration, the ZCU104 FPGA typically achieves 120+ fps for SSD-MobileNet, 25-45 fps for YOLOv4 (single vs dual DPU), and 150+ fps for ResNet-50 classification. Actual performance depends on input resolution, batch size, and pre/post-processing implementation. The Vitis AI Model Zoo provides benchmarks for specific models.
Can I use PYNQ with the Xilinx ZCU104?
Yes, PYNQ supports the Xilinx ZCU104 with available overlays for DPU inference. The DPU-PYNQ package provides pre-built bitstreams and Python libraries for deploying neural networks via Jupyter notebooks. This approach simplifies development for users preferring Python over C++ for application development, though maximum performance may require native C++ implementations.
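In practice the PYNQ route looks like the sketch below (board-only; the bitstream and model file names are placeholders). Only the small ranking helper is plain Python:

```python
def top_k(scores, k=5):
    """Return (class_index, score) pairs for the k highest scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order[:k]]

def classify_on_board(bitstream="dpu.bit", model="resnet50.xmodel"):
    """Board-only sketch using the pynq-dpu package."""
    from pynq_dpu import DpuOverlay
    overlay = DpuOverlay(bitstream)  # programs the PL with the DPU design
    overlay.load_model(model)        # loads the compiled .xmodel
    runner = overlay.runner          # a VART runner, same API as native code
    # ...prepare input/output numpy buffers, then:
    # job = runner.execute_async(inputs, outputs); runner.wait(job)
    return runner
```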
What camera interfaces does the ZCU104 support?
The Zynq ZCU104 supports USB 3.0 cameras (included 1080p60 camera), MIPI CSI-2 cameras via FMC expansion cards, and HDMI video input. The USB camera enables immediate prototyping, while production systems typically use MIPI CSI-2 interfaces for direct sensor connection. The hardware codec processes video from any source for encoding, streaming, or storage.
Building AI-Powered Vision Systems
The ZCU104 represents AMD’s strategic platform for edge AI development. Its combination of hardware video codec, substantial FPGA fabric for DPU implementation, and comprehensive software support through Vitis AI creates a capable development environment for embedded vision applications.
Development Best Practices
Successfully deploying ML models on the ZCU104 FPGA requires attention to several practical considerations.
Model Selection: Not all neural network architectures are equally suited for DPU acceleration. Networks with standard convolution, pooling, and activation layers achieve high DPU efficiency. Custom or exotic layers may require CPU fallback, reducing performance. Before committing to a network architecture, verify DPU support in the Vitis AI documentation.
Quantization Strategy: INT8 quantization is mandatory for DPU deployment. The quantization process requires representative calibration data—typically 100-1000 images from your actual application domain. Poor calibration data leads to accuracy degradation. Always validate quantized model accuracy before deployment.
Memory Management: The Zynq ZCU104 has 2 GB PS DDR4 shared between Linux, applications, and DPU buffers. Large models or multi-stream processing can exhaust available memory. Monitor memory usage during development and consider the PL DDR4 SODIMM expansion for demanding applications.
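A quick budgeting sketch shows how fast raw frame buffers eat into 2 GB (the stream count and buffer depth below are illustrative assumptions, not fixed system parameters):

```python
# Rough DDR4 budgeting for multi-stream video: Linux, application code,
# model weights, and these buffers all share the same 2 GB PS memory.
def stream_bytes(width, height, channels=3, frames_buffered=4):
    """Raw frame-buffer footprint for one video stream, in bytes."""
    return width * height * channels * frames_buffered

streams = 8
per_stream = stream_bytes(1920, 1080, 3, 4)     # 4 buffered BGR 1080p frames
total_mb = streams * per_stream / 2**20
print(round(total_mb))  # ~190 MB consumed by frame buffers alone
```

Nearly 200 MB before counting the OS, model, or intermediate tensors — which is why the PL SODIMM expansion matters for dense multi-stream designs.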
Debugging and Profiling
Vitis AI provides profiling tools essential for optimization:
| Tool | Purpose |
| --- | --- |
| Vitis Analyzer | Visualize execution timeline |
| DExplorer | DPU status and configuration |
| DDump | Model analysis and debugging |
| Power Profiler | Energy consumption measurement |
The Vitis Analyzer shows exactly where time is spent—DPU compute, data transfer, or CPU processing. This visibility identifies bottlenecks and guides optimization efforts.
Power Considerations
For battery-powered or thermally constrained applications:
| Configuration | Typical Power |
| --- | --- |
| Idle (Linux booted) | ~8 W |
| Single DPU active | ~12 W |
| Dual DPU active | ~18 W |
| VCU encoding 4K | ~5 W additional |
Power consumption scales with DPU utilization and clock speed. Reducing clock frequency or using smaller DPU configurations (B2048 vs B4096) trades performance for power efficiency.
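These figures translate directly into energy per inference, the metric that matters for battery budgets. A one-line check combining the table's ~12 W single-DPU figure with the ~150 fps ResNet-50 throughput from the earlier benchmark table:

```python
# Energy per inference = power / throughput, converted to millijoules.
def mj_per_inference(watts, fps):
    """Millijoules consumed per frame at a given power draw and frame rate."""
    return watts / fps * 1000

# Single DPU (~12 W) running ResNet-50 at ~150 fps:
print(round(mj_per_inference(12, 150)))  # ~80 mJ per inference
```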
From Development to Production
The ZCU104 serves as a development platform; production deployment typically uses:
| Platform | Use Case |
| --- | --- |
| Kria SOM | Production modules |
| Custom Board | High-volume products |
| ZCU104 | Prototyping, low-volume products |
Design for production early by using portable VART APIs and avoiding ZCU104-specific dependencies. The Kria KV260 provides a production-oriented platform with similar capabilities.
For engineers evaluating development platforms, the ZCU104 price delivers excellent value given the included camera, software licenses, and vision-optimized hardware. The hardware codec differentiates it from EG-variant alternatives for applications involving video streams.
Success with the Xilinx ZCU104 requires understanding both the hardware capabilities and the Vitis AI software ecosystem. Start with pre-built images and Model Zoo examples, then progressively customize for your specific application requirements. The combination of powerful hardware and mature software tools enables deploying sophisticated AI systems at the edge with confidence.