Contact Sales & After-Sales Service

Contact & Quotation

  • Inquire: Call 0086-755-23203480, or reach out via the form below/your sales contact to discuss our design, manufacturing, and assembly capabilities.
  • Quote: Email your PCB files to Sales@pcbsync.com (Preferred for large files) or submit online. We will contact you promptly. Please ensure your email is correct.

Notes:
For PCB fabrication, we require the PCB design files in Gerber RS-274X format (preferred), *.PCB/*.DDB (Protel; state your program version), or *.BRD (Eagle) format. For PCB assembly, we also require the drill file and BOM (click to download the BOM template). To avoid missing files, please place all files in one folder and compress it into .zip or .rar format.

ZCU104: AI/ML Focused Zynq UltraScale+ Development Board

The ZCU104 has earned its place as the go-to development platform for engineers building AI-powered embedded vision systems. Having deployed multiple machine learning applications on the platform, I’ve come to appreciate why AMD designed it specifically for edge inference workloads.

This comprehensive guide covers everything engineers need to know about the Xilinx ZCU104—from hardware specifications and video codec capabilities to deploying neural networks with Vitis AI. Whether you’re evaluating the ZCU104 price against alternatives or ready to start your first DPU project, this article provides the technical foundation for successful development.

Why the ZCU104 Excels at AI and Machine Learning

The ZCU104 stands apart from other Zynq UltraScale+ boards because AMD designed it from the ground up for embedded vision and machine learning applications. The XCZU7EV device at its core includes features specifically optimized for these workloads.

The EV Advantage: Hardware Video Codec

Unlike EG-variant devices found on boards like the ZCU102, the Zynq ZCU104 uses an EV-variant device with integrated H.264/H.265 video codec. This hardware Video Codec Unit (VCU) handles video encoding and decoding without consuming FPGA resources or processor cycles.

| VCU Capability | Specification |
| --- | --- |
| Encode/Decode | Simultaneous |
| Maximum Resolution | 4K @ 60 fps |
| Supported Standards | H.264 (AVC), H.265 (HEVC) |
| Multi-Stream | Up to 8× 1080p @ 30 fps |
| Bit Depth | 8-bit and 10-bit |
| Chroma | 4:2:0, 4:2:2 |

For surveillance cameras, drones, and ADAS applications, this hardware codec is invaluable. Processing 4K video streams in software would consume significant ARM processor resources, leaving little headroom for machine learning inference. The VCU frees both the processors and FPGA fabric for AI acceleration.

UltraRAM: Essential for Vision Processing

EV devices include UltraRAM, which the ZCU104 FPGA uses extensively for image processing pipelines:

| Resource | XCZU7EV Capacity | Use Case |
| --- | --- | --- |
| Block RAM | 11.0 Mb | Line buffers, small FIFOs |
| UltraRAM | 27.0 Mb | Frame buffers, feature maps |
| DSP Slices | 1,728 | Convolution operations |

UltraRAM blocks provide 288 Kb each—8× denser than 36 Kb Block RAM. For DPU implementations storing activation maps and feature data, this additional on-chip memory significantly improves inference performance by reducing external memory bandwidth requirements.
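As a quick sanity check, the block count and density ratio follow directly from the capacities quoted above, and a short calculation shows how far the 27 Mb of UltraRAM goes for vision data:

```python
# Back-of-the-envelope check of the XCZU7EV on-chip memory figures:
# 27.0 Mb of UltraRAM in 288 Kb blocks, vs 36 Kb Block RAM blocks.

URAM_TOTAL_KB = 27.0 * 1024      # 27.0 Mb expressed in Kb
URAM_BLOCK_KB = 288
BRAM_BLOCK_KB = 36

uram_blocks = URAM_TOTAL_KB / URAM_BLOCK_KB
density_ratio = URAM_BLOCK_KB / BRAM_BLOCK_KB

print(f"UltraRAM blocks: {uram_blocks:.0f}")          # 96
print(f"Density vs Block RAM: {density_ratio:.0f}x")  # 8x

# How many single-channel INT8 1080p feature maps fit on chip?
frame_bits = 1920 * 1080 * 8
uram_bits = 27.0 * 1024 * 1024
print(f"1080p INT8 channels in UltraRAM: {uram_bits / frame_bits:.1f}")
```

The last figure makes the point concrete: even UltraRAM holds only a couple of full-resolution channels, which is why DPU implementations tile feature maps and why reduced external-memory traffic, not full on-chip residency, is the real win.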

ZCU104 Hardware Specifications

Understanding the complete ZCU104 hardware capabilities helps determine suitability for your application.

Processing Architecture

The XCZU7EV-2FFVC1156 MPSoC combines multiple processing domains:

| Processing Element | Specifications |
| --- | --- |
| Application Processor | Quad-core ARM Cortex-A53 @ 1.5 GHz |
| Real-Time Processor | Dual-core ARM Cortex-R5F @ 600 MHz |
| Graphics Processor | Mali-400 MP2 GPU |
| Video Codec Unit | 4Kp60 H.264/H.265 |
| FPGA Fabric | 504K logic cells |
| DSP Slices | 1,728 |
| Block RAM | 11.0 Mb |
| UltraRAM | 27.0 Mb |

The quad-core Cortex-A53 handles Linux, application logic, and pre/post-processing for ML inference. The Cortex-R5F cores provide deterministic real-time processing when needed. The Mali-400 GPU accelerates display rendering and basic image operations.

Memory Configuration

The Xilinx ZCU104 provides substantial memory for AI workloads:

| Memory Interface | Capacity | Width | Speed |
| --- | --- | --- | --- |
| PS DDR4 Component | 2 GB | 64-bit | 2400 MT/s |
| PL DDR4 SODIMM | Expandable | 64-bit | 2400 MT/s |
| Quad-SPI Flash | 128 MB | x8 | — |

The PS-side 2 GB DDR4 connects directly to the processing system for Linux operation, application code, and inference model weights. The PL-side SODIMM socket allows adding memory dedicated to FPGA fabric operations—useful for applications requiring high-bandwidth video frame buffers.

Display and Video Interfaces

Video connectivity on the ZCU104 supports complete vision pipelines:

| Interface | Direction | Specifications |
| --- | --- | --- |
| DisplayPort 1.2a | Output | Up to 4K @ 30 fps |
| HDMI 2.0 | Input/Output | Via 3× GTH transceivers |
| USB 3.0 | Input | Includes USB camera |
| MIPI CSI-2 | Input | Via FMC expansion |

The included 1080p60 USB 3.0 camera enables immediate prototyping without additional hardware purchases—a thoughtful inclusion for AI/ML development.

Expansion and Connectivity

| Interface | Specifications |
| --- | --- |
| FMC LPC | 1× GTH transceiver, 68 user I/O |
| PMOD | 3× 12-pin headers |
| M.2 (SATA) | For SSD storage |
| Ethernet | Gigabit RJ45 |
| USB | USB 3.0 + USB 2.0 |

The FMC LPC connector provides expansion for camera interfaces, additional sensors, or custom I/O. While limited to one GTH transceiver (versus 16 on ZCU102), this tradeoff keeps costs manageable for vision-focused applications that don’t require high-speed serial connectivity.


ZCU104 Price and Value Analysis

When evaluating ZCU104 price against alternatives, consider the complete value proposition.

Current Pricing

| Source | Approximate Price |
| --- | --- |
| AMD Direct | $1,895 MSRP |
| DigiKey | ~$1,899 |
| Mouser | ~$1,899 |
| Avnet | ~$1,895 |

Prices fluctuate based on availability and region. The ZCU104 price includes Vivado Design Suite: Design Edition license (node-locked, device-locked) plus access to SDSoC/Vitis development environments.

Comparing ZCU104 vs ZCU102 vs ZCU106

| Feature | ZCU104 | ZCU102 | ZCU106 |
| --- | --- | --- | --- |
| Device | XCZU7EV | XCZU9EG | XCZU7EV |
| Video Codec | Yes (H.264/H.265) | No | Yes (H.264/H.265) |
| Logic Cells | 504K | 600K | 504K |
| UltraRAM | 27.0 Mb | 0 | 27.0 Mb |
| FMC GTH Lanes | 1 | 16 | 7 |
| SFP+ Ports | 0 | 4 | 2 |
| USB Camera Included | Yes | No | No |
| Price (approx.) | $1,899 | $2,995 | $3,570 |

Choose the ZCU104 when:

  • Hardware video codec is required
  • Cost optimization is important
  • Limited high-speed serial I/O is acceptable
  • AI/ML inference is the primary use case

Choose alternatives when:

  • Maximum GTH transceivers are needed (ZCU102)
  • Both video codec and substantial high-speed I/O are required (ZCU106)
  • Larger FPGA fabric is essential (ZCU102)

The ZCU104 offers the best value for embedded vision and AI applications where the hardware codec justifies the tradeoff in high-speed connectivity.

Machine Learning with Vitis AI on the ZCU104

The ZCU104 FPGA serves as a primary development platform for AMD’s Vitis AI machine learning toolchain.

Understanding the DPU Architecture

Vitis AI deploys a Deep Learning Processing Unit (DPU) in the FPGA fabric. The DPU is a soft IP core optimized for CNN inference:

| DPU Feature | ZCU104 Configuration |
| --- | --- |
| Architecture | DPUCZDX8G |
| Recommended Config | B4096 |
| Clock Speed | Up to 300 MHz |
| Peak Performance | ~1200 GOP/s per core |
| Cores Supported | 1-2 |
| Precision | INT8 |

The B4096 architecture provides 4096 operations per clock cycle. At 300 MHz, a single DPU core delivers approximately 1200 GOP/s peak performance. The Zynq ZCU104 can support dual DPU cores for applications requiring higher throughput.
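The quoted peak figure is simply operations-per-cycle multiplied by clock frequency, which a one-line calculation confirms:

```python
# Sanity check of the peak-throughput figure above: a B4096 DPU performs
# 4096 INT8 operations per cycle, so one core at 300 MHz peaks at ~1.2 TOP/s.

def dpu_peak_gops(ops_per_cycle: int, clock_mhz: float) -> float:
    """Peak throughput in GOP/s for one DPU core."""
    return ops_per_cycle * clock_mhz / 1000.0

single = dpu_peak_gops(4096, 300)   # 1228.8 GOP/s
dual = 2 * single                   # two cores, assuming ideal scaling
print(f"single core: {single:.1f} GOP/s, dual core (ideal): {dual:.1f} GOP/s")
```

Note these are peak numbers; real networks achieve a fraction of this depending on layer shapes and memory bandwidth, as the benchmark table below illustrates.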

Typical Inference Performance

Real-world performance depends on network architecture and efficiency:

| Network | Single DPU (fps) | Dual DPU (fps) |
| --- | --- | --- |
| ResNet-50 | ~150 | ~280 |
| MobileNet-V2 | ~300 | ~550 |
| YOLOv4 | ~25 | ~45 |
| SSD-MobileNet | ~120 | ~220 |

These numbers represent typical measurements; actual performance varies based on implementation details, batch size, and pre/post-processing overhead.

Vitis AI Development Workflow

The development flow for deploying models on the ZCU104 follows these stages:

| Stage | Tool | Output |
| --- | --- | --- |
| Model Training | PyTorch/TensorFlow/Caffe | Float32 model |
| Quantization | Vitis AI Quantizer | INT8 model |
| Compilation | Vitis AI Compiler | .xmodel file |
| Deployment | VART Runtime | Executable inference |

Quantization converts 32-bit floating-point weights to 8-bit integers, reducing model size and enabling efficient DPU execution. The Vitis AI Quantizer performs calibration using a representative dataset to minimize accuracy loss.
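To illustrate the principle, here is a minimal symmetric INT8 quantization sketch. This is the general idea behind scale calibration, not the Vitis AI Quantizer's actual algorithm, and the names are illustrative:

```python
# Minimal symmetric INT8 post-training quantization: calibration picks a
# scale from representative data, then floats map to 8-bit integers.

def calibrate_scale(calibration_values):
    """Pick a scale so the largest observed magnitude maps to 127."""
    max_abs = max(abs(v) for v in calibration_values)
    return max_abs / 127.0

def quantize(x, scale):
    """Float -> INT8, clamped to [-128, 127]."""
    q = round(x / scale)
    return max(-128, min(127, q))

def dequantize(q, scale):
    return q * scale

# "Calibration data": a handful of representative activation values.
calib = [0.02, -0.5, 1.27, -1.1, 0.9]
scale = calibrate_scale(calib)       # 1.27 / 127 = 0.01

x = 0.63
q = quantize(x, scale)               # 63
print(f"scale={scale:.5f}, q={q}, reconstructed={dequantize(q, scale):.4f}")
```

This also shows why calibration data matters: if the calibration set misses the true dynamic range, out-of-range activations clamp at ±127 and accuracy degrades.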

Compilation translates the quantized model into DPU instructions specific to the target architecture (B4096 for ZCU104). The output .xmodel file contains optimized instructions for the DPU.

Supported Frameworks and Models

| Framework | Version Support |
| --- | --- |
| TensorFlow | 1.15, 2.x |
| PyTorch | 1.x, 2.x |
| Caffe | 1.0 |
| ONNX | Via conversion |

The Vitis AI Model Zoo provides pre-trained and pre-compiled models ready for deployment:

| Category | Example Models |
| --- | --- |
| Classification | ResNet, MobileNet, VGG, Inception |
| Detection | YOLO, SSD, RetinaNet, RefineDet |
| Segmentation | UNet, FPN, DeepLabV3 |
| Pose Estimation | OpenPose, HRNet |

Pre-compiled models for ZCU104 can be downloaded directly, enabling immediate inference without the full compilation workflow.

Embedded Vision Applications

The Xilinx ZCU104 targets specific embedded vision use cases.

Surveillance and Security

The combination of hardware video codec and DPU makes the ZCU104 ideal for intelligent video analytics:

  • Multi-stream encoding (up to 8× 1080p)
  • Real-time object detection
  • Face recognition
  • Behavior analysis
  • Edge-based analytics reducing network bandwidth

Advanced Driver Assistance Systems (ADAS)

Automotive vision applications benefit from:

  • Low-latency inference for safety-critical decisions
  • Hardware codec for dashcam/DVR functionality
  • Multiple camera input processing
  • Sensor fusion capabilities
  • Real-time lane detection and object classification

Medical Imaging

Healthcare applications leverage:

  • High-resolution image processing
  • AI-assisted diagnosis
  • Real-time ultrasound enhancement
  • Endoscopy video processing
  • HIPAA-compliant edge processing

Drones and Robotics

Autonomous systems utilize:

  • Lightweight inference for battery-powered applications
  • Real-time obstacle detection
  • Visual SLAM processing
  • Target tracking
  • Payload video encoding

Getting Started with the ZCU104

Setting up the ZCU104 for AI development requires both hardware and software preparation.

Kit Contents

The EK-U1-ZCU104-G kit includes:

| Item | Description |
| --- | --- |
| ZCU104 Board | XCZU7EV-2FFVC1156 MPSoC |
| USB Camera | 1080p60 USB 3.0 camera |
| USB Hub | 4-port USB 3.0 hub |
| Power Supply | 12V adapter and cables |
| Ethernet Cable | For network connectivity |
| Vivado License | Design Suite: Design Edition |
| SDSoC Access | Development environment |

Initial Hardware Setup

  1. Connect USB cable to JTAG/UART port (J164)
  2. Connect Ethernet for network access
  3. Attach USB camera for vision demos
  4. Connect DisplayPort monitor for video output
  5. Insert SD card with Vitis AI image
  6. Configure boot mode switches (SW6)
  7. Apply 12V power


Boot Mode Configuration

| SW6[4:1] | Boot Mode |
| --- | --- |
| 1110 | SD Card |
| 0010 | QSPI32 |
| 0000 | JTAG |

For Vitis AI development, SD card boot mode provides the fastest path to running inference demos.

Running First Inference

With the Vitis AI pre-built image:

  1. Boot from SD card
  2. Connect via SSH or serial console
  3. Navigate to example applications
  4. Run classification demo with USB camera
  5. Observe real-time inference on DisplayPort output

The included examples demonstrate ResNet classification, YOLO detection, and segmentation networks running on the DPU.

Performance Optimization Techniques

Maximizing inference performance on the ZCU104 FPGA requires understanding system-level considerations.

DPU Configuration Options

| Parameter | Options | Impact |
| --- | --- | --- |
| Architecture | B512-B4096 | Throughput vs resources |
| Clock Speed | 150-350 MHz | Performance vs power |
| RAM Usage | Low/High | Memory efficiency |
| Channel Augmentation | Enabled/Disabled | Specific layer support |

For the ZCU104, B4096 at 300 MHz typically provides optimal performance. Higher clock speeds are possible but may require careful timing closure.

Multi-Threading Strategies

VART (Vitis AI Runtime) supports multi-threaded inference:

| Threads | Use Case |
| --- | --- |
| 1 | Minimum latency per frame |
| 2-4 | Balanced throughput/latency |
| 8+ | Maximum throughput |

Increasing thread count improves DPU utilization but adds latency for individual frames. Match threading strategy to application requirements.
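The usual pattern is a shared frame queue feeding worker threads that each own a runner. The sketch below shows the threading structure only; `DummyRunner` is a stand-in for a real per-thread VART runner, and all names here are illustrative:

```python
# Sketch of the multi-threaded inference pattern: N worker threads pull
# frames from a shared queue, each using its own runner instance.

import queue
import threading

class DummyRunner:
    """Placeholder for a per-thread inference runner (e.g. a VART runner)."""
    def run(self, frame):
        return frame * 2   # pretend "inference"

def worker(frames, results, lock):
    runner = DummyRunner()          # one runner per thread, never shared
    while True:
        try:
            frame = frames.get_nowait()
        except queue.Empty:
            return                  # queue drained: thread exits
        out = runner.run(frame)
        with lock:
            results.append(out)

frames = queue.Queue()
for i in range(32):                 # 32 "frames" to process
    frames.put(i)

results, lock = [], threading.Lock()
threads = [threading.Thread(target=worker, args=(frames, results, lock))
           for _ in range(4)]       # 4 worker threads
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"processed {len(results)} frames")
```

With real inference, each worker's latency is unchanged, but the DPU stays busy while other threads handle pre/post-processing, which is where the throughput gain comes from.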

Pipeline Optimization

Complete vision pipelines include:

| Stage | Processor | Optimization |
| --- | --- | --- |
| Capture | ARM/DMA | Use GStreamer plugins |
| Pre-processing | ARM/GPU | OpenCV acceleration |
| Inference | DPU | Batch size tuning |
| Post-processing | ARM | Efficient NMS implementation |
| Encode/Decode | VCU | Hardware codec |
| Display | DisplayPort | Direct buffer sharing |

Optimizing data movement between stages often yields greater improvements than DPU optimization alone.
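The "efficient NMS implementation" called out in the post-processing stage refers to non-maximum suppression, which prunes overlapping detection boxes. A minimal greedy NMS looks like this (boxes as `(x1, y1, x2, y2)` tuples; pure Python for clarity, where a production version would vectorize or prune per class):

```python
# Greedy non-maximum suppression: keep the highest-scoring box, drop any
# box overlapping it beyond the IoU threshold, repeat.

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Returns indices of the boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))   # [0, 2] -- box 1 overlaps box 0 heavily
```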

Essential Resources for ZCU104 Development

Official Documentation

| Document | Number | Description |
| --- | --- | --- |
| User Guide | UG1267 | Complete board documentation |
| Vitis AI User Guide | UG1414 | ML development workflow |
| Vitis AI Model Zoo | — | Pre-trained models |
| Quick Start Guide | XTP449 | Initial setup |

Download Links

| Resource | URL |
| --- | --- |
| Product Page | https://www.xilinx.com/products/boards-and-kits/zcu104.html |
| Vitis AI GitHub | https://github.com/Xilinx/Vitis-AI |
| Model Zoo | https://github.com/Xilinx/Vitis-AI/tree/master/model_zoo |
| Tutorials | https://github.com/Xilinx/Vitis-AI-Tutorials |
| Pre-built Images | https://www.xilinx.com/support/download.html |

Community and Support

| Resource | Description |
| --- | --- |
| AMD Adaptive Support | Official technical support |
| Vitis AI Forum | Community discussions |
| GitHub Issues | Bug reports, feature requests |
| FPGA Developer | Tutorials and projects |

Frequently Asked Questions

What is the difference between the ZCU104 and ZCU102 for machine learning?

The ZCU104 uses an EV-variant device (XCZU7EV) with integrated H.264/H.265 video codec and 27 Mb UltraRAM, while the ZCU102 uses an EG-variant (XCZU9EG) with larger FPGA fabric but no hardware codec or UltraRAM. For ML applications involving video streams, the ZCU104’s hardware codec frees FPGA resources for the DPU. For applications requiring maximum logic cells or extensive high-speed serial I/O, the ZCU102 may be preferred.

What is the ZCU104 price and what does it include?

The ZCU104 price is approximately $1,895-1,899 USD depending on distributor and region. The kit includes the evaluation board, 1080p60 USB camera, USB 3.0 hub, power supply, Ethernet cable, and licenses for Vivado Design Suite: Design Edition (node-locked) plus SDSoC development environment access. This represents significant value compared to purchasing components separately.

How many frames per second can the ZCU104 FPGA achieve for object detection?

Performance varies by network architecture. With a B4096 DPU configuration, the ZCU104 FPGA typically achieves 120+ fps for SSD-MobileNet, 25-45 fps for YOLOv4 (single vs dual DPU), and 150+ fps for ResNet-50 classification. Actual performance depends on input resolution, batch size, and pre/post-processing implementation. The Vitis AI Model Zoo provides benchmarks for specific models.

Can I use PYNQ with the Xilinx ZCU104?

Yes, PYNQ supports the Xilinx ZCU104 with available overlays for DPU inference. The DPU-PYNQ package provides pre-built bitstreams and Python libraries for deploying neural networks via Jupyter notebooks. This approach simplifies development for users preferring Python over C++ for application development, though maximum performance may require native C++ implementations.

What camera interfaces does the ZCU104 support?

The Zynq ZCU104 supports USB 3.0 cameras (included 1080p60 camera), MIPI CSI-2 cameras via FMC expansion cards, and HDMI video input. The USB camera enables immediate prototyping, while production systems typically use MIPI CSI-2 interfaces for direct sensor connection. The hardware codec processes video from any source for encoding, streaming, or storage.

Building AI-Powered Vision Systems

The ZCU104 represents AMD’s strategic platform for edge AI development. Its combination of hardware video codec, substantial FPGA fabric for DPU implementation, and comprehensive software support through Vitis AI creates a capable development environment for embedded vision applications.

Development Best Practices

Successfully deploying ML models on the ZCU104 FPGA requires attention to several practical considerations.

Model Selection: Not all neural network architectures are equally suited for DPU acceleration. Networks with standard convolution, pooling, and activation layers achieve high DPU efficiency. Custom or exotic layers may require CPU fallback, reducing performance. Before committing to a network architecture, verify DPU support in the Vitis AI documentation.

Quantization Strategy: INT8 quantization is mandatory for DPU deployment. The quantization process requires representative calibration data—typically 100-1000 images from your actual application domain. Poor calibration data leads to accuracy degradation. Always validate quantized model accuracy before deployment.

Memory Management: The Zynq ZCU104 has 2 GB PS DDR4 shared between Linux, applications, and DPU buffers. Large models or multi-stream processing can exhaust available memory. Monitor memory usage during development and consider the PL DDR4 SODIMM expansion for demanding applications.
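A rough budget sketch makes the constraint concrete. The per-item figures below are illustrative assumptions, not measurements from any particular system:

```python
# Rough memory-budget sketch for the 2 GB PS DDR4. All line items are
# illustrative assumptions; profile your own system for real numbers.

TOTAL_MB = 2048

budget = {
    "Linux + rootfs":      400,
    "application":         200,
    "INT8 model weights":   50,   # e.g. a ~50 MB quantized model
    "DPU I/O buffers":     128,
    "video frame buffers": 4 * 1920 * 1080 * 4 // (1024 * 1024),  # 4x 1080p RGBA
}

used = sum(budget.values())
for name, mb in budget.items():
    print(f"{name:22s} {mb:5d} MB")
print(f"{'total':22s} {used:5d} MB of {TOTAL_MB} MB "
      f"({TOTAL_MB - used} MB headroom)")
```

Even under these modest assumptions roughly 40% of DDR4 is committed before a second video stream or a larger model arrives, which is why multi-stream designs often need the PL SODIMM.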

Debugging and Profiling

Vitis AI provides profiling tools essential for optimization:

| Tool | Purpose |
| --- | --- |
| Vitis Analyzer | Visualize execution timeline |
| DExplorer | DPU status and configuration |
| DDump | Model analysis and debugging |
| Power Profiler | Energy consumption measurement |

The Vitis Analyzer shows exactly where time is spent—DPU compute, data transfer, or CPU processing. This visibility identifies bottlenecks and guides optimization efforts.

Power Considerations

For battery-powered or thermally constrained applications:

| Configuration | Typical Power |
| --- | --- |
| Idle (Linux booted) | ~8W |
| Single DPU active | ~12W |
| Dual DPU active | ~18W |
| VCU encoding 4K | ~5W additional |

Power consumption scales with DPU utilization and clock speed. Reducing clock frequency or using smaller DPU configurations (B2048 vs B4096) trades performance for power efficiency.
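For battery-powered designs, energy per inference matters more than raw power. Combining the power table above with the earlier throughput figures gives a quick estimate (illustrative arithmetic using the approximate numbers quoted, not a measurement):

```python
# Incremental energy per inference, from the approximate power and
# ResNet-50 throughput figures quoted in this article.

def energy_per_frame_mj(power_w: float, idle_w: float, fps: float) -> float:
    """Energy above idle per inference, in millijoules."""
    return (power_w - idle_w) / fps * 1000.0

# Single DPU (~12 W vs ~8 W idle) at ~150 fps:
single = energy_per_frame_mj(12, 8, 150)
# Dual DPU (~18 W) at ~280 fps:
dual = energy_per_frame_mj(18, 8, 280)
print(f"single DPU: {single:.1f} mJ/frame, dual DPU: {dual:.1f} mJ/frame")
```

Note the counterintuitive result: with these figures the dual-DPU configuration spends more energy per frame, so adding a second core buys throughput, not efficiency.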

From Development to Production

The ZCU104 serves as a development platform; production deployment typically uses:

| Platform | Use Case |
| --- | --- |
| Kria SOM | Production modules |
| Custom Board | High-volume products |
| ZCU104 | Low-volume, prototyping |

Design for production early by using portable VART APIs and avoiding ZCU104-specific dependencies. The Kria KV260 provides a production-oriented platform with similar capabilities.

For engineers evaluating development platforms, the ZCU104 price delivers excellent value given the included camera, software licenses, and vision-optimized hardware. The hardware codec differentiates it from EG-variant alternatives for applications involving video streams.

Success with the Xilinx ZCU104 requires understanding both the hardware capabilities and the Vitis AI software ecosystem. Start with pre-built images and Model Zoo examples, then progressively customize for your specific application requirements. The combination of powerful hardware and mature software tools enables deploying sophisticated AI systems at the edge with confidence.
