Python FPGA Development with Xilinx: PYNQ & Beyond

When I first heard about programming FPGAs with Python, I was skeptical. After years of wrestling with Verilog timing constraints and cryptic synthesis errors, the idea of controlling hardware accelerators from a Jupyter notebook seemed almost too good to be true. But after spending considerable time with the PYNQ ecosystem and various Xilinx Python tools, I can say the landscape has genuinely changed for embedded engineers who want to leverage programmable logic without becoming HDL experts.

This guide covers getting started with PYNQ, building custom hardware overlays, accelerating machine learning inference, and using alternative Python-based HDL frameworks that can target Xilinx devices.

Understanding the Xilinx Python Ecosystem

The Xilinx Python ecosystem centers around PYNQ (Python + Zynq), but extends far beyond a single framework. Before diving into specifics, let’s understand how these pieces fit together.

What Makes Zynq Python Development Different

Traditional FPGA development requires intimate knowledge of hardware description languages like Verilog or VHDL. The Zynq architecture changed this equation by combining an ARM processing system (PS) with programmable logic (PL) on a single chip. This architecture enables a natural division of labor: Python runs on the ARM cores while hardware accelerators execute in the FPGA fabric.

The key insight behind Zynq Python development is that most applications don’t need custom RTL. Pre-built hardware overlays can handle common acceleration tasks, leaving software developers free to focus on their algorithms rather than clock domain crossings.

Component	Purpose	Language
Processing System (PS)	Runs Linux, Python, application logic	Python/C/C++
Programmable Logic (PL)	Hardware accelerators, custom IP	VHDL/Verilog/HLS
AXI Interconnect	PS-PL communication	Hardware protocol
Overlays	Pre-built hardware configurations	Bitstream + drivers

Getting Started with PYNQ on Zynq Boards

PYNQ makes Xilinx Python development accessible by providing a complete software stack, pre-built overlays, and example notebooks. The framework supports multiple boards across the Zynq, Zynq UltraScale+, Kria, and Alveo families.

Supported PYNQ Development Boards

Choosing the right board depends on your application requirements and budget. Here’s a comparison of popular options:

Board	SoC/FPGA	RAM	Key Features	Price Range
PYNQ-Z2	Zynq XC7Z020	512MB DDR3	HDMI in/out, Audio, Arduino/Pmod headers	~$100
Ultra96-V2	Zynq UltraScale+ ZU3EG	2GB LPDDR4	WiFi/BT, 96Boards expansion	~$220
ZCU104	Zynq UltraScale+ XCZU7EV	2GB DDR4	Video codec, DisplayPort	~$1,100
Kria KV260	Zynq UltraScale+ K26 SOM	4GB DDR4	Vision AI, Smart camera	~$250

For beginners, the PYNQ-Z2 offers the best value. The Zynq XC7Z020 provides enough resources for learning while the board includes practical peripherals like HDMI input/output for video processing projects.

Setting Up Your PYNQ Environment

Getting PYNQ running takes about 15 minutes. Here’s the basic workflow:

Download the appropriate SD card image from the PYNQ website
Flash it to a microSD card using a tool such as Etcher or dd
Insert the card into your board and connect Ethernet
Power on and wait for the boot sequence to complete (about 60 seconds)
Access Jupyter notebooks at http://pynq:9090 or http://192.168.2.99:9090

Jupyter prompts for a password, which is “xilinx” by default; terminal and SSH logins use “xilinx” as both username and password. From there, you’re immediately ready to run Python code that interacts with hardware.

Working with PYNQ Overlays

Overlays are the secret sauce that makes Zynq Python development productive. Think of an overlay as a hardware library – you load it when needed, use its functions through a Python API, and swap it out for another overlay when your requirements change.

Understanding the Overlay Architecture

An overlay consists of three components: the bitstream (.bit file), the hardware handoff file (.hwh), and Python driver code. When you instantiate an overlay in Python, PYNQ automatically downloads the bitstream to the FPGA and creates Python objects for each IP block.

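from pynq.overlays.base import BaseOverlay

# Load the base overlay (this downloads the bitstream to the PL)
base = BaseOverlay("base.bit")

# Access hardware components through Python
base.leds[0].on()                  # turn on the first LED
pressed = base.buttons[0].read()   # 1 while the first button is held down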

The base overlay included with each PYNQ board provides drivers for on-board peripherals: LEDs, buttons, switches, HDMI, audio, and GPIO interfaces. This lets you start experimenting immediately without building custom hardware.
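Before loading an overlay of your own, it can help to confirm that the bitstream and its hardware handoff file travel together. The sketch below assumes the usual layout in which the .hwh file sits beside the .bit file and shares its base name; the helper name overlay_files is illustrative and not part of PYNQ.

```python
from pathlib import Path


def overlay_files(bit_path):
    """Return (bit, hwh) paths, raising if either file is missing.

    Assumes the .hwh sits beside the .bit and shares its base name.
    """
    bit = Path(bit_path)
    hwh = bit.with_suffix(".hwh")
    for required in (bit, hwh):
        if not required.is_file():
            raise FileNotFoundError(f"missing overlay file: {required}")
    return bit, hwh
```

Calling this before constructing the overlay turns a confusing load failure into a clear message about which file is absent.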

Available Pre-Built Overlays

Beyond the base overlay, Xilinx and the community provide specialized overlays for common use cases:

Overlay	Purpose	Key IP Blocks
Base	Board peripherals	GPIO, Video, Audio
Logictools	Digital pattern generation	Pattern Generator, FSM, Boolean
PYNQ-ComputerVision	Image processing	OpenCV-compatible filters
BNN-PYNQ	Binary neural networks	Quantized inference engine
DPU-PYNQ	Deep learning	Vitis AI DPU

The logictools overlay deserves special mention for hardware debugging. It turns your PYNQ board into a configurable logic analyzer and pattern generator, perfect for testing external circuits or learning digital logic concepts.

Creating Custom Hardware Overlays

Pre-built overlays cover many scenarios, but eventually you’ll want to accelerate your own algorithms. Creating custom overlays requires Vivado and some understanding of the Zynq architecture, though not necessarily deep HDL expertise.

The Custom Overlay Development Flow

Building a custom overlay follows this general workflow:

Create IP blocks using Vitis HLS (formerly Vivado HLS; C/C++ to RTL) or traditional HDL
Assemble the system in Vivado IP Integrator
Connect IP to the Zynq PS through AXI interfaces
Generate the bitstream and hardware handoff files
Write Python driver code to interface with your IP

The most accessible path for software developers is high-level synthesis with Vitis HLS (called Vivado HLS in older tool releases). You write C or C++ code with specific pragmas, and the tool synthesizes it into RTL. This doesn’t produce optimal hardware, but it’s often fast enough and dramatically reduces development time.
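The last step of that flow, the Python driver, is often little more than a few register reads and writes. Here is a minimal sketch for a hypothetical adder IP. It assumes the driver is handed an object with read(offset) and write(offset, value) methods, which is the shape of pynq.MMIO. The register offsets are made up for illustration, so take the real ones from the files your HLS tool generates. A small software stand-in is included so the driver logic can be exercised without a board.

```python
class AdderDriver:
    """Thin wrapper around a memory-mapped adder IP.

    `mmio` is any object with read(offset) and write(offset, value).
    The offsets below are hypothetical placeholders.
    """

    REG_A = 0x10
    REG_B = 0x18
    REG_SUM = 0x20

    def __init__(self, mmio):
        self.mmio = mmio

    def add(self, a, b):
        # Write both operands, then read back the result register.
        self.mmio.write(self.REG_A, a)
        self.mmio.write(self.REG_B, b)
        return self.mmio.read(self.REG_SUM)


class FakeAdderMMIO:
    """Software stand-in for the IP so the driver can be tested off-board."""

    def __init__(self):
        self.regs = {}

    def write(self, offset, value):
        self.regs[offset] = value & 0xFFFFFFFF  # registers are 32 bits wide

    def read(self, offset):
        if offset == AdderDriver.REG_SUM:
            a = self.regs.get(AdderDriver.REG_A, 0)
            b = self.regs.get(AdderDriver.REG_B, 0)
            return (a + b) & 0xFFFFFFFF
        return self.regs.get(offset, 0)


driver = AdderDriver(FakeAdderMMIO())
print(driver.add(3, 4))
```

On real hardware the same driver would be given an MMIO object mapped at the IP's base address instead of the fake.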

AXI Interface Considerations

The communication between PS and PL happens through AXI interfaces. Understanding which interface to use is crucial for performance:

AXI GP (General Purpose): Low bandwidth, PS acts as master. Use for configuration registers and control signals.

AXI HP (High Performance): High bandwidth, PL acts as master. Use for DMA transfers and video streams. The Zynq-7000 provides four HP ports.

AXI ACP (Accelerator Coherency Port): Cache-coherent access to DDR. Use when sharing data structures between PS and PL without explicit cache management.

For most acceleration scenarios, you’ll use AXI GP for control and AXI HP for bulk data transfer via DMA.
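A quick back-of-the-envelope calculation shows why bulk data belongs on a high-performance port. The sketch below compares the bandwidth an uncompressed 1080p60 RGB stream needs with the theoretical peak of a single 64-bit port. The 100 MHz PL clock is an illustrative assumption, and real throughput will be lower than the peak because of arbitration and DDR overheads.

```python
def stream_bandwidth(width, height, bytes_per_pixel, fps):
    """Bytes per second needed to move an uncompressed video stream."""
    return width * height * bytes_per_pixel * fps


def axi_peak_bandwidth(data_width_bits, clock_hz):
    """Theoretical peak of one AXI port: one data beat per clock cycle."""
    return data_width_bits // 8 * clock_hz


needed = stream_bandwidth(1920, 1080, 3, 60)
available = axi_peak_bandwidth(64, 100_000_000)  # assumed 100 MHz PL clock
print(f"1080p60 RGB needs {needed / 1e6:.0f} MB/s; "
      f"a 64-bit port at 100 MHz peaks at {available / 1e6:.0f} MB/s")
```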

Machine Learning Acceleration with Zynq Python

Machine learning inference is where Xilinx Python development really shines. The combination of PYNQ’s Python accessibility and the DPU’s inference performance creates a compelling platform for edge AI applications.

Vitis AI and the DPU

The Deep Learning Processor Unit (DPU) is AMD/Xilinx’s configurable inference accelerator for neural networks. It supports common architectures including ResNet, YOLO, SSD, and various transformer models.

The DPU-PYNQ project provides pre-built overlays and Python APIs that make deployment remarkably simple:

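import numpy as np
from pynq_dpu import DpuOverlay

overlay = DpuOverlay("dpu.bit")
overlay.load_model("resnet50.xmodel")
dpu = overlay.runner  # VART runner for the loaded model

# Allocate buffers that match the model's input and output tensors
shape_in = tuple(dpu.get_input_tensors()[0].dims)
shape_out = tuple(dpu.get_output_tensors()[0].dims)
input_data = [np.empty(shape_in, dtype=np.float32, order="C")]
output_data = [np.empty(shape_out, dtype=np.float32, order="C")]

# Run inference; input_image must already be preprocessed to the model's input shape
input_data[0][0, ...] = input_image
job_id = dpu.execute_async(input_data, output_data)
dpu.wait(job_id)
result = output_data[0]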
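What comes back from a classifier is usually a vector of raw class scores, and ranking them is left to the CPU. The sketch below is one way to do that in plain Python. It assumes scores is a flat sequence of logits (for example, the flattened output buffer); whether a softmax is still needed depends on how the model was compiled.

```python
import math


def top_k(scores, k=5):
    """Return the k most likely (class_index, probability) pairs."""
    peak = max(scores)
    # Subtract the maximum before exponentiating for numerical stability.
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]
```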

DPU configurations vary by board. Larger boards like the ZCU104 can run multiple DPU cores simultaneously, while the Ultra96-V2 runs a smaller B1024 configuration suitable for real-time video inference at reduced precision.

Performance Expectations

Edge AI performance depends heavily on the specific board and model. Here are typical results for image classification:

Board	DPU Config	ResNet-50 (fps)	YOLO-v3 (fps)
Ultra96-V2	B1024 x1	60-80	10-15
ZCU104	B4096 x2	150-200	30-50
Kria KV260	B4096 x1	100-140	20-35

These numbers far exceed what’s achievable with pure CPU inference on the same ARM cores, demonstrating the value of hardware acceleration.

Beyond PYNQ: Alternative Python HDL Frameworks

PYNQ excels at leveraging pre-built hardware, but what if you want to describe hardware itself using Python? Several frameworks enable this, each with different philosophies and target applications.

Amaranth HDL (formerly nMigen)

Amaranth is a modern Python-based hardware description language that generates synthesizable Verilog. Unlike PYNQ, Amaranth doesn’t abstract away hardware – it provides Python syntax for describing logic that will become actual gates and flip-flops.

Key features of Amaranth include a clean module system, strong type checking, built-in simulation, and direct integration with open-source synthesis tools. It’s particularly popular in the open-source hardware community.

from amaranth import *

class Counter(Elaboratable):
    def __init__(self, width):
        self.value = Signal(width)

    def elaborate(self, platform):
        m = Module()
        # Increment on every clock edge of the default "sync" domain
        m.d.sync += self.value.eq(self.value + 1)
        return m
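When checking a design like this in simulation, it helps to have a plain-Python reference for what the hardware should do. The sketch below is such a model, not Amaranth code: it assumes the counter starts at zero after reset and wraps around once it exceeds the signal width.

```python
def counter_trace(width, cycles):
    """Values a free-running `width`-bit counter holds on successive clock edges.

    Starts at 0 and wraps modulo 2**width, as a fixed-width register would.
    """
    value, trace = 0, []
    for _ in range(cycles):
        trace.append(value)
        value = (value + 1) % (1 << width)
    return trace


print(counter_trace(2, 6))
```

A 2-bit counter can only hold four values, so the trace returns to zero on the fifth cycle.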

MyHDL and Migen

MyHDL is one of the oldest Python HDL projects, dating back to 2004. It uses Python generators to model concurrent hardware processes and can convert designs to VHDL or Verilog. While development has slowed, it remains functional and has extensive documentation.
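The generator idea is easier to see in miniature. The sketch below uses only plain Python and none of MyHDL's API: each generator plays the role of a hardware process, and a tiny scheduler advances all of them one tick at a time so they appear to run concurrently. The names blinker and simulate are invented for this illustration.

```python
def blinker(name, period, log):
    """A 'process' that toggles its output every `period` ticks."""
    state = 0
    tick = 0
    while True:
        if tick % period == 0:
            state ^= 1
            log.append((tick, name, state))
        tick += 1
        yield  # hand control back to the scheduler until the next tick


def simulate(processes, ticks):
    """Advance every process once per tick, in lockstep."""
    for _ in range(ticks):
        for proc in processes:
            next(proc)


log = []
simulate([blinker("fast", 1, log), blinker("slow", 2, log)], ticks=4)
for entry in log:
    print(entry)
```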

Migen, originally developed by M-Labs, is the Python HDL toolbox that the LiteX project is built on (Amaranth began life as its successor, nMigen). LiteX itself focuses on building complete SoC designs: its ecosystem provides IP cores for common peripherals (UART, SPI, Ethernet, DDR controllers) that can be assembled into working systems entirely from Python.

Comparison of Python HDL Approaches

Framework	Use Case	Output	Xilinx Support
PYNQ	Using pre-built hardware	Python control code	Native
Amaranth	Designing hardware	Verilog	Via Vivado
Migen/LiteX	Building SoCs	Verilog	Via Vivado
MyHDL	Educational, prototyping	VHDL/Verilog	Via Vivado
cocotb	Verification/testing	Testbenches	Native simulation

For most Xilinx Python projects, PYNQ remains the practical choice. The alternative frameworks shine when you need to create novel hardware or target platforms beyond Xilinx’s ecosystem.

Practical Xilinx Python Development Tips

After working with these tools across multiple projects, here are the lessons that aren’t always obvious from documentation.

Memory Management Matters

PYNQ provides allocate() for creating DMA-capable buffers. Use these contiguous memory regions for any data transferred to hardware accelerators. Standard NumPy arrays won’t work correctly with DMA operations.

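import numpy as np
from pynq import allocate

# Assumes `dma` is the overlay's AXI DMA driver and `frame_data` is a
# NumPy array holding one 1080p RGB frame.

# Create buffer for hardware DMA, shaped (rows, columns, channels)
input_buffer = allocate(shape=(1080, 1920, 3), dtype=np.uint8)

# Copy data to buffer
input_buffer[:] = frame_data

# Now safe to pass to hardware
dma.sendchannel.transfer(input_buffer)
dma.sendchannel.wait()  # block until the transfer completes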
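Buffer size matters too. A DMA engine's length register caps how many bytes a single transfer can move, and that width is fixed when the IP is configured in Vivado. The sketch below plans how a large buffer could be split to respect such a cap. The 14-bit register and 8-byte alignment are illustrative assumptions, so check your own design's settings.

```python
def plan_chunks(total_bytes, max_chunk_bytes, align=8):
    """Split a transfer into (offset, length) pieces that respect a size cap.

    The cap is rounded down to a multiple of `align` so that every chunk
    starts on an aligned offset.
    """
    usable = max_chunk_bytes - max_chunk_bytes % align
    if usable <= 0:
        raise ValueError("max_chunk_bytes is smaller than the alignment")
    return [
        (offset, min(usable, total_bytes - offset))
        for offset in range(0, total_bytes, usable)
    ]


frame_bytes = 1080 * 1920 * 3                  # one 1080p RGB frame
chunks = plan_chunks(frame_bytes, 2**14 - 1)   # assumed 14-bit length register
print(len(chunks), chunks[-1])
```

In practice, widening the length register in the IP configuration is often simpler than chunking in software, but the arithmetic shows how quickly a full frame outgrows a small register.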

Overlay Loading Takes Time

Loading an overlay reconfigures the FPGA, which takes 100-500ms depending on bitstream size. Don’t load overlays in tight loops. Load once at startup and reuse the instance throughout your application.

Clock Domain Awareness

Even when using PYNQ, understanding clock domains prevents subtle bugs. The PS runs at its own frequency while PL clocks are independently configurable. Data crossing between domains needs proper synchronization, typically handled by the AXI infrastructure but sometimes requiring attention in custom designs.

Useful Resources for Xilinx Python Development

Here are the essential resources for deepening your Xilinx Python and Zynq Python knowledge:

Resource	Description	Link
PYNQ Documentation	Official framework documentation	pynq.readthedocs.io
PYNQ GitHub	Source code and examples	github.com/Xilinx/PYNQ
PYNQ Workshop	Hands-on training materials	github.com/Xilinx/PYNQ_Workshop
DPU-PYNQ	Deep learning acceleration	github.com/Xilinx/DPU-PYNQ
PYNQ Community Forum	Technical support	discuss.pynq.io
Vitis AI Model Zoo	Pre-trained ML models	github.com/Xilinx/Vitis-AI
Amaranth Documentation	Python HDL reference	amaranth-lang.org/docs
LiteX Wiki	SoC building framework	github.com/enjoy-digital/litex/wiki

Frequently Asked Questions About Xilinx Python Development

Can I use PYNQ without knowing any Verilog or VHDL?

Absolutely. The entire point of PYNQ is enabling software developers to use FPGA acceleration without hardware design expertise. The pre-built overlays handle common scenarios like video processing, GPIO control, and machine learning inference. You only need HDL knowledge if you want to create custom hardware accelerators beyond what’s available in existing overlays.

What’s the performance difference between Python on PYNQ versus C/C++?

For control-plane operations (configuring registers, managing data flow), the performance difference is negligible. For data-plane operations, performance depends on where the work happens. If the heavy computation runs in the FPGA fabric, Python overhead is minimal since you’re just initiating DMA transfers. For CPU-intensive tasks, C/C++ remains faster, but PYNQ supports mixed Python/C++ workflows through ctypes and Cython.
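As a small taste of the ctypes route, the sketch below calls a function from the C standard library directly from Python. It assumes a Linux system such as the PYNQ image. A real project would load its own shared library in the same way, and the wrapper name c_strlen is just for illustration.

```python
import ctypes
import ctypes.util

# Locate the C standard library; if the lookup returns None, CDLL(None)
# falls back to symbols already loaded into the process on POSIX systems.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature so ctypes converts arguments and results correctly.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data):
    """Length of a NUL-terminated byte string, computed by C's strlen."""
    return libc.strlen(data)


print(c_strlen(b"pynq"))
```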

Can I run Zynq Python code on a standard Raspberry Pi for development?

Not directly, since PYNQ requires the Zynq hardware for overlay operations. However, you can develop and test pure Python logic on any system. To exercise custom IP without a board, simulate the HDL with a testbench framework such as cocotb. The Jupyter notebook interface also allows remote development against a physical board.

Which board should I buy for machine learning projects?

For learning and prototyping, the Ultra96-V2 offers good value with WiFi, sufficient DPU performance, and reasonable cost. For production-oriented work or larger models, the Kria KV260 provides better performance and a system-on-module form factor suitable for custom carrier boards. The PYNQ-Z2, while excellent for general learning, lacks the UltraScale+ architecture needed for modern Vitis AI workloads.

How does PYNQ compare to other edge AI platforms like NVIDIA Jetson?

Both platforms excel at edge AI but serve different niches. NVIDIA Jetson provides GPU-based acceleration with mature CUDA tooling, making it ideal for applications already developed for GPU inference. PYNQ/Zynq offers more flexibility for custom hardware acceleration beyond neural networks, deterministic latency for control applications, and integration with other FPGA IP. For pure neural network inference, Jetson often provides simpler deployment; for mixed workloads requiring custom hardware, Zynq typically wins.

Video Processing with Zynq Python

One of the most compelling applications for Xilinx Python development is real-time video processing. The PYNQ-Z2’s HDMI input and output ports make it particularly suitable for this use case.

HDMI Pipeline Architecture

The base overlay provides a complete video pipeline that you can manipulate from Python:

from pynq.overlays.base import BaseOverlay
from pynq.lib.video import *

base = BaseOverlay("base.bit")

# Configure HDMI input and output
hdmi_in = base.video.hdmi_in
hdmi_out = base.video.hdmi_out
hdmi_in.configure()
hdmi_out.configure(hdmi_in.mode)
hdmi_in.start()
hdmi_out.start()

# Process frames in Python
while True:
    frame = hdmi_in.readframe()
    # Apply processing here
    hdmi_out.writeframe(frame)

Software-only processing achieves roughly 3-5 frames per second for 1080p video. Adding hardware acceleration through custom overlays can push this to 30+ fps for many filters, demonstrating the practical value of the hybrid PS-PL architecture.
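Frame rates vary with the filter and the board, so it is worth measuring your own pipeline instead of relying on ballpark figures. The sketch below times a fixed number of frames through a read, process, write loop. The frame source and sink are passed in as callables, and the helper name run_pipeline is made up for this example, which lets it run with stand-ins on any machine.

```python
import time


def run_pipeline(read_frame, write_frame, process, n_frames):
    """Push n_frames through read -> process -> write and return frames/second."""
    start = time.perf_counter()
    for _ in range(n_frames):
        write_frame(process(read_frame()))
    elapsed = time.perf_counter() - start
    return n_frames / elapsed if elapsed > 0 else float("inf")


# Off-board demonstration with a fake three-pixel "frame" and an invert filter.
shown = []
fps = run_pipeline(
    read_frame=lambda: [1, 2, 3],
    write_frame=shown.append,
    process=lambda frame: [255 - p for p in frame],
    n_frames=10,
)
print(f"{fps:.0f} fps over {len(shown)} frames")
```

On a board, the HDMI input's readframe and the output's writeframe methods can be passed in place of the stand-ins, along with whatever filter function you want to time.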

OpenCV Integration

PYNQ includes OpenCV, enabling familiar image processing workflows. You can capture frames from HDMI, process them with OpenCV functions, and display results – all from Python. For production applications, the compute-intensive OpenCV functions can be replaced with hardware-accelerated equivalents through custom overlays.

The Future of Python in FPGA Development

The trend toward higher-level FPGA development tools shows no signs of slowing. AMD’s acquisition of Xilinx has accelerated investment in software stacks, and PYNQ continues receiving updates with broader board support and improved integration with Vitis AI.

For embedded engineers, this means Python increasingly becomes a viable option for systems that previously demanded low-level HDL expertise. You can prototype in Python, identify performance bottlenecks, and selectively accelerate critical paths – all without completely changing your development workflow.

The combination of Xilinx Python tools, Zynq Python capabilities, and the broader Python ecosystem creates a genuinely productive environment for embedded AI and acceleration projects. Whether you’re using PYNQ’s overlays or diving deeper with Amaranth HDL, Python has earned its place in the FPGA developer’s toolkit.
