Inquire: Call 0086-755-23203480, or reach out via the form below/your sales contact to discuss our design, manufacturing, and assembly capabilities.
Quote: Email your PCB files to Sales@pcbsync.com (Preferred for large files) or submit online. We will contact you promptly. Please ensure your email is correct.
Notes: For PCB fabrication, we require PCB design file in Gerber RS-274X format (most preferred), *.PCB/DDB (Protel, inform your program version) format or *.BRD (Eagle) format. For PCB assembly, we require PCB design file in above mentioned format, drilling file and BOM. Click to download BOM template To avoid file missing, please include all files into one folder and compress it into .zip or .rar format.
When AMD acquired Xilinx in 2022, it inherited one of the most capable FPGA families on the market. Having designed PCBs around the Virtex UltraScale+ for demanding applications from 100G networking to radar systems, I can say firsthand that these devices deliver capabilities that are hard to find elsewhere. This guide covers what engineers need to know about the Xilinx Virtex UltraScale+ FPGA family, from device selection to PCB implementation.
The Virtex UltraScale+ represents AMD/Xilinx’s flagship FPGA portfolio, built on TSMC’s 16nm FinFET+ process technology. These devices occupy the highest performance tier of the UltraScale architecture, offering the industry’s highest transceiver bandwidth, DSP compute capacity, and on-chip memory density in a programmable device.
What distinguishes the Xilinx Virtex UltraScale+ FPGA from competing families is the combination of 3D IC technology with advanced power optimization. Using stacked silicon interconnect (SSI) technology, AMD breaks through traditional die size limitations to deliver devices with up to 9 million logic cells—something that would be impossible with monolithic silicon. The registered inter-die routing enables clock frequencies exceeding 600 MHz, providing a virtual monolithic design experience despite the multi-die architecture.
Virtex UltraScale+ Sub-Families
The Virtex UltraScale+ portfolio divides into four distinct sub-families, each optimized for different application requirements:
The foundation Virtex UltraScale+ devices span from the entry-level VU3P to the massive VU19P, providing options for virtually any high-performance application:
| Device | System Logic Cells | CLB LUTs | Flip-Flops | Block RAM (Mb) | UltraRAM (Mb) | DSP Slices |
|---|---|---|---|---|---|---|
| XCVU3P | 862,050 | 394,080 | 788,160 | 25.3 | 90.0 | 2,280 |
| XCVU5P | 1,313,763 | 600,577 | 1,201,154 | 36.0 | 132.2 | 3,474 |
| XCVU7P | 1,724,100 | 788,160 | 1,576,320 | 50.6 | 180.0 | 4,560 |
| XCVU9P | 2,586,150 | 1,182,240 | 2,364,480 | 75.9 | 270.0 | 6,840 |
| XCVU11P | 2,835,000 | 1,296,000 | 2,592,000 | 70.9 | 270.0 | 9,216 |
| XCVU13P | 3,780,000 | 1,728,000 | 3,456,000 | 94.5 | 360.0 | 12,288 |
| XCVU19P | 8,937,600 | 4,085,760 | 8,171,520 | 75.9 | 90.0 | 3,840 |
The VU19P deserves special attention: when it was announced in 2019 it was the world’s largest FPGA, containing 35 billion transistors across its SSI die configuration. With 9 million system logic cells, over 2,000 user I/Os, and 80 GTY transceivers, the VU19P enables prototyping and emulation of the most advanced ASICs and SoCs before silicon tape-out.
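As a rough illustration of device selection from the table above, the following Python sketch picks the smallest listed device that satisfies a set of resource requirements. The figures are copied from this article; the `pick_device` helper and its parameter names are hypothetical. A real selection would also weigh package, transceiver count, speed grade, and cost.

```python
from typing import NamedTuple, Optional


class Device(NamedTuple):
    name: str
    logic_cells: int
    bram_mb: float
    uram_mb: float
    dsp: int


# Figures copied from the device table in this article.
DEVICES = [
    Device("XCVU3P", 862_050, 25.3, 90.0, 2_280),
    Device("XCVU5P", 1_313_763, 36.0, 132.2, 3_474),
    Device("XCVU7P", 1_724_100, 50.6, 180.0, 4_560),
    Device("XCVU9P", 2_586_150, 75.9, 270.0, 6_840),
    Device("XCVU11P", 2_835_000, 70.9, 270.0, 9_216),
    Device("XCVU13P", 3_780_000, 94.5, 360.0, 12_288),
    Device("XCVU19P", 8_937_600, 75.9, 90.0, 3_840),
]


def pick_device(logic_cells: int = 0, bram_mb: float = 0.0,
                uram_mb: float = 0.0, dsp: int = 0) -> Optional[Device]:
    """Return the listed device with the fewest logic cells that meets every requirement."""
    fits = [
        d for d in DEVICES
        if d.logic_cells >= logic_cells
        and d.bram_mb >= bram_mb
        and d.uram_mb >= uram_mb
        and d.dsp >= dsp
    ]
    # None signals that no single listed device satisfies the request.
    return min(fits, key=lambda d: d.logic_cells) if fits else None
```

For example, `pick_device(logic_cells=2_000_000, dsp=5_000)` returns the XCVU9P entry. A request for 5 million logic cells plus 5,000 DSP slices returns `None`, because the VU19P trades DSP density for logic capacity.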
High Bandwidth Memory (HBM) Series
For applications demanding extreme memory bandwidth, the Virtex UltraScale+ HBM devices integrate high bandwidth memory directly into the package using chip-on-wafer-on-substrate (CoWoS) technology:
| Device | Logic Cells | HBM Capacity | HBM Bandwidth | GTY Transceivers |
|---|---|---|---|---|
| XCVU31P | 961,800 | 4 GB | 230 GB/s | 32 |
| XCVU33P | 961,800 | 8 GB | 460 GB/s | 32 |
| XCVU35P | 1,906,800 | 8 GB | 460 GB/s | 64 |
| XCVU37P | 2,851,800 | 8 GB | 460 GB/s | 96 |
| XCVU45P | 1,906,800 | 16 GB | 460 GB/s | 64 |
| XCVU47P | 2,851,800 | 16 GB | 460 GB/s | 96 |
The integrated HBM2 delivers 20× more bandwidth than DDR4 DIMMs while consuming only ~7 pJ/bit. The embedded HBM controller saves approximately 250K LUTs that would otherwise be required for external memory interfaces, freeing resources for application logic.
The transceiver architecture is where the Xilinx Virtex UltraScale+ FPGA truly differentiates itself. Two transceiver types provide flexibility across different applications:
| Transceiver | Line Rate Range | Key Protocols | Features |
|---|---|---|---|
| GTY | 500 Mb/s – 32.75 Gb/s | 25GE, 100GE, PCIe Gen4 | NRZ signaling, backplane capable |
| GTM | 19.6 – 58 Gb/s | 50/100/200/400GE | PAM4 signaling, chip-to-optics |
The GTY transceivers feature third-generation auto-adaptive equalization technology, enabling robust operation across the most challenging backplane channels. For 100GBASE-KR4 applications, the receivers achieve IEEE specification compliance without requiring manual tuning—a significant time saver during board bring-up.
The GTM transceivers in the 58G PAM4 variants double the bandwidth on existing infrastructure. By using 4-level pulse amplitude modulation, these devices support the latest 50G/100G/200G/400G optics and protocols with superior port density. The built-in KP4-FEC handles the error correction required for PAM4 signaling.
UltraRAM On-Chip Memory
Beyond traditional block RAM, Virtex UltraScale+ devices include UltraRAM—a new memory resource providing 8× the capacity per block compared to standard BRAM:
| Memory Type | Block Size | Total Capacity (VU13P) | Characteristics |
|---|---|---|---|
| Block RAM | 36 Kb | 94.5 Mb | Dual-port, FIFO mode |
| UltraRAM | 288 Kb | 360.0 Mb | Cascade-able, deep sleep mode |
| Distributed RAM | 64 bits/LUT | 48.3 Mb | Fast, small storage |
UltraRAM blocks can cascade to create extremely deep memory structures without consuming routing resources. This architecture is ideal for packet buffering, video line buffers, and coefficient storage in signal processing applications. The deep sleep power mode allows UltraRAM to retain data while minimizing static power consumption.
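To make the cascading arithmetic concrete, here is a small Python sketch that estimates how many UltraRAM blocks a buffer needs and recovers the block counts implied by the VU13P capacities in the table above. It assumes each 288 Kb UltraRAM block is organized as 4K words of 72 bits, which is how AMD documents the primitive; the function names are hypothetical.

```python
import math

URAM_DEPTH = 4096   # words per UltraRAM block (assumed 4K x 72 organization)
URAM_WIDTH = 72     # bits per word
URAM_KBITS = 288    # block size from the table above (4096 * 72 / 1024)


def uram_blocks_needed(depth_words: int, width_bits: int) -> int:
    """Blocks for a depth x width buffer: cascade in depth, tile in width."""
    columns = math.ceil(width_bits / URAM_WIDTH)
    rows = math.ceil(depth_words / URAM_DEPTH)
    return columns * rows


def total_blocks(capacity_mb: float, block_kbits: int = URAM_KBITS) -> int:
    """Number of blocks implied by a total capacity in Mb (1 Mb = 1024 Kb)."""
    return round(capacity_mb * 1024 / block_kbits)
```

Under these assumptions, a 64K-deep, 144-bit-wide packet buffer needs `uram_blocks_needed(65536, 144)` = 32 blocks (9 Mb). The VU13P figures of 360.0 Mb and 94.5 Mb correspond to 1,280 UltraRAM blocks and 2,688 block RAMs respectively.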
DSP48E2 Slice Architecture
The DSP48E2 slices in Virtex UltraScale+ provide substantial improvements over previous generations:
| Feature | Specification |
|---|---|
| Pre-adder | 27-bit |
| Multiplier | 27×18 (signed) |
| Accumulator | 48-bit |
| Max frequency | 891 MHz (-3 speed grade) |
| Peak INT8 performance | Up to 38 TOPs |
| Peak fixed-point DSP performance | Up to 22 TeraMACs |
The architecture supports single-cycle fixed-point multiply-accumulate operations; floating-point operators are built from multiple slices together with fabric logic. For AI inference workloads, the INT8 mode achieves up to 38 TOPs on the largest devices, which is competitive with dedicated AI accelerators while maintaining full programmability.
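The headline throughput numbers follow from slice count times clock rate. The sketch below works through that arithmetic for the VU13P (12,288 DSP slices) at the 891 MHz figure from the table. The factor of two for symmetric filters, where the pre-adder folds two taps into one multiplier, is an assumption about how the roughly 22 TeraMACs figure is derived, not something this article states.

```python
def peak_macs_per_second(dsp_slices: int, fmax_hz: float,
                         macs_per_slice_per_cycle: float = 1.0) -> float:
    """Theoretical peak multiply-accumulates per second across all DSP slices."""
    return dsp_slices * fmax_hz * macs_per_slice_per_cycle


# VU13P slice count and -3 speed grade fmax, both taken from this article.
base = peak_macs_per_second(12_288, 891e6)

# Assumption: a symmetric FIR uses the pre-adder to cover two taps per multiplier.
symmetric = peak_macs_per_second(12_288, 891e6, macs_per_slice_per_cycle=2.0)

print(f"one MAC per slice per cycle: {base / 1e12:.2f} TMAC/s")
print(f"symmetric-filter counting:   {symmetric / 1e12:.2f} TMAC/s")
```

These are ceilings that assume every slice runs at maximum frequency on every cycle; a placed-and-routed design will land below them.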
Virtex UltraScale+ Power Supply Design
Voltage Rail Requirements
Designing the power distribution network for a Xilinx Virtex UltraScale+ FPGA requires careful attention to multiple voltage domains:
| Rail | Voltage | Tolerance | Function |
|---|---|---|---|
| VCCINT | 0.85V or 0.72V | ±3% | Core logic |
| VCCBRAM | 0.85V | ±3% | Block RAM, UltraRAM |
| VCCAUX | 1.8V | ±5% | Auxiliary circuits |
| VCCO | 1.0V–1.8V | ±5% | I/O banks (HP) |
| VMGTAVCC | 0.9V | ±3% | Transceiver analog |
| VMGTAVTT | 1.2V | ±3% | Transceiver termination |
| VCCINT_GT | 0.85V | ±3% | Transceiver digital |
The -2LE speed grade devices offer the option to operate VCCINT at 0.72V for reduced static power, though with corresponding reduction in maximum performance. For production designs, I typically start with 0.85V and evaluate whether the lower voltage meets timing requirements.
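A percentage tolerance is easier to design against once it is converted to an absolute window. This minimal sketch does that for the fixed-voltage rails, using the nominal values and tolerances from the table in this article (the device data sheet remains the authority). The dictionary keys and helper names are hypothetical, and `VCCINT_LOW` is just a label for the 0.72V option.

```python
# (nominal volts, tolerance as a fraction), taken from the rail table in this article.
RAILS = {
    "VCCINT": (0.85, 0.03),
    "VCCINT_LOW": (0.72, 0.03),   # -2LE low-voltage option
    "VCCBRAM": (0.85, 0.03),
    "VCCAUX": (1.8, 0.05),
    "VMGTAVCC": (0.9, 0.03),
    "VMGTAVTT": (1.2, 0.03),
}


def rail_window(rail: str) -> tuple:
    """Return (min, max) allowed voltage for a rail."""
    nominal, tol = RAILS[rail]
    return nominal * (1 - tol), nominal * (1 + tol)


def in_window(rail: str, measured_v: float) -> bool:
    """True if a measured voltage lies inside the rail's tolerance window."""
    low, high = rail_window(rail)
    return low <= measured_v <= high
```

For VCCINT at 0.85V ±3% the window is about 0.8245V to 0.8755V, roughly 25 mV either side. Regulator DC accuracy, ripple, and load-transient droop all have to fit inside that budget together.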
Power Sequencing Requirements
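Proper power sequencing is critical for reliable operation. For the transceiver rails, AMD's data sheet recommends powering up in the order VCCINT, VMGTAVCC, VMGTAVTT or VMGTAVCC, VCCINT, VMGTAVTT to minimize current draw. Both VMGTAVCC and VCCINT can ramp simultaneously. If these sequencing requirements are not met, the current drawn from VMGTAVTT can exceed specifications during power-up.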
Decoupling Capacitor Strategy
For a device like the XCVU9P in an FLGA2104 package, the PCB decoupling network typically requires:
| Capacitor Value | Type/Package | Quantity | Location |
|---|---|---|---|
| 680 µF | Bulk electrolytic | 1-2 per rail | Near VRM |
| 100 µF | Polymer aluminum | 2-4 per rail | Board perimeter |
| 47 µF | 1206 ceramic | 4-8 per rail | Mid-distance |
| 4.7 µF | 0805 ceramic | 40-80 total | Under FPGA |
| 0.47 µF | 0402 ceramic | 100-200 total | Under FPGA |
The 0402 capacitors provide high-frequency decoupling and should be placed on the bottom layer directly opposite the FPGA power balls. Use low-ESL mounting with vias placed at pad sides rather than ends.
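The low-ESL mounting advice can be illustrated with a simple series R-L-C model of a mounted capacitor. The Python sketch below computes impedance magnitude and self-resonant frequency. The ESR and ESL values in the usage note are hypothetical placeholders; real numbers depend on the specific part, pad geometry, and via layout.

```python
import math


def impedance_ohms(freq_hz: float, cap_f: float, esr_ohm: float, esl_h: float) -> float:
    """Impedance magnitude of a capacitor modeled as series R, L, and C."""
    w = 2 * math.pi * freq_hz
    reactance = w * esl_h - 1.0 / (w * cap_f)
    return math.hypot(esr_ohm, reactance)


def self_resonant_hz(cap_f: float, esl_h: float) -> float:
    """Frequency where inductive and capacitive reactance cancel."""
    return 1.0 / (2 * math.pi * math.sqrt(cap_f * esl_h))
```

With an assumed 0.5 nH of mounted inductance, a 0.47 µF capacitor resonates near 10 MHz, for example `self_resonant_hz(0.47e-6, 0.5e-9)`. Above resonance the impedance is dominated by ESL, so halving the mounted inductance roughly halves the impedance at 100 MHz. This is why side-mounted vias and short current loops matter more than the capacitance value at high frequency.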
PCB Design Guidelines for Virtex UltraScale+
Stack-Up Recommendations
For packages with 2000+ balls, a minimum 20-layer PCB is typical. A recommended stack-up structure:
| Layer | Function | Impedance Target |
|---|---|---|
| L1 | GTY TX/RX, component | 100Ω differential |
| L2 | GND reference | – |
| L3 | High-speed signals | 50Ω single-ended |
| L4 | VCCINT plane | – |
| L5 | General routing | – |
| L6 | GND reference | – |
| … | Alternating signal/plane | – |
| L20 | Bottom decoupling, component | – |
Route GTY/GTM differential pairs on outer layers with solid ground reference. Maintain 100Ω ±5% differential impedance with length matching to ±5 mils within each pair.
High-Speed Routing Constraints
| Interface | Impedance | Length Match | Material Requirement |
|---|---|---|---|
| GTY (32.75G) | 100Ω diff | ±5 mils | Dk < 3.8, Df < 0.008 |
| GTM (58G) | 100Ω diff | ±3 mils | Dk < 3.5, Df < 0.005 |
| DDR4 | 40-50Ω SE | Per byte lane | Standard FR4 acceptable |
| LVDS | 100Ω diff | ±50 mils | FR4 acceptable |
For 58G PAM4 signaling, use ultra-low-loss materials like Megtron 6 or comparable. The tighter loss budgets at PAM4 rates demand exceptional dielectric performance throughout the signal path.
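To see what the length-matching limits above mean in time, this sketch converts an intra-pair mismatch in mils to skew in picoseconds and compares it with the unit interval. It uses the standard propagation relation (delay per unit length = sqrt(Dk_eff)/c). Treating the table's Dk limit as the effective dielectric constant is a simplifying assumption; outer-layer microstrip sees a lower effective value. Note that 58 Gb/s PAM4 carries two bits per symbol, so the symbol rate is 29 GBd.

```python
import math

C_M_PER_S = 299_792_458.0   # speed of light in vacuum
MIL_M = 25.4e-6             # one mil in metres


def skew_ps(mismatch_mils: float, dk_eff: float) -> float:
    """Timing skew in picoseconds caused by a length mismatch."""
    delay_s = mismatch_mils * MIL_M * math.sqrt(dk_eff) / C_M_PER_S
    return delay_s * 1e12


def unit_interval_ps(symbol_rate_baud: float) -> float:
    """Duration of one symbol in picoseconds."""
    return 1e12 / symbol_rate_baud


gty_skew = skew_ps(5, 3.8)             # ±5 mil limit, Dk 3.8
gty_ui = unit_interval_ps(32.75e9)     # NRZ: one bit per symbol
gtm_skew = skew_ps(3, 3.5)             # ±3 mil limit, Dk 3.5
gtm_ui = unit_interval_ps(58e9 / 2)    # PAM4: two bits per symbol
```

Under these assumptions a 5 mil mismatch is about 0.83 ps, under 3% of the 30.5 ps unit interval at 32.75 Gb/s. The numbers show that the matching limits are conservative on timing alone; the tighter PAM4 limit reflects reduced eye height and mode conversion more than raw skew.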
Virtex UltraScale+ Development Boards and Evaluation Kits
AMD provides official evaluation platforms for the Virtex UltraScale+ family:
| Board | FPGA | Key Features | Typical Price |
|---|---|---|---|
| VCU118 | XCVU9P | PCIe Gen3 x16, 4× DDR4, FMC+ | ~$8,000 |
| VCU128 | XCVU37P | 8GB HBM, PCIe Gen3, 2× DDR4 | ~$15,000 (discontinued) |
| VCU129 | XCVU29P | 58G transceivers, QSFP-DD | Contact sales |
The VCU118 remains the workhorse development platform for most Virtex UltraScale+ applications. It includes the XCVU9P with 2.6M logic cells, 4× DDR4 SO-DIMM sockets (up to 16GB each), PCIe Gen3 x16 edge connector, FMC and FMC+ expansion, and 4× QSFP28 cages.
Target Applications for Xilinx Virtex UltraScale+ FPGA
High-Speed Networking (100G/400G)
The Virtex UltraScale+ excels in networking applications requiring:
Up to 128 transceivers per device for multi-port line cards
Integrated 100G Ethernet MAC with RS-FEC
150G Interlaken for fabric interfaces
PCIe Gen3 x16 for host connectivity
A single VU13P can implement a 1 Tb/s line card with full packet processing capability. The integrated hard IP saves approximately 60K-100K logic cells per 100G port compared to soft implementations.
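The line-card claim can be checked with a quick resource budget. The sketch below counts ports, transceiver lanes, and the logic saved by hard MACs, using the 60K-100K cells-per-port range from this article. The four-lanes-per-port default assumes 100GE carried over four 25G GTY lanes; the function and key names are hypothetical.

```python
import math


def line_card_budget(target_gbps: float, port_gbps: float = 100,
                     lanes_per_port: int = 4,
                     saved_cells_per_port: tuple = (60_000, 100_000)) -> dict:
    """Ports, transceiver lanes, and logic-cell savings for a target throughput."""
    ports = math.ceil(target_gbps / port_gbps)
    low, high = saved_cells_per_port
    return {
        "ports": ports,
        "lanes": ports * lanes_per_port,
        "cells_saved": (ports * low, ports * high),
    }
```

For a 1 Tb/s card, `line_card_budget(1000)` gives 10 ports and 40 lanes, well within the 128 transceivers cited above, and 600K to 1M logic cells saved by the hard MACs. That saving is a meaningful share of the VU13P's 3.78M cells.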
AI Inference and Machine Learning
The combination of high DSP density and HBM memory makes Virtex UltraScale+ devices compelling for AI inference:
Reconfigurable architecture adapts to evolving model architectures
Lower latency than GPU-based solutions for real-time inference
ASIC Prototyping and Emulation
The VU19P specifically targets ASIC prototyping:
9 million logic cells for the largest SoC designs
1.5 Tb/s DDR4 bandwidth for state storage
4.5 Tb/s transceiver bandwidth for system interfaces
Comprehensive debug and visibility tools
Hardware/software co-validation enables developers to begin software integration before physical silicon is available, accelerating time-to-market for complex SoCs.
Aerospace and Defense
The XQ defense-grade variants provide:
Full -55°C to +125°C junction temperature operation
Ruggedized packages with <97% Sn solder
MIL-STD-883 Group D environmental characterization
28.2 Gb/s transceiver performance
These specifications enable deployment in radar systems, electronic warfare, satellite communications, and avionics applications where commercial-grade components cannot survive.
The Xilinx Virtex UltraScale+ FPGA family offers multiple speed grades for different performance and power requirements:
| Speed Grade | VCCINT | Temperature Range | Performance |
|---|---|---|---|
| -1 | 0.85V | Extended/Industrial | Standard |
| -2 | 0.85V | Extended/Industrial | High |
| -2LE | 0.85V or 0.72V | Extended | Low power option |
| -3 | 0.90V | Extended | Highest |
The -3 speed grade provides maximum performance but is typically only available for specific device/package combinations. For most production designs, the -2 speed grade offers the best balance of performance, availability, and cost.
Frequently Asked Questions
What is the difference between Virtex UltraScale and Virtex UltraScale+?
The Virtex UltraScale (non-plus) devices use 20nm planar process technology, while Virtex UltraScale+ uses 16nm FinFET+. The UltraScale+ devices offer approximately 30% lower power consumption, higher maximum frequencies, UltraRAM memory blocks, and higher-speed transceivers (up to 32.75 Gb/s GTY vs. 30.5 Gb/s in UltraScale). UltraScale+ also includes the HBM and 58G PAM4 variants that have no UltraScale equivalents.
Which Virtex UltraScale+ device should I select for 100G Ethernet?
For single-port 100GE applications, the VU3P or VU5P provides sufficient logic and transceiver resources with good cost-effectiveness. For multi-port line cards, the VU9P or VU13P supports higher port density with their larger transceiver counts. If using 100GE with PAM4 optics (50G per lane), consider the VU27P or VU29P with 58G GTM transceivers. The integrated 100G Ethernet MAC IP saves significant logic resources compared to soft implementations.
How does HBM compare to DDR4 for FPGA applications?
HBM provides 20× the bandwidth of DDR4 DIMM interfaces (460 GB/s vs. ~25 GB/s per channel) with lower power per bit (~7 pJ/bit vs. ~15 pJ/bit). HBM also eliminates external memory routing complexity since the memory is integrated in-package. However, HBM devices cost significantly more and the memory capacity is fixed at manufacturing (4-16 GB). DDR4 offers more flexibility in capacity selection and lower entry cost. Choose HBM when bandwidth is the bottleneck; choose DDR4 when capacity or cost is the primary concern.
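The bandwidth and energy figures in this answer can be combined into a rough power comparison. The sketch below multiplies bandwidth by energy per bit, using the approximate values quoted above (460 GB/s at ~7 pJ/bit for HBM, ~25 GB/s at ~15 pJ/bit per DDR4 channel). It assumes sustained peak bandwidth, which real workloads rarely reach; the function name is hypothetical.

```python
import math


def interface_power_w(bandwidth_gbytes_per_s: float, energy_pj_per_bit: float) -> float:
    """Memory interface power in watts at a sustained bandwidth."""
    bits_per_s = bandwidth_gbytes_per_s * 1e9 * 8
    return bits_per_s * energy_pj_per_bit * 1e-12


hbm_w = interface_power_w(460, 7)             # HBM at full bandwidth
ddr4_equiv_w = interface_power_w(460, 15)     # same bandwidth at DDR4 energy per bit
ddr4_channels = math.ceil(460 / 25)           # DDR4 channels needed to match HBM
```

With these inputs, HBM moves 460 GB/s for about 26 W. Matching that with DDR4 would take 19 channels and roughly 55 W, before counting the I/O pins and board routing those channels would consume.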
What thermal solution is required for Virtex UltraScale+ devices?
Thermal requirements vary significantly by device size and utilization. For the largest devices (VU13P, VU19P) at high utilization, active cooling with heatsinks rated for 50-100W TDP is typical. The VU9P in a VCU118 evaluation kit uses a passive heatsink with adequate airflow. Always run the Xilinx Power Estimator (XPE) with your actual design utilization before finalizing thermal design. The System Monitor (SYSMON) provides real-time junction temperature monitoring—design your thermal solution to maintain Tj below 100°C under worst-case conditions.
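A first-pass thermal check uses the steady-state relation Tj = Ta + P × θJA. The sketch below applies it in both directions. The ambient temperature and thermal resistance values in the usage note are hypothetical placeholders; actual power should come from XPE and thermal resistance from the heatsink vendor and package data.

```python
def junction_temp_c(ambient_c: float, power_w: float, theta_ja_c_per_w: float) -> float:
    """Steady-state junction temperature for a given junction-to-ambient resistance."""
    return ambient_c + power_w * theta_ja_c_per_w


def max_theta_ja(tj_max_c: float, ambient_c: float, power_w: float) -> float:
    """Largest junction-to-ambient thermal resistance that keeps Tj at or below tj_max_c."""
    if power_w <= 0:
        raise ValueError("power_w must be positive")
    return (tj_max_c - ambient_c) / power_w
```

For example, holding Tj at 100°C with an assumed 50°C ambient and a 100 W design requires `max_theta_ja(100, 50, 100)` = 0.5 °C/W for the whole junction-to-ambient path, which is why the largest devices typically need active cooling.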
Can I migrate designs between Virtex UltraScale+ devices?
Yes, AMD provides footprint compatibility within package families. Devices offered in the same package footprint (for example, all devices available in A2104, or all devices available in B2104) are pin-compatible, enabling designs to migrate between devices with different logic capacities. However, HBM devices only migrate among HBM variants; they are not footprint-compatible with standard Virtex UltraScale+ devices. Always verify I/O bank assignments and transceiver locations when planning migration paths, as these resources vary between devices even in compatible packages.
Integrated Hard IP Blocks
One of the significant advantages of the Xilinx Virtex UltraScale+ FPGA platform is the integration of ASIC-class hard IP blocks that save logic resources and power while providing guaranteed performance:
PCIe Hard Blocks
| Feature | Specification |
|---|---|
| PCIe Generation | Gen3 x16 (Gen4 x8 in select devices) |
| Maximum Link Rate | 8 GT/s (Gen3), 16 GT/s (Gen4) |
| Blocks per Device | Up to 6 (device dependent) |
| CCIX Support | Select 58G devices |
The integrated PCIe blocks support advanced features including extended tags, end-to-end data protection, and SR-IOV virtualization. For data center applications, the Gen4-capable devices with CCIX support enable cache-coherent connections to host processors.
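The two link configurations in the table deliver the same raw bandwidth, which a short calculation makes clear. This sketch multiplies the per-lane rate by lane count and the 128b/130b line-encoding efficiency used by PCIe Gen3 and Gen4. It reports per-direction bandwidth before packet (TLP/DLLP) overhead, so achievable payload throughput is lower. The function name is hypothetical.

```python
def pcie_raw_gbytes_per_s(gt_per_s: float, lanes: int,
                          encoding_efficiency: float = 128 / 130) -> float:
    """Per-direction link bandwidth in GB/s after line encoding."""
    return gt_per_s * lanes * encoding_efficiency / 8


gen3_x16 = pcie_raw_gbytes_per_s(8, 16)
gen4_x8 = pcie_raw_gbytes_per_s(16, 8)
```

Both configurations come to about 15.75 GB/s per direction. Gen4 x8 reaches the same bandwidth with half the lanes, freeing transceivers for other interfaces at the cost of tighter channel loss requirements.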
Ethernet and Interlaken MAC
| IP Core | Line Rate | Features |
|---|---|---|
| 100G Ethernet MAC | 100 Gb/s | RS-FEC, IEEE 1588 |
| 150G Interlaken | 150 Gb/s | Fabric interface |
| KP4-FEC | Integrated | For PAM4 optics |
| KR4-FEC | Integrated | For backplane |
The hard MAC implementations consume 90% less dynamic power than soft implementations while saving 60K-100K logic cells per port. This resource savings is critical for achieving high port density in networking applications.
Vivado Design Suite Support
The Virtex UltraScale+ family is fully supported by Vivado Design Suite, including:
Vivado Synthesis and Implementation with ML-optimized algorithms
IP Integrator for block-based design
Vivado Simulator with mixed-language support
ChipScope debugging with IBERT for transceiver testing
Dynamic Function eXchange (DFX) for partial reconfiguration
The Vivado ML features can significantly reduce compile times and improve timing closure for complex designs. For the largest devices like the VU19P, incremental compilation and Abstract Shell methodologies become essential for managing design iterations efficiently.
Conclusion
The Xilinx Virtex UltraScale+ FPGA family represents the current state of the art in programmable logic, delivering unmatched performance for the most demanding applications. From the entry-level VU3P with 862K logic cells to the massive VU19P with 9 million cells and the HBM-equipped variants offering 460 GB/s memory bandwidth, these devices provide solutions across the full spectrum of high-performance computing requirements.
For PCB engineers, the Virtex UltraScale+ presents significant design challenges in power delivery, signal integrity, and thermal management—but the payoff is access to computing capabilities that simply aren’t available in any other form factor. Whether you’re designing networking infrastructure, AI accelerators, radar systems, or ASIC prototyping platforms, the Virtex UltraScale+ delivers the performance and flexibility to realize your most ambitious designs.
The continued investment by AMD in this platform, including ongoing Vivado tool development and expanded device offerings, ensures that the Virtex UltraScale+ will remain relevant for years to come. For new high-performance FPGA projects, this family should be at the top of your evaluation list.