Inquire: Call 0086-755-23203480, or reach out via the form below/your sales contact to discuss our design, manufacturing, and assembly capabilities.
Quote: Email your PCB files to Sales@pcbsync.com (Preferred for large files) or submit online. We will contact you promptly. Please ensure your email is correct.
Notes: For PCB fabrication, we require PCB design file in Gerber RS-274X format (most preferred), *.PCB/DDB (Protel, inform your program version) format or *.BRD (Eagle) format. For PCB assembly, we require PCB design file in above mentioned format, drilling file and BOM. Click to download BOM template To avoid file missing, please include all files into one folder and compress it into .zip or .rar format.
The Xilinx Kintex UltraScale+ family is AMD’s mid-range FPGA lineup, built on 16nm FinFET+ technology. In my experience it offers the best balance of performance, power efficiency, and cost for production designs. I have worked extensively with both the original Kintex UltraScale devices (XCKU040, XCKU060, XCKU115) and the newer UltraScale+ variants (XCKU5P, XCKU9P, XCKU15P), and this guide covers the architectural details and practical design considerations that matter when bringing these devices into production hardware.
The UltraScale+ architecture builds upon the proven UltraScale foundation with critical enhancements that impact real-world designs: improved power efficiency through FinFET+ process optimization, on-chip UltraRAM blocks, faster GTY transceivers, and PCIe Gen4 capability. These aren’t marketing bullet points—they translate directly into reduced BOM costs, simpler thermal management, and expanded design headroom.
Process Technology and Power Advantages
Kintex UltraScale+ devices use TSMC’s 16nm FinFET+ process, which delivers measurable improvements over the original UltraScale 20nm technology:
Parameter | Kintex UltraScale | Kintex UltraScale+
Process Node | 20nm | 16nm FinFET+
Static Power | Baseline | Up to 30% lower
Dynamic Power | Baseline | Up to 20% lower
Core Voltage (VCCINT) | 0.95V nominal | 0.85V/0.72V options
Max Clock Speed | ~600 MHz | ~650 MHz
Power per GMAC | Higher | Optimized
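The voltage scaling options in UltraScale+ are particularly valuable for power-constrained applications. The -2LE and -1LI speed grades can operate at either 0.85V or 0.72V VCCINT. At 0.85V they meet the same speed specifications as the corresponding -2I and -1I grades. At 0.72V both static and dynamic power drop, but maximum performance drops as well, so timing has to be closed against the low-voltage specifications.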
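As a rough illustration of why the lower VCCINT option matters, the sketch below applies the textbook approximation that dynamic power scales with the square of the supply voltage. It assumes switched capacitance and clock frequency stay constant, which will not hold exactly on real silicon, so treat the result as an estimate and use XPE for actual numbers. The function name is mine, not an AMD API.

```python
def dynamic_power_ratio(v_new: float, v_ref: float) -> float:
    """Ratio of dynamic power at v_new versus v_ref.

    Assumes P_dyn ~ C * V^2 * f with capacitance C and frequency f unchanged.
    """
    return (v_new / v_ref) ** 2


ratio = dynamic_power_ratio(0.72, 0.85)
print(f"Dynamic power at 0.72 V is about {ratio:.0%} of the 0.85 V figure")
```

Under these assumptions the 0.72V option cuts dynamic power by a little over a quarter, before accounting for any clock frequency reduction needed to close timing.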
Kintex UltraScale+ Device Family Specifications
The Xilinx Kintex UltraScale+ family spans six devices ranging from entry-level to high-capacity variants. Each device targets specific application requirements based on logic density, DSP resources, and transceiver counts.
Kintex UltraScale+ Device Comparison
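Device | Logic Cells | CLB LUTs | DSP Slices | Block RAM (Mb) | UltraRAM (Mb) | GTH Transceivers (16.3 Gb/s) | GTY Transceivers (32.75 Gb/s)
XCKU3P | 355,950 | 162,720 | 1,368 | 12.7 | 13.5 | 0 | 16
XCKU5P | 474,600 | 216,960 | 1,824 | 16.9 | 18.0 | 0 | 16
XCKU9P | 599,550 | 274,080 | 2,520 | 32.1 | 0 | 28 | 0
XCKU11P | 653,100 | 298,560 | 2,928 | 21.1 | 22.5 | 32 | 20
XCKU13P | 746,550 | 341,280 | 3,528 | 26.2 | 31.5 | 28 | 0
XCKU15P | 1,143,450 | 522,720 | 1,968 | 34.6 | 36.0 | 44 | 32
Values follow the AMD product overview (DS890); confirm against the current revision before committing to a device.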
Kintex UltraScale (Non-Plus) Device Reference
For comparison and migration planning, here are the original Kintex UltraScale devices that remain widely deployed:
Device | Logic Cells | CLB LUTs | DSP Slices | Block RAM (Mb) | GTH Transceivers
XCKU040 | 530,250 | 242,400 | 1,920 | 21.1 | 20
XCKU060 | 725,550 | 331,680 | 2,760 | 38.0 | 32
XCKU115 | 1,451,100 | 663,360 | 5,520 | 75.9 | 64
The XCKU040, XCKU060, and XCKU115 devices continue to serve production designs requiring the original UltraScale architecture. The XCKU115 is particularly notable as a multi-die (SSI) device combining two XCKU060 dies for maximum capacity.
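A quick sanity check on the reference table: the logic cell and CLB LUT columns are related by a constant factor. The sketch below only divides the two columns as listed above. Reading that factor as a vendor weighting applied to the LUT count is my interpretation, not something the table states.

```python
from fractions import Fraction

# (system logic cells, CLB LUTs) as listed in the Kintex UltraScale table above
devices = {
    "XCKU040": (530_250, 242_400),
    "XCKU060": (725_550, 331_680),
    "XCKU115": (1_451_100, 663_360),
}

# Exact ratios, so rounding cannot hide a mismatch between rows
ratios = {name: Fraction(cells, luts) for name, (cells, luts) in devices.items()}

for name, ratio in ratios.items():
    print(f"{name}: logic cells / LUTs = {float(ratio)}")
```

If a row in a device table does not reproduce the same ratio as its neighbors, that is a good hint that one of the two numbers was transcribed incorrectly.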
UltraScale+ Configurable Logic Block Architecture
The CLB architecture in Kintex UltraScale+ represents a significant evolution from 7-Series FPGAs. Each CLB contains one slice with eight 6-input LUTs and sixteen flip-flops, providing increased density and routing flexibility.
CLB Slice Resources
Resource | Per Slice | Function
6-Input LUTs | 8 | Combinatorial logic, 64-bit memory, SRL32
Flip-Flops | 16 | Register storage, pipeline stages
Carry Chain | 8-bit | Fast arithmetic operations
Wide MUX | 32:1 | Large multiplexer functions
Distributed RAM | 512 bits | Small, fast local memory
The slice architecture enables flexible resource utilization. Each LUT can implement one 6-input function or two 5-input functions with shared inputs. In SLICEM slices, a LUT can also be configured as 64 bits of distributed RAM or as a 32-bit shift register (SRL32), which is particularly useful for implementing delay lines and small FIFOs without consuming block RAM.
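To get a feel for what these LUT modes cost, here is a minimal estimator. It assumes one LUT provides a 1-bit-wide, 32-deep shift register or 64 bits of single-port RAM, as described above, and it ignores any extra LUTs the tools may add for addressing or output multiplexing. The function names are illustrative only.

```python
import math


def srl_luts(delay_cycles: int, width_bits: int, srl_depth: int = 32) -> int:
    """LUTs for a width_bits-wide delay line of delay_cycles stages (SRL32 mode)."""
    return math.ceil(delay_cycles / srl_depth) * width_bits


def distributed_ram_luts(depth_words: int, width_bits: int, bits_per_lut: int = 64) -> int:
    """LUTs for a single-port depth_words x width_bits distributed RAM."""
    return math.ceil(depth_words / bits_per_lut) * width_bits


# A 100-cycle delay on a 16-bit bus: 4 cascaded SRL32s per bit
print(srl_luts(100, 16))
# A 64 x 8 RAM uses 8 LUTs, i.e. the 512 bits of one SLICEM
print(distributed_ram_luts(64, 8))
```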
CLB Design Considerations
When targeting high utilization designs (>70% CLB usage), I’ve found these practices essential:
The routing architecture in UltraScale+ handles congestion better than 7-Series, but designs with highly localized logic clusters still benefit from pipelining. Adding register stages every 3-4 levels of logic typically improves both timing and routability.
Clock region boundaries matter for timing closure. The UltraScale+ architecture organizes resources into clock regions 60 CLBs tall, with each region containing dedicated clock distribution. Designs that minimize clock domain crossings at region boundaries achieve more predictable timing results.
DSP48E2 Slice Architecture in Kintex UltraScale+
The DSP48E2 slice in Kintex UltraScale+ provides substantial improvements over the DSP48E1 in 7-Series devices. These enhancements directly impact signal processing, machine learning, and mathematical computation performance.
DSP48E2 Key Features
Feature | DSP48E2 (UltraScale+) | DSP48E1 (7-Series)
Multiplier | 27×18 | 25×18
Pre-adder Width | 27-bit | 25-bit
Accumulator | 48-bit | 48-bit
Pattern Detect | 48-bit | 48-bit
XOR Function | 96-bit wide | Not available
A Input Width | 30-bit | 30-bit
The 27×18 multiplier handles larger operands without cascading multiple DSP slices, improving resource efficiency for audio processing, image filtering, and neural network implementations. The 96-bit XOR functionality enables efficient CRC and error correction implementations.
DSP Slice Placement and Cascading
DSP slices in Kintex UltraScale+ are arranged in columns with dedicated cascade paths. Each DSP column contains 24 slices per clock region, aligned horizontally with 18Kb block RAM pairs for optimal data flow.
For FIR filter implementations, the PCIN/PCOUT cascade allows systolic array architectures that achieve one multiply-accumulate per clock cycle per DSP slice. A 256-tap filter using cascaded DSP slices can run at 650 MHz in a -2 speed grade device.
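Two back-of-the-envelope numbers are worth computing before committing to a cascaded filter like this: the aggregate MAC rate, and how wide the running sum can grow. The sketch below uses the conservative rule that summing N products adds ceil(log2(N)) bits. Real coefficient sets are usually far from worst case, so this is an upper bound under my assumptions rather than a statement about any particular filter.

```python
import math


def fir_macs_per_second(taps: int, clock_hz: float) -> float:
    """Aggregate MAC rate when every tap has its own DSP slice at clock_hz."""
    return taps * clock_hz


def worst_case_accumulator_bits(a_bits: int, b_bits: int, taps: int) -> int:
    """Conservative signed width for summing `taps` products of
    a_bits x b_bits signed operands: product width plus log2 growth."""
    return a_bits + b_bits + math.ceil(math.log2(taps))


print(fir_macs_per_second(256, 650e6) / 1e9, "GMAC/s")
print(worst_case_accumulator_bits(27, 18, 256), "bits worst case")
```

With full-range 27-bit and 18-bit operands the worst-case bound comes out wider than the 48-bit accumulator, so in practice it pays to check the actual coefficient magnitudes or scale the inputs.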
The memory subsystem in Kintex UltraScale+ combines traditional block RAM with UltraRAM—a feature exclusive to UltraScale+ devices that fundamentally changes on-chip memory design strategies.
Block RAM Specifications
Configuration | Capacity | Data Width | Depth
RAMB36E2 | 36 Kb | 1-72 bits | 512-32K
RAMB18E2 | 18 Kb | 1-36 bits | 512-16K
FIFO36E2 | 36 Kb | Built-in FIFO | 512-32K
Each block RAM supports true dual-port operation with independent clocks, widths, and addresses on the two ports. The built-in ECC, available in the 64-bit-wide simple dual-port configuration of the 36 Kb block, corrects single-bit errors and detects double-bit errors without consuming additional logic resources.
UltraRAM: On-Chip High-Density Memory
UltraRAM provides 288Kb per block (4Kx72) with characteristics that differ significantly from block RAM:
Parameter | UltraRAM | Block RAM
Capacity per Block | 288 Kb | 36 Kb
Data Width | 72 bits fixed | 1-72 bits configurable
Ports | Two ports, one shared clock | True dual-port, independent clocks
Cascade | Built-in hardware | Logic required
Latency | 2 clock cycles | 1-2 clock cycles
Power | Lower per bit | Higher per bit
UltraRAM excels in applications requiring large on-chip buffers: video frame stores, network packet buffers, and machine learning weight storage. A single XCKU5P provides 18 Mb of UltraRAM (64 blocks of 288 Kb), enough for one 1080p frame at 8 bits per pixel (about 15.8 Mb) or a modest set of neural network weights.
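When sizing a frame store, I find it easiest to count UltraRAM blocks directly. The sketch below uses the 4K x 72 block organization given above and assumes pixels are packed with no wasted bits, which is optimistic for pixel widths that do not divide 72 evenly. The helper names are hypothetical.

```python
import math

URAM_BLOCK_BITS = 4096 * 72  # one 4K x 72 block = 288 Kb


def frame_bits(width: int, height: int, bits_per_pixel: int) -> int:
    """Raw storage for one frame, assuming ideal packing."""
    return width * height * bits_per_pixel


def uram_blocks_needed(total_bits: int) -> int:
    """Whole UltraRAM blocks required to hold total_bits."""
    return math.ceil(total_bits / URAM_BLOCK_BITS)


for bpp in (8, 16, 24):
    bits = frame_bits(1920, 1080, bpp)
    print(f"1080p @ {bpp} bpp: {bits / 2**20:.1f} Mb, "
          f"{uram_blocks_needed(bits)} UltraRAM blocks")
```

Comparing the block count against the UltraRAM total for the target device shows quickly whether a frame fits on chip or needs external memory.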
Memory Design Strategy
The choice between distributed RAM, block RAM, and UltraRAM depends on access patterns and capacity requirements:
Use distributed RAM for small FIFOs (<512 bits), lookup tables accessed every clock cycle, and register files with many ports. The single-cycle access and location flexibility make distributed RAM ideal for these applications.
Block RAM suits medium-capacity buffers, dual-port memories, and applications requiring ECC. The configurable width and true dual-port capability handle most general memory needs efficiently.
UltraRAM is optimal for large sequential buffers where the 2-cycle latency, fixed 72-bit width, and single shared clock aren’t limiting factors. The hardware cascade feature enables memories up to the full UltraRAM capacity of the device (up to 36 Mb on the largest family member) without routing penalties.
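The rules of thumb above can be written down as a small decision helper. This is only a sketch of my own heuristic: the 512-bit threshold comes from the text, while using one 288 Kb block as the point where UltraRAM becomes worthwhile is an assumption, and real designs also need to weigh latency and placement.

```python
def suggest_memory(bits: int,
                   needs_independent_clocks: bool = False,
                   needs_narrow_or_mixed_widths: bool = False) -> str:
    """Suggest a memory primitive for a buffer of `bits` bits."""
    if bits <= 512:
        # Small FIFOs, lookup tables, register files
        return "distributed RAM"
    if needs_independent_clocks or needs_narrow_or_mixed_widths:
        # Features the text attributes to block RAM
        return "block RAM"
    if bits >= 288 * 1024:
        # At least one full UltraRAM block (assumed threshold)
        return "UltraRAM"
    return "block RAM"


print(suggest_memory(256))
print(suggest_memory(36 * 1024))
print(suggest_memory(16 * 2**20))
print(suggest_memory(16 * 2**20, needs_independent_clocks=True))
```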
GTY Transceiver Architecture for High-Speed Serial
Kintex UltraScale+ devices feature GTY transceivers capable of 32.75 Gb/s, roughly double the 16.3 Gb/s maximum of the GTH transceivers in original UltraScale devices. This enables support for modern protocols without external PHY devices.
GTY Transceiver Specifications
Parameter | GTY (UltraScale+) | GTH (UltraScale)
Max Line Rate | 32.75 Gb/s | 16.3 Gb/s
Min Line Rate | 500 Mb/s | 500 Mb/s
Supported Protocols | PCIe Gen4, 100G Ethernet, 28G FC | PCIe Gen3, 10G Ethernet
TX Pre-emphasis | Programmable | Programmable
RX Equalization | DFE + CTLE | DFE + CTLE
PLL Types | QPLL0, QPLL1, CPLL | QPLL0, QPLL1, CPLL
Each GTY Quad contains four transceiver channels that share two quad PLLs (QPLL0 and QPLL1), plus one CPLL per channel. The QPLLs provide lower jitter for high line rates, while the CPLLs offer flexibility when individual channels need different frequencies.
Transceiver Protocol Support
Protocol | Line Rate | Transceivers Required
PCIe Gen3 x8 | 8.0 GT/s | 8 GTY
PCIe Gen4 x8 | 16.0 GT/s | 8 GTY
100G Ethernet | 25.78125 Gb/s | 4 GTY
25G Ethernet | 25.78125 Gb/s | 1 GTY
CPRI Option 10 | 24.33024 Gb/s | 1 GTY
On Kintex UltraScale+ devices that include one, the integrated 100G Ethernet MAC saves thousands of LUTs compared to a soft implementation, which makes 100G PON OLT line cards and data center applications more practical in a mid-range part.
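The odd-looking line rates in the protocol table make more sense once line coding is taken into account. The sketch below converts line rate to payload rate using the standard encodings for these protocols (128b/130b for PCIe Gen3 and Gen4, 64b/66b for 25G and 100G Ethernet). It ignores packet and protocol overhead above the line code, so the results are upper bounds on usable bandwidth.

```python
from fractions import Fraction


def payload_gbps(line_rate_gbps: Fraction, lanes: int,
                 payload_bits: int, coded_bits: int) -> Fraction:
    """Aggregate payload rate after removing line-coding overhead."""
    return line_rate_gbps * lanes * Fraction(payload_bits, coded_bits)


links = {
    "PCIe Gen3 x8":  (Fraction(8), 8, 128, 130),
    "PCIe Gen4 x8":  (Fraction(16), 8, 128, 130),
    "100G Ethernet": (Fraction("25.78125"), 4, 64, 66),
    "25G Ethernet":  (Fraction("25.78125"), 1, 64, 66),
}

for name, params in links.items():
    print(f"{name}: {float(payload_gbps(*params)):.2f} Gb/s payload")
```

The 25.78125 Gb/s figure is exactly 25 Gb/s scaled by 66/64, which is why four such lanes carry 100 Gb/s.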
Power Supply Design for Kintex UltraScale+
Successful Kintex UltraScale+ designs require careful power supply planning. The devices use multiple voltage rails with specific sequencing requirements and current demands.
Power Rail Requirements
Rail | Voltage | Tolerance | Typical Current (XCKU5P)
VCCINT | 0.85V/0.72V | ±3% | 1.2-2.5A
VCCINT_IO | 0.85V | ±3% | 50-150mA
VCCBRAM | 0.85V | ±3% | 20-50mA
VCCAUX | 1.8V | ±5% | 150-250mA
VCCAUX_IO | 1.8V | ±5% | 30-80mA
VCCO (per bank) | 1.0-3.3V | ±5% | Application dependent
MGTAVCC | 0.9V | ±3% | 100-300mA per quad
MGTAVTT | 1.2V | ±3% | 100-200mA per quad
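Percent tolerances are easier to design against once they are converted to absolute limits. The sketch below does that for a few rails using the nominal values and tolerances from the table above. The data sheet (DS922) remains the authority for the actual limits, and the regulator's DC accuracy, ripple, and transient droop all have to fit inside the resulting window.

```python
def rail_window(nominal_v: float, tolerance_pct: float) -> tuple[float, float]:
    """Return (min, max) allowed voltage for a rail with symmetric tolerance."""
    delta = nominal_v * tolerance_pct / 100.0
    return nominal_v - delta, nominal_v + delta


# (nominal volts, tolerance in percent), copied from the table above
rails = {
    "VCCINT": (0.85, 3),
    "VCCAUX": (1.8, 5),
    "MGTAVCC": (0.9, 3),
    "MGTAVTT": (1.2, 3),
}

for name, (nominal, tol) in rails.items():
    lo, hi = rail_window(nominal, tol)
    print(f"{name}: {lo * 1000:.1f} mV to {hi * 1000:.1f} mV")
```

At 0.85V a ±3% tolerance leaves only about 25 mV on either side of nominal, which is why the core regulator usually needs remote sensing and a tight layout.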
Power Sequencing Requirements
Kintex UltraScale+ requires specific power-on sequencing to ensure proper device initialization:
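1. VCCINT and VCCINT_IO must ramp together or VCCINT first
2. VCCBRAM can ramp with VCCINT or after
3. VCCAUX and VCCAUX_IO must ramp after VCCINT reaches 80%
4. VCCO rails can ramp after VCCAUX is stable
5. GTY supplies (MGTAVCC, MGTAVTT, MGTAVCCAUX) ramp after core supplies
For power-down, the sequence reverses. Deviating from this order can increase power-on current draw and leave the I/Os in an undefined state during ramp-up, so treat the data sheet (DS922) as the authority on the exact sequencing and ramp-time requirements for your device.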
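A sequencer plan can be checked against the ordering above with a few lines of code. The sketch below is deliberately simplified: it compares only ramp start times, does not model the 80% threshold or ramp duration, and the rule list is my own encoding of the steps in this section, not an AMD-provided table.

```python
# Each pair (a, b) means: rail a must not start ramping later than rail b.
ORDER_RULES = [
    ("VCCINT", "VCCINT_IO"),
    ("VCCINT", "VCCBRAM"),
    ("VCCINT", "VCCAUX"),
    ("VCCINT", "VCCAUX_IO"),
    ("VCCAUX", "VCCO"),
    ("VCCINT", "MGTAVCC"),
    ("VCCINT", "MGTAVTT"),
]


def sequencing_violations(ramp_start_ms: dict[str, float]) -> list[tuple[str, str]]:
    """Return the ordering rules that a power-up plan breaks."""
    return [
        (first, second)
        for first, second in ORDER_RULES
        if first in ramp_start_ms
        and second in ramp_start_ms
        and ramp_start_ms[first] > ramp_start_ms[second]
    ]


plan = {"VCCINT": 0.0, "VCCINT_IO": 0.0, "VCCBRAM": 1.0,
        "VCCAUX": 2.0, "VCCAUX_IO": 2.0, "VCCO": 1.5,
        "MGTAVCC": 4.0, "MGTAVTT": 5.0}
print(sequencing_violations(plan))
```

In this hypothetical plan VCCO starts before VCCAUX, so the check reports that single pair. Reversing the comparison gives the equivalent check for power-down.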
Decoupling Capacitor Strategy
The PCB design guide (UG583) specifies capacitor quantities per device. For a typical XCKU5P design:
Rail | Bulk Capacitor | Decoupling (4.7µF) | Decoupling (0.47µF)
VCCINT | 680µF | 10 | 20
VCCBRAM | 47µF | 2 | 4
VCCAUX | 47µF | 4 | 8
VCCO (per bank) | 47µF | 1 | 2
Place 0.47µF capacitors directly under the FPGA on the backside PCB layer, with 4.7µF capacitors in the immediate vicinity. The 680µF bulk capacitor can be located near the voltage regulator.
High-speed signals and dense BGA packages demand careful PCB layout. These guidelines derive from AMD’s recommendations and practical production experience.
Recommended PCB Stack-Up
For fine-pitch BGA packages (1.0mm ball pitch), a minimum 12-layer stack-up is recommended:
Layer | Function | Notes
L1 | Signal | Component side, escape routing
L2 | Ground | Solid reference plane
L3 | Signal | DDR4, high-speed signals
L4 | Power | VCCINT plane
L5 | Signal | General routing
L6 | Ground | Solid reference plane
L7 | Power | VCCAUX, VCCO planes
L8 | Signal | General routing
L9 | Ground | Solid reference plane
L10 | Signal | GTY differential pairs
L11 | Ground | Solid reference plane
L12 | Signal | Bottom side escape
High-Speed Signal Routing
GTY transceiver differential pairs require controlled impedance routing (100Ω differential) with matched lengths within 5 mils. For 28 Gb/s operation, use low-loss PCB materials (Megtron 6 or equivalent) with Dk < 3.6 and Df < 0.004.
DDR4 interfaces at 2666 MT/s require careful attention to length matching within byte groups (±10 mils) and address/command signals (±25 mils). Place termination resistors within 500 mils of the DDR4 device.
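It helps to translate these length-matching limits into time. The sketch below estimates skew from a length mismatch, assuming stripline routing where the effective dielectric constant equals the material Dk (microstrip layers will be somewhat faster). The Dk of 3.6 is taken from the material guidance above, and the function names are illustrative.

```python
import math

C_MM_PER_PS = 0.299792458  # speed of light in mm per picosecond
MM_PER_MIL = 0.0254


def skew_ps(mismatch_mils: float, dk_eff: float) -> float:
    """Timing skew caused by a trace length mismatch."""
    velocity_mm_per_ps = C_MM_PER_PS / math.sqrt(dk_eff)
    return mismatch_mils * MM_PER_MIL / velocity_mm_per_ps


def unit_interval_ps(rate_gbps: float) -> float:
    """Duration of one bit (or one transfer) at the given rate."""
    return 1000.0 / rate_gbps


gty_skew = skew_ps(5, 3.6)
ddr_skew = skew_ps(10, 3.6)
print(f"GTY: {gty_skew:.2f} ps skew vs {unit_interval_ps(28):.1f} ps UI")
print(f"DDR4: {ddr_skew:.2f} ps skew vs {unit_interval_ps(2.666):.0f} ps UI")
```

Under these assumptions a 5 mil mismatch is a little under 1 ps, a small but not negligible fraction of the unit interval at 28 Gb/s, while the DDR4 limits leave considerably more margin.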
Development Tools and Resources
AMD provides comprehensive tools for Kintex UltraScale+ design, verification, and implementation.
Vivado Design Suite
Vivado serves as the primary development environment for all UltraScale+ devices. Key features include:
Tool | Function
Vivado Synthesis | RTL to netlist synthesis
Vivado Implementation | Place and route
Vivado Simulator | Functional simulation
Vivado Hardware Manager | JTAG programming and debug
Vivado IP Integrator | Block design assembly
The device-locked licenses included with evaluation kits (KCU116, KCU105) provide full Vivado functionality for the specific FPGA on each board.
Essential Documentation
Document | Number | Description
Architecture Overview | DS890 | UltraScale+ architecture details
DC/AC Data Sheet | DS922 | Kintex UltraScale+ specifications
SelectIO User Guide | UG571 | I/O standards and configuration
CLB User Guide | UG574 | Logic block architecture
DSP48E2 User Guide | UG579 | DSP slice details
Memory Resources | UG573 | Block RAM and UltraRAM
GTY Transceivers | UG578 | High-speed serial interfaces
PCB Design Guide | UG583 | Layout and power guidelines
Packaging and Pinouts | UG575 | Package specifications
Download Resources
Resource | Location
Vivado Design Suite | amd.com/vivado
Device Documentation | docs.amd.com
Xilinx Power Estimator (XPE) | amd.com/power
Reference Designs | amd.com/kintex-ultrascale-plus
Board Design Files | Product pages
Target Applications for Kintex UltraScale+
The Kintex UltraScale+ family addresses applications requiring high bandwidth, significant compute capability, and power efficiency:
Wireless Infrastructure (5G/LTE)
The combination of DSP slices and UltraRAM enables digital front-end processing, beamforming, and MIMO implementations. The 100G Ethernet MAC supports fronthaul connectivity without external PHY devices.
Data Center Acceleration
Network interface cards, storage controllers, and compute accelerators leverage PCIe Gen4 and high-speed Ethernet support. The power efficiency allows deployment in standard server thermal envelopes.
Medical Imaging
Ultrasound processing, CT reconstruction, and MRI signal processing benefit from the DSP architecture and large on-chip memory. The GTY transceivers handle high-resolution sensor data streams.
Video Processing
4K/8K video encoding, decoding, and switching utilize the combination of logic, DSP, and memory resources. UltraRAM provides frame buffer storage without external memory latency.
Aerospace and Defense
XQ defense-grade variants (XQKU5P, XQKU15P) offer extended temperature operation (-55°C to +125°C) and ruggedized packages for harsh environments.
Frequently Asked Questions
What is the difference between Kintex UltraScale and Kintex UltraScale+?
Kintex UltraScale uses 20nm process technology with GTH transceivers (up to 16.3 Gb/s), while Kintex UltraScale+ uses 16nm FinFET+ with GTY transceivers (up to 32.75 Gb/s). UltraScale+ adds UltraRAM memory blocks, voltage scaling options for lower power, and PCIe Gen4 support. The UltraScale+ devices (XCKU3P, XCKU5P, etc.) provide 20-30% power reduction compared to equivalent-capacity UltraScale devices (XCKU040, XCKU060, XCKU115).
Can I use Vivado WebPACK for Kintex UltraScale+ development?
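Partly. The free Vivado edition (WebPACK, now called Vivado ML Standard Edition) supports only the smaller family members such as the XCKU3P and XCKU5P; the larger devices require a paid edition. Check the release notes for your Vivado version for the exact device list. Evaluation kits like the KCU116 include device-locked licenses that provide full functionality for the specific FPGA on the board.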
How do I choose between block RAM and UltraRAM?
Use block RAM for memories that need independent port clocks, configurable widths, or single-cycle latency. UltraRAM is better for large sequential buffers where 2-cycle latency is acceptable and a fixed 72-bit width on a single clock is sufficient. For video frame buffers or neural network weights, UltraRAM typically provides better resource efficiency.
What power supply ICs work well with Kintex UltraScale+?
Texas Instruments offers reference designs (PMP10630) using SIMPLE SWITCHER modules. Common solutions include LMZ31704 for VCCINT, LMZ21700 for auxiliary rails, and LM3880 for sequencing. Analog Devices, Infineon, and Monolithic Power Systems also provide FPGA-optimized power solutions.
How do I migrate from XCKU040 or XCKU060 to UltraScale+?
Device migration requires attention to transceiver differences (GTH vs GTY), memory architecture (no UltraRAM in original UltraScale), and package pinouts. Many packages share footprint compatibility (same ball pattern) between families, simplifying PCB migration. Use Vivado’s migration tools to analyze IP compatibility and timing impacts.
Thermal Management Considerations
Proper thermal design is critical for reliable Kintex UltraScale+ operation. Junction temperature limits vary by device grade:
Temperature Grade | Operating Range | Package Options
Commercial (C) | 0°C to +85°C | Standard
Extended (E) | 0°C to +100°C | Standard
Industrial (I) | -40°C to +100°C | Standard
Military (M) | -55°C to +125°C | Ruggedized
For designs consuming more than 5W, active cooling (heatsink with fan) is recommended. The Xilinx Power Estimator (XPE) spreadsheet provides accurate thermal power estimates based on resource utilization and switching activity. Run XPE analysis early in the design cycle to size the thermal solution appropriately.
Heat spreader lids on larger packages (FFVA1156, FFVE1517, FFVA1760) improve thermal transfer to heatsinks. For lidless packages, ensure direct contact between the heatsink and die using appropriate thermal interface material (TIM).
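For a first-pass thermal budget before running XPE, the usual steady-state model is junction temperature equals ambient plus power times junction-to-ambient thermal resistance. The sketch below applies that model. The thermal resistance values are placeholders I chose for illustration, not package data, so substitute figures from the packaging guide and your heatsink vendor.

```python
def junction_temp_c(ambient_c: float, power_w: float, theta_ja_c_per_w: float) -> float:
    """Steady-state junction temperature: Tj = Ta + P * theta_JA."""
    return ambient_c + power_w * theta_ja_c_per_w


def max_theta_ja(tj_max_c: float, ambient_c: float, power_w: float) -> float:
    """Largest total thermal resistance that keeps Tj at or below tj_max_c."""
    return (tj_max_c - ambient_c) / power_w


# Hypothetical example: 5 W design, 50 C ambient, 100 C junction limit
print(max_theta_ja(100, 50, 5), "C/W budget")
print(junction_temp_c(50, 5, 8), "C with an assumed 8 C/W solution")
```

If the required thermal resistance is lower than what the package can achieve with natural convection, that is the point where a heatsink or forced airflow becomes necessary.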
Conclusion
The Xilinx Kintex UltraScale+ family delivers exceptional value for designs requiring high performance, power efficiency, and modern connectivity. The architecture improvements over original UltraScale—particularly UltraRAM, GTY transceivers, and voltage scaling—enable applications that weren’t practical in previous generations.
Whether targeting wireless infrastructure, data center acceleration, or embedded signal processing, the Kintex UltraScale+ devices provide the resources and tools necessary for successful production designs. The comprehensive documentation, proven evaluation platforms, and robust development tools reduce risk and accelerate time-to-market for complex FPGA implementations.
For teams currently using devices like the XCKU040, XCKU060, or XCKU115, the migration path to UltraScale+ offers tangible benefits in power consumption and transceiver performance while maintaining design methodology compatibility through the Vivado toolchain.