Contact Sales & After-Sales Service

Contact & Quotation

  • Inquire: Call 0086-755-23203480, or reach out via the form below/your sales contact to discuss our design, manufacturing, and assembly capabilities.
  • Quote: Email your PCB files to Sales@pcbsync.com (Preferred for large files) or submit online. We will contact you promptly. Please ensure your email is correct.
Notes:
For PCB fabrication, we require the PCB design file in Gerber RS-274X format (most preferred), *.PCB/DDB format (Protel; tell us your program version), or *.BRD (Eagle) format. For PCB assembly, we require the PCB design file in one of the above formats, the drilling file, and the BOM. To avoid missing files, please put all files into one folder and compress it into .zip or .rar format.

Xilinx Kintex UltraScale+ FPGA: Architecture & Design Guide

The Xilinx Kintex UltraScale+ family is AMD’s mid-range FPGA lineup built on 16nm FinFET+ technology, delivering what I consider the optimal balance of performance, power efficiency, and cost for production designs. I have worked extensively with both the original Kintex UltraScale devices (XCKU040, XCKU060, XCKU115) and the newer UltraScale+ variants (XCKU5P, XCKU9P, XCKU15P), and this guide covers the architectural details and practical design considerations that matter when bringing these devices into production hardware.

Xilinx Kintex UltraScale+ Architecture Overview

The UltraScale+ architecture builds upon the proven UltraScale foundation with critical enhancements that impact real-world designs: improved power efficiency through FinFET+ process optimization, on-chip UltraRAM blocks, faster GTY transceivers, and PCIe Gen4 capability. These aren’t marketing bullet points—they translate directly into reduced BOM costs, simpler thermal management, and expanded design headroom.

Process Technology and Power Advantages

Kintex UltraScale+ devices use TSMC’s 16nm FinFET+ process, which delivers measurable improvements over the original UltraScale 20nm technology:

Parameter | Kintex UltraScale | Kintex UltraScale+
Process Node | 20nm | 16nm FinFET+
Static Power | Baseline | Up to 30% lower
Dynamic Power | Baseline | Up to 20% lower
Core Voltage (VCCINT) | 0.95V nominal | 0.85V/0.72V options
Max Clock Speed | ~600 MHz | ~650 MHz
Power per GMAC | Higher | Optimized

The voltage scaling options in UltraScale+ are particularly valuable for power-constrained applications. The -2LE and -1LI speed grades can operate at 0.85V or 0.72V VCCINT, reducing both static and dynamic power while maintaining timing closure for most designs.
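To get a feel for what the lower VCCINT option buys, here is a minimal sketch using the first-order CMOS rule that dynamic power scales with the square of the supply voltage at fixed capacitance and clock frequency. This is a textbook approximation, not the method used by the Xilinx Power Estimator, and the function name is only illustrative.

```python
def dynamic_power_ratio(v_new: float, v_ref: float) -> float:
    """First-order CMOS model: P_dyn ~ C * V^2 * f, with C and f held constant."""
    if v_new <= 0 or v_ref <= 0:
        raise ValueError("voltages must be positive")
    return (v_new / v_ref) ** 2


# Compare the two VCCINT options named in the text.
ratio = dynamic_power_ratio(0.72, 0.85)
print(f"Dynamic power at 0.72 V relative to 0.85 V: {ratio:.3f}")
print(f"Estimated reduction: {(1 - ratio) * 100:.1f}%")
```

Under this model the 0.72V option cuts core dynamic power by roughly 28% at the same clock rate. Real savings depend on the design and on whether timing still closes at the lower voltage, so treat the number as a starting estimate and confirm it in XPE.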

Kintex UltraScale+ Device Family Specifications

The Xilinx Kintex UltraScale+ family spans six devices ranging from entry-level to high-capacity variants. Each device targets specific application requirements based on logic density, DSP resources, and transceiver counts.

Kintex UltraScale+ Device Comparison

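Device | Logic Cells | CLB LUTs | DSP Slices | Block RAM (Mb) | UltraRAM (Mb) | GTH Transceivers | GTY Transceivers
XCKU3P | 355,950 | 162,720 | 1,368 | 12.7 | 13.5 | 0 | 16
XCKU5P | 474,600 | 216,960 | 1,824 | 16.9 | 18.0 | 0 | 16
XCKU9P | 599,550 | 274,080 | 2,520 | 32.1 | 0 | 28 | 0
XCKU11P | 653,100 | 298,560 | 2,928 | 21.1 | 22.5 | 32 | 20
XCKU13P | 746,550 | 341,280 | 3,528 | 26.2 | 31.5 | 28 | 0
XCKU15P | 1,143,450 | 522,720 | 1,968 | 34.6 | 36.0 | 44 | 32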

Kintex UltraScale (Non-Plus) Device Reference

For comparison and migration planning, here are the original Kintex UltraScale devices that remain widely deployed:

Device | Logic Cells | CLB LUTs | DSP Slices | Block RAM (Mb) | GTH Transceivers
XCKU040 | 530,250 | 242,400 | 1,920 | 21.1 | 20
XCKU060 | 725,550 | 331,680 | 2,760 | 38.0 | 32
XCKU115 | 1,451,100 | 663,360 | 5,520 | 75.9 | 64

The XCKU040, XCKU060, and XCKU115 devices continue to serve production designs requiring the original UltraScale architecture. The XCKU115 is particularly notable as a multi-die (SSI) device combining two XCKU060 dies for maximum capacity.

UltraScale+ Configurable Logic Block Architecture

The CLB architecture in Kintex UltraScale+ represents a significant evolution from 7-Series FPGAs. Each CLB contains one slice with eight 6-input LUTs and sixteen flip-flops, providing increased density and routing flexibility.

CLB Slice Resources

Resource | Per Slice | Function
6-Input LUTs | 8 | Combinatorial logic, 64-bit memory, SRL32
Flip-Flops | 16 | Register storage, pipeline stages
Carry Chain | 8-bit | Fast arithmetic operations
Wide MUX | 32:1 | Large multiplexer functions
Distributed RAM | 512 bits | Small, fast local memory

The slice architecture enables flexible resource utilization. Each LUT can function as a 6-input logic function, two 5-input functions with shared inputs, or be configured as 64-bit distributed RAM (SLICEM only). The 32-bit shift register capability (SRL32) per LUT is particularly useful for implementing delay lines and FIFOs without consuming block RAM.
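As a rough sizing aid for SRL-based delay lines, the sketch below counts LUTs under the assumption that each bit of the bus needs one SRL32 per 32 cycles of delay. The function name is hypothetical, and the estimate ignores any output register or control logic the tools may add.

```python
import math

SRL_DEPTH_PER_LUT = 32  # one SLICEM LUT configured as an SRL32


def srl_luts_for_delay(delay_cycles: int, bus_width: int) -> int:
    """Estimate LUTs needed to delay a bus by a fixed number of clock cycles."""
    if delay_cycles < 1 or bus_width < 1:
        raise ValueError("delay_cycles and bus_width must be at least 1")
    luts_per_bit = math.ceil(delay_cycles / SRL_DEPTH_PER_LUT)
    return luts_per_bit * bus_width


# A 100-cycle delay on a 16-bit bus: 4 chained SRL32s per bit.
print(srl_luts_for_delay(100, 16))
```

The same delay built from discrete flip-flops would take 100 x 16 = 1,600 registers, which is why SRL inference is worth checking in the synthesis report.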

CLB Design Considerations

When targeting high utilization designs (>70% CLB usage), I’ve found these practices essential:

The routing architecture in UltraScale+ handles congestion better than 7-Series, but designs with highly localized logic clusters still benefit from pipelining. Adding register stages every 3-4 levels of logic typically improves both timing and routability.

Clock region boundaries matter for timing closure. The UltraScale+ architecture organizes resources into clock regions 60 CLBs tall, with each region containing dedicated clock distribution. Designs that minimize clock domain crossings at region boundaries achieve more predictable timing results.

DSP48E2 Slice Architecture in Kintex UltraScale+

The DSP48E2 slice in Kintex UltraScale+ provides substantial improvements over the DSP48E1 in 7-Series devices. These enhancements directly impact signal processing, machine learning, and mathematical computation performance.

DSP48E2 Key Features

Feature | DSP48E2 (UltraScale+) | DSP48E1 (7-Series)
Multiplier | 27×18 | 25×18
Pre-adder Width | 27-bit | 25-bit
Accumulator | 48-bit | 48-bit
Pattern Detect | 48-bit | 48-bit
XOR Function | 96-bit wide | Not available
A Input Width | 30-bit | 30-bit

The 27×18 multiplier handles larger operands without cascading multiple DSP slices, improving resource efficiency for audio processing, image filtering, and neural network implementations. The 96-bit XOR functionality enables efficient CRC and error correction implementations.
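The operand widths can be explored with a small bit-accurate model. This is a behavioral sketch of a signed 27×18 multiply feeding a 48-bit wrapping accumulator, not a model of the full DSP48E2 primitive; the helper names are made up for illustration.

```python
def fits_signed(value: int, bits: int) -> bool:
    """True if value is representable as a two's-complement number of this width."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def to_signed(value: int, bits: int) -> int:
    """Wrap an integer into a two's-complement range of the given width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def mac(acc: int, a: int, b: int) -> int:
    """acc + a*b with a 27-bit A operand, 18-bit B operand, 48-bit accumulator."""
    if not fits_signed(a, 27):
        raise ValueError("A operand does not fit in 27 signed bits")
    if not fits_signed(b, 18):
        raise ValueError("B operand does not fit in 18 signed bits")
    return to_signed(acc + a * b, 48)


sample = 1 << 24  # needs 26 signed bits
print(fits_signed(sample, 25))  # too wide for a 25-bit multiplier input
print(fits_signed(sample, 27))  # fits the 27-bit input
print(mac(0, sample, 1000))
```

A value like `sample` would need two cascaded multipliers on a 25×18 slice but fits in a single 27×18 multiply, which is the resource saving the paragraph above describes.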

DSP Slice Placement and Cascading

DSP slices in Kintex UltraScale+ are arranged in columns with dedicated cascade paths. Each DSP column contains 24 slices per clock region, aligned horizontally with 18Kb block RAM pairs for optimal data flow.

For FIR filter implementations, the PCIN/PCOUT cascade allows systolic array architectures that achieve one multiply-accumulate per clock cycle per DSP slice. A 256-tap filter using cascaded DSP slices can run at 650 MHz in a -2 speed grade device.
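The arithmetic behind that claim is simple enough to script. The sketch below assumes a fully parallel filter with one DSP slice per tap; the `symmetric` option is an added assumption that models the common trick of summing mirrored samples in the pre-adder so that one slice covers two taps of a linear-phase filter.

```python
def fir_estimate(taps: int, clock_mhz: float, symmetric: bool = False):
    """Return (dsp_slices, giga_tap_operations_per_second) for a parallel FIR."""
    if taps < 1 or clock_mhz <= 0:
        raise ValueError("taps must be >= 1 and clock_mhz must be positive")
    # Ceiling division: an odd tap count still needs a slice for the centre tap.
    dsp_slices = -(-taps // 2) if symmetric else taps
    gmacs = taps * clock_mhz / 1000.0
    return dsp_slices, gmacs


slices, gmacs = fir_estimate(256, 650)
print(f"{slices} DSP slices, {gmacs:.1f} GMAC/s")

# Share of an XCKU5P's 1,824 DSP slices used by the filter.
print(f"{slices / 1824 * 100:.1f}% of XCKU5P DSP slices")
```

For the 256-tap example this works out to 256 slices and about 166 GMAC/s, or roughly 14% of the DSP slices in an XCKU5P. Whether 650 MHz is reached in practice depends on the cascade staying within DSP columns and on the surrounding logic.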

Memory Architecture: Block RAM and UltraRAM

The memory subsystem in Kintex UltraScale+ combines traditional block RAM with UltraRAM—a feature exclusive to UltraScale+ devices that fundamentally changes on-chip memory design strategies.

Block RAM Specifications

Configuration | Capacity | Data Width | Depth
RAMB36E2 | 36 Kb | 1-72 bits | 512-32K
RAMB18E2 | 18 Kb | 1-36 bits | 512-16K
FIFO36E2 | 36 Kb | Built-in FIFO | 512-32K

Each block RAM supports true dual-port operation with independent clocks, widths, and addresses. The built-in ECC, available when a RAMB36E2 is used in its 64-bit-wide simple dual-port mode, corrects single-bit errors and detects double-bit errors without consuming additional logic resources.
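When budgeting block RAM, it helps to estimate how many RAMB36E2 primitives a memory of a given shape will take. The sketch below tries each standard RAMB36 aspect ratio and keeps the cheapest one. It is a simplification: synthesis may pick a different decomposition (for example mixing RAMB18s or trading blocks for output muxing), and the 512 x 72 shape only applies in simple dual-port mode.

```python
# (depth, width) shapes of a single RAMB36E2.
RAMB36_SHAPES = [
    (32768, 1), (16384, 2), (8192, 4), (4096, 9),
    (2048, 18), (1024, 36), (512, 72),
]


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)


def ramb36_estimate(depth: int, width: int):
    """Return (blocks, shape_depth, shape_width) for the cheapest uniform tiling."""
    if depth < 1 or width < 1:
        raise ValueError("depth and width must be at least 1")
    best = None
    for shape_depth, shape_width in RAMB36_SHAPES:
        blocks = ceil_div(depth, shape_depth) * ceil_div(width, shape_width)
        if best is None or blocks < best[0]:
            best = (blocks, shape_depth, shape_width)
    return best


print(ramb36_estimate(4096, 64))   # a 4K x 64 buffer
print(ramb36_estimate(1024, 36))   # exactly one block
```

Multiplying the block count by the number of such memories in a design gives a quick check against the device totals before running synthesis.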

UltraRAM: On-Chip High-Density Memory

UltraRAM provides 288Kb per block (4Kx72) with characteristics that differ significantly from block RAM:

Parameter | UltraRAM | Block RAM
Capacity per Block | 288 Kb | 36 Kb
Data Width | 72 bits fixed | 1-72 bits configurable
Ports | Two ports, one shared clock | True dual-port, independent clocks
Cascade | Built-in hardware | Logic required
Latency | 2 clock cycles | 1-2 clock cycles
Power | Lower per bit | Higher per bit

UltraRAM excels in applications requiring large on-chip buffers: video frame stores, network packet buffers, and machine learning weight storage. A single XCKU5P provides 18 Mb of UltraRAM, enough for one 1080p video frame at 8 bits per pixel or a modest set of neural network weights.
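Frame-buffer sizing is easy to get wrong by a factor of the pixel depth, so a quick calculation helps. This sketch assumes pixels are packed whole into the fixed 72-bit UltraRAM word, with no pixel split across words, and counts the 4K x 72 blocks needed. The function name is illustrative.

```python
URAM_WORDS = 4096   # words per UltraRAM block
URAM_WIDTH = 72     # bits per word
URAM_KBITS = 288    # capacity per block in Kb


def uram_blocks_for_frame(width_px: int, height_px: int, bits_per_pixel: int) -> int:
    """Number of UltraRAM blocks needed to hold one frame."""
    pixels_per_word = URAM_WIDTH // bits_per_pixel
    if pixels_per_word == 0:
        raise ValueError("pixel is wider than one UltraRAM word")
    words = -(-(width_px * height_px) // pixels_per_word)  # ceiling division
    return -(-words // URAM_WORDS)


for bpp in (8, 24):
    blocks = uram_blocks_for_frame(1920, 1080, bpp)
    print(f"1080p at {bpp} bpp: {blocks} blocks, {blocks * URAM_KBITS / 1024:.1f} Mb")
```

A 1080p frame takes about 16 Mb at 8 bits per pixel but about 47.5 Mb at 24 bits per pixel, so compare the result against the UltraRAM figure for the specific device before committing to an on-chip frame store.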

Memory Design Strategy

The choice between distributed RAM, block RAM, and UltraRAM depends on access patterns and capacity requirements:

Use distributed RAM for small FIFOs (<512 bits), lookup tables accessed every clock cycle, and register files with many ports. The single-cycle access and location flexibility make distributed RAM ideal for these applications.

Block RAM suits medium-capacity buffers, dual-port memories, and applications requiring ECC. The configurable width and true dual-port capability handle most general memory needs efficiently.

UltraRAM is optimal for large buffers where the 2-cycle latency and the single clock shared by both ports aren’t limiting factors. The hardware cascade feature builds deep memories from adjacent blocks, up to the device’s UltraRAM capacity (13.5 to 36 Mb on the devices that include UltraRAM), without routing penalties.

GTY Transceiver Architecture for High-Speed Serial

Kintex UltraScale+ devices feature GTY transceivers capable of 32.75 Gb/s—nearly double the 16.3 Gb/s maximum of GTH transceivers in original UltraScale devices. This enables support for modern protocols without external PHY devices.

GTY Transceiver Specifications

Parameter | GTY (UltraScale+) | GTH (UltraScale)
Max Line Rate | 32.75 Gb/s | 16.3 Gb/s
Min Line Rate | 500 Mb/s | 500 Mb/s
Supported Protocols | PCIe Gen4, 100G Ethernet, 28G FC | PCIe Gen3, 10G Ethernet
TX Pre-emphasis | Programmable | Programmable
RX Equalization | DFE + CTLE | DFE + CTLE
PLL Types | QPLL0, QPLL1, CPLL | QPLL0, QPLL1, CPLL

Each GTY Quad contains four transceivers sharing two QPLL channels (QPLL0 and QPLL1) plus individual CPLLs per channel. The QPLL provides lower jitter for high line rates, while CPLLs offer flexibility for protocols requiring specific frequencies.

Transceiver Protocol Support

Protocol | Line Rate (per lane) | Transceivers Required
PCIe Gen3 x8 | 8.0 GT/s | 8 GTY
PCIe Gen4 x8 | 16.0 GT/s | 8 GTY
100G Ethernet | 25.78125 Gb/s | 4 GTY
25G Ethernet | 25.78125 Gb/s | 1 GTY
CPRI Option 10 | 24.33024 Gb/s | 1 GTY

The integrated 100G Ethernet MAC, present on the Kintex UltraScale+ devices that include the hard block (the count varies by device, so check DS890), saves thousands of LUTs compared to soft implementations, enabling efficient 100G PON OLT line cards and data center applications.
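The odd-looking line rates in the protocol table make sense once line coding is taken into account. The sketch below computes the bandwidth left after encoding overhead, assuming 128b/130b for PCIe Gen3/Gen4 and 64b/66b for 25G/100G Ethernet. It ignores packet and protocol overhead, so real throughput is lower.

```python
ENCODING_EFFICIENCY = {
    "128b/130b": 128 / 130,  # PCIe Gen3 and Gen4
    "64b/66b": 64 / 66,      # 25G and 100G Ethernet
}


def post_encoding_gbps(lanes: int, line_rate_gbps: float, encoding: str) -> float:
    """Aggregate bandwidth after line-coding overhead, in Gb/s."""
    return lanes * line_rate_gbps * ENCODING_EFFICIENCY[encoding]


links = [
    ("PCIe Gen3 x8", 8, 8.0, "128b/130b"),
    ("PCIe Gen4 x8", 8, 16.0, "128b/130b"),
    ("100G Ethernet", 4, 25.78125, "64b/66b"),
]
for name, lanes, rate, enc in links:
    print(f"{name}: {post_encoding_gbps(lanes, rate, enc):.2f} Gb/s")
```

Four lanes at 25.78125 Gb/s with 64b/66b coding come out at 100 Gb/s, which is where the per-lane rate comes from.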

Power Supply Design for Kintex UltraScale+

Successful Kintex UltraScale+ designs require careful power supply planning. The devices use multiple voltage rails with specific sequencing requirements and current demands.

Power Rail Requirements

Rail | Voltage | Tolerance | Typical Current (XCKU5P)
VCCINT | 0.85V/0.72V | ±3% | 1.2-2.5A
VCCINT_IO | 0.85V | ±3% | 50-150mA
VCCBRAM | 0.85V | ±3% | 20-50mA
VCCAUX | 1.8V | ±5% | 150-250mA
VCCAUX_IO | 1.8V | ±5% | 30-80mA
VCCO (per bank) | 1.0-3.3V | ±5% | Application dependent
MGTAVCC | 0.9V | ±3% | 100-300mA per quad
MGTAVTT | 1.2V | ±3% | 100-200mA per quad

Power Sequencing Requirements

Kintex UltraScale+ requires specific power-on sequencing to ensure proper device initialization:

  1. VCCINT and VCCINT_IO must ramp together or VCCINT first
  2. VCCBRAM can ramp with VCCINT or after
  3. VCCAUX and VCCAUX_IO must ramp after VCCINT reaches 80%
  4. VCCO rails can ramp after VCCAUX is stable
  5. GTY supplies (MGTAVCC, MGTAVTT, MGTVCCAUX) ramp after core supplies

For power-down, the sequence reverses. Violating these requirements can cause latch-up or device damage.
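The ordering rules above can be checked against measured or planned ramp times with a few lines of code. This sketch encodes the power-on list from this section as "must not start before" pairs; it does not model the 80% threshold or ramp rates, it treats "core supplies" as VCCINT, and the timing values are hypothetical. DS922 remains the authority for the actual requirements.

```python
# (rail that must start first or together, rail that follows)
ORDER_RULES = [
    ("VCCINT", "VCCINT_IO"),
    ("VCCINT", "VCCBRAM"),
    ("VCCINT", "VCCAUX"),
    ("VCCINT", "VCCAUX_IO"),
    ("VCCAUX", "VCCO"),
    ("VCCINT", "MGTAVCC"),
    ("VCCINT", "MGTAVTT"),
]


def sequencing_violations(ramp_start_ms: dict) -> list:
    """List every rule broken by the given rail -> ramp start time mapping."""
    violations = []
    for first, second in ORDER_RULES:
        if first in ramp_start_ms and second in ramp_start_ms:
            if ramp_start_ms[second] < ramp_start_ms[first]:
                violations.append(f"{second} ramps before {first}")
    return violations


# Hypothetical power-on plan, times in milliseconds from enable.
plan = {"VCCINT": 0.0, "VCCINT_IO": 0.0, "VCCBRAM": 1.0,
        "VCCAUX": 2.0, "VCCAUX_IO": 2.0, "VCCO": 4.0,
        "MGTAVCC": 5.0, "MGTAVTT": 6.0}
print(sequencing_violations(plan))
```

Feeding in timestamps captured on a scope during board bring-up gives a quick pass/fail check on the sequencer configuration.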

Decoupling Capacitor Strategy

The PCB design guide (UG583) specifies capacitor quantities per device. For a typical XCKU5P design:

Rail | Bulk Capacitor | Decoupling (4.7µF) | Decoupling (0.47µF)
VCCINT | 680µF | 10 | 20
VCCBRAM | 47µF | 2 | 4
VCCAUX | 47µF | 4 | 8
VCCO (per bank) | 47µF | 1 | 2

Place 0.47µF capacitors directly under the FPGA on the backside PCB layer, with 4.7µF capacitors in the immediate vicinity. The 680µF bulk capacitor can be located near the voltage regulator.

PCB Design Guidelines for Kintex UltraScale+

High-speed signals and dense BGA packages demand careful PCB layout. These guidelines derive from AMD’s recommendations and practical production experience.

Recommended PCB Stack-Up

For fine-pitch BGA packages (1.0mm ball pitch), a minimum 12-layer stack-up is recommended:

Layer | Function | Notes
L1 | Signal | Component side, escape routing
L2 | Ground | Solid reference plane
L3 | Signal | DDR4, high-speed signals
L4 | Power | VCCINT plane
L5 | Signal | General routing
L6 | Ground | Solid reference plane
L7 | Power | VCCAUX, VCCO planes
L8 | Signal | General routing
L9 | Ground | Solid reference plane
L10 | Signal | GTY differential pairs
L11 | Ground | Solid reference plane
L12 | Signal | Bottom side escape

High-Speed Signal Routing

GTY transceiver differential pairs require controlled impedance routing (100Ω differential), with the two legs of each pair length-matched to within 5 mils. For 28 Gb/s operation, use low-loss PCB materials (Megtron 6 or equivalent) with Dk < 3.6 and Df < 0.004.

DDR4 interfaces at 2666 MT/s require careful attention to length matching within byte groups (±10 mils) and address/command signals (±25 mils). Place termination resistors within 500 mils of the DDR4 device.
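To see why the tolerances are this tight, it helps to convert a length mismatch into time and compare it with the unit interval. The sketch below assumes stripline routing, where the signal sees the full dielectric constant, and uses Dk = 3.6 for both cases purely as an example value.

```python
import math

# Speed of light expressed in inches per picosecond.
C_INCH_PER_PS = 299_792_458 / 0.0254 * 1e-12


def skew_ps(mismatch_mils: float, dk: float) -> float:
    """Timing skew from a trace length mismatch in a stripline of dielectric dk."""
    return (mismatch_mils / 1000.0) * math.sqrt(dk) / C_INCH_PER_PS


def unit_interval_ps(rate_gbps: float) -> float:
    """Duration of one bit (or one transfer) in picoseconds."""
    return 1000.0 / rate_gbps


cases = [("GTY at 28 Gb/s, 5 mil", 5, 28.0),
         ("DDR4 at 2666 MT/s, 10 mil", 10, 2.666)]
for name, mils, rate in cases:
    s = skew_ps(mils, 3.6)
    ui = unit_interval_ps(rate)
    print(f"{name}: {s:.2f} ps skew, {s / ui * 100:.2f}% of a {ui:.1f} ps UI")
```

A 5 mil mismatch is only about 0.8 ps, but at 28 Gb/s the unit interval is under 36 ps, so skew budgets that look generous at DDR4 speeds are consumed quickly on the serial links.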

Development Tools and Resources

AMD provides comprehensive tools for Kintex UltraScale+ design, verification, and implementation.

Vivado Design Suite

Vivado serves as the primary development environment for all UltraScale+ devices. Key features include:

Tool | Function
Vivado Synthesis | RTL to netlist synthesis
Vivado Implementation | Place and route
Vivado Simulator | Functional simulation
Vivado Hardware Manager | JTAG programming and debug
Vivado IP Integrator | Block design assembly

The device-locked licenses included with evaluation kits (KCU116, KCU105) provide full Vivado functionality for the specific FPGA on each board.

Essential Documentation

Document | Number | Description
Architecture Overview | DS890 | UltraScale+ architecture details
DC/AC Data Sheet | DS922 | Kintex UltraScale+ specifications
SelectIO User Guide | UG571 | I/O standards and configuration
CLB User Guide | UG574 | Logic block architecture
DSP48E2 User Guide | UG579 | DSP slice details
Memory Resources | UG573 | Block RAM and UltraRAM
GTY Transceivers | UG578 | High-speed serial interfaces
PCB Design Guide | UG583 | Layout and power guidelines
Packaging and Pinouts | UG575 | Package specifications

Download Resources

Resource | Location
Vivado Design Suite | amd.com/vivado
Device Documentation | docs.amd.com
Xilinx Power Estimator (XPE) | amd.com/power
Reference Designs | amd.com/kintex-ultrascale-plus
Board Design Files | Product pages

Target Applications for Kintex UltraScale+

The Kintex UltraScale+ family addresses applications requiring high bandwidth, significant compute capability, and power efficiency:

Wireless Infrastructure (5G/LTE)

The combination of DSP slices and UltraRAM enables digital front-end processing, beamforming, and MIMO implementations. The 100G Ethernet MAC supports fronthaul connectivity without external PHY devices.

Data Center Acceleration

Network interface cards, storage controllers, and compute accelerators leverage PCIe Gen4 and high-speed Ethernet support. The power efficiency allows deployment in standard server thermal envelopes.

Medical Imaging

Ultrasound processing, CT reconstruction, and MRI signal processing benefit from the DSP architecture and large on-chip memory. The GTY transceivers handle high-resolution sensor data streams.

Video Processing

4K/8K video encoding, decoding, and switching utilize the combination of logic, DSP, and memory resources. UltraRAM provides frame buffer storage without external memory latency.

Aerospace and Defense

XQ defense-grade variants (XQKU5P, XQKU15P) offer extended temperature operation (-55°C to +125°C) and ruggedized packages for harsh environments.

Frequently Asked Questions

What is the difference between Kintex UltraScale and Kintex UltraScale+?

Kintex UltraScale uses 20nm process technology with GTH transceivers (up to 16.3 Gb/s), while Kintex UltraScale+ uses 16nm FinFET+ with GTY transceivers (up to 32.75 Gb/s). UltraScale+ adds UltraRAM memory blocks, voltage scaling options for lower power, and PCIe Gen4 support. The UltraScale+ devices (XCKU3P, XCKU5P, etc.) provide 20-30% power reduction compared to equivalent-capacity UltraScale devices (XCKU040, XCKU060, XCKU115).

Can I use Vivado WebPACK for Kintex UltraScale+ development?

Partly. The free Vivado edition (formerly WebPACK, now Vivado ML Standard) supports the smaller devices in the family, such as the XCKU3P and XCKU5P; larger devices require a paid edition, so check the device support list for your Vivado release. Evaluation kits like the KCU116 include device-locked licenses that provide full functionality for the specific FPGA on the board.

How do I choose between block RAM and UltraRAM?

Use block RAM for memories that need independent clocks per port, configurable widths, or single-cycle latency. UltraRAM is better for large buffers where 2-cycle latency is acceptable and both ports can share one clock. For video frame buffers or neural network weights, UltraRAM typically provides better resource efficiency.

What power supply ICs work well with Kintex UltraScale+?

Texas Instruments offers reference designs (PMP10630) using SIMPLE SWITCHER modules. Common solutions include LMZ31704 for VCCINT, LMZ21700 for auxiliary rails, and LM3880 for sequencing. Analog Devices, Infineon, and Monolithic Power Systems also provide FPGA-optimized power solutions.

How do I migrate from XCKU040 or XCKU060 to UltraScale+?

Device migration requires attention to transceiver differences (GTH vs GTY), memory architecture (no UltraRAM in original UltraScale), and package pinouts. Many packages share footprint compatibility (same ball pattern) between families, simplifying PCB migration. Use Vivado’s migration tools to analyze IP compatibility and timing impacts.

Thermal Management Considerations

Proper thermal design is critical for reliable Kintex UltraScale+ operation. Junction temperature limits vary by device grade:

Temperature Grade | Operating Range | Package Options
Commercial (C) | 0°C to +85°C | Standard
Extended (E) | 0°C to +100°C | Standard
Industrial (I) | -40°C to +100°C | Standard
Military (M) | -55°C to +125°C | Ruggedized

For designs consuming more than 5W, active cooling (heatsink with fan) is recommended. The Xilinx Power Estimator (XPE) spreadsheet provides accurate thermal power estimates based on resource utilization and switching activity. Run XPE analysis early in the design cycle to size the thermal solution appropriately.
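Before XPE numbers are available, a first-pass junction temperature estimate can come from the standard relation Tj = Ta + P × θJA. In the sketch below the ambient temperature, power, and θJA value are hypothetical; the real θJA depends on package, heatsink, and airflow. The upper limits are taken from the grade table above, and the low-temperature limits are not checked.

```python
# Upper junction temperature limit per grade, in degrees C (from the table above).
GRADE_MAX_TJ_C = {"Commercial": 85, "Extended": 100, "Industrial": 100, "Military": 125}


def junction_temp_c(ambient_c: float, power_w: float, theta_ja_c_per_w: float) -> float:
    """Steady-state junction temperature: Tj = Ta + P * theta_JA."""
    return ambient_c + power_w * theta_ja_c_per_w


def grades_within_limit(tj_c: float) -> list:
    """Grades whose upper junction limit is not exceeded at this temperature."""
    return [grade for grade, limit in GRADE_MAX_TJ_C.items() if tj_c <= limit]


def max_theta_ja(tj_max_c: float, ambient_c: float, power_w: float) -> float:
    """Largest thermal resistance that keeps the junction at or below tj_max_c."""
    return (tj_max_c - ambient_c) / power_w


tj = junction_temp_c(ambient_c=50.0, power_w=8.0, theta_ja_c_per_w=5.0)
print(f"Tj = {tj:.1f} C, acceptable grades: {grades_within_limit(tj)}")
print(f"theta_JA needed for an 85 C limit: {max_theta_ja(85.0, 50.0, 8.0):.2f} C/W")
```

With these example numbers the junction reaches 90°C, which rules out a commercial-grade part unless the thermal resistance is brought below about 4.4°C/W with a better heatsink or more airflow.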

Heat spreader lids on larger packages (FFVA1156, FFVE1517, FFVA1760) improve thermal transfer to heatsinks. For applications without lids, ensure direct contact between the heatsink and die using appropriate thermal interface material (TIM).

Conclusion

The Xilinx Kintex UltraScale+ family delivers exceptional value for designs requiring high performance, power efficiency, and modern connectivity. The architecture improvements over original UltraScale—particularly UltraRAM, GTY transceivers, and voltage scaling—enable applications that weren’t practical in previous generations.

Whether targeting wireless infrastructure, data center acceleration, or embedded signal processing, the Kintex UltraScale+ devices provide the resources and tools necessary for successful production designs. The comprehensive documentation, proven evaluation platforms, and robust development tools reduce risk and accelerate time-to-market for complex FPGA implementations.

For teams currently using devices like the XCKU040, XCKU060, or XCKU115, the migration path to UltraScale+ offers tangible benefits in power consumption and transceiver performance while maintaining design methodology compatibility through the Vivado toolchain.
