Benchmarks

There are several sets of benchmark designs which can be used with VTR.

VTR Benchmarks

The VTR benchmarks [LAK+14, RLY+12] are a set of medium-sized benchmarks included with VTR. They are fully compatible with the full VTR flow. They are suitable for FPGA architecture research and medium-scale CAD research.

Table 1 The VTR 7.0 Benchmarks.

Benchmark

Domain

bgm

Finance

blob_merge

Image Processing

boundtop

Ray Tracing

ch_intrinsics

Memory Init

diffeq1

Math

diffeq2

Math

LU8PEEng

Math

LU32PEEng

Math

mcml

Medical Physics

mkDelayWorker32B

Packet Processing

mkPktMerge

Packet Processing

mkSMAdapter4B

Packet Processing

or1200

Soft Processor

raygentop

Ray Tracing

sha

Cryptography

stereovision0

Computer Vision

stereovision1

Computer Vision

stereovision2

Computer Vision

stereovision3

Computer Vision

The VTR benchmarks are provided as Verilog under:

$VTR_ROOT/vtr_flow/benchmarks/verilog

This provides full flexibility to modify and change how the designs are implemented (including the creation of new netlist primitives).

The VTR benchmarks are also included as pre-synthesized BLIF files under:

$VTR_ROOT/vtr_flow/benchmarks/vtr_benchmarks_blif

Titanium Benchmarks

The Titanium benchmarks are a set of 25 large modern FPGA benchmarks (Titanium25) that augment the Titan Benchmarks suite. They include recent applications such as deep learning accelerators, RISC-V processors, and DSP designs, with an average primitive count of approximately 850k. Like the Titan benchmarks, they incorporate Intel/Altera-specific IPs and are compiled using Quartus synthesis with VTR placement and routing (the Titan flow). They are compatible with both Intel Stratix IV and Stratix 10 devices, making them suitable for large-scale FPGA CAD and architecture research.

Table 2 The Titanium benchmarks.

Benchmark

Approximate Number of Netlist Primitives

S10 Smallest Feasible Device

Description

mem_test_max

7605183

Can’t fit on any S10 device

Mem. parametric failure testing

rocket31

1448187

1SG280HH1F55E1VG

31-core RISC-V Rocket chip

ASU_LRN

955146

1SG211HN1F43E1VG

AlexNet accelerator

ChainNN_LRN

937695

1SG280HH1F55E1VG

AlexNet accelerator

ChainNN_ELT

937300

1SG280HH1F55E1VG

ResNet-50 accelerator

ChainNN_BSC

905098

1SG280HH1F55E1VG

VGG-16 accelerator

rocket17

801897

1SG280HH1F55E1VG

17-core RISC-V Rocket chip

ASU_ELT

767837

1SG211HN1F43E1VG

ResNet-50 accelerator

ASU_BSC

734883

1SG211HN1F43E1VG

VGG-16 accelerator

tdfir

706338

1SX110HN1F43E1VG

DSP

pricing

668537

1SX110HN1F43E1VG

Option pricing algorithm

mem_tester

621351

1SX110HN1F43E1VG

Mem. parametric failure testing

mandelbrot

579813

1SX110HN1F43E1VG

Fractal rendering

channelizer

462003

1SG280HH1F55E1VG

DSP

fft1d_offchip

440661

1SG280HH1F55E1VG

DSP spectral analysis

DLA_LRN

414250

1SX110HN1F43E1VG

AlexNet accelerator

matrix_mult

392682

1SX110HN1F43E1VG

Matrix multiplication

fft1d

389350

1SX110HN1F43E1VG

DSP spectral analysis

fft2d

354506

1SX110HN1F43E1VG

DSP spectral analysis

neko

304581

1SX110HN1F43E1VG

GPU simulation

DLA_ELT

296292

1SX110HN1F43E1VG

ResNet-50 accelerator

DLA_BSC

285347

1SX110HN1F43E1VG

VGG-16 accelerator

jpeg_deco

209313

1SG280HH1F55E1VG

Image processing

nyuzi

90857

1SX040HH1F35E1VG

GPGPU processor

sobel

23224

1SX065HH1F35E1VG

Image processing

Note

The Titanium benchmarks are not included with the VTR release (due to their size). However they can be downloaded and extracted by running make get_titan_benchmarks from the root of the VTR tree.

Titan Benchmarks

The Titan benchmarks are a set of large modern FPGA benchmarks compatible with Intel Stratix IV [MWL+13, MWL+15] and Stratix 10 [KTK23] devices. The pre-synthesized versions of these benchmarks are compatible with recent versions of VPR.

The Titan benchmarks are suitable for large-scale FPGA CAD research, and FPGA architecture research which does not require synthesizing new netlist primitives.

Note

The Titan benchmarks are not included with the VTR release (due to their size). However they can be downloaded and extracted by running make get_titan_benchmarks from the root of the VTR tree. They can also be downloaded manually.

Koios 2.0 Benchmarks

The Koios benchmarks [ABR+21] are a set of Deep Learning (DL) benchmarks. They are suitable for DL related architecture and CAD research. There are 40 designs that include several medium-sized benchmarks and some large benchmarks. The designs target different network types (CNNs, RNNs, MLPs, RL) and layer types (fully-connected, convolution, activation, softmax, reduction, eltwise). Some of the designs are generated from HLS tools as well. These designs use many precisions including binary, different fixed point types int8/16/32, brain floating point (bfloat16), and IEEE half-precision floating point (fp16).

Table 3 The Koios Benchmarks.

Benchmark

Description

dla_like

Intel-DLA-like accelerator

clstm_like

CLSTM-like accelerator

deepfreeze

ARM FixyNN design

tdarknet_like

Accelerator for Tiny Darknet

bwave_like

Microsoft-Brainwave-like design

lstm

LSTM engine

bnn

4-layer binary neural network

lenet

Accelerator for LeNet-5

dnnweaver

DNNWeaver accelerator

tpu_like

Google-TPU-v1-like accelerator

gemm_layer

20x20 matrix multiplication engine

attention_layer

Transformer self-attention layer

conv_layer

GEMM based convolution

robot_rl

Robot+maze application

reduction_layer

Add/max/min reduction tree

spmv

Sparse matrix vector multiplication

eltwise_layer

Matrix elementwise add/sub/mult

softmax

Softmax classification layer

conv_layer_hls

Sliding window convolution

proxy

Proxy/synthetic benchmarks

The Koios benchmarks are provided as Verilog (enabling full flexibility to modify and change how the designs are implemented) under:

$VTR_ROOT/vtr_flow/benchmarks/verilog/koios

To use these benchmarks, please see the documentation in the README file at: https://github.com/verilog-to-routing/vtr-verilog-to-routing/tree/master/vtr_flow/benchmarks/verilog/koios

MCNC20 Benchmarks

The MCNC benchmarks [Yan91] are a set of small and old (circa 1991) benchmarks. They consist primarily of logic (i.e. LUTs) with few registers and no hard blocks.

Warning

The MCNC20 benchmarks are not recommended for modern FPGA CAD and architecture research. Their small size and design style (e.g. few registers, no hard blocks) make them unrepresentative of modern FPGA usage. This can lead to misleading CAD and/or architecture conclusions.

The MCNC20 benchmarks included with VTR are available as .blif files under:

$VTR_ROOT/vtr_flow/benchmarks/blif/

The versions used in the VPR 4.3 release, which were mapped to K-input look-up tables using FlowMap [CD94], are available under:

$VTR_ROOT/vtr_flow/benchmarks/blif/<#>

where K= <#>.

Table 4 The MCNC20 benchmarks.

Benchmark

Approximate Number of Netlist Primitives

alu4

934

apex2

1116

apex4

916

bigkey

1561

clma

3754

des

1199

diffeq

1410

dsip

1559

elliptic

3535

ex1010

2669

ex5p

824

frisc

3291

misex3

842

pdc

2879

s298

732

s38417

4888

s38584.1

4726

seq

1041

spla

2278

tseng

1583

SymbiFlow Benchmarks

SymbiFlow benchmarks are a set of small and medium sized tests to verify and test the SymbiFlow-generated architectures, including primarily the Xilinx Artix-7 device families.

The tests are generated by nightly builds from the symbiflow-arch-defs repository, and uploaded to a Google Cloud Platform from where they are fetched and executed in the VTR benchmarking suite.

The circuits are the following:

Table 5 The SymbiFlow benchmarks.

Benchmark

Description

picosoc @100 MHz

simple SoC with a picorv32 CPU running @100MHz

picosoc @50MHz

simple SoC with a picorv32 CPU running @50MHz

base-litex

LiteX-based SoC with a VexRiscv CPU booting into a BIOS only

ddr-litex

LiteX-based SoC with a VexRiscv CPU and a DDR controller

ddr-eth-litex

LiteX=based SoC with a VexRiscv CPU, a DDR controller and an Ethernet core

linux-litex

LiteX-based SoC with a VexRiscv CPU capable of booting linux

The SymbiFlow benchmarks can be downloaded and extracted by running the following:

cd $VTR_ROOT
make get_symbiflow_benchmarks

Once downloaded and extracted, benchmarks are provided as post-synthesized blif files under:

$VTR_ROOT/vtr_flow/benchmarks/symbiflow

NoC Benchmarks

NoC benchmarks are composed of synthetic and MLP benchmarks and target NoC-enhanced FPGA architectures. Synthetic benchmarks include a wide variety of traffic flow patterns and are divided into two groups: 1) simple and 2) complex benchmarks. As their names imply, simple benchmarks use very simple and small logic modules connected to NoC routers, while complex benchmarks implement more complicated functionalities like encryption. These benchmarks do not come from real application domains. On the other hand, MLP benchmarks include modules that perform matrix-vector multiplication and move data. Pre-synthesized netlists for the synthetic benchmarks are added to VTR project, but MLP netlists should be downloaded separately.

Note

The NoC MLP benchmarks are not included with the VTR release (due to their size). However they can be downloaded and extracted by running make get_noc_mlp_benchmarks from the root of the VTR tree. They can also be downloaded manually.