Alberto Nannarelli - Research page

Main Area of Research: Digital Systems, Computer Arithmetic and VLSI Design

Research Interests:

Arithmetic units and numerical processors

Power and Thermal Management of Systems-on-Chips

Power efficient digital signal processors

Hardware acceleration on reconfigurable platforms


Arithmetic units and numerical processors

Division, Square-Root and Reciprocals

This area of research is related to hardware algorithms for numerical computations and their effective implementation in terms of speed of execution, area, and energy.

Algorithms are being developed to reduce the execution time of division and square-root computations. The corresponding implementations are modeled and evaluated for different technologies. The techniques being considered include scaling of the operands, retiming, and speculation of the result digits. These techniques are also extensible to the computation of the reciprocal (1/d) and the square-root reciprocal (1/sqrt(d)).
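As a rough illustration of how a digit-recurrence division produces one result digit per iteration, the sketch below runs a textbook radix-2 recurrence with the signed digit set {-1, 0, 1}. It is not the unit described in the publications listed on this page (which use higher radices such as 16); the function name `srt_radix2_divide`, the normalization of both operands to [1/2, 1), and the exact-comparison digit selection are assumptions made for the example. Hardware implementations select the digit from a few bits of an estimate of the residual rather than from an exact comparison.

```python
from fractions import Fraction


def srt_radix2_divide(x, d, n_digits):
    """Radix-2 digit-recurrence division sketch (hypothetical helper).

    Assumes x and d are normalized to [1/2, 1).
    Returns (digits, quotient, residual) where quotient approximates x/d.
    """
    x, d = Fraction(x), Fraction(d)
    half = Fraction(1, 2)
    if not (half <= x < 1 and half <= d < 1):
        raise ValueError("operands are assumed to be normalized to [1/2, 1)")

    w = x / 2          # initial residual, chosen so that |w| <= d
    q = Fraction(0)    # accumulates the quotient of (x/2)/d
    digits = []
    for j in range(1, n_digits + 1):
        shifted = 2 * w
        # Digit selection with constant thresholds +1/2 and -1/2.
        if shifted >= half:
            digit = 1
        elif shifted < -half:
            digit = -1
        else:
            digit = 0
        w = shifted - digit * d          # recurrence: w[j] = 2*w[j-1] - q_j*d
        q += Fraction(digit, 2 ** j)
        digits.append(digit)
    # Undo the initial scaling of x by 1/2.
    return digits, 2 * q, w


x, d = Fraction(3, 4), Fraction(5, 8)
digits, quotient, residual = srt_radix2_divide(x, d, 16)
print(float(quotient), float(x / d))
```

The residual stays bounded by the divisor at every step, which is what makes the redundant digit set usable: an imprecise digit choice in one iteration can be corrected by later digits.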

Decimal Arithmetic

Computers use binary arithmetic because it requires fewer components, which saves silicon area and reduces the size of the system. However, humans are used to dealing with the decimal system, and binary arithmetic is sometimes inaccurate when computing with decimal fractions.

Binary floating-point cannot exactly represent most decimal fractions. For example:
10% = 10/100 = (0.1)_10 = (0.0001100110011001100110011001...)_2

For these reasons, financial applications implement decimal arithmetic operations in software, where they can run about 100 times slower than the corresponding binary operations.
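The representation problem, and the software alternative, can be observed with Python's standard `fractions` and `decimal` modules. This is only a small demonstration: the helper names `total_binary` and `total_decimal` are made up for the example, and the amounts are arbitrary.

```python
from decimal import Decimal
from fractions import Fraction

# The binary double closest to 0.1 is not exactly one tenth.
binary_tenth = Fraction(0.1)      # exact value of the stored double
exact_tenth = Fraction(1, 10)
representation_error = binary_tenth - exact_tenth


def total_binary(n):
    """Add 0.1 n times using binary floating-point."""
    total = 0.0
    for _ in range(n):
        total += 0.1
    return total


def total_decimal(n):
    """Add 0.10 n times using software decimal arithmetic."""
    total = Decimal("0")
    for _ in range(n):
        total += Decimal("0.10")
    return total


print(representation_error != 0)          # the double is slightly off
print(total_binary(10) == 1.0)            # accumulated binary error
print(total_decimal(10) == Decimal("1"))  # decimal sum is exact
```

The decimal version gives the result a person would expect from pencil-and-paper arithmetic, at the cost of being executed in software rather than by the processor's floating-point hardware.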

Nowadays, because of the shrinking of devices, it is realistic to design arithmetic units working in decimal arithmetic and to speed up operations done in the decimal system by several times. In the 2008 revision of IEEE standard 754, the specification to represent decimal floating-point numbers was added to the binary one.

Current work:
"Radix-16 Combined Division and Square Root Unit" (ARITH 20, 2011).
Decimal Arithmetic

Past and not so current work:
Division and Square Root
M.S. Thesis


Power and Thermal Management of Systems-on-Chips

The main objective of this line of work is to study techniques that reduce power dissipation without penalizing performance, and that prevent the temperature from rising excessively.

The power consumption reduction is carried out at different levels of abstraction: from the algorithm level down to the implementation, or gate, level.

When it is not possible to reduce the power dissipation any further, the chip temperature rise can be mitigated by changing the power density of the system: by reorganizing the floorplan (statically), or by thermal-aware scheduling of the SoC operations (dynamically).

Current work:
"Power Dissipation Challenges in Multicore Floating-Point Units" (ASAP, 2010).
"Post-placement Temperature Reduction Techniques" (DATE, 2010).

Past and not so current work:
High-level Power Characterization of Systems implemented on FPGAs
Low Power Design for Arithmetic Structures
Power Consumption Characterization
Ph.D. Dissertation


Power efficient digital signal processors
The main objective is to implement traditional DSP processors (filters, etc.) by using low-power methods to obtain significant reductions in power consumption.

Two main approaches are currently followed.

Residue Arithmetic

The Residue Number System (RNS) allows the decomposition of a given dynamic range (bit-width) into slices of smaller range, on which the computation can be carried out in parallel at higher speed.

The RNS has been successfully applied to the implementation of digital FIR filters, and it has also proved advantageous in terms of power dissipation with respect to filters implemented in the traditional two's complement (or sign and magnitude) system.
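To make the decomposition concrete, here is a minimal software sketch of an FIR filter computed in RNS. The moduli set (7, 11, 13, 15, 16), the signed-range mapping, and all function names are assumptions chosen for the example, not the configuration used in the work described here. Each modulus forms an independent narrow channel; the channels never exchange carries, and the result is converted back with the Chinese Remainder Theorem.

```python
from math import prod

MODULI = (7, 11, 13, 15, 16)   # assumed pairwise-coprime moduli set
M = prod(MODULI)               # dynamic range of the RNS: 240240


def to_rns(value):
    """Forward conversion: one small residue per modulus."""
    return tuple(value % m for m in MODULI)


def from_rns(residues):
    """Reverse conversion via the Chinese Remainder Theorem (signed result)."""
    total = 0
    for r, m in zip(residues, MODULI):
        m_hat = M // m
        total += r * m_hat * pow(m_hat, -1, m)   # modular inverse, Python 3.8+
    total %= M
    # Map [0, M) to the signed range [-M/2, M/2).
    return total - M if total >= M // 2 else total


def fir_rns(coeffs, samples):
    """FIR filter where every multiply-accumulate is done per channel."""
    h = [to_rns(c) for c in coeffs]
    x = [to_rns(s) for s in samples]
    outputs = []
    for n in range(len(coeffs) - 1, len(samples)):
        acc = [0] * len(MODULI)
        for k in range(len(coeffs)):
            for ch, m in enumerate(MODULI):
                # Each channel only ever handles values smaller than m.
                acc[ch] = (acc[ch] + h[k][ch] * x[n - k][ch]) % m
        outputs.append(from_rns(acc))
    return outputs


def fir_direct(coeffs, samples):
    """Reference filter in ordinary integer arithmetic."""
    return [sum(coeffs[k] * samples[n - k] for k in range(len(coeffs)))
            for n in range(len(coeffs) - 1, len(samples))]


coeffs = [3, -5, 2, 7]
samples = [10, -20, 35, 4, -17, 60, 8]
print(fir_rns(coeffs, samples) == fir_direct(coeffs, samples))
```

The two filters agree as long as every output fits in the signed dynamic range of the moduli set. In hardware the per-channel loop becomes a set of small modular multiply-accumulate units working in parallel, while the forward and reverse conversions are the additional cost.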

Although filters based on the RNS show good performance and low power dissipation, they are not widely used in DSP systems because of the complexity of the algorithms involved. Another area of research is the development of tools for designing RNS-based DSP systems that hide the RNS algorithms from designers and generate synthesizable HDL.

Imprecise Arithmetic

Sometimes reducing the power dissipation of resource-constrained electronic systems is a top priority. In signal processing, it is possible to obtain acceptable quality even when some errors are introduced. The results obtained show that the use of RNS in FIR filters, implemented both in ASICs and FPGAs, leads to a reduction in power dissipation of up to 70% with respect to their counterparts in the conventional number system.
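One simple way to picture the trade between accuracy and hardware cost is a multiplier that ignores the low-order bits of its operands. The sketch below is a generic illustration, not the scheme studied in this work: the function names, the choice of truncating `k` bits from each operand, and the restriction to operands with the top bit set are all assumptions made for the example.

```python
def truncate_lsbs(value, k):
    """Clear the k least significant bits of a non-negative integer."""
    return (value >> k) << k


def imprecise_multiply(a, b, k):
    """Multiply after dropping k low-order bits of each operand (hypothetical scheme)."""
    return truncate_lsbs(a, k) * truncate_lsbs(b, k)


def worst_relative_error(bits, k):
    """Largest relative error over all operand pairs with the top bit set."""
    worst = 0.0
    low = 2 ** (bits - 1)      # smallest operand with the top bit set
    high = 2 ** bits
    for a in range(low, high):
        for b in range(low, high):
            exact = a * b
            error = exact - imprecise_multiply(a, b, k)
            worst = max(worst, error / exact)
    return worst


for k in range(0, 5):
    print(k, worst_relative_error(8, k))
```

Dropping more bits shrinks the partial-product array that a hardware multiplier would need, and the exhaustive search shows how quickly the worst-case error grows in return.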

Current work:
Imprecise Arithmetic
Residue Number System based, Fast Architectures for DSP


Hardware acceleration on reconfigurable platforms
A hardware accelerator is a co-processor, connected to the computer's CPU (for example via a standard bus such as PCI Express), optimized to efficiently run a specific set of numerical computations. Graphics Processing Units (GPUs) are an example of such accelerators.

When processing "non-conventional" data, such as the very long integers and modular arithmetic used in cryptography, or financial computations requiring the decimal number system, general-purpose CPUs and GPUs are less efficient than Application Specific Processors (ASPs). These ASPs can be implemented on FPGAs, which can be fine-tuned to match the algorithm exactly and are easy to reconfigure according to the application.

Current work:
"FPGA Based Acceleration of Decimal Operations" (ReConFig, 2011).

Past and not so current work:
Reconfigurable datapath for signal processing


Alberto Nannarelli