Alberto Nannarelli - Research page

Main Area of Research: Digital Systems, Computer Arithmetic and VLSI Design

Research Interests:

Arithmetical units and numerical processors

Design for low-power

Residual Arithmetic applied to digital signal processors

Reconfigurable processors


Arithmetical units and numerical processors

Division, Square-Root and Reciprocals

This area of research is related to hardware algorithms for numerical computations and their effective implementation in terms of speed of execution, area, and energy.

Algorithms are being developed to reduce the execution time of division and square-root computations. The corresponding implementations are modeled and evaluated for different technologies. The techniques being considered include scaling of the operands, retiming, and speculation of the result digits. These techniques are also extensible to the computation of the reciprocal (1/d) and the square-root reciprocal (1/sqrt(d)).
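As a point of reference for the paragraph above, the basic digit-recurrence iteration that such division algorithms build on can be sketched in a few lines. This is only an illustrative software model, not the implementations studied here: it uses a non-redundant digit set and selects each digit by exact comparison, and it does not model operand scaling, retiming, or speculation. The function name and parameters are hypothetical.

```python
from fractions import Fraction


def digit_recurrence_divide(x, d, radix=4, n_digits=8):
    """Compute n_digits radix-`radix` fractional digits of x/d, for 0 <= x < d.

    Recurrence: w[j+1] = radix * w[j] - q[j+1] * d, with w[0] = x.
    Exact rational arithmetic is used so the sketch has no rounding error.
    """
    if not (0 <= x < d):
        raise ValueError("this sketch assumes 0 <= x < d")
    w = Fraction(x)                      # partial remainder w[0]
    digits = []
    for _ in range(n_digits):
        shifted = radix * w              # radix * w[j]
        q = int(shifted // d)            # digit selection, q in {0, ..., radix-1}
        w = shifted - q * d              # next partial remainder, 0 <= w < d
        digits.append(q)
    quotient = sum(Fraction(q, radix ** (i + 1)) for i, q in enumerate(digits))
    return digits, quotient


digits, approx = digit_recurrence_divide(1, 3, radix=4, n_digits=4)
```

Each iteration retires one radix-4 digit (two bits) of the quotient, so a higher radix needs fewer iterations at the price of a more complex digit selection.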

Division, square-root, 1/d, and 1/sqrt(d) are usually included in the set of instructions implemented in hardware in general-purpose processors for scientific computations. Due to their frequent use in computer graphics, these operations are also implemented in hardware in graphics processors, such as the nVIDIA® GPUs, and in processors for gaming consoles, such as the Emotion Engine powering Sony's PlayStation®.

Decimal Arithmetic

Computers use binary arithmetic because it requires fewer components, which saves silicon area and reduces the size of the system. However, humans are used to dealing with the decimal system, and binary arithmetic cannot always represent decimal values exactly.

Binary floating-point cannot exactly represent decimal fractions. For example:
10% = 10/100 = (0.1)_10 = (0.0001100110011001100110011001...)_2
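The effect of this non-terminating binary expansion can be observed directly in software. The short sketch below uses Python's standard decimal module as a stand-in for a software decimal arithmetic library; the variable names are illustrative only.

```python
from decimal import Decimal

# Binary doubles: 0.1, 0.2 and 0.3 are each rounded to the nearest
# representable binary fraction, so the sum does not compare equal to 0.3.
binary_sum = 0.1 + 0.2
binary_exact = (binary_sum == 0.3)

# Decimal arithmetic in software: the same values are represented exactly.
decimal_sum = Decimal("0.1") + Decimal("0.2")
decimal_exact = (decimal_sum == Decimal("0.3"))

# Exact value of the binary double closest to 0.1, shown in decimal.
stored_tenth = Decimal(0.1)

print(binary_exact, decimal_exact)
print(stored_tenth)
```

The first comparison is false and the second is true, and the last line prints a value slightly larger than 0.1, which is the rounding error that accumulates in financial computations.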

For these reasons, financial applications implement decimal arithmetic operations in software, where they run about 100 times slower than the corresponding binary operations.

Nowadays, because devices keep shrinking, it is realistic to design arithmetic units that work directly in decimal and to speed up decimal operations several times over. The IEEE is working on a standard for decimal floating point.

Current work:
x/d, 1/d, 1/sqrt(d)
Decimal Arithmetic

Past and not so current work:
Retimed Division with Selection by Comparisons
High Radix Dividers
M.S. Thesis


Design for low-power

The main objective of this line of work is the study of techniques to reduce the power dissipation without penalizing the performance. The power consumption reduction is carried out at different levels of abstraction: from the algorithm level down to the implementation, or gate, level. Recently, with technology scaling and increased leakage, the static power dissipation cannot be neglected any longer in power budgets of nanometer technologies.

Current work:
Low Power Design for Digital Signal Processing

Past and not so current work:
High-level Power Characterization of Systems implemented on FPGAs
Low Power Design for Arithmetic Structures
Power Consumption Characterization
Ph.D. Dissertation


Residual Arithmetic applied to digital signal processors
The Residue Number System (RNS) allows the decomposition of a given dynamic range (bit-width) in slices of smaller range on which the computation can be implemented in parallel at higher speed.
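The decomposition described above can be illustrated with a small software model. The moduli (7, 11, 13) and the function names are arbitrary choices for this sketch, not those used in the actual designs; the code assumes Python 3.8 or later for math.prod and the modular inverse pow(x, -1, m).

```python
from math import prod

MODULI = (7, 11, 13)          # pairwise coprime; dynamic range M = 1001


def to_rns(x, moduli=MODULI):
    """Convert an integer into its tuple of residues, one per modulus."""
    return tuple(x % m for m in moduli)


def rns_add(a, b, moduli=MODULI):
    """Add channel by channel; no carry propagates between channels."""
    return tuple((ai + bi) % m for ai, bi, m in zip(a, b, moduli))


def rns_mul(a, b, moduli=MODULI):
    """Multiply channel by channel; each channel only needs a narrow multiplier."""
    return tuple((ai * bi) % m for ai, bi, m in zip(a, b, moduli))


def from_rns(residues, moduli=MODULI):
    """Convert back to an integer in [0, M) with the Chinese Remainder Theorem."""
    M = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        total += r * Mi * pow(Mi, -1, m)   # pow(Mi, -1, m): inverse of Mi mod m
    return total % M


product = from_rns(rns_mul(to_rns(25), to_rns(30)))   # 25 * 30 = 750 < 1001
```

The result is correct as long as it fits in the dynamic range M; the independent channels are what allow the narrow, parallel hardware slices mentioned above.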

The RNS has been successfully applied to the implementation of digital FIR filters, where it also proved advantageous in terms of power dissipation with respect to filters implemented in the traditional two's complement (or sign and magnitude) system. The results obtained show that RNS FIR filters, implemented both in ASICs and in FPGAs, achieve a reduction in power dissipation of up to 70% with respect to their counterparts in the conventional number system.
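To make the FIR case concrete, the sketch below filters a short signed sequence once directly and once with one independent modular channel per modulus, then checks that both agree. It is a functional model only: the moduli (13, 15, 16), the tap values, and all function names are assumptions made for this example, and it says nothing about the power or speed of a hardware implementation. Python 3.8 or later is assumed.

```python
from math import prod

MODULI = (13, 15, 16)         # pairwise coprime; dynamic range M = 3120


def fir_direct(samples, taps):
    """Reference FIR filter in ordinary integer arithmetic."""
    return [sum(h * samples[n - k] for k, h in enumerate(taps) if n - k >= 0)
            for n in range(len(samples))]


def fir_channel(samples, taps, m):
    """The same FIR filter computed entirely modulo m (one RNS channel)."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc = (acc + (h % m) * (samples[n - k] % m)) % m
        out.append(acc)
    return out


def crt_signed(residues, moduli):
    """CRT reconstruction mapped to the signed range [-M/2, M/2)."""
    M = prod(moduli)
    x = sum(r * (M // m) * pow(M // m, -1, m)
            for r, m in zip(residues, moduli)) % M
    return x - M if x >= M // 2 else x


def fir_rns(samples, taps, moduli=MODULI):
    """Run one modular FIR per modulus, then convert each output back."""
    channels = [fir_channel(samples, taps, m) for m in moduli]
    return [crt_signed(res, moduli) for res in zip(*channels)]


samples = [3, -1, 4, 1, -5, 9]
taps = [2, -3, 1]
same = fir_rns(samples, taps) == fir_direct(samples, taps)
```

The conversion back from residues is needed only at the filter output, so the multiply-accumulate work stays inside the narrow modular channels.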

Although filters based on the RNS show high performance and low power dissipation, they are not widely used in DSP systems because of the complexity of the algorithms involved. For this reason, we developed a tool to design RNS FIR filters that hides the RNS algorithms from designers and generates a synthesizable VHDL description of the filter, taking into account several design constraints such as delay, area, and energy.

Current work:
Residue Number System based, Fast Architectures for DSP


Reconfigurable processors
Recently, reconfigurable hardware, such as FPGAs, has gained a large portion of the market because it allows fast prototyping and tuning of products, which reduces design time and costs. We want to follow a similar approach for the design of a configurable datapath oriented to DSP applications, in which the basic blocks are arithmetic units (such as multipliers and adders) instead of logic blocks (such as the CLBs in FPGAs). Our datapath should be reconfigurable, or programmable, in terms of bit-width, sequence of operations, and type of operations.

Past and not so current work:
Reconfigurable datapath for signal processing


Alberto Nannarelli