Won-Ho Lee and Kee-Young Yoo
Department of Computer Engineering
Kyungpook National University
Daegu, 702-701, South Korea
email: yook@knu.ac.kr
The performance of computing power for elliptic curve cryptography or error correcting codes is primarily determined by the efficient realization of the arithmetic operations in the underlying finite fields GF(2m). The important operations involved in GF(2m) are addition, multiplication, division. Addition is very simple circuit if the field elements are presented in a polynomial form. However, the other operations are all much more complex. The conventional approaches for computing division in GF(2m) include the table lookup method, Euclid¡¯s algorithm, and Fermat¡¯s theorem based method. In this method, Fermat¡¯s theorem based method is using successive squaring and multiplication such as A/B = AB-1 = A(B(B(B ... B(B(B)2)2 ... )2)2)2. Therefore, the division can be performed by the iterative application of AB2 multiplication. The systolic arrays for performing the arithmetic operations have been reported in previous literature. In reality, the systolic AB2 mul! tipliers and dividers of Wei and Wang have an I/O format with a bit-parallel-input-output. Whereas, the Lee¡¯s systoliAB2 multiplier has an I/O format with a digit-serial-input-output. In addition, the systolic design of Wei has the bi-directional data flow, while the circuits for Wang and Lee have the unidirectional data flow. We focus on the digit-serial systolic implementation of AB2 multiplication and division in GF(2m)with the standard basis representation.
We derived the dependence graph from the modified AB2 multiplication algorithm. Thereafter, we designed new digit-serial systolic AB2multiplier and divider using the proposed multiplier. The proposed AB2multiplier is composed of a Type-1 cell and m-1 Type-2 cells. The critical-path delay in each cell is the total delay of one 2-input AND gate and one 4-input XOR gate and one 1-bit latch, and the latency of the overall circuit is 2m+m/L-1 clock cycles. The proposed divider consists of m-1 AB2 multipliers for GF(2m) and some delay elements. The proposed systolic arrays were described in VHDL with ALTERA MAX PLUS-II tool, and then were simulated using FLEX 10k devices of the ALTERA family for its computation time and correctness. The proposed AB2 multiplier has a similar latency and critical-path then the architectures of Wei and Wang, but it has a very smaller hardware complexity. Also, it has a short critical-path and a similar latency then the Lee's architectur.
The proposed divider has a similar latency and critical-path then the architectures of Wei and Wang, but it has a smaller hardware complexity then the previous architectures.
Therefore, the proposed systolic arrays have a significant improvement in reducing the area-time (AT) complexity compared with previous architecture, although they have one control signal. That is, for example m=160 and L=2, the AT complexity of the proposed AB2 multiplier can be reduced approximately by 98% and 35% than that of Wang and Lee, respectively. They have regularity, modularity, and unidirectional data flow, and thus are well suited to VLSI implementation. Accordingly, the proposed arrays could be used in cryptographic hardware and IC cards.