[Gnuk-users] Gnuk on a faster MCU

NIIBE Yutaka gniibe at fsij.org
Mon Sep 11 01:17:59 UTC 2017


Hello,

First, let me explain the current status of Chopstx/NeuG/Gnuk.

In 2016, I designed FS-BB48 with a Cortex-M0+ (to explore a smaller PCB
footprint with crystal-less USB, a less powerful MCU, and a touch
button).  Chopstx was ported to Cortex-M0+ with a USB driver; then I
found that we also needed to port bignum, flash ROM handling, etc.  The
ADC routine for NeuG was ported afterward.  I realized that the
assumptions in the flash ROM handling are too specific to STM32F103.
Besides, I realized that most people didn't agree with the idea of a
less powerful MCU.  So, I didn't develop it further.  I only made an
engineering prototype of FS-BB48.

These days, with the aim of porting Gnuk to a GNU/Linux emulation, I
have partially ported Chopstx to GNU/Linux using USBIP.  Initially, I
took the approach of doing this within Gnuk (gnu/emulation directory),
but I changed the approach to porting Chopstx first, then the NeuG
routines, and finally Gnuk.  Now, Chopstx has a USBIP driver for
GNU/Linux, and I have mostly ported NeuG (standalone) to GNU/Linux.
Next, I am about to port the flash ROM access routines to the GNU/Linux
emulation.

I am doing this work so that people with no physical hardware can "run"
Gnuk on GNU/Linux, and so that porting Gnuk to other MCUs becomes
easier.

Aurelien Jarno <aurelien at aurel32.net> wrote:
> I therefore started to prototype things a bit, and I "ported" Gnuk on a
> STM32L432 MCU. I say "ported" because I have done things quick and dirty
> and the keys are not even stored in flash, but in RAM. This MCU has a
> Cortex-M4 CPU running at 80MHz and tiny caches (1kB for instructions,
> 256B for data). It's available in a QFN32 case, even smaller than the
> STM32F103. It's also able to do crystal-less USB (I haven't tried yet).
>
> On such a CPU, Gnuk is able to do a RSA2048 decryption in 0.84s and
> a RSA4096 decryption in 5.18s (vs 1.27s and 8.22s on FST-01). The gains
> are mainly due to the instruction cache, as it hides the wait states of
> the flash memory. The remaining gain comes from the single cycle
> multiply-and-add instructions. I have been able to get these down to
> respectively 0.65s and 3.87s by using the UMAAL DSP instruction in
> MULADDC and mpi_montsqr.

Great!

QFN32, crystal-less USB and single cycle multiply-and-add sound great to
me.  I'm afraid STM32L432 has more features.
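
To illustrate why UMAAL suits MULADDC-style loops: the instruction
computes RdHi:RdLo = Rn * Rm + RdLo + RdHi, and that sum always fits in
64 bits.  Below is a minimal C sketch of one multiply-accumulate row
built on that identity.  The names umaal_step and mul_add_row are
hypothetical; this is not the MULADDC macro or mpi_montsqr from the
Gnuk tree, and a compiler may or may not emit UMAAL for it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* What one UMAAL computes: a * b + c + d as a 64-bit value.
   The sum cannot overflow: (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1.  */
static uint64_t
umaal_step (uint32_t a, uint32_t b, uint32_t c, uint32_t d)
{
  return (uint64_t)a * b + c + d;
}

/* Hypothetical helper: d[0..n-1] += s[0..n-1] * b.
   Returns the carry word that falls out of the top limb.  */
static uint32_t
mul_add_row (uint32_t *d, const uint32_t *s, size_t n, uint32_t b)
{
  uint32_t carry = 0;
  size_t i;

  assert (d != NULL && s != NULL);
  for (i = 0; i < n; i++)
    {
      /* One step per limb: product, old limb and carry in one go.  */
      uint64_t t = umaal_step (s[i], b, d[i], carry);

      d[i] = (uint32_t)t;		/* low word stays in place */
      carry = (uint32_t)(t >> 32);	/* high word moves to next limb */
    }
  return carry;
}
```

With d = s = {0xFFFFFFFF, 0xFFFFFFFF} and b = 0xFFFFFFFF, the row ends
as d = {0, 0xFFFFFFFF} with carry 0xFFFFFFFF, which is the worst case
for carry propagation.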

> I am still pondering whether to try an even faster MCU, like an STM32F4
> at 168MHz even if it comes in a bigger LQFP64 case. I would consider
> getting a < 2s signature / decryption for a RSA4096 something
> acceptable.

I think that 2 seconds is acceptable.  When I started the development of
an OpenPGP card alternative, RSA1024 took about 5 seconds on an
ATmega328 running at 20MHz.  I didn't feel that was acceptable for my
own use cases.  Then, in 2010, RSA2048 took something like 2 seconds on
an STM32F103 with PolarSSL.  Thus, I started Gnuk.

BTW, currently we are using a p*q modulus.  It is known that a
multi-prime modulus can speed up RSA computation (it is patented as
US5848159, which is still in effect).  There was also a technique using
a p^k*q modulus, which was patented as US6396926.  I found that the
latter patent expired in 2010 due to failure to pay the maintenance fee.
To me, the latter technique seems to be covered by the more general
multi-prime modulus technique.  If not, I wonder whether we can use it.
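
As an illustration of where the speed-up comes from: with k primes, the
private-key operation splits into k exponentiations on numbers about
1/k the length, recombined by the Chinese Remainder Theorem.  With
schoolbook multiplication that is roughly 1/k^2 of the cost of one
full-size exponentiation.  The following C sketch uses toy 64-bit
numbers; mod_pow, mod_inverse and rsa_crt are hypothetical names, the
code is not constant-time, and it is not taken from Gnuk.

```c
#include <assert.h>
#include <stdint.h>

/* b^e mod m, for m < 2^32 so that products fit in 64 bits.  */
static uint64_t
mod_pow (uint64_t b, uint64_t e, uint64_t m)
{
  uint64_t r = 1;

  for (b %= m; e != 0; e >>= 1)
    {
      if (e & 1)
	r = r * b % m;
      b = b * b % m;
    }
  return r;
}

/* Inverse of a modulo m by the extended Euclidean algorithm.  */
static uint64_t
mod_inverse (uint64_t a, uint64_t m)
{
  int64_t r0 = (int64_t)m, r1 = (int64_t)(a % m);
  int64_t t0 = 0, t1 = 1;

  while (r1 != 0)
    {
      int64_t q = r0 / r1;
      int64_t tmp;

      tmp = r0 - q * r1; r0 = r1; r1 = tmp;
      tmp = t0 - q * t1; t0 = t1; t1 = tmp;
    }
  assert (r0 == 1);		/* a and m must be coprime */
  return (uint64_t)(t0 < 0 ? t0 + (int64_t)m : t0);
}

/* c^d mod (p[0] * ... * p[k-1]) for distinct primes p[i].  */
static uint64_t
rsa_crt (uint64_t c, uint64_t d, const uint64_t *p, int k)
{
  uint64_t x = 0;		/* result so far, valid modulo M */
  uint64_t M = 1;		/* product of the primes folded in */
  int i;

  for (i = 0; i < k; i++)
    {
      /* Short exponentiation: exponent reduced modulo p[i] - 1.  */
      uint64_t xi = mod_pow (c % p[i], d % (p[i] - 1), p[i]);
      /* Garner step: pick h so that x + M*h == xi (mod p[i]).  */
      uint64_t diff = (xi + p[i] - x % p[i]) % p[i];
      uint64_t h = diff * mod_inverse (M % p[i], p[i]) % p[i];

      x += M * h;
      M *= p[i];
    }
  return x;
}
```

For example, with primes {61, 53, 47}, e = 17 and
d = mod_inverse (17, 60 * 52 * 46), rsa_crt (mod_pow (m, 17, 151951),
d, primes, 3) gives back m.  The same function with {61, 53} is the
usual two-prime CRT.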

> It seems the biggest portability issue concerns the flash.

Right.

> The current code assumes that the pages are small (1 or 2kB) and that
> the writes are done 2 bytes by 2 bytes. These assumptions are used in
> src/flash.c, but also define the format of the data in
> src/openpgp-do.c.

I feel that src/openpgp-do.c requires major surgery.  It assumes that
writes are done "2 bytes by 2 bytes", and that existing data can be
overwritten by data with more 0 bits.
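
The two assumptions can be modelled in a few lines of C.  This is a
simplified simulation, not code from src/flash.c: it assumes an erased
half-word reads 0xFFFF and that programming can only turn 1 bits into
0 bits.  Real flash controllers impose further restrictions that the
sketch ignores, and all names in it are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FLASH_ERASED 0xFFFFu	/* assumed erased state of a half-word */

/* Simulated program operation: bits can only go from 1 to 0.  */
static uint16_t
flash_program_hw (uint16_t old, uint16_t value)
{
  return old & value;
}

/* True if VALUE can be stored over OLD without an erase, that is,
   VALUE has no 1 bit where OLD already has a 0 bit.  */
static int
flash_can_overwrite (uint16_t old, uint16_t value)
{
  return (old & value) == value;
}

/* Write LEN bytes "2 bytes by 2 bytes" into simulated flash.
   Returns 0 on success, or -1 at the first half-word that would
   need an erase; half-words before it have already been written.  */
static int
flash_write_sim (uint16_t *flash, const uint8_t *data, size_t len)
{
  size_t i;

  assert (len % 2 == 0);	/* the unit of writing is 2 bytes */
  for (i = 0; i < len; i += 2)
    {
      /* Little-endian pair of bytes forms one half-word.  */
      uint16_t hw = (uint16_t)(data[i] | (data[i + 1] << 8));

      if (!flash_can_overwrite (flash[i / 2], hw))
	return -1;
      flash[i / 2] = flash_program_hw (flash[i / 2], hw);
    }
  return 0;
}
```

Under this model a record written into erased flash can later be
overwritten with zeros (every bit goes to 0), but writing the record
again over the zeros fails.  A port to flash with a larger write unit,
or one that forbids a second write to the same unit, breaks exactly
these two assumptions.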

> I wonder if one way to fix that would be to use a single data pool,
> with the possibility to store longer objects like keys or certificates.
> It would mean triggering the garbage collector each time sensitive
> data like a private key has been removed or replaced. This is however a
> significant change to the current code.

I think that, since private key data is better handled carefully, it is
good that we have separate data storage and separate access routines
for it.

As for the certificate data object, I want to kill the feature.