    [CRYPTO] Add x86_64 asm AES (commit a2a892a2)
    Author: Andreas Steinmetz

    Implementation:
    ===============
    The encrypt/decrypt code is based on an x86 implementation I wrote a
    while ago but never published. That unpublished implementation includes
    an assembler-based key schedule and precomputed tables. For simplicity
    and to ease acceptance, however, I took Gladman's in-kernel code for
    table generation and key schedule for the kernel port of my assembler
    code, and modified it to produce the key schedule in the form my
    assembler implementation requires. File locations and Kconfig are kept
    similar to the i586 AES assembler implementation.
    Using 32-bit I/O and registers in the assembler implementation may seem
    a little strange, but it gives the smallest code size. My implementation
    takes one more instruction per round than Gladman's x86 assembler, but
    it needs no stack for local variables or saved registers, and it is less
    serialized than Gladman's code.
    Note that all comparisons to Gladman's code were made after my code was
    implemented. I used only FIPS PUB 197 for the implementation, so my
    implementation is independent work.
    If anybody has a better assembler solution for x86_64, I'll be pleased
    to have my code replaced by it.
    
    Testing:
    ========
    The implementation passes the in-kernel crypto testing module, and I'm
    running it without any problems on my laptop, where it is mainly used
    for dm-crypt.
    
    Microbenchmark:
    ===============
    The microbenchmark was done in userspace with compile flags similar to
    those used for the kernel build.
    Encrypt/decrypt is about 35% faster than the generic C implementation.
    As both the generic C and my assembler implementation are table-driven,
    I don't expect much room for further improvement, though I'll be glad
    to be corrected here.
    The key schedule is about 5% slower than the generic C implementation,
    because the key schedule routine has to do some extra work to fit the
    schedule to the assembler implementation.
    
    Code Size:
    ==========
    Together, encrypt and decrypt are about 2.1 Kbytes smaller than the
    generic C implementation, which matters for L1 cache usage. The key
    schedule routine is about 100 bytes larger than the generic C
    implementation.
    
    Data Size:
    ==========
    There's no difference in data size requirements between the assembler
    implementation and the generic C implementation.
    
    License:
    ========
    Gladman's code is dual BSD/GPL, whereas my assembler code is GPLv2 only
    (I'm not going to change the license for my code). So I had to change
    the module license for the x86_64 aes module from 'Dual BSD/GPL' to
    'GPL' to reflect the most restrictive license within the module.

    Signed-off-by: Andreas Steinmetz <ast@domdv.de>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    Signed-off-by: David S. Miller <davem@davemloft.net>