    IB/mlx4: Use multiple WQ blocks to post smaller send WQEs · ea54b10c
    Jack Morgenstein authored
    ConnectX HCA supports shrinking WQEs, so that a single work request
    can be made of multiple units of wqe_shift.  This way, WRs can differ
    in size, and do not have to be a power of 2 in size, saving memory and
    speeding up send WR posting.  Unfortunately, if we do this then the
    wqe_index field in CQEs can't be used to look up the WR ID anymore, so
    our implementation does this only if selective signaling is off.
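
The memory saving described above can be illustrated with a small sketch. It assumes a basic unit of `1 << wqe_shift` bytes; the helper names (`wqe_units`, `shrunk_wqe_bytes`, `pow2_wqe_bytes`) are hypothetical and are not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of (1 << wqe_shift)-byte units that a
 * work request of `bytes` bytes occupies, rounded up. */
static uint32_t wqe_units(uint32_t bytes, unsigned wqe_shift)
{
    uint32_t unit;

    assert(wqe_shift < 30 && bytes <= (1u << 30));
    unit = 1u << wqe_shift;
    return (bytes + unit - 1) >> wqe_shift;
}

/* Footprint when a WR may span any whole number of basic units. */
static uint32_t shrunk_wqe_bytes(uint32_t bytes, unsigned wqe_shift)
{
    return wqe_units(bytes, wqe_shift) << wqe_shift;
}

/* Footprint when every slot must be a power of two in size
 * (and at least one basic unit). */
static uint32_t pow2_wqe_bytes(uint32_t bytes, unsigned wqe_shift)
{
    uint32_t size;

    assert(wqe_shift < 30 && bytes <= (1u << 30));
    size = 1u << wqe_shift;
    while (size < bytes)
        size <<= 1;
    return size;
}
```

With 64-byte units (`wqe_shift` of 6), a 272-byte WR takes five units (320 bytes) instead of a 512-byte power-of-two slot.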
    
    Further, on 32-bit platforms, we can't use vmap() to make the QP
    buffer virtually contiguous. Thus we have to use constant-sized WRs to
    make sure a WR is always fully within a single page-sized chunk.
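
A rough sketch of why constant-sized slots are safe without a virtually contiguous buffer: if the slot size is a power of two no larger than the page size and the buffer is page aligned, no slot crosses a page boundary, whereas a variable-sized WR can. The helpers below are hypothetical illustrations, not driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if the byte range [offset, offset + len) crosses a page boundary.
 * page_size must be a nonzero power of two. */
static bool straddles_page(size_t offset, size_t len, size_t page_size)
{
    assert(len > 0);
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (offset / page_size) != ((offset + len - 1) / page_size);
}

/* Count fixed-size slots that cross a page boundary in a buffer that
 * starts on a page boundary. */
static size_t count_straddling_slots(size_t nslots, size_t slot_size,
                                     size_t page_size)
{
    size_t i, n = 0;

    for (i = 0; i < nslots; i++)
        if (straddles_page(i * slot_size, slot_size, page_size))
            n++;
    return n;
}
```

For example, 256-byte slots in 4096-byte pages never straddle, while a 320-byte WR starting at offset 3904 would end in the next page.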
    
    Finally, we use WRs with the NOP opcode to avoid wrapping around the
    queue buffer in the middle of posting a WR, and we set the
    NoErrorCompletion bit to avoid getting completions with error for NOP
    WRs.  However, NEC is only supported starting with firmware 2.2.232,
    so we use constant-sized WRs for older firmware.  Also, since MLX QPs
    only support SEND, we use constant-sized WRs for them as well.
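
The conditions listed in this message, and the NOP padding at the end of the queue, can be sketched roughly as follows. This is only an illustration under stated assumptions: the structures, field names, and functions (`fw_version`, `qp_params`, `can_shrink_wqes`, `nop_pad_units`) are hypothetical, and the driver's actual checks and padding policy may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical representation of a firmware version such as 2.2.232. */
struct fw_version { unsigned major, minor, subminor; };

static bool fw_at_least(struct fw_version v, struct fw_version min)
{
    if (v.major != min.major)
        return v.major > min.major;
    if (v.minor != min.minor)
        return v.minor > min.minor;
    return v.subminor >= min.subminor;
}

/* Hypothetical summary of the properties the commit message mentions. */
struct qp_params {
    bool all_wrs_signaled;  /* selective signaling is off */
    bool is_64bit;          /* QP buffer can be made virtually contiguous */
    bool is_mlx_qp;         /* MLX QPs only support SEND */
    struct fw_version fw;
};

/* Variable-sized WQEs are used only when every condition holds. */
static bool can_shrink_wqes(const struct qp_params *p)
{
    const struct fw_version nec_min = { 2, 2, 232 };

    assert(p);
    return p->all_wrs_signaled && p->is_64bit && !p->is_mlx_qp &&
           fw_at_least(p->fw, nec_min);
}

/* Units of NOP padding to post before a WR of wr_units units so that the
 * WR itself does not wrap around the end of a queue of queue_units units.
 * head is a free-running index counted in units. */
static uint32_t nop_pad_units(uint32_t head, uint32_t queue_units,
                              uint32_t wr_units)
{
    uint32_t room;

    assert(queue_units > 0 && wr_units > 0 && wr_units <= queue_units);
    room = queue_units - (head % queue_units);
    return wr_units > room ? room : 0;
}
```

In this sketch, a 5-unit WR posted at position 60 of a 64-unit queue is preceded by a 4-unit NOP, so the WR starts at the beginning of the buffer.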
    
    When posting a NOP, stamp the WQE only after setting the NOP WQE
    valid bit.
    Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
    Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
    Signed-off-by: Roland Dreier <rolandd@cisco.com>
mlx4_ib.h 8.69 KB