• [IB] mthca: first pass at catastrophic error reporting · 3d155f8c
    Roland Dreier authored
    Add some initial support for detecting and reporting catastrophic
    errors reported by Mellanox HCAs.  We start a periodic timer which
    polls the catastrophic error reporting buffer in device memory.  If an
    error is detected, we dump the contents of the buffer for post-mortem
    debugging, and report a fatal asynchronous error to higher levels.
    
    In the future we can try to recover from these errors by resetting the
    device, but this will require some work in higher-level code as well.
    Let's get this in now, so that we at least get catastrophic errors
    reported in logs.
    
    Signed-off-by: Roland Dreier <rolandd@cisco.com>
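
The poll, dump, and report flow described in the commit message can be sketched in plain user-space C. This is only an illustration, not the driver's code: `struct fake_hca`, `CATAS_BUF_WORDS`, `report_fatal()` and `poll_catas()` are hypothetical names, the buffer is an ordinary array rather than mapped device memory, and treating any nonzero word as an error is an assumption made for the sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical size of the error buffer, in 32-bit words. */
#define CATAS_BUF_WORDS 4

/*
 * Stand-in for an HCA. A real driver would read the error buffer from
 * memory-mapped device memory; here it is an ordinary array.
 */
struct fake_hca {
	volatile uint32_t catas_buf[CATAS_BUF_WORDS];
	int fatal_reported;	/* set once a fatal event has been reported */
};

/* Stand-in for passing a fatal asynchronous event to higher levels. */
static void report_fatal(struct fake_hca *hca)
{
	hca->fatal_reported = 1;
}

/*
 * One polling pass over the error buffer.
 * Assumption: any nonzero word means the device has reported an error.
 * Returns 1 if an error was found (buffer dumped, event reported), else 0.
 */
static int poll_catas(struct fake_hca *hca)
{
	size_t i;
	int found = 0;

	for (i = 0; i < CATAS_BUF_WORDS; ++i) {
		if (hca->catas_buf[i] != 0) {
			found = 1;
			break;
		}
	}

	if (!found)
		return 0;

	/* Dump the whole buffer for post-mortem debugging. */
	for (i = 0; i < CATAS_BUF_WORDS; ++i)
		fprintf(stderr, "catas buf[%02zu]: %08x\n",
			i, (unsigned int)hca->catas_buf[i]);

	report_fatal(hca);
	return 1;
}
```

In a kernel driver, a function like `poll_catas()` would presumably run from a periodic timer callback that re-arms itself while no error has been seen; the timer handling is left out here so that the sketch stays self-contained.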
mthca_cmd.c 51.8 KB