1. 01 May, 2007 5 commits
    • libata: reimplement reset sequencing · 31daabda
      Tejun Heo authored
      libata previously depended upon waits in prereset to get resets after
      hotplug right for both spin up and device ready wait.  This was
      necessary both for reliability and speed as reset was likely to fail if
      initiated too early and each try usually took more than 30secs to
      fail.  Previous patches fixed the reliability part by fixing status
      and SCR handling in resets.  This patch remedies the speed part by
      improving reset sequencing.
      
      Prereset waiting timeout is adjusted to 10s because spinup wait is
      replaced by reset sequencing and !BSY wait is not as important as
      before.  During boot or module loading where the drive is already
      fully spun up, !BSY wait succeeds immediately, so 10s should be enough
      in most cases.  It matters after hotplugging or other error
      conditions, but in those cases, !BSY wait in prereset simply can't be
      relied upon due to the varied and weird behaviors ATA controllers and
      devices show.
      
      Reset is now driven by ata_eh_reset_timeouts[] table which contains
      timeouts for each reset try.  The first reset can be softreset but the
      following ones are always hardreset if available.  Each timeout
      defines deadline for the reset try.  If a reset try fails, reset is
      retried with the next timeout till the end of the timeout table is
      reached.  If a reset try fails before the timeout with error, libata
      waits till the deadline of the failed try before retrying.
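
      The retry rule above can be sketched with a simulated clock.  This is
      only an illustration under stated assumptions, not the libata code:
      run_reset_sequence(), struct try_result and the two example devices
      are hypothetical names, and time is counted in whole seconds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outcome of one simulated reset try. */
struct try_result {
    int ok;         /* nonzero if the reset succeeded */
    unsigned took;  /* seconds the try ran before returning */
};

/* Hypothetical callback standing in for a softreset/hardreset method. */
typedef struct try_result (*reset_fn)(unsigned try_idx, unsigned budget);

/*
 * Walk the timeout table.  Returns the simulated time in seconds at which
 * a reset succeeded, or -1 if every try failed.  *tries_used receives the
 * number of tries that were made.
 */
int run_reset_sequence(const unsigned *timeouts, size_t n,
                       reset_fn do_reset, unsigned *tries_used)
{
    unsigned now = 0;

    assert(timeouts && do_reset && tries_used);
    *tries_used = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned deadline = now + timeouts[i];
        struct try_result r = do_reset((unsigned)i, timeouts[i]);
        unsigned took = r.took < timeouts[i] ? r.took : timeouts[i];

        *tries_used = (unsigned)i + 1;
        if (r.ok)
            return (int)(now + took);

        /* Failed early or timed out: either way, the next try does
         * not start before this try's deadline. */
        now = deadline;
    }
    return -1;
}

/* Example device: fails quickly twice, then comes up on the third try. */
struct try_result slow_spinup_device(unsigned try_idx, unsigned budget)
{
    struct try_result r;

    (void)budget;
    r.ok = try_idx >= 2;
    r.took = r.ok ? 3 : 1;
    return r;
}

/* Example device: never answers, so every try runs into its timeout. */
struct try_result dead_device(unsigned try_idx, unsigned budget)
{
    struct try_result r = { 0, budget };

    (void)try_idx;
    return r;
}
```

      With a table of 10, 10, 35 and 5 seconds, the slow device is detected
      23 seconds in, even though its first two tries each failed after one
      second.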
      
      IOW, the timeout table defines timetable of reset tries such that the
      n'th try always begins at least after the sum of all previous timeouts
      has passed.  The current timetable defines 4 tries and takes around 1
      minute.
      
      @0	: First try.  This should succeed most of the time during boot.
      @10	: 10s is enough to spin up most consumer harddrives.  Give it
      	  another shot.
      @20	: 20s should spin up > 99% of working drives.  This has 30s
      	  timeout for retarded devices needing long idleness post reset.
      @55	: Final try with 5s timeout just in case.
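
      The start offsets in the timetable follow from summing the per-try
      timeouts.  A minimal sketch, assuming timeouts of 10, 10, 35 and 5
      seconds; these values are inferred from the @0/@10/@20/@55 offsets
      (the third slot is taken as 35s because the next try starts at @55),
      and reset_timeouts[] and try_start() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Per-try timeouts in seconds; inferred from the timetable, an assumption. */
const unsigned reset_timeouts[] = { 10, 10, 35, 5 };

#define N_RESET_TIMEOUTS (sizeof(reset_timeouts) / sizeof(reset_timeouts[0]))

/*
 * Earliest start time, in seconds, of try n (0-based): the sum of all
 * previous timeouts.  n == N_RESET_TIMEOUTS gives the total time budget.
 */
unsigned try_start(size_t n)
{
    unsigned start = 0;

    assert(n <= N_RESET_TIMEOUTS);
    for (size_t i = 0; i < n; i++)
        start += reset_timeouts[i];
    return start;
}
```

      try_start(3) yields 55 and try_start(4) yields 60, matching the "@55"
      entry and the "around 1 minute" total.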
      
      The above timetable is a trade-off between not annoying the device too
      much with frequent resets and taking reasonable amount of time in most
      cases.  Some controllers may do better with shorter timeouts while
      others may fare better with longer but we just can't rely upon LLD
      writers to test each controller with wide variety of devices using
      various scenarios.  We need default behavior which reasonably fits
      most cases.
      
      I've tested the above timetable on a dozen SATA controllers and a few
      PATA controllers with about a dozen different drives from all major
      vendors and 4 different ODDs from three different vendors for both
      boot and hotplug (if available) cases.
      
      Boot probing is not affected unless the device is broken, in which
      case the new code gives up on the port after a minute rather than five
      or nine minutes.  When hotplugging, most devices get detected on the
      first or second try.  Multi-platter drives with long spin-up times,
      which sometimes took > 40 secs with the original code, now usually
      come up during the second try, or at the latest right after the third
      try @20.
      Signed-off-by: Tejun Heo <htejun@gmail.com>
      Signed-off-by: Jeff Garzik <jeff@garzik.org>
    • libata: improve ata_std_prereset() · b8cffc6a
      Tejun Heo authored
      This patch updates ata_std_prereset() as follows.
      
      * Don't fail on phy resume failure.  Just whine and continue.  Failure
        from prereset makes libata abort whole reset sequence and give up
        the port, so prereset() should be best effort.  This is more
        important with the coming EH updates as prereset() will be called
        with shorter timeout.
      
      * If ata_wait_ready() fails, whine and request hardreset instead.
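
      The two changes above amount to a prereset that always returns
      success.  A rough sketch, not the actual ata_std_prereset(): struct
      fake_port, the RESET_* flags and sketch_prereset() are hypothetical,
      and the two *_rc fields stand in for the results of the phy resume
      and ata_wait_ready() calls.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical reset-type flags. */
enum { RESET_SOFT = 1 << 0, RESET_HARD = 1 << 1 };

/* Hypothetical port state; the *_rc fields are 0 or a negative errno. */
struct fake_port {
    int phy_resume_rc;  /* result of resuming the phy */
    int wait_ready_rc;  /* result of waiting for the device to be ready */
    unsigned action;    /* reset types requested so far */
};

/* Best-effort prereset: complain about failures but never abort. */
int sketch_prereset(struct fake_port *p)
{
    assert(p != NULL);

    /* Phy resume failure is only reported, not propagated. */
    if (p->phy_resume_rc)
        fprintf(stderr, "failed to resume link (errno=%d)\n",
                p->phy_resume_rc);

    /* A device that is not ready gets a hardreset request instead. */
    if (p->wait_ready_rc) {
        fprintf(stderr, "device not ready (errno=%d), forcing hardreset\n",
                p->wait_ready_rc);
        p->action |= RESET_HARD;
    }

    return 0;
}
```

      Returning 0 in every case keeps EH from abandoning the port at the
      prereset stage; the reset tries that follow decide the outcome.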
      Signed-off-by: Tejun Heo <htejun@gmail.com>
      Signed-off-by: Jeff Garzik <jeff@garzik.org>
    • libata: improve 0xff status handling · 9b89391c
      Tejun Heo authored
      For PATA, 0xff status indicates empty port.  For SATA, it depends on
      how the controller emulates status register.  On some controllers,
      0xff is used to represent broken link or certain stage during reset.
      
      libata currently treats SATA the same way.  This hasn't caused any problem
      because problematic situations usually only occur after hotplug or
      other link disruption events and libata blindly waited for the device
      to spin up and settle after hotplug giving the link and device
      whatever time to go through those stages.
      
      libata is going to replace unconditional spinup wait with generic
      timed sequence of resets, so not only getting 0xff handling right for
      SATA is, well, the right thing to do, it's much more important now.
      
      This patch makes the following changes.
      
      * Make ata_bus_softreset() return -ENODEV if any of its wait fails
        due to 0xff status.
      
      * Fail soft/hardreset if status wait returns -ENODEV indicating 0xff
        status while SStatus says the link is online.  e.g. Reset fails if
        status is 0xff after reset when SStatus reports the link is online.
        If SCR registers are not available, everything is the same as
        before.
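
      The decision described in the second item can be sketched as a small
      pure function.  This is an illustration only: enum link_state and
      reset_should_fail() are hypothetical, LINK_UNKNOWN stands for "SCR
      registers are not available", and failing on other wait errors is an
      assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical link state as derived from SStatus. */
enum link_state { LINK_UNKNOWN, LINK_OFFLINE, LINK_ONLINE };

/*
 * wait_rc is the result of the post-reset status wait: 0 on success,
 * -ENODEV if the status register read 0xff, another negative errno
 * otherwise.  Returns 1 if the reset should be treated as failed.
 */
int reset_should_fail(int wait_rc, enum link_state link)
{
    assert(wait_rc <= 0);

    if (wait_rc == 0)
        return 0;

    if (wait_rc == -ENODEV) {
        /* 0xff with an online link: something is there, so retry.
         * Without SCR access or with the link down, keep treating
         * 0xff as an empty port, as before. */
        return link == LINK_ONLINE;
    }

    /* Assumption: any other wait error fails the reset. */
    return 1;
}
```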
      Signed-off-by: Tejun Heo <htejun@gmail.com>
      Signed-off-by: Jeff Garzik <jeff@garzik.org>
    • libata: add deadline support to prereset and reset methods · d4b2bab4
      Tejun Heo authored
      Add @deadline to prereset and reset methods and make them honor it.
      ata_wait_ready() which directly takes @deadline is implemented to be
      used as the wait function.  This patch is in preparation for EH timing
      improvements.
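
      A wait that takes an absolute deadline can be sketched as follows,
      using a simulated clock in whole seconds.  This is not the real
      ata_wait_ready(): struct fake_dev and sketch_wait_ready() are
      hypothetical, the -EBUSY return value and the message text are
      assumptions, and only the 5 sec / 3 sec thresholds come from this
      commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical simulated device: busy until ready_at (seconds). */
struct fake_dev {
    unsigned ready_at;
};

/*
 * Poll once per simulated second, starting at `now`, until the device is
 * ready or the absolute `deadline` is reached.  Returns 0 if the device
 * became ready, -EBUSY otherwise.  *warned is set to 1 if the "be
 * patient" message was printed.
 */
int sketch_wait_ready(const struct fake_dev *dev, unsigned now,
                      unsigned deadline, int *warned)
{
    unsigned start = now;

    assert(dev && warned);
    *warned = 0;

    for (;;) {
        if (now >= dev->ready_at)
            return 0;
        if (now >= deadline)
            return -EBUSY;

        /* Warn once after 5s of waiting, but only if more than 3s
         * remain until the deadline. */
        if (!*warned && now - start >= 5 && deadline - now > 3) {
            printf("port is slow to respond, please be patient\n");
            *warned = 1;
        }

        now++;  /* stands in for a sleeping, non-busy 1s wait */
    }
}
```

      Because the deadline is absolute, callers that perform several waits
      in sequence can pass the same value to each of them and still stay
      inside one overall time budget.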
      
      * ata_wait_ready() never does busy sleep.  It's only used from EH and
        no wait in EH is that urgent.  This function also prints 'be
        patient' message automatically after 5 secs of waiting if more than
        3 secs is remaining till deadline.
      
      * ata_bus_post_reset() now fails with error code if any of its wait
        fails.  This is important because earlier reset tries will have
        shorter timeout than the spec requires.  If a device fails to
        respond before the short timeout, reset should be retried with
        longer timeout rather than silently ignoring the device.
      
        There are three behavior differences.
      
        1. Timeout is applied to both devices at once, not separately.  This
           is more consistent with what the spec says.
      
        2. When a device passes devchk but fails to become ready before
           deadline.  Previously, post_reset would just succeed and let
           device classification remove the device.  New code fails the
           reset, thus causing reset retry.  After a few tries, EH will
           give up and disable the port.
      
        3. When slave device passes devchk but fails to become accessible
           (TF-wise) after reset.  Original code disables dev1 after 30s
           timeout and continues as if the device doesn't exist, while the
           patched code fails reset.  When this happens, new code fails
           reset on whole port rather than proceeding with only the primary
           device.
      
        If the failing device is suffering transient problems, new code
        retries reset, which is better behavior.  If the failing device is
        actually broken, the net effect is identical for that device, but
        not for the other device sharing the channel.  In the previous
        code, reset would have succeeded after 30s, thus detecting the
        working one.  In the new code, reset fails and the whole port gets
        disabled.  IMO, it's a pathological case anyway (broken device
        sharing bus with working one) and doesn't really matter.
      
      * ata_bus_softreset() is changed to return error code from
        ata_bus_post_reset().  It used to return 0 unconditionally.
      
      * Spin up waiting is to be removed and not converted to honor
        deadline.
      
      * To be on the safe side, deadline is set to 40s for the time being.
      Signed-off-by: Tejun Heo <htejun@gmail.com>
      Signed-off-by: Jeff Garzik <jeff@garzik.org>
    • libata: honour host controllers that want just one host · dc87c398
      Linus Torvalds authored
      The Marvell IDE interface on my machine would hit a BUG_ON() in
      lib/iomem.c because it was calling ata_pci_init_one() specifying just a
      single port on the host, but that would actually end up trying to
      initialize two ports, the second one with bogus information.
      
      This fixes "ata_pci_init_one()" so that it actually passes down the
      n_ports variable that it got from the low-level driver to the host
      allocation routine ("ata_host_alloc_pinfo()"), which results in the ATA
      layer actually having the correct port number information.
      
      And in order to make it all work, I also needed to fix a few places that
      had incorrectly hard-coded the fact that a host always had exactly two
      ports (both ata_pci_init_bmdma() and ata_request_legacy_irqs() would
      just always iterate over both ports).
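
      The shape of the fix is to loop over the port count the host actually
      has instead of a hard-coded two.  A minimal sketch with hypothetical
      names (struct fake_host, sketch_init_ports()); it does not reproduce
      ata_pci_init_bmdma() or ata_request_legacy_irqs().

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_PORTS 2

/* Hypothetical host with a driver-supplied port count. */
struct fake_host {
    size_t n_ports;
    int port_initialized[SKETCH_MAX_PORTS];
};

/* Initialize only the ports the host reports; returns how many were set up. */
size_t sketch_init_ports(struct fake_host *host)
{
    size_t done = 0;

    assert(host != NULL);

    /* Bounded by host->n_ports rather than by a constant 2. */
    for (size_t i = 0; i < host->n_ports && i < SKETCH_MAX_PORTS; i++) {
        host->port_initialized[i] = 1;
        done++;
    }
    return done;
}
```

      For a single-port host the second slot is never touched, which is the
      situation that used to end in the BUG_ON().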
      Acked-by: Jeff Garzik <jeff@garzik.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 30 Apr, 2007 35 commits