Commit fcb43042 authored by Zhang, Yanmin's avatar Zhang, Yanmin Committed by Ingo Molnar

x86: fix cpu hotplug crash

Vegard Nossum reported crashes during cpu hotplug tests:

  http://marc.info/?l=linux-kernel&m=121413950227884&w=4

In function _cpu_up, the panic happens when calling
__raw_notifier_call_chain at the second time. Kernel doesn't panic when
calling it at the first time. If just say because of nr_cpu_ids, that's
not right.

By checking the source code, I found that function do_boot_cpu is the culprit.
Consider below call chain:
 _cpu_up=>__cpu_up=>smp_ops.cpu_up=>native_cpu_up=>do_boot_cpu.

So do_boot_cpu is called in the end. In do_boot_cpu, if
boot_error==true, cpu_clear(cpu, cpu_possible_map) is executed. So later
on, when _cpu_up calls __raw_notifier_call_chain at the second time to
report CPU_UP_CANCELED, because this cpu is already cleared from
cpu_possible_map, get_cpu_sysdev returns NULL.

Many resources are related to cpu_possible_map, so it's better not to
change it.

Below patch against 2.6.26-rc7 fixes it by removing the bit clearing in
cpu_possible_map.
Signed-off-by: default avatarZhang Yanmin <yanmin_zhang@linux.intel.com>
Tested-by: default avatarVegard Nossum <vegard.nossum@gmail.com>
Acked-by: default avatarRusty Russell <rusty@rustcorp.com.au>
Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
parent 0b1faeef
......@@ -996,7 +996,6 @@ do_rest:
#endif
cpu_clear(cpu, cpu_callout_map); /* was set by do_boot_cpu() */
cpu_clear(cpu, cpu_initialized); /* was set by cpu_init() */
cpu_clear(cpu, cpu_possible_map);
cpu_clear(cpu, cpu_present_map);
per_cpu(x86_cpu_to_apicid, cpu) = BAD_APICID;
}
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment