Commits · edcafe3c5a06f46407c3f60145a36f269e56ff7f · linux / linux-davinci

01 Mar, 2010 40 commits

KVM: VMX: Give the guest ownership of cr0.ts when the fpu is active · edcafe3c

Avi Kivity authored Dec 30, 2009

If the guest fpu is loaded, there is nothing interesting about cr0.ts; let
the guest play with it as it will.  This makes context switches between fpu
intensive guest processes faster, as we won't trap the clts and cr0 write
instructions.

[marcelo: fix cr0 read shadow update on fpu deactivation; kills F8 install]
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

edcafe3c

KVM: Lazify fpu activation and deactivation · 02daab21

Avi Kivity authored Dec 30, 2009

Defer fpu deactivation as much as possible - if the guest fpu is loaded, keep
it loaded until the next heavyweight exit (where we are forced to unload it).
This reduces unnecessary exits.

We also defer fpu activation on clts; while clts signals the intent to use the
fpu, we can't be sure the guest will actually use it.
Signed-off-by: Avi Kivity <avi@redhat.com>

02daab21

KVM: VMX: Allow the guest to own some cr0 bits · e8467fda

Avi Kivity authored Dec 29, 2009

We will use this later to give the guest ownership of cr0.ts.
Signed-off-by: Avi Kivity <avi@redhat.com>

e8467fda

KVM: Replace read accesses of vcpu->arch.cr0 by an accessor · 4d4ec087

Avi Kivity authored Dec 29, 2009

Since we'd like to allow the guest to own a few bits of cr0 at times, we need
to know when we access those bits.
Signed-off-by: Avi Kivity <avi@redhat.com>

4d4ec087
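
As a rough illustration of why reads of cr0 are funnelled through an accessor once the guest may own some bits, here is a minimal sketch. The struct, the field names and the `toy_` helpers are assumptions made for this example and are not the kernel's actual definitions; only the position of CR0.TS (bit 3) is architectural.

```c
#include <stdint.h>

#define X86_CR0_TS (1UL << 3) /* architectural: CR0.TS is bit 3 */

/* Hypothetical, simplified vcpu state; not the kernel's struct kvm_vcpu. */
struct toy_vcpu {
    unsigned long cr0;                  /* cached copy of the guest's cr0 */
    unsigned long cr0_guest_owned_bits; /* bits the guest may change without an exit */
    unsigned long hw_cr0;               /* stands in for the value held by hardware */
};

/* Refresh the guest-owned bits of the cached value from "hardware". */
static inline void toy_decache_cr0_guest_bits(struct toy_vcpu *vcpu)
{
    unsigned long owned = vcpu->cr0_guest_owned_bits;

    vcpu->cr0 = (vcpu->cr0 & ~owned) | (vcpu->hw_cr0 & owned);
}

/* Read selected cr0 bits; the cache is refreshed only if an owned bit is requested. */
static inline unsigned long toy_read_cr0_bits(struct toy_vcpu *vcpu,
                                              unsigned long mask)
{
    if (mask & vcpu->cr0_guest_owned_bits)
        toy_decache_cr0_guest_bits(vcpu);
    return vcpu->cr0 & mask;
}
```

Because every reader goes through one function, the cached copy can be allowed to go stale for guest-owned bits and is only refreshed when one of those bits is actually requested.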

KVM: VMX: trace clts and lmsw instructions as cr accesses · a1f83a74

Avi Kivity authored Dec 29, 2009

clts writes cr0.ts; lmsw writes cr0[0:15] - record that in ftrace.
Signed-off-by: Avi Kivity <avi@redhat.com>

a1f83a74

KVM: PPC: Make large pages work · 4b5c9b7f

Alexander Graf authored Jan 10, 2010

An SLB entry contains two pieces of information related to size:

  1) PTE size
  2) SLB size

The L bit defines the PTE to be "large" (usually meaning 16MB), while
SLB_VSID_B_1T defines that the SLB entry should span 1 TB instead of the
default 256MB.

Apparently I messed things up and just put those two in one box,
shook it heavily and came up with the current code which handles
large pages incorrectly, because it also treats large page SLB entries
as "1TB" segment entries.

This patch splits those two features apart, making Linux guests boot
even when they have > 256MB.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

4b5c9b7f
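
The split described in 4b5c9b7f can be pictured as decoding two independent flags from one SLB entry word. The sketch below is only an illustration: the `TOY_` flag values, the struct and the decoder are placeholders invented for this example, not the definitions used by the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder flag values chosen for this sketch only. */
#define TOY_SLB_L    (1ULL << 8)  /* PTEs in this segment are large */
#define TOY_SLB_B_1T (1ULL << 62) /* segment spans 1TB instead of 256MB */

/* Hypothetical decoded view of an SLB entry. */
struct toy_slbe {
    bool large; /* PTE size: large page */
    bool tb;    /* segment size: 1TB segment */
};

/* Decode the two size properties separately; neither implies the other. */
static inline struct toy_slbe toy_decode_slbe(uint64_t vsid_word)
{
    struct toy_slbe e;

    e.large = (vsid_word & TOY_SLB_L) != 0;
    e.tb = (vsid_word & TOY_SLB_B_1T) != 0;
    return e;
}
```

With the two properties kept apart, an entry with only the large-page flag set is no longer treated as a 1TB segment.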

KVM: PPC: Pass through program interrupts · 5f2b105a

Alexander Graf authored Jan 10, 2010

When we get a program interrupt in guest kernel mode, we try to emulate the
instruction.

If that fails, we report to the user and try again - at the exact same
instruction pointer. So if the guest kernel really does trigger an invalid
instruction, we loop forever.

So let's instead forward program exceptions to the guest when we don't
know the instruction we're supposed to emulate.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

5f2b105a

KVM: PPC: Pass program interrupt flags to the guest · ff1ca3f9

Alexander Graf authored Jan 08, 2010

When we need to reinject a program interrupt into the guest, we also need to
reinject the corresponding flags into the guest.
Signed-off-by: Alexander Graf <agraf@suse.de>
Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Avi Kivity <avi@redhat.com>

ff1ca3f9

KVM: PPC: Fix HID5 setting code · d35feb26

Alexander Graf authored Jan 08, 2010

The code to unset HID5.dcbz32 is broken.
This patch makes it do the right rotate magic.
Signed-off-by: Alexander Graf <agraf@suse.de>
Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Avi Kivity <avi@redhat.com>

d35feb26

KVM: PPC: Emulate trap SRR1 flags properly · 25a8a02d

Alexander Graf authored Jan 08, 2010

Book3S needs some flags in SRR1 to get to know details about an interrupt.

One such example is the trap instruction. It tells the guest kernel that
a program interrupt is due to a trap using a bit in SRR1.

This patch implements the above behavior, making WARN_ON behave like WARN_ON.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

25a8a02d

KVM: PPC: Call SLB patching code in interrupt safe manner · 021ec9c6

Alexander Graf authored Jan 08, 2010

Currently we're racy when doing the transition from IR=1 to IR=0, from
the module memory entry code to the real mode SLB switching code.

To work around that I took a look at the RTAS entry code which is faced
with a similar problem and did the same thing:

  A small helper in linear mapped memory that does mtmsr with IR=0 and
  then RFIs into the actual handler.

Thanks to that trick we can safely take page faults in the entry code
and only need to be really wary of what to do as of the SLB switching
part.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

021ec9c6

KVM: PPC: Get rid of unnecessary RFI · bc90923e

Alexander Graf authored Jan 08, 2010

Using an RFI in IR=1 is dangerous. We need to set two SRRs and then do an RFI
without getting interrupted at all, because every interrupt could potentially
overwrite the SRR values.

Fortunately, we don't need to RFI in at least this particular case of the code,
so we can just replace it with an mtmsr and b.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

bc90923e

KVM: PPC: Implement 'skip instruction' mode · b4433a7c

Alexander Graf authored Jan 08, 2010

To fetch the last instruction we were interrupted on, we enable DR in early
exit code, where we are still in a very transitional phase between guest
and host state.

Most of the time this seemed to work, but another CPU can easily flush our
TLB and HTAB which makes us go in the Linux page fault handler which totally
breaks because we still use the guest's SLB entries.

To work around that, let's introduce a second KVM guest mode that defines
that whenever we get a trap, we don't call the Linux handler or go into
the KVM exit code, but just jump over the faulting instruction.

That way a potentially bad lwz doesn't trigger any faults and we can later
on interpret the invalid instruction we fetched as "fetch didn't work".
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

b4433a7c

KVM: PPC: Use PACA backed shadow vcpu · 7e57cba0

Alexander Graf authored Jan 08, 2010

We're being horribly racy right now. All the entry and exit code hijacks
random fields from the PACA that could easily be used by different code in
case we get interrupted, for example by a #MC or even page fault.

After discussing this with Ben, we figured it's best to reserve some more
space in the PACA and just shove off some vcpu state to there.

That way we can drastically improve the readability of the code, make it
less racy and less complex.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

7e57cba0

KVM: PPC: Add helpers for CR, XER · 992b5b29

Alexander Graf authored Jan 08, 2010

We now have helpers for the GPRs, so let's also add some for CR and XER.

Having them in the PACA simplifies code a lot, as we don't need to care
about where to store CC or not to overflow any integers.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

992b5b29

KVM: PPC: Use accessor functions for GPR access · 8e5b26b5

Alexander Graf authored Jan 08, 2010

All code in PPC KVM currently accesses gprs in the vcpu struct directly.

While there's nothing wrong with that wrt the current way gprs are stored
and loaded, it doesn't suffice for the PACA acceleration that will follow
in this patchset.

So let's just create little wrapper inline functions that we call whenever
a GPR needs to be read from or written to. The compiled code shouldn't really
change at all for now.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

8e5b26b5
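
The wrapper approach in 8e5b26b5 might look roughly like the following. This is a sketch under assumptions: the struct layout and the `toy_` function names are hypothetical and only show the shape of such accessors, not the kernel's actual helpers.

```c
#include <stdint.h>

/* Hypothetical register file; the real vcpu struct holds far more state. */
struct toy_vcpu_regs {
    uint64_t gpr[32];
};

/* All reads go through here, so the backing storage can move later. */
static inline uint64_t toy_get_gpr(const struct toy_vcpu_regs *vcpu, int num)
{
    return vcpu->gpr[num];
}

/* All writes go through here for the same reason. */
static inline void toy_set_gpr(struct toy_vcpu_regs *vcpu, int num, uint64_t val)
{
    vcpu->gpr[num] = val;
}
```

Since the helpers are trivial inline functions, a compiler would typically generate the same code as a direct array access, which matches the commit's remark that the compiled code should not really change for now.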

KVM: Fix the explanation of write_emulated · 0d178975

Takuya Yoshikawa authored Jan 06, 2010

The explanation of write_emulated is confused with
that of read_emulated. This patch fixes it.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

0d178975

KVM: VMX: Enable EPT 1GB page support · 878403b7

Sheng Yang authored Jan 05, 2010

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

878403b7

KVM: x86: Rename gb_page_enable() to get_lpage_level() in kvm_x86_ops · 17cc3935

Sheng Yang authored Jan 05, 2010

Then the callback can provide the maximum supported large page level, which
is more flexible.

Also move the gb page support into x86_64-specific code.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

17cc3935

KVM: x86: Moving PT_*_LEVEL to mmu.h · c9c54174

Sheng Yang authored Jan 05, 2010

We can use them in x86.c and vmx.c now...
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

c9c54174

KVM: PPC: Enable lightweight exits again · 97c4cfbe

Alexander Graf authored Jan 04, 2010

The PowerPC C ABI defines that registers r14-r31 need to be preserved across
function calls. Since our exit handler is written in C, we can make use of that
and don't need to reload r14-r31 on every entry/exit cycle.

This technique is also used in the BookE code and is called "lightweight exits"
there. To follow the tradition, it's called the same in Book3S.

So far this optimization was disabled though, as the code did not do what it
was expected to do and failed to work.

This patch fixes and enables lightweight exits again.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

97c4cfbe

KVM: PPC: Fix typo in rebolting code · b480f780

Alexander Graf authored Jan 04, 2010

When we're loading bolted entries into the SLB again, we're checking if an
entry is in use and only slbmte it when it is.

Unfortunately, the check always goes to the skip label of the first entry,
resulting in an endless loop when it actually gets triggered.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

b480f780

KVM: avoid taking ioapic mutex for non-ioapic EOIs · 46a929bc

Avi Kivity authored Dec 28, 2009

When the guest acknowledges an interrupt, it sends an EOI message to the local
apic, which broadcasts it to the ioapic.  To handle the EOI, we need to take
the ioapic mutex.

On large guests, this causes a lot of contention on this mutex.  Since large
guests usually don't route interrupts via the ioapic (they use msi instead),
this is completely unnecessary.

Avoid taking the mutex by introducing a handled_vectors bitmap.  Before taking
the mutex, check if the ioapic was actually responsible for the acked vector.
If not, we can return early.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

46a929bc
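
The early-return check from 46a929bc can be sketched as a bitmap lookup performed before any locking. Everything named `toy_` or `TOY_` below is an assumption for illustration, and the mutex is replaced by a counter so the example stays self-contained.

```c
#include <limits.h>
#include <stdbool.h>

#define TOY_NR_VECTORS    256
#define TOY_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define TOY_BITMAP_LONGS  (TOY_NR_VECTORS / TOY_BITS_PER_LONG)

/* Hypothetical ioapic state; lock_acquisitions stands in for a real mutex. */
struct toy_ioapic {
    unsigned long handled_vectors[TOY_BITMAP_LONGS];
    int lock_acquisitions;
};

/* Record that the ioapic routes this vector. */
static inline void toy_set_handled(struct toy_ioapic *io, unsigned int vec)
{
    io->handled_vectors[vec / TOY_BITS_PER_LONG] |= 1UL << (vec % TOY_BITS_PER_LONG);
}

static inline bool toy_test_handled(const struct toy_ioapic *io, unsigned int vec)
{
    return (io->handled_vectors[vec / TOY_BITS_PER_LONG] >>
            (vec % TOY_BITS_PER_LONG)) & 1UL;
}

/* Returns true only if the "mutex" had to be taken for this EOI. */
static inline bool toy_eoi(struct toy_ioapic *io, unsigned int vec)
{
    if (!toy_test_handled(io, vec))
        return false;        /* not an ioapic vector: skip the lock entirely */
    io->lock_acquisitions++; /* placeholder for lock, handle EOI, unlock */
    return true;
}
```

For a guest that delivers interrupts via MSI, the bitmap stays mostly empty and nearly every EOI takes the cheap path.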

KVM: Fill out ftrace exit reason strings · f4c9e87c

Avi Kivity authored Dec 28, 2009

Some exit reasons missed their strings; fill out the table.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

f4c9e87c

KVM: Bump maximum vcpu count to 64 · 0680fe52

Avi Kivity authored Dec 27, 2009

With slots_lock converted to rcu, the entire kvm hotpath on modern processors
(with npt or ept) now scales beautifully.  Increase the maximum vcpu count to
64 to reflect this.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

0680fe52

KVM: convert slots_lock to a mutex · 79fac95e

Marcelo Tosatti authored Dec 23, 2009

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

79fac95e

KVM: switch vcpu context to use SRCU · f656ce01

Marcelo Tosatti authored Dec 23, 2009

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

f656ce01

KVM: convert io_bus to SRCU · e93f8a0f

Marcelo Tosatti authored Dec 23, 2009

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

e93f8a0f

KVM: x86: switch kvm_set_memory_alias to SRCU update · a983fb23

Marcelo Tosatti authored Dec 23, 2009

Using a similar two-step procedure as for memslots.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

a983fb23

KVM: use SRCU for dirty log · b050b015

Marcelo Tosatti authored Dec 23, 2009

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

b050b015

KVM: introduce kvm->srcu and convert kvm_set_memory_region to SRCU update · bc6678a3

Marcelo Tosatti authored Dec 23, 2009

Use two steps for memslot deletion: mark the slot invalid (which stops
instantiation of new shadow pages for that slot, but allows destruction),
then instantiate the new empty slot.

Also simplifies kvm_handle_hva locking.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

bc6678a3

KVM: use gfn_to_pfn_memslot in kvm_iommu_map_pages · 3ad26d81

Marcelo Tosatti authored Dec 23, 2009

So it's possible to iommu map a memslot before making it visible to
kvm.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

3ad26d81

KVM: introduce gfn_to_pfn_memslot · 506f0d6f

Marcelo Tosatti authored Dec 23, 2009

Which takes a memslot pointer instead of using kvm->memslots.

To be used by SRCU conversion later.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

506f0d6f

KVM: split kvm_arch_set_memory_region into prepare and commit · f7784b8e

Marcelo Tosatti authored Dec 23, 2009

Required for SRCU conversion later.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

f7784b8e

KVM: modify alias layout in x86's struct kvm_arch · fef9cce0

Marcelo Tosatti authored Dec 23, 2009

Have a pointer to an allocated region inside x86's kvm_arch.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

fef9cce0

KVM: modify memslots layout in struct kvm · 46a26bf5

Marcelo Tosatti authored Dec 23, 2009

Have a pointer to an allocated region inside struct kvm.

[alex: fix ppc book 3s]
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

46a26bf5

KVM: trivial document fixes · 2044892d

Wu Fengguang authored Dec 24, 2009

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

2044892d

KVM: powerpc: Change maintainer · ddf0289d

Alexander Graf authored Dec 20, 2009

Progress on KVM for Embedded PowerPC has stalled, but for Book3S there's quite
a lot of work to do and going on.

So in agreement with Hollis and Avi, we should switch maintainers for PowerPC.
Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Hollis Blanchard <hollis@penguinppc.org>
Signed-off-by: Avi Kivity <avi@redhat.com>

ddf0289d

KVM: powerpc: Remove AGGRESSIVE_DEC · 0bb1fb71

Alexander Graf authored Dec 21, 2009

Because we now emulate the DEC interrupt according to real life behavior,
there's no need to keep the AGGRESSIVE_DEC hack around.

Let's just remove it.
Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Hollis Blanchard <hollis@penguinppc.org>
Signed-off-by: Avi Kivity <avi@redhat.com>

0bb1fb71

KVM: powerpc: Improve DEC handling · 7706664d

Alexander Graf authored Dec 21, 2009

We treated the DEC interrupt like an edge based one. This is not true for
Book3s. The DEC keeps firing until mtdec is issued again and thus clears
the interrupt line.

So let's implement this logic in KVM too. This patch moves the line clearing
from the firing of the interrupt to the mtdec emulation.

This makes PPC64 guests work without AGGRESSIVE_DEC defined.
Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Hollis Blanchard <hollis@penguinppc.org>
Signed-off-by: Avi Kivity <avi@redhat.com>

7706664d
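
The level-style DEC behavior described in 7706664d can be modelled with a single flag that is set when the decrementer expires and cleared only by the mtdec emulation. The names below are hypothetical and the model ignores everything else about interrupt delivery; it is a sketch of the idea, not the kernel's code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the vcpu's decrementer interrupt state. */
struct toy_dec {
    bool line_asserted;
};

/* The decrementer expired: raise the interrupt line. */
static inline void toy_dec_expire(struct toy_dec *d)
{
    d->line_asserted = true;
}

/* Delivering the interrupt does not clear the line; it keeps firing. */
static inline bool toy_dec_deliver(const struct toy_dec *d)
{
    return d->line_asserted;
}

/* Only a guest write to DEC (mtdec) lowers the line again. */
static inline void toy_emulate_mtdec(struct toy_dec *d)
{
    d->line_asserted = false;
}
```

An edge-style model would instead clear the flag inside the delivery step, which is the behavior the commit moves away from.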