hvt: x86_64: Allow for up to 4GB guest size#577

Merged
dinosaure merged 3 commits into Solo5:master from reynir:4gb on Oct 10, 2024

Conversation

reynir (Contributor) commented May 7, 2024

The following is a commit by @mato. I am creating this draft PR to open it up for review and hopefully eventually have it merged. I may add comments or changes that I find helpful in understanding the code.


Allow for up to 4GB guest size on x86_64 by using up to four PDE entries.

TODO: Lightly tested only, not sure if this arrangement will conflict with any platform memory holes that "plain" KVM may map into guest memory.

Kensan (Contributor) commented May 7, 2024

I reviewed the construction of the new page table mappings and they look good to me.

While looking at the construction of hvt->mem, I noticed one assumption: that the underlying memory is zero-initialised.

  • For hvt_kvm.c and hvt_openbsd.c this is the case, since mmap with MAP_ANONYMOUS is guaranteed to return zeroed memory.
  • For hvt_freebsd.c, the mmap call uses only MAP_SHARED. Maybe somebody familiar with FreeBSD can comment on the guarantees given in this case (see the sketch below).
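
If the zero-fill guarantee turned out not to hold on some platform, it could be enforced explicitly after mapping. A minimal sketch, assuming mem_size as used in the tenders; the helper name and the explicit memset are illustrative, not what the tenders currently do:

    #include <string.h>
    #include <sys/mman.h>

    /* Map guest memory and zero it explicitly, rather than relying on the
     * platform's mmap() zero-fill semantics. Illustrative only. */
    static void *map_guest_memory(size_t mem_size)
    {
        void *mem = mmap(NULL, mem_size, PROT_READ | PROT_WRITE,
                         MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        if (mem == MAP_FAILED)
            return NULL;
        memset(mem, 0, mem_size);   /* defensive: force zeroed pages */
        return mem;
    }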

hannesm (Contributor) commented May 7, 2024

Interesting observation. I have used this patch for some unikernels on FreeBSD, and it worked nicely. But I don't quite understand the semantics, and the FreeBSD mmap(2) man page doesn't guarantee zeroed-out memory (not even for MAP_ANONYMOUS).

tenders/hvt/hvt_freebsd.c: hvt->mem = mmap(NULL, mem_size, PROT_READ | PROT_WRITE, MAP_SHARED, hvb->vmfd, 0);

tenders/hvt/hvt_kvm.c: hvt->mem = mmap(NULL, mem_size, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);

tenders/hvt/hvt_openbsd.c: p = mmap(NULL, vmr->vmr_size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0);

So: (a) why does FreeBSD use the vmfd (/dev/vmm/solo5-)? (b) Why does OpenBSD use MAP_PRIVATE (instead of MAP_SHARED)? (c) Is it worth unifying the flags across the platforms?

One PDE is 0x1000, but we use four.
reynir marked this pull request as ready for review on May 31, 2024.
reynir (Contributor, Author) commented May 31, 2024

I think this is ready for review. I added a minor comment on why X86_PDE_SIZE is now four times as large. I think there could be some explanation of how we build these tables, and maybe a reference to their documentation. Then again, maybe it is self-evident if you're already familiar with them. Maybe @Kensan has an opinion?

Another question is "why 4 GB?" -- and I think the answer is partly for simplicity, and maybe partly because this is the limit for aarch64 (for reasons I don't think apply to x86).
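
For concreteness on the size change: one page directory is one 4 KiB page (512 eight-byte entries), and the patch uses four of them. A minimal sketch of the arithmetic; the derivation of X86_PDE_SIZE below is an assumption based on the commit message, not the tender's actual definition:

    /* One page directory (PD) holds 512 eight-byte entries = 0x1000 bytes.
     * Four PDs of 2 MiB mappings cover 4 x 512 x 2 MiB = 4 GiB. */
    #define X86_PD_ENTRIES  512
    #define X86_PD_BYTES    (X86_PD_ENTRIES * 8)         /* 0x1000 */
    #define X86_NUM_PDS     4
    #define X86_PDE_SIZE    (X86_NUM_PDS * X86_PD_BYTES) /* 0x4000 */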

Kensan (Contributor) commented Jun 10, 2024

> Interesting observation. I have used this patch for some unikernels on FreeBSD, and it worked nicely. But I don't quite understand the semantics, and the FreeBSD mmap(2) man page doesn't guarantee zeroed-out memory (not even for MAP_ANONYMOUS).

According to David Chisnall, the current usage guarantees zeroed memory; see this thread.

Kensan (Contributor) commented Jun 10, 2024

> I think this is ready for review. I added a minor comment on why X86_PDE_SIZE is now four times as large. I think there could be some explanation of how we build these tables, and maybe a reference to their documentation. Then again, maybe it is self-evident if you're already familiar with them. Maybe @Kensan has an opinion?

The most helpful reference would be Intel SDM Volume 3A, section 4.5 "4-Level Paging and 5-Level Paging", and Figure 4-9, "Linear-Address Translation to a 2-MByte Page using 4-Level Paging". I think it is helpful to have this reference somewhere; if not in the code, then at least in one of the commit messages, but a comment would probably be preferable.
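
For readers without the SDM at hand, Figure 4-9 decomposes a linear address for a 2 MiB page as sketched below; the helper names are illustrative and not from the tender:

    #include <stdint.h>

    /* Linear-address decomposition for a 2 MiB page under 4-level paging
     * (Intel SDM Vol. 3A, Figure 4-9):
     *   bits 47:39  PML4 entry index
     *   bits 38:30  PDPT entry index
     *   bits 29:21  PD entry index (a PDE with PS=1 maps a 2 MiB page)
     *   bits 20:0   offset within the 2 MiB page
     */
    static inline unsigned pml4_index(uint64_t va) { return (va >> 39) & 0x1ff; }
    static inline unsigned pdpt_index(uint64_t va) { return (va >> 30) & 0x1ff; }
    static inline unsigned pd_index(uint64_t va)   { return (va >> 21) & 0x1ff; }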

> Another question is "why 4 GB?" -- and I think the answer is partly for simplicity, and maybe partly because this is the limit for aarch64 (for reasons I don't think apply to x86).

I do not know what the reasoning was, but probably that it is a nice round number and "should be enough for almost anyone" (tm). On x86_64 it is easy to add another 1 GB by simply adding one more fully populated Page Directory (PD).
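
To illustrate the layout under discussion: one PML4 entry points to a PDPT whose first four entries each point to a fully populated PD of 2 MiB mappings. A minimal sketch, not the tender's actual code; all names, the parameter layout, and the flag constants are assumptions:

    #include <stdint.h>

    #define PTE_PRESENT (1ULL << 0)
    #define PTE_RW      (1ULL << 1)
    #define PTE_PS      (1ULL << 7)   /* in a PDE: map a 2 MiB page */
    #define NUM_PDS     4             /* 4 x 512 x 2 MiB = 4 GiB */

    /* pml4, pdpt and pds point into zeroed guest memory; pdpt_gpa and
     * pds_gpa are the guest-physical addresses of those tables. */
    static void build_pagetables(uint64_t *pml4, uint64_t pdpt_gpa,
                                 uint64_t *pdpt, uint64_t pds_gpa,
                                 uint64_t (*pds)[512])
    {
        pml4[0] = pdpt_gpa | PTE_PRESENT | PTE_RW;
        for (int i = 0; i < NUM_PDS; i++) {
            pdpt[i] = (pds_gpa + i * 0x1000) | PTE_PRESENT | PTE_RW;
            for (int j = 0; j < 512; j++) {
                /* Identity-map guest-physical memory in 2 MiB steps. */
                uint64_t paddr = ((uint64_t)i * 512 + j) << 21;
                pds[i][j] = paddr | PTE_PRESENT | PTE_RW | PTE_PS;
            }
        }
    }

Sticking to 2 MiB pages across four PDs, rather than using 1 GiB PDPT pages, also keeps the mapping working on CPUs without 1 GiB-page support.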

Since there were no real code changes, my conclusion from the first review still stands and from my side I think it is looking good.

hannesm (Contributor) commented Oct 9, 2024

This PR still has a WIP prefix -- but I consider it done and ready to be merged. Any ideas what is missing?

hannesm (Contributor) left a comment:

looks fine

reynir changed the title from "WIP: hvt: x86_64: Allow for up to 4GB guest size" to "hvt: x86_64: Allow for up to 4GB guest size" on Oct 9, 2024.
reynir (Contributor, Author) commented Oct 9, 2024

I marked it as ready for review (non-WIP) a while ago, but I didn't realize it still had WIP in the title.

The only thing maybe missing would be some comments or documentation on how to build the page table mappings.

Co-authored-by: Adrian-Ken Rueegsegger <[email protected]>
reynir (Contributor, Author) commented Oct 9, 2024

> I think this is ready for review. I added a minor comment on why X86_PDE_SIZE is now four times as large. I think there could be some explanation of how we build these tables, and maybe a reference to their documentation. Then again, maybe it is self-evident if you're already familiar with them. Maybe @Kensan has an opinion?

> The most helpful reference would be Intel SDM Volume 3A, section 4.5 "4-Level Paging and 5-Level Paging", and Figure 4-9, "Linear-Address Translation to a 2-MByte Page using 4-Level Paging". I think it is helpful to have this reference somewhere; if not in the code, then at least in one of the commit messages, but a comment would probably be preferable.

I added the references (without looking them up myself! :/) as you wrote them.

Kensan (Contributor) commented Oct 9, 2024

I just checked the references against the latest Intel SDM Volume 3A, from June 2024 (Order Number: 253668-084US). They are still correct.

dinosaure (Collaborator) commented:

Thanks!

dinosaure merged commit 978fb95 into Solo5:master on Oct 10, 2024.
reynir deleted the 4gb branch on October 11, 2024.