hvt: x86_64: Allow for up to 4GB guest size#577

Merged
dinosaure merged 3 commits into Solo5:master from reynir:4gb on Oct 10, 2024

Conversation

reynir (Contributor) commented May 7, 2024

The following is a commit by @mato. I am creating this draft PR to open it up for review and hopefully eventually have it merged. I may add comments or changes that I find helpful in understanding the code.


Allow for up to 4GB guest size on x86_64 by using up to four PDE entries.

TODO: Lightly tested only, not sure if this arrangement will conflict with any platform memory holes that "plain" KVM may map into guest memory.

Kensan (Contributor) commented May 7, 2024

I reviewed the construction of the new page table mappings and they look good to me.

While looking at the construction of hvt->mem, I noticed one assumption: that the underlying memory is zero-initialised.

  • For hvt_kvm.c and hvt_openbsd.c this is the case, since mmap with MAP_ANONYMOUS is guaranteed to return zeroed memory.
  • For hvt_freebsd.c, the mmap call uses only MAP_SHARED. Maybe somebody familiar with FreeBSD can comment on the guarantees given in this case (see the sketch below).
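
If the zero-fill guarantee turned out not to hold on some platform, it could be enforced explicitly after mapping. A minimal sketch, assuming mem_size as used in the tenders; the helper name and the explicit memset are illustrative, not what the tenders currently do:

    #include <string.h>
    #include <sys/mman.h>

    /* Map guest memory and zero it explicitly, rather than relying on the
     * platform's mmap() zero-fill semantics. Illustrative only. */
    static void *map_guest_memory(size_t mem_size)
    {
        void *mem = mmap(NULL, mem_size, PROT_READ | PROT_WRITE,
                         MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        if (mem == MAP_FAILED)
            return NULL;
        memset(mem, 0, mem_size);   /* defensive: force zeroed pages */
        return mem;
    }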

hannesm (Contributor) commented May 7, 2024

Interesting observation. I have used this patch for some unikernels on FreeBSD, and it worked nicely. But I don't quite understand the semantics, and the FreeBSD mmap(2) man page doesn't guarantee zeroed-out memory (not even for MAP_ANONYMOUS).

tenders/hvt/hvt_freebsd.c: hvt->mem = mmap(NULL, mem_size, PROT_READ | PROT_WRITE, MAP_SHARED, hvb->vmfd, 0);

tenders/hvt/hvt_kvm.c: hvt->mem = mmap(NULL, mem_size, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);

tenders/hvt/hvt_openbsd.c: p = mmap(NULL, vmr->vmr_size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0);

So: (a) why does FreeBSD use the vmfd (/dev/vmm/solo5-)? (b) Why does OpenBSD use MAP_PRIVATE (instead of MAP_SHARED)? (c) Is it worth unifying the flags across the platforms?

One PDE is 0x1000, but we use four.
reynir marked this pull request as ready for review on May 31, 2024.
reynir (Contributor, Author) commented May 31, 2024

I think this is ready for review. I added a minor comment on why X86_PDE_SIZE is now four times as large. I think there could be some explanation of how we build these tables, and maybe a reference to their documentation. Then again, maybe it is self-evident if you're already familiar with them. Maybe @Kensan has an opinion?

Another question is "why 4 GB?" -- and I think the answer is partly for simplicity, and maybe partly because this is the limit for aarch64 (for reasons I don't think apply to x86).
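
For concreteness on the size change: one page directory is one 4 KiB page (512 eight-byte entries), and the patch uses four of them. A minimal sketch of the arithmetic; the derivation of X86_PDE_SIZE below is an assumption based on the commit message, not the tender's actual definition:

    /* One page directory (PD) holds 512 eight-byte entries = 0x1000 bytes.
     * Four PDs of 2 MiB mappings cover 4 x 512 x 2 MiB = 4 GiB. */
    #define X86_PD_ENTRIES  512
    #define X86_PD_BYTES    (X86_PD_ENTRIES * 8)         /* 0x1000 */
    #define X86_NUM_PDS     4
    #define X86_PDE_SIZE    (X86_NUM_PDS * X86_PD_BYTES) /* 0x4000 */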

Kensan (Contributor) commented Jun 10, 2024

> Interesting observation. I have used this patch for some unikernels on FreeBSD, and it worked nicely. But I don't quite understand the semantics, and the FreeBSD mmap(2) man page doesn't guarantee zeroed-out memory (not even for MAP_ANONYMOUS).

According to David Chisnall, the current usage guarantees zeroed memory; see this thread.

Kensan (Contributor) commented Jun 10, 2024

> I think this is ready for review. I added a minor comment on why X86_PDE_SIZE is now four times as large. I think there could be some explanation of how we build these tables, and maybe a reference to their documentation. Then again, maybe it is self-evident if you're already familiar with them. Maybe @Kensan has an opinion?

The most helpful reference would be Intel SDM Volume 3A, section 4.5 "4-Level Paging and 5-Level Paging", and Figure 4-9, "Linear-Address Translation to a 2-MByte Page using 4-Level Paging". I think it is helpful to have this reference somewhere; if not in the code, then at least in one of the commit messages, but a comment would probably be preferable.
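
For readers without the SDM at hand, Figure 4-9 decomposes a linear address for a 2 MiB page as sketched below; the helper names are illustrative and not from the tender:

    #include <stdint.h>

    /* Linear-address decomposition for a 2 MiB page under 4-level paging
     * (Intel SDM Vol. 3A, Figure 4-9):
     *   bits 47:39  PML4 entry index
     *   bits 38:30  PDPT entry index
     *   bits 29:21  PD entry index (a PDE with PS=1 maps a 2 MiB page)
     *   bits 20:0   offset within the 2 MiB page
     */
    static inline unsigned pml4_index(uint64_t va) { return (va >> 39) & 0x1ff; }
    static inline unsigned pdpt_index(uint64_t va) { return (va >> 30) & 0x1ff; }
    static inline unsigned pd_index(uint64_t va)   { return (va >> 21) & 0x1ff; }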

> Another question is "why 4 GB?" -- and I think the answer is partly for simplicity, and maybe partly because this is the limit for aarch64 (for reasons I don't think apply to x86).

I do not know what the reasoning was, but probably that it is a nice round number and "should be enough for almost anyone" (tm). On x86_64 it is easy to add another 1 GB by simply adding one more fully populated Page Directory (PD).
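
To illustrate the layout under discussion: one PML4 entry points to a PDPT whose first four entries each point to a fully populated PD of 2 MiB mappings. A minimal sketch, not the tender's actual code; all names, the parameter layout, and the flag constants are assumptions:

    #include <stdint.h>

    #define PTE_PRESENT (1ULL << 0)
    #define PTE_RW      (1ULL << 1)
    #define PTE_PS      (1ULL << 7)   /* in a PDE: map a 2 MiB page */
    #define NUM_PDS     4             /* 4 x 512 x 2 MiB = 4 GiB */

    /* pml4, pdpt and pds point into zeroed guest memory; pdpt_gpa and
     * pds_gpa are the guest-physical addresses of those tables. */
    static void build_pagetables(uint64_t *pml4, uint64_t pdpt_gpa,
                                 uint64_t *pdpt, uint64_t pds_gpa,
                                 uint64_t (*pds)[512])
    {
        pml4[0] = pdpt_gpa | PTE_PRESENT | PTE_RW;
        for (int i = 0; i < NUM_PDS; i++) {
            pdpt[i] = (pds_gpa + i * 0x1000) | PTE_PRESENT | PTE_RW;
            for (int j = 0; j < 512; j++) {
                /* Identity-map guest-physical memory in 2 MiB steps. */
                uint64_t paddr = ((uint64_t)i * 512 + j) << 21;
                pds[i][j] = paddr | PTE_PRESENT | PTE_RW | PTE_PS;
            }
        }
    }

Sticking to 2 MiB pages across four PDs, rather than using 1 GiB PDPT pages, also keeps the mapping working on CPUs without 1 GiB-page support.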

Since there were no real code changes, my conclusion from the first review still stands and from my side I think it is looking good.

hannesm (Contributor) commented Oct 9, 2024

This PR still has a WIP prefix -- but I consider it done and ready to be merged. Any ideas what is missing?

hannesm (Contributor) left a comment:

looks fine

reynir changed the title from "WIP: hvt: x86_64: Allow for up to 4GB guest size" to "hvt: x86_64: Allow for up to 4GB guest size" on Oct 9, 2024.
reynir (Contributor, Author) commented Oct 9, 2024

I marked it as ready for review (non-WIP) a while ago, but I didn't realize it still had WIP in the title.

The only thing maybe missing would be some comments or documentation on how to build the page table mappings.

Co-authored-by: Adrian-Ken Rueegsegger <[email protected]>
reynir (Contributor, Author) commented Oct 9, 2024

> I think this is ready for review. I added a minor comment on why X86_PDE_SIZE is now four times as large. I think there could be some explanation of how we build these tables, and maybe a reference to their documentation. Then again, maybe it is self-evident if you're already familiar with them. Maybe @Kensan has an opinion?

> The most helpful reference would be Intel SDM Volume 3A, section 4.5 "4-Level Paging and 5-Level Paging", and Figure 4-9, "Linear-Address Translation to a 2-MByte Page using 4-Level Paging". I think it is helpful to have this reference somewhere; if not in the code, then at least in one of the commit messages, but a comment would probably be preferable.

I added the references (without looking them up myself! :/) as you wrote them.

Kensan (Contributor) commented Oct 9, 2024

I just checked the references against the latest Intel SDM Volume 3A, from June 2024 (Order Number: 253668-084US). They are still correct.

dinosaure (Collaborator) commented:

Thanks!

dinosaure merged commit 978fb95 into Solo5:master on Oct 10, 2024.
reynir deleted the 4gb branch on October 11, 2024.