Conversation
```diff
 }
-#[derive(Clone, Debug)]
+#[derive(Clone, Debug, Default)]
```
SyncPoint being default helps a lot with the ergonomics, nice!
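A minimal sketch of what the `Default` derive buys here, assuming the usual `Context::submit` / `Context::wait_for` flow; the `FrameSlot` struct and its field names are made up for illustration:

```rust
use blade_graphics as gpu;

// Hypothetical per-frame state; with `SyncPoint: Default` it can hold a plain
// `SyncPoint` instead of an `Option<SyncPoint>`.
#[derive(Default)]
struct FrameSlot {
    last_submit: gpu::SyncPoint,
}

impl FrameSlot {
    fn begin(&self, context: &gpu::Context) {
        // Waiting on a default (never-submitted) sync point is assumed to
        // return immediately, so no `Option` dance is needed on frame 0.
        context.wait_for(&self.last_submit, !0);
    }
    fn end(&mut self, context: &gpu::Context, encoder: &mut gpu::CommandEncoder) {
        self.last_submit = context.submit(encoder);
    }
}
```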
Nice! I want to put this into the engine and see what I can do. I might need to have several encoders per frame now, which is fun!
Thanks for implementing it so quickly :)
The ping-pong raytrace fits perfectly into this API.
The other use case I have is that one initial encoder enqueues some work, then we fire off the main render encoder task and some async compute encoder tasks that depend on that initial work being done. With this API, the async queue submission and the main queue submission can both take in the initial encoder's sync point, and the initial encoder's submit can take in the previous async compute and main render sync points. I think this works too!
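Roughly, that submission graph could look like the sketch below. The `submit_after` helper name is made up here; the real call is this PR's `submit` with its non-empty sync-point list, whose exact signature is not reproduced:

```rust
use blade_graphics as gpu;

// Hypothetical `submit_after(encoder, dependencies)` stands in for the PR's
// dependency-aware submit; which queue each encoder targets is also assumed.
fn frame(
    context: &gpu::Context,
    init: &mut gpu::CommandEncoder,
    main_render: &mut gpu::CommandEncoder,
    async_compute: &mut gpu::CommandEncoder,
    prev_main: gpu::SyncPoint,
    prev_compute: gpu::SyncPoint,
) -> (gpu::SyncPoint, gpu::SyncPoint) {
    // The initial work waits on last frame's main render and async compute.
    let init_sp = context.submit_after(init, &[prev_main, prev_compute]);
    // Main render and async compute both depend only on the initial work,
    // so they are free to overlap on separate queues.
    let main_sp = context.submit_after(main_render, &[init_sp.clone()]);
    let compute_sp = context.submit_after(async_compute, &[init_sp]);
    (main_sp, compute_sp)
}
```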
```diff
@@ -55,14 +54,14 @@ impl FramePacer {
     }

     pub fn end_frame(&mut self, context: &blade_graphics::Context) -> &blade_graphics::SyncPoint {
```
Probably not useful for you, but we actually use a version of this frame pacer and temporary resources concept in our engine too :) Though I added support for all kinds of resources as well as more than one frame in flight. So I will need to add the `after: &[SyncPoint]` here, or somehow integrate it into the pacer.
```rust
/// Enable multi-queue support (async compute and transfer).
/// When enabled, every `submit` call must provide explicit
/// synchronization via a non-empty list of sync points.
pub multi_queue: bool,
```
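For reference, opting in would look something like the sketch below, assuming the new flag lives on `ContextDesc` and that the descriptor implements `Default`; all other fields are left at their defaults:

```rust
use blade_graphics as gpu;

// Assumed: `multi_queue` is a field of `gpu::ContextDesc` and the descriptor
// derives `Default`, so everything else can stay at its default value.
fn multi_queue_desc() -> gpu::ContextDesc {
    gpu::ContextDesc {
        multi_queue: true,
        ..Default::default()
    }
}
```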
Now we can request `multi_queue`, but it is difficult to inspect the context we get back to determine whether the internal async compute queue is truly a unique queue or just the same as the main one.
In the game, if we have true async compute we will want to do our resource "ping-ponging", but if we don't, it would be nice to only allocate one set of probe data resources and render to and sample the same set all the time.
So maybe it would be nice to be able to query somehow, after Context creation, whether they are really all the same queue under the hood, maybe `Context::get_selected_queue_id(&self, kind: QueueType) -> u32`, which would let us know if they are just the same..
Or, `Context::enumerate` could report more details about the queues in the `DeviceInfo` as well.
But maybe I am worrying about something that isn't really an issue and this is too niche to expose.. It's not the end of the world if we have unnecessary probe data.
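A rough sketch of how the game side could branch on such a query if it existed; `QueueType` and `get_selected_queue_id` are only the API proposed above, not anything that exists in blade today, and the probe-resource type is a placeholder:

```rust
use blade_graphics as gpu;

// Placeholder for whatever probe data the game allocates per queue.
struct ProbeResources;

// Hypothetical: uses the proposed `get_selected_queue_id(kind) -> u32` and a
// made-up `QueueType` enum to decide how many probe sets to allocate.
fn allocate_probes(context: &gpu::Context) -> Vec<ProbeResources> {
    let main = context.get_selected_queue_id(gpu::QueueType::Main);
    let compute = context.get_selected_queue_id(gpu::QueueType::AsyncCompute);
    if main == compute {
        // Same queue under the hood: one set is enough, no ping-ponging.
        vec![ProbeResources]
    } else {
        // Truly async compute: ping-pong between two sets.
        vec![ProbeResources, ProbeResources]
    }
}
```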
I wanted to add this to Capabilities at some point but then it slipped. I'll add it.
I just wanted to demonstrate in one of the examples how to specify dependencies. But you are totally right to expect the actual parallelism in the particle example. I'll look into it more.

@EriKWDev please take another look.
Nice to get access to what queues are available! I'm afraid we did the same thing with the example xD But nice, now the example contains the parallelism.

The problem I tried to point out with the measurements was that my gains were very inconsistent frame to frame, and the best performance I got was when the compute task overlapped with the vertex shading in this particle example. But with this API I don't have the granularity to specify that it should specifically run alongside the vertex work and not the fragment work. In the game, however, multiple encoders should be enough. We probably want the raytracing compute to always overlap with the shadowmap generation and reflection pass, and I think I can achieve that with multiple encoders and correct sync points.

My experiment also seemed to suggest that removing the barriers internal to blade had about the same effect as the async compute for this workload, but having manual barriers is perhaps undesirable. Though, I could see a way for blade to expose it.

Also, for correctness of the simulation, I believe there has to be a buffer copy as well, so that each compute task works on the resulting data of the most recent one:

```rust
if let mut transfer = self.compute_encoder.transfer("copy") {
    let prev_compute_idx = draw_idx; // only one in flight atm
    let a = &self.particle_systems[prev_compute_idx];
    let b = &self.particle_systems[compute_idx];
    // NOTE: an indirect_dispatch or at least just a compute pass could copy only the alive particles..
    transfer.copy_buffer_to_buffer(
        a.particle_buf.at(0),
        b.particle_buf.at(0),
        a.particle_buf.size(),
    );
    transfer.copy_buffer_to_buffer(
        a.free_list_buf.at(0),
        b.free_list_buf.at(0),
        a.free_list_buf.size(),
    );
}
```
Yeah, I'm not sure how I feel about skipping the barriers between passes. Something like this should be very explicit about what you are expecting to run at the same time. I'll think about it.

Yeah, the barriers were just an experiment on my part and not really related, though I did open #343 since we have a perhaps much more motivating example for custom barriers, or just a single little pass without an automatic barrier.

Interesting. So in this case multi-queue would not help you. I'm actually surprised you are seeing this much cost for the barriers.
Closes #329