Most command buffers here are small (fewer than 256 words), so
dynamically allocating memory for them is wasted time when they could
easily fit on the stack.

Use an on-stack command buffer whenever the size is small enough. This
avoids a dynamic allocation for most submissions and reduces GPU command
submission latency.
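
The pattern described above can be sketched in userspace C as follows.
This is only an illustration, not the code touched by this patch:
submit_cmds(), consume_cmds(), STACK_CMD_WORDS, the used_heap flag, and
the assumption that a word is 32 bits wide are all hypothetical, and
malloc()/free() stand in for whatever allocator the driver uses.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed threshold: buffers up to this many words live on the stack. */
#define STACK_CMD_WORDS 256

/* Hypothetical stand-in for handing the buffer to the GPU: sums the words. */
static uint32_t consume_cmds(const uint32_t *cmds, size_t nwords)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < nwords; i++)
		sum += cmds[i];
	return sum;
}

/*
 * Copy nwords command words into a working buffer and consume them.
 * Returns 0 on success, -1 if the heap allocation fails.
 * *used_heap reports which path was taken (illustration only).
 */
static int submit_cmds(const uint32_t *src, size_t nwords,
		       uint32_t *result, int *used_heap)
{
	uint32_t stack_buf[STACK_CMD_WORDS];
	uint32_t *buf = stack_buf;

	*used_heap = 0;
	if (nwords > STACK_CMD_WORDS) {
		/* Too large for the stack buffer: fall back to the heap. */
		buf = malloc(nwords * sizeof(*buf));
		if (!buf)
			return -1;
		*used_heap = 1;
	}

	memcpy(buf, src, nwords * sizeof(*buf));
	*result = consume_cmds(buf, nwords);

	/* Only free what was actually allocated. */
	if (buf != stack_buf)
		free(buf);
	return 0;
}
```

With 32-bit words, the on-stack array in this sketch costs 1 KiB of
stack per call, which is the trade-off to weigh when picking the
threshold.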
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: Danny Lin <danny@kdrag0n.dev>