I have no experience with CUDA, so my comments on this might be totally wrong. But I do know the Cura code, as I wrote most of it.
Most processes in Cura are serial, not parallel. For example, the translation to GCode depends on the previous GCode piece: start positions and printing order are determined during the GCode export step. And while some of this could run in threads with input/output, it's not the massively parallel workload you need for CUDA, as far as I know.
What could potentially be done is looking at 2 steps: the first is "generating insets" and the second is "generating up/down skin". These steps are done per layer, and no layer needs the result of that step from any other layer. So they could be run in parallel, with each worker doing 1 layer at a time.
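To illustrate the per-layer idea: a minimal sketch in Python, assuming made-up layer data and a placeholder `generate_insets` function (this is not actual Cura/CuraEngine code, just the shape of the parallelism).

```python
# Hypothetical sketch: run the "generate insets" step per layer in parallel,
# since no layer needs another layer's result for this step.
# The data structures and worker function are stand-ins, not Cura APIs.
from concurrent.futures import ThreadPoolExecutor

def generate_insets(layer):
    # placeholder for the real work of offsetting the outline polygons inward
    return {"layer": layer["index"], "inset_count": len(layer["outline"])}

layers = [{"index": i, "outline": [(0, 0), (10, 0), (10, 10), (0, 10)]}
          for i in range(4)]

with ThreadPoolExecutor() as pool:
    # map() preserves input order, so results line up with the layers
    results = list(pool.map(generate_insets, layers))
```

The same structure would apply to the up/down skin step; whether it pays off on a GPU rather than CPU threads is exactly the bandwidth question below.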
However, I think the memory bandwidth to the GPU is quickly going to be the bottleneck instead of the actual processing.
(And the support material generation code could most likely run on the GPU, but that code sucks big balls anyhow, as it is using the wrong method)
Hmm, I guess it might be worth trying for my purposes. It might not offer a real speedup in all cases, but in a few cases parallelism could be exploited. I think I'll go for it. What is the purpose of generating insets and up/down skins?
A related thought, though I don't know what I am talking about yet. If you generated move codes for a polygon, couldn't you generate a G-code for each line in the polygon in parallel, then sort them based on endpoints so that the proper order of moves is found? Say we have a square: I could generate the G-code for each line of the square, then order the G-code moves (basically build a "linked list") based on their end and start points; then each polygon in the layer could be ordered again to get the proper movement of the head for that layer?
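The chaining idea above can be sketched as follows: emit one move per polygon edge (the part that could be generated in parallel), then stitch the moves into printing order by matching each move's end point to the next move's start point. The function names and G-code formatting are illustrative assumptions, not anything from the Cura codebase.

```python
# Hypothetical sketch of endpoint-chaining: one G1 move per polygon edge,
# then order the moves by walking end point -> matching start point.
def segment_moves(polygon):
    # one (start, end, gcode) triple per edge; generation order is irrelevant
    moves = []
    for i, start in enumerate(polygon):
        end = polygon[(i + 1) % len(polygon)]
        moves.append((start, end, f"G1 X{end[0]} Y{end[1]}"))
    return moves

def chain(moves):
    # build a start-point -> move lookup, then walk it like a linked list
    by_start = {m[0]: m for m in moves}
    ordered, current = [], moves[0]
    for _ in moves:
        ordered.append(current[2])
        current = by_start[current[1]]  # next move starts where this one ends
    return ordered

square = [(0, 0), (10, 0), (10, 10), (0, 10)]
shuffled = segment_moves(square)[::-1]  # simulate out-of-order generation
ordered = chain(shuffled)
```

Note this assumes each vertex is shared by exactly two edges of a closed polygon; real toolpaths also need a travel move between polygons, which is where the per-layer ordering you mention would come in.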