I have no experience with CUDA, so my comments on this might be totally wrong. But I do know the Cura code, as I wrote most of it.
Most processes in Cura are serial, not parallel. For example, the translation to GCode depends on the previous GCode piece: start positions and printing order are determined during the GCode export step. And while some of this could run in threads with input/output, it's not the massively parallel workload you need for CUDA, as far as I know.
What could potentially be done is looking at 2 steps: the first is "generating insets" and the second is "generating up/down skin". These steps are done per layer, and no layer needs the result of that step from any other layer. So they could be run in parallel, with each worker doing 1 layer at a time.
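To illustrate the per-layer idea: a minimal sketch in Python, assuming made-up layer data and a placeholder `generate_insets` function (this is not actual Cura/CuraEngine code, just the shape of the parallelism).

```python
# Hypothetical sketch: run the "generate insets" step per layer in parallel,
# since no layer needs another layer's result for this step.
# The data structures and worker function are stand-ins, not Cura APIs.
from concurrent.futures import ThreadPoolExecutor

def generate_insets(layer):
    # placeholder for the real work of offsetting the outline polygons inward
    return {"layer": layer["index"], "inset_count": len(layer["outline"])}

layers = [{"index": i, "outline": [(0, 0), (10, 0), (10, 10), (0, 10)]}
          for i in range(4)]

with ThreadPoolExecutor() as pool:
    # map() preserves input order, so results line up with the layers
    results = list(pool.map(generate_insets, layers))
```

The same structure would apply to the up/down skin step; whether it pays off on a GPU rather than CPU threads is exactly the bandwidth question below.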
However, I think the memory bandwidth to the GPU is quickly going to be the bottleneck instead of the actual processing.
(And the support material generation code could most likely run on the GPU, but that code sucks big balls anyhow, as it is using the wrong method)
Hmm, I guess it might be worth trying for my purposes. It might not offer a real speedup in all cases, but in a few cases parallelism could be exploited. I think I'll go for it. What is the purpose of generating insets and up/down skins?
A related thought, though I don't know what I am talking about yet. If you generated move codes for a polygon, couldn't you generate a G-code for each line in the polygon in parallel, then sort them based on endpoints so that the proper order of moves is found? Say we have a square: I could generate the G-code for each line of the square, then order the G-code moves (basically build a "linked list") based on their end and start points; then each polygon in the layer could be ordered again to get the proper movement of the head for that layer?
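The chaining idea above can be sketched as follows: emit one move per polygon edge (the part that could be generated in parallel), then stitch the moves into printing order by matching each move's end point to the next move's start point. The function names and G-code formatting are illustrative assumptions, not anything from the Cura codebase.

```python
# Hypothetical sketch of endpoint-chaining: one G1 move per polygon edge,
# then order the moves by walking end point -> matching start point.
def segment_moves(polygon):
    # one (start, end, gcode) triple per edge; generation order is irrelevant
    moves = []
    for i, start in enumerate(polygon):
        end = polygon[(i + 1) % len(polygon)]
        moves.append((start, end, f"G1 X{end[0]} Y{end[1]}"))
    return moves

def chain(moves):
    # build a start-point -> move lookup, then walk it like a linked list
    by_start = {m[0]: m for m in moves}
    ordered, current = [], moves[0]
    for _ in moves:
        ordered.append(current[2])
        current = by_start[current[1]]  # next move starts where this one ends
    return ordered

square = [(0, 0), (10, 0), (10, 10), (0, 10)]
shuffled = segment_moves(square)[::-1]  # simulate out-of-order generation
ordered = chain(shuffled)
```

Note this assumes each vertex is shared by exactly two edges of a closed polygon; real toolpaths also need a travel move between polygons, which is where the per-layer ordering you mention would come in.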