Developing 02580

Going from C++ to Python in a number of (not always) easy steps

J. Andreas Bærentzen, January 2018

The following is a blog-entries style account of the process that we went through in changing our course on Digital Geometry Processing from being C++ based to Python based.

The change was motivated by the fact that we deliberately demoted the course from an MSc level course to an advanced, but still more basic, BSc level course. Partly, this was motivated by a need to use the course in a BSc programme, but it also seemed that more people could benefit from a geometry processing course taught at a slightly simpler level. In fact, we were most concerned about the programming language. The core features of C++ are not that hard, but one has to understand quite a bit more to use C++ effectively than to use Python effectively, and so we decided to change the programming language. Precisely how best to do this was not obvious, so I decided to document the process.

Mesh manipulation

For the longest time I was torn between whether to develop my own library for polygonal mesh manipulation in Python or simply use something else.

There are not that many mesh libraries in Python but one could look farther afield: one possibility would be to simply base the course on Blender or even a CAD tool like Rhino.

However, I was a bit fearful that there might be some dealbreaker. Cost would be an issue when it comes to Rhino, and while Blender is awesome, I was a bit afraid that it would take too long for people to become acquainted with Blender, which is so much more than a mesh library.

I was seriously considering both OpenMesh and LibIGL. Both are excellent libraries as far as I can tell, and they both have Python bindings. I am fairly sure that both would have worked for me, but it also seemed that a lot of effort would be needed if I were to base my teaching on either library. It was not that easy to make the Python bindings work in the case of OpenMesh, and in the case of LibIGL, the library simply works somewhat differently from GEL and is documented mostly through examples. So … that led me back to considering whether to simply expose the GEL API to Python.

But how?

It seems that a great many people have made tools for creating Python bindings for C or even C++ libraries or otherwise facilitate interoperability. An incomplete list includes

ctypes - a part of Python’s standard library
Boost/Python
swig - a tool for creating bindings to numerous scripting languages
Cython - a compiler that supposedly compiles Python and allows you to call C.

The list goes on ….

But ctypes emerged as a clear winner. Not least because of some really insightful remarks on StackExchange by a programmer called Florian Boesch. One point he made was that ctypes is a part of the Python standard library and is completely independent of the C/C++ library. This greatly reduces versioning problems.

One downside with all of these approaches is that you cannot call a C++ method from Python (although some libraries might hide that fact), and passing a structured type is really inconvenient.

Of course, there is really no way that the abstraction mechanisms in C++ could be transparently accessed from Python since that language has different mechanisms. The problem is very fundamental: we do things differently in different programming languages. Even if a direct port of every class and every function were possible, it would not be desirable. In some cases we might want to use, say, numpy from Python instead of a direct port of our favorite C++ linear algebra library. Such substitutions just don't automate well, and we should not try.

So, we go by way of C and the (admittedly tedious) process is as follows:

Expose the features needed through a plain C API
Make those features accessible via an OO Python API that uses ctypes to call the required C functions.
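As a minimal sketch of these two steps, the following uses the C math library as a stand-in for the plain C API (GEL's actual C interface is not shown here). The class name Vec2 is purely hypothetical, and loading the library this way assumes a POSIX-like system.

```python
import ctypes
import ctypes.util

# Step 1 stand-in: a plain C API. Here it is the C math library,
# which exports `double hypot(double, double)`.
# Assumption: a POSIX-like system where the library can be found this way.
libm = ctypes.CDLL(ctypes.util.find_library("m"))
libm.hypot.argtypes = (ctypes.c_double, ctypes.c_double)
libm.hypot.restype = ctypes.c_double


# Step 2: an object-oriented Python API that calls the C function.
class Vec2:
    """Hypothetical 2D vector whose length is computed on the C side."""

    def __init__(self, x, y):
        self.x = float(x)
        self.y = float(y)

    def length(self):
        return libm.hypot(self.x, self.y)


v = Vec2(3, 4)
print(v.length())
```

The user of Vec2 never sees ctypes; all the C-facing details stay inside the class.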

Very few tricks are needed, but I had to have some way of creating an array in the C++ code and passing it back to Python. Fortunately, the Python notion of a generator, which quickly makes a chunk of data “iterable”, made that manageable.
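A generator over a C array might look roughly like the sketch below. This is not the actual GEL binding code: the function name iterate is hypothetical, and the array is created with ctypes in Python only so that the example runs on its own. In practice the pointer would come from the C side.

```python
import ctypes


def iterate(ptr, count):
    """Yield `count` items, one at a time, starting at the C pointer `ptr`."""
    for i in range(count):
        yield ptr[i]


# Stand-in for an array that would normally be produced on the C++ side.
buf = (ctypes.c_int * 5)(2, 3, 5, 7, 11)
ptr = ctypes.cast(buf, ctypes.POINTER(ctypes.c_int))

for value in iterate(ptr, 5):
    print(value)
```

Since a bare C pointer carries no length, the element count has to be passed along with it.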

So, now there is a clear strategy, but how to draw things?!

Graphics

When it comes to graphics, the plethora of options is even worse than for bindings.

It seems that all graphics geeks as well as their grandmothers (both of them) have decided that if they code graphics using Python, they must create their own tool for visualisation. In some cases these tools are simply OpenGL bindings and in other cases more full fledged data visualisation libraries - and then there are the combinations. A non-exhaustive list of Python libraries for visualisation is as follows:

PyOpenGL - OpenGL bindings
Pyglet - OpenGL bindings
glumpy - OpenGL bindings
ModernGL - OpenGL bindings
VisPy - OpenGL bindings and general visualisation
Matplotlib - Matlab-like visualisation
Bokeh - Visualisation using HTML output
PyGame - OpenGL based game lib
Seaborn - no familiarity
Kivy - no familiarity

This makes it fairly hard to select a really good approach.

For the upcoming course, I needed something that could handle data visualisation and also a tool for rendering meshes.

I have a little familiarity with Matplotlib, and it is not fast: it is similar to Matlab and extremely flexible, but too big and complex for my taste.

Introducing VisPy. A library born out of the need for really efficient data visualisation. The approach is layered - allowing for both low level OpenGL access and also more high level visualisation.

Unfortunately, it did not pass my only test: in a relatively short time span I must be able to figure out how to use it. It sort of worked for simple graphs, but I could not make it work for the more generic visualisation tools I would need for meshes. I am also concerned that one of the Glumpy developers started contributing to VisPy but then retreated back to Glumpy.

Glumpy, on the other hand, also did not really work for me. First of all, installation was hard: it needs the freetype library and this, in turn, is invisible in the default location on my machine, which is /opt/local/lib. I created a link from /usr/local/lib, but it felt like Mac was not supported as well as I would have liked. There was also the nebulous issue of snippets - little pieces of shader code that you can inject in other shaders. It sounded cool but did not work for me.

This led me to ModernGL, which is a set of OpenGL bindings that allow you to create OpenGL objects with few lines of code. That sounded enticing, and for a bit of time I thought I had a winner. I was able to make it work with GLUT (from PyOpenGL), but then I tried the other OpenGL window API, glfw, and it really did not work: some catch-22 with versions that led me to believe the combination was not robust enough.

I might have stuck with ModernGL but at that point I realised that since exposing other parts of GEL to Python through ctypes worked so well, I might as well simply use my existing C++ render tools for GEL from Python. It is a little less general than rendering from Python, but ultimately it makes sense and allows me to have an ultra simple interface of the type:

viewer = Viewer()
viewer.display(mesh)

So, now I have created bindings to my own library in order to do graphics using Python. It is nice when you live up to your own preconceived notions.

Be that as it may, while this approach is great for mesh rendering, it does not take care of the need for more general visualisation of data. Fortunately, Bokeh turned out to be great. To draw a bunch of points, all I have to do is:

from bokeh.plotting import figure, output_file, show

output_file("scattered.html")
p = figure()
p.circle(pts[:, 0], pts[:, 1], size=10)
show(p)

assuming pts is a matrix with xs in the first column and ys in the second. Bokeh is not hardware accelerated, I gather, but that is no longer a concern.

Graphics II

Of course what we really want is to allow students to hand in a static HTML page with snippets of Python, nicely formatted text and interactive 3D models showing results. That should not be a tall order in this day and age. Jupyter Notebooks allow for two out of these three desiderata. However, it turned out to be really tricky to get number three.

The goal was no longer to render the viewer described above superfluous, but simply to provide a way for the students to hand in an interactive 3D model, making it much easier to gauge the quality of their results. The alternative would be that students would hand in code and words combined with screenshots. Now, if the code and words were composed as a Jupyter Notebook, having the 3D visualization also be part of that would be so much more elegant.

So, I decided to revisit the visualization question. Again, we seemed to have options - all based on systems that can output JavaScript with WebGL elements.

PyThreeJS
K3D
Plot.ly
Rolling our own based on ThreeJS

I guess this is the one instance where I simply gave up on coding things from scratch. I am not sure how hard it would have been, but my familiarity with WebGL is not that great. For a while I thought PyThreeJS would be the solution, but I had great difficulties with that library. The main problem was that even though I could output 3D models in a Jupyter Notebook, those 3D models would not survive into the static pages created when one converts the notebook to HTML. Documentation was also terrible, but that goes for many libraries. The forum users were helpful, but the issue with exporting to HTML seems to have been with nbconvert, and it looked like a long-standing issue that could not be resolved by the PyThreeJS developers. Unsurprisingly then, I had the exact same problem with K3D, which is otherwise nice because it is fairly simple.

Finally, I turned to Plot.ly. I am not very impressed with the documentation for that library either, but it is a bit more comprehensive and finished. Importantly, it is really, really possible to create 3D plots inside a notebook, and they survive export to static HTML.

Moreover, Plot.ly is also convenient for simpler 2D plots, so it has currently replaced Bokeh and become my one stop solution for plotting in Jupyter.

One absurd and ironic twist is that when you use Plotly, MathJax does not work. Well, inside Jupyter Notebook, you can have both Plotly figures and MathJax-converted LaTeX math formulas, but when you export a static HTML page, the math remains LaTeX code.

Memory management

One challenge is that we need to pass information back and forth between the GEL C++ library and Python and this has to be done in several different ways depending on what we aim to achieve.

When a class (such as the big old Manifold class which represents a polygonal mesh) is created in the GEL C++ library, the library returns a pointer to the created instance to the Python library, where it is stored as a member variable of the corresponding class instance (as ctypes.c_void_p) in Python. Later, when operations are carried out on the Manifold class, we call the GEL library (via a C interface) with the pointer to allow the GEL library to do its thing.

Making this work requires quite a bit of boilerplate code, but it is easy to write and it does not seem error prone. Python's ctypes library allows us to specify the argtypes and return types for the C functions that we call. If these match there are no surprises, but if we, e.g., convert a pointer to an int, things can go wrong.
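To make the point about matching types concrete, here is a small sketch that uses malloc and free from the C standard library as stand-ins for GEL functions that return and accept a pointer. If restype is not declared, ctypes assumes that the function returns a C int, which is narrower than a pointer on typical 64-bit systems. Loading the library this way assumes a POSIX-like system.

```python
import ctypes
import ctypes.util

# Assumption: a POSIX-like system where the C library can be found this way.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# void *malloc(size_t size);
libc.malloc.argtypes = (ctypes.c_size_t,)
libc.malloc.restype = ctypes.c_void_p  # without this line, ctypes assumes int

# void free(void *ptr);
libc.free.argtypes = (ctypes.c_void_p,)
libc.free.restype = None

# With the declarations in place, the full pointer value makes the round trip.
ptr = libc.malloc(64)
print(hex(ptr))
libc.free(ptr)

# Why the declaration matters: an int is typically smaller than a pointer.
print(ctypes.sizeof(ctypes.c_int), ctypes.sizeof(ctypes.c_void_p))
```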

An important additional point is that we can create Python objects of a specific type via the ctypes API and pass a pointer to the C library. This is OK for things like a single 3D vector.

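    def vertex_normal(self, vid):
        # Requires: from ctypes import c_double, byref
        n = (c_double*3)()                          # result buffer created on the Python side
        lib.vertex_normal(self.obj, vid, byref(n))  # the C function writes the normal into n
        return n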

Where things get more complicated is when an N-face is created and we need to send an Nx3 matrix of type double to the GEL library. Fortunately, as mentioned, ctypes allows us to specify the argtypes for a function, and numpy.ctypeslib.ndpointer is a function that generates an argtype which might, for instance, be a c_double pointer. Now, when this C library function is called with a numpy array, the array is automatically converted to the right type, and a pointer is passed to the C function.
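The following sketch shows the mechanism with memcpy from the C standard library standing in for a GEL function that receives an Nx3 matrix of doubles. The variable names are illustrative only, and loading the library this way assumes a POSIX-like system. In this sketch the declared argtype also checks the dtype, the number of dimensions and the memory layout of the array before the call is made.

```python
import ctypes
import ctypes.util

import numpy as np
from numpy.ctypeslib import ndpointer

# Assumption: a POSIX-like system where the C library can be found this way.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# An argtype that accepts only contiguous 2D arrays of doubles.
matrix = ndpointer(dtype=np.float64, ndim=2, flags="C_CONTIGUOUS")

# void *memcpy(void *dst, const void *src, size_t n);
libc.memcpy.argtypes = (matrix, matrix, ctypes.c_size_t)
libc.memcpy.restype = ctypes.c_void_p

# Four corner points of a quad, one point per row (an Nx3 matrix with N = 4).
src = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]], dtype=np.float64)
dst = np.zeros_like(src)

# The numpy arrays are passed directly; ctypes hands pointers to the C function.
libc.memcpy(dst, src, src.nbytes)
print(dst)
```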

Things are a bit more sticky when we need data passed back from the GEL library to Python. We often do not know a priori the size of the buffer that is returned. Otherwise, we could make the buffer in Python and pass a pointer to the GEL library. However, that is likely to lead to almost double work in many cases. What seems to work best is to wrap C++’s vector class in a C interface and create a corresponding Python interface. In practice, I use two such Python interface classes (and C interfaces): one for vectors of 3D vectors and one for integer vectors.

It goes like this: in Python we create an IntVector by calling the C API, which calls new vector<int> and returns a pointer to the created, empty vector. When we need to fill in the vector, a pointer to the vector is passed back to the library, where a function calls resize or push_back on the vector. The data can now be retrieved from Python using other functions defined in the C interface. When we are done and Python wants to garbage collect the IntVector, the __del__ method is called, and this in turn invokes the C++ destructor (again) via the C interface.
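The ownership side of this pattern can be sketched as follows. This is not the GEL code: calloc and free from the C standard library stand in for the C functions that would wrap new and delete on a vector<int>, and, unlike the real thing, the size is fixed at creation. Loading the library this way assumes a POSIX-like system.

```python
import ctypes
import ctypes.util

# Assumption: a POSIX-like system where the C library can be found this way.
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.calloc.argtypes = (ctypes.c_size_t, ctypes.c_size_t)
libc.calloc.restype = ctypes.c_void_p
libc.free.argtypes = (ctypes.c_void_p,)
libc.free.restype = None


class IntVector:
    """Python-side owner of a block of ints that lives on the C side."""

    def __init__(self, n):
        self.n = n
        # Stand-in for the C call that would run `new vector<int>`.
        self.obj = libc.calloc(max(n, 1), ctypes.sizeof(ctypes.c_int))
        if not self.obj:
            raise MemoryError("calloc failed")

    def _data(self):
        # View the raw address as a pointer to int.
        return ctypes.cast(self.obj, ctypes.POINTER(ctypes.c_int))

    def __len__(self):
        return self.n

    def __getitem__(self, i):
        if not 0 <= i < self.n:
            raise IndexError(i)
        return self._data()[i]

    def __setitem__(self, i, value):
        if not 0 <= i < self.n:
            raise IndexError(i)
        self._data()[i] = value

    def __del__(self):
        # Stand-in for the C call that would run `delete` on the vector.
        obj, self.obj = getattr(self, "obj", None), None
        if obj:
            libc.free(obj)


v = IntVector(4)
for i in range(len(v)):
    v[i] = i * i
values = list(v)
print(values)
```

Setting self.obj to None before freeing guards against releasing the same memory twice if __del__ should run more than once.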

I have been wondering whether there is a cleaner, simpler way to pass information back and forth, but I do not think so. We inherently have to sometimes create data on the C++ side and sometimes on the Python side.