As consultants to academics, we are often invited to join a project “already in progress.” Sometimes we’re called in to assist after a student has graduated and left the team, or to help a team that has hit a speed bump. In such cases we usually have the project’s programming language dictated to us by whatever the group has already chosen.
There are three languages we see most frequently chosen for scientific programming. In no particular order, they are Matlab, Python and C++.
Matlab is a great environment for playing around with your calculations until you get them right. It’s very easy to change calculations quickly and visualize the results. It is not, however, well-suited for creating applications. If you want to run your code more than a few times, or you are going to let others run it, it’s still OK to prototype in Matlab, but it’s better to write the actual application in another language.
Python, on the other hand, is well-suited for applications. It also has numerous advantages, such as being relatively easy to learn and having plenty of tools available for scientific programming. When we have a choice, it is our go-to language. When we start a project from scratch we need a reason not to use Python. If we can’t find one, we use Python.
Finally, there’s C++. Oh C++. How often it lures otherwise capable researchers down an incorrect path, and seduces them into thinking it’s the best language choice. It rarely is. Clients often tell us that they chose C++ “because it’s faster.” That’s true. Properly written C++ code is a lot faster than code written in Python. However, C++ code also takes significantly longer to write and debug. There’s a very clear trade-off there, and one not in your favor.
Our experience shows that one side effect of picking C++ is premature optimization. When you write everything in C++ you end up spending a lot of time developing code that reads and parses input files, writes output files, parses user options, etc. There’s really no reason to use C++ for that. You can use Python to accomplish the same thing in a fraction of the time, and that code is a small enough part of the program that it will not be noticeably slower than the C++ version anyway.
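To give a feel for how little code this housekeeping takes in Python, here is a minimal sketch that uses only the standard library. The tool, the `--scale` option and the `time`/`value` column names are hypothetical, invented for illustration; your own file format will differ.

```python
import argparse
import csv
import io


def parse_options(argv):
    """Parse user options; the option names here are hypothetical."""
    parser = argparse.ArgumentParser(description="Hypothetical analysis tool")
    parser.add_argument("input", help="path to a CSV file of samples")
    parser.add_argument("--scale", type=float, default=1.0,
                        help="factor applied to every value")
    return parser.parse_args(argv)


def read_samples(stream):
    """Read (time, value) pairs from a CSV stream with a header row."""
    reader = csv.DictReader(stream)
    return [(float(row["time"]), float(row["value"])) for row in reader]


def write_samples(samples, stream, scale):
    """Write the scaled samples back out as CSV."""
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow(["time", "value"])
    for t, v in samples:
        writer.writerow([t, v * scale])


if __name__ == "__main__":
    # Demonstration on in-memory data, so the sketch runs without any files.
    options = parse_options(["samples.csv", "--scale", "2"])
    samples = read_samples(io.StringIO("time,value\n0,1.5\n1,2.5\n"))
    output = io.StringIO()
    write_samples(samples, output, options.scale)
    print(output.getvalue())
```

In a real program you would pass `sys.argv[1:]` to `parse_options` and open `options.input` instead of using the in-memory streams. The equivalent C++ typically means hand-written parsing code or an extra library dependency.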
Our standard process is this:
First we develop everything in Python and make sure we have a working (albeit a little slow) program.
Once we have a working program, we use it for a while and measure which sections are too slow. Then we try to make them faster using commonly accepted Python best practices. We try to use numpy and scipy more efficiently, or use better-suited algorithms if possible.
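As a rough illustration of this step, the sketch below profiles a deliberately naive function with the standard library’s `cProfile` and then replaces its nested loops with a numpy expression. The pairwise-difference calculation and the function names are made up for this example; the point is the workflow of measuring first and only then rewriting the part that turns out to be slow.

```python
import cProfile
import io
import pstats

import numpy as np


def pairwise_energy_loop(x):
    """Naive version: sum of squared differences over all pairs, in pure Python."""
    total = 0.0
    n = len(x)
    for i in range(n):
        for j in range(n):
            total += (x[i] - x[j]) ** 2
    return total


def pairwise_energy_numpy(x):
    """Same calculation, using broadcasting instead of explicit loops."""
    diff = x[:, None] - x[None, :]  # n-by-n matrix of all pairwise differences
    return float(np.sum(diff ** 2))


def profile(func, *args):
    """Run func under cProfile; return its result and a short text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(5)  # five most expensive entries
    return result, stream.getvalue()


if __name__ == "__main__":
    x = np.linspace(0.0, 1.0, 300)
    slow_result, report = profile(pairwise_energy_loop, x)
    fast_result = pairwise_energy_numpy(x)
    print(report)
    print("results agree:", np.isclose(slow_result, fast_result))
```

The broadcasting version trades memory for speed, since it builds the full n-by-n difference matrix. For very large inputs that matrix may not fit in memory, and a better-suited algorithm becomes the next thing to try.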
If our program is still too slow, we start pulling out the medium-sized guns: we take the slow code and reimplement it in Cython. Cython is an extension to Python that allows you to specify the types of your variables. This allows Cython to generate more efficient C code (or, optionally, C++ code) that is then compiled by your favorite compiler. The final result can be hundreds of times faster than pure Python.
Turning decent numpy-based Python code into Cython code is relatively straightforward, but does require some tinkering to get right. It’s still a lot simpler than implementing the same thing in C++.
If all this fails and the code is still too slow, we have no choice but to reimplement the slow part of the code in C++. We do this only for the slowest parts of the code. We sometimes end up with a program that mixes Python, Cython and straight C++ in a single project.
While this may sound unwieldy, it is in fact a lot better than using C++ from the start. It takes less time to implement and the resulting code is a lot easier to maintain since most of it will still be in Python. This means you can make future modifications a lot faster and spend more time researching and less time programming.