The best programming languages & platforms for data science in finance
We caught up with Graham Giller, the former head of data science research at JPMorgan and ex-head of primary research at Deutsche. These days, Giller is CEO of his own firm - Giller Investments - and has written a book, Adventures in Financial Data Science, out later this month.
If you're looking to develop a career in financial data, these are Giller's tips.
1. What are your favorite programming languages for data science? Why?
For programming languages my practice is now almost completely concentrated on three to four platforms:
I use Python3 for data acquisition, preparation, and management, plus some computational operations that don’t fit easily into other systems. I do not use any “notebook” interfaces; I write code in an IDE, and that code can be scheduled automatically or run manually from the command line.
I use a combination of R and more dedicated commercial time-series analysis software for inferential work. The T-S software I use (RATS) is a minority-interest program that I get on well with, but is to some extent a legacy usage. I probably wouldn’t have started with it if I hadn’t begun my career in the 1990s. I am a fan of Mathematica, but it is not a big part of my practice.
I use SQL databases extensively, with pretty complex SQL queries and operations. I’m a big user of user-defined aggregate functions, which I have written in C++, to deploy machine learning operations at scale within the SQL database. I use the database to organize and schedule the calculations, which it does very much more efficiently than I could pull off myself....
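To make the idea of a user-defined aggregate concrete, here is a minimal sketch. It is not Giller's setup: he describes writing aggregates in C++ for his database, whereas this stand-in uses Python's built-in sqlite3 module so it runs anywhere. The table name `returns`, the function name `sample_stddev`, and the helper `volatility_by_symbol` are all hypothetical.

```python
import math
import sqlite3


class SampleStdDev:
    """User-defined aggregate: sample standard deviation via Welford's update.

    Hypothetical example; sqlite3 calls step() once per row in a group
    and finalize() once at the end of the group.
    """

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def step(self, value):
        if value is None:  # skip SQL NULLs
            return
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)

    def finalize(self):
        if self.n < 2:
            return None  # undefined for fewer than two observations
        return math.sqrt(self.m2 / (self.n - 1))


def volatility_by_symbol(rows):
    """Load (symbol, return) rows into an in-memory table and aggregate in SQL."""
    conn = sqlite3.connect(":memory:")
    # Register the Python class as a one-argument SQL aggregate function.
    conn.create_aggregate("sample_stddev", 1, SampleStdDev)
    conn.execute("CREATE TABLE returns (symbol TEXT, ret REAL)")
    conn.executemany("INSERT INTO returns VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT symbol, sample_stddev(ret) FROM returns "
        "GROUP BY symbol ORDER BY symbol"
    ).fetchall()
    conn.close()
    return result
```

The point of the pattern is that the database decides how rows are grouped and fed to the aggregate, so the statistical code only has to describe a per-row update and a final result.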
2. Does Hadoop have a future in finance (or anywhere, for that matter)?
I think big-iron NoSQL platforms, such as Hadoop and its kin, are going to fade from view. Most of their technical innovations, such as schema-free storage, column-oriented storage, massive parallelization, geospatial operations, free-text operations etc., are provisioned in commercial RDBMS now and those platforms can not only provide scale but also strong data management if required. I would imagine these functions will continue to downshift into the open source platforms, such as MySQL and Postgres, over the next few years. For what I do, MySQL is my current data management platform of choice.
3. Which languages do you think are becoming more popular in data science in finance?
From my experience, I think Python3 is still in ascent. Some shops are probably still clinging to Python2, but that is a mistake. I always urge people to “fix it now” rather than “fix it later after you’ve lost money.” R is falling out of favor which, personally, I am unhappy about because it is more rooted in rigorous inference than in “coding.”
4. How is the role of the data scientist in finance changing?
The role of the data scientist is becoming more that of an IT professional than a thought leader for organizations. Personally, I feel that this is the wrong direction, but it makes the IT leadership more comfortable and the non-technical leadership doesn’t realize that this is a problem.
5. What's your advice to people starting out?
For people starting out, who want to do analytics within a financial context, I would suggest spending time to learn time-series analysis and econometrics properly. Financial data has properties that make it quite difficult to deploy conventional tools on, and I see many pieces of work on venues like Medium etc. where people use very complex algorithms, the current favourite being LSTM networks, to essentially conclude that the best predictor of tomorrow's price is today’s price, or far worse than that.
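The "tomorrow's price is today's price" result is easy to reproduce as a baseline check. The sketch below is an illustration under stated assumptions, not anything from Giller's own work: it simulates a pure random walk (real prices are not exactly this), and the function names and the moving-average competitor are hypothetical choices. Any forecasting model is worth comparing against the naive forecast in this way before claiming predictive power.

```python
import random


def simulate_random_walk(n, seed=0, start=100.0):
    """Simulated prices: each step adds independent Gaussian noise (an assumption)."""
    rng = random.Random(seed)
    prices = [start]
    for _ in range(n - 1):
        prices.append(prices[-1] + rng.gauss(0.0, 1.0))
    return prices


def mean_squared_error(forecasts, actuals):
    return sum((f - a) ** 2 for f, a in zip(forecasts, actuals)) / len(actuals)


def compare_to_naive(prices, window):
    """Return (naive_mse, moving_average_mse) on the same set of target days.

    Naive forecast: tomorrow's price equals today's price.
    Moving-average forecast: tomorrow's price equals the mean of the
    last `window` prices up to and including today.
    """
    targets = prices[window:]
    naive = prices[window - 1 : -1]
    moving_avg = [
        sum(prices[t - window + 1 : t + 1]) / window
        for t in range(window - 1, len(prices) - 1)
    ]
    return (
        mean_squared_error(naive, targets),
        mean_squared_error(moving_avg, targets),
    )
```

On a simulated random walk the naive forecast should come out ahead of the moving average, which is the sense in which a more elaborate model can end up "far worse" than simply using today's price.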
Much of the work I do is quite computationally intense (hours to occasional days of compute), so it is important to understand how the algorithms you use scale with data size. But do not kid yourself that you can write a better optimizer or linear algebra system than somebody who has built their career in that space. Also do not be scared of going back to square one when you find an error. If you know something is broken, it’s better to “fix it now” than labor under technical debt, because that always leads to technical bankruptcy.
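One rough way to see how an algorithm scales with data size is to time it at several input sizes and estimate the slope of log(time) against log(size). The sketch below assumes run time behaves roughly like a power of the input size; `pairwise_distance_sum` is a deliberately quadratic toy workload, and all names are hypothetical.

```python
import math
import time


def time_call(func, n):
    """Wall-clock seconds taken by one call of func(n)."""
    start = time.perf_counter()
    func(n)
    return time.perf_counter() - start


def scaling_exponent(sizes, times):
    """Least-squares slope of log(time) on log(size).

    A slope near 1 suggests linear scaling, near 2 quadratic, and so on.
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(t) for t in times]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


def pairwise_distance_sum(n):
    """Toy workload: visits every (i, j) pair, so work grows as n squared."""
    total = 0
    for i in range(n):
        for j in range(n):
            total += abs(i - j)
    return total
```

For example, timing `pairwise_distance_sum` with `time_call` at sizes such as 200, 400 and 800 and passing the results to `scaling_exponent` should give a value in the region of 2, though measured timings are noisy. The slope fit here is kept to a few lines for self-containment; in line with the advice above, real work would lean on an established numerical library for the fitting.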