The problem with ROOT (a.k.a. The ROOT of all Evil)

This page was written in about 2003, and has been little updated since then. If I write another ROOT critique, I'll do it in a separate location, since this one is interesting as an historical artifact.


Update (03/08/2006): I thought I should update this page after some substantial discussions on the ROOT mailing list about this page and, more particularly, the ROOT page on Wikipedia, to which I had added a criticism section. Here are some Web links to the mailing list archive:

But maybe I just like it because it includes phrases like this:

In my experience, many people who use ROOT at least have vague feelings that it is making their life more difficult than it rightly should. Nearly everyone I know that writes code that other people use feel even more strongly that ROOT's poor design leads to productivity losses. I grant that it is less frequent that someone levels the criticism as succinctly and accurately as Andy.

Ha! And now the original page:


This piece is aimed at researchers, primarily in high energy physics, who use the ROOT analysis software and find it lacking in lots of ways. I'm going to whinge about the things that annoy me in ROOT and then suggest a few of the things that I do to minimise the pain. Suggestions and comments are very welcome. Apologies in advance for the fact that this article may be implicitly HEP-political, but I genuinely believe that ROOT's poor design is a very dangerous thing for particle physics and the other disciplines that use it.

Before beginning, I should point out that these are simply my own views and that I hold no animosity against the developers --- their design simply doesn't work for me. Presumably there are many people "out there" who think ROOT an excellent piece of software. In _complete_ honesty, though, I have yet to meet any of them. In fact, I've never had any complaints that this article misrepresents ROOT, and I've had a fair bit of "fan mail", not to mention discussions with well-respected developers and physicists who hold precisely the same views :-) If you feel this way, then you might also be interested in my articles on dealing with some of ROOT's flaws and my wishlist of things to be fixed.

The problem(s) with ROOT

ROOT can be an awkward piece of software --- but unless you want to use the defunct PAW program for your data analysis there's not really anything else around that handles histograms and tuples in the way that particle physicists have come to expect.

Some of the ideas in ROOT are good --- a data analysis application, with an open, robust (in principle) API, useable as stand-alone modular libraries. Great! However ROOT has failed to meet its promise for several reasons:

  • it retains a great deal of legacy behaviour from PAW, which is commonly acknowledged to be "rubbish, but we've inherited 20 years' worth of work-arounds"

  • rather than focus on providing a stunningly good set of these core facilities, the development team has proceeded to add more and more arcane functionality to the ROOT kernel (e.g. for GUI-building). It looks to me like ROOT's design is largely driven by the needs of the Alice collaboration, with a monolithic "many into one must go" design model, which is quite depressingly inflexible and unscalable if true.

  • ROOT's class structure is very broken: it bears all the hallmarks of a project to learn C++ and OO programming that was never thrown away (as they always should be) when the initial design mistakes were realised. I consider this one of the biggest problems; even if the code-bloat and interface mis-design issues are fixed, the inherently broken class structure and functional delegation are still there and fixing them will more-or-less involve a ground-up re-write and a breaking of backwards compatibility with existing "ROOT-leveraging" code. Or just use something else.

  • insistence on re-inventing the wheel: there are plenty of external projects which supply alternative, better-developed and standardised functionality which has also been developed in the context of ROOT. Admittedly, in some cases this is due to ROOT having started first... but they should know when to swap. Examples are the C++ STL objects, e.g. std::string, and containers, which are not properly supported in ROOT (see the next point); data formats and interfaces like AIDA, FITS and HDF5; and code documentation with Doxygen (ROOT's own C++ documentation class is a travesty by comparison with Doxygen's syntax and flexibility).

  • "(Masaharu) Goto considered harmful". The ROOT team continues to insist that ROOT's natural runtime environment is the CINT interpreter. I disagree on several levels --- first the practicalities: CINT cannot and is unlikely to ever properly interpret ANSI/ISO C++. Its current deficiencies make several possible ROOT improvements difficult or impossible, and have forced design decisions which would never have been made otherwise, most obviously the lack of real C++ template support, and hence of STL objects. Only pre-compiled faux-STL objects are possible from within CINT, and hence ROOT's own classes cannot be template-based or use the STL. CINT also encourages sloppy coding style (no pointer/object distinction, no required semi-colons, ...) which makes conversion to proper C++ code non-trivial. A second level of criticism is that C++ is a deeply inappropriate language for a high-level activity like data analysis: it's syntactically complex and forces explicit memory management by the user (while histogramming! and made harder by ROOT's ownership semantics). This is largely alleviated by PyROOT, although CINT is for some reason still the main interface and the underlying classes are still sub-optimal.

I will now consider several of these points in more detail:

General design issues

  • Why, oh why is it called "ROOT"? If there is one name that is guaranteed to cause confusion, conflicts and general aggro, it's choosing the same name as the system admin account. I'm almost surprised that the Windows version isn't distributed under the name "Administrator". The only worse name I can think of is "/".

  • The whole system is huge and bloated. What most physicists want from ROOT is not a GUI-building system, but a statistical data analysis system. That would involve providing a large array of statistical analysis tools, wide support for input and output formats (including the AIDA interfaces and data formats like FITS, HDF4/5 and plain text comma/tab/etc.-delimited data). Basic 1, 2 and 3D plotting, contour plots, pixel plots and suchlike would also be nice, but there are external systems that can do that very well given a standard output format, so why re-invent the wheel? (ROOT could always build its plotting functionality on external libraries.)

  • Why is CINT's pseudo-"interpreted C++" considered a good user interface? C++ is good if you want to write compiled programs, but I can't imagine it's ever been thought of as a good language for interactive commands: much of the syntax is designed to enforce strong type-safety and various code-reuse software engineering solutions that no-one wants to have to think about when they try to plot a dataset, or calculate a statistical measure. Why not write the backend code in C++, provide a Python-C++ interface for those who want to do things with full programmatic power (Python because it was actually designed as an interpreted language) and provide a simple "gnuplot-style" command interface for the basic stats, data I/O and plotting functionality? (Actually, ROOT does now have a Python interface, but the class structure is so poor that it doesn't make it much easier to use -- and you have to deal with type mismatches, too, since the binding hasn't been written very well. It's nicer to use... just. I think that the class structure would make a decent gnuplot-like interface hard to do, as well, hence my comment above that even if all the other issues are dealt with, the underlying classes are so bad that ROOT is probably unfixable without breaking all backwards compatibility.)

Class structure issues

  • No native STL support, even where it could be introduced seamlessly, e.g. std::string function arguments can transparently handle char* old-style C strings and are much safer and more powerful. [1]

  • Perverse inheritance structure: is a 2D histogram really a kind of 1D histogram? ROOT thinks so, to the extent that 1D histograms (happily available in TH1F, TH1I and so-on flavours for floats, integers etc. --- a prime case for templation) contain an accessor method for the histogram's z-axis. Just don't touch that method if your histogram is really 1D! I would love to see a THistogram abstract base class for all histograms (or even better, a ROOT::Histogram, but namespaces also seem to have passed them by).

  • No separation of data and presentation: if you want to ensure that the data in a histogram is unmangled by declaring it const, then you can't change its plot style either because there's no separation between the data part of a histogram and its presentation. Other systems do this much better, with some separation like Histogram objects for the data container and e.g. a HistogramPainter object which contains all the presentation aspects. This also adds the flexibility of modular design.

  • Should classes have 300 methods? ROOT thinks so. This is largely due to a flat and monolithic design whereby hundreds of convenience methods are designed which simply pass on the work to other classes. For example, histograms can fit themselves to mathematical functions --- why not a Fitter class? Well, there is one, but ROOT is "helpful" enough to hand all its methods on to unrelated classes like histograms, too. It breaks a major, empirically successful rule of software design: each object should do one job and do it well.

  • Another rule of OO design broken --- ROOT will happily delete objects that it's given, even if you want to use them again. Take this, for example:

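    #include <cstdlib>
    #include "TH1F.h"
    #include "THStack.h"

    // Adds one of the two histograms to a temporary stack, then deletes the stack.
    void test(TH1* histo1, TH1* histo2) {
      THStack* hs = new THStack();
      // Pick one histogram at random: rand() returns an int in [0, RAND_MAX].
      if (rand() < RAND_MAX / 2) hs->Add(histo1); else hs->Add(histo2);
      delete hs;  // the stack's destructor also deletes the histogram it holds
    }

    int main() {
      TH1* histo1 = new TH1F(/* ... */);
      TH1* histo2 = new TH1F(/* ... */);

      test(histo1, histo2);

      delete histo1;  // one of these two deletes hits an already-deleted object
      delete histo2;

      return EXIT_SUCCESS;
    }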
    

    The code will core dump either on delete histo1 or delete histo2, because the THStack destructor deletes the contained elements, even though it doesn't own them. To use code like this, the test method has to copy the passed histos, a needless waste of memory and CPU. Gah.
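To make the separation and ownership points in the list above a little more concrete, here is a minimal sketch in plain C++ with no ROOT dependency. Every name in it (`Histogram1D`, `PlotStyle`, `LineStyle`, `HistogramPainter`) is hypothetical and invented for illustration; none of them is a ROOT class, and the text "rendering" is only a stand-in for real plotting. The idea is simply that the data class knows nothing about presentation, and the painter sees the data through a const reference that it neither modifies nor deletes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical data-only histogram: no colours, no canvases, no fitting.
class Histogram1D {
public:
  Histogram1D(std::size_t nbins, double lo, double hi)
    : counts_(nbins, 0.0), lo_(lo), hi_(hi) {
    assert(nbins > 0 && lo < hi);
  }

  // A std::string parameter also accepts old-style C string literals.
  void setTitle(const std::string& title) { title_ = title; }
  const std::string& title() const { return title_; }

  void fill(double x, double weight = 1.0) {
    if (x < lo_ || x >= hi_) return;  // this sketch ignores under/overflow
    const std::size_t i =
      static_cast<std::size_t>((x - lo_) / (hi_ - lo_) * counts_.size());
    // Clamp to guard against floating-point rounding at the upper edge.
    counts_[std::min(i, counts_.size() - 1)] += weight;
  }

  std::size_t size() const { return counts_.size(); }
  double content(std::size_t i) const { return counts_.at(i); }

private:
  std::vector<double> counts_;
  double lo_, hi_;
  std::string title_;
};

// Presentation lives in its own small value type.
enum class LineStyle { Solid, Dashed };
struct PlotStyle {
  std::string colour = "black";
  LineStyle line = LineStyle::Solid;
};

// The painter holds a const reference: it can be restyled freely, but it can
// neither change nor delete the histogram, which must outlive the painter.
class HistogramPainter {
public:
  explicit HistogramPainter(const Histogram1D& h) : histo_(h) {}
  void setStyle(const PlotStyle& s) { style_ = s; }
  const PlotStyle& style() const { return style_; }

  // Stand-in for real rendering: a title line, then one row of '#' per bin.
  std::string render() const {
    std::string out = histo_.title() + " [" + style_.colour + "]\n";
    for (std::size_t i = 0; i < histo_.size(); ++i) {
      const double c = histo_.content(i);
      out += std::string(c > 0 ? static_cast<std::size_t>(c) : 0, '#') + "\n";
    }
    return out;
  }

private:
  const Histogram1D& histo_;
  PlotStyle style_;
};
```

With a split like this, a function can take a `const Histogram1D&`, build its own painter, change colours and line styles as much as it likes, and return without the caller's histogram having been touched or freed.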

Functionality issues

  • Dreadful default plot style: you might think that, data presentation being almost the primary reason for ROOT's existence, it might be good at it. Well, for some reason the default plot style is unfeasibly ugly (grey background?!) and difficult to fix. In fact to fix it you have to go via several global ROOT objects. Gah.

  • Awkward ntuple handling: in particular handling indexed tuple entries is a nightmare of obfuscation.

  • What's with the "T" prefix on everything? Hello? Even CLHEP has got the hang of namespaces now: I would much rather deal with a ROOT::Tree than a TTree. Update: I think it's now ROOT::TTree, which misses the point even further.

  • Dataset error handling is very dangerous: binomial errors are calculated when a histogram is filled and aren't updated thereafter, presumably because the user might have over-ridden the error-settings by hand. This means that if you re-scale your histogram by 0.001, the errors are likely about 100-1000 times bigger than the data peaks! Solutions might be to always re-scale the data properly and to provide histograms with an error-calculating functor or member function (or a set of such things). That way a histogram could be sub-classed and the error handling over-ridden in a scalable way. There's a mismatch here between simple user interfaces and software engineering, but since it ends up mapping on to the same dichotomy between getting the wrong result or the right one, I know what I'd pick.

  • Unusability of the ACLiC compiler: for increased performance, ROOT can call g++ from within CINT and compile your ROOT macros. You'd think that that might involve taking your single file with a bunch of user macros and building a binary library file from them, i.e. adding the standard C++ and ROOT header #includes and so on behind the scenes so that any macro that will run in CINT can be compiled in ACLiC. But that isn't the case: ACLiC needs the full set of header declarations that a full C++ program needs to already be in the file to be compiled. And it can't handle the splitting of user classes into header and implementation files, which seems to be necessary. In addition, if ACLiC fails to compile your macros file (probably for one of the above reasons e.g. missing #includes), then debugging the failure point in ACLiC is very hard, specifically because it uses lots of temporary files but doesn't map the C++ compiler errors back to the CINT macro file, so the reported error won't be easily reconcilable with any of your input files. Aaargh. In short, ACLiC requires you to have written your macros as if they're C++ programs to be compiled (with full C++ syntax strictness: none of the sloppiness encouraged by CINT will work), but actually makes things harder for you than if you ran the C++ compiler explicitly because it obfuscates the compiler output. Nice one, ACLiC.

  • What's up with the whole "passing processing directives by string" rubbish? For example, to plot a histogram stack (on to the "current canvas" --- a typical example of global scope in action) I might call this monstrosity:

    _hs->Draw("HIST,E,9,NOSTACK");
    

    What sort of argument is that? For starters I don't get to specify which TCanvas to draw it on to; instead I have to do some sort of hideous gROOT->cd("mydirectoryname"); thing first. And second, there's absolutely no type safety: that string has to be decoded at runtime and the parsing rules are not clearly defined. A set of class enums or, better, a config object (or collection thereof) would be much safer. Why would you do something as horrible as this? Yep: CINT and interactive use. Lovely.

  • Global objects and some horrible concept of a "currently focussed directory"! Uurgh --- this sort of thing would be okay if all ROOT scripts were linear and no more than 20 lines long. But they're not and this is a truly nasty "feature".

  • Why, in a C++ program, is there still a horrid mush of type-unsafety? Reading objects out of ntuples requires lots of blind casts from ROOT TObject to whatever you think your persistified object is. This reminds me of C casts from void*, and that's simply unacceptable in a C++ system. Surely there are other C++ persistence interfaces that don't have this problem (using RTTI or similar)?
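The error-handling suggestion in the list above (always re-scale the data properly, and let the histogram carry an error-calculating functor) can be sketched roughly as follows. This is plain C++ with no ROOT dependency, and `WeightedHistogram`, `ErrorRule` and the method names are all hypothetical, chosen for illustration only. The default rule used here, the square root of the sum of squared weights (which reduces to sqrt(N) for unit weights), is my assumption about a sensible default and not a description of what ROOT does internally.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical weighted histogram: per bin it stores the sum of weights and
// the sum of squared weights, and derives the error only when asked.
class WeightedHistogram {
public:
  // An error rule maps (sum of weights, sum of squared weights) to an error.
  using ErrorRule = std::function<double(double sumw, double sumw2)>;

  explicit WeightedHistogram(std::size_t nbins)
    : sumw_(nbins, 0.0), sumw2_(nbins, 0.0),
      rule_([](double, double sumw2) { return std::sqrt(sumw2); }) {}

  void fillBin(std::size_t i, double weight = 1.0) {
    sumw_.at(i) += weight;
    sumw2_.at(i) += weight * weight;
  }

  // Scaling updates both sums, so the errors follow the contents.
  void scale(double factor) {
    for (std::size_t i = 0; i < sumw_.size(); ++i) {
      sumw_[i] *= factor;
      sumw2_[i] *= factor * factor;
    }
  }

  // Swap in a different error treatment without subclassing.
  void setErrorRule(ErrorRule rule) {
    assert(rule);  // an empty std::function would throw when called
    rule_ = std::move(rule);
  }

  double content(std::size_t i) const { return sumw_.at(i); }
  double error(std::size_t i) const { return rule_(sumw_.at(i), sumw2_.at(i)); }

private:
  std::vector<double> sumw_;
  std::vector<double> sumw2_;
  ErrorRule rule_;
};
```

Because the error is computed from the stored sums each time it is requested, filling a bin 100 times and then scaling by 0.001 leaves a content of about 0.1 with an error of about 0.01, so the relative error stays at 10% instead of dwarfing the data.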

In short, ROOT sucks and isn't likely to change its ways any time soon. Sorry HEP.

What to do about all this?

The best thing to do, in my opinion, would be to take what there is of ROOT and to split it into a kernel and a set of modules and for the whole thing to take the form of a C++ library rather than an executable. The executable is really secondary to the class structure. In addition, the class structure needs to be overhauled, STL compliance needs to be introduced, standard I/O formats and interfaces need to be developed, external solutions need to be dropped into place in many cases, and so on. In fact, pluggable architectures like this have given rise to excellent collections of user-contributed modules elsewhere, so it's a potentially rewarding move from a community standpoint, too. I can't see it happening :-(

Next-best, or possibly best given the infeasibility of the above and the existence of better systems anyway, is to move your analysis to a multi-stage one which ignores ROOT as much as possible, uses Hippodraw, JAS or the BaBar StatPatternRecognition code [2] to do the statistical analysis, and uses something like PyX (http://pyx.sf.net) or jFig (http://tech-www.informatik.uni-hamburg.de/applets/jfig/) to produce the publication-quality plots, again using a standard data file format (or even just columned ASCII files) for communication in the final step. Although these programs don't (currently) support 3D plots, I don't believe that these often give information that can't be expressed more clearly in several 2D plots. The exception is rendering of actual 3D systems like detector structure, which admittedly can be useful in event reconstruction analyses. Update: I nowadays recommend SciPy, matplotlib and other Python-based scientific tools to everyone unhappy with ROOT. They are simply much better tools than anything HEP has yet produced.
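As an illustration of the "columned ASCII files" hand-off between analysis and plotting stages, here is a minimal sketch of a writer. The `Bin` struct, the `writeColumns` function and the four-column layout are hypothetical choices of mine, not an established format; the only assumption about downstream tools is that they skip lines starting with `#`, which gnuplot does for data files.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical plain-data record used as the hand-off between stages.
struct Bin {
  double low;    // lower bin edge
  double high;   // upper bin edge
  double value;  // bin content
  double error;  // uncertainty on the content
};

// Write one bin per line as space-separated columns, preceded by a '#'
// comment line naming the columns.
void writeColumns(std::ostream& os, const std::vector<Bin>& bins) {
  os << "# low high value error\n";
  for (const Bin& b : bins) {
    assert(b.low < b.high);  // reject malformed bins early
    os << b.low << ' ' << b.high << ' ' << b.value << ' ' << b.error << '\n';
  }
}
```

Any `std::ostream` will do, so the same function writes to a file via `std::ofstream` or to a string for testing. Note that the stream's default formatting keeps only six significant digits; raise the precision on the stream before calling if that is not enough.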

Actually, I like this "modular" statistical analysis and presentation idea most of all: I've only put "rehacking ROOT" as the most desirable solution due to its large, established user base, since personally I'm more than happy to leave ROOT alone entirely. You might find my list of HEP software to be useful if you are similarly minded. I see definite parallels here with the Unix "small tools, each of which does its job well" philosophy; it's peculiar that high-energy physics has set its heart so firmly on monolithic systems given a) its traditional centring around Unix computing and b) the obvious success of the Unix philosophy. But maybe not that surprising, given that many physicists treat computing methods with contempt, as something that gets in the way of producing good work. Rant over!

As a next-to-next-best approach, if you really aren't allowed to use anything other than ROOT (maybe you depend on a bunch of ROOT analysis macros written by someone else, very probably you need to at least read ROOT data files), we can try to use the good bits of ROOT and minimise the interaction with the lame bits. This primarily involves ignoring CINT entirely and using ROOT as a library --- note that you will still have to deal with the world's worst class structure! Hence, in addition I try to write STL wrapper classes of my own when possible. This tends to occur on an ad-hoc basis. Note that if ROOT had been done right in the first place, no-one would ever have to do any of these things. You can find some workarounds described in my article on basic ROOT usage, which in fact consists entirely of workarounds since any attempt to do robust statistical analysis in ROOT is made hideously complicated by its flaws! If I haven't convinced you of that by now, I never will!
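For a flavour of what such an ad-hoc STL wrapper might look like, here is a minimal sketch of a helper that copies a 1D histogram's in-range bin contents into a `std::vector`. The helper name `binContents` is hypothetical. It is written as a template so that it compiles without ROOT: it only assumes the argument provides `GetNbinsX()` and `GetBinContent(int)`, which are the accessor names `TH1` uses, together with ROOT's convention that bin 0 is the underflow and bin N+1 the overflow. `FakeHisto` is purely a stand-in for exercising the helper, and I have not checked the sketch against any particular ROOT release.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copy the in-range bin contents (bins 1..N) of a 1D histogram into a vector.
// Histo is any type with GetNbinsX() and GetBinContent(int); the underflow
// (bin 0) and overflow (bin N+1) are deliberately left out.
template <typename Histo>
std::vector<double> binContents(const Histo& h) {
  const int n = h.GetNbinsX();
  assert(n >= 0);
  std::vector<double> out;
  out.reserve(static_cast<std::size_t>(n));
  for (int bin = 1; bin <= n; ++bin) out.push_back(h.GetBinContent(bin));
  return out;
}

// Stand-in type used only to exercise the helper without linking to ROOT.
struct FakeHisto {
  std::vector<double> bins;  // index 0 = underflow, last index = overflow
  int GetNbinsX() const { return static_cast<int>(bins.size()) - 2; }
  double GetBinContent(int bin) const {
    return bins.at(static_cast<std::size_t>(bin));
  }
};
```

Once the contents are in a `std::vector`, the rest of the analysis can use standard algorithms and containers and never touch the histogram class again.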


Thanks for reading and please feed back your thoughts to me. Hopefully someone will listen and ROOT can be made into a well-designed, robust data analysis system for the LHC. [3]

Footnotes