Is MATLAB a compiler?

Snoopy on 24 Sep 2017
Edited: Michael Listovski on 23 Aug 2020
I have been using MATLAB for some time, but I am still not clear on how MATLAB compares to some alternative software in the following respect. One of these alternatives is Fortran. Some people claim that their optimisation algorithm works (much) faster in Fortran than in MATLAB or in R. When I ask why, the answer I get is that MATLAB is a compiler. What does this mean? What is it that MATLAB is compiling and Fortran is not? And if it is indeed true that MATLAB is slower than Fortran for some optimisation algorithms, what exactly is causing this?
2 comments
Michael Listovski on 23 Aug 2020
Edited: Michael Listovski on 23 Aug 2020
As you can see, the ongoing debate is between compilers and interpreters. Think of it like this: a compiler makes your code do what you tell it to do by creating an "exe" file that remembers what needs to be done, while an interpreter tries to "interpret" the best way for your code to do what you want it to do.
A direct example is simply solving a linear system A X = B. To solve this in C/C++/Python you need to know what your matrix A looks like and which numerical method is most suitable for it. On top of that, you need to find a good library that provides those methods, and you need to handle the data types. You cannot write
X = inv(A) * B or
X = solve(A,B) or
X = A \ B
as in MATLAB (you probably could write the second one). The solution in C++ would most likely look something like
array_data_type X = library_name::method_name(A,B,'very_specific_solver_type','option1','option2'...)
and you would need to take great care to match the data types of A and B with the function call and to understand some of the options. Of course, if you don't know which solver to use, you can always call the general one, and that may cost you some time even in Fortran/C++.
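To make the MATLAB side of that comparison concrete, here is a minimal sketch. The 3 x 3 matrix and the right-hand side are made-up values chosen only for illustration; the point is that both one-liners run without any solver or data type being named, and that they agree on a small, well-conditioned system.

```matlab
% Small, well-conditioned test system (values are illustrative assumptions).
A = [4 1 0; 1 4 1; 0 1 4];
B = [1; 2; 3];

X_backslash = A \ B;        % MATLAB picks a suitable factorization itself
X_inverse   = inv(A) * B;   % forms the explicit inverse first

% Both routes should agree to rounding error on a system this small.
residual   = norm(A * X_backslash - B);
difference = norm(X_backslash - X_inverse);
fprintf('residual = %.2e, difference = %.2e\n', residual, difference);
```

For larger or badly conditioned systems the backslash form is generally the safer choice, since it avoids forming the inverse explicitly.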
The advantage of MATLAB is that it tries to do a lot of the "nasty" work for you in terms of syntax. You don't have to worry about data types or optimal methods. However, that is also its biggest issue, because every time you run a script MATLAB needs to figure out which method and which data type would solve your problem best.
For example, say you are solving a differential equation by the finite difference method, given by matrix A, over multiple time steps stored in array t, so you write this code in MATLAB:
A = make_fdm_system_matrix(parameters);  % your own function that builds A
for i = 1:length(t)
    X(i,:) = solve(A,B);                 % solve the same system at every time step
end
It looks very simple, right? You have some function that creates matrix A, and then you iterate through time, solve in each step, and place the solution in X as a two-dimensional array.
Now think about how MATLAB will actually interpret this code. Matrix A is built however you wrote the function, the loop starts, and MATLAB sees that it needs to solve something by calling its own solve method. Bear in mind that MATLAB now needs to check its path to locate the library and the code for the solve function. Then, as someone mentioned, 120000 lines of code need to be looked up by MATLAB to determine the best method to invert matrix A and multiply by B. On top of that, each variable you have needs the appropriate data type, so if you have a 5000 x 5000 matrix there will be additional time that MATLAB needs to figure out what to use. And you say OK, it took some amount of time s to solve it in the first iteration, what happens in the next? The same bloody thing. So if you have 1000 time steps, the execution time will be 1000 times s.
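One common way to avoid paying that cost in every iteration, when A does not change between time steps, is to factor A once before the loop and reuse the factors. The sketch below assumes a 1-D Laplacian as a stand-in for the finite difference matrix and a made-up right-hand side per step; neither comes from the code above.

```matlab
% Hypothetical stand-in for the finite difference matrix: 1-D Laplacian.
n = 200;
e = ones(n, 1);
A = full(spdiags([-e 2*e -e], -1:1, n, n));
t = linspace(0, 1, 50);            % time steps (illustrative)

[L, U, P] = lu(A);                 % factor once, outside the loop: P*A = L*U
X = zeros(length(t), n);           % preallocate the result
for i = 1:length(t)
    B = sin(t(i)) * e;             % made-up right-hand side for step i
    y = L \ (P * B);               % forward substitution
    X(i, :) = (U \ y).';           % back substitution
end

% Check the last step against the original system.
res_last = norm(A * X(end, :).' - sin(t(end)) * e);
fprintf('residual at last step = %.2e\n', res_last);
```

Each pass through the loop then costs only two triangular solves. Newer MATLAB releases also offer a decomposition object that wraps the same idea.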
This is why MATLAB is so slow. You cannot have simplicity and efficiency at the same time. MATLAB uses horrible generalizations to simplify its syntax. Try typing isinteger(5) in MATLAB. The function checks whether a value is stored in an integer data type, and the result is false, because MATLAB treats every number you type as a double, simply because that is more general and easier to use. On top of that, the MATLAB interpreter is not all-powerful. Say that your finite difference discretisation is tridiagonal and you are solving a 5000 x 5000 problem. If you store that matrix as an ordinary full array, MATLAB will not exploit the tridiagonal structure the way a dedicated tridiagonal routine does; it will work on the whole 5000 x 5000 array, which takes a lot of time for a matrix of that size. On the other hand, everyone with some knowledge of algorithms and numerical methods knows the Thomas algorithm: instead of inverting a 5000 x 5000 matrix, you store its three diagonals in three arrays, write a short forward and backward sweep, and apply direct formulas to get the solution (note that general methods for a dense matrix can never be reduced to simple loops like that).
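For reference, here is a minimal sketch of the Thomas algorithm in MATLAB itself. The diagonals and right-hand side are illustrative assumptions (a diagonally dominant system, so no pivoting is needed). The last lines cross-check against backslash on the same matrix in sparse storage, where, as far as I know from the mldivide documentation, a banded solver is selected.

```matlab
% Tridiagonal system: sub-diagonal a, diagonal b, super-diagonal c.
n = 5000;
a = -ones(n, 1);  a(1) = 0;      % a(1) is unused
b = 4 * ones(n, 1);              % diagonally dominant, so no pivoting needed
c = -ones(n, 1);  c(n) = 0;      % c(n) is unused
d = ones(n, 1);                  % right-hand side (illustrative)

% Forward sweep: eliminate the sub-diagonal.
cp = zeros(n, 1);
dp = zeros(n, 1);
cp(1) = c(1) / b(1);
dp(1) = d(1) / b(1);
for i = 2:n
    denom = b(i) - a(i) * cp(i-1);
    cp(i) = c(i) / denom;
    dp(i) = (d(i) - a(i) * dp(i-1)) / denom;
end

% Back substitution.
x = zeros(n, 1);
x(n) = dp(n);
for i = n-1:-1:1
    x(i) = dp(i) - cp(i) * x(i+1);
end

% Cross-check against backslash with the same matrix in sparse storage.
A = spdiags([[a(2:n); 0], b, [0; c(1:n-1)]], -1:1, n, n);
x_ref = A \ d;
max_diff = max(abs(x - x_ref));
fprintf('max difference = %.2e\n', max_diff);
```

Only three length-n arrays are stored, and the work grows linearly with n.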
You cannot escape the more serious coding that other languages require. For serious work you really need to know which method solves your problem best. For instance, I had a minimization algorithm in which I was iteratively solving a system of differential equations, which meant inverting several 3000x3000 matrices and one 1000x1000 matrix using some eigenvalue solvers. In simpler terms, I had a system of differential equations, solved by the finite difference method in a loop because I needed the solution to converge, and this loop was nested in a minimization algorithm, which is again another loop, so the code was something like
while (minimization_converge ~= 1)
    while (iterative_solver_converge ~= 1)
        solve_3000x3000_system();
        solve_3000x3000_system();
        solve_1000x1000_system();
    end
end
I did this using the Armadillo library in C++. How much time do you think it takes to do this there? Less than 1 minute! MATLAB will never ever achieve such speed. Why? Well, think of the previous example: every time you write solve_system, MATLAB needs to look it up, and this takes time, and the more loops you nest, the slower your code is. In any language, you have to choose the best solver straight away. Note that nesting is just one of the issues MATLAB has. Another terrible thing about MATLAB is its workspace. It is great that you can look up any variable your code makes and easily debug your code. However, no one has an infinite amount of memory, and if you have ever checked in Task Manager how much MATLAB is using, then you know how bad it is, and this slows down both your code and your machine. You can make your MATLAB code run faster by knowing exactly which options to pass to the functions you are using (to spare the interpreter the trouble), but the memory use and the data type conversion that happen in the background will still significantly affect your code's performance.
Don't get me wrong, I use MATLAB all the time. It is amazing for visualising simulation results and for running simple codes, as MATLAB syntax is basically pseudocode and I do not need to spend time thinking about code construction and compilation as in C++.
However, my overall advice: if you are nesting a lot of for or while loops in MATLAB and using some of its functions within them, your code will get slower and slower, and you should consider switching to another language.
EDIT: The only problem with a different language is the implementation effort. You can write pretty much exactly the code above and it would work in MATLAB. In C++, however, I needed to write a separate class to deal with the LAPACK and BLAS routines that solve the differential equations and those 3000x3000 systems, then a separate class to deal with how Armadillo solves the 1000x1000 systems, and all this was happening in the class that was running the inner while loop. The outer while loop is done using the GSL Brent minimization algorithm, and if you google it, you will find that GSL needs a very specific way of formulating a problem before you can use its solvers: you need to provide the function of the problem in the form function(double x, pointer parameters) and then use the solvers. What you won't find easily online is how to make the pointer for your parameters; most people do it as a structure. I made a class with the parameters as its fields, put the GSL function in as a static method, and then made a method in that class that uses the static method and passes "this" as the parameters pointer, and then used the GSL instructions and the appropriate solver. It is a very elegant solution, but the code I needed to write required advanced knowledge of C++, and it looks very complicated in comparison with the equivalent solution in MATLAB.
So, to conclude, estimate the complexity of your algorithm first. If it's not too complicated, implement it in MATLAB. If you're not sure, implement it in MATLAB and check how long it runs. If you're lucky, it will give you results in a decent time. If you aren't, you simply have to learn another language. The essence of the algorithm you wish to implement will not be much different from the approach in MATLAB, but the syntax and the use of specific libraries and their functions will force you to get much, much deeper into those languages. With great effort, though, there will be a great reward in terms of performance.
Snoopy on 23 Aug 2020
Thanks a lot for this elaborate answer. It is a very contributing explanation.


Accepted Answer

Jan on 24 Sep 2017
Edited: Jan on 24 Sep 2017
No, MATLAB is not a "compiler", but an "interpreter". A compiler converts the source code to an executable file, which is no longer readable by humans. When working with an interpreter, the readable source code remains the basis of what is executed. But even in MATLAB the code is interpreted and optimized, here by the "JIT accelerator". This removes, for example, redundant code and reorders the commands if that improves the processing speed. The accelerated code is not written to disk, but stored in RAM only. It is debatable whether this can be called "compiling" or not.
In contrast to MATLAB, FORTRAN compiles the source code into an executable file. This takes some time, but it is done only once, while MATLAB interprets the source code each time it is loaded for the first time in a MATLAB session (or after the user has called the bad clear all).
It is true that some code is processed faster in FORTRAN or C than in MATLAB. MATLAB's JIT accelerator could work more efficiently, I assume. But as usual I point out that MATLAB, like other programming languages, is not designed for benchmarks only, but to solve problems. The time to solve a problem consists of different parts:
total time = design + programming + testing and debugging +
documentation + run time
When I create a tiny function for a linear algebra problem:
x = B \ (A * b + c) % A, B: Matrices, b, c: vectors
this can be done very compactly in MATLAB. Internally, very fast C/Assembler routines of the BLAS and LAPACK libraries are called to perform the calculations. You can call these functions from C or FORTRAN as well, but the calling is much more complicated and prone to bugs. Therefore I assume that programming and debugging will take more time in FORTRAN, and especially in C, than in MATLAB.
2 comments
Cedric on 24 Sep 2017
Edited: Cedric on 24 Sep 2017
To complement Jan's answer (the 1st version), whether MATLAB or FORTRAN is "faster" is really application-specific.
In MATLAB, many operations add an overhead to the execution time, not only because of the "compiled vs interpreted" difference (and to be honest, JIT-accelerated is much better than just "interpreted"), but because MATLAB performs extra operations, such as checking that what is passed to functions is appropriate and sometimes performing some analysis to determine the best internal method for accomplishing the operation.
A good example is the backslash operator (mldivide). While it looks trivial ( x = A \ b ), it is the outcome of ~120,000 lines of code (according to Tim Davis if I remember well, EDIT: for sparse matrices, here, p.147), and it chooses the best method depending on the nature of its inputs (symmetric vs non-symmetric, etc).
This should make you realize that while the overhead introduced by MATLAB for analyzing A when you solve a specific linear system makes it slightly slower than if you call directly the most appropriate function in C/FORTRAN, the amount of time that you will need to acquire the body of knowledge for understanding which method is optimal and for implementing it efficiently will make the "time to solution" much longer using C/FORTRAN than using MATLAB. In addition, when you gain insights about the various types of matrices and methods, you can exploit e.g. LU or Cholesky decompositions wisely to accelerate the process.
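As one illustration of exploiting a decomposition once you know something about your matrix, the sketch below reuses a Cholesky factor for two right-hand sides. The symmetric positive definite matrix is constructed artificially for the example; with a matrix from a real problem you would first need to know that it is symmetric positive definite.

```matlab
% Symmetric positive definite test matrix (constructed only for illustration).
n = 300;
M = rand(n);
A = M * M.' + n * eye(n);

R = chol(A);                     % A = R' * R, computed once
b1 = ones(n, 1);
b2 = (1:n).';

% Two triangular solves per right-hand side; the factorization is reused.
x1 = R \ (R.' \ b1);
x2 = R \ (R.' \ b2);

rel_res1 = norm(A * x1 - b1) / norm(b1);
rel_res2 = norm(A * x2 - b2) / norm(b2);
fprintf('relative residuals: %.2e, %.2e\n', rel_res1, rel_res2);
```

Calling A \ b1 and A \ b2 separately would give the same answers, but the analysis and factorization of A would be repeated for each call.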
This brings me to the fact that often the time to solution is what matters, and the run time is only a fraction of it. Then the question is which overhead is the most important, the one at run time, or the one associated with learning/programming?
Cedric on 24 Sep 2017
Isn't it funny that we both thought about \ ;-)


More Answers (2)

Image Analyst on 24 Sep 2017
MATLAB has a compiler: http://www.mathworks.com/access/helpdesk/help/toolbox/compiler/ but it is an extra toolbox that you have to buy. It compiles your source code into a standalone program that others can run without having to buy MATLAB. As I understand it (and use it), it does not speed up the programs over running source code in the development environment.

Snoopy on 24 Sep 2017
From the answers above, I understand that a compiler and an interpreter are two very different things, and therefore it does not make (much) sense to ask if MATLAB is a compiler or an interpreter. I will now summarise what I understand an interpreter to be. Please point out any misinterpretations, and feel free to edit my message so that we do not need another round to come to a conclusion (if you are able to edit my message). MATLAB is an interpreter, and interpretation is about deciding how to carry out a calculation. That said, MATLAB could be slower according to some benchmark, but this is most likely because it spends some time selecting the most robust calculation method before carrying out a calculation. The definition of a compiler is given above. But I am still not so clear whether or how it relates to interpreting code.
4 comments
Walter Roberson on 24 Sep 2017
MATLAB is a Threaded Interpreter. "In computer science, the term threaded code refers to a programming technique where the code has a form that essentially consists entirely of calls to subroutines." (wikipedia).
When code is scanned in, data structures are created that represent the code; those data structures are pretty much machine independent, and describe things like which parameter number to fetch a value from, and which subroutine calls need to be made. These data structures are not in assembly language: they use internal codes meaningful only to the MATLAB interpreter. The "MATLAB Compiler" is mostly the process of doing this kind of parsing and writing the resulting data structures out to files. (To emphasize, MATLAB Compiler does produce a machine-language executable but the executable is the part of MATLAB without the parser, the part that knows how to interpret the data structures.)
Executing MATLAB consists of an execution engine examining the "current" location in the data structure to figure out what to do, making the appropriate calls, and choosing which location in the data structure will be executed next.
Interpreted code is almost always slower than compiled code.
However, there are a number of operations in MATLAB where MATLAB detects that it is working with "large enough" arrays that are being manipulated according to a pattern that it recognizes; when it detects this, instead of doing the calculations itself, MATLAB calls into a high-performance, already compiled library. So not everything in MATLAB is interpreted. If it happens that the code is such that most of the time is spent in the high-performance routines, then the performance of MATLAB can approach that of C/C++ calling the same high-performance libraries.
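A rough way to see the difference is to compute the same quantity twice, once element by element and once as a single array operation. The sketch below uses a sum of squares on random data as an assumed example. The timings depend on the machine and the MATLAB release, and the JIT may narrow the gap, so treat the printed numbers as indicative only.

```matlab
n = 1e6;
v = rand(n, 1);

% Element-by-element loop: every iteration goes through the execution engine.
tic;
s_loop = 0;
for k = 1:n
    s_loop = s_loop + v(k)^2;
end
t_loop = toc;

% Single array operation: one call that is handed to compiled code.
tic;
s_vec = v.' * v;
t_vec = toc;

fprintf('loop: %.4f s, array operation: %.4f s\n', t_loop, t_vec);
```

The two results agree up to rounding, since the summation order may differ between the two forms.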
"Interpreter" is not, in itself, about deciding how to handle a calculation; "interpreter" is more about using data structures to represent what needs to be done and making calls to pre-written C/C++/Fortran routines to do the real work. This has some advantages in debugging and rapid code changes and in portability.
Star Strider on 25 Sep 2017
@ Walter —
If I could give a separate vote to your Comment here, I would.
