Testing and Analysis

I first carried out a test for correctness. The make_lp function generates a new random LP instance. Its inputs (in1, in2, in3) are the two dimensions of the A matrix and its sparsity, respectively. The presolver function generates the smaller LP problem, which is fed to the solver and then postsolved into a feasible optimal solution for the original problem.
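To make the flow concrete, here is a minimal sketch of one correctness iteration. make_lp and presolver are the names used above; glpk_solve, postsolve, and the comparison tolerance are stand-ins I am assuming, not the notebook's exact API.

```julia
# Sketch of one correctness iteration; glpk_solve and postsolve are placeholder
# names for the solve and postsolve steps, not the notebook's actual functions.
function one_correctness_iteration(m, n, sparsity)
    lp, x_known = make_lp(m, n, sparsity)      # random instance with a known optimum
    reduced     = presolver(lp)                # smaller LP produced by the presolve
    x_small     = glpk_solve(reduced)          # solve the reduced problem
    x_full      = postsolve(reduced, x_small)  # map back to the original variables
    return isapprox(x_full, x_known; atol = 1e-6)
end
```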

After several runs of 100 iterations each, I haven't found an LP instance whose solution differed once the presolve was applied. The final message, "Presolve works subject to randomized testing", indicates that every iteration in each run gave a positive result.

There is some manual control over the LP instances and their solutions. Although I am generating feasible optimal instances, there is a possibility of alternate optimal solutions to the same problem; to prevent this, I add an extra constraint that forces the solver into our desired solution: the lower bound of every variable is set to the original x used to generate the b vector.
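The snippet below is a self-contained sketch (not the notebook's actual make_lp) of how such an instance can be built: generate x first, derive b from it, and pin the lower bounds to x so alternate optima are ruled out.

```julia
using SparseArrays

# Hypothetical generator in the spirit of make_lp described above.
function make_lp_sketch(m, n, sparsity)
    A  = sprand(m, n, sparsity)   # random sparse constraint matrix
    x  = rand(n)                  # the solution we want the LP to have
    b  = A * x                    # right-hand side consistent with x
    c  = rand(n)                  # arbitrary objective
    lb = copy(x)                  # lower bounds force the optimum onto x
    ub = fill(Inf, n)
    return A, b, c, lb, ub, x
end
```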

The main function executed in the notebook is do_tests, which takes two boolean inputs. The first boolean enables the correctness tests and the second enables the time-profile tests; setting either field to true carries out the respective tests.

Inside the do_tests function, one can change the inputs to both types of tests and play around to check the script.
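For example, the calls look roughly like this (the argument order follows the description above; the exact API is whatever the notebook defines):

```julia
do_tests(true, false)   # run only the correctness tests
do_tests(false, true)   # run only the time-profile tests
do_tests(true, true)    # run both
```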

The time-profile test is run by setting the corresponding boolean input field to true in the do_tests call. It calls the function time_test on inputs that scale by a factor of 10 to measure how the time cost grows.

The time_test function generates one random LP instance according to the given inputs and measures the time taken by the GLPK solver with its presolve turned on, compared against the time taken by our presolving procedure followed by the solver with its internal presolve turned off.
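A rough sketch of what that comparison measures is below; solve_glpk and presolver are hypothetical wrappers standing in for the notebook's actual code, and @elapsed stands in for whatever timing mechanism it uses.

```julia
function time_test_sketch(m, n, sparsity)
    lp = make_lp_sketch(m, n, sparsity)                   # one random instance
    t_glpk = @elapsed solve_glpk(lp, presolve = true)     # GLPK with its own presolve
    t_ours = @elapsed begin
        reduced = presolver(lp)                           # our presolve
        solve_glpk(reduced, presolve = false)             # GLPK with presolve turned off
    end
    return t_glpk, t_ours
end
```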

Analysis

This is a sample output

| Size | Sparsity | GLPK with Presolve (s) | GLPK Memory Allocation (count : size) | Overall Presolve Time (s) | Overall Presolve Memory Allocation (count : size) | singleton_rows! (s) | make_new (s) | Size of new problem |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 10 x 10 | 0.3 | 0.000586 | 490 : 13.750 KB | 0.001928 | 1.36 k : 45.297 KB | 0.000027 | 0.000044 | 7 x 8 |
| 100 x 100 | 0.01 | 0.002148 | 4.27 k : 99.844 KB | 0.003610 | 3.28 k : 118.078 KB | 0.000242 | 0.000624 | 46 x 49 |
| 1000 x 1000 | 0.001 | 0.018370 | 48.93 k : 1.036 MB | 0.040094 | 19.05 k : 905.125 KB | 0.018984 | 0.012776 | 329 x 326 |
| 10000 x 10000 | 0.0001 | 0.186067 | 552.94 k : 11.304 MB | 5.038997 | 225.14 k : 23.665 MB | 2.090054 | 2.861917 | 3281 x 3266 |

The time profile can be viewed in the IPython notebook Presolvnb, and we can see that a major chunk of time is spent in the singleton_rows call and the make_new function. While both of these are expected, we still need to reduce the time taken by the singleton_rows function. Initially, with the rowindices approach, the time taken was several times more, so bit vectors have helped. In fact, make_new takes far too much time for what is really just building the smaller problem data; there could be a more efficient way.
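To illustrate the bit-vector idea (this is a simplified stand-in, not the notebook's singleton_rows!): count the nonzeros per row in one pass over the stored entries and flag singleton rows in a BitVector instead of collecting explicit row-index lists.

```julia
using SparseArrays

function singleton_rows_sketch(A::SparseMatrixCSC)
    counts = zeros(Int, size(A, 1))
    rows, _, vals = findnz(A)          # stored entries, one pass
    for (i, v) in zip(rows, vals)
        v != 0 && (counts[i] += 1)     # count actual nonzeros per row
    end
    return counts .== 1                # BitVector: true where a row has exactly one nonzero
end
```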

So far, our presolve works best on sizes less than 100,000: it has fewer memory allocations and less garbage collection compared to the GLPK solver. 100,000 is significant because that is the size of the input problem at which GLPK takes 0.5 seconds.

Ideally, our presolve should become more important as the input size scales, but at input sizes of 100,000 the time taken by the solver is still very small, and we end up spending much more effort finding the redundancies. This is not good. Another point of concern is that the GLPK solver seems very finely tuned and well thought out; it works efficiently even on an input size of 100,000.

One thing to note though - I have never personally used solvers for problem instances this big; maybe their sparsity is a lot different from what I imagine. The ENRON email dataset has a sparsity factor of 0.00013 on 33,000 nodes. I guess there are no real underlying rules that these datasets need to abide by. I took a rule of thumb of the sparsity reducing by a factor of 0.1 as the data size increases by a factor of 10. I can't scientifically justify this decision, but it seemed reasonable, and for sizes of 100,000 and more it was within the capacity of my local RAM. I have a hunch that the sparsity factor in real-world datasets in fact changes by a factor of more than 0.1.

Also, technically speaking, I have conducted all my tests on square matrices, where the number of variables equals the number of constraints. This seems like a reasonable starting point for our analysis, but quite possibly in industry (or amongst users) the dimensions can be very imbalanced. I am not very clear on the algorithmic procedures used by the solver either, so they might also be sensitive to imbalanced dimensions.

The profile view seems very convenient, but I think someone with more experience in the Julia community will be able to spot other places in the code that are less than ideal.

There are multiple directions in which we can focus the speed improvements -

  1. Optimizing my coding style and other tricks that guarantee better performance in Julia

  2. Mimicking more of GLPK's implementation

  3. Algorithmic changes

Some topics that I haven't touched (which could improve performance) due to my lack of knowledge - GPU vs CPU and parallel processing, working at a lower level of Julia internals, and using ccalls.

There are some specific issues that I think I should be able to improve very soon. Every call to a function that finds and marks redundancies traverses the entire list of non-zero entries. We could instead make a single pass through the non-zero elements and mark the redundancies. But doing a single pass would mean neglecting some redundancies, since the removal of one row singleton can create other row singletons. Maybe we will have to sacrifice those redundancies for overall performance.

In GLPK, this seems to be the method followed: elements are processed from a queue of non-zero entries, with detection and resolution happening per element.
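Below is a sketch of that queue-driven idea (in the spirit of GLPK's approach, not its actual code): when a singleton row fixes its variable, the corresponding column is dropped, which may turn other rows into singletons that then join the queue.

```julia
using SparseArrays

function eliminate_singletons_sketch(A::SparseMatrixCSC)
    m, n = size(A)
    rowcount = vec(sum(A .!= 0, dims = 2))          # nonzeros per row
    active_col = trues(n)
    queue = [i for i in 1:m if rowcount[i] == 1]    # initial singleton rows
    removed_rows = Int[]
    while !isempty(queue)
        i = popfirst!(queue)
        rowcount[i] == 1 || continue                # count may have changed since enqueueing
        j = findfirst(k -> active_col[k] && A[i, k] != 0, 1:n)
        j === nothing && continue
        push!(removed_rows, i)
        rowcount[i] = 0
        active_col[j] = false                       # variable x_j is fixed; drop column j
        for r in findall(!iszero, A[:, j])          # rows touching column j lose a nonzero
            if r != i && rowcount[r] > 0
                rowcount[r] -= 1
                rowcount[r] == 1 && push!(queue, r) # a new singleton row appears
            end
        end
    end
    return removed_rows
end
```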

Another issue is having views of a matrix versus creating a new matrix. If, after the calls to empty_row and empty_col, we could feed a reduced bit matrix to the singleton function, that could be a speed-up compared to passing the whole original problem every time. But how would we pass a reduced matrix? We would have to delete the rows and columns that were empty (a commonly occurring redundancy), but I'm not quite sure how that works in Julia. So far, my Stack Overflow searches tell me that this feature in MATLAB actually results in lower performance and that it isn't there in Julia. One workaround is constructing appropriate slices and doing hcat or vcat, but this could lead to heavy memory allocation. The other alternative is having array views as a feature and accessing only selected indices of a matrix. Why would resolving this lead to a speed improvement? The initial problem might be 10000x10000, but it can be as small as 100x100 after removing the first few redundancies, so passing dynamically smaller problems to the more complicated redundancy checks makes sense. More needs to be done on this.
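A small illustration of the copy-vs-view distinction, using present-day Julia's @view macro (the array-views feature discussed above); the masks here are just made-up examples of which rows and columns survive the empty_row/empty_col passes:

```julia
A = rand(6, 6)
keep_rows = [true, false, true, true, false, true]   # rows left after removing empty rows
keep_cols = [true, true, false, true, true, false]   # columns left after removing empty columns

B_copy = A[keep_rows, keep_cols]                          # allocates a new 4x4 matrix
B_view = @view A[findall(keep_rows), findall(keep_cols)]  # indexes the original data, no copy
@assert size(B_copy) == size(B_view) == (4, 4)
```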

This entire journey has been a little humbling. Andrew Makhorin, the Russian professor who wrote and maintains GLPK, obviously seems to have been through all these stumbling blocks, and most of my attempts to fix them are very close to the actual implementation in GLPK, which makes me wonder if I reinvented the wheel in a few places. At this point, I think surpassing his performance would be a pretty sweet achievement! I can see why he and the authors of other packages haven't exhaustively hunted down redundancies: ultimately it is a tradeoff between the time taken to find a redundancy and the time taken to solve a bigger original problem. Also, from what I understand, CPLEX is the industry standard and has a well-established presolver, so it might be performing a lot quicker than GLPK, which is itself an open-source effort in C.

The closer I move to his implementation, the more I feel the need to differentiate. We already have a wrapper for GLPK and access to its presolver, so I hope that by the end of the project I have more to show as a positive than just a native Julia implementation.