LeetCode – Max Points on a Line

Introduction

Max Points on a Line is the third least accepted (against submitted solutions) problem on LeetCode; it is also the second hardest problem in the ‘hard’ category. At first glance, the problem seems simple – from a set of points, determine the longest straight line that can be formed from those points and return the number of points in the subset. However, as the saying goes, if you assume it makes an ass out of u and me.

This is my write-up detailing the approach I took to solving the problem and the issues I encountered along the way. Lessons learnt can always be taken from one context and applied to another – hopefully you find this useful.

Approaches

I explored two options when it comes to solving this problem, both explained below. An ‘agricultural’ approach which I quickly abandoned, and a ‘formulaic’ approach that did work but encountered some interesting issues.

Whilst I won’t be posting my solution in this article, you can find my implementation on GitHub (linked in the Social Banner above).

Agricultural

Having forgotten some high school mathematics – my brain instantly went trying to understand how an algorithm based on scanning and or recursion / iteration could solve the problem. Could I determine the ‘step’ between two points and then for each remaining point, see if I continue the step pattern whether or not the point in question would be reached.

The algorithm quickly became complex – you would obviously need to check the step in both positive and negative directions (reducing performance), you may want to start finding the point in the line that would be closest to the point your analysing to slightly improve the performance, etc.

Good problem solving doesn’t mean coming to the correct solution first time – it’s also recognising when an approach isn’t working and looking for another way. y = mx + b.

Formulaic

The straight line equation in its well known form will determine y for a points associated x where the slope (m) and y-intercept (b) of the line are known.

My initial approach was to take two points in an existing line (any two points can be the start of a line) and determine their slope m = ( (y1 – y2) / (x1 – x2) ). The remaining variable to determine was the y-intercept which with the formula rearranged stated b = y – mx. I knew the slope from the previous calculation and I know x and y from an existing point so I could determine b, the y-intercept. The final step was to determine if for the given x of the point being analysed, given the y-intercept and the defined slope, did the resulting y == y of the point being analysed.

The approach worked – I went from passing 50% of the test cases using the agricultural approach to passing approx. 90%. But what was causing the final 10% to fail? Precision.

In day-to-day life as a Software Engineer, precision doesn’t usually impact you unless you’re working on financial systems, sensor based system, and other similarly complex, numerically focussed applications.

My solution was based on JavaScript which by default represents all floating point numbers as 64-bit floats. This means a number with a fractional component can be represented up to 17 decimal places, and even then may not be accurate. Both of these limitations caused issues.

The following test case enabled me to identify the issue:

[[0,0],[94911151,94911150],[94911152,94911151]]

Whilst the final two points are very close to each other, when on a slope with 0,0, all three do not form a straight line. If we try to determine the slope of the line containing the first two points, we would calculate -94911150/-94911151 – the answer according to the JavaScript engine is 0.9999999894638303. To determine the slope between the first and third point, we would do -94911151/-94911152 – the answer according to the JavaScript engine is 0.9999999894638303. Brilliant! All three points are on a straight line! But they’re not…

Use a high precision calculator and you’ll see that -94911150/-94911151 = 0.999999989463830230022181482131641201991112719726684170124541003617161907 and -94911151/-94911152 = 0.999999989463830341033053734296682016882483946670460811601991723796588202.

You can see around digits 16/17, the answer begins to vary… the slope is not the same. However, due to JavaScripts (although this problem isn’t limited to native JavaScript) precision limit and accuracy issues (i.e. rounding) mean that for this problem, I am unable to correctly to determine the slope. So, how do you solve the problem?

Initially I knew I had a precision error, but I wasn’t sure whether it was determining the slope, y-intercept, and y value for the cross-comparison. My first thought was do I need to determine the y-intercept and the y value? Surely if two points are on the same slope as two other points on the line, we’re good. Having removed the other elements, I noticed the precision issue above.

My immediate thought was that by using 0,0 as a comparison point, we were working with an answer for the slope that was extremely precise. If another point was used, the result may not require the same precision. If points 2 and 3 form the input were used to determine the slope, the result would be 1 (-1/-1). As 1 != 0.9999999894638303 – we pass the test case. This solution worked for all test cases and the ultimate acceptance of the solution. However, it’s obvious to see for test cases that result in a similar precision no matter what points are selected, the solution would encounter problems.

Analysis

LeetCode provides some great solution analytics – at the time of solving the problem, my solution execution time was 79% faster than other JavaScript solutions. My assumption is that that these solutions may have taken more agricultural approaches.

There are a few key takeaways from working through this solution:

  1. All programming languages have limitations (of which some can be solved with third-party libraries – in this case, several BigNumber libraries). Whilst they may not impact you day-to-day, if you’re encountering ‘strange’ issues, they’re good to have in the back of your mind. Worse, often the runtime won’t throw any errors to tell you it has automatically rounded a result or is not accurate past a certain point.
  2. Formal Specification techniques are used in critical systems to at runtime ensure code is behaving according to some mathematical constraints. Whilst formal specifications may not be necessary for all projects, there’s nothing wrong with having code to check the validity of results. For example, if adding two decimals both with two digits following the decimal point, the result won’t have more than two digits following the decimal point. If it does, you know something has gone wrong.
  3. Understand what the minimum required steps are to solve the problem – my initial formulaic approach trying to understand the y value was adding complexity to the solution and making it harder to debug. I only needed to calculate the slope to solve this problem.
  4. Another bug I encountered was due to not fully understanding the specification – my assumption (the second time I assumed wrong in solving this problem!) was that the points had to be consecutive according to the step function. This wasn’t the case and was not clearly stipulated in the problem description.

Conslusion

This problem was great fun to solve. Day-to-day in the workplace, my clients are using software in ways were these sorts of problems do not manifest themselves. But they exists – it’s important as a Software Engineer to be aware of them and to challenge yourself in ways that you may not be challenged in your day job (not that it lacks its own unique set of challenges!).

Apart from the usual cliches such as testing is only as good as the test cases and not fully understanding requirements is the downfall of many IT projects, I will take two ideas from this exercise:

  1. Testing can be built into your code – if the result of performing an action is that n number of things should be in a list (without knowing the contents), add some code to check that n things are in the list. If not, an issue must have occurred (either data or code)
  2. Understand what the minimum required steps are to solve the problem – initially you may just want to solve a problem, with the result not being the most elegant solution. Even if you don’t encounter issues, refactoring is important. It results in less instructions, potentially leading to less bugs in future; for those bugs that do occur, the solution will be easier to debug.