Synaptic Knowledge – Making sense of Twitter

Twitter is a platform of over 340 million users, producing over 500 million tweets each day. Even just an insight into 1% of those tweets has the potential to provide a decent understanding into what’s happening in the world. If something is in the public domain, it’s on Twitter.

This post explores a technique to digest tweets down into a data structure that allows for user interaction, breaking story identification or even brand sentiment analysis.

The process begins by processing data from Twitter – for-which there are a number of approaches.

Data Processing

Data can be processed in many ways – two common to analytical processing are Batch & Stream Processing. At a high-level, the distinction is that with Batch Processing, the dataset for processing exists before processing begins. With Stream Processing, the dataset is not known ahead of time but instead arrives ‘bit-by-bit’.

Batch Processing

Batch Processing is the most common technique deployed for analytical workloads – perhaps each evening you want to take the days sales from your store and identify trending products, or perhaps you want to analyse the output of a collection of sensors following a rocket test to understand mechanical stresses that are felt across the vehicle. Batch Processing takes a defined amount of data as input at a specific time (t) and performs a series of actions upon it to create an output after-which processing ends.

However, sometimes we don’t want to wait for the entirety of the dataset to be available before we start processing it. Perhaps it’s not possible to have the entire dataset available prior to processing as the data does not yet exist. Regardless, the questions we ask of our batch datasets we could also ask to a more realtime flow of data – this is achieved through Stream Processing.

Stream Processing

Whilst not necessarily a new approach to data processing, Stream Processing is the processing of data whereby the dataset is not a known quantity. There could be 5 pieces of data to analyse or 5,000,000; 3 pieces of data could arrive each second most of the time, and at other times 5,000 pieces of data could arrive each second. Stream Processing allows us to process data as and when it arrives in a realtime manner.

In the rocket example, we analyse the sensor data as values are produced, not once the test has finished and all sensor outputs collated. This allows us to make decisions during the test as opposed to afterwards which could be useful if we’re looking to avoid an unplanned disassembly!

With Batch Processing, as the dataset is known ahead of time, the input can be split and assigned to compute resources ahead of execution – an execution plan can be created (if interested, read into MapReduce). With Stream Processing, we’re a lot more reactive and as such these architectures can often seem more complicated. However, as you should see in this post, that isn’t always the case and shouldn’t put you off.

Given we want to process tweets in realtime, it seems we need to implement a form of Stream Processing to meet our requirements.

Twitter Streaming

It turns out Twitter have an API to stream a 1% sample of tweets – the question then is given this information, how do we make sense of it?

Extracting Knowledge

I wanted to focus on two core elements when processing tweets – relevance meaning exposing words that ‘mean something’, and confidence meaning how relevant words come together to confidently outline a story.

I’m no linguistic expert, but let’s work through an example:

Fantastic goal from Mane this evening

From this tweet, the words ‘fantastic’, ‘goal’, ‘mane’ and ‘evening’ are relevant to understanding what’s happening. The words ‘from’ and ‘this’ whilst meaningful, are arguabley not as useful for my use case, they’re typically known as stop words. Furthermore, these extremely common words will be found in many tweets not relating to a goal scored by Mane, so it’s probably best we discount them in our analytics to avoid noise.

Secondly, confidence. If one person tweets that Mane has scored are goal, are we confident that he has? Probably not. If 50 people tweet Mane has scored a goal, I’d argue it’s likely that he has. This is the approach I have taken. There are obviously other techniques such as trusting some Twitter accounts more than others – much like how backlinks work within search engine indexing algorithms.

Correctness is also worth a mention, particularly in todays world. It’s not something I’ve tried to guard against in this piece of work as my primary goal is not to present correct information, just information that’s ‘trending’ on Twitter (at a level of detail that does not rely on hashtags).

Once we’ve received data from Twitter, we’re going to need a data structure to support our use case so we can programmatically record relevance and confidence.

Synaptic Graph

Again, that common data structure, the Graph, provides the mechanism to store the analysis. A visual example can be seen below.

The boldness of the words depicts how often the word is mentioned in tweets and the lines indicate an association between words that meets the given confidence criteria.

You will be able to see some stories in the above graph, but let’s look at some examples in further detail.

Examples

Unfortunately, the week of testing was not a particularly great one for the world and so I apologise for using such sensitive events in my analysis.

Nice, France Attack

The recent events in Nice, France appeared in the analysis. Initially ‘nice’ and ‘attack’ became apparent on the graph, swiftly followed by more details of what was happening on the ground as people began to tweet.

You can see from the boldness of the text that we’re pretty confident there’s been an attack in Nice, France and that the Police are involved. Details are emerging that it could be terrorist related and that the police are associated with a shot. However this exemplifies an issue with this data structure – it appears the police have shot someone dead.

It may be that early tweets were suggesting that the police had shot someone dead and the correctness issues outlined earlier becomes apparent. Or perhaps the tweets just contain information about the police attending an incident where people had died and the police had fired shots. The graph records useful, relevant information, but it isn’t a source of truth.

US Election

As you would expect, the US election is accounting for a large quantity of tweets at present.

Labour Party

The recent EHRC report into the Labour Party reported that Jeremy Corbyn was suspended from the party.

Depending on the tweets provided within the 1% sample stream, you can end up with separate graphs which whilst related in the real world, have not yet been connected through analysis. This can be seen opposite. As processing continued, a connection was formed between these two graphs.

Conclusion

I mentioned at the start that Stream Processing doesn’t have to be complicated – this proof of concept used client side Javascript and Google Chrome to open up a persistent HTTP connection, processing 70 tweets per second. If you’re trying to solve a data analytics problem, don’t feel it’s out of reach and you need to stand up an Hadoop cluster. Start small and you’ll be surprised at how much you can prove and achieve.

For me, this project will continue and I’ll report back on future versions. My efforts will focus on weeding the graph of old news over time, refining the deletion of stop words and perhaps overhauling the UI altogether. If you have any ideas, please let me know.

Why the UK Coronavirus Contact Tracing App will generate over 40TB worth of data and could cost £3mil. A Blueprint for a Contact Tracing Application.

Disclaimer: I am not working on the UK Coronavirus Contact Tracing App – this is my own analysis and thoughts.

Contact Tracing apps are appearing in the news almost daily – they’re seen as one of the key enablers to reducing lockdown measures. But how do they work?

If these apps are going to be the number 1 app in App Store’s over the next year, I think it’s important people know how they work. This post attempts to offer you an explanation.

I am not working on any contact tracing applications but I have the upmost respect for those that are – as this article will highlight, this isn’t about software engineers sitting down at their keyboards. This involves the collaboration of politicians, health professionals, law professionals, engineers (across hardware and software), and more. Thank you.

So how do you collect 40TBs worth of data and spend £3m in the process?

High-level Architecture

Architecture 101 will teach you about the 3 Tier Architecture – at the top you have a Presentation Tier interacting with your users and at the bottom you have a Data Tier storing the data generated by the system. In the middle you have an Application Tier plumbing both layers together.

My theoretical Contact Tracing app has the following architecture. Don’t worry – each box in the diagram below will be explained throughout this post.

Presentation

The Presentation Tier is the window into the Contact Tracing app for end users – whether that be you and me on our phones or professionals using dedicated tools. There are 3 core components:

  1. Smartphones – the primary tool in determining whether 2 people have come into contact and upload this information to a central server. It also allows users to report any symptoms they may experience to warn other users that they need to isolate.
  2. Dedicated Devices – for those that do not have smartphones, cheap devices with extremely long battery life can be distributed to also track person-to-person contact.
  3. Kibana – a tool to enable the professional community to analyse the data collected.

The remainder of this section explains these components in further detail.

Collecting Contact Report Information

A Contact Report is data describing the coming together of 2 people. But how do you know if 2 people are near each other in an automated, omnipresent way? Radio.

Smartphones use a lot of radio – when you make a cellular phone call, send a text, stream Netflix over your WiFi or download health data from your smartwatch via Bluetooth. But which radio technology is best suited to determine when 2 people are near each other?

Bluetooth LE

Bluetooth is the technology of choice – it operates in the 2.4GHz radio spectrum but at a much lower power than cellular and WiFi meaning it’s 1) friendlier to your battery, and 2) localised. If we can use Bluetooth, what information do we need to transmit to determine whether or not 2 people have passed each other?

We may be familiar with the terms IP, TCP, etc. these define a stack of protocols that allow us to send data across the Internet. But they’re not applicable everywhere – they’re quite heavy. Transmitting data in a Bluetooth environment does not have the same complexity as transmitting data over the Internet. Just as motorways have barriers, emergency telephones, etc. the street you live on doesn’t – different protocols are used in different environments. In Bluetooth, the important protocol to discuss is GAP.

GAP defines 2 types of devices, a Central and a Peripheral. A Peripheral has some data to offer – your smartwatch for example is a Peripheral in that it can tell your phone what your heart-rate is. The device looking for this data is therefore the Central. This relationship doesn’t have to be read-only, centrals can also write. For the sake of a contact tracing app however, it only needs to read.

A Central device is made aware of peripheral devices through advertisements packets – they’re like the person outside the airport holding your name up on a sign. The Advertisement Packet can merely inform the central of the devices presence, or it can contain additional information such as a name, the services it offers, and other custom data.

We can start to see how this may work – I’m walking along the street and my Bluetooth radio is listening across the various Bluetooth advertisement RF channels (of which there are 3), looking for other devices. Given the power at which Bluetooth signals are transmitted by the antenna on a device, it’s safe to assume if you pick up an advertisement packet, you’re within a stones throw of the person (ignoring walls, etc. – a concern raised regarding the reliability of contact tracing apps). We’ve detected an advertisements packet – great! How do we turn that into something useful?

Other contact tracing applications such as CovidSafe will connect to a device upon discovering a peripheral via an advertisement packet. Once the connection is made, it will read data from the device. This requires a fair bit of radio communication which would be nice to reduce. Furthermore, if the 2 devices can’t connect because the receiver signal strength is below the radio sensitivity (after all, they are walking away from each other and Bluetooth is short range), we’ve lost the contact even though we knew they were in the area as we saw an advertisement packet! Can we include some identifying non-identifying information in the advertisement packet that maintains privacy and reduces radio communication?

Every Bluetooth advertisement packet is sent with a source and destination address. Imagine you had the address 123, if somebody else knew that, they’d have a way of tracking you within a 15 meter radius over Bluetooth. That’s not good. To prevent this, the Bluetooth LE spec recommends periodically changing the address to avoid the highlighted privacy concerns – which Bluetooth chip manufacturers thankfully abide by. So we can’t use the Bluetooth address to identify a user as it may change. What other options do we have in the Advertisement Packet? (Identity Resolving Key (IRK) is a mechanism to remember devices – i.e. so you don’t have to keep reconnecting your watch!).

A developer can add up to 30 bytes of custom data to a Bluetooth advertisement packet – that data can be categorised inline with the Bluetooth specification. Within frameworks such as Apple’s Core Bluetooth, developers are limited to setting a device local name and a list of service UUIDs. Each Bluetooth application on a users phone can transmit different advertisement packets. By setting the device local name to an ID that means something in the context of the wider contact tracing application, we’ve a way of identifying when 2 people have come into contact. That thing is a Contact Token.

Contact Token

Every device in the contact tracing ecosystem has a unique identifier, often known as a Device UUID. This is a static ID – mine could be 1234. It contains no personal information but is unique to me. That’s great, but I can’t advertise that indefinitely or like the problem Bluetooth is trying to solve with the ever changing addresses, I can be tracked! This is where a Contact Token comes in.

A Contact Token is a somewhat short-lived identifier (couple of hours) that the Contact Tracing app knows about (i.e. it knows what user is using the token) but that other Bluetooth devices only know about for a couple of hours before it changes (therefore meaning you can’t be indefinitely tracked). You may recognise someone in a crowd from the clothes they’re wearing, but when they change their clothes the next day, you’ll have a hard time spotting them in the crowd.

Each device advertises a Contact Token once it has registered it with the application server (more on that later). When a device receives an advertisement, it informs the server that it has come into contact with the token, sending the token of the remote device, the local Device UUID, and a timestamp. On server-side, the contact token is correlated to the remote Device UUID and stored.

To prevent the user from being tracked, the Contact Token must be refreshed. But we’re talking about 48,000,000 people – we can’t do this every minute, the Transactions per Second (TPS) would be too high (think of TPS as frequency – I can ask you to do a push-up every second for 10 minutes, but you won’t be able to keep it up for long, I’d need to lower the frequency). If we change the token every 3 hours, we achieve a TPS of 4,000 – acceptable.

So that allows us to send a Contact Report to the contact tracing app backend systems and respect privacy – but when do we send these reports? As soon as they occur?

Sending Contact Information

Once we’ve identified a contact, we need to send that data to the server. But much like the TPS issues identified regarding the Contact Token – when sending contact reports, the frequency is increased by a factor of 10! Why? We walk past a lot of people each day!

In a typical day at work, I would imagine I walk past at least 100 people. A typical walk to work takes 10 minutes and I probably walk past a person every 10 seconds. That’s 60 people and the day hasn’t even started.

If there are 48,000,000 people utilising the app daily – you can imagine the volumes. ‭4,800,000,000‬ contacts per day across the population. Not only that, they probably occur over a 12 hour period between 0700 and 1900.

That’s a TPS of 111,111… ouch! No system can handle that. How can we reduce it? Batches.

Apple and Android support background execution of applications, however to preserve battery life, there are limitations. Whilst you can’t ask your app to do something every 2 seconds, there is support for Bluetooth ‘events’ – whenever an advertisement is received, your application can process it in the background. As contacts are discovered, we can add them to a cache and once that cache reaches a certain size (let’s say 50), it can be flushed to the server. This would result in a TPS of 2,222 – acceptable.

However, there are drawbacks. What if we have contacted 49 people and are then at home where we see nobody – those contacts will not be flushed to the server until the following morning when we venture outside and walk past 1 more person – this could result in delayed isolation notifications as the central system does not know of contact reports. Whilst some of these contacts may have been registered by the other person (you see my advertisement and I see yours), they may not have. Is this acceptable?

How do we handle contacts from coworkers and family members whereby we’re with them most of the day? To reduce the load, as Contact Token are replaced every 3 hours, we can cache the token and if we have already encountered it, refrain from sending the contact to the server.

Importantly, these decisions are not just technology based, they require input from politicians, health professionals, and more. Furthermore, they may be dynamically tuned during the live operation of the application.

Reporting Symptoms & Receiving Warnings

When a user reports that they have symptoms, all contact with that user in the past N days will be retrieved from the database. Each contact will then receive a notification (i.e. via the Apple Push Notification Service and Android equivalent) informing them to stay at home. How far we distribute these notifications is largely based on the R0 of the virus – the average number of people an infected person will infect. You can see a very simplified probability tree below where a single person is infected in a population of 13; infections can only traverse the lines between the circles. Furthermore, it is true that R0 is 1 for each population of 4 people (i.e. in a group of 4 people where 1 is already infected, 1/3 of the uninfected people will be infected).

At what breadth do you stop sending isolation warnings? Health professionals would have to decide, based on the R0 of the virus, at what probability they’re willing to stop sending notifications (too many notifications and people won’t trust the app is reliable). The R0 in the UK is approx. 0.5 and 1.

Apple & Google Frameworks

One of the main issues I see with current implementations of tracing apps is background execution. An app is considered in the background once it has been opened and the user then returns to the home screen without ‘swiping up the app’ (on iOS). However, many users frequently close their open apps, meaning the app will not be in the background and listening for or advertising packets over Bluetooth. This is where I would like to see improvements made in the frameworks Apple and Google are currently working on (although they’re taking it a step further).

Dedicated Device

What about those who do not own a smartphone? How might they participate?

As the name states, Bluetooth LE is Low Energy – embedded devices running off coin batteries can last for days to months. A potential solution therefore is a cheap, embedded device that can be distributed and integrated with the system.

I have created a basic proof of concept using a SmartBear BLE Nano board which can be seen below.

This device only has Bluetooth capabilities (to keep power consumption at a minimum), so how does it upload contact report information to the server and how are owners of these devices informed when they’re asked to isolate?

We know that receiving an advertisement packet is the trigger to upload a contact report to the server – in the smartphone example, given one of the devices receives an advertisement, the contact will be uploaded to the server. But in this case, if only the embedded device receives the advertisement, the contact won’t be uploaded to the server as there’s no radio providing internet connectivity.

A potential solution is to cache these contact reports and only when the chip can maintain a solid connection with a smartphone does it transfer this data to the phone which relays it to the server (ensuring to take care of man-in-the-middle attacks).

What about receiving isolation warnings? As is explained later on in this post, users verify their accounts via SMS. When a user receives their embedded device, they register it online, providing the system with a telephone number. A message can then be sent to the number if they are required to isolate.

So that’s the process from contact to sending the contact report to the server. Once the data is persisted centrally, we need a mechanism to make sense of it. Kibana.

Analysing the Data – Kibana

Kibana is a data exploration, visualisation and discovery tool. In other words, it allows people to make sense of large quantities of data in a somewhat user-friendly way.

Kibana: Explore, Visualize, Discover Data | Elastic

Utilising Kibana, professionals in their respective disciplines can slice the data to understand a myriad of metrics that will aid in the decision making processes to enable the country to return to normal in a safe and controlled way. It can help answer questions such as:

  1. Where are infections occurring?
  2. Are people who receive isolation warnings actually isolating? (i.e. is our strategy effective)
  3. R estimation validation
  4. Immunity – are people reinfecting?

Application

The Application Tier is what ties the data collection in the Presentation Tier with the persistent, centralised storage of that information in the Data Tier. This Blueprint focuses on a serverless AWS architecture given the execution environment (12 hours of immense usage followed by 12 hours of very little usage), however solutions in other cloud and non-cloud environments are possible.

There are 2 types of inbound transaction (ignoring sign up, etc.) the Application Tier must support:

  1. Registering a Contact Token – the user device must receive a response from the Application Tier before it can start advertising a Contact Token. It has a TPS of approx, 3,000.
  2. Report Contact – the user device informs the Application Tier of a contact between 2 people. It requires no confirmation. It has a TPS of approx. 2,000.

However before we send any of this data, we need to ensure it’s coming from someone that we trust.

AWS Cognito

Privacy is double-edged sword – on the one side it protects the user, but on the other side is has the potential to degrade data quality and subsequently user experience. If users can sign up for a service without verifying themselves in any way, those with malicious intent will take advantage. There are a number of ways to prevent this through verification:

  • Digital Verification – Email / Phone Verification, but also image recognition to match a taken picture against a driving license picture, etc.
  • Physical Verification – attending an approved location with ID

Given the environment, this solution utilises SMS verification – whilst the details regarding the owners of numbers is private, this information can be accessed through legal channels and may violate some of the security principles.

AWS Cognito is the PaaS Identity Management platform provided by AWS. It’s where users validate their password and in response gain access to the system.

AWS API Gateway

Once the user has authenticated with AWS Cognito and received permission to access the system (via an access token) – they can use this token to make authenticated calls to the API Gateway. The first transaction will be to register a contact token.

When communicating with the API Gateway to register a Contact Token, the message is synchronous meaning the user (or more specifically, the users phone) won’t advertise the contact token until the AWS API Gateway has said it’s OK to do so. The Contact Token will be stored in the database before a response is returned to the user. An AWS Lambda function will handle this request and is explained further on.

Unlike the synchronous contact token transaction, we don’t need to wait for the contact report to be added to the database before our phone can continue with its business. Rather, as long as the AWS API Gateway says it has our contact report and will handle it, we can trust it do so. This is known as asynchronous communication. This asynchronous communication can be mediated through the use of queues – and as if my magic, AWS have an offering – the Simple Queue Service (SQS).

AWS Simple Queue Service (SQS)

As has already been alluded to, volumes in a system such as this are huge. Even with batching contact reports (remember the groups of 50), we would still produce 96,000,000 messages per day. If each message takes 0.5 seconds to process, that’s 13,333 hours if processing them sequentially. We need a place to store contact reports so we can process them in parallel – this is where SQS and AWS Lambda come in. Queues allow work to be dropped off somewhere with a fast TPS to be picked up and processed by systems with a slower TPS.

For now, think of Lambda as the ‘code’ (known as a Lambda Function) that takes contact reports off the queue and stores them in the database. If we can run multiple Lambda Functions in parallel, we would reduce our elapsed execution time. You can execute up to 1000 Lambda Functions in parallel which takes our elapsed execution time to 13 hours. That seems to work, but can we improve?

If we take batches of 10, so 10*50 and assume 1 second to process, it would take 1000 parallel Lambda Functions 3 hours to clear the queue. Often time, that majority of the execution time is not the business logic, but the overhead of spinning up of the environment, etc. to process the batch.

Whilst the queues may be empty during the night, during the day the queues will ensure end user devices can send contact reports to the server, regardless as to how busy it is. Thankfully, AWS state SQS Simple Queues support an almost unlimited TPS!

Continuing with the serverless theme, processing of messages on the queue is performed by AWS Lambda.

AWS Lambda

Lambda is the logic engine of the Application Tier – 3 Lambda functions will perform the following:

Register Contact Token – in response to synchronous API calls through the API Gateway, this Lambda will store the token in the database.

Report Contact – through polling of the SQS queue, this lambda will take contact reports, resolve the contact token to a user and if not a duplicate (i.e. the other user has already registered the contact), add store the contact report in the database.

Notify Infection – upon receiving a verified infection report, this Lambda will inform all recent contacts that they meet defined criteria to isolate.

Health professionals will also need to decide what do with a user who has reported an infection and continues to participate in the population? This will require even more logic, perhaps when registering a contact to check whether the contact is already infected and triggering the notification process (perhaps ensuring those who were previously informed are not informed again).

Data

Finally, the Data Tier where we store 40TB worth of data.

Firstly, this section attempts to explain the difference between centralised and decentralised contact tracing apps. Secondly it explains the volumetrics – how do we get to the 40TB figure? Finally, it explains the database choice, Elasticsearch.

Centralised vs Decentralised

One of the main areas of contention when it comes to creating Contact Tracing applications is storage. Should all the data be sent to some central database owned by an organisation (the UK Government, for example), or should data remain on end user devices in a decentralised way (note decentralised does not mean Distributed Ledger or Blockchain!).

The main concern regarding centralisation appears to be privacy – do you want the government to know where you are / have been? Well, that’s a myth in my opinion. Given a design such as the one used here, even with a centralised model, the central organisation cannot easily track known individuals. At least not without going through existing legal channels to resolve telephone numbers to identified individuals.

With a decentralised solution – a device may maintain a list of all contacts. When a user reports they’re infected, this could be broadcast to all devices to check their local cache – if they’ve come into contact with that person recently, they will be asked to isolate.

With a decentralised model, data analysis becomes almost impossible. However, once the data is centralised, it comes incredibly useful to a wide range of professionals.

The UK Government is aiming for a centralised model, and I couldn’t agree more.

Data Model & Volumetrics

What data are we going to be storing? We essentially have 2 data types:

  • Contact – this is a user, it will include information such as their User ID, whether they’re infected, how long they’re in quarantine for, etc.
  • Contact Report – the two contacts who came into contact, where and when the contact occurred

The volumes involved in the contact information is negligible (GBs), however, as you can imagine, it is not so negligible for the contact reports.

Let’s define a contact report as

  • User A – 16 bytes
  • User B – 16 bytes
  • GPS – 16 bytes
  • Timestamp – 4 bytes

Each contact report consists of 52 bytes.

If there are 48,000,000 people using the app daily resulting in an average of 50 reports per user ending up in the database (100 contacts per day, 50% duplicate reports) and each contact report is 52 bytes, in a year that will generate approx. 40TB worth of data!

This data needs to be stored somewhere where it can be efficiently stored and queried.

Elasticsearch

In deciding on the database, I was looking for extreme retrieval performance. Naturally a GraphDB such as Neo4j jumps out – after-all, logically contacts are just vertices and edges. However, when the Neo4j data-size calculator told me I had too much data – I was concerned. Furthermore, the master / slave architecture of Neo4j (all writes must occur on a single node) is concerning.

The alternative is Elasticsearch – a distributed data store built of indices that are stored on shards across a number of nodes. This distributed nature allows for distributed querying across the dataset, outperforming any other database on the market. Furthermore, the integration with Kibana to analyse the data provided an unrivaled end-to-end package.

So that’s it – we’re done. Now for the final question – what will it cost?

Cost

Ignoring the people cost, I’ve estimated the technology cost as follows (high-level estimates based on simplified AWS pricing).

ServiceCost per Day ($)
API Gateway556
SQS38
Lambda19
Elasticsearch Servers100
Elasticsearch EBS80
Cognito4,000
Total4,793

Therefore, it’s approximately $143,790 per month, or $1,725,480 per year. There are also SMS costs to verify users, totaling $1,866,240 for the required population.

That’s a 1st year cost of $‭3,661,720, or £2,938,640.

Conclusion

Creating a Contact Tracing app is not as simple as making that Bluetooth.startAdvertising() call on a mobile phone. The call sets into motion a wealth of complexity that can only be solved by the amazing collaboration of engineers, politicians, medical professionals, and mathematicians to name a few.

Choosing between a centralised and decentralised solution has major implications as highlighted throughout this article. However, I believe the advantages of being able to analyse this data greatly outweighs the technical complexity and privacy concerns.

There are 500 million tweets a day – a contact tracing application has the potential to report on almost 5 billion contacts a day. These volumes are unparalleled and in my opinion can only be met through a serverless cloud architecture such as the one outlined in this article (although this took me a day to design so it’s probably full of holes!).

What are your thoughts? Is centalisation worth it? How would you improve on this solution? Is it worth £3m?

Thank you for reading and stay safe.

Breadth-First Search (BFS) Visualisation

Computer Algorithms are not magic – they’re a defined set of instructions set out by a Software Engineer to achieve a certain goal. The goal of LeetCode problem 675 is to cut down trees in a forest, cutting the trees in ascending order and counting the minimum number of steps required – the ‘difficulty’ comes through obstacles in the forest that you cannot walk through, potentially preventing you from cutting down the tree’s.

In the video above, the obstacles are depicted in the grey colour. Within the force-directed view, a red node indicates the current starting position and green indicates the next tree we’re heading for (note how green becomes red as we walk to each tree and cut it down). A purple node indicates the ‘frontier’ of the BFS, whilst pink nodes indicate those nodes we have already visited in the search. All others nodes are blue.

The video is just a snippet, head over to the demo site to view from start to finish.

Producing this visualisation was surprisingly simple – D3 is used to create the force-directed graph you see on the left, and Three.js is used to mimic the ‘forest’ we’re trying to chop down. The most complicated part of the solution is structuring it in such a way that the algorithm execution can be ‘slowed down’ and visualised; this was achieved through a mix of JavaScript Intervals and Promises. The code can be found on GitHub – it was put together in an afternoon so please don’t judge!

So what next? Debugging is typically performed in an Integrated Development Environment (IDE) – however, for a lot of new and aspiring Software Engineers, this environment can be difficult to navigate. What if we could debug in an environment that we are all accustomed to – a highly visual, interactive environment. This is the purpose of the next stage of this piece of work (if I can find the time!) – I will release this LeetCode problem as an integrated web based IDE that visualises your code as it executes. Will you implement a BFS solution, or perhaps a more heuristic solution such as A*?

Personal Automation (Apple Shortcuts)

Automation is key in the Enterprise – from Software Development Lifecycle (SDLC) automation such as DevOps, to business process automation such as Robotic Process Automation (RPA). But what about automation for the Consumer?

Workflow Automation is nothing new on iOS – one of the most popular workflow apps of 2016/17 was an app named ‘Workflow‘. In fact, it was so popular that Workflow was purchased by Apple in 2017 for an undisclosed amount.

The app was re-branded as Apple Shortcuts and may be one of those Apple Apps that you file away in an ‘Apple’ folder and never touch. But hopefully by the end of this article, you will not only understand the architecture behind Apple Shortcuts, but having been taken through the development of a real-world shortcut (integrating with Trello), you will have the knowledge to dive straight in and start creating your own shortcuts. Whilst this article focuses on iOS, Android has a similar offering, Google Action Blocks.

What is (was) Workflow?

As with any Operating System (OS) environment, frameworks are provided to developers to enable them to complete common tasks. For example, on OS may provide frameworks to allow developers to record audio, manipulate an image, or send a text message. As an iOS Developer, you’ll often integrate with these frameworks provided by Apple. Fundamentally, these frameworks expose functions that can be combined together into a workflow (a sort of simple app) in a dynamic way (i.e. they don’t require the user to create an app and release via the App Store). This was the idea behind Workflow and is shown below:

Action Types

Workflow essentially exposed wrappers around these common functions (and created some of their own reusable actions that do not use underlying Apple frameworks – such as handling variables, rounding numbers, etc.). Through a slick UI, the app allows users to combine these actions together to create a workflow, passing the output of actions as the input to subsequent actions.

For example, you may have a workflow that gets all photos taken today from the users camera roll and creates a pre-populated text to send to your family with the images attached. Prior to Workflow, this could not be automated with out of the box iOS functionality. But with Workflow, this is possible and opens up a whole new genre of apps, or more generally, consumer automation opportunities.

It’s obvious to see why Apple decided to make such an acquisition, especially when you see how they’ve integrated it into the Siri ecosystem over time, as the following section explains.

Apple Shortcuts (Intent Framework)

It’s great that the Workflow developers could write ‘wrappers’ around iOS frameworks and expose them to users in the Workflow app, but what about integrating with other applications on the App Store? They don’t come as iOS frameworks, but they contain useful actions we may want to combine into a workflow. We also don’t want the developers who work on the now Shortcuts app having to write a wrapper around every possible application out there to enable it to be combined into a Shortcut workflow. The answer is a layered architecture.

To a Software Engineer, the concept behind Apple Shortcuts makes complete sense. Applications consist of a number of functions (or Actions as Shortcuts refers to them) – you book taxis on the Uber app, order drinks on Starbucks App, post pictures on the Instagram app, etc. As a Software Engineer, we see this as a layered architecture consisting of Presentation, Application, and Data layers. The Application Layer consists of business logic that can be reused across a number of presentation technologies (GUI, voice, workflow, etc.). This is the concept behind Apple Shortcuts, whilst Starbucks allows you to order a coffee via their App, you can also set up an Apple Shortcut that will order you a coffee from Starbucks as soon as you leave the house, without you ever having to open your phone.

Intent Actions

The way in which Apple allow apps to expose reusable ‘Actions’ is via the Intent Framework; the framework originated to provide Siri with a way of interacting with applications (i.e. get the latest headlines from the BBC News App). This architecture can be seen below:

Intent Architecture

The Application (Business) and Presentation Layers are explained below:

  • Business Logic (Application Layer) – regardless of the way the user interacts with the application, the goal of the user is the same – this is the business logic of the application. For example, the goal of a user interacting with Uber is mostly to book a taxi. In order to do this, a number of parameters must be specified, such as:
    • Pickup location
    • Destination location
    • Type of taxi
    • Billing Information
  • Application View (Presentation Layer) – this is the Uber App you’re used to interacting with – the interface that allows you to interact with a map to set pickup and destination locations. The Application View passes parameters selected by the user using the User Interface to the Business Logic to book a taxi. Note that the Business Logic does not care how the parameters are retrieved from the user, just that they’re provided. For example, they could also be proved by an Intent.
  • Intent (Presentation Layer) – an Intent is another form of user interaction whereby the interaction is not via the ‘App’ but via either Siri or Shortcuts. Much as the UI provides a way for the user to input parameters to invoke some Business Logic, an Intent also collects parameters, passing them onto the reusable Business Logic. Depending on the channel used by the intent however, the approach will vary:
    • If using Siri, the user may converse with Siri (through voice, cards, etc.)
    • If using Shortcuts, the input parameters may be preset by the Shortcut definition, or the user my be prompted as part of Shortcut execution.

Siri, a Personal Assistant

Shortcuts is integrated even deeper into the Siri ecosystem with a learning path going from business logic execution to the Siri Recommendation Engine. Siri is an Artificial Intelligence (AI) agent – it aims to learn about you to provide a more tailored experience. Run the shortcut ‘Book taxi’ every morning? Siri will learn that you do this Monday – Friday and present you with the ability to run the Shortcut from your lock screen in the mornings, Monday to Friday. This is achieved through the Siri Donation System.

Over the Christmas break, I wanted to learn more about Apple Shortcuts and so came up with my own Shortcut. I run my day-to-day personal and work tasks through Trello – each morning I’ll open the various boards to see what’s due today (and often what’s late!). It would be great if, upon turning off my alarm in the morning, Siri would read out this information to me.

Shortcut Tutorial

Before we get started, it’s important to understand what makes up a Shortcut. In my opinion, a shortcut consists of three ‘structures’:

  • Flow Control – if you’re going to create something that executes a series of events, you’re going to need to make some decisions on what to do. That’s where Flow Control comes in; this essentially boils down to if statements and loops – statements that control execution.
  • Apps – whilst a Shortcut doesn’t need to interact with Apps on your Apple device, often times it will. Examples include Trello, Google Maps, Starbucks, Photos, Messaging, etc. These actions are exposed via the Intent Framework described above.
  • Functions – your Apple Device will expose a number of functions provided by the Operating System, this enables you to do things such as make API calls, perform base-64 encoding and decoding, and parse text.

The above combine to create the “What’s on today?” Trello Shortcut – however, they can be combined in a number of ways. I considered two approaches, starting with Native Trello REST API Integration.

Native Trello REST API Integration

Initially, I wanted to see if an approach to calling REST APIs would be successful; whilst the Workflow developers have already written Actions that call the Trello REST APIs, I was interested to see how easy it would be to integrate via REST myself. How easy would it be to integrate with any REST API out there on the Internet?

REST API calls are made through the Get Contents of URL Action – you provide an endpoint as well as a method, headers, and in the case of POST, PUT and PATCH, an optional request body. The response can then be parsed (particularly if it’s a JSON response), often by sending the response to the Get Dictionary from Input action.

I was also interested to see how complex the responses could be; could I return some audio generated by AWS Polly and returned via Lambda to be played back on the iOS device?

The architecture is outlined below:

What’s on Today? AWS Architecture

Architecturally, the solution worked. Lambda would make a call to the Trello REST APIs to retrieve cards, formulate the textual response and request AWS Polly to turn it into speech. This audio would then be base64 encoded and returned as the payload. The Shortcut would then decode then base-64 decode the response and send this to the Play Audio action. It has proven to me just how much potential there is to be achieved by Apple Shortcuts.

However, due to poor support for POSTing JSON (essentially serialisation of lists – the list is parsed into a string separated by new line (‘\n’) characters as opposed to an array of objects), I decided to follow the Trello Shortcut Action architecture explained in the next section.

One point of note if you choose to implement this architecture – AWS Lambda has a 6MB restriction on response payloads. If you’re sending the AWS Polly audio file as a base64 encoded string, you may quickly breach this limit. In this instance, you’ll need to use Amazon S3 for delivery of the audio file to the device (i.e. Lambda returns a Presigned URL).

Trello Shortcut Action

Prior to Apple purchasing Workflow, the developers had created a wrapper Action around the Trello REST API that will OAuth (authenticate) with Trello and enable you to query boards, lists, and cards (you can also create Trello items). The response from the Action works perfectly with some of the Flow Control structures in Shortcuts such as the ‘Repeat with Each’ action.

I created the below Shortcut to retrieve items from my Personal Development Board and utilised Siri to verbally tell me what cards are overdue and due today.

What’s on Today? Shortcut Definition

The solution works great, however due to restrictions of the Trello Action, it is not possible to dynamically select a list (i.e. to loop through all boards). This prevents you from creating a shortcut that will loop over all cards across all your boards (which I would find useful).

You can download the Shortcut and modify it as you see fit – what additions would you make?

I also wanted to automate the way in which the Shortcut would trigger – one of the enhancements Apple has made to the Workflow app in Shortcuts is the ability to automate the triggering of a Shortcut.

Siri will learn when you use certain shortcuts and can recommend them on your home screen for your to trigger manually; however, shortcuts can also automatically trigger upon location conditions, alarm conditions, and certain device conditions such as a Bluetooth device connecting. For the context of the “What’s on Today” Shortcut, I wanted Siri to read out the things I had to do today when I turned my alarm off. The automation below achieves that.

Shortcut Automation

Future Improvements

In the 2 years Workflow has been under the control of Apple as the Shortcuts app, a number of key improvements have been made – specifically integration with the Intent Framework and support for Automation. Through the creation of the “What’s on Today?” Shortcut, there are a number of improvements I would like to see which are outlined in this section.

Type System Stability

The Type System within Shortcuts is what enables you to retrieve the attributes of objects that are the result of an Action. Unfortunately, actions cannot always correctly determine the type of an input if the action does not come immediately after the action that produced the relevant output.

In the example below, the Repeat and If blocks were added sequentially following Get Trello Items where the type system works correctly. However, above the Repeat block a Speak action was added, making the Get Trello Items and Repeat blocks disjoint. Therefore, when adding the Time Between action, I was unable to correctly set the Repeat Item type to Trello Card and retrieve the Card Due Date. Note how the If block continues to work as it was added at a point in time when the type system work working correctly.

Shortcut Type System Defect

OAuth Support

The majority of REST APIs exposed by third parties support the OAuth Protocol. As a protocol following a standardised set of processes, it makes sense to enable this as a generic Action within Shortcuts – the output of which is an access token that can be used as an input to the Get Contents of URL action.

JSON Handling

Shortcuts handles JSON responses well (i.e. accessing keys returned from an API call in a JSON response), but it’s not easy (tending towards impractical) at all to simply dynamically create a Dictionary data structure and send it in an API request (i.e. a POST). This was highlighted in the creation of the “What’s on Today?” Shortcut – this is a MUST have for Apple in the next release.

“Shortcut Store”

As illustrated through sharing the “What’s on Today?” Shortcut, sharing of shortcuts is not ideal. It makes sense that the Gallery within the Shortcuts app can be used for users to share Shortcuts they’ve created (with ‘Top Downloaded’ boards, etc.). Obviously this will add some rigor to the process (more like the App Store), but I don’t think this is a bad thing (we still want to control the quality of Shortcuts given how easy they can be created).

App Support for Shortcuts

Many apps currently support the Intent Framework to support Siri integration – however, additions are required to enable support for Shortcuts (minor changes to the Intent Definition file). I’d like to see more apps supporting shortcuts so that we can do things such as automate the booking of a taxi (i.e. with the tap of a button, have a taxi be booked from your current location to the hotel you’re staying at that week which Siri automatically shows on my lock screen at 1800 because it has learnt that’s when I use this particular Shortcut).

Apple Framework Support

Every year at WWDC, Apple introduce a wide range of additions to their frameworks. It would be great to see Apple frameworks automatically integrate with the Intent Framework and by inference Shortcuts so that developers of the Shortcuts app do not have to write wrappers each and every time functionality is added (or removed).

Automation Triggers

Shortcuts can be triggered by changes to the iOS device state – for example, arriving at a location or upon connecting to a certain Bluetooth device. It would be great for that to be expanded to include events coming from the battery, receiving notifications, etc.

Closing Thoughts

I’m disappointed in myself that I am only just discovering Apple Shortcuts, and formerly, Workflow. Through writing this blog post, I have learnt so much about a genre of app development that I feel has huge potential.

Consumer Automation is personal – you may want to send a text with your location to a family member when your battery level becomes critical; when you arrive at the train station, you may want your AirPods to output when the next train is due to leave that takes you back home; or you may want your phone to set a reminder when your smartwatch battery is critical. By giving the consumer an easy to understand interface to build their own automation’s, mobile devices can really begin to assist consumers in their day-to-day lives, not just interrupt them.

There are thousands of great services available to consumers that provide real value, however, as with 90% of the work I do for clients, integration adds another level of value. It’s the same for the consumer, by allowing them to chain together actions provided by these individual applications in ways that work for them, easily, everyone wins.

Creating a LISP-like Interpreter (Introduction to ANTLR)

Whether trying to understand natural language processing or the intent of some software written in a given programming language, understand language syntax is a key Software Engineering challenge. Understanding the structure of a language is achieved through an understanding of the language grammar. But what is a grammar?

the whole system and structure of a language or of languages in general, usually taken as consisting of syntax and morphology (including inflections) and sometimes also phonology and semantics.

Google Dictionary

The Parse Lisp Expression LeetCode problem touches on this through a challenge that requires you to parse an expression that conforms to a given syntax and execute it – this is commonly known as interpretation. JavaScript is an example of an interpreted language, but even then, that’s often debatable. If we’re not executing machine instructions directly on a processor that correspond specifically to the given input, we must figure out what the intention of a certain input is, and then use instructions that are available for execution on the processor (from the host environment) to execute the intent; due to this additional interpretation step, interpreted code is therefore inherently slower than ‘native’ compiled code.

This post first explains the solution, but then touches on a grammar parsing tool, ANTLR, which offers a more robust, logical approach to language parsing. ANTLR was not used in the solution to the LeetCode problem as external libraries cannot be used in solutions.

Approaches

In solving this problem, I considered two approaches:

I intended to solve this problem using JavaScript; that constraint influenced the implementation approach. Specifically, the parsing of brackets within a statement was not possible with JavaScript’s regular expression handling ‘engine’.

(let x 2 (add (let x 3 (let x 4 x)) x))

For example, in the above statement, JavaScript regular expressions (which themselves, are interpreted) are limited in their ability to support look-aheads and look-behinds, meaning that the nested brackets in this statement cannot be understood correctly.

I therefore had to use an iterative / recursive solution.

Solution

We know from the problem description that the statements fit into the following syntax; it has the following grammar:

add <expr1> <expr2>
mult <expr1> <expr2>
let (<v1> <expr1>)* <returnExpr>

The approach to solving this problem is to read words (tokens) from left to right, executing as and when required. Note how brackets are parsed – they introduce the concept of a context which is explained below.

(let x 2 (add (let x 3 (let x 4 x)) x))

Parsing of the above statement will:

  1. Recurse the expression (let x 2 (add (let x 3 (let x 4 x)) x))
    1. Assign x = 2
    2. Recurse the expression (add (let x 3 (let x 4 x)) x)
      1. Recurse the expression (let x 3 (let x 4 x))
        1. Assign x = 3
        2. Recurse the expression (let x 4 x)
          1. Assign x = 4
          2. Return 4
        3. Return 4
      2. Return 4 + 2
    3. Return 6

The important thing to ensure in the implementation is that when the context changes (i.e. you encounter a left bracket), the state of any variables within preceding let expressions are assigned. For example, before executing the add statement in the above, x is first assigned the value of 2. It’s also important that the current context is remembered such that upon returning to the source context, it can be restored.

Improving the Solution

There are a number of areas wherein the solution could be improved; it was by no means the fastest JavaScript solution on LeetCode (only outperforming approximately 80% of other JavaScript solutions in terms of time complexity). There are two primary areas where performance could be improved:

  1. Assign variables more efficiently when executing the interpreted statement – as explained above, when executing an expression within a let expression, the ‘context’ must be set such that any variables used within the sub-expression are available. The inefficiency in the solution is that all variables are assigned (and sometimes multiple times) every time a sub-expression is reached, even if they haven’t changed.
  2. When executing a sub-expression, the state of any variables must be sent to that sub-expression as the context, however, upon returning to the initial expression, the context must be ‘reset’. As JavaScript passes objects by reference, we cannot pass the same object around as assigning a value to an existing property would overwrite the original value, or context. JavaScript does not have a straightforward way of cloning an object (especially deep cloning an object) – the approach taken was to convert the object to a string and then back into an object… obviously inefficient.

But what if we’re working with a complex language – something whereby we need a robust approach to parsing the grammar. This is where ANTLR can be used.

ANTLR Grammar Parsing

What is ANTLR?

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files. It’s widely used to build languages, tools, and frameworks. From a grammar, ANTLR generates a parser that can build and walk parse trees.

antlr.org

In the context of this problem, ANTLR is used to parse the input statement into a structure that software can easier interact with than just a string of characters; the structure is a graph known as a Parse Tree. Graphs form the solution to a number of problems on my blog; understanding graphs is vital for any Software Engineer.

There are three stages to execute an interpreted statement, they are:

  1. Parse the syntax;
  2. Validate the semantics, and;
  3. Execute the statement.

Parsing the syntax turns the ‘program’ into a navigable set of ‘tokens’ in the form of a parse tree. This enables the semantics of the statement to be validated (i.e. are we adding to a variable that does not exist). Given both the syntax and semantics validate successfully, the parse tree can be executed.

The parse tree for the example statement above resembles the following:

Parse Tree Example

You can see from this tree structure, the walking algorithm follows a depth first approach, flowing from the left most leaf node to the right.

So how does ANTLR generate this tree structure? The answer is through a grammar definition, executed against some input program – the grammar to parse the LISP expression syntax as outlined in the LeetCode problem description can be found below.

grammar Expr;		
prog:	expr;
expr: '(' 'let' ' ' (VAR ' ' (expr|VAR|INT))+ ' ' (expr|VAR|INT) ')'
	| '(' ('add'|'mult') ' ' (expr|VAR|INT) ' ' (expr|VAR|INT) ')'
        ;
VAR: [a-zA-Z0-9]+ ;
INT: [0-9]+ ;

The grammar is relatively straightforward to understand – it follows a similar syntax itself to regular expressions but is known as Backus–Naur form. The program is made up of an expr, for which there are structurally two types of expression, a let and an arithmetic expression (add or mult).

Interestingly, in order to parse the grammar file provided, ANTLR itself will utilise a defined grammar to parse the input grammar.

You can therefore see how defining a grammar with a tool such as ANTLR provides for a much more robust environment when compared with the initial solution explained earlier in this post.

Conclusion

Often, your implementation won’t be impacted by the chosen programming language. Certainly, whilst the computational complexity of a solution may be the same in two ‘languages’, the time complexity may differ wildly; something you may want to consider given your non-functional requirements. Solving this problem did however introduce a language / run-time limitation in JavaScript’s support for regular expressions which is not as advanced as say Java’s.

Finally, learn. The complexity (effort, cost, computational, etc.) of a solution should always be proportional to the problem – but that’s not to say you shouldn’t be curious as to how else a problem can be solved. Understanding how tools such as ANTLR work give you, a Software Engineer, another tool in your kit – opening up alternative ways to solving problems in future.

Integrating ADFS (SAML) into Campus Solutions 9.2

Single Sign-On is becoming increasingly popular within the enterprise – with the reduction in monolithic systems design, users access a number of systems to perform discrete operations. Maintaining separate logons for each system is cumbersome and makes identity management almost impossible – for both users and IT departments.

Single Sign-On enables users to maintain a single identity (with an Identity Provider), with applications (Service Providers) trusting the Identity Provider to successfully authenticate the user and pass back the identity.

Single Sign-On supports many authentication types – if users are authenticated on the enterprise network (i.e. Active Directory), their identity can be determined through the SPNEGO / Kerberos protocols, or more generally through Integrated Windows Authentication – this process is invisible to the end user. Where this is not the case, the user must enter their credentials through a form-based authentication approach such as below (additionally, 2 factor authentication may also be configured as part of the authentication process within ADFS – ensuring consistent authentication approaches across the application estate).

This article explains how to integrate SAML based ADFS as the authentication mechanism for Campus Solutions 9.2. However, this will also be a useful resource when attempting to integrate ADFS into any web application.

How does Identity Provider ADFS work?

Identity Provider (IdP ) Initiated SSO is initiated by ADFS (the IdP) sending a SAML Response to a Service Provider. The main difference between IdP Initiated and Service Provider (SP) Initiated SSO is the triggering of the authentication process. With SP SSO, the application will generate a SAML Request when the user attempts to access the application and forward this (and the user) to ADFS – this allows the SP to track authentication requests from initiation to completion.

Ultimately, ADFS operates on a series of redirects and HTTP form submissions. The sequence diagram below outlines the process. ‘System’ is an application a user is trying to access that is protected by ADFS (i.e. Campus Solutions).

Before getting started with the implementation, it’s important to have a basic understanding of some of the key terms referred to throughout the rest of this article – they are explained below.

Deep Linking (Relay State)

When using IdP Initiated SSO, deep linking is achieved through the RelayState. The RelayState is a query parameter that is sent to ADFS by the application when the user is not authenticated. This RelayState is then sent back to application following successful ADFS authentication.

The RelayState can take any form – it can be a URL for simple redirects or it can be a base64 encoded JSON string if required (although be careful of URL lengths).

The RelayState is particularly useful for achieving Deep Linking – with IdP Initiated SSO, the users browser is always redirected back to the same application endpoint upon successfully authenticating with ADFS. It’s the RelayState also provided in the ADFS -> SP redirect which enables the application to redirect the user to their intended destination within the application.

In order to support Deep Linking, RelayState must be enabled in ADFS. Read this great blog for instructions on how to do so.

ADFS Session Cookie

The process of Single Sign-On is achieved through the use of an ADFS Session Cookie – once a user has successfully authenticated with ADFS, a cookie is set on the ADFS domain. The next time the user is sent to the ADFS authentication screen, the session cookie is sent along with the request – ADFS will check to ensure that the session has not expired and if it has not, will redirect the user to the appropriate application as if they had just successfully entered their login credentials.

Application Session Cookie

Similar in principle to the ADFS Session Cookie, the Application Session Cookie enables the application to remember users (i.e. whether they’ve previously logged on). It too may have an expiry – once expired, the user should be redirected to ADFS. If their ADFS session cookie is still active (i.e. ADFS Session Cookie Timeout > Application Session Cookie Timeout), the user will be redirected back to the application as per the ADFS Session Cookie example. If the ADFS Session Cookie is also no longer valid, the user will need to enter their username and password (plus any 2 factor) again.

ADFS Relying Party Configuration

This article is not intended to go into detail on how to configure a Relying Party on ADFS – others have done much better than I could so I suggest you read articles such as this one from Microsoft.

Implementation

Given the knowledge of ADFS outlined above, the complexity that remains is the configuration of Campus Solutions to parse a SAML response, validate it came from the IdP and then set the Application Session Cookie against the user identified in the SAML response.

Campus Solutions

The following components must be configured within Campus Solutions to integrate with ADFS:

  • Web Profile (and associated Site in WebLogic)
  • Base64 Decoding
  • Sign-on PeopleCode (FuncLib)
  • Validation JAR

The interaction of these components is depicted in the diagram below.

Web Profile

The Web Profile is associated with a hosted Web Logic ‘website’ – it’s the context within which your users interact with Campus Solutions. It relates to the SSO process in a number of ways:

  • Runs sign-on people code as a very restricted public user
  • Determines what pages will be sent back to the users browser following authentication events (success, failure, etc.)

The first task is to create a Web Profile that will support ADFS SSO (you may want to keep a web profile that still supports Campus Solutions form authentication for administrators, etc.). Your authentication domain will likely match your Campus Solutions domain, i.e: sts.my-organisation.com or campus-solutions.my-organisation.com. If you encounter errors regarding CORS, your authentication and Campus Solutions domains may differ. Try adding both domains to the ‘Authorized Site’ section of the Web Profile configuration.

The next step is to set up the public user and allow public access to the Web Profile – this user will run the sign-on code, validating the SAML response and setting the user context based upon the subject within the SAML response.

Finally, the pages that should be returned following authentication events must be configured in the following way:

  • Signon Result Doc Page – update to signonresultdocredirect.html such that the original deep link in the RelayState can be returned to the user now they have successfully authenticated. This HTML is shipped with all Campus Solutions installs.
  • Signon Error Page – in our implementation, any authentication failure would result in the user being sent back to ADFS. We achieve this by sending the user to the signin page where they will be forwarded to ADFS.

Sign-on PeopleCode

The Sign-on PeopleCode has the responsibility of handling the authentication process – from deciding whether the user should be redirected to ADFS, to validating the SAML token and deciding what to do on success or failure. The full Sign-on PeopleCode can be found on GitHub – the below explains some of the key sections.

The first line of code that may look slightly strange is one that is trying to get a Java class called ADFSSAMLResponseValidator – as you may know, PeopleCode can be used to execute Java. In order to do so, a JAR file containing the required class(es) must be included within the classes directory of the server. More information can found here.

The call below loads a reference to the class into the &SAML_Validate variable. Read further down this article for details of the ADFSSAMLResponseValidator class.

&SAML_Validate = GetJavaClass("saml.saml.ADFSSAMLResponseValidator")

The result of the ADFS authentication process is a SAML Response being sent to the Service Provider authorization URL – this response can be retrieved from the request in PeopleCode using the below line of code.

&requestParameter = %Request.GetParameter("SAMLResponse");

The SAML Response is base64 encoded by ADFS – it must therefore be decoded. Details on how to do this can be found here. The below uses the Crypt library to decode the SAML Reponse.

&samlDecode = CreateObject("Crypt");
&samlDecode.Open("BASE64_DECODE");
&samlDecode.UpdateData(&requestParameter);
&decodeResult = &samlDecode.Result;

In order to check whether the SAML Response is valid – its signature must be recalculated and validated (see the SAML Validation section below). This is achieved through the below call to ValidateSAMLResponse (a method of the Java class loaded earlier), passing in the decoded XML string (the SAMLResponse).

&SAML_Valid = &SAML_Validate.GetInstance().ValidateSAMLResponse(&decodeResult);

In addition to ADFS returning the SAML Response, the Relay State is also returned as a form parameter. This is retrieved to redirect the user to the page they initially tried to visit before being sent off to ADFS.

&Redirect_URL = %Request.GetParameter("RelayState");

The call that is the point of the sign-on code, SetAuthenticationResult, sets the userId that is retrieved from the SAML Response (see the full code for details), and sets the ResultDocument to be the redirect URL. Campus Solutions now forwards this ResultDocument value to signonresultdocredirect.html (configured to do so as part of the Web Profile above) where the PS_TOKEN is returned in the HTTP response and set on the users browser; as the response is a 302, the user is also redirected accordingly.

SetAuthenticationResult( True, &userID, &Redirect_URL)

In our implementation, where the authentication is unsuccessful (i.e. there’s no SAML response or the SAML response is not valid), the user is sent back to the signin page which redirects them to ADFS.

Finally, the signon PeopleCode should be registered within Campus Solutions – it should be the only enabled signon PeopleCode function (can be seen below against sequence number 6).

That is all that is required to configure Campus Solutions to authenticate users against ADFS and further adopt SSO into your enterprise. The final section below details the SAML Validation JAR – a custom implementation that validates a SAML response against an ADFS metadata endpoint.

SAML Validation

Validating a SAML token is relatively straightforward – when the IdP returns a SAML response, it creates a hash of the response and signs that hash with a private key. When the SP receives the SAML response, it can validate the responses integrity by decrypting the signed hash (with a public key provided in the SAML response) and comparing it with a recalculated hash. If the values match, we can be sure the SAML response has not been changed since it was originally signed.

However, by using the public key in the SAML response, we can’t be sure where this key has come from. To ensure this message has come from the IdP – we retrieve the ADFS metadata from the relevant endpoint (typically something like https://server/FederationMetadata/2007-06/FederationMetadata.xml) and check that the public key provided in the SAML response is one owned by the IdP.

The Validation JAR I have created is available on GitHub – note that it makes use of the great OpenSAML library to validate the signature and retrieve endpoint metadata.

Conclusion

You’ll often hear people say Campus Solutions does not support ADFS – whilst it’s true that Campus Solutions does not natively support ADFS, it does provide the necessary means of configuring this integration. I hope that a future release of Campus Solutions does support ADFS natively – it’s the way all academic institutions I have worked at are heading and it would be a real negative of the product if integration required either 1) custom coding, or 2) the customer to be sponged by another third-party provider selling their wares.

If you’re struggling to integrate ADFS and Campus Solutions please do get in touch and I will endeavour to assist where I can.

JIRA Issue Visualiser

If you’re as frustrated with the lack of insight into your JIRA projects as I am – I’ve got the tool for you.

Use the JIRA Issue Visualiser, for free, and view the structure of your JIRA projects in less than a minute.

You’ll be asked to authorize via JIRA Cloud OAuth – once complete you can then paste in a JQL query. For example, to produce the above I just wanted to see all items within my project, so my query was ‘project=IAM’. You can read about JQL queries on the JIRA help-site if you are not familiar. You must be using JIRA Cloud and not a local JIRA installation to use this tool.

NOTE: this tool has been glued together in an evening. It is not user friendly or close to the finished article. However, in its current state, it can still be as useful to you as it is to me. If you’re interested in working on the tool, please see the bottom of this post.

Once you have provided a query, wait 20-30 seconds (you will just see a blank page whilst the data is retrieved).

You can click and drag the core JIRA issues (epics and stories) to organise the graph in a way that makes sense. You can also click on an issue to open it in JIRA.

NOTE: if the query returns more than 400 items at present, it may take longer than 20-30 seconds to load.

Why develop the JIRA Issue Visualiser?

Forget Agile, Scrum and Pillars – the successful completion of a project is dependent on the successful completion of a number of tasks inline with client expectations (cost, time, etc.) through a dedicated, talented team. Fundamentally, JIRA provides a way of organising those tasks into helpful chunks, and supercharges a collaborative approach to their completion. JIRA goes on to do 100x more, but at its core, it’s a task management system.

JIRA offers a number of great reports out of the box that attempt to give you a view of your project – however I feel they don’t give me a view on a page that tells me:

  1. How dependent are my issues upon each other?
    1. And in particular, what tasks are causing the biggest issues (RAIDs)?
  2. What’s the status of my tasks?
  3. How big / complex is the project?

I looked at a number of options – in particular using PowerBI plugins to create force-directed graphs, but they just weren’t flexible enough. Having used D3 before, I knew I could spin up something to meet my current requirements, but would also be flexible enough for the future. I created the JIRA Issue Visualiser and use it multiple times per day – I hope it can be as useful to you as it is to me.

Architecture

The diagram below outlines the high-level architecture for the JIRA Issue Visualiser – the core components include:

  1. AWS
    1. S3 – static web hosting to return an index.html
      1. D3 is used to render the force-directed graph
    2. Lambda – functions to handle OAuth and retrieving issue data
  2. JIRA
    1. OAuth – gives the application access to any JIRA Cloud instance through an access token granted by a logged in user
    2. REST API v2 – exposes JIRA issues in JSON
  1. When the user retrieves the static HTML (and JavaScript) from S3, the code checks to see if there’s an access token available – if not, the user is redirected to the OAuth JIRA endpoint
  2. The user logs into JIRA Cloud and authorises their credentials against the JIRA Issue Visualiser – following this, the user is redirected back to the resource in S3, with an authorisation code in the query parameters
  3. The static page retrieves the authorisation code from the query parameters and sends it to AWS Lambda to be swapped for an access token
  4. The Lambda function sends a request containing the authorisation code to JIRA (along with private credentials such as the client secret)
  5. JIRA responds with the Bearer access token
  6. The access token is returned to the users browser (note Lambda is stateless and therefore does not maintain any sort of application session – I didn’t want to integrate DynamoDB or similar at this point)
  7. The user enters a JQL query which is sent to Lambda along with the access token
    1. The call can not be made from the browser direct to JIRA due to CORS limitations on JIRA cloud and the resultant restriction this puts on browser CORS security
  8. Lambda makes a call to the Issue Search REST endpoint, passing the JQL and access token. Due to the limitations of JIRA only returning 100 issues per API call, Lambda will make n number of API calls to retrieve all issues
  9. JIRA responds with issue information including issue links and subtasks
  10. The combined list of issues is returned to the users browser where it is rendered into a force-directed graph by D3

Future Enhancements

The code for the 2 Lambda functions and HTML / JavaScript can be found on GitHub – feel free to contribute (message me on LinkedIn to get started). This is by no means a finished product, future work could include:

  1. Retrieving issues from JIRA concurrently
  2. Remembering the access-token on page refresh so the user doesn’t have to re-authorise
  3. Improve UI/UX (i.e. not using JavaScript prompts to retrieve a JQL query!)
  4. Move the UI to a more future-proofed architecture (i.e. an SPA)
  5. The ability to update force-directed graph properties (charge, gravity, etc.) on the page
  6. Contextual menu containing useful information regarding the issue without having to click it
  7. The query will display related nodes where the related node is a node also returned by the JQL query – it should work regardless
  8. Validate the JQL provided by the user
  9. Setup AWS DevOps

LeetCode – Binary Tree Cameras [HARD]

Introduction

The Binary Tree Cameras problem focusses on the binary tree data structure, a form of graph. The high-level aim being to add a ‘camera’ to the least number of nodes such that every node is either a camera or has an edge that connects to a node with a camera.

Whilst this problem focusses on the tree structure, graphs are used extensively as the data structure behind many of the digital services we interact with on a day-to-day basis, such as:

  • Facebook Graph – modelling your friendships, posts, likes, comments, etc. to help provide a better user experience (i.e. Simon and John are both connect to the Sarah node, I’ll recommend John as a friend to Simon)
  • Google Maps – a road network is just a set of vertices with edges where you can transfer from one road to another. Google may use a graph to determine how to get from from vertex A to vertex B (with perhaps the edges containing information such as maximum speed, traffic volume, etc.). This is similar to routing protocols used on many corporate LANs / MANs

It’s obvious a good Software Engineer needs to understand trees and more generally graphs.

This post outlines my O(n) solution – every node of the graph is visited exactly once. This solution is faster and more memory efficient than all other accepted JavaScript solutions on LeetCode.

Approaches

Due to the use of a Binary Tree data structure, there are a number of constraints on the problem which make designing a solution a little easier (although I guess you don’t know what you don’t know, so I’m sure there’s a better way of solving this problem than mine!).

Ultimately you need to start at a node, traverse the tree in some way, and make decisions as to where cameras should be added – there are therefore three techniques:

  • Start from root node (Root-Down) – start at the top and work your way down to the leaf nodes.
  • Start at a random node – not sure how this would be sensible approach, but it’s an option!
  • Start at the leaf nodes (Leaf-Up) – starting at the leaf nodes, work your way back up to the root node.

Whatever approach is taken to solve the problem, there are two algorithmic approaches for traversing the tree:

If you’re interested in learning more about tree structures, understanding traversal techniques such as breadth-first and depth-first are also important – particularly as tree search techniques.

I believe recursion is the simplest technique for navigating a graph – however, it does have one major disadvantage; whether programming in a high-level language such as Java or writing in low-level assembly, the stack is a limiting factor. A stack grows upwards and has a reserved area in the main memory of a computer – processes are assigned a portion of the stack area in memory to grow into, and the growth occurs when, amongst other things, function calls are made.

When a function is called, essentially (true enough for the purpose of this paragraph) the state of the current function is pushed onto the stack, as well as a return address and the parameters sent to the next function (known as the stack frame). When the new function returns, everything is popped off, and the code continues from where it left off with the correct state. However, if a function doesn’t return and more and more function calls are made (i.e. a very large tree), too many stack frames are pushed onto the stack and the stack limit is breached causing an exception. This limit is often lower than you expect. You can see how this can be an issue for recursive solutions.

Whilst I had this issue in the back of my mind, I decided to progress and learn more about the problem using a technique I was comfortable with; thankfully none of the test cases breached this limit! However, for this reason, I’m not the biggest fan of my solution – I know it will fail for large binary trees.

Root-Down

Often times the best way to solve a problem is to just get started – whilst this technique may often not result in the final solution, it will provide an opportunity to learn more about the problem, rather than trying to theoretically think through the entire solution.

I began to solve this problem using a root-down approach but quickly realised the logic was booming too complex – demonstrated through the example below:

The algorithm may initially think the first node doesn’t have a camera and isn’t covered by one itself, so it makes sense to put one down. We then go down the left and right sub trees, following the rule such that if a node is ‘covered’ by a camera, don’t put a camera down.

The most efficient number of cameras for the above solution is 3 however, not 4. How could this be solved? We could implement some sort of lookahead, but it’s no longer a truly recursive solution, and certainly not O(n).

What about starting at the leaf nodes? We need some knowledge of the structure of the tree to make efficient decisions.

Leaf-Up

If you recursively visit each each node until you reach a leaf node (of which there could of course be multiple), you can begin to give the parent nodes some context as to whether or not a camera should be added. Implementing this algorithm will result in the below solution for the problem introduced above:

By modifying the rule slightly – if as a node, you’re covered by a camera or your parent is covered by a camera, do not place a camera if you have children nodes. You can see how this rule related in an efficient placement around A->B and A->E.

By starting at the leaf nodes, you can provide context to the decision making process which results in the efficient placement of cameras.

Analysis

This problem wasn’t particularly hard, but it did provide a couple of learning opportunities which are discussed in this section. They are:

  1. Auto-generated Unit Testing
  2. Optimising code, and
  3. Parallel Computing

Auto-generated Unit Testing

The benefits of Test Driven Development (TDD) are well known, but as with the majority of testing approaches, it relies upon a human defining n number of scenarios that test the various use cases. It can be difficult if not close to impossible to tell whether you have the right level of coverage, because you don’t know what you don’t know. Solving LeetCode problems involves writing some code, running the tests, realising you didn’t cater for a certain scenario, updating your solution, and repeating. What if we weren’t given the outputs of the LeetCode tests? We were just told whether the solution is correct or not.

Thinking of this scenario and the fact that as humans we don’t know what we don’t know, would auto-generated unit tests be feasible? Could we let some code randomly create binary trees that we test against? The process would be:

  1. Software creates a binary tree
  2. Human inputs the expected value (so will only work for moderately sizes trees)
  3. Solution executes and is compared against expected value

In this way, we don’t need to rely on human intelligence to understand the various scenarios… but how likely and how quickly will auto-generated unit tests find errors in a solution? This obviously depends on the complexity of the input you’re generating, but in this scenario anything more than 100 unit tests would probably begin to annoy me.

The code to run these automated unit tests is available on GitHub; by ‘breaking’ various parts of my solution (to mimc the development process outlined above), auto-generation of unit tests found these issues after, on average, 2.5 rounds. That took me by surprise – this could be a really useful technique going forward.

So if you’re working for Facebook and you want to test your recommendation engine, you could auto-generate social graphs to compare an updated solution output to the existing solution. If you work for Google and you’re updating your directions functionality, will you remember to write a test that goes through a small village with a railway crossing that stops cars for 5 minutes when the barriers are down? Does your updated directions functionality take account of this scenario? Again, the success of this approach depends on the probability of auto-generation creating a good distribution of units tests, but it’s an interesting thought.

Optimising Code

What about performance? What makes a solution faster than 96% of solutions as opposed to being faster than all other solutions? As it turns out, not much. The usual suspects always play a part (including as discussed in a preview blog post, the shared LeetCode execution environment):

Reduce (or better yet, eliminate) logging. Logging in NodeJS leaves the NodeJS environment (event loop) – this can be a costly process. Often time pre-compiler directives (in C for example) can be used to eliminate certain logging statements from even making their way into the final executable, potentially reducing an application size so much that it can fit entirely into main memory, with the benefits that brings.

Stop Assigning variables that aren’t used (or no longer serve a purpose) – the Node object of the input to the solution contains a val property – I used this property to identify which nodes had cameras (0 and 1) when logging. Once I had a working solution, this was a wasted operation… so remove it.

Think about your code and how it will be compiled down (or interpreted) – if you think about the code block below, we’re assigning two variables, executing some conditional statements, and again assigning some variables a value.

var leftResp = null
var rightResp = null
if (node.left) {
    leftResp = recurseNode(node.left, true)
}
if (node.right) {
    rightResp = recurseNode(node.right, true)
}

If this was in a compiled program, we’d have:

  1. Some stack operations to reserve address space
  2. Conditional logic including varying JMP commands as well as truth logic
  3. Further stack operations (calling functions, assigning values, etc.)

Compare that with the below code which achieves the same:

var leftResp = node.left && recurseNode(node.left, true)
var rightResp = node.right && recurseNode(node.right, true)

If this were to be compiled down, we’d have:

  1. Some stack operations to reserve address space
  2. Some some truth logic
  3. Some more stack operations (calling functions, assigning values, etc.)

Now whilst the AND statement may result in some JMP commands to ensure the rhs does not execute if the lhs is false, this refactoring did result in a faster execution time. At a high level, less CPU instructions results in a faster execution time.

Parallel Computing

How else could we improve the performance of the solution that isn’t just optimising code? We could parallelise the solution. Whilst NodeJS is inherently single threaded following an event loop architecture, it does support support multi-threading through Workers. Could the solution be modified such that where a node branches off with 50% of nodes on one branch and 50% on the other (in the ideal world), we spin up an additional worker to execute in parallel (note not concurrently, we’d be making use of multi-processor architectures). However, much as calling functions and the resultant context switching has a performance impact (and generally makes recursive solutions slower than their iterative counterparts), so too does parallelising your code. Creating and destroying threads is a costly process.

Whilst parallelising the solution would have the aim of improving execution time, it has no impact on the computation complexity, it would remain O(n) as we must still visit each node once.

Conclusion

As with anything in life, knowledge and experience go hand in hand. To make the best decisions, you need both – ultimately a lack of experience will lead to errors and therefore more experience. As a Software Engineer, this experience leads to an intuition whereby you can tell when it’s worth abandoning a solution and looking elsewhere. Following a Root-Down approach, I got the feeling the solution wasn’t as clean as it could be (simply down to lines of code), so I decided to try something else. Trusting your intuition as a Software Engineer is extremely important.

Further to this, you may want to develop the perfect solution, but sometimes this is not possible and / or required. In this example I know my recursive solution wouldn’t work for extremely large trees, however it meets all the requirements outlined by the LeetCode Unit Tests. Understand (which often involves communicating with stakeholders) whether the scenarios not covered by your solution are likely to ever occur and are therefore worth the extra investment to cater for.

LeetCode – LFU Cache [HARD]

Introduction

The Least Frequently Used (LFU) Cache problem brings together a number of Computer Science techniques and principles to solve a very “Computer Science’y” problem.

In summary, the problem asks you to build a cache that could mimic the behavior found in hardware caches (i.e. CPU caches) as well as software based caches (i.e. Redis, memcached, etc.). The cache is composed of key:value pair objects.

The problem is essentially structured into 3 areas:

  1. Intialising a cache of a given size
  2. Supporting the addition of items to the cache such that if the cache is full, the least frequently accessed item is removed (to make space) and the new item added.
    1. If the item already exists in the cache, its value is updated and the frequency of that item being accessed is increased by 1
  3. Supporting the retrieval of items from the cache such that each retrieval updates the frequency at which the retrieved item is accessed, as well as when it was last accessed (and importantly in relation to when it was last accessed, when compared with items accessed at the same frequency)

My O(1) JavaScript solution was 99.31% faster (although this measurement isn’t particularly fair as the environment is not the same for each run (i.e. it’s a shared environment)) and 100% more memory efficient than all other accepted JavaScript solutions.

Approaches

In the problem description, it mentions that a solution of complexity O(1) is desirable – meaning the retrieval or addition of an item takes the same number of steps, irrelevant of the item or the size of the cache (from a computational perspective – obviously larger software caches may perform slower where hardware caching, virtual memory and paging come into play).

Ignoring the O(1) requirement, I identified two possible solutions:

  • Iterative approach (not O(1))
  • Doubly Linked-List approach (O(1))

Iterative Approach

The initial approach that comes to mind is to split the items into buckets based on frequency, with the buckets containing an array of items. The item at the top of the array is the most recently accessed item for that frequency. This is demonstrated in the diagram below:

The issue with this solution is that in order to determine whether an item is in the bucket, you must loop through potentially all the items in the bucket. This prevents the solution from achieving O(1) and is more O(n).

Other issues I encountered with this approach around around maintaining index pointers into the array and the complexity this introduced when items are removed in particular. Essentially, I found a loop was always required (although I’d love to see an O(1) solution for this approach if it exists!).

Doubly Linked-List Approach

In order to achieve O(1), a solution must be designed such that, in simple terms, there are no loops or recursion (which are valid approaches to solving the problem). Whereas the previous solution relied on buckets and loops to look through all the items, an O(1) solution must therefore utilise a different data structure(s).

Those data structures are:

The solution is highlighted in the diagram below:

Two doubly linked-lists are used:

  1. The first doubly linked-list creates a set of buckets that contain items based on the frequency at which they’re accessed
  2. The second doubly linked-list (the root of which is the ‘data’ segment of the first doubly linked-list – the frequency bucket) stores the items in order of last accessed
    1. This is particularly useful when wanting to delete the least frequently accessed item

This data structure on its own does not meet our O(1) requirement; if we wanted to determine whether an item exists, we’d have to traverse each bucket and the items within each bucket (a complexity of essentially O(n)). The solution is simple, maintain a dictionary which points to the items within the doubly linked-list structure.

There are additional minor details which won’t be expanded upon in this post, but I invite you to critique the solution on GitHub.

Analysis

This problem was definitely less of a thought experiment and more of a problem you may come across during a career as a Software Engineer. My takeaways are as follows:

  • Understand when a solution will benefit from a more rigid approach – apart from scribbling a bit-part solution on paper, I started development straight away. The doubly linked-list solution naturally developed and therefore I hadn’t coded for it to be reusable or use a reusable library. This resulted in some code duplication and generally increased the difficulty of debugging the solution. Obviously if I was to productionise a solution such as this, I would refactor to use a library implementation of a doubly linked-list that was well tested.
  • Having a basic understanding of computational complexity is important for all software engineers – regardless as to whether you’re developing for embedded systems or consumer websites. Problems can be solved in any number of ways, but its not usually until scaled out (not least in the enterprise) that the way the problem was solved becomes important. By understanding the impact of your solution, you’ll save on some pain further down the road.
  • All software boils down to data structures and algorithms. You may have been taught about them at University and now subconsciously use them in your job; or you may have never studied them but as a Software Engineer find yourself using them naturally. Having an understanding of the common data structures (linked lists, stacks, dictionaries, etc.) and how to operate upon them can increase the speed at which you develop reliable, quality code (especially if using libraries). Data Structures and Algorithms are not just topics to be studied for exams – keep them at the front of your mind.

Conclusion

I’ve solved many problems on other ‘coding’ challenge sites such as Project Euler – these problems can often seem too theoretical to take any practical lessons from. It’s been enjoyable to solve a problem that is Computer Science focused – whilst I didn’t learn as much as I may do when compared with a Project Euler problem, it has reinforced some Computer Science theory and the importance of it in your day-to-day job as a Software Engineer.

LeetCode – Max Points on a Line

Introduction

Max Points on a Line is the third least accepted (against submitted solutions) problem on LeetCode; it is also the second hardest problem in the ‘hard’ category. At first glance, the problem seems simple – from a set of points, determine the longest straight line that can be formed from those points and return the number of points in the subset. However, as the saying goes, if you assume it makes an ass out of u and me.

This is my write-up detailing the approach I took to solving the problem and the issues I encountered along the way. Lessons learnt can always be taken from one context and applied to another – hopefully you find this useful.

Approaches

I explored two options when it comes to solving this problem, both explained below. An ‘agricultural’ approach which I quickly abandoned, and a ‘formulaic’ approach that did work but encountered some interesting issues.

Whilst I won’t be posting my solution in this article, you can find my implementation on GitHub (linked in the Social Banner above).

Agricultural

Having forgotten some high school mathematics – my brain instantly went trying to understand how an algorithm based on scanning and or recursion / iteration could solve the problem. Could I determine the ‘step’ between two points and then for each remaining point, see if I continue the step pattern whether or not the point in question would be reached.

The algorithm quickly became complex – you would obviously need to check the step in both positive and negative directions (reducing performance), you may want to start finding the point in the line that would be closest to the point your analysing to slightly improve the performance, etc.

Good problem solving doesn’t mean coming to the correct solution first time – it’s also recognising when an approach isn’t working and looking for another way. y = mx + b.

Formulaic

The straight line equation in its well known form will determine y for a points associated x where the slope (m) and y-intercept (b) of the line are known.

My initial approach was to take two points in an existing line (any two points can be the start of a line) and determine their slope m = ( (y1 – y2) / (x1 – x2) ). The remaining variable to determine was the y-intercept which with the formula rearranged stated b = y – mx. I knew the slope from the previous calculation and I know x and y from an existing point so I could determine b, the y-intercept. The final step was to determine if for the given x of the point being analysed, given the y-intercept and the defined slope, did the resulting y == y of the point being analysed.

The approach worked – I went from passing 50% of the test cases using the agricultural approach to passing approx. 90%. But what was causing the final 10% to fail? Precision.

In day-to-day life as a Software Engineer, precision doesn’t usually impact you unless you’re working on financial systems, sensor based system, and other similarly complex, numerically focussed applications.

My solution was based on JavaScript which by default represents all floating point numbers as 64-bit floats. This means a number with a fractional component can be represented up to 17 decimal places, and even then may not be accurate. Both of these limitations caused issues.

The following test case enabled me to identify the issue:

[[0,0],[94911151,94911150],[94911152,94911151]]

Whilst the final two points are very close to each other, when on a slope with 0,0, all three do not form a straight line. If we try to determine the slope of the line containing the first two points, we would calculate -94911150/-94911151 – the answer according to the JavaScript engine is 0.9999999894638303. To determine the slope between the first and third point, we would do -94911151/-94911152 – the answer according to the JavaScript engine is 0.9999999894638303. Brilliant! All three points are on a straight line! But they’re not…

Use a high precision calculator and you’ll see that -94911150/-94911151 = 0.999999989463830230022181482131641201991112719726684170124541003617161907 and -94911151/-94911152 = 0.999999989463830341033053734296682016882483946670460811601991723796588202.

You can see around digits 16/17, the answer begins to vary… the slope is not the same. However, due to JavaScripts (although this problem isn’t limited to native JavaScript) precision limit and accuracy issues (i.e. rounding) mean that for this problem, I am unable to correctly to determine the slope. So, how do you solve the problem?

Initially I knew I had a precision error, but I wasn’t sure whether it was determining the slope, y-intercept, and y value for the cross-comparison. My first thought was do I need to determine the y-intercept and the y value? Surely if two points are on the same slope as two other points on the line, we’re good. Having removed the other elements, I noticed the precision issue above.

My immediate thought was that by using 0,0 as a comparison point, we were working with an answer for the slope that was extremely precise. If another point was used, the result may not require the same precision. If points 2 and 3 form the input were used to determine the slope, the result would be 1 (-1/-1). As 1 != 0.9999999894638303 – we pass the test case. This solution worked for all test cases and the ultimate acceptance of the solution. However, it’s obvious to see for test cases that result in a similar precision no matter what points are selected, the solution would encounter problems.

Analysis

LeetCode provides some great solution analytics – at the time of solving the problem, my solution execution time was 79% faster than other JavaScript solutions. My assumption is that that these solutions may have taken more agricultural approaches.

There are a few key takeaways from working through this solution:

  1. All programming languages have limitations (of which some can be solved with third-party libraries – in this case, several BigNumber libraries). Whilst they may not impact you day-to-day, if you’re encountering ‘strange’ issues, they’re good to have in the back of your mind. Worse, often the runtime won’t throw any errors to tell you it has automatically rounded a result or is not accurate past a certain point.
  2. Formal Specification techniques are used in critical systems to at runtime ensure code is behaving according to some mathematical constraints. Whilst formal specifications may not be necessary for all projects, there’s nothing wrong with having code to check the validity of results. For example, if adding two decimals both with two digits following the decimal point, the result won’t have more than two digits following the decimal point. If it does, you know something has gone wrong.
  3. Understand what the minimum required steps are to solve the problem – my initial formulaic approach trying to understand the y value was adding complexity to the solution and making it harder to debug. I only needed to calculate the slope to solve this problem.
  4. Another bug I encountered was due to not fully understanding the specification – my assumption (the second time I assumed wrong in solving this problem!) was that the points had to be consecutive according to the step function. This wasn’t the case and was not clearly stipulated in the problem description.

Conslusion

This problem was great fun to solve. Day-to-day in the workplace, my clients are using software in ways were these sorts of problems do not manifest themselves. But they exists – it’s important as a Software Engineer to be aware of them and to challenge yourself in ways that you may not be challenged in your day job (not that it lacks its own unique set of challenges!).

Apart from the usual cliches such as testing is only as good as the test cases and not fully understanding requirements is the downfall of many IT projects, I will take two ideas from this exercise:

  1. Testing can be built into your code – if the result of performing an action is that n number of things should be in a list (without knowing the contents), add some code to check that n things are in the list. If not, an issue must have occurred (either data or code)
  2. Understand what the minimum required steps are to solve the problem – initially you may just want to solve a problem, with the result not being the most elegant solution. Even if you don’t encounter issues, refactoring is important. It results in less instructions, potentially leading to less bugs in future; for those bugs that do occur, the solution will be easier to debug.