Eventual Data Design

I do a lot of work with web APIs, particularly as back-ends for React web apps, and an issue I constantly run into is the speed at which these requests return. Long-running web requests make pages slow to render, with content having to be hidden behind a loader or skeleton animation while the user waits.

These React projects, however, are low-traffic and low-value. I don't want to have to pay for servers to continuously pre-cache data that's nice for a user to have and access quickly, but isn't all that critical or value-adding to the system overall.

While thinking about this, I wondered why I couldn't just do the long-running processing for the data after the request had terminated. I was working in C#, which is well known for separating the actual application from the web server - meaning you can't control the web server from within your app in any very meaningful way. To me this meant a change in technologies, to something where the web server is actually part of the application - this was a job for Node.js.

In Node.js, the script doesn't have to terminate when the web request does - you can end the response and keep executing code afterwards. This is perfect for what I had in mind.

 

The design outline

  1. The user hits the JSON API and is returned either initial data displaying default values, or data from the cache.
  2. In the JSON object, a field describes the state of the cache - if the data will be renewed after this request, when the data was last cached, etc., so that the client knows how old the data is and can make decisions based on that.
  3. After the request is terminated, the timestamp on the cache is evaluated to determine whether the data needs to be refreshed (or this check is skipped if the data always needs to be as up-to-date as possible), and then the script continues with the typical things you would do to process or otherwise add value to the API data - see the sketch after this list.
  4. Transformed data should be added to a fast database, such as MongoDB, Redis, or DocumentDB, so it can be looked up quickly in the first step.
  5. Script terminates.
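Here's a minimal sketch of that flow, assuming an Express app. The names are my own: cache is an in-memory Map standing in for Redis/MongoDB/DocumentDB, and refreshData() is a hypothetical placeholder for the long-running transformation.

    // Minimal sketch of the flow above (steps 1-5).
    const express = require('express');
    const app = express();

    const cache = new Map(); // key -> { data, cachedAt }
    const MAX_AGE_MS = 5 * 60 * 1000; // treat anything older than 5 minutes as stale

    app.get('/api/report/:id', (req, res) => {
      const key = req.params.id;
      const entry = cache.get(key);
      const stale = !entry || Date.now() - entry.cachedAt > MAX_AGE_MS;

      // Steps 1-2: respond immediately with cached data (or defaults),
      // plus a description of the cache state.
      res.json({
        data: entry ? entry.data : { status: 'pending' },
        cachedAt: entry ? entry.cachedAt : null,
        willRefresh: stale,
      });

      // Steps 3-5: the response has already gone out, but the script
      // keeps running, so refresh the cache if it's stale.
      if (stale) {
        refreshData(key)
          .then((data) => cache.set(key, { data, cachedAt: Date.now() }))
          .catch((err) => console.error('refresh failed', err));
      }
    });

    // Stand-in for the real processing: query the slow upstream sources,
    // transform the results, etc.
    async function refreshData(id) {
      return { id, value: Math.random() };
    }

    app.listen(3000);

The key point is that res.json() ends the request from the client's point of view, but the handler keeps executing, so the expensive work happens after the response is already on its way.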

 

Pros

  • Data is consistently quickly delivered to the user
    A simple key/value query to a database is much faster than selecting information from a range of data stores and then processing it. The transformations applied to different data sets can vary in how long they take to complete, so if that work happens while the client is waiting, they may cancel the request before it finishes.
  • Longer-running data transformations can be employed
    Data is only ever processed when the client isn't waiting for a response, meaning a longer-running query or data transformation process can be applied without slowing down requests. This results in more valuable data being delivered more consistently.
  • Cost of data processing is dramatically reduced
    The data is only processed when the client requests it, rather than having to run a timed or constant job to keep data up-to-date. It is impossible to predict exactly when a client is going to require this data, so it would be wasteful and expensive to continuously update these sets of data when it's only going to be requested a couple of times per day by each client - perhaps not at all. Machine learning could be applied to discover when a client is most likely to make a request, but this is adding more cost in an attempt to lower costs.

 

Cons

  • Data is always outdated
    This design shouldn't be applied to critical data sources, as the information being displayed will always be tied to when the job was last performed for that data set. The issue is that the data could be, at best, a few seconds old, or at worst months or even years old, depending on the client's activity.
  • Repeat requests defeat the purpose
    Depending on your design, you may run into issues where the client hits the API multiple times before the previous requests have completed their work. This means the potentially expensive data transformation may occur multiple times, with only one transformation actually adding value - a sort of race condition. This can of course be mitigated with ACID transactions, but it's always going to be a possibility while the work dispatcher is controlled by the client (since they initiate the API requests) and not an orderly scheduling system.

 

Example

If you want to see an example of how I implemented this with Google Cloud Platform's Functions, I have a now-defunct set of scripts located on GitHub: interflare/archived-functions.

There are of course many improvements that can be made to prevent repetitive requests, such as a system that marks the job as in-progress in a database (a rough sketch of that idea follows), but with the project I was working on, it didn't really matter.
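As a rough sketch of that idea (not what I actually did), reusing the cache and refreshData() names from the earlier example - in a real deployment the marker would live somewhere atomic like Redis rather than in process memory:

    // Prevent duplicate refreshes: only the first request for a stale key
    // kicks off the transformation; later requests just get the cached data.
    const refreshing = new Set();

    async function maybeRefresh(key) {
      if (refreshing.has(key)) return; // a refresh is already in flight
      refreshing.add(key);
      try {
        const data = await refreshData(key); // hypothetical transformation from the earlier sketch
        cache.set(key, { data, cachedAt: Date.now() });
      } finally {
        refreshing.delete(key); // always release the marker, even on failure
      }
    }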

 

Conclusion

This design probably isn't new, but I believe it's a great way to think about accessing and processing data that means little to clients and your application, but is nice to have nonetheless. Rather than stripping out everything that isn't going to make you money, this might be a dirt-cheap way of distributing supplementary data that could mean a customer chooses you over every other service.

Kali Linux on DigitalOcean

Kali Linux is a Linux distribution used for penetration testing. While it's normally distributed as a typical ISO file for installation on hardware, it has also been wrapped into a Docker image so it can run on basically any OS with Docker installed.

On DigitalOcean, you can't install from an ISO file, which is where the Docker image comes into play. Instead, you can set up a container-focused distribution and run Kali on top of it - I like to use CoreOS.


Once you've set up your container distribution, go ahead and log in with your SSH key using the 'core' user. That was one thing that wasn't immediately obvious to me, because I would typically SSH into virtual machines as 'root'.
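Assuming your droplet's address is 203.0.113.10 (swap in your own), that's just:

    # log in as the 'core' user with your SSH key
    ssh core@203.0.113.10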

As the Kali Docker documentation explains, you can pull the OS like so:
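(Assuming the current official image name on Docker Hub, kalilinux/kali-rolling; older write-ups used kalilinux/kali-linux-docker.)

    # pull the Kali image and start an interactive shell inside it
    docker pull kalilinux/kali-rolling
    docker run -it kalilinux/kali-rolling /bin/bash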

And that's it! You can install and update the tools you'll need as you would normally.
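The image is fairly minimal, so most tools need to be installed by hand - for example:

    # inside the container: refresh the package lists and grab a tool
    apt update && apt install -y nmap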


It's important to note that you can't install the GUI version of Kali Linux like this. If you wish to do that, you'll need to set it up as a guest in something like OpenVZ or Xen, and then connect to it through VNC or RDP.

One-Page App Routing on Netlify

While playing around with React Router in a React app I'm building and deploying to Netlify, I noticed that I would get a 404 Page Not Found error if I fully refreshed the page, or entered from a location other than the index.

Looking deep into the Netlify docs I found out that you have to add a little file to your /public directory named '_redirects' with the following content:
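    # serve index.html for any path that doesn't match an existing file
    /*    /index.html    200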

All it does is rewrite any request for a file that doesn't already exist to the index page, where React Router can handle it.

Note that navigating from within the app to a file that does exist won't work without a forced refresh. For example, if you were at '/' and navigated to '/file.png' from within the app, you would be passing '/file.png' to the router, which (probably) won't have a route for it.

Pretty simple!

Here we go again with Blogger 🙄

I decided I wanted to switch to a static blog from Blogger a few months ago, mostly because of the performance benefits of plain-old HTML. I don't know what's happened to me or Blogger since then, but this platform has somehow managed to pull itself out of the gutter and into my life as my blogging home once more.

Blogger obviously has more pros than the likes of Jekyll or Hugo or *insert static generator here*, an actual editor and dynamic fixtures for starters, but the reason I originally ditched it was its difficulty regarding customisation and, most importantly for me, page load times.

First of all, customisation on Blogger is hell. The only thing you get to edit is this massive, confusing, super-dense, and non-standard XML file. Yes, you get that theme customiser where you can dump some CSS and turn a few knobs. Fine. I wanted to set up things like Twitter cards per page/post in the meta of the HTML, but it turns out that it's actually impossible with the way that Blogger was designed. This is still the case today, and it's still a push factor.

The second push factor, as I said, was the page speed. My original Blogger home page used to take about four seconds to load, according to previous speed tests. I hated this - not that it really matters, because this blog is obscenely obscure - but it was super embarrassing to me. This is what tipped the scales for me, and I ended up moving my posts over to GitHub Pages. I didn't really have much I wanted to move, so there was definitely a tonne of archive 'shrinkage'. Actually, now that I think of it - I didn't move any posts. I reckon I was glad to have the excuse to erase my dumb stuff.

But yeah, I'm back on Blogger because the struggle of creating a new markdown document, typing up all the front matter, and then pushing to git was just such a drag. Once I got back up and running, I actually noticed that things were running much faster than I remembered. I don't know if it's because I'm publishing this using a G Suite account this time around - maybe there's a different set of servers or framework for paying customers.

Regardless, I actually think I missed this ugly WYSIWYG editor. At least I don't have to type in the damn date and time now.

SSL on Blogger with Custom Domain

[Update 2018-06-27]: Wow okay, Google really dusted off the Blogger codebase and decided to implement SSL on all custom domains now. They must have seen my post ;) that's my #IMPACT.

---

Once again, I changed my mind about which service I want to use for my blog. This time around, I decided to go back to Blogger - a dynamic blogging service, as opposed to my previous static blog. However, because Blogger is getting a little bit old now, it's a bit of a pain in the ass to implement the expectations of a modern website - particularly SSL.

SSL is managed and enabled by Google by default if you use their subdomain. If you want your own custom domain, they expect you to go without SSL. However, this can be fixed with our good ol' pal CloudFlare.


Go ahead and add your Blogger domain to CloudFlare like you would normally, and do the same on Blogger.
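For reference, pointing a custom domain at Blogger boils down to a CNAME record aimed at ghs.google.com (plus the site-specific verification CNAME Blogger gives you when you add the domain), so the CloudFlare DNS entry looks something like this - the 'www' name is just an example:

    Type: CNAME    Name: www    Value: ghs.google.com    (orange-clouded)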


Note that you might have to 'grey-cloud' your domains on CloudFlare while you verify them on Blogger, but make sure you switch them back to orange when it's done, or the reverse proxy won't be enabled.

Then head over to the 'SSL' section of CloudFlare, and enable the following:
  • Always use HTTPS
  • Require modern TLS
  • Automatic HTTPS rewrites
It can take up to 24 hours for your CloudFlare certificate to be issued, so don't expect it to work right away. However, after that period of time you should have a tidy green padlock for your new blog!

Let me know in the comments if you're having trouble, and I'll help you out.
