Scaling applications

5 minute read


Cloud services make creating new, scalable applications very easy, at a cost.

Amazon AWS, Google Cloud Computing, Microsoft Azure, and Alibaba Cloud are probably the best known of the top-tier cloud service providers.

Amazon is probably the best known of these, with good reason:

AWS continued to drive much of Amazon’s profit. Operating income increased 37% from a year earlier to $3.56 billion, but trailing the $3.75 billion FactSet consensus estimate. That means 52% of the company’s operating income can be attributed to AWS, compared with about two-thirds in the same period a year ago.

So the public-facing side of Amazon, the online retail side, accounts for only 48% of Amazon’s profit.

But, back to the topic of this post: scaling. The examples I’m going to give are all Amazon-centric, because that is the cloud platform I have the most experience with.

Cybersecurity starts with security

The Solar Winds security fiasco should make every software engineer start every project thinking about security.

Every project.

For this look at scalability, that means a few questions need to be answered.

Confidentiality : What public internet access does my application need?

Not every part of a computing system needs direct public access to the internet. So, reducing the accessibility to your machine by the public internet is a **Good ThingTM</sum>**.

Amazon has a variety of security configurations that enable this ranging from Virtual Private Cloud (VPC) to Security Groups to Identity and Access Management (IAM).

Integrity : What data am I storing?

The next question you need to ask is what data am I storing? And how am I storing it?

Knowing what data you are storing is important, especially if any of your customers are based in the European Union and need to be compliant with the General Data Protection Regulation (GDPR)

Availability : Where are my customers?

To first look at security, we need to know where your customers are.


Because cloud service providers are still physically located somewhere. And if your customers are all on the West Coast of the USA, but you decide to use Amazon’s Singapore data center (ap-southeast-1), then perhaps delays or network connectivity might stop availability.

A better selection might be in California (us-west-1) or Oregon (us-west-1).

Below shows the current console list of selectable data centers.

Amazon's data center locations.

Depending on the criticality of your application, it might be prudent also to site instances in geographically distant data centers. California, for example, is known to have the occasional earthquake. So having a backup site in Canada (ca-central-1) is a good move.

This addresses the security aspect of availability.

More availability

Another aspect of availability is ensuring that the application works. Recently, many websites supposedly designed to allow people to register for COVID-19 vaccinations are failing. The software worked perfectly in test, but as soon as real people wanting real vaccinations started using the site, they failed.

The same thing happened in the Iowa Democratic caucuses.

For the software industry, this failure is unacceptable. It is unacceptable because it is completely predictable.

There are many ways to help an app can scale. In broad strokes, there are two main ways: vertically and horizontally.

Scaling vertically

Vertical scaling is about using a larger machine. My simple Take A Number queueing system uses a t2.micro instance to run.

As you can see from the table below, this instance type has 1 GB of memory and 1 virtual CPU, and is the second smallest available.

As you can also see form the table, I have other options in the t2 range, all the way up to the t2.2xlarge which has 32 GB of memory and 8 virtual CPUs.

Older T2 instance specifications.

Using a larger computer to help with scale is called scaling vertically. We don’t change the software at all, we just use a bigger machine to run it on.

M6G specifications.

If the t2 range doesn’t provide enough, then there are other machine architectures available. The largest in the m6g range has 256 GB of memory and 64 virtual processors.

Scaling horizontally

Horizontal scaling is about using more machines in your application.

This one is harder to implement, because it can require a different approach to writing the application.

In AWS, we put a pool of machines behind an Elastic Load Balancer. The load balancer decides which machine of the pool gets any incoming requests.

The diagram below shows the requests coming from the left, hitting the load balancer, and then being distributed to the different machines.

Diagram of an elastic load balancer.

A key problem in writing an application to be able to horizontally scale is connectivity to the database. Most applications require a single database. And scaling the code across many machines does nothing if all those machines clobber a single database at the same time. The same problem happens: instead of the webserver being overwhelmed, now it’s the database server.

So, in addition to scaling the application horizontally, the database must also be scaled horizontally. This often means having “read replicas”, many copies of the database that receive all read requests (SELECTs) and process them. Write requests still generally need to go to the prime database server, unless a sharding approach is taken.