Software Systems Design (FIRST DRAFT)

Version 0.1

7. Modern globe-spanning architectures

Today, in the 2020s, large tech companies offer products and services to practically anyone on the planet, provided they have a network connection and (usually) some cash to spare. Given not only the ambitious scale of such endeavors, but also varying government regulations and cultural expectations, how do the tech giants design their systems?

We have seen the limitations of a classic three-tier architecture, which scales only so far (though it may be highly available, very responsive, or have other desirable properties). Deploying at very large scale requires a high degree of automation. Cloud computing aims to automate the provisioning of servers, services (including storage), and on-demand computation.

Not every globe-spanning application is properly a “cloud-based” one, but we will use cloud computing as our canonical design for this chapter.

The world is a big, weird, and often wonderful place

There are more things in heaven and earth, Horatio,
Than are dreamt of in your philosophy
[Hamlet, Act 1, Scene 5]

Horatio has just heard the ghost of Hamlet’s father, whom he had already seen, and he’s confounded because he doesn’t believe in ghosts. Hamlet admonishes him that the world is a stranger place than he can imagine. And it is! Unless we have lived everywhere and taken on all roles (socio-economic, gender, race, ethnic, religious, and so on), we must accept that there are things about the world and its people that will surprise us.

The design and use of cloud computing systems are complicated as much by geopolitical and cross-cultural realities as by technical challenges. A single country, like the U.S., has a diverse population and many regional differences. Imagine building a system to serve people in many countries, across many continents and time zones.

Example 1: Names

TODO: Add text and reference.

Example 2: Maps

You want to provide a map as part of your application. Clearly, you want the names on the map to be displayed in each user’s selected language. Is that the only change between what you show one user versus a different user?

Example 3: GDPR

The General Data Protection Regulation (GDPR) of the European Union is a prominent example of legal requirements that applications must meet when serving residents of the European Union.

Notably, GDPR applies no matter where customer data is located. Among its implications for software system design, it calls for features such as the following:

  • Limiting the transfer of EU customer data to countries outside the European Economic Area. In practice, following this rule often requires storing EU customer data in the EU, or even in the customer’s specific country of residence.
  • Using EU customer data only for purposes that have a lawful basis, most commonly to fulfill contractual obligations, i.e., to do whatever business the customer wants done, such as reading an article or purchasing a product.
  • Providing a mechanism by which an EU customer can obtain a copy of all data your system has about them.
  • Providing a mechanism to erase all data about an EU customer at their request.
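These requirements translate into concrete storage-layer features. The Python sketch below is a toy illustration, not a compliance recipe. It assumes a hypothetical RegionalDataService that keeps one in-memory store per region, a made-up RESIDENCY_REGION table mapping a customer’s country of residence to a region, and invented region names such as eu-west. It shows residency-based routing of writes, an export of everything held about one customer, and erasure of that customer’s data.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical mapping from country of residence (ISO 3166-1 alpha-2 code)
# to the storage region that must hold that customer's data. Illustrative only.
RESIDENCY_REGION = {"FR": "eu-west", "IE": "eu-west", "DE": "eu-central", "US": "us-east"}


def _new_stores():
    # region -> customer_id -> list of records
    return defaultdict(lambda: defaultdict(list))


@dataclass
class RegionalDataService:
    stores: defaultdict = field(default_factory=_new_stores)
    home_region: dict = field(default_factory=dict)  # customer_id -> region

    def register(self, customer_id: str, country: str) -> str:
        # Fail closed: refuse a country with no configured region rather than
        # silently falling back to a region in some other jurisdiction.
        if country not in RESIDENCY_REGION:
            raise ValueError(f"no storage region configured for {country!r}")
        region = RESIDENCY_REGION[country]
        self.home_region[customer_id] = region
        return region

    def save(self, customer_id: str, record: dict) -> None:
        # Writes go only to the customer's home region.
        self.stores[self.home_region[customer_id]][customer_id].append(record)

    def export(self, customer_id: str) -> list:
        # Access request: return a copy of every record held about the customer.
        region = self.home_region[customer_id]
        return [dict(r) for r in self.stores[region].get(customer_id, [])]

    def erase(self, customer_id: str) -> None:
        # Erasure request: drop the records and the routing entry itself.
        region = self.home_region.pop(customer_id)
        self.stores[region].pop(customer_id, None)


svc = RegionalDataService()
print(svc.register("c1", "FR"))  # eu-west
svc.save("c1", {"order": 17})
print(svc.export("c1"))          # [{'order': 17}]
svc.erase("c1")
print("c1" in svc.home_region)   # False
```

A real system would likely also have to locate copies in backups, logs, caches, and analytics pipelines, which is much harder than deleting entries from one store.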

Many industries also operate under sector-specific regulations. In the United States, the Health Insurance Portability and Accountability Act (HIPAA) addresses health data, while other laws address educational data, financial data, telecommunications data, and the like. Many of these regulations are considered weak from a consumer standpoint, because they allow companies to share data freely with partners and subsidiaries that may number in the hundreds.

Automation

Traffic to (demand for) any application may vary widely from hour to hour, or from day to day. There are two broad ways to handle a high traffic load:

  1. Deploy a system large enough to handle the highest anticipated load; or
  2. Automate the scaling of a system such that its size can be increased as load increases, and decreased otherwise, to save costs when demand is low.
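The cost difference between these two strategies is simple arithmetic. The sketch below assumes a made-up hourly load profile (average requests per second for each hour of one day) and an assumed capacity of 500 requests per second per server. It counts server-hours for a fleet sized for the peak versus a fleet resized every hour to the smallest number of servers that covers that hour’s load, and it ignores the time needed to start a server.

```python
import math

# Hypothetical load profile: average requests/second for each hour of a day.
HOURLY_LOAD = [120, 80, 60, 50, 50, 90, 300, 900, 1400, 1600, 1500, 1450,
               1500, 1550, 1400, 1300, 1250, 1500, 1900, 2100, 1800, 1100, 600, 250]
CAPACITY_PER_SERVER = 500  # assumed requests/second one server can handle


def servers_needed(load: float, capacity: float) -> int:
    # Round up, and keep at least one server running even at near-zero load.
    return max(1, math.ceil(load / capacity))


def fixed_server_hours(loads, capacity) -> int:
    # Strategy 1: size the fleet for the peak and run it all day.
    return servers_needed(max(loads), capacity) * len(loads)


def autoscaled_server_hours(loads, capacity) -> int:
    # Strategy 2: resize the fleet every hour to just cover that hour's load.
    return sum(servers_needed(load, capacity) for load in loads)


fixed = fixed_server_hours(HOURLY_LOAD, CAPACITY_PER_SERVER)
auto = autoscaled_server_hours(HOURLY_LOAD, CAPACITY_PER_SERVER)
print(fixed, auto)  # 120 60
```

With these invented numbers, autoscaling uses half the server-hours; the actual savings depend on how far the peak stands above typical load.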

The term scaling up is frequently used to mean “buy/rent a bigger server”. It applies to bandwidth and storage as well. For a great many applications, this is what we do when responsiveness suffers due to high demand.

We scale out a system by adding more resources of the same kind alongside the ones it already has; network connections, servers, and storage are the primary ones.

The largest applications can only be scaled out. As with three-tier applications, we can add and remove servers by placing a load balancer in front of them. But at very high scales, we must also distribute our storage (databases and static content) and our network connections (e.g., by having regional access points like www.google.fr).
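Adding and removing servers behind a load balancer can be sketched in a few lines. The Python below assumes a round-robin policy, which is only one of several policies real load balancers offer, and uses hypothetical server names. The point is that the pool can grow or shrink while requests keep flowing.

```python
from typing import List


class RoundRobinBalancer:
    """Toy load balancer that spreads requests evenly over a changing pool."""

    def __init__(self, servers: List[str]):
        self.servers = list(servers)
        self._next = 0  # index of the server that receives the next request

    def add_server(self, server: str) -> None:
        # Scaling out: the new server joins the rotation immediately.
        self.servers.append(server)

    def remove_server(self, server: str) -> None:
        # Scaling in: stop sending new requests to this server.
        i = self.servers.index(server)
        self.servers.pop(i)
        # Keep the rotation pointing at the same "next" server if possible.
        if i < self._next:
            self._next -= 1
        if self._next >= len(self.servers):
            self._next = 0

    def next_server(self) -> str:
        if not self.servers:
            raise RuntimeError("no servers available")
        server = self.servers[self._next]
        self._next = (self._next + 1) % len(self.servers)
        return server


lb = RoundRobinBalancer(["app-1", "app-2"])
print([lb.next_server() for _ in range(4)])  # ['app-1', 'app-2', 'app-1', 'app-2']
lb.add_server("app-3")
print([lb.next_server() for _ in range(3)])  # ['app-1', 'app-2', 'app-3']
lb.remove_server("app-2")
print([lb.next_server() for _ in range(2)])  # ['app-1', 'app-3']
```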

To scale out in this way, we must carefully design our applications and services. And we must do so while bearing in mind the geopolitical and cultural implications of our choices.

Topics for investigation and discussion

  • What is a country? Network packets don’t know where they are, so does the internet even have a notion of international borders?
  • To serve customers in multiple countries, what does our application have to do differently for country A versus country B?
  • Consider building an installer for some complicated software. Can we ask the user what country they are in, and use that information to configure settings like language and the display formats of currency, dates, and more?
  • As a customer, does it matter to me personally where my bank has its data centers? How about just the data center that my mobile banking app talks to? And what about where the secondary (failover) data center is? If I live in North Carolina, do I care if the data center is in Washington, D.C. or Toronto, Ontario?
  • In the late months of 2023, who controlled Ukraine’s terrestrial internet access?
  • Who controlled satellite internet access there in 2023?
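The installer question can be explored with a small sketch. The Python below assumes a hypothetical, deliberately tiny lookup table from ISO 3166-1 country codes to candidate language tags. It exposes one weakness of deriving settings from the country alone: for a multilingual country such as Canada or Switzerland, the country does not determine the language, so the installer still has to ask.

```python
from typing import Optional

# Hypothetical table of candidate language tags per country code.
# Several countries have more than one official or widely used language.
COUNTRY_LANGUAGES = {
    "US": ["en-US"],
    "FR": ["fr-FR"],
    "CA": ["en-CA", "fr-CA"],
    "CH": ["de-CH", "fr-CH", "it-CH", "rm-CH"],
}


def default_language(country: str, preferred: Optional[str] = None) -> Optional[str]:
    """Return a language tag, or None when the installer must ask the user."""
    if preferred:
        return preferred  # an explicit user choice always wins
    candidates = COUNTRY_LANGUAGES.get(country)
    if not candidates:
        return None  # unknown country: no safe default
    if len(candidates) == 1:
        return candidates[0]
    return None  # multilingual country: the country alone is not enough


print(default_language("FR"))           # fr-FR
print(default_language("CA"))           # None
print(default_language("CA", "fr-CA"))  # fr-CA
```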

Key engineering questions:

  • DNS maps names to IP addresses. Packets are sent to IP addresses. How do routers know how to route packets? Who defines the internet topology, and how are updates made?
  • When I ping www.google.com, I get 142.251.167.105. Is that what everyone gets?
  • Large applications (services) may be distributed over many data centers. What factors might be important in deciding where to locate a data center (or which existing data center to choose for an application)?
  • Cloud computing is in many respects an attempt to virtualize everything: servers, networks, and storage are the basics. What is a virtual machine? What is a virtual (private) network? What might it mean to have virtual storage?
  • Automating the scale-out of an application is not a trivial task. It takes time to spin up a new server, or to provision and configure other services.
    • What happens if demand varies more quickly than we can respond?
    • When demand drops, how do we decide whether to reduce scale (deprovision resources) or wait, in case demand rises again?
  • Fortunately, there is a long history in engineering of controlling complex systems.
    • What is open loop control of a system, as compared with closed loop control?
    • How might the concept of proportional control apply to scale-out?
    • In terms of application demand, responsiveness, and deployed size, where do we see the control theory concepts of proportional, integral, and derivative terms?
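As one possible starting point for the proportional-control question, the sketch below treats average CPU utilization as the measured output and the number of servers as the control input. It assumes a target utilization of 60%, a 10% tolerance band, and fleet limits of 1 to 100 servers; all of these values are illustrative assumptions. The rule is similar in spirit to the one documented for the Kubernetes Horizontal Pod Autoscaler, desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). It is proportional because the change in fleet size, current × (utilization − target) / target, is proportional to the error between measured and target utilization.

```python
import math


def desired_servers(current: int, utilization: float, target: float = 0.6,
                    tolerance: float = 0.1, min_servers: int = 1,
                    max_servers: int = 100) -> int:
    """Proportional scale-out rule (a sketch, not a production autoscaler).

    current     -- servers running now
    utilization -- measured average utilization across them, 0.0 to 1.0
    target      -- utilization we want to hold (assumed 60%)
    tolerance   -- relative error small enough to ignore (deadband)
    """
    ratio = utilization / target
    # Deadband: small deviations do not trigger a change, which keeps the
    # fleet from flapping up and down on noise.
    if abs(ratio - 1.0) <= tolerance:
        return current
    # Correction current * (ratio - 1) is proportional to the error;
    # rounding up errs on the side of having capacity.
    wanted = math.ceil(current * ratio)
    return max(min_servers, min(max_servers, wanted))


print(desired_servers(10, 0.93))  # 16
print(desired_servers(10, 0.63))  # 10 (inside the deadband)
print(desired_servers(10, 0.27))  # 5
print(desired_servers(90, 0.95))  # 100 (capped at max_servers)
```

A purely proportional rule reacts only to the current error. An integral term would also respond to error that persists over time, and a derivative term to how fast utilization is changing, which is one way to react before demand outruns the time it takes to start new servers.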

Further Reading