Geographically Dispersed Infrastructure Use Case

A company based in the USA has clients in multiple regions: the USA and Europe. The company currently bases its infrastructure in the USA, and its objective is to improve latency and availability.

Latency

Latency is the delay between the user's request and the first bit of the response.

Latency is different from bandwidth: after the first bit of data is received, bandwidth determines how fast the rest of the response is transferred.

Bandwidth and latency are also different from page load time: once all the resources are transferred, page load time includes the time required to load and render JavaScript, CSS and other resources.
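To make the distinction concrete, here is a simplified back-of-the-envelope model (the numbers are illustrative, not measurements): total response time is roughly latency plus transfer time, so the same page on the same connection gets much slower as latency grows.

```python
# Simplified model: total response time ~= latency (time to first byte)
# plus transfer time (size / bandwidth). Numbers are illustrative.
def transfer_time_ms(latency_ms, size_kb, bandwidth_mbps):
    transfer_ms = size_kb * 8 / (bandwidth_mbps * 1000) * 1000  # KB -> kbit
    return latency_ms + transfer_ms

# Same 500 KB page over the same 50 Mbps connection:
print(transfer_time_ms(20, 500, 50))   # 100.0 ms -- nearby server
print(transfer_time_ms(200, 500, 50))  # 280.0 ms -- distant server
```

Note that upgrading bandwidth does nothing for the latency term: only moving the server closer to the user shrinks it.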

Higher latency affects sales and conversions, as various reports have stated.

Although bandwidth and page load time also affect conversions, these reports suggest a direct link between latency and conversions, independent of secondary metrics such as page load time.

The results are shocking. Our intuition says that 100ms of latency shouldn't affect human behaviour: surely the human brain can barely notice 100ms! However, the data exposes a different reality: if our website experiences 100ms of extra latency and we are making 100k in revenue per day, we could be facing annual losses of 365k to 1.4 million.

If a company makes 100k per day, 100ms of extra latency could reduce its revenue by up to 1.4 million every year
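A quick back-of-the-envelope check of these figures. The conversion-drop range (1% to 3.8%) is inferred here from the 365k and 1.4 million numbers themselves, not quoted from any specific report:

```python
# Annual impact of a small conversion drop on 100k/day revenue.
daily_revenue = 100_000
annual_revenue = daily_revenue * 365  # 36.5 million per year

# 1% and 3.8% drops reproduce the 365k and ~1.4M figures above
for drop in (0.01, 0.038):
    print(f"{drop:.1%} conversion drop -> {annual_revenue * drop:,.0f} lost per year")
```

A 1% drop costs 365,000 per year, and a 3.8% drop costs about 1.39 million, which is where the headline range comes from.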

Traditional solutions

Currently, there are well-known best practices to reduce page load time.

Meanwhile, there are fewer solutions for reducing latency, and most of them focus on CDN technology:

  • To deliver static content from a node geographically close to your user.
  • To define caching strategies, converting dynamic content into static content when possible.
  • In the most advanced systems, to optimize the network path between your infrastructure and your user when dynamic content is delivered.
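The caching-strategy point can be sketched in a framework-agnostic way (the `cache_headers` helper and header values are illustrative, not any specific CDN's API): content that changes slowly is marked so an edge node can serve it without a round trip to the origin.

```python
# Sketch of "convert dynamic content into static": responses that are safe
# to reuse get a Cache-Control header so a CDN edge can serve them without
# contacting the origin. cache_headers is a made-up helper.
def cache_headers(is_personalised, max_age=60):
    if is_personalised:
        # user-specific content must never be cached at the edge
        return {"Cache-Control": "private, no-store"}
    # slowly-changing "dynamic" content is served as static for max_age seconds
    return {"Cache-Control": f"public, max-age={max_age}"}

print(cache_headers(False))  # {'Cache-Control': 'public, max-age=60'}
print(cache_headers(True))   # {'Cache-Control': 'private, no-store'}
```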

However, CDNs have several problems:

  • They are effective with static or cached content only.
  • Complex caching strategies are difficult to maintain.
  • Even when the best network path is routed to deliver dynamic content, the best achievable theoretical speed is still only the speed of light, and the distances involved are vast.
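The last point can be quantified. Light in optical fibre travels at roughly 200,000 km/s (about two thirds of c), which gives a hard lower bound on round-trip time no matter how good the network path is. The distances below are approximate great-circle figures; real routes are longer, so real RTTs are higher:

```python
# Lower bound on round-trip time imposed by the speed of light in fibre
# (~200,000 km/s). Real fibre routes are not great circles, so actual
# round-trip times are always worse than these figures.
SPEED_IN_FIBRE_KM_S = 200_000

def min_rtt_ms(distance_km):
    return 2 * distance_km / SPEED_IN_FIBRE_KM_S * 1000

print(f"San Francisco -> London  (~8,600 km): {min_rtt_ms(8600):.0f} ms minimum")
print(f"San Francisco -> Sydney (~12,000 km): {min_rtt_ms(12000):.0f} ms minimum")
```

No amount of routing optimization can get a San Francisco origin below roughly 86 ms round-trip to London; only placing infrastructure closer to the user can.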

ManageaCloud approach

ManageaCloud approaches the latency problem by creating several synchronized independent infrastructures scattered around the world:

  • Orchestrating geographically dispersed infrastructure, locating it as close as possible to your clients.
  • Using traffic management to ensure that your clients connect to the closest available infrastructure.
  • Combining multiple public and private clouds to deliver content as fast as possible.
  • Combining traffic management and monitoring to achieve high availability across different geographical locations and/or different cloud suppliers.
  • Beating complexity by automating everything.

These techniques can be used alongside the traditional best practices for reducing page load time.

What about the persistence layer?

The most complicated persistence layer to scale geographically used to be the database. But not any more.

In recent years, the most popular databases, both relational and NoSQL, have added two features that enable geographical dispersion and geographical scalability:

  • Multi-master replication: allows data to be written to different master databases at the same time.
  • Eventual consistency: new information does not spread to the cluster's members immediately; data is transferred when possible. This normally takes just a few milliseconds, but if connectivity between masters suffers, the application's performance is not affected. And if one of the servers or clusters dies, the data is seamlessly re-synchronized when it comes back up.

However, this situation requires additional awareness:

  • How does the cluster behave if two nodes modify the same piece of information at the same time?
  • How does the cluster behave if two nodes add the same primary key at the same time?
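One common answer to both questions is last-write-wins conflict resolution: every write carries a timestamp, and when the masters synchronize, the most recent write survives. A minimal sketch, not tied to any specific database (the key names and values are made up):

```python
# Last-write-wins merge of two master replicas. Each value is a
# (data, timestamp) pair; on synchronization the newer write survives.
# Purely illustrative -- real databases also need tie-breakers, and the
# application cases above still need reviewing.
def merge(replica_a, replica_b):
    merged = {}
    for key in replica_a.keys() | replica_b.keys():
        candidates = [r[key] for r in (replica_a, replica_b) if key in r]
        merged[key] = max(candidates, key=lambda v: v[1])  # newest timestamp
    return merged

usa    = {"stock:sku42": ("3 left", 1000.0)}
europe = {"stock:sku42": ("2 left", 1000.5)}  # written half a second later
print(merge(usa, europe))  # {'stock:sku42': ('2 left', 1000.5)}
```

Note what this policy implies: the American write to `stock:sku42` is silently discarded, which is exactly the kind of behaviour development teams need to anticipate.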

Furthermore, development teams need to be aware that the application is geographically dispersed, and companies' use cases must be reviewed to confirm that, in the unlikely event of a cluster failing, application reliability is unaffected.

Application Architecture

This company decided to use ManageaCloud as part of its strategy to improve latency and availability by using geographically dispersed infrastructures. The application and the system configuration management code are publicly available on GitHub.

The server architecture contains two roles: an application role and a database role.

ManageaCloud needs to understand how to configure these roles. Therefore, we create one server configuration that contains two blocks: an application block and a database block.

Macfile: the blueprint architecture file

The roles defined above depend on each other. As we want to achieve geographical dispersion, we orchestrate those servers in the USA and Europe.

mac: 0.7.1
description: Distributed Infrastructure
name: Demo distributed
version: 1.0

roles:
  master:
    instance create:
      configuration: demo_disperse_database
      environment:
      - PUBLIC_IP: master.PUBLIC_IP
  joining_master:
    instance create:
      configuration: demo_disperse_database
      environment:
      - PUBLIC_IP: joining_master.PUBLIC_IP
      - MASTER_IP: master.PUBLIC_IP

infrastructures:
  usa:
    name: 'dbusa'
    location: us-central1-c
    role: master
  europe:
    name: 'dbeurope'
    location: europe-west1-b
    role: joining_master

This orchestration must work alongside traffic management, sending users to the closest node and activating geographical high availability. Please contact us for more information about these setups.

Benefits

Latency is reduced

Originally, ManageaCloud was physically located in San Francisco, using one popular cloud supplier. Monitoring the latency of manageacloud.com from different geographical locations (with probes in Europe and the USA) showed an average latency of 470ms (over SSL).

Then we created three different infrastructures, in Europe, the USA and Australia, reducing the average latency to 240ms (over SSL).

The locations that we chose are still quite generic. If the infrastructure were expanded further, using different public cloud suppliers, hybrid strategies, etc., the global latency could be reduced even more.

Traffic Management is enabled

This set-up allows you to utilise the power of traffic management. For example:

  • Clients will connect to the closest node.
  • You can set rules based on the traffic in a particular node, including: number of connections, load average and active requests. If a node is overloaded, then some traffic is sent to another node.
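Those rules can be sketched as a toy routing function (the node names, metrics and thresholds are illustrative, not ManageaCloud's actual API): prefer the closest node, but skip nodes that have hit their request limit.

```python
# Toy routing decision for the rules above: prefer the geographically
# closest node, but exclude nodes that have reached their request limit.
# Node names, metrics and thresholds are made up for illustration.
NODES = {
    "usa":    {"distance_ms": 20, "active_requests": 950, "max_requests": 1000},
    "europe": {"distance_ms": 90, "active_requests": 100, "max_requests": 1000},
}

def pick_node(nodes):
    healthy = {name: n for name, n in nodes.items()
               if n["active_requests"] < n["max_requests"]}
    # closest healthy node wins; raises ValueError if every node is overloaded
    return min(healthy, key=lambda name: healthy[name]["distance_ms"])

print(pick_node(NODES))  # "usa": closest, and still under its limit
```

If the USA node fills up (1000 active requests), the same function would route new traffic to Europe instead.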

Geographical high availability is achieved

This is a more advanced alternative to disaster recovery. If you already have two nodes working at the same time in at least two different geographical locations (and perhaps using different cloud suppliers), you can achieve real-time geographical high availability. If one of the nodes stops working, the system will deactivate it and send the traffic to the closest available node.

Automation is enforced

A desirable side effect of geographically dispersed infrastructure is that the systems must be fully automated.

Cloud supplier mobility

Creating and destroying infrastructure in different cloud suppliers becomes a trivial task. This reduces coupling to one of a company's major dependencies: the cloud supplier. You can switch to another supplier at any time, in a matter of minutes, giving you mobility when it is needed.

Want to know more?

If you'd like to know how ManageaCloud can help you with Geographically Dispersed Infrastructure, please get in touch. We'd be delighted to hear from you.