2025.08.19 Technical Article

What's New in Aurora DSQL? (A Comparison with Spanner) [DeNA Infra SRE]

by Hiroyuki Nishizaki

#infrastructure #sre #database #aws #aurora #spanner

This article is the English translation of a post originally published in Japanese on March 18, 2025. Please note that the content reflects the period when Aurora DSQL was still in preview and not yet Generally Available (GA). With GA, the official pricing structure has been released; we recommend consulting that latest information when evaluating Aurora DSQL.

Hello, this is Hiro from Group 4 of the IT Platform Department. My primary role is to manage the infrastructure for mobile games that are released worldwide.

This is the second article in a three-part series, and today I will explore what makes Amazon Aurora DSQL new and different when compared to Google Cloud Spanner. Please note that this is not an exhaustive comparison; rather, it aims to deepen our understanding by focusing on DSQL’s unique characteristics and architecture to help guide future decisions on which to use.

While there are many differences between Aurora DSQL and Spanner, the key distinction I’ll introduce today is Aurora DSQL’s superior latency and scaling performance, achieved through fine-grained component separation and optimistic concurrency control.

How Spanner Ensures Data Consistency

Let’s begin with Spanner. Although it is also known as a distributed SQL database, its underlying mechanism differs from Aurora DSQL’s. In short, Spanner maintains data consistency by having all of the servers that hold the updated data communicate with one another1.

Spanner’s minimal internal configuration can be illustrated as follows2:

Spanservers

The internal servers in Spanner, called Spanservers, contain both compute resources and storage (data on a distributed file system). To ensure redundancy, Spanner replicates data across different zones, meaning Spanservers holding the same data exist in multiple zones.

These Spanservers replicate data using a protocol called Paxos, and a set of these servers forms a Paxos Group. Without going into detail, Paxos is a protocol for reaching consensus (e.g., whether to commit a transaction) among multiple machines. This is why when Spanner updates data, communication among the servers holding that data is mandatory.
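To make the consensus idea concrete, here is a deliberately minimal sketch in Python of a majority quorum: a write counts as replicated only once more than half of the replicas acknowledge it. All names are hypothetical, and real Paxos adds proposal numbers, a prepare/accept exchange, and failure handling; this only shows the core idea.

from dataclasses import dataclass, field

@dataclass
class Replica:
    zone: str
    log: list = field(default_factory=list)

    def accept(self, value) -> bool:
        # A real replica may reject based on proposal numbers; here every
        # healthy replica simply appends the value and acknowledges.
        self.log.append(value)
        return True

def replicate(replicas: list[Replica], value) -> bool:
    # The write is committed only once a majority quorum acknowledges it,
    # which is why replicas must communicate on every update.
    acks = sum(r.accept(value) for r in replicas)
    return acks > len(replicas) // 2

group = [Replica("zone-a"), Replica("zone-b"), Replica("zone-c")]
print(replicate(group, {"txn": 1, "row": "users/42"}))  # True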

As data grows, Spanner automatically creates multiple Paxos Groups to distribute the data. This is essentially a managed, internal implementation of the sharding that has been used in traditional RDBs and previous versions of Aurora.

When a transaction needs to update data across multiple Paxos Groups, Spanner employs a two-phase commit (2PC) protocol between them. Consensus must still be reached within each Paxos Group involved, so the more groups a transaction spans, the more communication overhead is generated. This architecture means that latency increases as you update data across a larger number of geographically dispersed servers.
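The sketch below (again hypothetical Python, not Spanner’s actual API) illustrates why spanning more groups costs more: each prepare and commit call stands in for a full consensus round inside that group, so the round trips multiply with the number of participating groups.

class PaxosGroup:
    # Each method call stands in for a full consensus round within the
    # group, which is where the per-group latency comes from.
    def __init__(self, name: str):
        self.name = name
        self.prepared = False

    def prepare(self, txn_id: str) -> bool:
        self.prepared = True  # persist intent, acquire locks (simplified)
        return True

    def commit(self, txn_id: str) -> None:
        assert self.prepared
        self.prepared = False  # apply the write and release locks

def two_phase_commit(groups: list, txn_id: str) -> bool:
    # Phase 1: every participating group must vote to prepare.
    if not all(g.prepare(txn_id) for g in groups):
        return False  # a real coordinator would now send aborts
    # Phase 2: tell every group to apply the change.
    for g in groups:
        g.commit(txn_id)
    return True

groups = [PaxosGroup("g1"), PaxosGroup("g2"), PaxosGroup("g3")]
print(two_phase_commit(groups, "txn-123"))  # 2 phases x 3 groups of consensus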

Spanner 2PC architecture

What’s New in Aurora DSQL?

“4x Faster” than Spanner

In contrast, Aurora DSQL is marketed for its lower latency compared to Spanner, as AWS’s CEO highlighted in the re:Invent keynote. While the “4x” figure should be seen as a marketing tagline, how is this characteristic actually achieved?

re:Invent CEO keynote

source: https://youtu.be/LY7m5LQliAo?t=3639

Efficient Write Transactions

Below is a diagram of Aurora DSQL’s internal architecture. As in the previous article in this series, it shows the path of a user’s update query from the left to the Storage layer on the far right. The key point to notice here is that by isolating the functions required for write transactions into dedicated components (Adjudicator and Journal), Aurora DSQL eliminates the need for direct communication between the data-holding Storage nodes.

Aurora DSQL architecture

source: https://brooker.co.za/blog/2024/12/05/inside-dsql-writes.html

(The component descriptions are the same as in the previous post)

  • Query Processor
    • The compute component that contains a customized Postgres engine.
    • It receives requests from users and executes read and write operations.
  • Adjudicator
    • It checks whether a write operation initiated by a Query Processor would violate data consistency (i.e., if there are conflicting transactions).
  • Journal
    • It receives data to be written, makes it durable, and asynchronously communicates the changes to the storage.
  • Storage
    • It stores the data.

This separation of concerns allows a write transaction to be fully committed with only the following flow:

Aurora DSQL commit architecture

A commit is considered durable the moment it is written to the Journal, and the update is then reflected in the Storage layer asynchronously. This means that completing a commit does not require waiting for communication with the Storage nodes. While communication does occur between Adjudicators, their specialized role keeps this exchange lightweight, making it more efficient than direct communication between the actual data-holding Storage nodes3. We can expect this to improve latency, especially when the data being updated spans multiple shards (the equivalent of Paxos Groups in Spanner) in the storage layer.
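As a rough mental model, the commit path described above might be sketched as follows. All class and method names are hypothetical, inferred from the cited blog post rather than taken from AWS’s implementation; the key point is that the commit returns right after the Journal append, while Storage is updated asynchronously.

import asyncio

class Adjudicator:
    # Tracks the latest commit timestamp per key and rejects a write if
    # any of its keys changed after the transaction's read timestamp.
    def __init__(self):
        self.last_commit: dict[str, int] = {}

    def approve(self, keys, read_ts: int, commit_ts: int) -> bool:
        if any(self.last_commit.get(k, -1) > read_ts for k in keys):
            return False  # a conflicting transaction already committed
        for k in keys:
            self.last_commit[k] = commit_ts
        return True

class Storage:
    def __init__(self):
        self.data: dict[str, str] = {}

    async def apply(self, writes: dict) -> None:
        self.data.update(writes)  # runs after the commit has returned

class Journal:
    # Appending here is the durability point; Storage is updated
    # asynchronously and the committer never waits for it.
    def __init__(self, storage: Storage):
        self.entries: list[dict] = []
        self.storage = storage

    def append(self, writes: dict) -> None:
        self.entries.append(writes)
        asyncio.create_task(self.storage.apply(writes))

async def commit(adj: Adjudicator, journal: Journal,
                 writes: dict, read_ts: int, commit_ts: int) -> bool:
    if not adj.approve(writes.keys(), read_ts, commit_ts):
        return False  # the caller retries the whole transaction
    journal.append(writes)  # durable here; no round trip to Storage
    return True

async def main():
    storage = Storage()
    ok = await commit(Adjudicator(), Journal(storage),
                      {"users/42": "hiro"}, read_ts=10, commit_ts=11)
    print("committed:", ok)          # True
    await asyncio.sleep(0)           # let the async Storage apply run
    print("storage:", storage.data)  # {'users/42': 'hiro'}

asyncio.run(main())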

Adoption of Optimistic Concurrency Control

Furthermore, Aurora DSQL uses optimistic concurrency control, which reduces latency by avoiding the need for upfront locking. This advantage is most pronounced in a multi-region configuration, as communication with other regions is not required until commit time, significantly reducing latency. The diagram below illustrates the flow for read and write operations; steps 1 through 4 are all completed within a single region (whereas in Spanner, a read operation requires contacting the Paxos Group’s leader server to acquire a lock4).

Aurora DSQL read & write architecture

Considering that the state of updated data must be globally consistent at commit time, this architecture theoretically enables the fastest possible parallel writes from different regions to a globally distributed cluster. The CEO’s keynote explicitly mentioned running transactions with “READS & WRITES” in a “multi-Region” setup, suggesting that this specific characteristic was the basis of the “4x faster” claim.
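From an application’s point of view, optimistic concurrency control means a conflicting transaction fails at commit time instead of blocking on a lock, so retry logic is expected. Below is a minimal sketch using psycopg2 against a PostgreSQL-compatible endpoint; the DSN and the accounts table are hypothetical, and DSQL’s exact error reporting may differ from the stock SerializationFailure shown here.

import psycopg2
from psycopg2 import errors

def transfer(dsn: str, src: int, dst: int, amount: int, retries: int = 5) -> None:
    # Under OCC the conflict surfaces at COMMIT as a serialization
    # failure, so we retry the whole transaction from scratch.
    for _ in range(retries):
        conn = psycopg2.connect(dsn)
        try:
            with conn.cursor() as cur:
                cur.execute(
                    "UPDATE accounts SET balance = balance - %s WHERE id = %s",
                    (amount, src))
                cur.execute(
                    "UPDATE accounts SET balance = balance + %s WHERE id = %s",
                    (amount, dst))
            conn.commit()  # conflict detection happens here
            return
        except errors.SerializationFailure:
            conn.rollback()  # another transaction won; retry
        finally:
            conn.close()
    raise RuntimeError("transaction kept conflicting; rethink the access pattern")

This also illustrates the flip side discussed later: if transactions conflict often, each retry repeats all of the work, which is why OCC suits low-conflict workloads.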

Independent Scaling of Performance Aspects

Because Aurora DSQL’s components are fine-grained, its performance aspects—such as reads, writes, storage capacity, and SQL execution—can all scale independently5. This allows DSQL to scale for virtually any use case, from a personal hobby project to a large enterprise application, without requiring any complex configuration. A user only needs to create a cluster to access all its capabilities, including scaling6.

Each performance aspect can likely scale independently because reads (Query Processor and Storage), writes (Query Processor, Adjudicator, and Journal), storage capacity (Storage), and SQL execution (Query Processor) map to distinct components that can each be scaled on their own7.

In contrast, Spanner’s performance is managed by a single setting: the number of nodes. Even if you need to scale only one aspect, you must add entire nodes and pay for them (storage capacity is also limited to 10 TB per node8, so you may need to add a node just for capacity even when request traffic is low). Furthermore, autoscaling requires either an open-source tool or the managed feature that became GA in February 2025 (available only for the Enterprise and Enterprise Plus editions)9.

Since the detailed Terms of Use for Aurora DSQL are not yet public, the concrete benefits of its scaling model remain to be seen. However, Aurora DSQL’s scalability is a clear architectural advantage, and it is emphasized on the official website and in the blog I’ve referenced. We are hopeful that this will translate into user benefits like flexible and rapid automatic scaling without manual intervention, and a more granular, pay-for-what-you-use pricing model.

Summary

In this article, I focused on the key characteristic that makes Aurora DSQL’s latency potentially lower than Spanner’s. Aurora DSQL is a distributed SQL database that particularly excels at executing low-latency, parallel, non-conflicting writes from different regions in a multi-region deployment.

This doesn’t mean DSQL is hard to use in a single region; you can still enjoy the benefits of a distributed SQL database with low latency. Furthermore, I expect the scaling advantages of its fine-grained component architecture to become clearer in the future.

On the other hand, the reliance on optimistic concurrency control to achieve these characteristics means it is not well-suited for transactions that are prone to conflicts or are long-running. In such cases, Spanner or traditional Aurora would likely remain the better choice.

It’s important to remember that Aurora DSQL is still in a limited preview stage, and much of my analysis is based on inferences from public information, since it is not yet possible to test DSQL’s actual performance or multi-region capabilities in practice. Moreover, even with PostgreSQL compatibility, there will be differences in supported features. Once Aurora DSQL becomes Generally Available, we strongly recommend performing workload-specific tests before making a selection.

Thank you for reading to the end!
