<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Mark Sayson</title>
    <description>Notes on software development, technology, and life.</description>
    <link>https://www.marksayson.com/</link>
    <atom:link href="https://www.marksayson.com/feed.xml" rel="self" type="application/rss+xml"/>
    <pubDate>Tue, 09 Jun 2026 07:36:59 +0000</pubDate>
    <lastBuildDate>Tue, 09 Jun 2026 07:36:59 +0000</lastBuildDate>
    <generator>Jekyll v3.10.0</generator>
    
      <item>
        <title>Why deletion means different things in different systems</title>
        <description>&lt;p&gt;A customer submits a data deletion request. The deletion workflow executes across dozens of systems, and every system reports success.&lt;/p&gt;

&lt;p&gt;Yet weeks later, the customer’s records still exist in a data warehouse, a search index, database backups, and a fraud investigation system.&lt;/p&gt;

&lt;p&gt;At first glance, this looks like a deletion failure. However, every system may have behaved exactly as designed. The problem is that “delete” meant something different in each environment, and the organization may struggle to explain the discrepancy during an audit.&lt;/p&gt;

&lt;h2 id=&quot;deletion-is-not-a-single-operation&quot;&gt;Deletion is not a single operation&lt;/h2&gt;

&lt;p&gt;Privacy requirements are typically expressed as a desired outcome, such as removing personal data that should no longer be retained or processed.&lt;/p&gt;

&lt;p&gt;Translating this into technical requirements is difficult because organizations store data across systems designed for different purposes, such as:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Transaction processing&lt;/li&gt;
  &lt;li&gt;Search&lt;/li&gt;
  &lt;li&gt;Analytics and machine learning&lt;/li&gt;
  &lt;li&gt;Logging and observability&lt;/li&gt;
  &lt;li&gt;Archival storage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A transactional database may physically remove records or mark them as inactive to preserve referential integrity. A search index may remove documents asynchronously and update indexes over time. A data warehouse may rewrite partitions or de-identify records containing personal data. An event stream may represent deletion through tombstone events and downstream compaction.&lt;/p&gt;

&lt;p&gt;As a result, the same deletion request can produce different outcomes across systems, depending on how each system interprets what deletion should do.&lt;/p&gt;

&lt;h2 id=&quot;common-deletion-patterns&quot;&gt;Common deletion patterns&lt;/h2&gt;

&lt;p&gt;These outcomes typically fall into a small number of recurring deletion patterns.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hard deletion&lt;/strong&gt;: Data is physically removed from the system.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Most closely matches the intuitive expectation of deletion&lt;/li&gt;
  &lt;li&gt;Conceptually simple&lt;/li&gt;
  &lt;li&gt;Can introduce referential integrity or recovery challenges&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Soft deletion&lt;/strong&gt;: Data remains in the system but is marked as inactive or deleted.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Common in transactional systems&lt;/li&gt;
  &lt;li&gt;Supports recovery&lt;/li&gt;
  &lt;li&gt;Data still physically exists&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Soft deletes can become problematic if the personal data is still readily available to internal teams.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tombstoning&lt;/strong&gt;: A deletion marker replaces the original record to signal removal.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Common in event streams and distributed databases&lt;/li&gt;
  &lt;li&gt;Supports synchronization across eventually consistent systems&lt;/li&gt;
  &lt;li&gt;Original data may persist until compaction processes run&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Anonymization or de-identification&lt;/strong&gt;: Identifiers are removed or transformed so that records can no longer be linked back to the user.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Original identity no longer recoverable&lt;/li&gt;
  &lt;li&gt;Data may remain useful for analytics and reporting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Organizations may use anonymization or de-identification when full deletion is not required or feasible. However, preserving analytical utility while removing all identifying information can be difficult. Re-identification risks increase when joining datasets across systems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key destruction&lt;/strong&gt;: Encryption keys are deleted so that data can no longer be decrypted.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Common in cloud architectures&lt;/li&gt;
  &lt;li&gt;Encrypted data remains stored&lt;/li&gt;
  &lt;li&gt;Access is prevented via inability to decrypt the data rather than through physical deletion&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Suppression&lt;/strong&gt;: Data is retained but excluded from operational processing.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Common in marketing and advertising&lt;/li&gt;
  &lt;li&gt;Typically implemented via fine-grained access controls&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;comparison-of-deletion-patterns-and-their-desired-outcomes&quot;&gt;Comparison of deletion patterns and their desired outcomes&lt;/h3&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Deletion pattern&lt;/th&gt;
      &lt;th&gt;Original data retained?&lt;/th&gt;
      &lt;th&gt;Recoverable?&lt;/th&gt;
      &lt;th&gt;Desired outcome&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Hard delete&lt;/td&gt;
      &lt;td&gt;No&lt;/td&gt;
      &lt;td&gt;No&lt;/td&gt;
      &lt;td&gt;Data should no longer exist&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Soft delete&lt;/td&gt;
      &lt;td&gt;Yes&lt;/td&gt;
      &lt;td&gt;Yes&lt;/td&gt;
      &lt;td&gt;Data may need to be restored&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Tombstone&lt;/td&gt;
      &lt;td&gt;Temporarily&lt;/td&gt;
      &lt;td&gt;Sometimes&lt;/td&gt;
      &lt;td&gt;Deletion must be communicated to other systems&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Anonymization&lt;/td&gt;
      &lt;td&gt;Partially&lt;/td&gt;
      &lt;td&gt;No&lt;/td&gt;
      &lt;td&gt;Identity should be removed while preserving analytical value&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Key destruction&lt;/td&gt;
      &lt;td&gt;Yes&lt;/td&gt;
      &lt;td&gt;No&lt;/td&gt;
      &lt;td&gt;Data should become permanently inaccessible&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Suppression&lt;/td&gt;
      &lt;td&gt;Yes&lt;/td&gt;
      &lt;td&gt;Yes&lt;/td&gt;
      &lt;td&gt;Data must be retained but not actively used&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;h2 id=&quot;why-inconsistent-deletion-semantics-create-operational-problems&quot;&gt;Why inconsistent deletion semantics create operational problems&lt;/h2&gt;

&lt;p&gt;The existence of multiple deletion models is not inherently problematic.&lt;/p&gt;

&lt;p&gt;The problem arises when organizations assume they are equivalent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example 1: Mismatched hard vs soft deletion expectations&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Privacy team expects deletion.&lt;/li&gt;
  &lt;li&gt;System performs soft delete.&lt;/li&gt;
  &lt;li&gt;Downstream systems continue to process soft-deleted data due to inconsistent enforcement of deletion semantics.&lt;/li&gt;
  &lt;li&gt;Result: Data remains accessible internally, against intended policy.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;img src=&quot;/images/20260608_SoftDeletionComplianceGapExcalidraw.svg&quot; alt=&quot;Diagram illustrating compliance gaps from soft deletion mismatching expectation&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example 2: Suppression inconsistently applied across data use cases&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Customer requests deletion.&lt;/li&gt;
  &lt;li&gt;Marketing platform suppresses customer data from email campaigns.&lt;/li&gt;
  &lt;li&gt;Personal data remains accessible to other operational systems.&lt;/li&gt;
  &lt;li&gt;Result: Teams disagree on whether preventing their system’s use of data is equivalent to deleting personal data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example 3: Anonymized data re-identified in derived datasets&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Privacy team expects customer personally identifiable information (PII) to be removed.&lt;/li&gt;
  &lt;li&gt;Analytics platform de-identifies records rather than deleting.&lt;/li&gt;
  &lt;li&gt;Data scientists continue querying aggregated datasets.&lt;/li&gt;
  &lt;li&gt;Privacy reviewers discover that individuals can be re-identified when datasets are joined.&lt;/li&gt;
  &lt;li&gt;Result: Teams disagree on cross-system anonymization requirements.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example 4: Data reappearance via event replays&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Source system deletes data.&lt;/li&gt;
  &lt;li&gt;Replaying historical pipeline events rehydrates (that is, recreates) deleted records.&lt;/li&gt;
  &lt;li&gt;Systems lack mechanisms to prevent or remove rehydrated data after replay.&lt;/li&gt;
  &lt;li&gt;Result: Systems differ on whether deletion should prevent historical state from being reintroduced, resulting in an enforcement gap.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;img src=&quot;/images/20260608_RestoredDeletedDataExcalidraw.svg&quot; alt=&quot;Diagram illustrating compliance gaps from deleted data being rehydrated&quot; /&gt;&lt;/p&gt;
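
&lt;p&gt;One way to close this kind of gap is to have replay jobs consult a record of completed deletions before writing anything. The following is a minimal sketch under that assumption, not a description of any particular pipeline: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ReplayGuard&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CustomerEvent&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;filterReplay&lt;/code&gt; are hypothetical names, and the in-memory array stands in for a durable deletion ledger.&lt;/p&gt;

```java
import java.util.Arrays;

/** Sketch: filter replayed events against a ledger of deleted subject IDs. */
final class ReplayGuard {

    /** Hypothetical pipeline event carrying one customer's data. */
    record CustomerEvent(String subjectId, String payload) {}

    // Hypothetical deletion ledger. A real system would use a durable store, not an array.
    private final String[] deletedSubjectIds;

    ReplayGuard(String... deletedSubjectIds) {
        this.deletedSubjectIds = deletedSubjectIds.clone();
    }

    /** Returns true when a deletion has already been recorded for this subject. */
    boolean isDeleted(String subjectId) {
        return Arrays.asList(deletedSubjectIds).contains(subjectId);
    }

    /** Returns only the replayed events whose subject has not been deleted. */
    CustomerEvent[] filterReplay(CustomerEvent[] replayedEvents) {
        return Arrays.stream(replayedEvents)
                .filter(event -> !isDeleted(event.subjectId()))
                .toArray(CustomerEvent[]::new);
    }
}
```

&lt;p&gt;The design choice here is that deletion is remembered as state rather than treated as a one-time action, so a later replay cannot silently undo it.&lt;/p&gt;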

&lt;p&gt;Key insight: Many deletion failures are semantic mismatches rather than execution failures.&lt;/p&gt;

&lt;h2 id=&quot;why-this-matters-for-automation&quot;&gt;Why this matters for automation&lt;/h2&gt;

&lt;p&gt;Organizations often want standardized workflows, automated orchestration, and consistent reporting.&lt;/p&gt;

&lt;p&gt;However, before a deletion platform can answer “Did deletion succeed?”, it must first answer “What was the expected outcome for this system?”&lt;/p&gt;

&lt;p&gt;Instead of treating deletion as a collection of service-specific scripts, organizations may need a way to declare:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;What deletion means for each system&lt;/li&gt;
  &lt;li&gt;Which deletion model applies&lt;/li&gt;
  &lt;li&gt;How success is verified&lt;/li&gt;
  &lt;li&gt;What evidence should be collected&lt;/li&gt;
&lt;/ul&gt;
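
&lt;p&gt;These declarations could be captured as data that an orchestrator evaluates. The following is a minimal sketch rather than a reference design: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DeletionPolicy&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DeletionReport&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;succeeded&lt;/code&gt; are hypothetical names, and the verification and evidence fields are free-text placeholders.&lt;/p&gt;

```java
/** Sketch: declare per-system deletion semantics, then judge results against them. */
final class DeletionPolicyCheck {

    /** Deletion models described in this post. */
    enum DeletionModel { HARD_DELETE, SOFT_DELETE, TOMBSTONE, ANONYMIZATION, KEY_DESTRUCTION, SUPPRESSION }

    /** Hypothetical declaration of what deletion means for one system. */
    record DeletionPolicy(
            String system,        // which system the policy covers
            DeletionModel model,  // which deletion model applies
            String verification,  // how success is verified
            String evidence       // what evidence should be collected
    ) {}

    /** Hypothetical report a system returns after handling a request. */
    record DeletionReport(String system, DeletionModel performed, boolean verified) {}

    /** Success means the declared model was applied and verified, not merely that a job ran. */
    static boolean succeeded(DeletionPolicy policy, DeletionReport report) {
        if (!policy.system().equals(report.system())) {
            return false; // the report belongs to a different system
        }
        if (policy.model() != report.performed()) {
            return false; // semantic mismatch, e.g. soft delete where hard delete was expected
        }
        return report.verified();
    }
}
```

&lt;p&gt;Under this sketch, a system that reports a verified soft delete against a hard-delete policy is flagged as a mismatch instead of being counted as a success.&lt;/p&gt;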

&lt;p&gt;The challenge is explicitly defining deletion semantics, ensuring each system enforces and verifies the intended deletion mechanism, and validating that personal data has been removed across the organization.&lt;/p&gt;
</description>
        <pubDate>Thu, 04 Jun 2026 04:30:00 +0000</pubDate>
        <link>https://www.marksayson.com/blog/why-deletion-means-different-things/</link>
        <guid isPermaLink="true">https://www.marksayson.com/blog/why-deletion-means-different-things/</guid>
        
        
        <category>privacy-engineering</category>
        
        <category>right-to-be-forgotten</category>
        
      </item>
    
      <item>
        <title>Why data deletion is still an unsolved infrastructure problem</title>
        <description>&lt;p&gt;A customer submits a data deletion request that propagates across backend services. One service only partially deletes the customer’s records due to a bug. Another misses the request because it was down for maintenance, and several analytics datasets were never integrated into the deletion process.&lt;/p&gt;

&lt;p&gt;Months later, the customer continues receiving highly personalized marketing emails from the company. She submits a data access request and discovers that the company still has her name, address, phone number, purchase history, and detailed behavioral data from years of interactions with their services.&lt;/p&gt;

&lt;p&gt;What began as a routine privacy request has now become a regulatory problem. A complaint to a data protection authority leads to an investigation, revealing an inability to demonstrate consistent deletion across the company’s systems. The company now faces corrective action and potential penalties.&lt;/p&gt;

&lt;p&gt;Many of these failures were invisible to the company until the compliance team had to prove that deletion had actually occurred everywhere.&lt;/p&gt;

&lt;h2 id=&quot;a-typical-deletion-request-path&quot;&gt;A typical deletion request path&lt;/h2&gt;

&lt;p&gt;A single request initiated from a customer-facing application may propagate across multiple services, storage systems, analytical platforms, and downstream consumers. Each system may implement deletion differently, operate on different schedules, or fail independently.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/20260530_TypicalDataFlowBehindDeletionRequestExcalidraw.svg&quot; alt=&quot;Diagram illustrating data flow behind a deletion request&quot; /&gt;&lt;/p&gt;

&lt;p&gt;No single team owns the entire lifecycle. Application teams own services, platform teams operate infrastructure, analytics teams maintain data pipelines and data warehouses, and additional systems produce or consume derived datasets.&lt;/p&gt;

&lt;p&gt;The problem becomes significantly harder at enterprise scale. Large organizations may operate hundreds of independently managed services and thousands of datasets while continuously adding new data flows and third-party integrations.&lt;/p&gt;

&lt;p&gt;Over time, data lineage becomes more difficult to track, inventories become harder to maintain, and deletion workflows diverge across teams. What begins as a straightforward compliance requirement gradually becomes a coordination problem spanning large portions of an organization’s infrastructure.&lt;/p&gt;

&lt;h2 id=&quot;the-hidden-complexity-behind-a-simple-deletion-request&quot;&gt;The hidden complexity behind a “simple” deletion request&lt;/h2&gt;
&lt;p&gt;From a user’s perspective, deleting personal data sounds simple.&lt;/p&gt;

&lt;p&gt;Inside a modern organization, the same request can require teams to:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Locate all systems containing relevant customer data&lt;/li&gt;
  &lt;li&gt;Resolve identifiers across different schemas and services&lt;/li&gt;
  &lt;li&gt;Execute system-specific deletion logic&lt;/li&gt;
  &lt;li&gt;Verify completion across distributed components&lt;/li&gt;
  &lt;li&gt;Handle retries and partial failures&lt;/li&gt;
  &lt;li&gt;Produce audit evidence&lt;/li&gt;
  &lt;li&gt;Prevent downstream recreation of deleted data&lt;/li&gt;
&lt;/ul&gt;
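
&lt;p&gt;A few of these steps (executing system-specific logic, verifying completion, retrying failures, and producing audit evidence) can be sketched in code. This is an illustrative outline only: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DeletionTarget&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AuditEntry&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;deleteEverywhere&lt;/code&gt; are hypothetical names, and the sketch omits backoff, persistence, and identifier resolution.&lt;/p&gt;

```java
import java.time.Instant;
import java.util.Arrays;

/** Sketch: run deletion against several systems, retrying failures and recording evidence. */
final class DeletionOrchestrator {

    /** Hypothetical adapter wrapping one system's deletion and verification logic. */
    interface DeletionTarget {
        String name();
        void delete(String subjectId) throws Exception;
        boolean isDeleted(String subjectId);
    }

    /** Hypothetical audit record kept for each system, whether or not deletion was verified. */
    record AuditEntry(String system, String subjectId, boolean verified, int attempts, Instant recordedAt) {}

    private final int maxAttempts;

    DeletionOrchestrator(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    /** Returns one audit entry per target, including targets that never succeeded. */
    AuditEntry[] deleteEverywhere(String subjectId, DeletionTarget... targets) {
        return Arrays.stream(targets)
                .map(target -> deleteWithRetry(subjectId, target))
                .toArray(AuditEntry[]::new);
    }

    private AuditEntry deleteWithRetry(String subjectId, DeletionTarget target) {
        int attempts = 0;
        boolean verified = false;
        while (maxAttempts > attempts) {
            attempts++;
            try {
                target.delete(subjectId);
                verified = target.isDeleted(subjectId); // verify instead of trusting the call
            } catch (Exception e) {
                verified = false; // treat an exception as a failed attempt and retry
            }
            if (verified) {
                break;
            }
        }
        return new AuditEntry(target.name(), subjectId, verified, attempts, Instant.now());
    }
}
```

&lt;p&gt;Because every target yields an audit entry, partial failures stay visible in the result instead of disappearing behind an overall success flag.&lt;/p&gt;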

&lt;p&gt;Customer data rarely exists in a single form. The same individual may appear as a primary key in a transactional database, as an attribute in an analytics platform, or as records within large object-store datasets shared across many users. Deletion often requires system-specific logic tailored to how each platform stores and processes data.&lt;/p&gt;

&lt;p&gt;The core difficulty is coordinating consistent execution across systems with different deletion semantics, retention policies, operational constraints, and failure modes.&lt;/p&gt;

&lt;h2 id=&quot;the-execution-gap-in-privacy-tooling&quot;&gt;The execution gap in privacy tooling&lt;/h2&gt;
&lt;p&gt;The privacy technology ecosystem has matured significantly. Organizations now have access to tools for privacy request management, governance, and data discovery.&lt;/p&gt;

&lt;p&gt;These tools address important parts of the problem, but they do not solve execution.&lt;/p&gt;

&lt;p&gt;Governance platforms help define what should happen. Cloud-native capabilities help execute deletion within individual systems. Between those layers is the challenge of executing and validating deletion across production systems.&lt;/p&gt;

&lt;p&gt;Organizations often fill this gap with scripts, service-specific workflows, and operational procedures that become increasingly difficult to maintain, validate, and audit as independently managed services proliferate and coordination requirements compound across teams.&lt;/p&gt;

&lt;h2 id=&quot;why-right-to-erasure-is-uniquely-challenging&quot;&gt;Why right-to-erasure is uniquely challenging&lt;/h2&gt;
&lt;p&gt;Among privacy requirements, right-to-erasure (“the right to be forgotten”) places some of the greatest demands on an organization’s architecture.&lt;/p&gt;

&lt;p&gt;Unlike many privacy obligations, deletion cannot be satisfied through policy alone. Organizations must coordinate actions across all systems where customer data resides and demonstrate that those actions completed as intended.&lt;/p&gt;

&lt;p&gt;This exposes architectural inconsistencies that might otherwise be missed:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Systems that interpret deletion differently&lt;/li&gt;
  &lt;li&gt;Incomplete or unverifiable audit trails&lt;/li&gt;
  &lt;li&gt;Service-specific workflows that are difficult to maintain&lt;/li&gt;
  &lt;li&gt;Fragile operational dependencies&lt;/li&gt;
  &lt;li&gt;Manual processes that don’t scale&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Right-to-erasure is therefore more than a privacy requirement. It is a stress test of an organization’s ability to manage data consistently across its infrastructure.&lt;/p&gt;

&lt;h2 id=&quot;privacy-as-an-architectural-capability&quot;&gt;Privacy as an architectural capability&lt;/h2&gt;
&lt;p&gt;The industry has largely approached privacy as a governance problem.&lt;/p&gt;

&lt;p&gt;However, governance alone does not execute deletion, validate outcomes, recover from failures, or provide consistent guarantees across distributed systems.&lt;/p&gt;

&lt;p&gt;Privacy obligations increasingly depend on infrastructure capabilities that most organizations were not built with.&lt;/p&gt;

&lt;p&gt;Meeting those obligations means treating privacy as a first-class architectural concern rather than a collection of scripts and workflows. Organizations need a way to define, execute, and verify deletion consistently across systems.&lt;/p&gt;

&lt;h2 id=&quot;what-comes-next&quot;&gt;What comes next&lt;/h2&gt;
&lt;p&gt;If deletion is fundamentally an infrastructure problem, the next step is understanding how and why it fails in practice at scale.&lt;/p&gt;

&lt;p&gt;In upcoming posts, we’ll explore common failure modes in deletion programs, including inconsistent deletion semantics, coverage gaps as architectures evolve, rehydration of previously deleted data, and gaps in verification and auditability.&lt;/p&gt;

&lt;p&gt;After that, we’ll examine the structural properties of deletion programs that allow organizations to consistently orchestrate, execute, and verify deletion across distributed systems.&lt;/p&gt;
</description>
        <pubDate>Sun, 31 May 2026 04:30:00 +0000</pubDate>
        <link>https://www.marksayson.com/blog/why-data-deletion-is-still-unsolved/</link>
        <guid isPermaLink="true">https://www.marksayson.com/blog/why-data-deletion-is-still-unsolved/</guid>
        
        
        <category>privacy-engineering</category>
        
        <category>right-to-be-forgotten</category>
        
      </item>
    
      <item>
        <title>Polymorphic JSON deserialization with Java sealed interfaces and Jackson</title>
        <description>&lt;p&gt;When building a JSON-based API that accepts multiple request types, you need a strategy for mapping incoming payloads to the appropriate data model.  Java sealed interfaces combined with Jackson’s polymorphic type annotations provide a clean way to support multiple strongly typed backend data models.&lt;/p&gt;

&lt;h2 id=&quot;problem&quot;&gt;Problem&lt;/h2&gt;
&lt;p&gt;Consider an API that receives data processing requests.  Each request provides a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;requestType&lt;/code&gt; field that determines which fields are relevant.&lt;/p&gt;

&lt;p&gt;All requests share the same lifecycle and processing steps, but each request type corresponds to a different data model with different required and optional fields.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Account deletion: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;{&quot;requestId&quot;: &quot;request-1234&quot;, &quot;requestType&quot;: &quot;deleteAccount&quot;, &quot;accountId&quot;: &quot;1234&quot;}&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Scoped data deletion: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;{&quot;requestId&quot;: &quot;request-1234&quot;, &quot;requestType&quot;: &quot;deleteScopedData&quot;, &quot;accountId&quot;: &quot;1234&quot;, &quot;scope&quot;: {&quot;deletePurchaseHistory&quot;: true, &quot;deleteSubscriptions&quot;: false, &quot;deletePreferences&quot;: false}}&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;approach-sealed-interface--jackson-annotations&quot;&gt;Approach: Sealed interface + Jackson annotations&lt;/h2&gt;

&lt;h3 id=&quot;step-1-define-sealed-interface-with-annotations-mapping-request-types-to-implementations&quot;&gt;Step 1: Define sealed interface with annotations mapping request types to implementations&lt;/h3&gt;

&lt;div class=&quot;language-java highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;com.fasterxml.jackson.annotation.JsonSubTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;com.fasterxml.jackson.annotation.JsonTypeInfo&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;cm&quot;&gt;/**
 * Polymorphic deletion request interface that routes to request-type-specific data models.
 */&lt;/span&gt;
&lt;span class=&quot;nd&quot;&gt;@JsonTypeInfo&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;use&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;JsonTypeInfo&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;Id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;NAME&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;property&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;requestType&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;nd&quot;&gt;@JsonSubTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;({&lt;/span&gt;
    &lt;span class=&quot;nd&quot;&gt;@JsonSubTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;Type&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;value&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;AccountDeletionRequest&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;class&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;deleteAccount&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;),&lt;/span&gt;
    &lt;span class=&quot;nd&quot;&gt;@JsonSubTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;Type&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;value&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;ScopedDataDeletionRequest&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;class&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;deleteScopedData&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;})&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sealed&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;interface&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;DeletionRequest&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;permits&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;AccountDeletionRequest&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;ScopedDataDeletionRequest&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;// Define common method declarations and static methods here.&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Java interfaces allow us to keep core business logic independent of child implementation details, making the system easier to test and extend.  Sealed interfaces, available in Java 17+, restrict which classes are allowed to implement the interface via a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;permits&lt;/code&gt; clause.&lt;/p&gt;

&lt;p&gt;When combined with Jackson’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@JsonTypeInfo&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@JsonSubTypes&lt;/code&gt; annotations, which wire the JSON discriminator field (here, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;requestType&lt;/code&gt;) to the appropriate implementation, this enables a polymorphic contract with automatic JSON routing to strongly typed implementations. The compiler checks the set of permitted implementations, while Jackson resolves the discriminator value at runtime.&lt;/p&gt;

&lt;h3 id=&quot;step-2-define-concrete-implementations-as-classes-or-records&quot;&gt;Step 2: Define concrete implementations as classes or records&lt;/h3&gt;

&lt;p&gt;In the following implementations of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DeletionRequest&lt;/code&gt;, we set &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@JsonIgnoreProperties(ignoreUnknown = true)&lt;/code&gt; so that Jackson ignores any JSON properties the record does not declare when deserializing payloads of a known request type.&lt;/p&gt;

&lt;p&gt;This allows us to maintain strong typing for attributes relevant to this service while being agnostic of other upstream attributes our service doesn’t use.&lt;/p&gt;

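&lt;p&gt;You can drop &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@JsonIgnoreProperties&lt;/code&gt; if you prefer to enforce the complete schema of input JSON strings. With a default-configured &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ObjectMapper&lt;/code&gt;, Jackson then throws an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UnrecognizedPropertyException&lt;/code&gt; at runtime when it receives a property you haven’t defined.&lt;/p&gt;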

&lt;div class=&quot;language-java highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;com.fasterxml.jackson.annotation.JsonIgnoreProperties&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;cm&quot;&gt;/**
 * Data model for account deletion requests.
 */&lt;/span&gt;
&lt;span class=&quot;nd&quot;&gt;@JsonIgnoreProperties&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ignoreUnknown&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;true&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;record&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;AccountDeletionRequest&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;nc&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;accountId&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;nc&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;requestId&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;implements&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;DeletionRequest&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;cm&quot;&gt;/**
 * Data model for scoped data deletion requests.
 */&lt;/span&gt;
&lt;span class=&quot;nd&quot;&gt;@JsonIgnoreProperties&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ignoreUnknown&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;true&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;record&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;ScopedDataDeletionRequest&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;nc&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;accountId&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;nc&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;requestId&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;nc&quot;&gt;DeletionScope&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;scope&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;implements&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;DeletionRequest&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;cm&quot;&gt;/**
 * Data model for data deletion scope.
 *
 * Used when user requests deleting specific categories of data.
 */&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;record&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;DeletionScope&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;boolean&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;deletePreferences&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;boolean&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;deletePurchaseHistory&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;boolean&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;deleteSubscriptions&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
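
&lt;p&gt;Once a payload has been mapped to one of these types, core logic can branch on the concrete implementation. The following self-contained sketch illustrates that, without Jackson and with simplified stand-ins for the records above. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RequestDispatch&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;describe&lt;/code&gt; are hypothetical names, and the scope is flattened to a single boolean for brevity.&lt;/p&gt;

```java
/** Sketch: branch on the concrete request type behind a sealed interface (no Jackson needed). */
final class RequestDispatch {

    // Simplified stand-ins for the types defined above, so the sketch compiles on its own.
    sealed interface DeletionRequest permits AccountDeletionRequest, ScopedDataDeletionRequest {}

    record AccountDeletionRequest(String accountId, String requestId) implements DeletionRequest {}

    record ScopedDataDeletionRequest(String accountId, String requestId, boolean deletePurchaseHistory)
            implements DeletionRequest {}

    /** Hypothetical handler that describes the work each request type triggers. */
    static String describe(DeletionRequest request) {
        if (request instanceof AccountDeletionRequest account) {
            return "Delete all data for account " + account.accountId();
        }
        if (request instanceof ScopedDataDeletionRequest scoped) {
            return "Delete selected data for account " + scoped.accountId()
                    + " (purchase history: " + scoped.deletePurchaseHistory() + ")";
        }
        // Unreachable today: the permits clause lists every implementation.
        throw new IllegalStateException("Unhandled request type: " + request.getClass());
    }
}
```

&lt;p&gt;On Java 21 or later, a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;switch&lt;/code&gt; with type patterns could replace the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;instanceof&lt;/code&gt; chain, and the compiler would then reject the switch if a permitted type were left unhandled.&lt;/p&gt;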

&lt;h3 id=&quot;step-3-deserialize-input-json-strings-to-strongly-typed-data-models&quot;&gt;Step 3: Deserialize input JSON strings to strongly typed data models&lt;/h3&gt;

&lt;div class=&quot;language-java highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;com.fasterxml.jackson.core.JsonProcessingException&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;com.fasterxml.jackson.databind.ObjectMapper&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;com.fasterxml.jackson.databind.exc.InvalidTypeIdException&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// Note: Prefer to instantiate ObjectMapper once as a static final field or singleton and reuse it across classes.&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// ObjectMapper is thread-safe, and it&apos;s expensive to create new instances.&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;final&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;ObjectMapper&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mapper&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;ObjectMapper&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;

&lt;span class=&quot;kd&quot;&gt;final&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;DeletionRequest&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;deletionRequest&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;try&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;deletionRequest&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mapper&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;readValue&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;inputJson&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;DeletionRequest&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;class&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;catch&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kd&quot;&gt;final&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;InvalidTypeIdException&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;e&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;// Replace BadRequestException in catch statements with locally relevant class for 400 Bad Request exceptions&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;throw&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;BadRequestException&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Unsupported request type&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;e&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;catch&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kd&quot;&gt;final&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;JsonProcessingException&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;e&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;throw&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;BadRequestException&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Invalid request payload&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;e&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;With Java 21+, the compiler checks that a pattern-matching switch over a sealed interface handles every permitted subtype, removing a class of potential runtime errors.&lt;/p&gt;
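
&lt;p&gt;As a minimal sketch of that exhaustiveness check, the example below uses a small hypothetical hierarchy. The names &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DemoRequest&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AccountDeletion&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OrderDeletion&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DemoRouter&lt;/code&gt; are illustrative assumptions, not the models defined earlier in this post, and Jackson annotations are omitted to keep the focus on the switch.&lt;/p&gt;

```java
// Hypothetical closed hierarchy; type and field names are illustrative only.
sealed interface DemoRequest permits AccountDeletion, OrderDeletion {}

record AccountDeletion(String customerId, boolean deleteSubscriptions) implements DemoRequest {}

record OrderDeletion(String customerId, String orderId) implements DemoRequest {}

final class DemoRouter {
    private DemoRouter() {}

    // No default branch: if a new permitted subtype is added to DemoRequest
    // and not handled here, this switch expression stops compiling (Java 21+).
    static String describe(DemoRequest request) {
        return switch (request) {
            case AccountDeletion a -> "account:" + a.customerId() + ":" + a.deleteSubscriptions();
            case OrderDeletion o -> "order:" + o.customerId() + ":" + o.orderId();
        };
    }
}
```

&lt;p&gt;Leaving out the default branch is deliberate in this sketch: a default would compile even when a newly added subtype is unhandled, which hides exactly the error the sealed hierarchy is meant to surface.&lt;/p&gt;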

&lt;h2 id=&quot;when-to-use-a-polymorphic-api&quot;&gt;When to use a polymorphic API&lt;/h2&gt;

&lt;p&gt;A single endpoint accepting multiple JSON schemas can make sense when all request types share the same lifecycle and infrastructure (authorization, queues, auditing) and only diverge in execution logic.&lt;/p&gt;

&lt;p&gt;Prefer separate endpoints when request types have different authorization rules, rate limits, or lifecycle rules, or when you need clean OpenAPI/Swagger/Smithy documentation, since these tools don’t work well with polymorphic schemas.&lt;/p&gt;

&lt;p&gt;Polymorphic APIs are more difficult for clients to consume and understand than stand-alone APIs, so they are a better fit for internal APIs between services your team controls.&lt;/p&gt;

&lt;h2 id=&quot;when-to-use-sealed-interfaces-to-implement-polymorphic-apis&quot;&gt;When to use sealed interfaces to implement polymorphic APIs&lt;/h2&gt;

&lt;p&gt;Sealed interfaces and Jackson polymorphic type annotations are a good fit for implementing polymorphic schemas when all the following are true:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Your JSON payloads share a discriminator field but have different schemas per type.&lt;/li&gt;
  &lt;li&gt;You want a closed set of permitted subtypes.&lt;/li&gt;
  &lt;li&gt;You want subtypes to be strongly typed, each potentially requiring its own validation logic.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They are a poor fit when either of the following is true, in which case abstract classes may be a better replacement for sealed interfaces:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Other teams need to be able to add subtypes without modifying your code (permitted subtypes of a sealed interface must be defined in the same module, or in the same package when using the unnamed module).&lt;/li&gt;
  &lt;li&gt;You cannot use Java 17+, the first Long Term Support version supporting sealed interfaces.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;pros-and-cons-vs-common-alternatives&quot;&gt;Pros and cons vs common alternatives&lt;/h2&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Strategy&lt;/th&gt;
      &lt;th&gt;Pros&lt;/th&gt;
      &lt;th&gt;Cons&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;b&gt;Sealed interface + Jackson polymorphism&lt;/b&gt;&lt;/td&gt;
      &lt;td&gt;
        &lt;ul&gt;
          &lt;li&gt;Supports strongly typed subtypes with different required and allowed fields&lt;/li&gt;
          &lt;li&gt;Easily add subtypes with minimal code changes&lt;/li&gt;
          &lt;li&gt;Compile-time rejection of unsupported subtypes&lt;/li&gt;
          &lt;li&gt;Clean pattern matching without casting&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/td&gt;
      &lt;td&gt;
        &lt;ul&gt;
          &lt;li&gt;Requires all subtypes to be defined in the same module (or the same package in the unnamed module)&lt;/li&gt;
          &lt;li&gt;Requires Java 17+ (Java 21+ for exhaustive pattern-matching switch)&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;b&gt;Single class with nullable fields&lt;/b&gt;&lt;/td&gt;
      &lt;td&gt;
        &lt;ul&gt;
          &lt;li&gt;Simple&lt;/li&gt;
          &lt;li&gt;Works pre-Java 17&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/td&gt;
      &lt;td&gt;
        &lt;ul&gt;
          &lt;li&gt;Runtime null checks everywhere&lt;/li&gt;
          &lt;li&gt;No compile-time safety&lt;/li&gt;
          &lt;li&gt;Unclear which fields apply to each type&lt;/li&gt;
          &lt;li&gt;Messier to enforce per-type validation&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;b&gt;Inheritance with abstract class&lt;/b&gt;&lt;/td&gt;
      &lt;td&gt;
        &lt;ul&gt;
          &lt;li&gt;Familiar OOP pattern&lt;/li&gt;
          &lt;li&gt;Subclasses can live anywhere&lt;/li&gt;
          &lt;li&gt;Works pre-Java 17&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/td&gt;
      &lt;td&gt;
        &lt;ul&gt;
          &lt;li&gt;Anyone can subclass, no compile-time restriction on subtypes&lt;/li&gt;
          &lt;li&gt;Compiler does not enforce exhaustive handling of subtypes&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;b&gt;&lt;code&gt;@JsonAnySetter&lt;/code&gt; with &lt;code&gt;Map&amp;lt;String, Object&amp;gt;&lt;/code&gt;&lt;/b&gt;&lt;/td&gt;
      &lt;td&gt;
        &lt;ul&gt;
          &lt;li&gt;Maximum flexibility&lt;/li&gt;
          &lt;li&gt;Allows for unknown schemas&lt;/li&gt;
          &lt;li&gt;Works pre-Java 17&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/td&gt;
      &lt;td&gt;
        &lt;ul&gt;
          &lt;li&gt;Zero type safety, accepts all inputs&lt;/li&gt;
          &lt;li&gt;Validation is manual and error-prone&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;b&gt;Enum and factory method&lt;/b&gt;&lt;/td&gt;
      &lt;td&gt;
        &lt;ul&gt;
          &lt;li&gt;Explicit type registry (as with sealed interface)&lt;/li&gt;
          &lt;li&gt;More flexible, full control over deserialization&lt;/li&gt;
          &lt;li&gt;Works pre-Java 17&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/td&gt;
      &lt;td&gt;
        &lt;ul&gt;
          &lt;li&gt;More boilerplate code than sealed interface + annotations&lt;/li&gt;
          &lt;li&gt;No compile-time validation of correct/complete routing&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;h2 id=&quot;key-takeaways&quot;&gt;Key takeaways&lt;/h2&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sealed&lt;/code&gt; provides a closed type hierarchy so the compiler knows every possible implementation.&lt;/li&gt;
  &lt;li&gt;Jackson’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@JsonTypeInfo&lt;/code&gt; simplifies routing deserialization to the appropriate type based on a discriminator field.&lt;/li&gt;
  &lt;li&gt;Java records minimize boilerplate for immutable, validated data models.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In practice, this pattern has reduced the effort to support new data processing types from roughly a week per type to a few hours.  The sealed interface and Jackson annotations handle the routing, leaving only type-specific models and logic to implement.&lt;/p&gt;
</description>
        <pubDate>Sat, 16 May 2026 04:30:00 +0000</pubDate>
        <link>https://www.marksayson.com/blog/polymorphic-json-deserialization-java-sealed-interfaces-jackson/</link>
        <guid isPermaLink="true">https://www.marksayson.com/blog/polymorphic-json-deserialization-java-sealed-interfaces-jackson/</guid>
        
        
        <category>programming-languages</category>
        
      </item>
    
      <item>
        <title>Enabling ad hoc runs of queue-based ECS services</title>
        <description>&lt;h2 id=&quot;background&quot;&gt;Background&lt;/h2&gt;

&lt;p&gt;A common ECS use case is to continually poll a queue for new events to process, with auto-scaling based on CPU or memory utilization, or the size of the queue backlog.  These services can be tricky to build reliable integration tests for, as queue backlogs may result in long delays before a new message is processed.&lt;/p&gt;

&lt;p&gt;A simple way to enable efficient integration tests for these services is to support an optional environment variable containing a request payload. When the variable is set, the service processes its value immediately, as if it were the body of a single SQS message, instead of long-polling the queue.&lt;/p&gt;

&lt;h2 id=&quot;example-code&quot;&gt;Example code&lt;/h2&gt;

&lt;div class=&quot;language-java highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[]&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;args&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;nc&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;jsonRequest&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;System&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getenv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;ADHOC_REQUEST&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;jsonRequest&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;null&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;jsonRequest&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;isEmpty&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;c1&quot;&gt;// Execute the single JSON request&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;executeRequestFromJson&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;jsonRequest&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;c1&quot;&gt;// Continually poll a queue to process pending requests&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;longPollQueue&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;kd&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;executeRequestFromJson&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;jsonRequest&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;nc&quot;&gt;MyParsedRequest&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;request&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;parseRequestFromJson&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;jsonRequest&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;executeRequest&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;request&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;kd&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;longPollQueue&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;while&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;true&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;nc&quot;&gt;Optional&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;MyQueueMessage&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;message&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;getMessageFromQueue&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;message&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;isPresent&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;nc&quot;&gt;MyParsedRequest&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;request&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;parseRequest&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;message&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;executeRequest&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;request&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;acknowledgeSqsMessage&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;message&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;example-execution-via-aws-cli&quot;&gt;Example execution via AWS CLI&lt;/h2&gt;

&lt;p&gt;After setting variables for the AWS region and the ECS cluster, service, and task definition ARNs, a variation of the following command starts a new ECS task immediately, without waiting behind a queue that may have a large backlog of pending messages.&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;YOUR_ECS_CONTAINER_NAME&lt;/code&gt; should be replaced by the name field value in the task definition’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;containerDefinitions&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The ECS container overrides allow you to provide specific environment variable overrides while maintaining other existing environment variables from the ECS task definition.&lt;/p&gt;

&lt;div class=&quot;language-sh highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nv&quot;&gt;networkConfig&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$(&lt;/span&gt;aws ecs describe-services &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
    &lt;span class=&quot;nt&quot;&gt;--region&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;$awsRegion&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
    &lt;span class=&quot;nt&quot;&gt;--cluster&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;$ecsClusterArn&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
    &lt;span class=&quot;nt&quot;&gt;--services&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;$ecsServiceArn&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
    &lt;span class=&quot;nt&quot;&gt;--query&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;services[0].networkConfiguration&apos;&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
    &lt;span class=&quot;nt&quot;&gt;--output&lt;/span&gt; json&lt;span class=&quot;si&quot;&gt;)&lt;/span&gt;
aws ecs run-task &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
    &lt;span class=&quot;nt&quot;&gt;--region&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;$awsRegion&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
    &lt;span class=&quot;nt&quot;&gt;--cluster&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;$ecsClusterArn&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
    &lt;span class=&quot;nt&quot;&gt;--task-definition&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;$ecsTaskDefinitionArn&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
    &lt;span class=&quot;nt&quot;&gt;--launch-type&lt;/span&gt; FARGATE &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
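    &lt;span class=&quot;nt&quot;&gt;--network-configuration&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$networkConfig&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;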
    &lt;span class=&quot;nt&quot;&gt;--overrides&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;{
        &quot;containerOverrides&quot;: [{
            &quot;name&quot;: &quot;YOUR_ECS_CONTAINER_NAME&quot;,
            &quot;environment&quot;: [{
                &quot;name&quot;: &quot;ADHOC_REQUEST&quot;,
                &quot;value&quot;: &quot;{\&quot;yourRequestKey\&quot;:\&quot;yourRequestValue\&quot;}&quot;
            }]
        }]
    }&apos;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Depending on the integration test environment, you can also use the &lt;a href=&quot;https://docs.aws.amazon.com/AmazonECS/latest/APIReference/API_RunTask.html&quot;&gt;ECS RunTask API&lt;/a&gt; in a similar way to the CLI command above.&lt;/p&gt;

&lt;p&gt;After implementing this pattern in your service code, your integration tests can run ad hoc ECS tasks with specific test requests and validate the expected output. For example, a test can periodically call another API to check that a record has been created or updated, waiting between attempts up to a maximum number of retries.&lt;/p&gt;
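
&lt;p&gt;The bounded retry check described above could be sketched roughly as follows. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ResultPoller&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;waitUntil&lt;/code&gt; are hypothetical names, and the check passed in stands for whatever API call the test uses to confirm that the record exists.&lt;/p&gt;

```java
final class ResultPoller {
    private ResultPoller() {}

    // Runs the check up to maxAttempts times, sleeping delayMillis between attempts.
    // Returns true as soon as the check passes, false if every attempt fails
    // or the waiting thread is interrupted.
    static boolean waitUntil(java.util.function.BooleanSupplier check, int maxAttempts, long delayMillis) {
        int remaining = maxAttempts;
        while (remaining > 0) {
            if (check.getAsBoolean()) {
                return true;
            }
            remaining--;
            if (remaining > 0) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and stop waiting.
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }
}
```

&lt;p&gt;A test might call something like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ResultPoller.waitUntil(() -&gt; recordExists(id), 10, 3000)&lt;/code&gt;, where &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;recordExists&lt;/code&gt; is a placeholder for the validation call and the attempt count and delay are arbitrary example values.&lt;/p&gt;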

&lt;p&gt;Since the task processes only the injected request, tests can use a much shorter max duration than they would need if waiting for a large queue backlog to drain.&lt;/p&gt;
</description>
        <pubDate>Sat, 04 Apr 2026 00:00:00 +0000</pubDate>
        <link>https://www.marksayson.com/blog/enabling-adhoc-ecs-task/</link>
        <guid isPermaLink="true">https://www.marksayson.com/blog/enabling-adhoc-ecs-task/</guid>
        
        
        <category>aws</category>
        
      </item>
    
      <item>
        <title>Strategies for querying periodic S3 data snapshots</title>
        <description>&lt;h2 id=&quot;background&quot;&gt;Background&lt;/h2&gt;

&lt;p&gt;A common AWS analytics use case is making aggregate queries across multiple data sets stored in S3.  For example, one partner team may store product metadata, while another team stores purchase order metadata, and we may want to join these data sets to determine which products are most popular across each marketplace.&lt;/p&gt;

&lt;p&gt;In this post I’ll cover a few options for syncing data to S3 and retrieving data snapshots for use in aggregate queries, and specifically discuss the use case where consumers need uninterrupted access to complete, recent data snapshots, with no partial availability during data syncs.&lt;/p&gt;

&lt;h2 id=&quot;a-few-options-for-syncing-data-to-s3&quot;&gt;A few options for syncing data to S3&lt;/h2&gt;

&lt;h3 id=&quot;1-syncing-each-record-to-a-unique-stable-file-path&quot;&gt;1. Syncing each record to a unique, stable file path&lt;/h3&gt;
&lt;p&gt;Pros:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Provides a stable “latest” data set that always represents all records.&lt;/li&gt;
  &lt;li&gt;Enables O(1) look-ups of specific records if data consumers query by S3 key (“filepath”).&lt;/li&gt;
  &lt;li&gt;Low storage costs since only a single copy of each record is stored.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cons:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Poor performance and higher costs for aggregate queries at scale. Retrieving tens of thousands of small files is much slower than retrieving a few multi-MB files.&lt;/li&gt;
  &lt;li&gt;Limits design choices for data providers and may increase complexity of their implementation (eg. AWS Glue defaults to writing aggregated partial result files).&lt;/li&gt;
  &lt;li&gt;No historical data is retained unless explicitly backed up elsewhere.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is best suited to when data consumers retrieve specific records, only need their latest state, and don’t make aggregated queries across a large number of records.&lt;/p&gt;

&lt;h3 id=&quot;2-appending-all-events-as-new-data-without-overwriting-prior-events&quot;&gt;2. Appending all events as new data without overwriting prior events&lt;/h3&gt;
&lt;p&gt;Pros:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Provides a complete history of events, allowing for time-series analysis.&lt;/li&gt;
  &lt;li&gt;If data is partitioned by time, supports efficient queries of specific time periods.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cons:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Storage costs increase as historical data accumulates.&lt;/li&gt;
  &lt;li&gt;Increased complexity, latency, and cost to retrieve the latest version of each record.&lt;/li&gt;
  &lt;li&gt;If data isn’t partitioned in a way that aligns with query use cases, queries may require very inefficient scans.&lt;/li&gt;
  &lt;li&gt;If upstream workflows fail, there may be no way to recover the missed events, leaving gaps in the data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is best suited to queries over time-partitioned events, rather than queries that need the latest state of each record.&lt;/p&gt;

&lt;h3 id=&quot;3-overwriting-one-or-more-files-that-include-multiple-records&quot;&gt;3. Overwriting one or more files that include multiple records&lt;/h3&gt;
&lt;p&gt;Pros:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Consumers always access the most up-to-date version of the dataset.&lt;/li&gt;
  &lt;li&gt;More efficient aggregate queries than retrieving thousands of single-record files.&lt;/li&gt;
  &lt;li&gt;Low storage costs since only a single copy of each record is stored.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cons:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Risk of data loss if a failure occurs during the overwrite process.&lt;/li&gt;
  &lt;li&gt;Data consumers querying during a data sync may receive incomplete or duplicate data.&lt;/li&gt;
  &lt;li&gt;No historical data is retained unless explicitly backed up elsewhere.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This can work for aggregated queries of the latest records, but if we can’t accept the risk of corrupt/partial/duplicate data during sync failures or read/write race conditions, we may prefer writing to separate snapshot partitions.&lt;/p&gt;

&lt;h3 id=&quot;4-periodically-writing-complete-data-to-time-based-partitions&quot;&gt;4. Periodically writing complete data to time-based partitions&lt;/h3&gt;
&lt;p&gt;Pros:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Consumers always access a complete dataset when querying a completed/past partition.&lt;/li&gt;
  &lt;li&gt;More efficient aggregate queries than retrieving thousands of single-record files.&lt;/li&gt;
  &lt;li&gt;Historical data is retained for as long as prior time partitions are kept in storage.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cons:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;More complex to identify the latest complete partition.&lt;/li&gt;
  &lt;li&gt;Data consumers querying the most recent partition during a data sync may receive incomplete data.  Some type of completeness signal is needed to mitigate this.&lt;/li&gt;
  &lt;li&gt;Storage costs increase as historical data accumulates.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We can mitigate storage costs by setting a lifecycle rule to automatically delete S3 files past a certain age, e.g. files over 2 weeks old, if we only need to query recent snapshots.&lt;/p&gt;
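
&lt;p&gt;One possible form of the completeness signal mentioned in the cons above is an empty marker object that the sync job writes only after all data files for a partition have been uploaded. The sketch below assumes a hypothetical &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;_SUCCESS&lt;/code&gt; marker name and zero-padded partition values, and takes a plain array of object keys rather than calling S3 directly; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SnapshotSelector&lt;/code&gt; is an illustrative name.&lt;/p&gt;

```java
final class SnapshotSelector {
    private SnapshotSelector() {}

    // Hypothetical marker object written by the sync job after all data files are uploaded.
    static final String MARKER = "/_SUCCESS";

    // Returns the lexicographically greatest partition prefix that has a marker,
    // or null if no partition is complete. Assumes zero-padded partition values
    // so that string order matches time order.
    static String latestCompletePartition(String[] objectKeys) {
        String latest = null;
        for (String key : objectKeys) {
            if (!key.endsWith(MARKER)) {
                continue;
            }
            String prefix = key.substring(0, key.length() - MARKER.length());
            if (latest == null || prefix.compareTo(latest) > 0) {
                latest = prefix;
            }
        }
        return latest;
    }
}
```

&lt;p&gt;With this approach, a partition whose sync is still in progress or has failed is skipped, because its marker has not been written yet.&lt;/p&gt;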

&lt;h2 id=&quot;case-study-aggregate-queries-where-completeness-is-important&quot;&gt;Case study: Aggregate queries where completeness is important&lt;/h2&gt;
&lt;p&gt;For this post, we’ll consider the case where we want to optimize for aggregated queries against hundreds of thousands of records, with filter criteria applied across all records.  Our business requirements are that data consumers must always have access to complete and accurate data (no duplicates or missing records), but data does not need to be real-time as long as it’s up to date within a few hours.&lt;/p&gt;

&lt;p&gt;In this case, single-record files and time-series events are not a good fit due to the increased latency of querying the latest state across this number of records.  We may prefer time-based partitions with complete data written to a new partition every N hours, e.g. hourly, to mitigate partial/duplicate data issues during concurrent read/write operations.&lt;/p&gt;

&lt;h2 id=&quot;a-few-options-for-querying-time-partitioned-snapshots&quot;&gt;A few options for querying time-partitioned snapshots&lt;/h2&gt;

&lt;p&gt;A common challenge for time-partitioned snapshots is identifying the latest complete partition, especially if upstream data syncs are not 100% successful.&lt;/p&gt;

&lt;h3 id=&quot;1-retrieve-a-specific-snapshot-relative-to-the-current-time&quot;&gt;1. Retrieve a specific snapshot relative to the current time&lt;/h3&gt;

&lt;p&gt;Assuming we have hourly snapshots that are partitioned by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;snapshot_date&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;snapshot_hour&lt;/code&gt;, where &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;snapshot_date&lt;/code&gt; is formatted as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&quot;2025-01-31&quot;&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;snapshot_hour&lt;/code&gt; is formatted as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&quot;01&quot;&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&quot;23&quot;&lt;/code&gt; to support consistent string comparisons, the following Athena SQL query retrieves data from the last hour’s time partition:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;upstream_dataset&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;snapshot_date&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;CAST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DATE_FORMAT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;CURRENT_TIMESTAMP&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;INTERVAL&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;1&apos;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;HOUR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;%Y-%m-%d&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;VARCHAR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;snapshot_hour&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;CAST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DATE_FORMAT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;CURRENT_TIMESTAMP&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;INTERVAL&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;1&apos;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;HOUR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;%H&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;VARCHAR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
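
&lt;p&gt;If a client needs to build the same partition values in application code, for example to pass them as query parameters, a rough Java equivalent might look like the following. It assumes partitions are written and compared in UTC; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PreviousHourPartition&lt;/code&gt; and its methods are hypothetical names.&lt;/p&gt;

```java
final class PreviousHourPartition {
    private PreviousHourPartition() {}

    // Zero-padded patterns so the values match partition strings such as "2025-01-31" and "01".
    // The UTC zone is an assumption about how the snapshots are partitioned.
    private static final java.time.format.DateTimeFormatter DATE =
            java.time.format.DateTimeFormatter.ofPattern("yyyy-MM-dd").withZone(java.time.ZoneOffset.UTC);
    private static final java.time.format.DateTimeFormatter HOUR =
            java.time.format.DateTimeFormatter.ofPattern("HH").withZone(java.time.ZoneOffset.UTC);

    // snapshot_date value for the hour before the given instant.
    static String snapshotDate(java.time.Instant now) {
        return DATE.format(now.minus(java.time.Duration.ofHours(1)));
    }

    // snapshot_hour value for the hour before the given instant.
    static String snapshotHour(java.time.Instant now) {
        return HOUR.format(now.minus(java.time.Duration.ofHours(1)));
    }
}
```

&lt;p&gt;Subtracting the hour before formatting handles the day rollover: for an instant of 2025-02-01T00:30:00Z, the date is &quot;2025-01-31&quot; and the hour is &quot;23&quot;.&lt;/p&gt;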

&lt;p&gt;Pros:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Simple, easy to understand.&lt;/li&gt;
  &lt;li&gt;Very efficient since only querying data from a single partition, and it’s O(1) to locate that partition.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cons:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;May fail to get any data if there were upstream sync issues for the selected partition. &lt;strong&gt;This is a blocker for us.&lt;/strong&gt;&lt;/li&gt;
  &lt;li&gt;May get partial data if we query the most recent partition during an ongoing sync.&lt;/li&gt;
  &lt;li&gt;May need to query older and more out-of-date partitions to avoid the above race condition.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;2-retrieve-most-recent-snapshot&quot;&gt;2. Retrieve most recent snapshot&lt;/h3&gt;

&lt;p&gt;We can improve on the prior option by using an initial query to identify the most recent snapshot partition that has data.&lt;/p&gt;

&lt;p&gt;Since it could be expensive to search across all partitions, we can set a maximum look-back period, for example, the last 7 days, to allow for occasional upstream failures that may take a few days to resolve.&lt;/p&gt;

&lt;p&gt;Example Athena SQL query:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MostRecentSnapshotPartition&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DISTINCT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;snapshot_date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;snapshot_hour&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;upstream_dataset&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;-- Set a maximum look-back period to reduce query search space while allowing a few days of upstream failures&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;snapshot_date&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;CAST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;CURRENT_DATE&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;INTERVAL&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;7&apos;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DAY&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;VARCHAR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;snapshot_date&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESC&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;snapshot_hour&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESC&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;LIMIT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;upstream_dataset&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;upstream_dataset&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;INNER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MostRecentSnapshotPartition&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;upstream_dataset&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;snapshot_date&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MostRecentSnapshotPartition&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;snapshot_date&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;upstream_dataset&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;snapshot_hour&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MostRecentSnapshotPartition&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;snapshot_hour&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Pros:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Relatively simple and easy to understand.&lt;/li&gt;
  &lt;li&gt;Guaranteed to get the most recent snapshot with data within the given look-back period.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cons:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;May get partial data if we query the most recent partition during an ongoing sync.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;3-retrieve-the-second-most-recent-snapshot&quot;&gt;3. Retrieve the second most recent snapshot&lt;/h3&gt;

&lt;p&gt;If we need to mitigate race conditions where the latest partition may have partial data, we could always query for the second most recent partition.&lt;/p&gt;

&lt;p&gt;We can maintain the same maximum look-back period to limit the query search space for performance reasons, while accounting for a few days of upstream failures.&lt;/p&gt;

&lt;p&gt;Example Athena SQL query:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;RecentSnapshotPartitions&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DISTINCT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;snapshot_date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;snapshot_hour&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;upstream_dataset&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;-- Set a maximum look-back period to reduce query search space while allowing a few days of upstream failures&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;snapshot_date&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;CAST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;CURRENT_DATE&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;INTERVAL&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;7&apos;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DAY&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;VARCHAR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;-- Get the two most recent partitions with data&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;snapshot_date&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESC&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;snapshot_hour&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESC&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;LIMIT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;RankedSnapshotPartitions&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;snapshot_date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;snapshot_hour&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;c1&quot;&gt;-- Set a row number so we can select the second most recent partition&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;ROW_NUMBER&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;OVER&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;snapshot_date&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESC&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;snapshot_hour&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESC&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;recency_rank&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;RecentSnapshotPartitions&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;upstream_dataset&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;upstream_dataset&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;INNER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;RankedSnapshotPartitions&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;upstream_dataset&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;snapshot_date&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;RankedSnapshotPartitions&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;snapshot_date&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;upstream_dataset&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;snapshot_hour&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;RankedSnapshotPartitions&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;snapshot_hour&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;RankedSnapshotPartitions&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;recency_rank&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Pros:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;If the second most recent partition is always complete, this guarantees complete data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cons:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;More complex and difficult to understand than other options.&lt;/li&gt;
  &lt;li&gt;Less efficient than other options.&lt;/li&gt;
&lt;/ul&gt;
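
&lt;p&gt;Part of the complexity noted above comes from ranking the partitions in a second CTE. As a rough sketch, assuming the Athena engine version in use supports the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OFFSET&lt;/code&gt; clause, the second most recent partition could instead be selected by skipping one row. The CTE name &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SecondMostRecentSnapshotPartition&lt;/code&gt; is a placeholder chosen for illustration:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SecondMostRecentSnapshotPartition&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DISTINCT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;snapshot_date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;snapshot_hour&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;upstream_dataset&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;-- Same maximum look-back period as the query above&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;snapshot_date&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;CAST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;CURRENT_DATE&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;INTERVAL&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;7&apos;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DAY&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;VARCHAR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;snapshot_date&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESC&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;snapshot_hour&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESC&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;-- Skip the most recent partition (assumes OFFSET is supported), then keep the next one&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;OFFSET&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;LIMIT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;upstream_dataset&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;upstream_dataset&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;INNER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SecondMostRecentSnapshotPartition&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;upstream_dataset&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;snapshot_date&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SecondMostRecentSnapshotPartition&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;snapshot_date&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;upstream_dataset&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;snapshot_hour&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SecondMostRecentSnapshotPartition&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;snapshot_hour&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Like the ranked query, this returns no rows when fewer than two partitions with data exist within the look-back period.&lt;/p&gt;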

&lt;h3 id=&quot;4-retrieve-the-most-recent-snapshot-older-than-the-period-between-data-syncs&quot;&gt;4. Retrieve the most recent snapshot older than the period between data syncs&lt;/h3&gt;

&lt;p&gt;If we need to account for temporary upstream failures and cannot accept race conditions where the most recent partition may sometimes be incomplete, we can simplify Option 3 by querying for the most recent partition older than the time between syncs, e.g. the most recent partition more than 1 hour old.&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;RecentCompleteSnapshotPartition&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DISTINCT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;snapshot_date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;snapshot_hour&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;upstream_dataset&lt;/span&gt;
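    &lt;span class=&quot;c1&quot;&gt;-- Get snapshots from more than 1 hour ago, to avoid querying a partition with an ongoing sync near the start of an hour&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;-- Compare date and hour together so that later hours from earlier days are not excluded&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;snapshot_date&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;CAST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DATE_FORMAT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;CURRENT_TIMESTAMP&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;INTERVAL&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;1&apos;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;HOUR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;%Y-%m-%d&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;VARCHAR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;OR&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;snapshot_date&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;CAST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DATE_FORMAT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;CURRENT_TIMESTAMP&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;INTERVAL&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;1&apos;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;HOUR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;%Y-%m-%d&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;VARCHAR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;snapshot_hour&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;CAST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DATE_FORMAT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;CURRENT_TIMESTAMP&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;INTERVAL&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;1&apos;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;HOUR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;%H&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;VARCHAR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;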
    &lt;span class=&quot;c1&quot;&gt;-- Set a maximum look-back period to reduce query search space while allowing a few days of upstream failures&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;snapshot_date&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;CAST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;CURRENT_DATE&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;INTERVAL&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;7&apos;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DAY&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;VARCHAR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;snapshot_date&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESC&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;snapshot_hour&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESC&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;LIMIT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;upstream_dataset&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;upstream_dataset&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;INNER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;RecentCompleteSnapshotPartition&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;upstream_dataset&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;snapshot_date&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;RecentCompleteSnapshotPartition&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;snapshot_date&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;upstream_dataset&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;snapshot_hour&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;RecentCompleteSnapshotPartition&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;snapshot_hour&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Pros:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;If populated partitions that are older than the period between syncs are always complete, guarantees complete data.&lt;/li&gt;
  &lt;li&gt;Simpler and more efficient than Option 3, while covering the flaws of Options 1-2.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cons:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Slightly more complex query than Options 1-2.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;5-send-data-completeness-events-to-trigger-downstream-queries&quot;&gt;5. Send data completeness events to trigger downstream queries&lt;/h3&gt;

&lt;p&gt;If we can request that our data provider push a data completeness signal file, such as an empty file named &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;_SUCCESS&lt;/code&gt;, to the latest S3 directory after completing a data sync, we can set up S3 PutObject notifications that filter for this specific filename and trigger downstream workflows whenever this signal is received.&lt;/p&gt;

&lt;p&gt;This is ideal for event-based workflows that only need to listen for a single trigger. It may not be sufficient if we need to query multiple upstream datasets that have different schedules and may not all be complete at the same time.&lt;/p&gt;

&lt;p&gt;Pros:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Guarantees completeness of the given data set at the time we receive the event.&lt;/li&gt;
  &lt;li&gt;Allows downstream workflows to run with the most recent data possible, as soon as data is received.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cons:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;More complex to support multiple upstream data sets; best suited to single-dependency workflows.&lt;/li&gt;
  &lt;li&gt;May not work with workflows that are strictly schedule-based and cannot be easily triggered by an event.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;6-leverage-aws-glue-catalog-trigger-glue-crawler-from-glue-job-event&quot;&gt;6. Leverage AWS Glue Catalog, trigger Glue Crawler from Glue job event&lt;/h3&gt;

&lt;p&gt;If we populate our S3 bucket with an AWS Glue job, we can set up a Glue Catalog and a Glue Crawler that updates the catalog with an abstracted representation of the source data and its available partitions for downstream services such as AWS Athena to query.&lt;/p&gt;

&lt;p&gt;If we set up the Glue Catalog Table as the data source for our downstream queries, we will only query partitions that it has crawled.&lt;/p&gt;

&lt;p&gt;We can set up a &lt;a href=&quot;https://docs.aws.amazon.com/glue/latest/dg/about-triggers.html&quot;&gt;Glue trigger&lt;/a&gt; to automatically run the Crawler after the upstream Glue job that populates new S3 partitions has succeeded, or after &lt;a href=&quot;https://docs.aws.amazon.com/glue/latest/dg/starting-workflow-eventbridge.html&quot;&gt;some event is received via EventBridge&lt;/a&gt;, such as creation of a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;_SUCCESS&lt;/code&gt; file, to automatically make new partitions available only after their sync jobs have completed.&lt;/p&gt;

&lt;p&gt;We could then use the Athena SQL query from Option 2 but with the Glue Table as our source, to simplify querying the most recent complete partition.&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MostRecentSnapshotPartition&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DISTINCT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;snapshot_date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;snapshot_hour&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;upstream_dataset&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;-- Set a maximum look-back period to reduce query search space while allowing a few days of upstream failures&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;snapshot_date&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;CAST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;CURRENT_DATE&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;INTERVAL&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;7&apos;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DAY&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;VARCHAR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;snapshot_date&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESC&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;snapshot_hour&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESC&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;LIMIT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;upstream_dataset&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;upstream_dataset&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;INNER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MostRecentSnapshotPartition&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;upstream_dataset&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;snapshot_date&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MostRecentSnapshotPartition&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;snapshot_date&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;upstream_dataset&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;snapshot_hour&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MostRecentSnapshotPartition&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;snapshot_hour&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
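
&lt;p&gt;Once partitions are registered in the Glue Catalog, it may also be possible to find the most recent partition from catalog metadata instead of selecting distinct values from the data. A minimal sketch, assuming the Athena engine version in use supports the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&quot;table_name$partitions&quot;&lt;/code&gt; metadata table and that the partition keys are strings as in the queries above:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;-- List registered partitions from catalog metadata and keep the most recent one&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;snapshot_date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;snapshot_hour&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;&quot;upstream_dataset$partitions&quot;&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;-- Same maximum look-back period as the query above&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;snapshot_date&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;CAST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;CURRENT_DATE&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;INTERVAL&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;7&apos;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DAY&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;VARCHAR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;snapshot_date&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESC&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;snapshot_hour&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESC&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;LIMIT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The returned values could then be supplied as literal partition filters in the main query.&lt;/p&gt;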

&lt;p&gt;Pros:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Abstracts logic for how to identify the most recent partition from data consumers.&lt;/li&gt;
  &lt;li&gt;Enables always querying the most recent complete partition.&lt;/li&gt;
  &lt;li&gt;Avoids the incomplete/duplicate data issues from Options 1-2 and the query complexity from Options 3-4.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cons:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Requires additional infrastructure set-up and service costs for Glue resources and event-based triggers.&lt;/li&gt;
  &lt;li&gt;May add several minutes of data latency from events -&amp;gt; Glue trigger -&amp;gt; Glue crawler -&amp;gt; Glue catalog update, compared to directly querying S3 for the most recent partition more than N hours old.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;7-leverage-aws-glue-catalog-use-aws-lambda-to-update-a-latest-table-after-completeness-events&quot;&gt;7. Leverage AWS Glue Catalog, use AWS Lambda to update a “latest” table after completeness events&lt;/h3&gt;

&lt;p&gt;If we have multiple upstream data sets, which rules out Option 5, and we’re willing to invest a few developer weeks to optimize and simplify queries for data consumers, we could create a custom Lambda function that programmatically updates Glue resources to point to a “latest” S3 partition whenever a data sync completes.&lt;/p&gt;

&lt;p&gt;We could require data providers to write an empty &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;_SUCCESS&lt;/code&gt; file to the new partition after a successful sync.  We can then set up an S3 PutObject notification that filters for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;_SUCCESS&lt;/code&gt; files, and use it to trigger our Lambda function.&lt;/p&gt;

&lt;p&gt;The Lambda function will then:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;List the files in the new partition’s S3 directory and scan them to infer the partition values and table schema.&lt;/li&gt;
  &lt;li&gt;Call Glue’s &lt;a href=&quot;https://docs.aws.amazon.com/glue/latest/webapi/API_UpdateTable.html&quot;&gt;UpdateTable API&lt;/a&gt; to update the Glue Catalog Table that represents the latest version of the snapshot-based data set, so that it has the latest data schema and points to the new partition.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Data consumers can then simplify their queries to the following, without needing to select snapshot partition attributes:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;upstream_dataset&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Pros:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Significantly simplifies queries for data consumers.&lt;/li&gt;
  &lt;li&gt;Enables always querying the most recent complete partition.&lt;/li&gt;
  &lt;li&gt;Avoids the incomplete/duplicate data issues from Options 1-2 and the query complexity from Options 3-6.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cons:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Adds multiple weeks of developer effort to replicate AWS Glue’s functionality for inferring schemas and updating Glue Catalog Tables.&lt;/li&gt;
  &lt;li&gt;Increases backend architecture and code complexity, with more points of failure that need to be maintained and supported.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If this works, the end result may be great for data consumers. In many cases, though, this developer time is better spent on other problems, and we’re willing to accept either slightly more complex but well-documented queries for data consumers, or occasional race conditions where recent snapshots may be incomplete if we happen to read while a data sync is occurring.&lt;/p&gt;

&lt;h2 id=&quot;case-study-retrieving-recent-data-when-completeness-is-critical-and-developer-time-is-limited&quot;&gt;Case study: Retrieving recent data when completeness is critical and developer time is limited&lt;/h2&gt;

&lt;p&gt;Take the case where we are limited to a few days of developer time to set up associated infrastructure, while we have the business requirements that:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Data consumers must always be able to query complete and accurate data that is no more than a few hours old.  Missing or duplicate data due to read/write race conditions is not acceptable.&lt;/li&gt;
  &lt;li&gt;Data consumers must be able to aggregate data from multiple upstream sources while satisfying the above completeness conditions, without receiving partial data when querying during data syncs.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In this scenario, Option 6 may be an acceptable tradeoff, where we leverage existing AWS Glue functionality to automatically surface new partitions and schemas after receiving a data sync completion signal, and point data consumers to the Glue Catalog Table to query partitions that have completed syncs.&lt;/p&gt;

&lt;p&gt;Data consumers will then need to be aware of snapshot partition attributes to select data from the most recent partition, but they can simply query for the most recent partition since we will only surface partitions after receiving a data sync completion signal.&lt;/p&gt;
</description>
        <pubDate>Sat, 31 May 2025 00:00:00 +0000</pubDate>
        <link>https://www.marksayson.com/blog/strategies-for-querying-snapshot-s3-data/</link>
        <guid isPermaLink="true">https://www.marksayson.com/blog/strategies-for-querying-snapshot-s3-data/</guid>
        
        
        <category>aws</category>
        
      </item>
    
      <item>
        <title>Granting AWS Organization member accounts access to Cost Explorer</title>
        <description>&lt;h2 id=&quot;introduction&quot;&gt;Introduction&lt;/h2&gt;

&lt;p&gt;By default, adding accounts to an AWS Organization results in consolidated billing and cost management in the Organization management account, and Organization member accounts lose access to Cost Explorer, Billing, and other cost management services unless access is explicitly enabled from the Organization management account.&lt;/p&gt;

&lt;p&gt;This post walks through how to allow member accounts to access cost management services, to enable each team to review and manage their AWS spending.&lt;/p&gt;

&lt;h2 id=&quot;enabling-cost-explorer-and-cost-optimization-hub&quot;&gt;Enabling Cost Explorer and Cost Optimization Hub&lt;/h2&gt;

&lt;p&gt;Cost Explorer allows you to analyze AWS service costs and usage changes over time, and is free to access through the AWS UI.  You can filter and group costs and usage across multiple dimensions including AWS service, resource, usage type, region, and tag.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/20250510_CostExplorer.png&quot; alt=&quot;Cost Explorer Screenshot&quot; /&gt;&lt;/p&gt;

&lt;p&gt;After logging into the Organization management account using the root user, navigate to the Cost Explorer service to automatically enable it from the current time onwards.  You will need to do this once per account in your Organization, logging in as the root user for each account.&lt;/p&gt;

&lt;p&gt;Cost Optimization Hub is a free service that provides cost optimization recommendations across multiple services.  To give member accounts access, we’ll need to enable the services from the AWS Organizations management account, and explicitly add permissions to the Permission Sets granting them access.&lt;/p&gt;

&lt;p&gt;To enable Cost Optimization Hub, navigate to the Cost Optimization Hub landing page, scroll down, select “Enable Cost Optimization Hub for this account and all member accounts”, and click “Enable”.&lt;/p&gt;

&lt;p&gt;As indicated by the prompt, to fully benefit from this service you’ll also need to opt into the free AWS Compute Optimizer service to import service rightsizing recommendations.  Follow the link to navigate to AWS Compute Optimizer, and click “Get started”.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/20250510_CostOptimizationHub.png&quot; alt=&quot;Prompt to enable Compute Optimizer&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/20250510_ComputeOptimizerEnablePage.png&quot; alt=&quot;Compute Optimizer Enable Link&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/20250510_ComputeOptimizerEnablePage2.png&quot; alt=&quot;Compute Optimizer Enable Page&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Select the option to opt in all member accounts, and click “Opt in”.&lt;/p&gt;

&lt;h2 id=&quot;enabling-access-for-member-accounts&quot;&gt;Enabling access for member accounts&lt;/h2&gt;

&lt;p&gt;In the top-right corner of the AWS console, select your account name to open the account dropdown, and click “Account”, or navigate directly to &lt;a href=&quot;https://us-east-1.console.aws.amazon.com/billing/home?region=us-east-1#/account&quot;&gt;https://us-east-1.console.aws.amazon.com/billing/home?region=us-east-1#/account&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/20250510_AwsAccountDropdown.png&quot; alt=&quot;AWS account dropdown&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Scroll down to the “IAM user and role access to Billing information” section and click “Edit”.  Select “Activate IAM Access” and click “Update”.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/20250510_AccountSettings_EnableIamAccessToBilling.png&quot; alt=&quot;Enable IAM access to Billing&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Navigate to “Billing and Cost Management” &amp;gt; “Cost Management Preferences”.&lt;/p&gt;

&lt;p&gt;Under the “General” tab, enable the options below, then select “Save preferences” at the bottom of the page and confirm changes.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Linked account access&lt;/li&gt;
  &lt;li&gt;Linked account refunds and credits&lt;/li&gt;
  &lt;li&gt;Linked account discounts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;img src=&quot;/images/20250510_CostManagementPrefs_GeneralPrefs.png&quot; alt=&quot;Enable linked accounts access to cost management services&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Under the “Cost Explorer” tab, enable “Granular data” with “Resource-level data at daily granularity”, and select “All services”, then click “Save preferences” at the bottom of the page and confirm changes.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/20250510_CostManagementPrefs_CostExplorerPrefs.png&quot; alt=&quot;Enable Cost Explorer data at daily granularity&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Under the “Cost Optimization Hub” tab, under “Organization and member account settings”, select “Enable Cost Optimization Hub for all member accounts” and “Allow member account discount visibility”, then click “Save preferences” and confirm changes.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/20250510_CostManagementPrefs_CostOptimizationHubPrefs.png&quot; alt=&quot;Enable linked accounts access to Cost Optimization Hub&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;accessing-cost-explorer-from-member-accounts&quot;&gt;Accessing Cost Explorer from member accounts&lt;/h2&gt;

&lt;p&gt;At this point, team members logging into Organization member accounts to view Cost Explorer will still see access denied errors across all widgets, even if their IAM permissions grant access to all Cost Explorer actions.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/20250510_CostExplorerAccessDenied.png&quot; alt=&quot;Cost Explorer Access Denied&quot; /&gt;&lt;/p&gt;

&lt;p&gt;When clicking on one of the errors, we can see the following message:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;You don’t have permission to [Cost Explorer:]. To request access, copy the following text and send it to your AWS administrator. Learn more about troubleshooting access denied errors&lt;/p&gt;

  &lt;p&gt;User: [REDACTED_ACCOUNT_ID]&lt;/p&gt;

  &lt;p&gt;Service: [Cost Explorer]&lt;/p&gt;

  &lt;p&gt;Name: [AccessDeniedException]&lt;/p&gt;

  &lt;p&gt;HTTP status code: [400]&lt;/p&gt;

  &lt;p&gt;Context: [IAM user access not activated]&lt;/p&gt;

  &lt;p&gt;Request ID: [REDACTED_REQUEST_ID]&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;To resolve this error, we need to:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Log into the Organization member account as the root user.&lt;/li&gt;
  &lt;li&gt;Visit AWS Cost Explorer once to enable this service if we haven’t done this before.&lt;/li&gt;
  &lt;li&gt;Navigate to Account settings via the account dropdown in the top-right corner &amp;gt; “Account”.&lt;/li&gt;
  &lt;li&gt;Scroll down to the “IAM user and role access to Billing information” section and click “Edit”.&lt;/li&gt;
  &lt;li&gt;Select “Activate IAM Access” and click “Update”.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;img src=&quot;/images/20250510_AccountSettings_EnableIamAccessToBilling_EditPage.png&quot; alt=&quot;Enable IAM access to billing&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Non-root users with billing permissions should now have access to Cost Explorer, although the page may initially show zero costs and only report spending from the time of activation onward.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/20250510_CostExplorerZeroCosts.png&quot; alt=&quot;Cost Explorer page&quot; /&gt;&lt;/p&gt;

&lt;p&gt;After waiting a few days, I noticed that the default view still showed zero spending.  After changing the time period to the last 7 days and the granularity to daily, however, I could see correct costs for the past few days, up to the date when Cost Explorer was enabled from the Organization management account.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/20250510_CostExplorerLast7Days.png&quot; alt=&quot;Cost Explorer filtered to last 7 days&quot; /&gt;&lt;/p&gt;

&lt;p&gt;We can now add any filters or groupings we like, such as grouping costs by usage type to see which specific AWS usage types are contributing to our daily spending.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/20250510_CostExplorerLast7DaysGroupByUsageType.png&quot; alt=&quot;Cost Explorer filtered to last 7 days, grouped by usage type&quot; /&gt;&lt;/p&gt;
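
&lt;p&gt;The same view can also be requested through the Cost Explorer API.  The sketch below only builds the request parameters for the GetCostAndUsage operation (a 7-day window, daily granularity, grouped by usage type).  The helper name and the choice of the UnblendedCost metric are my own assumptions for illustration; adjust them to match what you want to report on.&lt;/p&gt;

```python
from datetime import date, timedelta


def build_daily_usage_type_query(today, days=7):
    """Build keyword arguments for Cost Explorer's GetCostAndUsage API.

    Covers the `days` full days before `today` at daily granularity,
    grouped by usage type. The End date is exclusive in this API, so
    `today` itself is not included. (Hypothetical helper for illustration.)
    """
    start = today - timedelta(days=days)
    return {
        "TimePeriod": {"Start": start.isoformat(), "End": today.isoformat()},
        "Granularity": "DAILY",
        "Metrics": ["UnblendedCost"],  # assumed metric; others are available
        "GroupBy": [{"Type": "DIMENSION", "Key": "USAGE_TYPE"}],
    }


params = build_daily_usage_type_query(date.today())
print(params)

# With boto3 installed and credentials for a role that allows
# ce:GetCostAndUsage, the query could then be sent with:
#   import boto3
#   response = boto3.client("ce").get_cost_and_usage(**params)
```

&lt;p&gt;Keeping the parameter construction separate from the API call makes it easy to check the date window before sending a request, since Cost Explorer API requests are billed per call.&lt;/p&gt;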

&lt;h2 id=&quot;granting-cross-account-user-groups-access-to-cost-explorer&quot;&gt;Granting cross-account user groups access to Cost Explorer&lt;/h2&gt;

&lt;p&gt;We need to explicitly grant user groups access to Cost Explorer, otherwise they will be denied by default.&lt;/p&gt;

&lt;p&gt;Users with full read permissions across all AWS services will be able to view Cost Explorer after the earlier steps in this post, but we may want to allow additional user roles access as well.&lt;/p&gt;

&lt;p&gt;See my last post’s &lt;a href=&quot;/blog/aws-organizations/&quot;&gt;“Creating cross-account role-based permission groups”&lt;/a&gt; section for how to create a Permission Set and assign user groups these permissions on specific linked accounts.&lt;/p&gt;

&lt;p&gt;To manage your Permission Sets, navigate to “IAM Identity Center” &amp;gt; “Multi-account permissions” &amp;gt; “Permission sets”.&lt;/p&gt;

&lt;p&gt;Assuming you have created a Permission Set that does not have explicit permissions to Cost Explorer, and you want that group to have read access to all cost management services, select that Permission Set.&lt;/p&gt;

&lt;p&gt;In this case, I have a SupportUser Permission Set where I want my support team to be able to view cost and usage data for their assigned accounts.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/20250510_SupportUserPermissionSetWithoutBillingAccess.png&quot; alt=&quot;SupportUser Permission Set without billing access&quot; /&gt;&lt;/p&gt;

&lt;p&gt;As long as this Permission Set only has the SupportUser policy assigned, users logging in with this role will be denied access to all Cost Explorer widgets with the error:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;User: [REDACTED_ACCOUNT_ID]&lt;/p&gt;

  &lt;p&gt;Service: [Cost Explorer]&lt;/p&gt;

  &lt;p&gt;Name: [AccessDeniedException]&lt;/p&gt;

  &lt;p&gt;HTTP status code: [400]&lt;/p&gt;

  &lt;p&gt;Context: [User: arn:aws:sts::REDACTED_ACCOUNT_ID:assumed-role/REDACTED_ROLE_ID is not authorized to perform: ce:GetCostAndUsage on resource: arn:aws:ce:us-east-1:REDACTED_ACCOUNT_ID:/GetCostAndUsage because no identity-based policy allows the ce:GetCostAndUsage action]&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Under the Permission Set’s “Permissions” &amp;gt; “Inline policy” section, select “Edit”.&lt;/p&gt;

&lt;p&gt;Enter the following IAM policy document, or the scoped permissions you want to grant:&lt;/p&gt;

&lt;div class=&quot;language-json highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;Version&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;2012-10-17&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;Statement&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
            &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;Sid&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;CostManagementViewAccess&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
            &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;Effect&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;Allow&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
            &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;Action&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
                &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;account:GetAccountInformation&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
                &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;aws-portal:ViewBilling&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
                &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;billing:Get*&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
                &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;billing:List*&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
                &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;budgets:ViewBudget&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
                &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;budgets:Describe*&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
                &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;ce:Describe*&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
                &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;ce:Get*&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
                &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;ce:List*&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
                &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;consolidatedbilling:Get*&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
                &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;consolidatedbilling:List*&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
                &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;cur:Describe*&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
                &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;cur:Get*&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
                &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;freetier:Get*&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
                &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;sustainability:GetCarbonFootprintSummary&quot;&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
            &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
            &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;Resource&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;*&quot;&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
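
&lt;p&gt;Before saving, it can help to sanity-check which actions the wildcard patterns cover.  The sketch below is a rough approximation, not how IAM itself evaluates policies: it only compares an action name against the Allow patterns from the policy above, and ignores Deny statements, SCPs, resources, and conditions.  The function name is hypothetical, and Python’s fnmatch also treats square brackets specially, which IAM does not.&lt;/p&gt;

```python
from fnmatch import fnmatchcase

# Action patterns copied from the inline policy above.
ALLOWED_ACTIONS = [
    "account:GetAccountInformation",
    "aws-portal:ViewBilling",
    "billing:Get*",
    "billing:List*",
    "budgets:ViewBudget",
    "budgets:Describe*",
    "ce:Describe*",
    "ce:Get*",
    "ce:List*",
    "consolidatedbilling:Get*",
    "consolidatedbilling:List*",
    "cur:Describe*",
    "cur:Get*",
    "freetier:Get*",
    "sustainability:GetCarbonFootprintSummary",
]


def is_allowed(action, patterns=ALLOWED_ACTIONS):
    """Return True if any Allow pattern matches the action name.

    IAM action names are case-insensitive, so both sides are lowercased
    before comparing. (Hypothetical helper for illustration only.)
    """
    wanted = action.lower()
    return any(fnmatchcase(wanted, pattern.lower()) for pattern in patterns)


# The action from the access denied error is covered by "ce:Get*",
# while a write action such as creating an anomaly monitor is not.
print(is_allowed("ce:GetCostAndUsage"))
print(is_allowed("ce:CreateAnomalyMonitor"))
```

&lt;p&gt;A check like this is only a quick way to confirm that the policy stays read-only as you add or remove patterns; the IAM policy simulator remains the authoritative way to test effective permissions.&lt;/p&gt;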

&lt;p&gt;Scroll to the bottom of the page and save your changes.&lt;/p&gt;

&lt;p&gt;Validate that users assigned this Permission Set on a given account now have read access to Billing and Cost Management pages.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/20250510_CostExplorerAccessAsSupportUser.png&quot; alt=&quot;Support user now able to view Cost Explorer pages&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;posts-in-this-series&quot;&gt;Posts in this series&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/aws-organizations/&quot;&gt;Using AWS Organizations to standardize security controls across AWS accounts&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;(Current post) Granting AWS Organization member accounts access to Cost Explorer&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;references&quot;&gt;References&lt;/h2&gt;

&lt;p&gt;AWS Organizations User Guide: &lt;a href=&quot;https://docs.aws.amazon.com/organizations/latest/userguide/orgs_introduction.html&quot;&gt;https://docs.aws.amazon.com/organizations/latest/userguide/orgs_introduction.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;AWS Cost Explorer Access User Guide: &lt;a href=&quot;https://docs.aws.amazon.com/cost-management/latest/userguide/ce-access.html&quot;&gt;https://docs.aws.amazon.com/cost-management/latest/userguide/ce-access.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;AWS IAM Identity Center Permission Sets User Guide: &lt;a href=&quot;https://docs.aws.amazon.com/singlesignon/latest/userguide/permissionsetsconcept.html&quot;&gt;https://docs.aws.amazon.com/singlesignon/latest/userguide/permissionsetsconcept.html&lt;/a&gt;&lt;/p&gt;
</description>
        <pubDate>Sat, 10 May 2025 00:00:00 +0000</pubDate>
        <link>https://www.marksayson.com/blog/aws-organization-members-cost-explorer-access/</link>
        <guid isPermaLink="true">https://www.marksayson.com/blog/aws-organization-members-cost-explorer-access/</guid>
        
        
        <category>aws</category>
        
      </item>
    
      <item>
        <title>Using AWS Organizations to standardize security controls across AWS accounts</title>
        <description>&lt;h2 id=&quot;introduction&quot;&gt;Introduction&lt;/h2&gt;

&lt;p&gt;AWS Organizations provide a helpful way to centralize management of AWS accounts, with support for consolidating billing, role-based permission sets, service and resource control policies, and AWS service configurations.&lt;/p&gt;

&lt;p&gt;Service control policies (SCPs) can be used to enforce security controls such as:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Requiring multi-factor authentication (MFA) to complete certain actions&lt;/li&gt;
  &lt;li&gt;Blocking use of the root user outside of the AWS Organization management account&lt;/li&gt;
  &lt;li&gt;Blocking certain changes such as leaving the AWS Organization or disabling security tools&lt;/li&gt;
  &lt;li&gt;Blocking access to specific regions or setting a region allow-list, if your organization has policies restricting where services can be deployed&lt;/li&gt;
  &lt;li&gt;Blocking granting VPCs direct Internet access&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Resource control policies (RCPs) are supported by S3, STS, KMS, SQS, and Secrets Manager, and can be used to enforce security controls such as:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Requiring all queries against in-scope resources to be over HTTPS&lt;/li&gt;
  &lt;li&gt;Restricting access to your S3 buckets, KMS encryption keys, and Secrets Manager secrets to principals within your AWS Organization&lt;/li&gt;
  &lt;li&gt;Restricting sts:AssumeRoleWithWebIdentity requests to allow-listed OpenID Connect (OIDC) Identity Providers and identities&lt;/li&gt;
  &lt;li&gt;Requiring KMS encryption to be used for all S3 objects&lt;/li&gt;
&lt;/ul&gt;
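
&lt;p&gt;As a minimal sketch of the first control in this list, the snippet below builds an RCP document that denies requests to in-scope resources when they are not sent over HTTPS.  The function name and Sid are assumptions for illustration, and the action list simply mirrors the five services named above; review it against your own requirements before attaching it to any account.&lt;/p&gt;

```python
import json


def build_https_only_rcp():
    """Return an RCP document that denies requests not sent over HTTPS.

    RCPs use "Principal": "*" and rely on conditions to narrow their
    effect. (Hypothetical helper name and Sid, for illustration.)
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "EnforceSecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": ["s3:*", "sts:*", "kms:*", "sqs:*", "secretsmanager:*"],
                "Resource": "*",
                # Deny only when the request was not made over TLS.
                "Condition": {"BoolIfExists": {"aws:SecureTransport": "false"}},
            }
        ],
    }


# Print the JSON that could be pasted into the RCP policy editor.
print(json.dumps(build_https_only_rcp(), indent=2))
```

&lt;p&gt;Generating the document from code rather than editing JSON by hand makes it easier to keep the policy in version control and review changes before they are applied Organization-wide.&lt;/p&gt;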

&lt;p&gt;SCPs and RCPs are provided at no cost, but at most 5 SCPs and 5 RCPs can be attached to a given AWS account or AWS Organizational Unit.&lt;/p&gt;

&lt;p&gt;Each AWS Organization has a single management account that can’t be changed, and AWS recommends using an account that has no other resources or workloads for this purpose.  Access to the management account should be limited to admin users that need to make organization changes.  SCPs and RCPs do not apply to the management account.&lt;/p&gt;

&lt;p&gt;There are no costs associated with using AWS Organizations, so while you do need to take care when using it (with great power comes great responsibility), it’s an awesome tool to simplify managing multi-account organizations and services.&lt;/p&gt;

&lt;h2 id=&quot;creating-an-aws-organization&quot;&gt;Creating an AWS Organization&lt;/h2&gt;

&lt;p&gt;Once you’ve created an AWS account that will be specifically for AWS Organization management, log into that account, navigate to the AWS Organizations service and click “Create an Organization”.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/20250507_AwsOrgs_01_StarterLandingPage.png&quot; alt=&quot;AWS Organizations Landing Page&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Review the linked recommendations, and click “Create an Organization” to proceed.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/20250507_AwsOrgs_02_CreateOrgPage.png&quot; alt=&quot;Create AWS Organization Page&quot; /&gt;&lt;/p&gt;

&lt;p&gt;You can then view and modify Organization settings and Invite Accounts to join the Organization.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/20250507_AwsOrgs_03_NewOrgPage.png&quot; alt=&quot;New AWS Organization Page&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;standardizing-user-access-via-iam-identity-center&quot;&gt;Standardizing user access via IAM Identity Center&lt;/h2&gt;

&lt;p&gt;From the AWS Organization management account, navigate to the IAM Identity Center service, and enable Identity Center for your Organization if not yet enabled.&lt;/p&gt;

&lt;p&gt;Then, under “Settings” &amp;gt; “Identity source”, configure an Identity Source that matches your use case.&lt;/p&gt;

&lt;p&gt;You can choose between:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Identity Center directory, where you create users and groups in IAM Identity Center, and users sign in through the AWS access portal with usernames and passwords.&lt;/li&gt;
  &lt;li&gt;Active Directory, which many companies already have set up to manage network access controls.&lt;/li&gt;
  &lt;li&gt;Other external identity providers that implement supported identity federation protocols.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For my proof-of-concept, Active Directory and external identity providers aren’t relevant, so I’ll use the native AWS Identity Center directory.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/20250507_IamIdentityCenter_01_ChooseIdentitySource.png&quot; alt=&quot;Choose Identity Source&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;enforcing-multi-factor-authentication-mfa&quot;&gt;Enforcing multi-factor authentication (MFA)&lt;/h2&gt;

&lt;p&gt;Especially if using traditional usernames and passwords, multi-factor authentication is critical to protect your accounts from attackers.&lt;/p&gt;

&lt;p&gt;MFA requires that after a user enters a correct username and password, they also provide additional verification that they are who they say they are.&lt;/p&gt;

&lt;p&gt;Companies such as &lt;a href=&quot;https://security.googleblog.com/2019/05/new-research-how-effective-is-basic.html&quot;&gt;Google&lt;/a&gt; and &lt;a href=&quot;https://www.microsoft.com/en-us/security/blog/2019/08/20/one-simple-action-you-can-take-to-prevent-99-9-percent-of-account-attacks/&quot;&gt;Microsoft&lt;/a&gt; that have enforced MFA have reported 99%+ decreases in successful account take-overs when requiring on-device prompts, and 100% decreases when requiring physical security keys.&lt;/p&gt;

&lt;p&gt;To enforce MFA for your Organization:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Navigate to “IAM Identity Center” &amp;gt; “Settings” &amp;gt; “Authentication”.&lt;/li&gt;
  &lt;li&gt;Under “Multi-factor authentication”, click “Configure”.&lt;/li&gt;
  &lt;li&gt;Under “Prompt users for MFA”, select “Every time they sign in”.&lt;/li&gt;
  &lt;li&gt;Under “Users can authenticate with these MFA types”, select one or more of the following:
    &lt;ul&gt;
      &lt;li&gt;Security keys (e.g. YubiKey) and built-in authenticators (e.g. Apple TouchID) - I recommend always enabling this so users can choose the strongest MFA type available; or&lt;/li&gt;
      &lt;li&gt;Authenticator apps (e.g. Authy, Google Authenticator) - a good back-up if not all users can use the first option.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Under “If a user does not yet have a registered MFA device”, select “Require them to register an MFA device at sign in”.&lt;/li&gt;
  &lt;li&gt;Click “Save changes”.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;img src=&quot;/images/20250507_IamIdentityCenter_02_EnforceMfa.png&quot; alt=&quot;Enforce MFA&quot; /&gt;&lt;/p&gt;

&lt;p&gt;MFA will now be automatically enforced for all users.&lt;/p&gt;

&lt;h2 id=&quot;adding-aws-accounts-to-an-aws-organization&quot;&gt;Adding AWS accounts to an AWS Organization&lt;/h2&gt;

&lt;p&gt;From the AWS Organization’s “AWS accounts” page, click “Add an AWS account”.&lt;/p&gt;

&lt;p&gt;From here, you can either create new AWS accounts under the Organization, or invite existing AWS accounts to join the Organization.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/20250507_AwsOrgs_04_InviteAccountPage.png&quot; alt=&quot;Invite AWS Account Page&quot; /&gt;&lt;/p&gt;

&lt;p&gt;To invite an existing account, click “Invite an existing AWS account”, enter the account’s details, and click “Send invitation”.&lt;/p&gt;

&lt;p&gt;You can then log into the invited AWS account, select “Invitations”, and accept or decline the invite.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/20250507_AwsOrgs_05_MemberAccountInvitation.png&quot; alt=&quot;View Invite to AWS Organization&quot; /&gt;&lt;/p&gt;

&lt;p&gt;After accepting the invite, billing for the member account will be managed by the Organization’s management account, and any Organization-level security controls and AWS service configurations will be automatically applied.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/20250507_AwsOrgs_06_MemberAccountInviteAccepted.png&quot; alt=&quot;Invite Accepted Page&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Member accounts can leave the Organization at any time from their AWS Organizations dashboard, unless the Organization has enforced a service control policy (SCP) blocking this action.  It’s a good practice to implement such an SCP to prevent actors who’ve compromised an account from leaving the Organization to disable its security controls.&lt;/p&gt;

&lt;h2 id=&quot;creating-cross-account-role-based-permission-groups&quot;&gt;Creating cross-account role-based permission groups&lt;/h2&gt;

&lt;p&gt;A permission set is a collection of IAM policies and permission boundaries that defines the access that will be granted to a logged in user.&lt;/p&gt;

&lt;p&gt;Multiple permission sets can be created and assigned to user groups to enable those users to log into an AWS account with one of the permission sets that have been made available to them.&lt;/p&gt;

&lt;p&gt;To manage your permission sets, navigate to “IAM Identity Center” &amp;gt; “Multi-account permissions” &amp;gt; “Permission sets”.&lt;/p&gt;

&lt;p&gt;If you click “Create permission set”, you can select from a number of AWS-defined template permission sets, or create a custom permission set with any combination of managed and inline IAM policies and an optional permission boundary.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/20250507_IamIdentityCenter_04_CreatePermissionSet.png&quot; alt=&quot;Create Permission Set&quot; /&gt;&lt;/p&gt;

&lt;p&gt;For example, we could create the following predefined permission sets for our Organization:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;AdministratorAccess: provides full access to AWS services and resources&lt;/li&gt;
  &lt;li&gt;PowerUserAccess: provides full access to AWS services and resources, but does not allow management of Users and groups&lt;/li&gt;
  &lt;li&gt;SupportUser: provides permissions to troubleshoot and resolve issues in an AWS account, and contact AWS support to create and manage cases&lt;/li&gt;
  &lt;li&gt;ReadOnlyAccess: provides read-only access to AWS services and resources&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;img src=&quot;/images/20250507_IamIdentityCenter_03_PermissionSets.png&quot; alt=&quot;Permission Sets List View&quot; /&gt;&lt;/p&gt;

&lt;p&gt;We can then create a user group and grant that group a subset of these permission sets on specific member accounts in our Organization, depending on the level of access they need on the given accounts for their business functions.&lt;/p&gt;

&lt;p&gt;Under “IAM Identity Center” &amp;gt; “Groups”, you can manage permission groups that you can assign role-based permissions, and these groups can be assigned to any selection of Organizational Units or accounts within your AWS Organization.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/20250507_IamIdentityCenter_05_GroupsList.png&quot; alt=&quot;Groups List View&quot; /&gt;&lt;/p&gt;

&lt;p&gt;To create a new group, select “Create group”, enter a group name and optional description, optionally add users, and click “Create group”.&lt;/p&gt;

&lt;p&gt;Then open that group and under AWS accounts, select “Assign accounts”.&lt;/p&gt;

&lt;p&gt;Select the accounts and the permission sets that the given group should have on those accounts, then click “Assign” to apply the permissions.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/20250507_IamIdentityCenter_06_GroupAssignPermissionSets.png&quot; alt=&quot;Assign permission sets and accounts to a group&quot; /&gt;&lt;/p&gt;

&lt;p&gt;You can then add users to the group, and after they log into their provided AWS access portal, they will be able to select an account and a permission set out of those assigned to their group for that account.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/20250507_UserLoginPermissionSetsView.png&quot; alt=&quot;User portal with accounts and permission sets&quot; /&gt;&lt;/p&gt;

&lt;p&gt;A common use case for setting up multiple permission sets on a group is to allow developers to log into accounts with read-only access by default to validate workflows, and only have them log in with admin permissions when necessary to manually remediate issues.&lt;/p&gt;

&lt;h2 id=&quot;setting-up-service-control-policies&quot;&gt;Setting up Service Control Policies&lt;/h2&gt;

&lt;p&gt;From your AWS Organization management account, navigate to “AWS Organizations” &amp;gt; “Policies”.&lt;/p&gt;

&lt;p&gt;Click “Service control policies” and enable this feature.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/20250507_AwsOrgs_07_SecurityControlPoliciesPage.png&quot; alt=&quot;Service control policies page&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Click “Create policy” and create a policy based on your use cases.&lt;/p&gt;

&lt;p&gt;For example, to create a policy that blocks member accounts from leaving the AWS Organization, disabling or modifying GuardDuty config, or attaching their VPCs directly to the Internet, we can create an SCP with:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Policy name: DenyBypassOrgSecurityControls
    &lt;ul&gt;
      &lt;li&gt;Note: You will likely want a more specific policy and name; this one is for testing purposes.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Description: Prevent member accounts from leaving the AWS Organization, disabling or modifying GuardDuty configurations, or opening direct VPC access to the Internet.&lt;/li&gt;
  &lt;li&gt;Policy JSON:&lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&quot;language-json highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
  &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;Version&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;2012-10-17&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
  &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;Statement&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
      &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;Sid&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;DenyLeaveOrganization&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
      &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;Effect&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;Deny&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
      &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;Action&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;organizations:LeaveOrganization&quot;&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
      &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
      &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;Resource&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;*&quot;&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;},&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
      &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;Sid&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;DenyModifyGuardDuty&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
      &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;Effect&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;Deny&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
      &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;Action&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;guardduty:AcceptInvitation&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;guardduty:ArchiveFindings&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;guardduty:CreateDetector&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;guardduty:CreateFilter&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;guardduty:CreateIPSet&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;guardduty:CreateMembers&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;guardduty:CreatePublishingDestination&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;guardduty:CreateSampleFindings&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;guardduty:CreateThreatIntelSet&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;guardduty:DeclineInvitations&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;guardduty:DeleteDetector&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;guardduty:DeleteFilter&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;guardduty:DeleteInvitations&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;guardduty:DeleteIPSet&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;guardduty:DeleteMembers&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;guardduty:DeletePublishingDestination&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;guardduty:DeleteThreatIntelSet&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;guardduty:DisassociateFromMasterAccount&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;guardduty:DisassociateMembers&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;guardduty:InviteMembers&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;guardduty:StartMonitoringMembers&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;guardduty:StopMonitoringMembers&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;guardduty:TagResource&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;guardduty:UnarchiveFindings&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;guardduty:UntagResource&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;guardduty:UpdateDetector&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;guardduty:UpdateFilter&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;guardduty:UpdateFindingsFeedback&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;guardduty:UpdateIPSet&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;guardduty:UpdatePublishingDestination&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;guardduty:UpdateThreatIntelSet&quot;&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
      &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
      &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;Resource&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;*&quot;&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;},&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
      &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;Sid&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;DenyOpenVpcInternetAccess&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
      &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;Effect&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;Deny&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
      &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;Action&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;ec2:AttachInternetGateway&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;ec2:CreateInternetGateway&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;ec2:CreateEgressOnlyInternetGateway&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;ec2:CreateVpcPeeringConnection&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;ec2:AcceptVpcPeeringConnection&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;globalaccelerator:Create*&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;globalaccelerator:Update*&quot;&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
      &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
      &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;Resource&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;*&quot;&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
  &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Click “Create policy” to save your changes.&lt;/p&gt;

&lt;p&gt;Then, from the “Service control policies” page, select your created policy and select “Actions” &amp;gt; “Attach Policy”.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/20250507_AwsOrgs_08_SecurityControlPoliciesPage_AttachPolicyDropdown.png&quot; alt=&quot;Service Control Policies actions&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Select the Organizational Units or specific accounts you want to apply the policy to, and click “Attach policy”.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/20250507_AwsOrgs_09_SecurityControlPolicies_AttachPolicyPage.png&quot; alt=&quot;Service Control Policies attach policy page&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The security controls will now be automatically enforced for the associated member accounts.&lt;/p&gt;

&lt;p&gt;We can test this by logging into a member account using the AdministratorAccess role, navigating to AWS Organizations, and attempting to leave the Organization.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/20250507_AwsOrg_ScpValidation_ErrorLeavingOrg.png&quot; alt=&quot;Error leaving AWS Organization&quot; /&gt;&lt;/p&gt;

&lt;p&gt;As shown above, the Service Control Policy successfully blocks member accounts from leaving the Organization.  Removing member accounts can now only be initiated from the Organization management account, which reduces the risk of a compromised account bypassing Organization security policies.&lt;/p&gt;

&lt;h2 id=&quot;posts-in-this-series&quot;&gt;Posts in this series&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;(Current post) Using AWS Organizations to standardize security controls across AWS accounts&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/aws-organization-members-cost-explorer-access/&quot;&gt;Granting AWS Organization member accounts access to Cost Explorer&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;references&quot;&gt;References&lt;/h2&gt;

&lt;p&gt;AWS Organizations User Guide: &lt;a href=&quot;https://docs.aws.amazon.com/organizations/latest/userguide/orgs_introduction.html&quot;&gt;https://docs.aws.amazon.com/organizations/latest/userguide/orgs_introduction.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;AWS Organizations Best Practices: &lt;a href=&quot;https://docs.aws.amazon.com/organizations/latest/userguide/orgs_best-practices_mgmt-acct.html&quot;&gt;https://docs.aws.amazon.com/organizations/latest/userguide/orgs_best-practices_mgmt-acct.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;AWS Organizations Service Control Policies User Guide: &lt;a href=&quot;https://docs.aws.amazon.com/organizations/latest/userguide/orgs_manage_policies_scps.html&quot;&gt;https://docs.aws.amazon.com/organizations/latest/userguide/orgs_manage_policies_scps.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;AWS Organizations Service Control Policies Best Practices: &lt;a href=&quot;https://aws.amazon.com/blogs/industries/best-practices-for-aws-organizations-service-control-policies-in-a-multi-account-environment/&quot;&gt;https://aws.amazon.com/blogs/industries/best-practices-for-aws-organizations-service-control-policies-in-a-multi-account-environment/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;AWS IAM Identity Center MFA User Guide: &lt;a href=&quot;https://docs.aws.amazon.com/singlesignon/latest/userguide/mfa-configure.html&quot;&gt;https://docs.aws.amazon.com/singlesignon/latest/userguide/mfa-configure.html&lt;/a&gt;&lt;/p&gt;
</description>
        <pubDate>Wed, 07 May 2025 00:00:00 +0000</pubDate>
        <link>https://www.marksayson.com/blog/aws-organizations/</link>
        <guid isPermaLink="true">https://www.marksayson.com/blog/aws-organizations/</guid>
        
        
        <category>aws</category>
        
      </item>
    
      <item>
        <title>Reducing Lambda latency by 76% with AWS Lambda Power Tuning</title>
        <description>&lt;h2 id=&quot;introduction&quot;&gt;Introduction&lt;/h2&gt;

&lt;p&gt;Optimizing AWS Lambda memory capacity can reduce customer-facing latencies by a factor of 2 to 5 without significantly increasing hardware costs.  However, finding the right value takes trial and error, and many teams just pick an amount of memory and stick with it, leaving their services several times slower than necessary.&lt;/p&gt;

&lt;p&gt;Other teams spend hours setting up custom code and metrics to measure latencies for each of their service’s use cases, benchmark each use case against various memory capacities, and use the AWS Cost Estimator or AWS Lambda pricing documentation to estimate costs and choose the amount of memory with the best latency-to-cost tradeoff.&lt;/p&gt;

&lt;p&gt;This is no longer necessary with the AWS Lambda Power Tuning tool, which can be run against any Lambda function in your AWS account to automatically determine the optimal memory capacity that minimizes execution latency and/or hardware costs.&lt;/p&gt;

&lt;p&gt;There is no cost to deploy and run the tool beyond its underlying hardware costs, which are likely to be free if you only run it a few times before deleting it from your account.&lt;/p&gt;

&lt;p&gt;Since it only relies on AWS-infrastructure-level API calls, the tool works regardless of which programming language your Lambda function uses, and doesn’t require any modifications to your service infrastructure or code.&lt;/p&gt;

&lt;h2 id=&quot;set-up&quot;&gt;Set up&lt;/h2&gt;

&lt;p&gt;The AWS Lambda Power Tuning &lt;a href=&quot;https://github.com/alexcasalboni/aws-lambda-power-tuning&quot;&gt;GitHub repo&lt;/a&gt; documents multiple ways to deploy the tool, using either the AWS Serverless Application Repository (simplest), AWS SAM CLI, AWS CDK, or Terraform.&lt;/p&gt;

&lt;p&gt;I used the AWS Serverless Application Repository since this reduces set-up to a few button clicks, and I planned to tear down the tool after optimizing my Lambda function.&lt;/p&gt;

&lt;p&gt;To use this deployment option, you can simply log into your AWS account, visit &lt;a href=&quot;https://serverlessrepo.aws.amazon.com/applications/arn:aws:serverlessrepo:us-east-1:451282441545:applications~aws-lambda-power-tuning&quot;&gt;https://serverlessrepo.aws.amazon.com/applications/arn:aws:serverlessrepo:us-east-1:451282441545:applications~aws-lambda-power-tuning&lt;/a&gt;, and click Deploy.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/20240714_AWSLambdaPowerTuning_AWSServerlessRepoAppPage.png&quot; alt=&quot;alt text&quot; title=&quot;AWS Serverless Repo application page for the AWS Lambda Power Tuning tool&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This will create an AWS Lambda Application that encapsulates all the infrastructure for the tuning tool, including the AWS Step Functions State Machine that you’ll invoke to run the benchmark tests.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/20240714_AWSLambdaPowerTuning_AppResources.png&quot; alt=&quot;alt text&quot; title=&quot;AWS Lambda Power Tuning application resources&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Click on the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;powerTuningStateMachine&lt;/code&gt; resource to open the state machine, click &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Start Execution&lt;/code&gt;, and enter the JSON payload for the benchmark test.  The input parameters are documented in the tool’s &lt;a href=&quot;https://github.com/alexcasalboni/aws-lambda-power-tuning&quot;&gt;GitHub README&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For example, the following payload runs the tool against the given Lambda function, with 15 executions each for 512, 1024, 1536, 2048, and 3008 MB of memory, with a function payload specific to my API service, and the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;balanced&lt;/code&gt; optimization strategy.&lt;/p&gt;

&lt;div class=&quot;language-json highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
  &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;lambdaARN&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;arn:aws:lambda:us-west-2:123456789012:function:TestLambdaFunctionName&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
  &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;powerValues&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;512&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1024&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1536&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2048&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3008&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
  &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
  &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;num&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;15&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
  &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;payload&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;resource&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;/v1/consent-management/services/{serviceId}/users/{userId}/consents&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;path&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;/v1/consent-management/services/TestServiceId/users/TestUserId/consents&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;httpMethod&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;GET&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;pathParameters&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
      &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;serviceId&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;TestServiceId&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
      &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;userId&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;TestUserId&quot;&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;},&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;requestContext&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
      &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;resourceId&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;1abc2d&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
      &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;resourcePath&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;/v1/consent-management/services/{serviceId}/users/{userId}/consents&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
      &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;operationName&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;ListServiceUserConsent&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
      &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;httpMethod&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;GET&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
      &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;path&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;/v1/consent-management/services/{serviceId}/users/{userId}/consents&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
      &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;accountId&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;123456789012&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
      &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;protocol&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;HTTP/1.1&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
      &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;stage&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;test&quot;&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
  &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;},&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
  &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;parallelInvocation&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;false&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
  &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;strategy&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;balanced&quot;&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I set &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;parallelInvocation&lt;/code&gt; to false after observing Lambda throttling errors with it set to true, since my test Lambda isn’t currently provisioned for high load.  I set &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;strategy&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;balanced&lt;/code&gt; to weight latency and cost equally; you can also configure the tool to optimize for only one of them or to use a different weighting.&lt;/p&gt;
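
&lt;p&gt;A minimal sketch of a different weighting, assuming the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;balanced&lt;/code&gt; strategy accepts an optional &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;balancedWeight&lt;/code&gt; input between 0 and 1, as the tool’s README appears to describe, with values closer to 0 favoring speed and values closer to 1 favoring cost.  The parameter name, its range, and the 0.3 value are assumptions and were not part of the runs described in this post, so verify them against the README before relying on them.&lt;/p&gt;

&lt;div class=&quot;language-json highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;{
  &quot;lambdaARN&quot;: &quot;arn:aws:lambda:us-west-2:123456789012:function:TestLambdaFunctionName&quot;,
  &quot;powerValues&quot;: [512, 1024, 1536, 2048, 3008],
  &quot;num&quot;: 15,
  &quot;parallelInvocation&quot;: false,
  &quot;strategy&quot;: &quot;balanced&quot;,
  &quot;balancedWeight&quot;: 0.3
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If that reading is correct, a weight of 0.3 would lean the recommendation toward lower latency while still penalizing the most expensive memory settings.  The function payload is omitted here for brevity.&lt;/p&gt;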

&lt;h2 id=&quot;analyzing-results&quot;&gt;Analyzing results&lt;/h2&gt;

&lt;p&gt;Once the execution completes, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Execution input and output&lt;/code&gt; tab will display the recommended amount of memory as the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;power&lt;/code&gt; value, the resulting average latency in milliseconds and cost per execution, and the URL to a more detailed visualization.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/20240714_AWSLambdaPowerTuning_ExecutionOutput.png&quot; alt=&quot;alt text&quot; title=&quot;AWS Lambda Power Tuning execution output&quot; /&gt;&lt;/p&gt;

&lt;p&gt;By navigating to that URL, we can view a graph of average latency and execution costs for each amount of memory measured, along with summarized best and worst memories for latency and cost.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/20240714_AWSLambdaPowerTuningResults.png&quot; alt=&quot;alt text&quot; title=&quot;AWS Lambda Power Tuning results visualization&quot; /&gt;&lt;/p&gt;

&lt;p&gt;In this case, for my Lambda function, which is written in Java and queries a DynamoDB table, 2048 MB of memory resulted in the lowest average latency, while 1024 MB of memory had the lowest runtime costs.&lt;/p&gt;

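&lt;p&gt;We can see that 512 MB actually costs more than 1024 MB.  Lambda bills duration in GB-seconds (allocated memory multiplied by execution time), so halving the memory only saves money if the duration less than doubles.  Here the duration at 512 MB was several times higher, which resulted in higher GB-second charges.&lt;/p&gt;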

&lt;p&gt;This was only run for 15 iterations per memory allocation, so I increased the sample size and reran against 1024, 1536, and 2048 MB by setting powerValues and num to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&quot;powerValues&quot;: [1024, 1536, 2048], &quot;num&quot;: 50&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;I executed the Lambda function a couple of times first with a test payload to eliminate cold starts as a confounding factor, and then ran the state machine with the new config, which resulted in the following output and visualization:&lt;/p&gt;

&lt;div class=&quot;language-json highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
  &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;power&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1536&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
  &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;cost&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;3.2760000000000005e-7&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
  &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;duration&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;12.266666666666667&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
  &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;stateMachine&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;executionCost&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.00023&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;lambdaCost&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.00012891480000000002&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;visualization&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;https://lambda-power-tuning.show/#AAQABgAI;3t2NQUREREFERExB;ilmiNADhrzRWgeo0&quot;&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
  &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The more detailed visualization indicates that for our particular use case, we’re unlikely to see significant performance improvements from increasing memory above 1536 MB, and the marginal cost increase from 1024 MB to 1536 MB is acceptable for us.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/20240714_AWSLambdaPowerTuningResultsRun2.png&quot; alt=&quot;alt text&quot; title=&quot;AWS Lambda Power Tuning results visualization for second run&quot; /&gt;&lt;/p&gt;

&lt;p&gt;You can see a more detailed table view of the underlying data by going to the step function execution’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Detail&lt;/code&gt; tab, selecting the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Table&lt;/code&gt; view, selecting the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Analyzer&lt;/code&gt; task, and selecting the Analyzer panel’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Output&lt;/code&gt; tab.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/20240714_AWSLambdaPowerTuningResultsAnalyzerDetails.png&quot; alt=&quot;alt text&quot; title=&quot;AWS Lambda Power Tuning results table view for second run&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;tear-down&quot;&gt;Tear-down&lt;/h2&gt;

&lt;p&gt;When you no longer need the tool, you can open the AWS CloudFormation console and delete the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;serverlessrepo-aws-lambda-power-tuning&lt;/code&gt; CloudFormation stack.&lt;/p&gt;

&lt;h2 id=&quot;outcome&quot;&gt;Outcome&lt;/h2&gt;

&lt;p&gt;The tool took under 10 minutes to deploy, execute, and fine-tune, and resulted in me changing my test Lambda’s memory allocation from 512 MB to 1536 MB.&lt;/p&gt;

&lt;p&gt;This lowered my API’s average latency from 50ms to 12ms, a 4.17x improvement, AKA 76% latency reduction.  Duration costs increased by 8% to $0.3276/million executions, which is minimal for my service’s scale.&lt;/p&gt;

&lt;p&gt;Given the latency improvements of choosing the right amount of memory, and how easy this tool is to use, I’d recommend it to anyone building services on AWS Lambda.&lt;/p&gt;

&lt;h2 id=&quot;references&quot;&gt;References&lt;/h2&gt;

&lt;p&gt;AWS Lambda docs introducing AWS Lambda Power Tuning: &lt;a href=&quot;https://docs.aws.amazon.com/lambda/latest/operatorguide/profile-functions.html&quot;&gt;https://docs.aws.amazon.com/lambda/latest/operatorguide/profile-functions.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;AWS Lambda Power Tuning GitHub repository with usage details: &lt;a href=&quot;https://github.com/alexcasalboni/aws-lambda-power-tuning&quot;&gt;https://github.com/alexcasalboni/aws-lambda-power-tuning&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;AWS Lambda pricing: &lt;a href=&quot;https://aws.amazon.com/lambda/pricing/&quot;&gt;https://aws.amazon.com/lambda/pricing/&lt;/a&gt;&lt;/p&gt;
</description>
        <pubDate>Sun, 14 Jul 2024 00:00:00 +0000</pubDate>
        <link>https://www.marksayson.com/blog/lambda-power-tuning/</link>
        <guid isPermaLink="true">https://www.marksayson.com/blog/lambda-power-tuning/</guid>
        
        
        <category>aws</category>
        
      </item>
    
      <item>
        <title>Serializing and deserializing DynamoDB pagination tokens to support paginated APIs</title>
        <description>&lt;p&gt;When using AWS’s Java 2.x SDK, DynamoDB scan and query responses provide pagination tokens in a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Map&amp;lt;String, AttributeValue&amp;gt; lastEvaluatedKey&lt;/code&gt; object, which represents the primary key of the last processed DynamoDB item.  You can then pass this value as the “exclusive start key” for the next query to get the next page of results.&lt;/p&gt;

&lt;p&gt;When your service retrieves all pages of results locally, this isn’t a problem.  However, when you want to provide a paginated API backed by DynamoDB, you’ll need to convert this attribute value map into a format that can be passed over HTTP, AKA “serialize” the object into a string.&lt;/p&gt;

&lt;p&gt;When your client requests the next page of results with that string pagination token, you’ll also need to convert that string back into the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Map&amp;lt;String, AttributeValue&amp;gt;&lt;/code&gt; format that the AWS SDK expects, AKA “deserialize” the string to the original data structure.&lt;/p&gt;

&lt;h2 id=&quot;prior-method-for-serializingdeserializing-pagination-tokens&quot;&gt;Prior method for serializing/deserializing pagination tokens&lt;/h2&gt;

&lt;p&gt;Before May 2023, building paginated APIs backed by DynamoDB was not very convenient, as you’d have to build your own custom serialization and deserialization code.&lt;/p&gt;

&lt;p&gt;Example implementation using Immutables and Jackson, with a sample DynamoDB table primary key that has both a partition key and a sort key:&lt;/p&gt;

&lt;div class=&quot;language-java highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;com.fasterxml.jackson.core.JsonParseException&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;com.fasterxml.jackson.core.JsonProcessingException&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;com.fasterxml.jackson.databind.ObjectMapper&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;com.fasterxml.jackson.databind.annotation.JsonDeserialize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;com.fasterxml.jackson.databind.annotation.JsonSerialize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;org.immutables.value.Value.Immutable&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;org.immutables.value.Value.Parameter&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;org.immutables.value.Value.Style&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;software.amazon.awssdk.services.dynamodb.model.AttributeValue&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;java.io.IOException&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;java.util.Base64&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;java.util.Map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;cm&quot;&gt;/**
 * Serializable representation of a Product DynamoDB pagination token.
 * Using Immutables to generate safe, immutable value objects.
 * @see https://immutables.github.io/
 */&lt;/span&gt;
&lt;span class=&quot;nd&quot;&gt;@JsonDeserialize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;ProductNextTokenBuilder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;class&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;nd&quot;&gt;@JsonSerialize&lt;/span&gt;
&lt;span class=&quot;nd&quot;&gt;@Style&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;visibility&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Style&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;ImplementationVisibility&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;PRIVATE&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;nd&quot;&gt;@Immutable&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;interface&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;ProductNextToken&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;nd&quot;&gt;@Parameter&lt;/span&gt;
    &lt;span class=&quot;nc&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;getPartitionKey&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
    &lt;span class=&quot;nd&quot;&gt;@Parameter&lt;/span&gt;
    &lt;span class=&quot;nc&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;getSortKey&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;cm&quot;&gt;/**
 * Class encapsulating logic to convert DynamoDB pagination tokens between attribute value
 * maps used by the AWS SDK, and string values that can be passed over HTTP.
 */&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;ProductSerializer&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;final&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;PRODUCT_TABLE_PARTITION_KEY&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;YourDynamoDBTablePartitionKeyName&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;final&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;PRODUCT_TABLE_SORT_KEY&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;YourDynamoDBTableSortKeyName&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;final&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;ObjectMapper&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;objectMapper&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;nc&quot;&gt;ProductSerializer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kd&quot;&gt;final&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;ObjectMapper&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;objectMapper&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;objectMapper&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;objectMapper&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;

    &lt;span class=&quot;cm&quot;&gt;/**
     * Serialize a lastEvaluatedKey from an attribute value map to a string.
     *
     * @param lastEvaluatedKey attribute map returned by paginated DynamoDB queries.
     * @return serialized String token that can be passed over HTTP.
     * @throws JsonProcessingException exception thrown if unable to serialize the token to JSON.
     */&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;serializeLastEvaluatedKey&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kd&quot;&gt;final&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;AttributeValue&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lastEvaluatedKey&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;JsonProcessingException&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lastEvaluatedKey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;null&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;null&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;

        &lt;span class=&quot;kd&quot;&gt;final&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;ProductNextToken&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tokenObject&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;ProductNextTokenBuilder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;
            &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;partitionKey&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lastEvaluatedKey&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;PRODUCT_TABLE_PARTITION_KEY&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt;
            &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;sortKey&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lastEvaluatedKey&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;PRODUCT_TABLE_SORT_KEY&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt;
            &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;build&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;

        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Base64&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getUrlEncoder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;encodeToString&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;objectMapper&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;writeValueAsBytes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tokenObject&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;

    &lt;span class=&quot;cm&quot;&gt;/**
     * Deserialize a lastEvaluatedKey from a string to an attribute value map.
     *
     * @param encodedLastEvaluatedKey Base64-encoded string token received over HTTP.
     * @return attribute value map to pass as the exclusive start key of the next DynamoDB query.
     * @throws IOException exception thrown if unable to decode encodedLastEvaluatedKey.
     * @throws JsonParseException exception thrown if unable to deserialize the decoded key into a ProductNextToken.
     */&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;AttributeValue&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;deserializeLastEvaluatedKey&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kd&quot;&gt;final&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;encodedLastEvaluatedKey&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;IOException&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;JsonParseException&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;encodedLastEvaluatedKey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;null&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;null&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;

        &lt;span class=&quot;kd&quot;&gt;final&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;ProductNextToken&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;deserializedToken&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;objectMapper&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;readValue&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
            &lt;span class=&quot;nc&quot;&gt;Base64&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getUrlDecoder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;decode&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;encodedLastEvaluatedKey&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;),&lt;/span&gt;
            &lt;span class=&quot;nc&quot;&gt;ProductNextToken&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;class&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;

        &lt;span class=&quot;kd&quot;&gt;final&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;AttributeValue&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;partitionKeyValue&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;AttributeValue&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;builder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;
            &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;deserializedToken&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getPartitionKey&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt;
            &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;build&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;

        &lt;span class=&quot;kd&quot;&gt;final&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;AttributeValue&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sortKeyValue&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;AttributeValue&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;builder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;
            &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;deserializedToken&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getSortKey&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt;
            &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;build&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;

        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;of&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
            &lt;span class=&quot;no&quot;&gt;PRODUCT_TABLE_PARTITION_KEY&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;partitionKeyValue&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;no&quot;&gt;PRODUCT_TABLE_SORT_KEY&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sortKeyValue&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This is a lot of code to maintain and test, with multiple exception cases.  We can remove the dependency on specific key structure by generalizing the code to iterate over the map and JSON key-value pairs, as shown in &lt;a href=&quot;https://github.com/aws/aws-sdk-java-v2/issues/3224&quot;&gt;https://github.com/aws/aws-sdk-java-v2/issues/3224&lt;/a&gt;, but this is still more complex than should be necessary for what we’d prefer to be simple “stringify” and “unstringify” methods.&lt;/p&gt;

&lt;h2 id=&quot;serializationdeserialization-with-the-dynamodb-enhanced-document-library&quot;&gt;Serialization/deserialization with the DynamoDB Enhanced Document library&lt;/h2&gt;

&lt;p&gt;Since May 2023, AWS’s Java 2.x SDK includes an Enhanced Document library that simplifies converting pagination tokens between the AWS SDK’s objects and JSON strings that can be passed over HTTP.&lt;/p&gt;

&lt;p&gt;The &lt;a href=&quot;https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/enhanced/dynamodb/document/EnhancedDocument.html&quot;&gt;software.amazon.awssdk.enhanced.dynamodb.document.EnhancedDocument&lt;/a&gt; class includes utility methods that make serialization and deserialization one-liners.&lt;/p&gt;

&lt;p&gt;AWS blog post demonstrating use cases: &lt;a href=&quot;https://aws.amazon.com/blogs/devops/introducing-the-enhanced-document-api-for-dynamodb-in-the-aws-sdk-for-java-2-x/&quot;&gt;https://aws.amazon.com/blogs/devops/introducing-the-enhanced-document-api-for-dynamodb-in-the-aws-sdk-for-java-2-x/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Sample code for converting between &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Map&amp;lt;String, AttributeValue&amp;gt;&lt;/code&gt; pagination tokens and JSON strings:&lt;/p&gt;

&lt;div class=&quot;language-java highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;software.amazon.awssdk.enhanced.dynamodb.document.EnhancedDocument&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;software.amazon.awssdk.services.dynamodb.model.AttributeValue&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;

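&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;java.io.UncheckedIOException&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;java.util.Map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;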

&lt;span class=&quot;cm&quot;&gt;/**
 * Class encapsulating logic to convert DynamoDB pagination tokens between attribute value
 * maps used by the AWS SDK, and string values that can be passed over HTTP.
 */&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;ProductSerializer&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;cm&quot;&gt;/**
      * Convert a DynamoDB attribute value map to a JSON string.
      * @param attributeValueMap DynamoDB item key represented as a map from attribute names to attribute values
      * @return String JSON string representation of the DynamoDB item key
      */&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;serializeLastEvaluatedKey&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kd&quot;&gt;final&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;AttributeValue&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;attributeValueMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;EnhancedDocument&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;fromAttributeValueMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;attributeValueMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;toJson&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;

    &lt;span class=&quot;cm&quot;&gt;/**
      * Convert a JSON string representation of a DynamoDB pagination token to the format required by DynamoDB API calls.
      * @param paginationTokenJson JSON string representing the last paginated API call&apos;s last evaluated record key
      * @return Map&amp;lt;String, AttributeValue&amp;gt; exclusive start key for the next paginated DynamoDB scan/query API call
      * @throws UncheckedIOException exception thrown if the pagination token cannot be parsed
      */&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;AttributeValue&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;deserializeLastEvaluatedKey&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kd&quot;&gt;final&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;paginationTokenJson&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;UncheckedIOException&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;EnhancedDocument&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;fromJson&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;paginationTokenJson&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;toMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This is much more manageable, with serialization and deserialization functionality now provided out-of-the-box as part of the standard AWS SDK.&lt;/p&gt;

&lt;p&gt;We can pass this serialized JSON string to our API clients as the client-facing pagination token.  Optionally, if it’s important to us to obfuscate our internal DynamoDB key structure from clients, we can add back a Base64 encode/decode layer on top of the JSON strings using the same code snippets from the earlier example.&lt;/p&gt;
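
&lt;p&gt;As a minimal sketch of that optional Base64 layer, the helper below wraps a JSON string in URL-safe Base64 using only the JDK.  The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PaginationTokenCodec&lt;/code&gt; class and its method names are hypothetical and chosen for illustration; it assumes the JSON string comes from the serializer above.&lt;/p&gt;

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper that wraps a JSON pagination token in URL-safe Base64. */
final class PaginationTokenCodec {
    private PaginationTokenCodec() {}

    /** Encode the JSON token so that clients receive an opaque, URL-safe string. */
    static String encode(final String paginationTokenJson) {
        return Base64.getUrlEncoder()
            .encodeToString(paginationTokenJson.getBytes(StandardCharsets.UTF_8));
    }

    /**
     * Decode a client-supplied token back into the JSON string.
     * Base64.Decoder throws IllegalArgumentException if the token is not valid Base64.
     */
    static String decode(final String opaqueToken) {
        return new String(Base64.getUrlDecoder().decode(opaqueToken), StandardCharsets.UTF_8);
    }
}
```

&lt;p&gt;Note that Base64 only obscures the key structure and is not encryption, so a decoded token should still be treated as untrusted client input.&lt;/p&gt;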
</description>
        <pubDate>Sat, 18 May 2024 17:00:00 +0000</pubDate>
        <link>https://www.marksayson.com/blog/serializing-deserializing-dynamodb-pagination-tokens/</link>
        <guid isPermaLink="true">https://www.marksayson.com/blog/serializing-deserializing-dynamodb-pagination-tokens/</guid>
        
        
        <category>aws</category>
        
      </item>
    
      <item>
        <title>Concurrency from single host applications up to massively distributed services</title>
        <description>&lt;p&gt;Concurrency is when multiple software threads or programs are run at the same time, and is a key aspect of many modern applications.&lt;/p&gt;

&lt;p&gt;Web browsers run dozens of concurrent processes based on your activity, querying servers, downloading files, and executing scripts all at once.&lt;/p&gt;

&lt;p&gt;Online services with millions of active users run a scaled up number of concurrent processes across thousands of servers, with various distributed system design patterns to support this.&lt;/p&gt;

&lt;p&gt;This post will describe several levels of concurrency, how they’re commonly applied, and pros and cons of each approach.&lt;/p&gt;

&lt;h2 id=&quot;levels-of-concurrency&quot;&gt;Levels of concurrency&lt;/h2&gt;
&lt;h3 id=&quot;multi-threaded-applications&quot;&gt;Multi-threaded applications&lt;/h3&gt;
&lt;p&gt;Within application code, we can run multiple threads concurrently.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/20231130_DistributedComputeToHandleMillionTps-MultiThreadedApp.png&quot; alt=&quot;alt text&quot; title=&quot;Diagram of a multi-threaded application&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This approach can be locally applied regardless of whether an application is run on a single host or in a distributed service.  However, multi-threaded code increases code complexity and introduces thread safety issues, and an error in one thread may take down the entire application.&lt;/p&gt;

&lt;p&gt;A common use case for multi-threading is when we need to make multiple requests to other services that may each take multiple seconds to complete.  We can trigger each request in a separate thread to run them concurrently, and collect the results at the end of the longest running call, rather than synchronously making one request at a time after the prior response has returned.&lt;/p&gt;
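
&lt;p&gt;The pattern described above can be sketched with the JDK’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CompletableFuture&lt;/code&gt;.  This is an illustrative sketch rather than production code: the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fetch&lt;/code&gt; method is a hypothetical stand-in that sleeps instead of calling a real service, and the service names and delays are made up.&lt;/p&gt;

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

final class ConcurrentRequests {
    private ConcurrentRequests() {}

    /** Hypothetical stand-in for a slow remote call: sleeps, then returns a label. */
    static String fetch(final String name, final long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for " + name, e);
        }
        return name + ":done";
    }

    /** Start every request on its own thread, then wait for all of the results. */
    static String fetchAll() {
        // One thread per request, so that no request waits for another to finish.
        final ExecutorService executor = Executors.newFixedThreadPool(3);
        try {
            final var futures = List.of(
                CompletableFuture.supplyAsync(() -> fetch("orders", 300), executor),
                CompletableFuture.supplyAsync(() -> fetch("inventory", 500), executor),
                CompletableFuture.supplyAsync(() -> fetch("pricing", 100), executor));

            // join() blocks until each future completes; results keep request order.
            return futures.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.joining(","));
        } finally {
            executor.shutdown();
        }
    }
}
```

&lt;p&gt;Because all three calls are in flight at once, the total wait is governed by the slowest call rather than by the sum of all three.&lt;/p&gt;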

&lt;h4 id=&quot;latency-trade-offs&quot;&gt;Latency trade-offs&lt;/h4&gt;

&lt;p&gt;Example 1: Suppose we run 4 requests that take 2 seconds, 5 seconds, 5 seconds, and 1 second respectively to complete, and each thread adds 20 milliseconds of overhead to start and close.  We’ll exclude the time to combine results, since it is the same for the multi-threaded and synchronous approaches.  Our runtime with multi-threading will be &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;max(2, 5, 5, 1) + 0.02*4&lt;/code&gt; = 5.08 seconds, compared to the synchronous approach taking &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;2 + 5 + 5 + 1&lt;/code&gt; = 13 seconds to make all requests.  In this scenario, multi-threading reduces our latency by 7.92 seconds.&lt;/p&gt;

&lt;p&gt;Example 2: Splitting tasks into threads does not come for free and may not be worthwhile for very short-lived requests.  For example, if we have 1000 requests that each take 0.01 seconds to complete, running each request in a separate thread would take &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;max(0.01) + 0.02*1000&lt;/code&gt; = 20.01 seconds, compared to the synchronous approach taking &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1000*0.01&lt;/code&gt; = 10 seconds.  In this case, the synchronous approach is about twice as fast as multi-threading.&lt;/p&gt;

&lt;p&gt;Since creating one thread per request is this costly, in practice we’ll typically split the workflow into batches of requests per thread, such as 200 requests per thread.&lt;/p&gt;

&lt;p&gt;Example 2b: Given 1000 requests that each take 0.01 seconds to complete, if we split the work into 5 batches of 200 requests, with one thread per batch, computing all the results would take &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;max(200*0.01, 200*0.01, 200*0.01, 200*0.01, 200*0.01) + 0.02*5&lt;/code&gt; = 2.1 seconds, compared to the synchronous approach taking &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1000*0.01&lt;/code&gt; = 10 seconds.  By batching the work before applying multi-threading, we can reduce latency compared to synchronous calls by 7.9 seconds.&lt;/p&gt;
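
&lt;p&gt;The arithmetic in these examples can be captured in a small model.  The sketch below only encodes this post’s simplified assumptions (a fixed 20 millisecond overhead per thread, and a multi-threaded runtime equal to the longest task plus the total thread overhead); the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LatencyModel&lt;/code&gt; class and its method names are hypothetical.&lt;/p&gt;

```java
/** Hypothetical model of the simplified latency arithmetic used in the examples. */
final class LatencyModel {
    /** Assumed cost to start and close one thread, in seconds. */
    static final double THREAD_OVERHEAD_SECONDS = 0.02;

    private LatencyModel() {}

    /** Synchronous runtime: each task starts after the previous one finishes. */
    static double synchronousSeconds(final double... taskSeconds) {
        double total = 0;
        for (final double seconds : taskSeconds) {
            total += seconds;
        }
        return total;
    }

    /** Multi-threaded runtime: the longest task plus overhead for one thread per task. */
    static double multiThreadedSeconds(final double... taskSeconds) {
        double longest = 0;
        for (final double seconds : taskSeconds) {
            longest = Math.max(longest, seconds);
        }
        return longest + THREAD_OVERHEAD_SECONDS * taskSeconds.length;
    }
}
```

&lt;p&gt;Up to floating-point rounding, calling &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;multiThreadedSeconds(2, 5, 5, 1)&lt;/code&gt; reproduces the 5.08 seconds from Example 1, and passing five 2-second batches reproduces the 2.1 seconds from Example 2b.&lt;/p&gt;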

&lt;p&gt;Multi-threading provides the most latency reduction when we’re able to run multiple long-running tasks in parallel, especially multi-second tasks, whether each task is a single long-running request or a series of requests adding up to seconds.&lt;/p&gt;

&lt;h3 id=&quot;multi-container-hosts&quot;&gt;Multi-container hosts&lt;/h3&gt;
&lt;p&gt;Within a host, we can run multiple containers which each receive allocated memory and run an isolated instance of application code.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/20231130_DistributedComputeToHandleMillionTps-MultiContainerHost.png&quot; alt=&quot;alt text&quot; title=&quot;Diagram of a multi-container host&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This allows us to fully utilize a host’s CPU and memory.  Eventually, though, we will reach a point where the host no longer has sufficient CPU or memory capacity to add more containers, or where performance begins to drop due to increased context switching and IO bottlenecks.&lt;/p&gt;

&lt;p&gt;Isolating concurrent applications in separate containers also improves system reliability, since the other containers can continue running when an individual application fails.  However, we are still vulnerable to host-level failures.&lt;/p&gt;

&lt;p&gt;A single host can be sufficient for some small-scale services that only have a few hundred concurrent requests and can tolerate being taken offline periodically for maintenance.  For services that need to provide 24/7 availability or handle more traffic, we will graduate to distributed services where this host will be a single unit of a larger architecture, leading us to the multi-host cluster.&lt;/p&gt;

&lt;h3 id=&quot;multi-host-clusters-behind-a-load-balancer&quot;&gt;Multi-host clusters behind a load balancer&lt;/h3&gt;
&lt;p&gt;When we require high availability or more concurrency than a single host can support, we can set up a load balancer that distributes traffic across multiple hosts, forming a cluster of hosts.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/20231130_DistributedComputeToHandleMillionTps-MultiHostCluster.png&quot; alt=&quot;alt text&quot; title=&quot;Diagram of a multi-host cluster&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This allows us to horizontally scale, that is, add or remove servers to our resource pool as needed.  Horizontal scaling makes our service more robust to individual host failures and enables more flexibility in our infrastructure, allowing us to swap out different types of hosts at will, patch or update individual hosts without affecting service availability, and pay for just as many hosts as are needed to meet current demand.&lt;/p&gt;
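
&lt;p&gt;To make the idea concrete, here is a toy sketch of a load balancer that rotates requests across a pool of hosts that can grow or shrink.  Round-robin is only one possible distribution strategy, and the class and host names are hypothetical; real load balancers also handle health checks, connection draining, and much more.&lt;/p&gt;

```python
class RoundRobinBalancer:
    """Toy load balancer that cycles through the hosts currently in the pool."""

    def __init__(self, hosts):
        self.hosts = list(hosts)
        self.next_index = 0

    def add_host(self, host):
        # Scale out: new hosts start receiving traffic on later picks.
        self.hosts.append(host)

    def remove_host(self, host):
        # Scale in, or take a host out of rotation for patching.
        self.hosts.remove(host)

    def pick_host(self):
        if not self.hosts:
            raise RuntimeError("no hosts available")
        host = self.hosts[self.next_index % len(self.hosts)]
        self.next_index += 1
        return host


if __name__ == "__main__":
    balancer = RoundRobinBalancer(["host-a", "host-b", "host-c"])
    print([balancer.pick_host() for _ in range(4)])
    balancer.remove_host("host-b")
    print([balancer.pick_host() for _ in range(2)])
```

&lt;p&gt;Removing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;host-b&lt;/code&gt; takes it out of rotation while the remaining hosts keep serving traffic, which is the property that lets us patch hosts without downtime.&lt;/p&gt;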

&lt;p&gt;This is often the go-to design pattern for services that need to process thousands of concurrent requests, which a single host may no longer be able to handle.&lt;/p&gt;

&lt;h3 id=&quot;multi-cluster-services&quot;&gt;Multi-cluster services&lt;/h3&gt;
&lt;p&gt;When we have more traffic than a single load balancer can handle, we can set up a DNS load balancer to distribute traffic across multiple clusters.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/20231130_DistributedComputeToHandleMillionTps-MultiClusterService.png&quot; alt=&quot;alt text&quot; title=&quot;Diagram of a multi-cluster service&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This is rarely the starting point for a new service.  We only want to add this level of complexity when absolutely necessary, such as when scaling up to millions of concurrent requests, or after hitting infrastructure restrictions on load balancer concurrent connections or maximum attached endpoints.&lt;/p&gt;

&lt;p&gt;Many cloud providers offer distributed DNS load balancers that remove the single point of failure of a traditional load balancer, scale to millions of concurrent users, and automatically route traffic to the closest regional cluster.&lt;/p&gt;

&lt;h4 id=&quot;dns-load-balancer-trade-offs&quot;&gt;DNS load balancer trade-offs&lt;/h4&gt;

&lt;p&gt;DNS load balancers are more limited in functionality than many specialized load balancers.  For example, AWS network load balancers can support more granular access controls and security configurations, and integrate with compute services to automatically replace unhealthy hosts that fail to respond to the load balancer.&lt;/p&gt;

&lt;p&gt;DNS also requires its connected endpoints to be accessible to the Internet, which is not always ideal.  Following the security principle of defence-in-depth, when protecting critical data or infrastructure, anything that doesn’t need to be connected to the Internet shouldn’t be.  Network load balancers can be set up in protected virtual private networks to only allow access from allow-listed hosts or other trusted networks.&lt;/p&gt;

&lt;p&gt;For these reasons, in some scenarios it will make sense to have the added complexity of both a frontend DNS load balancer to distribute traffic to the closest cluster, and backend application load balancers that provide more functionality and integration with your local infrastructure.&lt;/p&gt;

&lt;p&gt;If you don’t need any functionality that isn’t supported by a DNS load balancer, can live with your servers being accessible from the Internet, and already manage your own health monitoring and host replacement strategy, then you can simplify your architecture by having a DNS load balancer directly route traffic to your backend servers.&lt;/p&gt;

&lt;h2 id=&quot;summary&quot;&gt;Summary&lt;/h2&gt;
&lt;p&gt;We’ve discussed how concurrency can be applied at multiple levels:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Multi-threaded applications that run multiple tasks in parallel, such as querying several websites simultaneously&lt;/li&gt;
  &lt;li&gt;Multi-container or multi-process hosts that run multiple applications in isolation from one another, so that a given application can continue running if others fail&lt;/li&gt;
  &lt;li&gt;Multi-host clusters that enable horizontally scaling a service to process hundreds of thousands of concurrent requests&lt;/li&gt;
  &lt;li&gt;Multi-cluster services that enable routing traffic to local load-balanced clusters that can be independently scaled, to process millions of concurrent requests&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Many distributed services now start with multi-host clusters for reliability and scalability reasons, so that any given host can be replaced without impacting customer service, and additional hosts can be added as needed.&lt;/p&gt;

&lt;p&gt;A single load balancer and backend compute cluster can often handle hundreds of thousands of concurrent requests or more, but the load balancer may become a single point of failure for your service.  Distributed DNS load balancers can help to mitigate this concern when it’s acceptable for your servers to be accessible from the Internet.&lt;/p&gt;

&lt;p&gt;For applications where you need to handle millions of concurrent requests and have business requirements not met by a single DNS load balancer, such as needing granular access control for your backend servers or integrations with other infrastructure, a DNS load balancer in front of multiple load-balanced clusters can meet these demands with the trade-off of an additional layer of complexity.&lt;/p&gt;

&lt;h2 id=&quot;addendum&quot;&gt;Addendum&lt;/h2&gt;

&lt;p&gt;Before scaling your service to process millions of concurrent requests and paying hundreds of thousands of dollars to do so, make sure this is really necessary.&lt;/p&gt;

&lt;p&gt;Would it be more efficient to extract some of your use cases to a separate microservice?&lt;/p&gt;

&lt;p&gt;Are your hosts really doing unique work on every call?  Could some of that work be deduplicated, or could the right application of a caching layer reduce your traffic and/or average latency by orders of magnitude?&lt;/p&gt;

&lt;p&gt;Also, note that millions of concurrent users do not always translate into millions of transactions per second.  If each user only makes a server request every few seconds, and spends the time in between interacting locally with rendered results, you may only have tens to hundreds of thousands of transactions per second, which, while still high, lowers the required complexity of the system.&lt;/p&gt;
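
&lt;p&gt;A back-of-the-envelope version of this estimate, assuming requests are spread evenly over time (real traffic is burstier, so treat the result as a lower bound for peak load).  The function name and the sample numbers are illustrative only.&lt;/p&gt;

```python
def estimated_tps(concurrent_users, seconds_between_requests):
    """Average transactions per second if each user sends one request per interval."""
    return concurrent_users / seconds_between_requests


if __name__ == "__main__":
    # One million users who each send a request every 5 seconds.
    print(estimated_tps(1_000_000, 5))
    # The same users sending a request every second.
    print(estimated_tps(1_000_000, 1))
```

&lt;p&gt;Under these assumptions, one million users making a request every 5 seconds average 200,000 transactions per second rather than a million.&lt;/p&gt;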

&lt;p&gt;Software architecture design is an iterative process, and the optimal design will change along with the business, so it’s often worth starting with the simplest approach that meets current needs and can be scaled up or down as needed based on customer traffic.  There’s no prize for building the most expensive service that no one uses.&lt;/p&gt;
</description>
        <pubDate>Fri, 01 Dec 2023 03:00:00 +0000</pubDate>
        <link>https://www.marksayson.com/blog/concurrency-from-app-to-massively-distributed-service/</link>
        <guid isPermaLink="true">https://www.marksayson.com/blog/concurrency-from-app-to-massively-distributed-service/</guid>
        
        
        <category>distributed-systems</category>
        
        <category>system-design</category>
        
      </item>
    
  </channel>
</rss>
