Posts

  • Strategies for querying periodic S3 data snapshots

    A common AWS analytics use case is running aggregate queries across multiple data sets stored in S3. For example, one partner team may store product metadata while another stores purchase order metadata, and we may want to join the two data sets to determine which products are most popular in each marketplace.

    In this post I'll cover a few options for syncing data to S3 and retrieving data snapshots for use in aggregate queries. I'll focus on the case where queries must always see a complete, recent snapshot, with no partial unavailability while a sync is in progress.

    Continue reading...

  • Granting AWS Organization member accounts access to Cost Explorer

    By default, adding accounts to an AWS Organization results in consolidated billing and cost management in the Organization management account, and Organization member accounts lose access to Cost Explorer, Billing, and other cost management services.

    This post walks through how to allow member accounts to access cost management services, to enable each team to review and manage their AWS spending.

    Continue reading...

  • Using AWS Organizations to standardize security controls across AWS accounts

    AWS Organizations provides a helpful, zero-cost way to centralize management of AWS accounts, with support for consolidated billing, role-based permission sets, service and resource control policies, and AWS service configurations.

    Continue reading...

  • Reducing Lambda latency by 76% with AWS Lambda Power Tuning

    Tuning a Lambda function's memory setting can cut customer-facing latency by a factor of 2 to 5 without significantly increasing costs, but finding the right value can take some trial and error.

    The AWS Lambda Power Tuning tool can be used to determine the optimal memory capacity for any Lambda function within minutes, and can be set up in a few button clicks.

    Continue reading...

  • Serializing and deserializing DynamoDB pagination tokens to support paginated APIs

    When using the AWS SDK for Java 2.x, DynamoDB scan and query responses return the pagination token as a Map<String, AttributeValue> that holds the primary key of the last evaluated item. You can then pass this map as the "exclusive start key" of the next request to get the next page of results.

    When your service retrieves all pages of results locally, this isn't a problem. However, when you want to provide a paginated API backed by DynamoDB, you'll need to convert this attribute value map into a format that can be passed over HTTP, that is, "serialize" the object into a string.
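
    As a rough illustration of the idea, here is a minimal sketch in Python rather than the Java SDK the post discusses. It assumes the key map is available as a plain dictionary in DynamoDB's typed JSON form and contains only string and number attributes. The function names and the sample key are hypothetical, and the token is neither signed nor encrypted.

```python
import base64
import json


def encode_page_token(last_evaluated_key):
    """Serialize a DynamoDB key map into a string token for an HTTP response."""
    # sort_keys makes the token deterministic for a given key map.
    raw = json.dumps(last_evaluated_key, sort_keys=True, separators=(",", ":"))
    return base64.urlsafe_b64encode(raw.encode("utf-8")).decode("ascii")


def decode_page_token(token):
    """Turn a token back into the key map to pass as the exclusive start key."""
    raw = base64.urlsafe_b64decode(token.encode("ascii"))
    return json.loads(raw.decode("utf-8"))


# Hypothetical key for a table with partition key "pk" and sort key "sk",
# written in DynamoDB's typed JSON form ("S" = string, "N" = number).
last_key = {"pk": {"S": "customer#42"}, "sk": {"N": "1700000000"}}
token = encode_page_token(last_key)
restored = decode_page_token(token)
```

    The API would return the token alongside a page of results, and the client would send it back unchanged to request the next page. Because clients can decode or alter an unsigned token, a real service may want to sign or encrypt it.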

    Continue reading...

  • Concurrency from single host applications up to massively distributed services

    This post will describe several levels of concurrency, how they're commonly applied, and pros and cons of each approach.

    Many distributed services now start with multi-host clusters for reliability and scalability reasons, so that any given host can be replaced without impacting customer service, and additional hosts can be added as needed.

    Continue reading...

  • Process for designing distributed systems

    In this post I'll step through my process for designing distributed systems, with example questions and artifacts associated with each step.

    We will take the following steps to design a new service:

    1. Validate whether this service needs to exist

    2. Clarify business requirements

    3. Estimate scale

    4. Define system interfaces and data models

    5. Define data flow and storage

    6. Define high-level system components

    7. Design individual components

    The artifacts of each step can be validated with stakeholders to ensure we're on the right track before continuing. Together they build up a design document that can be referred to both while building the service and afterwards, to understand its inner workings.

    Continue reading...

  • Choosing between AWS compute services

    When building a new service in AWS, it can be difficult to decide between all the available compute services. In this post I’ll give a brief overview of the main options and describe how I compare and choose between them for a given project.

    Continue reading...

