Posts
-
Strategies for querying periodic S3 data snapshots
A common AWS analytics use case is making aggregate queries across multiple data sets stored in S3. For example, one partner team may store product metadata, while another team stores purchase order metadata, and we may want to join these data sets to determine which products are most popular across each marketplace.
In this post I'll cover a few options for syncing data to S3 and retrieving data snapshots for use in aggregate queries, and specifically discuss the use case where we need to maintain access patterns to complete, recent data snapshots without partial unavailability during data syncs.
-
Granting AWS Organization member accounts access to Cost Explorer
By default, adding accounts to an AWS Organization results in consolidated billing and cost management in the Organization management account, and Organization member accounts lose access to Cost Explorer, Billing, and other cost management services.
This post walks through how to allow member accounts to access cost management services, to enable each team to review and manage their AWS spending.
-
Using AWS Organizations to standardize security controls across AWS accounts
AWS Organizations provide a helpful, zero-cost way to centralize management of AWS accounts, with support for consolidating billing, role-based permission sets, service and resource control policies, and AWS service configurations.
-
Reducing Lambda latency by 76% with AWS Lambda Power Tuning
Optimizing AWS Lambda memory capacity can decrease customer-facing latencies by up to 2-5 times without significantly increasing hardware costs, but can take some trial and error.
The AWS Lambda Power Tuning tool can be used to determine the optimal memory capacity for any Lambda function within minutes, and can be set up in a few button clicks.
-
Serializing and deserializing DynamoDB pagination tokens to support paginated APIs
When using AWS's Java 2.x SDK, DynamoDB scan and query responses provide pagination tokens in a String to AttributeValue Map object, which represents the primary key of the last processed DynamoDB item. You can then pass this value as the "exclusive start key" for the next query to get the next page of results.
When your service retrieves all pages of results locally, this isn't a problem. However, when you want to provide a paginated API backed by DynamoDB, you'll need to convert this attribute value map into a format that can be passed over HTTP, AKA "serialize" the object into a string.
-
Concurrency from single host applications up to massively distributed services
This post will describe several levels of concurrency, how they're commonly applied, and pros and cons of each approach.
Many distributed services now start with multi-host clusters for reliability and scalability reasons, so that any given host can be replaced without impacting customer service, and additional hosts can be added as needed.
-
Process for designing distributed systems
In this post I'll step through my process for designing distributed systems, with example questions and artifacts associated with each step.
We will take the following steps to design a new service:
1. Validate whether this service needs to exist
2. Clarify business requirements
3. Estimate scale
4. Define system interfaces and data models
5. Define data flow and storage
6. Define high-level system components
7. Design individual components
The artifacts of each step can be validated with stakeholders to ensure we're on the right track before continuing. They collectively add to a design document that can be referred to both while building the service, and afterwards to understand its inner workings.
-
Choosing between AWS compute services
When building a new service in AWS, it can be difficult to decide between all the available compute services. In this post I’ll give a brief overview of the main options and describe how I compare and choose between them for a given project.
subscribe via RSS