title: 10 Principles for Building Resilient Payment Systems
description: 10 principles for building resilient payment systems based on Shopify.
image: https://assets.bytebytego.com/diagrams/0336-shopify.png
createdAt: 2024-03-07
draft: false
categories: real-world-case-studies
tags: payment systems, resilience

Shopify has shared some valuable tips for building resilient payment systems.

Lower the timeouts, and let the service fail early

The default timeout of Ruby's Net::HTTP client is 60 seconds, which is too long when a customer is waiting on a payment. Based on Shopify's experience, a read timeout of 5 seconds and a write timeout of 1 second are decent starting points.
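As a rough illustration of failing early, here is a minimal Python sketch that applies separate connect, write, and read timeouts to a raw TCP exchange. The function name `read_with_timeouts` and its parameters are hypothetical, and Python sockets stand in for the Ruby client discussed above; the defaults simply mirror the values quoted in this guide.

```python
import socket


def read_with_timeouts(host, port, payload=b"ping",
                       connect_timeout=1.0, write_timeout=1.0, read_timeout=5.0):
    """Send a payload and read a reply, raising instead of hanging forever."""
    # The connect timeout bounds how long we wait to establish the connection.
    with socket.create_connection((host, port), timeout=connect_timeout) as sock:
        # A socket has one active timeout, so switch it before each phase.
        sock.settimeout(write_timeout)
        sock.sendall(payload)
        sock.settimeout(read_timeout)
        # Raises socket.timeout if no data arrives within read_timeout seconds.
        return sock.recv(1024)
```

A caller would typically catch `socket.timeout` and surface a quick, retryable error to the user rather than keeping the request open.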

Install circuit breakers

Shopify developed Semian to protect Net::HTTP, MySQL, Redis, and gRPC services with a circuit breaker in Ruby.
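To show the general idea behind a circuit breaker, here is a minimal Python sketch. It is not Semian's API or implementation; the class name, `error_threshold`, `reset_timeout`, and the injectable `clock` are all assumptions made for illustration.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying the dependency."""


class CircuitBreaker:
    def __init__(self, error_threshold=3, reset_timeout=10.0, clock=time.monotonic):
        self.error_threshold = error_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout      # seconds to wait before a trial call
        self.clock = clock
        self.failures = 0
        self.opened_at = None                   # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Otherwise the circuit is half-open: let this one trial call through.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.error_threshold:
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Wrapping each outbound call in `breaker.call(...)` lets the application stop hammering a dependency that is already failing and recover automatically once it is healthy again.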

Capacity management

If there are 50 requests in our queue on average and it takes an average of 100 milliseconds to process a request, Little's Law gives a throughput of 50 / 0.1 s = 500 requests per second.
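The arithmetic here follows Little's Law (requests in the system = throughput × average time in the system). A minimal sketch, with a hypothetical helper name, that rearranges the formula to solve for throughput:

```python
def throughput_rps(requests_in_system, avg_latency_ms):
    """Little's Law rearranged: throughput = requests in system / time in system."""
    if avg_latency_ms <= 0:
        raise ValueError("average latency must be positive")
    # Multiply by 1000 to convert from per-millisecond to per-second.
    return requests_in_system * 1000 / avg_latency_ms


print(throughput_rps(50, 100))  # 500.0
```

The same relation can be read the other way: to sustain a target throughput at a given latency, the system must be able to hold that many requests concurrently.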

Add monitoring and alerting

Google's site reliability engineering (SRE) book lists four golden signals that a user-facing system should be monitored for: latency, traffic, errors, and saturation.
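As a rough sketch of what the four signals look like in practice, the following Python function derives them from a window of request records. The `Request` type, the choice of a nearest-rank 99th percentile, and modelling saturation as in-flight requests over capacity are all assumptions for illustration, not definitions from the SRE book.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    latency_ms: float
    ok: bool


def golden_signals(requests, window_s, in_flight, capacity):
    """Summarize latency, traffic, errors, and saturation for one time window."""
    if not requests:
        raise ValueError("need at least one request in the window")
    latencies = sorted(r.latency_ms for r in requests)
    # Nearest-rank percentile: smallest value with at least 99% of samples at or below it.
    rank = math.ceil(99 * len(latencies) / 100)
    return {
        "latency_p99_ms": latencies[rank - 1],
        "traffic_rps": len(requests) / window_s,
        "error_rate": sum(1 for r in requests if not r.ok) / len(requests),
        "saturation": in_flight / capacity,
    }
```

Alerts would then be set on thresholds for each value, for example paging when the error rate or saturation stays above an agreed limit.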

Implement structured logging

We store logs in a centralized place and make them easily searchable.
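One common way to make logs searchable is to emit each entry as a JSON object with consistent field names. The sketch below uses Python's standard `logging` module; the `JsonFormatter` class, the `fields` key passed through `extra`, and the example field names are hypothetical conventions, not something prescribed by Shopify.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        entry = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Structured fields arrive via logger.info(..., extra={"fields": {...}}).
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry, sort_keys=True)


def make_logger(stream=sys.stdout, name="payments"):
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]   # replace any handlers from earlier setup
    logger.setLevel(logging.INFO)
    logger.propagate = False      # avoid duplicate lines from the root logger
    return logger
```

A call such as `make_logger().info("payment captured", extra={"fields": {"payment_id": "p_123", "amount_cents": 1999}})` produces a single line that a central log store can index by `payment_id`.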

Use idempotency keys

Use the Universally Unique Lexicographically Sortable Identifier (ULID) for these idempotency keys instead of a random version 4 UUID.
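To make the idea concrete, here is a minimal Python sketch that generates a ULID (a 48-bit millisecond timestamp followed by 80 random bits, encoded as 26 Crockford base32 characters) and uses it as an idempotency key. The `IdempotentCharges` class and its in-memory dictionary are hypothetical stand-ins for a real payment service and database.

```python
import os
import time

CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def new_ulid(timestamp_ms=None):
    """Build a 26-character ULID; keys created later sort after earlier ones."""
    if timestamp_ms is None:
        timestamp_ms = time.time_ns() // 1_000_000
    value = (timestamp_ms << 80) | int.from_bytes(os.urandom(10), "big")
    chars = []
    for _ in range(26):
        chars.append(CROCKFORD[value & 31])  # take the lowest 5 bits
        value >>= 5
    return "".join(reversed(chars))


class IdempotentCharges:
    """Replays the stored result when the same key is seen again."""

    def __init__(self):
        self._results = {}

    def charge(self, key, amount_cents):
        if key in self._results:
            return self._results[key]  # retry: do not charge twice
        result = {"charge_id": f"ch_{len(self._results) + 1}",
                  "amount_cents": amount_cents}
        self._results[key] = result
        return result
```

Because the timestamp occupies the leading characters, keys generated close together in time sit next to each other in an index, which is one reason a sortable identifier can be preferable to a random UUID.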

Be consistent with reconciliation

Store the reconciliation breaks with Shopify's financial partners in the database.
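A minimal sketch of what detecting and storing breaks could look like, assuming both sides can be reduced to a mapping from transaction ID to amount in cents. The function names, the `reconciliation_breaks` table, and its columns are hypothetical; SQLite is used only to keep the example self-contained.

```python
import sqlite3


def find_breaks(internal, partner):
    """Return (txn_id, internal_cents, partner_cents) for every mismatch."""
    breaks = []
    for txn_id in sorted(internal.keys() | partner.keys()):
        ours, theirs = internal.get(txn_id), partner.get(txn_id)
        if ours != theirs:  # amount differs, or one side is missing the record
            breaks.append((txn_id, ours, theirs))
    return breaks


def store_breaks(conn, breaks):
    """Persist breaks so they can be investigated and tracked over time."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reconciliation_breaks ("
        "txn_id TEXT PRIMARY KEY, internal_cents INTEGER, partner_cents INTEGER)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO reconciliation_breaks VALUES (?, ?, ?)", breaks
    )
    conn.commit()
```

A missing amount is stored as `NULL`, which distinguishes "the partner never reported this transaction" from "the partner reported a different amount".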

Incorporate load testing

Shopify regularly simulates large-volume flash sales to obtain benchmark results.

Get on top of incident management

Each incident channel has 3 roles: Incident Manager on Call (IMOC), Support Response Manager (SRM), and service owners.

Organize incident retrospectives

For each incident, three questions are asked at Shopify: What exactly happened? What incorrect assumptions did we hold about our systems? What can we do to prevent this from happening again?