Building distributed systems is challenging due to their complexity, scale, and unpredictability. Common strategies to increase performance and reliability include queues, retries, caches and more. But be aware! Naively implemented optimizations often have the very opposite effect to what you expect. Not only can they prevent your system from working, they can even hamper your recovery process, which is even worse.
In this session we will explore why you cannot simply add a retry here, a queue there and have a highly performant and reliable system. We will dive deeper into how to properly implement mechanisms and patterns in distributed systems to reduce failures and make recovery as smooth as possible.
Florian Mair
Dynatrace
Florian is an experienced software engineer who has worked with Java, Python and Go for 10+ years. He has worked on some of the largest systems the tech world has to offer at companies such as IBM, AWS and Dynatrace.
At Dynatrace he is responsible for building the future of Hyperscaler Observability.