RDS Streams — reality check

Marcin Sodkiewicz
5 min readFeb 3, 2021

--

In the previous article we have showed you how to setup RDS Aurora cluster with triggers invoking AWS lambda on every data changes to mimic DynamoDB Streams behaviour.

Below we have summary of things worth to consider when choosing this solution in your system.

Lambda throttling

We have tested lambda throttling behaviour by setting up ReservedConcurrentExecutions to 0 and inserting multiple elements. After that we have set up ReservedConcurrentExecutions to 10 to re-enable events processing.
Your transaction will be safe in case of lambda being throttled and you will receive notification with a delay. Yet it will be delivered. Lambda retry policies apply here. We would advise to set up dead letter queue.
You can find more details here.

Lambda invocation failure in transaction

With usage of asynchronous invocation we are limiting number of issues that could happen inside our transaction (and this is where trigger logic is invoked). Major issues that can happen regarding lambda invocation failure in Aurora are:

Malformed Arn
In case of lambda failure due to wrong arn you won’t be able to commit transaction and it can impact the system.

Although name of the function works as well

Insufficient permissions
In case of insufficient permissions you won’t be able to commit transaction.

Lambda failure

In case of lambda failure your transaction is safe. Triggering lambda won’t cause rollback. AWS will try to invoke your lambda multiple times. I would advise to set up dead letter queue.

Performance

Performance impact is massive. I haven’t run any exhaustive testing after seeing the first results. I expected performance impact, yet reality exceeded my wildest expectations. Reason for that is fact that we are calling lambda asynchronously, but in synchronous way (row by row). If we will calculate time for single trigger it’s just 35ms, yet it accumulates with number of items.

I have created simple insert with multiple values that you can find in data/routes.sql from repo used for RDS infrastructure setup. Results comparison:

Without triggers:
[2021-02-01 19:02:47] 1,999 rows affected in 836 ms
[2021-02-01 19:17:02] 1,999 rows affected in 885 ms
With triggers:
[2021-02-01 19:07:36] 1,999 rows affected in 1 m 10 s 462 ms
[2021-02-01 19:14:18] 1,999 rows affected in 1 m 11 s 377 ms

Looks bad, right? Let’s make it even worse.

Network failure

Let’s simulate network failure. I did it by detaching NAT Gateway from routing table. Now our Aurora can’t connect to the internet. After that we could try to insert new item.

INSERT INTO flights.routes (airportFrom, airportTo, connectingAirport, newRoute, seasonalRoute, operator, `group`, tags, similarArrivalAirportCodes, carrierCode) 
VALUES ('FOO', 'NETWORK_FAILURE', null, false, false, 'RYANAIR', 'CITY', '', '', 'FR');

As an effect write will hang for a looooong time and eventually will fail. We are adding additional complexity on our SQL transactions with this side-effect.

Phantom notifications

This is the major issue we have found with that approach. Trigger logic is invoked inside transaction. What will happen if it will fail? Whole transaction will fail. What is worth consideration is asking yourself a question What will happen if transaction will fail due to data modification operations inside transaction?. This one is pretty scary if you will use this feature to some critical operations. Imagine such scenario:

You hope that lambda won’t be invoked? Yeah, I would like that too, but it’s not what happens. In such situation we will get lambda invoked for all three INSERTS despite that they never were persisted in our DB.

Unordered notifications

Using triggers with lambda async invocations we won’t have order of our changes. It affects consumption of that events streams a lot.

Potential use cases

That was quite a long list of flaws of this solution. In engineering it’s crucial to think how to use features to build something great. So after all what are potential use cases for RDS streams?

Legacy and 3rd party systems
This mechanism could be useful for scenarios with legacy software that was ‘lifted and shifted’ to the cloud that is not anymore under the development or 3rd party solution deployed on database managed by your company.
You have to keep in mind all the cons mentioned above when using that solution for your use case.

Change notifications
You could use RDS events to get notifications about data changes in your SQL database. I would advise to not follow this path and implement it as a part of your system flow. This approach should be used as a last resort in my opinion. If you have to do it, you should use only entity id to load it from DB for processing. It won’t be performant, but as I said… last resort.

Static content preparation
In case that you have data that has complex relations and some software for data management, yet you are exposing it as a single document. You could rebuild your data on data change (event based on Phantom events) and store it on S3 potentially exposed through Cloudformation. Example data sets for this use case can be data like: available markets, currencies, dictionaries, flight routes and such. Such operations are not prone to the issues caused by Phantom notifications as we will rebuild data always based on the actual DB state.

Summary

In our opinion this approach has too many flaws for majority of use cases. It’s hard to build system based on the false event source especially at the cost of major performance degradation.

PS
We’ve seen some use cases in the internet (also from AWS blog post) for that triggers based approach to obtain analytics data. Knowing what are the characteristics of this approach it’s simply not acceptable. Especially if those values are something that might be driving your business and decision making process.

Many thanks to my fellow CloudDude Michal Woronowicz!

--

--

No responses yet