This project has a hard contract: every run is exactly one hour. That constraint is what keeps results comparable across configurations and keeps the exported visualisations consistent.
After I fixed shutdown reliability, the next issue wasn't infrastructure. It was methodology.
The discovery: the "happy path" trap
Reviewing telemetry from a standard run, I noticed that my default load generation (single ECS task) was not reliably pushing Redis into the state I actually care about.
Memory usage was climbing, but many runs finished without ever reaching maxmemory. The test is still useful as a smoke check, but it's not a strong performance validation: it mostly measures a cache that isn't under pressure.
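One way to make this failure mode visible is to check, after each run, whether the instance ever came close to its memory limit or evicted any keys. Below is a minimal sketch of such a post-run check, assuming a captured Redis INFO dump; the function names, the 95% threshold, and the sample values are all my own illustration, not the project's actual tooling.

```python
# Hypothetical post-run check: parse a captured Redis INFO dump and decide
# whether the run actually reached memory pressure. The threshold and field
# choices are assumptions for illustration.

def parse_info(raw: str) -> dict:
    """Parse the key:value lines of a Redis INFO dump into a dict."""
    fields = {}
    for line in raw.splitlines():
        if ":" in line and not line.startswith("#"):
            key, _, value = line.partition(":")
            fields[key] = value.strip()
    return fields

def run_reached_pressure(info: dict, threshold: float = 0.95) -> bool:
    """Count a run as a real pressure test only if used_memory approached
    maxmemory, or evictions actually occurred."""
    used = int(info["used_memory"])
    maxmem = int(info["maxmemory"])
    evicted = int(info.get("evicted_keys", "0"))
    if maxmem == 0:  # no limit configured: "pressure" is undefined
        return False
    return evicted > 0 or used / maxmem >= threshold

# Abridged INFO snapshot with invented values (~75% of a 1 GiB limit):
sample = """# Memory
used_memory:803000000
maxmemory:1073741824
# Stats
evicted_keys:0"""

print(run_reached_pressure(parse_info(sample)))  # prints False
```

A check like this could gate whether a run's results are tagged as "under pressure" or merely "smoke", so the two kinds of runs never get compared directly.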
