
How we migrated terabytes of student session data in AWS with zero downtime

  • Writer: Kenil Domadia
  • Feb 13, 2024
  • 4 min read

This article describes how we migrated terabytes of student data, spanning billions of sessions, from a self-managed Cassandra cluster on AWS to Amazon DocumentDB. It covers the challenges we faced, the approach we took, and the benefits we saw after the migration.


The primary objective was to move billions of records with zero downtime on a tight timeline. The existing Cassandra system cost approximately $420,000 a year, which made a swift and effective migration a priority. The migration was also a strategic move to strengthen the company's data-handling capabilities for the future, making its data-driven operations more robust and agile.


Problem Analysis


The challenge was formidable: migrating billions of student session records from Cassandra to a new database system with zero downtime. The existing self-managed Cassandra database cost around $420,000 a year and lacked the scalability needed for the traffic forecast for the coming years. The migration had to be executed quickly and efficiently, without the luxury of the direct database connection typically used by AWS Database Migration Service (DMS) jobs.


Connecting DMS jobs directly to a decade-old, self-managed Cassandra cluster serving hundreds of thousands of concurrent users carried considerable risk. We also had to decide which Cassandra tables to migrate: after our recent shift from a monolith to microservices, some functions had already moved to other services, so we selected only the essential tables. Fortunately, the existing Cassandra service exposed GET APIs that gave us access to the data we needed, and they became a cornerstone of our migration plan.


The goal was not just to migrate the data but also to enhance the database's performance, ensuring it could handle the increased load and provide better service in the long term.


Choosing a new database within AWS


DynamoDB was our immediate first choice: it promised a world of seamless scalability, cost-effectiveness, and lightning-fast responses, all familiar territory for our developers. We excitedly started a proof of concept (POC), and everything went smoothly; it felt like a celebration was just around the corner.


But then we hit a snag: DynamoDB limits each item to 400 KB, and a significant portion of our records exceeded that limit. We rolled up our sleeves and tried our first trick, compressing the data before inserting it into the table. We thought, "A little extra CPU usage is a small price to pay." But even after compression, those records stubbornly remained over the 400 KB limit. Restructuring the table wasn't an option. Our data was a treasure chest of student responses, packed with images and nested complexity, a puzzle that refused to be simplified.
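To see why compression alone may not help, here is a minimal Python sketch of a pre-insert size check. It assumes a session is a JSON-serializable dict stored as a single gzip-compressed Binary attribute; the helper names are hypothetical, and the JSON-based estimate only approximates how DynamoDB itself counts item size (the lengths of attribute names plus values).

```python
import gzip
import json

# DynamoDB's maximum item size is 400 KB; treated here as 400 * 1024 bytes.
DYNAMODB_MAX_ITEM_BYTES = 400 * 1024


def serialize(item: dict) -> bytes:
    """Compact JSON encoding of a session record."""
    return json.dumps(item, separators=(",", ":")).encode("utf-8")


def approx_item_size(item: dict, compress: bool) -> int:
    """Approximate stored size in bytes, optionally after gzip compression.

    This is an estimate: DynamoDB sums the lengths of attribute names and
    values, and the key attributes stored alongside the blob are ignored here.
    """
    raw = serialize(item)
    return len(gzip.compress(raw)) if compress else len(raw)


def fits_in_dynamodb(item: dict, compress: bool = True) -> bool:
    """True if the (optionally compressed) record is within the item size limit."""
    return approx_item_size(item, compress) <= DYNAMODB_MAX_ITEM_BYTES
```

Repetitive text shrinks dramatically under gzip, but payloads dominated by already-compressed content, such as base64-encoded images, typically shrink very little, so records like that can stay over the limit however they are compressed.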


Our next workaround was to store large objects in Amazon S3 and keep a pointer to each one in DynamoDB. This introduced significant complexity in keeping data synchronized across two different services, and the additional latency and cost of frequently fetching large objects from S3 did not align with our efficiency goals.



Large object storage strategies for Amazon DynamoDB
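The pointer pattern can be sketched as follows. This is a minimal illustration, not our production code: plain dictionaries stand in for the DynamoDB table and the S3 bucket so it runs without AWS, and the threshold, key layout, and function names are hypothetical. It shows where the extra complexity comes from: each large write becomes two writes to two services with no shared transaction, and each large read needs a second round trip.

```python
import json
import uuid

# Hypothetical threshold leaving headroom under DynamoDB's 400 KB item limit.
INLINE_LIMIT_BYTES = 350 * 1024


def save_session(session_id: str, payload: dict, table: dict, bucket: dict) -> None:
    """Store small sessions inline; offload large ones to 'S3' behind a pointer.

    `table` and `bucket` are plain dicts standing in for a DynamoDB table and
    an S3 bucket.
    """
    body = json.dumps(payload).encode("utf-8")
    if len(body) <= INLINE_LIMIT_BYTES:
        table[session_id] = {"session_id": session_id, "payload": body}
        return
    key = f"sessions/{session_id}/{uuid.uuid4()}.json"
    bucket[key] = body  # write 1: upload the large object
    # Write 2: store the pointer. In AWS these two writes are not atomic; if
    # this one fails, the object above is orphaned, and any object the old
    # pointer referenced still has to be cleaned up separately.
    table[session_id] = {"session_id": session_id, "s3_key": key}


def load_session(session_id: str, table: dict, bucket: dict) -> dict:
    """Read a session, following the pointer to 'S3' when present."""
    item = table[session_id]
    if "s3_key" in item:
        body = bucket[item["s3_key"]]  # extra round trip to S3 for large sessions
    else:
        body = item["payload"]
    return json.loads(body)
```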


We considered several options, including DynamoDB, DocumentDB, and Aurora Serverless PostgreSQL, and ultimately selected Amazon DocumentDB as the best fit for our requirements.


Zero Downtime Migration Plan


The migration was executed in phases:


Phase One: Implement dual writing of new data to both Cassandra and DocumentDB, ensuring a seamless transition for ongoing operations.

Phase Two: Migrate historical data from Cassandra to DocumentDB, a critical step in preserving the integrity of past records.

Phase Three: Route production traffic to DocumentDB while continuing to dual-write to the old Cassandra cluster, so that we could fall back to Cassandra if any issues were identified with DocumentDB.

Phase Four: The final cutover, where production traffic was routed exclusively to the new DocumentDB database, marking the sunset of the old Cassandra cluster.



Migration Phases for zero downtime
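The phase routing lends itself to a small repository layer driven by a feature flag. The sketch below is a hypothetical illustration: the Phase values mirror the four phases above plus a pre-migration state, dictionaries stand in for Cassandra and DocumentDB, and the flag lookup is an arbitrary callable. Changing the flag value is enough to move forward or fall back a phase.

```python
from enum import IntEnum
from typing import Callable


class Phase(IntEnum):
    LEGACY = 0               # before migration: Cassandra only
    DUAL_WRITE = 1           # writes to both stores, reads from Cassandra
    BACKFILL = 2             # same routing as phase 1 while history is copied
    READ_NEW_DUAL_WRITE = 3  # reads from DocumentDB, writes still to both
    CUTOVER = 4              # DocumentDB only


class SessionRepository:
    """Routes session reads and writes according to the current phase flag.

    `cassandra` and `documentdb` are dict stand-ins for the two stores;
    `phase_flag` is any callable returning the current Phase, for example a
    lookup in a feature-flag service.
    """

    def __init__(self, cassandra: dict, documentdb: dict, phase_flag: Callable[[], Phase]):
        self.cassandra = cassandra
        self.documentdb = documentdb
        self.phase_flag = phase_flag

    def _read_store(self, phase: Phase) -> dict:
        return self.documentdb if phase >= Phase.READ_NEW_DUAL_WRITE else self.cassandra

    def put(self, session_id: str, session: dict) -> None:
        phase = self.phase_flag()
        targets = []
        if phase < Phase.CUTOVER:
            targets.append(self.cassandra)
        if phase >= Phase.DUAL_WRITE:
            targets.append(self.documentdb)
        # Write to the store that currently serves reads first, so a failure on
        # the secondary store cannot leave the primary without the write.
        primary = self._read_store(phase)
        targets.sort(key=lambda store: store is not primary)
        for store in targets:
            store[session_id] = session

    def get(self, session_id: str) -> dict:
        return self._read_store(self.phase_flag())[session_id]
```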

The second phase of our migration, transferring historical data, required a careful, controlled approach because DocumentDB was already receiving live production writes from Phase One. Our strategy was a Lambda-based migration script that took each unique student session identifier, retrieved the session from the Cassandra service through its GET APIs, and inserted it into DocumentDB through PUT APIs.
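A minimal sketch of such a migration Lambda, assuming each SQS message carries one session identifier as JSON and that the GET and PUT calls are wrapped in functions supplied by the caller (both hypothetical here). It copies the sessions in a batch in parallel and returns Lambda's partial batch response format, which only takes effect when ReportBatchItemFailures is enabled on the SQS event source mapping, so that only failed sessions are retried.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

SESSIONS_PER_INVOCATION = 20  # 20 sessions per Lambda execution, as in our setup


def make_handler(
    fetch_session: Callable[[str], dict],
    put_session: Callable[[str, dict], None],
    max_workers: int = SESSIONS_PER_INVOCATION,
):
    """Build an SQS-triggered Lambda handler that copies sessions.

    fetch_session: hypothetical wrapper around the Cassandra service's GET API.
    put_session: hypothetical wrapper around the PUT API that writes to DocumentDB.
    """

    def migrate(record: dict) -> None:
        # Assumed message body format: {"session_id": "..."}
        session_id = json.loads(record["body"])["session_id"]
        put_session(session_id, fetch_session(session_id))

    def handler(event: dict, context) -> dict:
        records = event["Records"]
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            futures = [(pool.submit(migrate, r), r["messageId"]) for r in records]
        # Leaving the `with` block waits for every future to finish.
        failures = [
            {"itemIdentifier": message_id}
            for future, message_id in futures
            if future.exception() is not None
        ]
        # Only the failed messages go back to the queue for retry.
        return {"batchItemFailures": failures}

    return handler
```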


We gathered the session identifiers of all historical records (millions of sessions created before Phase One) and implemented a throttling mechanism with SQS and Lambda, with each Lambda execution processing 20 sessions in parallel. To control the rate of requests to the service, we capped the number of concurrent Lambda executions. Our performance tests showed that with a Lambda concurrency of 10, meaning 200 sessions migrated in parallel, we could complete the migration in a week. This timeline gave us a comfortable window to monitor the migration and prepare for the next phase.



Migration plan for pre-stored historical data using AWS Lambda and SQS
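The throttling numbers multiply out simply: 10 concurrent executions × 20 sessions each gives 200 sessions in flight. The back-of-envelope helper below turns that into a duration estimate. It assumes every slot stays busy and that each session takes a fixed average time; the function names and the inputs in the usage note are hypothetical, not measurements from our migration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def sessions_in_flight(lambda_concurrency: int, sessions_per_invocation: int) -> int:
    """Upper bound on sessions being migrated at the same moment."""
    return lambda_concurrency * sessions_per_invocation


def estimated_days(
    total_sessions: int,
    lambda_concurrency: int,
    sessions_per_invocation: int,
    seconds_per_session: float,
) -> float:
    """Rough wall-clock estimate, assuming every slot stays busy and each
    session (GET plus PUT) takes `seconds_per_session` on average."""
    in_flight = sessions_in_flight(lambda_concurrency, sessions_per_invocation)
    throughput_per_second = in_flight / seconds_per_session
    return total_sessions / throughput_per_second / SECONDS_PER_DAY
```

For example, estimated_days(10_000_000, 10, 20, 12.0) comes to about 6.9 days. Because concurrency is the multiplier, it is also the single knob that bounds the load placed on the legacy Cassandra service.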


We kept each new phase in production for a week of observation before progressing to the next one. Feature flags for each phase gave us the flexibility to switch between phases as needed. We also ran performance tests for each phase to verify that the service could handle the traffic.


Migration Benefits


  • Cost Efficiency: The shift to DocumentDB cut annual expenses by about $300K compared to the Cassandra system, roughly 70% of its approximately $420K annual cost.

  • Scalability: DocumentDB offered better scalability, handling the increasing volume of data and user traffic, which is critical for our growing user base.

  • System Stability: After the migration, the new database showed good stability and performance, with significantly reduced latency. It's been a quiet, well-behaved database for the past six months, without any weekend-long firefights.




 
 
 
