Synapse Spark Implementation
Business Problem
One of the largest IT organizations wanted to reduce the latency between data ingestion and availability to customers
The existing system processed data in 4-hour batches, and data was not available to customers for 24 hours
Data had to pass through several processing layers after integration before reaching customers
The client sought an Azure Synapse solution that could handle concurrent customer requests and reduce the lag in data availability
Solution
Separate pipelines: one serves data while another concurrently refreshes the view when new batches arrive
Each new batch of data is loaded into memory and indexed in the background
A view is created once pre-processing finishes; requests already in flight run to completion against the old view, and the new view is then context-switched in (see the sketch below)
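
A minimal sketch of the serve/refresh pattern in PySpark; the paths, view name, and the `dropna` pre-processing step are hypothetical placeholders, not the client's actual pipeline:

```python
import threading
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

SERVING_VIEW = "customer_data"  # stable view name queried by customer requests

def publish_batch(batch_path: str) -> None:
    """Load, pre-process, and cache a batch, then context-switch the view to it."""
    df = spark.read.parquet(batch_path)   # load the new batch
    df = df.dropna()                      # stand-in for the real pre-processing
    df.cache().count()                    # materialize in memory up front
    # Context switch: queries already planned against the old view finish
    # untouched; new queries resolve the name to the fresh DataFrame.
    df.createOrReplaceTempView(SERVING_VIEW)

publish_batch("/data/batches/0001")       # initial batch, served immediately

# When the next batch arrives, refresh in the background while serving continues.
threading.Thread(target=publish_batch, args=("/data/batches/0002",)).start()

# Customer-facing queries keep hitting the same view name throughout.
spark.sql(f"SELECT COUNT(*) FROM {SERVING_VIEW}").show()
```

The switch is seamless because `createOrReplaceTempView` only rebinds the view name: in-flight queries keep their already-resolved plan, while new queries pick up the refreshed data.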
Hyperspace indexing to reduce query time on the data (see the example after this list)
- An open-source indexing subsystem for Apache Spark, developed by the client.
- Reduced query response time by half.
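
A minimal sketch of Hyperspace usage, assuming the Python bindings bundled with Synapse Spark pools; the index name, columns, and path are hypothetical, not taken from the client's schema:

```python
from hyperspace import Hyperspace, IndexConfig

hs = Hyperspace(spark)                         # wraps the active SparkSession
df = spark.read.parquet("/data/batches/0001")  # hypothetical batch location

# Covering index: queries that filter or join on customer_id and project
# amount can be answered from the index instead of scanning the full data.
hs.createIndex(df, IndexConfig("custIdIndex", ["customer_id"], ["amount"]))

Hyperspace.enable(spark)  # opt the session in so the optimizer can use indexes
hs.indexes().show()       # list available indexes and their states

# When a new batch lands, rebuild the index in the background refresh pipeline.
hs.refreshIndex("custIdIndex")
```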
Outcome
Data is loaded in the background without interrupting service to customers
Seamless context switch to the updated data once pre-processing and indexing complete
Reduced complexity, lowered costs, and simplified development
Scaled to handle large volumes of incoming data