Data Lake Solution

Business Problem

  • The requirement was to develop a data platform that collects, organizes, and processes data, provides insight into business and operational aspects, and enables the development of value-added, data-driven product features and dashboards for customers.
  • Raw data was stored with no oversight of its contents.
  • The platform needs defined mechanisms to catalog and secure data. Without them, data cannot be found or trusted, resulting in a “data swamp”.

Solution

  • We developed a data lake solution with the following features:
  • Real-time streaming of data from source systems.
  • Connectors for Oracle DB (PeopleSoft, Banner) and Workday.
  • Data access is controlled through views defined in Apache Hive, which is connected to the data lake.
  • ETL jobs run in a loop, identify changed files (via Hive), and update the Report Mart. PostgreSQL serves as the Report Mart. Sample ETL scripts and reports were developed for HR diversity data.
  • Data changes from source systems are reflected in the reports within two minutes.
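
The ETL loop described above can be illustrated with a minimal sketch. It is not the project's actual script: the `LakeFile` record, the `hr_report` table, and its columns are hypothetical, the list of files stands in for whatever a Hive query would return, and an in-memory SQLite database stands in for the PostgreSQL Report Mart so the example runs on its own. Each pass picks up only files modified after a stored watermark and upserts their rows.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class LakeFile:
    """Hypothetical record for one file in the data lake."""
    path: str
    modified_at: int  # last-modified time, epoch seconds
    rows: tuple       # (employee_id, department) pairs held in the file


def make_report_mart():
    """Create the stand-in Report Mart (SQLite here, PostgreSQL in the source)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hr_report (employee_id INTEGER PRIMARY KEY, department TEXT)"
    )
    return conn


def changed_since(files, watermark):
    """Return files modified strictly after the watermark, oldest first."""
    changed = [f for f in files if f.modified_at > watermark]
    return sorted(changed, key=lambda f: f.modified_at)


def run_etl_pass(conn, files, watermark):
    """Upsert rows from changed files and return the advanced watermark."""
    for f in changed_since(files, watermark):
        # INSERT OR REPLACE overwrites the row when the primary key already exists.
        conn.executemany(
            "INSERT OR REPLACE INTO hr_report (employee_id, department) VALUES (?, ?)",
            f.rows,
        )
        watermark = max(watermark, f.modified_at)
    conn.commit()
    return watermark
```

In a long-running job, `run_etl_pass` would be called repeatedly with a short sleep between passes; a polling interval well under two minutes would be needed to meet the latency stated above.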

Outcome

  • Data is easier and quicker to load because no transformation is involved at ingestion
  • Any volume of data can be imported as it arrives in real time
  • Organizations can generate different types of insights, including reports on historical data
  • All types of structured and unstructured data can be stored
  • Data silos are eliminated
  • Access to data is democratized through a single, unified view
