Data Lake / Data Warehouse Deployment
OVERVIEW
Based in Los Angeles (California), OpenMethods is an enterprise software company that streamlines customer interactions and business operations by integrating with CRM and other systems to provide simplified, secure, and efficient customer experiences.
OpenMethods needed a data platform to collect, organize, and process data, provide insight into business and operational performance, and enable the development of value-added, data-driven product features and dashboards for customers.
The company faced several challenges:
- Raw data was stored with no oversight of the contents.
- For a data lake to make data usable, it needs defined mechanisms to catalog and secure data. Without these elements, data cannot be found or trusted, resulting in a "data swamp".
- Meeting the needs of a wider audience requires data lakes to have governance, semantic consistency, and access controls.
SOLUTION PROVIDED BY PEOPLE TECH GROUP
People Tech proposed the following solution:
- Real-time streaming of data from source systems (batch load scripts are also in place).
- Connectors were developed for Oracle DB (PeopleSoft, Banner) and Workday.
- Used the Oracle DB Streams feature to identify changes from the Oracle redo logs and stream them to Kafka.
- Data access was controlled with views set up in Apache Hive, which is connected to the data lake.
- ETL jobs ran in a loop, identified changed files (via Hive), and updated the Report Mart. Sample ETL scripts and reports were developed for HR diversity data. PostgreSQL served as the Report Mart.
- Data changes from the source systems are reflected in the reports within 2 minutes.
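The looping ETL step described above can be illustrated with a minimal, in-memory sketch. It is not the code used in the project: the ChangeEvent shape, the seq watermark field, and the apply_changes function are hypothetical names chosen for illustration, and a plain dictionary stands in for the PostgreSQL Report Mart. The idea it shows is that each pass applies only the changes it has not yet seen, so replaying a batch is harmless.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One row-level change, in a hypothetical shape a change stream might deliver."""
    seq: int                    # monotonically increasing position in the stream (assumed)
    op: str                     # "INSERT", "UPDATE", or "DELETE"
    key: int                    # primary key of the affected row
    row: Optional[dict] = None  # new column values; None for deletes


def apply_changes(mart: Dict[int, dict],
                  events: Iterable[ChangeEvent],
                  watermark: int) -> int:
    """Apply events newer than `watermark` to `mart`; return the new watermark.

    `mart` is a stand-in for the report mart table, keyed by primary key.
    """
    for ev in sorted(events, key=lambda e: e.seq):
        if ev.seq <= watermark:
            continue  # already applied in an earlier pass
        if ev.op == "DELETE":
            mart.pop(ev.key, None)
        elif ev.op in ("INSERT", "UPDATE"):
            mart[ev.key] = dict(ev.row or {})  # upsert: last write wins
        else:
            raise ValueError(f"unknown operation: {ev.op}")
        watermark = ev.seq
    return watermark


if __name__ == "__main__":
    report_mart: Dict[int, dict] = {}
    batch = [
        ChangeEvent(1, "INSERT", 10, {"dept": "HR"}),
        ChangeEvent(2, "UPDATE", 10, {"dept": "Finance"}),
        ChangeEvent(3, "DELETE", 10),
    ]
    last_seen = apply_changes(report_mart, batch, watermark=0)
    print(last_seen, report_mart)
```

In a real deployment the events would be consumed from Kafka and the upserts and deletes would be issued as SQL against the report mart. Persisting the watermark between passes is what would let the loop resume after a restart without reapplying old changes.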
BENEFITS
Benefits of the proposed solution include:
- Easier and quicker to populate, as no transformation is involved.
- Allows any amount of data to be imported, including data arriving in real time.
- Allows organizations to generate different types of insights including reporting on historical data.
- Ability to store all types of structured and unstructured data.
- Elimination of data silos.
- Democratized access to data via a single, unified view of data.