Senior Data Engineer
- 0 references
- €80–100/hour
- 60489 Rödelheim
- National
- en | de
- 27.09.2024
Brief introduction
Qualifications
Project & professional experience
7/2023 – 9/2023
Role description
Senior Data Engineer
Akkodis Germany (through Akkodis Germany Tech Freelance GmbH)
Employment Type: Freelance Contract
Location: Remote/Stuttgart
Role: Data Engineer
Project: Global Reporting Platform
Project Technology Stack
Platforms/Services: SAP FICO, SAP HR, Microsoft Azure, Azure Data Factory, Azure Databricks, Azure Logic Apps
Source System: SAP, Azure Blobs, REST API, MS SQL Server, CSV, etc.
Target System: Microsoft Azure SQL Database
ETL Tool/Programming Language: Azure Data Factory V2, Python, PySpark
Other programming languages: Python, T-SQL
Scheduling Tool: Azure Data Factory Triggers
Other Azure tools: Azure Data Explorer, Azure Data Studio
Project Details:
Project 1: Global Reporting Platform
Responsibilities:
Data Pipeline/ELT:
- Designed, developed, and maintained ETL/data pipelines using Azure Data Factory and Python.
- Designed and led the implementation of end-to-end data pipelines on Azure Data Factory, ensuring efficient data movement and transformation across multiple sources; this resulted in a 30% reduction in data processing time and improved data accuracy.
- Set up all metadata tables, their configurations, stored procedures, and views for pipeline reusability, so that loads run through generic import pipelines.
- Reduced the development time of source-to-data-lake and data-lake-to-staging-layer mappings by 60–70% by developing generic ADF pipelines.
- Set up all database objects needed for logging pipeline run information.
- Created ADF linked services, datasets, and pipelines to read data from SAP tables using the SAP Table linked service and load it into Azure Data Lake Storage Gen2.
- Created the ADF objects required for pipeline development: data sources, linked services, pipelines, global variables, triggers, etc.
- Created global, linked service, data source, and pipeline parameters for reusability.
- Created ADF pipelines using activities such as Copy Data, Web, Lookup, ForEach, Stored Procedure, and Execute Pipeline.
- Used data flow transformations such as select, filter, join, derived column, exists, and sequence.
- Created an ADF self-hosted integration runtime to read data from on-premises source systems such as SAP.
- Debugged ADF pipelines using data flow debug clusters to verify data and transformation results.
- Created generic SCD Type 2 pipelines for loading data into historized tables (a minimal PySpark sketch follows this list).
- Documented processes, data models, data flow diagrams, and the ETL architecture on Confluence.
- Configured Git repositories for the various environments and releases.
- Created the Azure Key Vault resource for password encryption in data pipelines.
- Created Azure Pipelines to execute PySpark notebooks from the Azure Databricks workspace.
- Created PySpark notebooks in Azure Databricks to perform various transformations and loads.
- Created Azure resource consumption reports for budget optimization.
- Created Azure Logic App workflows for email notification in case of data pipeline failures or fatal errors.
- Used ChatGPT to research performance optimization and data testing techniques.
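To make the generic SCD Type 2 approach concrete, here is a minimal PySpark sketch of the pattern as it could run on Azure Databricks with Delta tables. The table, path, and column names (reporting.customer_dim, business_key, row_hash, name, city) are illustrative assumptions rather than the project's actual objects, and the historized table is assumed to already carry the business key, row hash, and validity columns.

# Minimal SCD Type 2 sketch in PySpark on Databricks (illustrative only).
# Table, path, and column names are hypothetical, not the project's objects.
from pyspark.sql import SparkSession, functions as F
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

# Incoming snapshot from the data lake; hash the descriptive attributes.
src = spark.read.parquet("/mnt/datalake/staging/customer/")
src = src.withColumn("row_hash", F.sha2(F.concat_ws("||", "name", "city"), 256))

dim_name = "reporting.customer_dim"   # hypothetical historized dimension
dim = DeltaTable.forName(spark, dim_name)
current = spark.table(dim_name).filter("is_current = true")

# Rows that are new or changed compared with the current dimension version.
to_insert = (src.join(current.select("business_key", "row_hash"),
                      ["business_key", "row_hash"], "left_anti")
                .withColumn("valid_from", F.current_timestamp())
                .withColumn("valid_to", F.lit(None).cast("timestamp"))
                .withColumn("is_current", F.lit(True))
                .cache())

# Step 1: close the superseded current rows for changed business keys.
(dim.alias("t")
    .merge(to_insert.alias("s"),
           "t.business_key = s.business_key AND t.is_current = true")
    .whenMatchedUpdate(set={"is_current": "false",
                            "valid_to": "s.valid_from"})
    .execute())

# Step 2: append the new row versions (schema assumed to match the table).
to_insert.write.format("delta").mode("append").saveAsTable(dim_name)

In the project this kind of logic sat behind generic, metadata-driven pipelines, so onboarding a new entity meant adding metadata rows rather than writing new pipeline code.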
Database Tasks:
- Created various Azure SQL Database objects such as schemas, tables, sequences, stored procedures, and views.
- Helped business analysts identify the dimensions required by the reports and optimize the data model.
- Created master data and metadata tables, views, and stored procedures for data enrichment and job-run logging.
Documentation:
- Documented processes, data models, data flow diagrams, the ETL architecture, data pipelines, and database objects on Confluence.
Team Activities:
- Participated in Scrum ceremonies: user story creation, estimation, backlog grooming, retrospectives, etc.
DevOps:
- Created code repositories in Azure DevOps and developed CI/CD release pipelines for deployment to the UAT and PROD environments.
- Created CI/CD release pipelines to automatically deploy application code objects from the Dev to the UAT and PRD DevOps repositories.
- Created Azure Key Vault and credentials and integrated them with ADF linked services and activities for retrieving secrets (see the sketch below).
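Within ADF itself, linked services reference Key Vault secrets directly; in Python or Databricks steps of a pipeline the same secrets can be read with the Azure SDK. A minimal sketch, assuming a managed identity or developer login is available and using placeholder vault and secret names:

# Minimal sketch: reading a secret from Azure Key Vault in Python.
# Vault URL and secret name are placeholders, not the project's values.
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

credential = DefaultAzureCredential()   # managed identity or local az login
client = SecretClient(vault_url="https://<your-vault>.vault.azure.net",
                      credential=credential)

sql_password = client.get_secret("sql-connection-password").value
# The value is then injected into the connection string at runtime instead
# of hard-coding credentials in pipeline parameters or notebooks.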
Databricks, Azure Synapse Analytics, Data Warehousing, Microsoft Azure, Microsoft SQL-Server (MS SQL), Python
9/2020 – 6/2023
Role description
Contract Type: Contract
Role: Data Engineer
Project: Energy Data Lake
Project Technology Stack
Cloud Platform: Microsoft Azure
Source System: Azure Blobs, REST API, MS SQL Server, Snowflake, CSV, Excel, XML, etc.
Target System: Microsoft Azure SQL DB, MS SQL Server, Snowflake, CSV
ETL Tool/Programming Language: Talend Data Integration, Azure Data Factory V2, Python
Other programming languages: Python, T-SQL, SnowSQL
Scheduling Tool: Azure Batch Service, Talend Management Console
Big Data, Data Warehousing, Microsoft Azure
3/2020 – 12/2020
Role description
Contract Type: Contract
Role: ETL Developer
Project: Trade & Transaction Regulatory Reporting (TCIS/TAPI)
MiFIR/EMIR transaction regulatory reporting to various NCAs.
Project Technology Stack
Source System: XML Files, Flat Files, Oracle
Target System: Oracle 19c, XML, CSV
ETL Tool: Informatica PowerCenter 10.2
Other programming languages: Python, Oracle SQL & PLSQL, Unix Shell Scripting
Scheduling Tool: Control-M
Informatica
1/2020 – ongoing
Role description
Contract Type: Freelance
Role: ETL Developer
Project: Procurify Integration with DATEV
Read all bill details, including purchase orders, approvals, and attachments, from Procurify, a cloud-based procurement management system, and send them to Flowwer2, the target system of the Procurify DATEV Connector. Flowwer2 is a DATEV-approved tool that can be connected to a specific DATEV client and can transfer structured data as well as attachments to DATEV.
Flowwer2 is used to receive and forward invoice data and the related attachments (invoice.pdf, po.pdf, shipping slip.pdf, and approval log.pdf) to DATEV. A rough sketch of this flow follows.
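Neither the Procurify nor the Flowwer2 API is documented here, so the URLs, field names, and authentication in this Python sketch are purely hypothetical placeholders; it only shows the general shape of the integration (pull approved bills and attachments from one REST API, push structured data plus PDFs to the other).

# Hypothetical sketch of the Procurify -> Flowwer2 flow.
# Endpoints, field names, and auth are assumptions for illustration only.
import requests

PROCURIFY_BASE = "https://example.procurify.com/api/v3"   # hypothetical
FLOWWER_BASE = "https://app.flowwer2.example/api"          # hypothetical
PROCURIFY_TOKEN = "..."                                     # from a secrets store
FLOWWER_TOKEN = "..."

def fetch_bills():
    """Pull approved bills, including purchase order and approval details."""
    resp = requests.get(f"{PROCURIFY_BASE}/bills",
                        headers={"Authorization": f"Bearer {PROCURIFY_TOKEN}"},
                        params={"status": "approved"})
    resp.raise_for_status()
    return resp.json()["data"]

def push_to_flowwer(bill):
    """Send structured invoice data plus PDF attachments on to Flowwer2."""
    files = []
    for att in bill.get("attachments", []):
        pdf = requests.get(att["url"],
                           headers={"Authorization": f"Bearer {PROCURIFY_TOKEN}"})
        pdf.raise_for_status()
        files.append(("attachments", (att["name"], pdf.content, "application/pdf")))
    resp = requests.post(f"{FLOWWER_BASE}/invoices",
                         headers={"Authorization": f"Bearer {FLOWWER_TOKEN}"},
                         data={"invoice_number": bill["number"],
                               "amount": bill["amount"]},
                         files=files)
    resp.raise_for_status()

if __name__ == "__main__":
    for bill in fetch_bills():
        push_to_flowwer(bill)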
ETL
8/2019 – 12/2019
Role description
Deutsche Boerse, Frankfurt am Main through Marlin Green Ltd
Contract type: Freelance
Role: ETL Developer
Project: Regulatory Reporting Hub (RRH)
MiFIR/EMIR transaction regulatory reporting to NCAs, e.g. BaFin, AMF.
Project Technology Stack
Source System: XML Files, Flat Files, Oracle
Target System: Oracle, XML, CSV
ETL Tool: Informatica PowerCenter 10.2
Other programming languages: Oracle SQL & PLSQL, Unix Shell Scripting
Scheduling Tool: Control-M
Responsibilities
- Designed, developed, and maintained Informatica ETL/data pipelines.
- Performance-tuned ETL pipelines for faster loading in the various environments.
- Bug fixing, deployments, production support, and data analysis.
- Read data from XML and flat files and loaded it into the staging and core layers and onward into the delivery area of the Oracle database (illustrated in the sketch after this list).
- Performed various cleansing and data completeness checks.
- Enriched data from various reference/lookup tables and loaded it into the core layer.
- Used transformations such as XML Source Qualifier, XML Parser, XML Generator, Transaction Control, Normalizer, Lookup, and Update Strategy.
- Optimized the performance of Informatica mappings and sessions for faster loads.
- Developed SCD Type 1 and Type 2 mappings to load history data into the data mart.
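For illustration, here is a small Python sketch of the staging-load idea behind these mappings: parse a transaction XML file and enrich each record from a reference lookup, conceptually similar to what the XML Source Qualifier/XML Parser and Lookup transformations do. File names, tag names, and the reference data are invented for this example; the actual implementation consisted of Informatica PowerCenter mappings.

# Illustrative sketch only: parse transaction XML and enrich it from a lookup,
# analogous to Informatica's XML Parser + Lookup transformations.
# File names, tag names, and the reference data are hypothetical.
import csv
import xml.etree.ElementTree as ET

# Reference/lookup data, e.g. an instrument master keyed by ISIN.
with open("instrument_ref.csv", newline="") as f:
    instrument_ref = {row["isin"]: row["instrument_name"]
                      for row in csv.DictReader(f)}

staged_rows = []
for txn in ET.parse("transactions.xml").getroot().iter("Transaction"):
    isin = txn.findtext("ISIN")
    staged_rows.append({
        "transaction_id": txn.findtext("TransactionId"),
        "isin": isin,
        "quantity": txn.findtext("Quantity"),
        # Enrichment step: resolve the ISIN against the reference table.
        "instrument_name": instrument_ref.get(isin, "UNKNOWN"),
    })

# staged_rows would then be bulk-loaded into the Oracle staging table.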
ETL, Informatica
10/2017 – 7/2019
Role description
Commerzbank, Frankfurt am Main through JOB AG Source One GmbH
Contract type: Freelance
Role: ETL Developer
Project: Compliance (CMC & CAF) - AML Reporting - Frankfurt & Singapore
A data integration project that provided data from various banking applications (Murex Cash, Murex Equity, Murex Currency, etc.) for compliance reporting.
Project Technology Stack
Source System: Flat Files, MS SQL Server
Target System: Oracle, Flat Files, Hadoop HDFS
ETL Tool: Informatica PowerCenter 10.1, Informatica BDM
Other programming languages: Oracle SQL & PLSQL, Unix Shell Scripting, UC4 Scripting
Scheduling Tool: Automic UC4
Responsibilities:
- Designed ETL pipelines and the ETL architecture.
- Designed Informatica ETL jobs in line with quality and software development standards.
- Analyzed and designed source-to-target data mappings.
- Analyzed, designed, developed, tested, and documented Informatica ETL programs from detailed and high-level specifications, and assisted in troubleshooting.
- Created project-related documents such as HLD and LLD.
- Created reusable transformations and mapplets.
- Developed ETL data pipelines for change data capture (CDC); see the sketch after this list.
- Created data pipelines to load data into Hadoop HDFS.
- Complex Informatica PowerCenter ETL development and quality assurance.
- Designed and developed various slowly changing dimension loads, e.g. Type 1 and Type 2.
- Responsible for finding bottlenecks and performance tuning at the mapping, session, and database levels.
- Extensive use of active and passive transformations such as Filter, Router, Expression, Source Qualifier, Joiner, Lookup, Update Strategy, Sequence Generator, Rank, and Aggregator.
- Debugged and troubleshot sessions using the Informatica Debugger and Workflow Monitor.
- Implemented various loads such as daily, weekly, and quarterly loads.
- Conducted unit tests, integration tests, performance tests, etc.
- Contact point for problems in the production environment and defect tracking with the business (3rd-level support).
- Supported the deployment team with deployments to the various environments.
- Developed database objects including tables, indexes, views, sequences, packages, triggers, and procedures, and troubleshot database problems.
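The CDC pipelines compared each new extract with the previously loaded snapshot to derive inserts, updates, and (where needed) deletes. A rough PySpark sketch of that comparison follows; the paths, the trade_id key, and the column handling are assumptions for illustration, and the project itself implemented this in Informatica PowerCenter/BDM.

# Rough sketch of snapshot-based change data capture (CDC) in PySpark.
# Paths, key, and column names are assumptions; the project used Informatica.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

new = spark.read.parquet("/data/extract/trades/today/")
old = spark.read.parquet("/data/extract/trades/previous/")

# Hash all non-key attributes so changed rows can be detected cheaply.
hash_cols = [c for c in new.columns if c != "trade_id"]
new = new.withColumn("row_hash", F.sha2(F.concat_ws("||", *hash_cols), 256))
old = old.withColumn("row_hash", F.sha2(F.concat_ws("||", *hash_cols), 256))

# Inserts: keys present today but not in the previous snapshot.
inserts = new.join(old.select("trade_id"), "trade_id", "left_anti")

# Updates: keys present in both snapshots whose attribute hash changed.
updates = (new.alias("n")
              .join(old.select("trade_id", "row_hash").alias("o"), "trade_id")
              .where(F.col("n.row_hash") != F.col("o.row_hash"))
              .select("n.*"))

# Deletes (if required): keys that disappeared since the last extract.
deletes = old.join(new.select("trade_id"), "trade_id", "left_anti")

Hashing the attribute columns keeps the comparison to a single join per change type, which is the same idea the Informatica CDC mappings applied with expression and lookup transformations.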
ETL, Informatica, Oracle applications
6/2015 – 9/2017
Role description
Aldi Süd, Mülheim an der Ruhr through Templeton & Partners Ltd
Contract type: Freelance
Role: ETL Tech Lead
Project: Retail Enterprise Data Warehouse
Project Technology Stack
Source System: MS SQL Server, Flat Files, Oracle
Target System: Oracle Exadata
ETL Tool: Informatica PowerCenter 10.1
Other programming languages: Oracle SQL & PLSQL, Unix Shell Scripting
Scheduling Tool: Informatica Scheduler
Project Methodology: Scrum/Agile
Responsibilities
- Participated in scoping, data quality analysis, source system data analysis, target system requirements, volume analysis, and migration window determination.
- Implemented various loads such as daily, weekly, and quarterly loads.
- Performed data cleansing tasks.
- Performed tests using sample test data in accordance with the client's data migration/integration needs.
- Contact point for problems in the production environment and defect tracking with the business (3rd-level support).
- Helped business analysts refine the mapping specification documents.
- Developed Informatica PowerCenter mappings to move data from staging to the target tables.
- Developed PL/SQL packages, procedures, and functions in accordance with business requirements.
- Documented the various input databases and data sources.
- Debugged and troubleshot sessions using the Informatica Debugger and Workflow Monitor.
- Responsible for finding bottlenecks and performance tuning at the database and ETL levels.
- Created materialized views and partitioned tables for performance reasons.
- Worked on various back-end procedures and functions using PL/SQL.
- Developed UNIX shell scripts to implement various user requirements.
- Designed tables, constraints, views, indexes, etc.
- Developed database objects including tables, indexes, views, sequences, packages, triggers, and procedures, and troubleshot database problems.
- Tuned complex stored procedures for faster execution.
- Responsible for analyzing and implementing change requests.
- Handled changes in compiling jobs and scripts according to database changes.
ETL, Informatica
1/2015 – 5/2015
Role description
HRS, Köln through Informationsfabrik GmbH
Contract type: Freelance
Role: Senior ETL Consultant
Project: Hotel Enterprise Data Warehouse
Project Technology Stack
Source System: MS SQL Server, Flat Files, Oracle, XML
Target System: Sybase IQ
ETL Tool: Informatica PowerCenter
Other programming languages: Oracle SQL & PLSQL, T-SQL, Unix Shell Scripting
Scheduling Tool: Control-M
Project Methodology: Waterfall
Data Modelling: Data Vault (Dan Linstedt)
Responsibilities
- Used Data Vault as the data modelling approach for the Hotel Enterprise Data Warehouse.
- Defined the ETL architecture to load the Data Vault model and the data mart.
- Analyzed source data to identify candidates for hub, link, and satellite tables.
- Developed Informatica PowerCenter mappings, sessions, and workflows to load the hub, link, and satellite tables (the hash-key convention is sketched after this list).
- Added hub, link, and satellite tables, including business keys, surrogate keys, and descriptive satellite attributes, to the Data Vault model.
- Implemented various loads such as daily, weekly, and quarterly loads.
- Performed data cleansing tasks.
- Performed tests using sample test data in accordance with the client's data migration/integration needs.
- Contact point for problems in the production environment and defect tracking with the business (3rd-level support).
- Developed Informatica PowerCenter mappings to move data from staging to the core and data mart layers.
- Documented the various input databases and data sources.
- Debugged and troubleshot sessions using the Informatica Debugger and Workflow Monitor.
- Developed UNIX shell scripts to implement various user requirements.
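Hub and link loads in a Data Vault model are keyed by hashes computed from the normalised business keys. A small Python sketch of that convention follows; the column names and the MD5 choice are illustrative assumptions, and the actual loads were built as Informatica PowerCenter mappings.

# Illustrative Data Vault hash-key calculation in Python (pandas).
# Column names are hypothetical; the real loads were Informatica mappings.
import hashlib
import pandas as pd

def hash_key(*parts: str) -> str:
    """MD5 over the concatenated, normalised business key parts."""
    normalised = "||".join(p.strip().upper() for p in parts)
    return hashlib.md5(normalised.encode("utf-8")).hexdigest()

bookings = pd.DataFrame({
    "hotel_code": ["H001", "H002"],
    "booking_no": ["B-10", "B-11"],
})

# Hub key: one hash per business key.
bookings["hub_hotel_hk"] = bookings["hotel_code"].map(hash_key)
# Link key: hash over the combination of the related business keys.
bookings["link_booking_hotel_hk"] = bookings.apply(
    lambda r: hash_key(r["hotel_code"], r["booking_no"]), axis=1)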
Data Warehousing, ETL, Informatica
1/2014 – 9/2014
Role description
Karstadt, Essen through Questax AG
Contract type: Freelance
Role: Senior ETL Consultant
Project: Karstadt Information Systems for Measures and Analytics (KARISMA)
The goal of this project was to create a centralized analytical and reporting system for Karstadt Warehouse GmbH. The major part of the project was to replace the existing SAP BW reporting system and build a new enterprise data warehouse with Informatica PowerCenter 9.5.1 for ETL and Cognos 10 for reporting. Informatica PowerExchange 9.5.1 with BCI (Business Content Integration) and data integration via ABAP methods were used to connect to the Karstadt SAP Retail system and read data from standard and customized SAP DataSources. IBM Netezza 7 was used as the target system with Informatica PowerExchange for Netezza.
Project Technology Stack
Source System: SAP, IDOC, Flat Files, XML
Target System: IBM Netezza
ETL Tool: Informatica PowerCenter 9.5, Informatica PowerExchange 9.5
Other programming languages: Oracle SQL & PLSQL, Unix Shell Scripting
Scheduling Tool: Informatica Scheduler
Project Methodology: Waterfall
Data Warehousing, ETL, Informatica
7/2011 – 12/2013
Role description
Deutsche Bank, Frankfurt through Datamatics Global Solutions GmbH & Hays AG
Job type: Employee & contract type: Freelance
Role: Senior ETL Consultant
Informatica PowerCenter ETL development and support for data migration and data integration projects.
Projects:
#1 Retail Banking - Postbank Savings Deposit Accounts Migration
This project involved migrating savings deposit account data from mainframe IBM z/OS systems to the SAP Deposit Management application using PowerCenter 9.1 HF1 and PowerExchange 9.1. It involved reading data from flat files, mainframe data sets, and Oracle, and writing data into flat files that were then uploaded into the SAP Deposit Management application. PowerExchange 9.1 was used to connect to the mainframe and read the mainframe data sets; Informatica PowerCenter 9.1 handled the extraction, transformation, and loading of data into the target systems. The project had only a single load, i.e. a one-time migration, and involved the extraction, transformation, and loading of 250 to 500 million records.
#2 Retail Banking - Postbank Savings Deposit Accounts Integration
This project involved integrating savings deposit account data from SAP and mainframe systems into the Oracle enterprise data warehouse used for retail banking reporting at Deutsche Bank, Germany. Informatica PowerCenter 9.1 HF1 was used for the extraction, transformation, and loading of data into the target systems. The project had several loads (daily, weekly, monthly, quarterly, and YTD), implemented using incremental loading (change data capture) and slowly changing dimension mappings, and involved the extraction, transformation, and loading of 30 to 50 million records.
#3 Retail Banking - Auto Deployment
Deutsche Bank started this project to save significant time in deploying all ETL components, e.g. Informatica PowerCenter workflows, Informatica PowerExchange data maps, parameter files, shell scripts, etc. The project helped Deutsche Bank reduce the time deployers spent on deployments to multiple environments, made deployments error-free, and hence reduced cost.
#4 Retail Banking - LDAP Integration
Data Warehousing, ETL, Informatica
11/2010 – 6/2011
Role description
American Home Mortgage Servicing Inc, Texas through Hitachi Consulting Pvt Ltd, Pune
Job type: Employee
Role: Senior ETL Consultant
Project: Home Mortgage Enterprise Data Warehouse
Data Warehousing, ETL, Informatica
10/2008 – 6/2010
Role description
Sigma Systems, Pune
Job Type: Employee
Role: Software Engineer
Oracle, Unix, Java Development & Support
Oracle applications, UNIX, Java (general)
Certificates
Education
Pune, India
About me
- Highly proficient in the architecture, design, development, implementation, and support of ETL/ELT data processing pipelines for data warehouses and data lakes using tools such as Microsoft Azure Data Factory V2, Informatica PowerCenter, and Talend.
- Proficient in designing and customizing data models using data modelling techniques such as dimensional modelling and Data Vault modelling.
- Worked with various structured and semi-structured data sources such as SAP, Azure SQL Database, REST APIs, CSV, XML, JSON, and Parquet.
- Designed various data layers, e.g. staging, core, and reporting layers, for efficient and timely data processing with high quality.
- Worked closely with data architects, business analysts, product owners, and tech leads to help draft the design, architecture, requirement specifications, etc. for ETL/ELT data processing pipelines.
- Worked extensively in Scrum projects with high involvement in the various phases, e.g. sprint planning, creating user stories and tasks, etc.
Additional skills
Personal details
- English (native speaker)
- German (basic knowledge)
- European Union