Big Data, Python, Data Vault, BI, DWH
- 0 references
- on request
- 12207 Berlin
- DACH region
- ur | en | de
- 05.11.2024
Short profile
Qualifications
Project & professional experience
4/2022 – present
Role description
AWS Cloud Adoption and Infrastructure Consulting
Advised chemical industry clients on AWS cloud adoption, data management, and infrastructure architecture, establishing a robust data platform. Proposed architecture solutions, evaluating AWS building blocks to align with client needs, while actively contributing to project planning and team communications.
Project and Incident Management
• Coordinated with Cyber Security and Group Admins for secure implementations and incident management.
• Managed IT demands and change requests in ServiceNow, enhancing efficiency.
• Handled GitHub team permissions, ensuring secure project access.
• Facilitated Databricks GitHub app requests, managing repos for various environments.
Data Lake Management and AWS Setup
• S3 Data Collection: Led sessions to define data categories for S3, aligning with company classifications.
• AWS Transfer Family with Lambda: Implemented Transfer Family with Lambda for custom authentication, enhancing data security.
• IAM Policies: Consulted on IAM policy setup for secure, cross-account S3 access.
• Redshift Setup: Managed Redshift cluster setup, consulting on distribution/sort keys and COPY commands for optimisation.
• Liquibase: Advised on Liquibase adoption for database versioning in CI/CD.
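The Transfer Family custom-authentication pattern above can be sketched as a Lambda identity-provider handler. This is a minimal sketch under assumptions: the role ARN, bucket name, and credential check are hypothetical placeholders, not the project's actual implementation.

```python
# Minimal sketch of an AWS Transfer Family custom identity provider Lambda.
# The role ARN, bucket name, and credential check below are hypothetical.
def lambda_handler(event, context):
    username = event.get("username", "")
    password = event.get("password", "")

    if not _credentials_valid(username, password):
        return {}  # empty response means access denied

    return {
        "Role": "arn:aws:iam::123456789012:role/transfer-user-role",  # hypothetical
        "HomeDirectoryType": "LOGICAL",
        # Chroot each user into their own prefix of the bucket.
        "HomeDirectoryDetails": (
            '[{"Entry": "/", "Target": "/example-bucket/%s"}]' % username
        ),
    }


def _credentials_valid(username, password):
    # Hypothetical stand-in; a real implementation would check
    # Secrets Manager or an identity provider, never literals.
    return bool(username) and bool(password)
```

In practice the response's `Role` and `HomeDirectoryDetails` are what Transfer Family uses to scope each SFTP user to their S3 prefix.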
Architecture and Security Management
• AWS Account Documentation: Documented purpose, environment classification, IAM policies, S3 access, service principals, GitHub OIDC integration, and VPC connections to on-prem via Transit Gateway.
• Subnet and CIDR Blocks: Designed and documented subnet layouts for network security.
• Firewall Management: Controlled firewall openings/closings per compliance standards.
• Resource Monitoring: Monitored AWS resources, optimizing costs.
• Databricks Access: Configured SCIM passthrough and meta-IAM roles for identity management.
• GitHub OIDC Integration: Established IAM roles for GitHub-to-AWS access via OIDC.
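The GitHub-to-AWS OIDC access above rests on an IAM role whose trust policy restricts which repository may assume it. A minimal sketch of such a trust policy follows; the account ID and repository are hypothetical, while the provider URL and condition keys are the ones AWS and GitHub define.

```python
import json

# Hypothetical account and repository; only the OIDC provider URL and
# condition keys are fixed by AWS/GitHub.
ACCOUNT_ID = "123456789012"
REPO = "example-org/example-repo"

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": f"arn:aws:iam::{ACCOUNT_ID}:oidc-provider/token.actions.githubusercontent.com"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    "token.actions.githubusercontent.com:aud": "sts.amazonaws.com"
                },
                "StringLike": {
                    # Restrict role assumption to this one repository (any branch).
                    "token.actions.githubusercontent.com:sub": f"repo:{REPO}:*"
                },
            },
        }
    ],
}

print(json.dumps(trust_policy, indent=2))
```

Tightening the `sub` condition (e.g. to a single branch or environment) is the main lever for limiting which workflows can reach AWS.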
Databricks Data Engineering and Data Transfer
• Unity Catalog: Deployed with Terraform for data governance.
• Delta Tables and Workflows: Created Delta Tables and workflows for streamlined data processing.
• Python Notebooks: Built notebooks for analysis, including data cleanup and public datasets (ELWIS, PEGELONLINE WSV, DWD).
• Data Transfer and Monitoring: Developed Shell/SFTP scripts and managed BOXI exports over SFTP for data archiving.
Outcome: Enabled cloud adoption for data and analytics needs (including CI/CD), increased data acquisition, and delivered flexible data pipelines for reporting use cases.
Tools: Project Data Management, AWS, Architecture, Python, RDS, Redshift, DMS, S3, Lambda, Transfer Family, Databricks, Spark, Subnets/CIDR allocation, IAM Roles, IAM Policies, Bucket Policies, Microsoft Graph API, Power BI, SAP BI (BOXI)
Amazon Web Services (AWS), Cloud Services, Cloud Computing, Cloud Specialist, Databricks, Data Management, IT Consultant, Project Management (IT), SAP BusinessObjects (BO)
2/2022 – 9/2024
Role description
Database construction and structures following Data Vault design, 3NF, and data marts (consumption layer).
Engaged as a Data Vault 2.0 modeler with the Group Data Office of an insurance and reinsurance business, working on the group-wide implementation of commercial insurance (PLC) and reinsurance data collection, sourced from operating entities across various geographical locations.
• Updating and enhancing the Group Commercial Common/Unified Data Model (per Data Vault 2.0).
• Updating Data Standards for the Data Model.
• Enhancing Global Definitions (Global Business Glossary).
• Developing business Data Examples for Operating Entities.
• Writing Mapping Guidelines.
• Unifying CAT Risk 3NF Data Model into group Commercial DV 2.0 Data Model.
• Participation in Enterprise Ontology clarification workshops.
• Business examples and presentation slide decks for various scopes in the model: Reinsurance (FAC/Treaty/Quota Share), Incident/Claim classifications, Policy and Object Terms (L&Ds), and IIP (International Insurance Programs), to name a few.
• Data Mapping and Validation Example preparations, show-casing bi-directional mapping between group Commercial DV 2.0 Data Model and CAT Risk and Cyber 3NF Data Models.
• Writing detailed Mapping Guideline for Data Deliveries into Group Commercial Common Ingestion DV2.0 Data Model.
• Led solutioning in Data Model Workshops.
• Unifying Claims Data Model into group Commercial DV 2.0 Data Model.
• Participation in Workshops with Data Architects from other Business units.
• Member of Community of Practitioners on Business Intelligence, Architecture and Data Modeling.
Outcome: Enabled Group Data Reporting for Portfolio steering on Common Data Standards and Global Business Glossary
Tools: SqlDBM, PostgreSQL, Azure Synapse Analytics Pool DB SQL, GitHub, Confluence, Informatica Axon (GBG), SharePoint, Excel, PowerPoint, Enterprise Ontology
Azure Synapse Analytics, Data Vault, Data Architect, Data Modelling, PostgreSQL
10/2021 – 1/2022
Role description
Engaged as a Data Vault 2.0 developer for the implementation of IFRS 17 accounting standard requirements and adaptations at an insurance company in Munich. As an Integration Layer developer, I was mainly responsible for:
• Overseeing operations in Integration Layer
• Analysing and Developing new Raw Vault and Business Vault hubs, links and satellites
• Ensuring relationships between entities loaded from different source systems.
• Integrating new file based data source for Top Adjustments for month end closings.
• Working with Effectivity Satellites
• Met requirements for ledger-specific postings (GAAP codes) and automated reversals.
• Provided Loading Templates for Premiums (Policy), Claims and Cash transactions.
• Analysing Data Quality errors
• Generating SQL packages and carrying out deployments.
• Preparing Visual Data Vault diagrams.
• Documentation in Confluence
• Exchanging with other IL Developers, Engineering Manager and Product Managers.
• Raising pull requests in GitHub and merging them to master after a successful review.
• Creating Test Cases in Tricentis TOSCA
• Working in agile 10-day sprints, creating user stories and allocating story points.
• Participating in Feature Grooming sessions.
• Using Jenkins to manage and schedule Runtime jobs
Tools: SQL Developer, Oracle, GitHub, Tricentis Tosca, Eclipse, Jenkins
Oracle Database, SQL Developer, Eclipse
5/2021 – 10/2021
Role description
Working as project manager for a GDPR implementation in data lake storage at a real estate platform company.
• Responsible for coordination between tech team and legal.
• Engagement with external DPO, to clarify GDPR requirements.
• Overseeing team planning.
• Stakeholder management.
• Communication with data consumers and producers.
Tools: Miro, Confluence, Jira, project documentation
Data privacy, project management
1/2021 – 7/2021
Role description
Working as a data engineer for a fashion eCommerce shop on a migration project: migrating fashion product data from the community edition of the Product Information Management system (PIM, Akeneo) to the new Enterprise version.
The solution was developed in Python to pull data from the old PIM via APIs, join and transform it, and publish it into the new PIM via APIs.
• Performed Data Mapping between old Data Structures and new Data Structures.
• Finalized Data Transformation requirements analysis.
• Wrote data processing and transformation modules.
• Wrote module responsible for interacting with APIs.
• Implemented API authentication.
• Managed mapping and transformation rules in JSON files.
Python Libraries:
requests, multiprocessing, json, logging
Technologies: Python 3.7, PyCharm, Akeneo PIM, RESTful APIs, JSON, CSV
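The migration flow described above (pull from the old PIM API, transform via JSON-based mapping rules, publish to the new PIM) can be sketched roughly as follows. This is a simplified sketch: the endpoints, mapping-file shape, and `http` client interface are assumptions, not Akeneo's actual API.

```python
import json

# Hypothetical endpoints; Akeneo's real API paths and payloads differ.
OLD_API = "https://old-pim.example.com/api/products"
NEW_API = "https://new-pim.example.com/api/products"

def load_mapping(path):
    """Mapping/transformation rules kept in a JSON file: {old_field: new_field}."""
    with open(path) as fh:
        return json.load(fh)

def transform(product, mapping):
    """Rename one product record's fields according to the mapping,
    dropping fields with no target in the new data structure."""
    return {new: product[old] for old, new in mapping.items() if old in product}

def migrate(http, mapping):
    """Pull all products from the old PIM, transform, publish to the new PIM.
    `http` is any object with the requests.Session get/post interface."""
    resp = http.get(OLD_API)
    resp.raise_for_status()
    for product in resp.json():
        http.post(NEW_API, json=transform(product, mapping)).raise_for_status()
```

Keeping the field mapping in a JSON file (as the project did) lets the transformation rules change without touching the code.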
API Developer, Python, JSON, Representational State Transfer (REST), product/assortment development
7/2020 – 9/2021
Role description
Developing and maintaining middleware RESTful APIs for integration use cases, such as:
• between SAP and Salesforce
• DocuSign and Salesforce,
• Microservices, legacy ERPs and Internal Systems that maintain Parts information
• GCP
Working with JSON, XML, and IDoc (SAP) formats to develop integrations and gateways. Analysis of source SAP data structures and data models, Salesforce, and real-time microservices. Data mapping between SAP, Salesforce, and the microservices.
Implementation of HMAC request signing.
• SQL development
• Stored procedure development
• Use of Transaction Management
• Exception Handling.
Regularly prepared Swagger specifications, Test Evidence and UAT documentation.
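The HMAC implementation mentioned above typically means signing the raw request body with a shared secret so the receiving system can verify integrity and origin. A minimal sketch, assuming HMAC-SHA256 over the body (the exact algorithm and header conventions in the project may differ):

```python
import hashlib
import hmac

def sign(body: bytes, secret: bytes) -> str:
    """Hex-encoded HMAC-SHA256 signature of a request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()

def verify(body: bytes, secret: bytes, signature: str) -> bool:
    """Recompute and compare in constant time to avoid timing attacks."""
    return hmac.compare_digest(sign(body, secret), signature)
```

The signature would normally travel in a request header agreed between the two systems; `hmac.compare_digest` rather than `==` is the important detail on the verifying side.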
Technologies: MSSQL Server 2014 (SQL/T-SQL), Middleware (IBM App Connect Studio), JSON/XML, CSV, Swagger, GCP, RESTful APIs
Transact-SQL, Microsoft SQL Server (MS SQL), API Developer, XML, JSON
5/2020 – 7/2020
Role description
As a Data Vault developer, I was responsible for enriching Raw Vault and Business Vault with new Satellites containing SAP Bookings data.
• Analysis of existing Data Vault Model and Data Loading routines.
• Performed Extension in existing Data Vault Model.
• Creation of new Links and Satellites in Raw and Business Vault.
• Working with Events Data Processing in Data Vault.
Creation of stored procedures for daily loading of the new data source. Enabled data extraction through BCP and PowerShell.
Data Warehousing, Transact-SQL, Data Vault, Microsoft SQL Server (MS SQL)
11/2019 – 2/2020
Role description
Fast-paced, intensive development of multiple data integration modules in Python on Linux in Docker.
These developments enabled data integration and provision for a new web-based software:
• Integration with NiFi over the NiFi REST API. This component works with JSON retrieved from the NiFi REST API to traverse the NiFi flow, process groups, and processors.
• Metadata handling component. This component handles data type conversions for the target MySQL database, based on metadata derived from big data AVRO primitive data types.
• Component to download large CSV files over REST APIs with streams (using requests' iter_content) in parallel threads.
• Loading of CSV files into MySQL using the LOAD DATA INFILE command over SQLAlchemy (+PyMySQL).
• Data integration job: a main Python job that combines the other components.
Python Libraries:
requests, SQLAlchemy, multiprocessing, pandas, dotenv, logging
Technologies: Python 3.7, Linux, Docker, PyCharm, Liquibase, PuTTY, RealVNC, Citrix.
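The streaming CSV download and LOAD DATA INFILE steps above can be sketched as follows. This is a minimal sketch: the chunk source stands in for `response.iter_content(...)` from a `requests` GET with `stream=True`, and the table/file names and CSV dialect are illustrative assumptions.

```python
def stream_to_file(chunks, fh):
    """Write an iterable of byte chunks (e.g. response.iter_content(1 << 20)
    from a streamed requests GET) to a file object without holding the
    whole CSV in memory."""
    for chunk in chunks:
        if chunk:  # skip keep-alive chunks
            fh.write(chunk)

def build_load_sql(csv_path, table):
    """LOAD DATA statement of the kind executed over SQLAlchemy (+PyMySQL).
    Column list and CSV dialect are illustrative; real statements should
    enumerate the target columns."""
    return (
        f"LOAD DATA LOCAL INFILE '{csv_path}' "
        f"INTO TABLE {table} "
        "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' "
        "LINES TERMINATED BY '\\n' "
        "IGNORE 1 LINES"
    )
```

`LOAD DATA [LOCAL] INFILE` bypasses row-by-row INSERTs, which is what makes bulk-loading large CSVs into MySQL fast; the server must have `local_infile` enabled for the LOCAL variant.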
ETL, MySQL, Python
4/2019 – 7/2019
Role description
Experience with Python.
Data Vault 2.0-based data lake development and consulting with Hadoop Cloudera (CDH) and Amazon stacks.
Hadoop Cloudera (CDH):
- GDPR-compliant HDFS data lake using the AVRO file format.
- Hive/Impala based Data Vault Entities & Information Mart.
Amazon S3 and Redshift:
- S3 based Data Lake and external Athena/Redshift tables.
- Redshift based Data Vault and Virtualised Information Mart.
Pre-computed hash keys materialised as AVRO files in the lake.
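Pre-computing Data Vault hash keys as described above usually means hashing the concatenated, normalised business keys. A minimal sketch; MD5 and the `||` delimiter are common Data Vault 2.0 conventions assumed here, not confirmed project specifics.

```python
import hashlib

def hash_key(business_keys):
    """Data Vault style hash key: trim and uppercase each business key,
    join with a delimiter, and hash the result (MD5 by convention)."""
    normalised = [str(k).strip().upper() for k in business_keys]
    payload = "||".join(normalised)
    return hashlib.md5(payload.encode("utf-8")).hexdigest().upper()
```

Normalising before hashing is the important part: it keeps the same business key hashing to the same hub key regardless of source-system casing or padding, which is what makes materialising the keys once in the lake safe.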
Technologies: Python 3.7, AWS, S3, Redshift, DMS, SQS, Cloudera, Avro, Hive, Impala
Apache Hadoop, Big Data, Python, Amazon Web Services (AWS)
9/2018 – 8/2019
Role description
● REST APIs and gateways with JSON, XML, IDocs, and JavaScript.
● Data structure/model analysis between SAP, Salesforce, and real-time microservices, with the respective data mappings.
● Development in MS SQL Server 2014 SSMS.
● SQL development and stored procedure development with Transaction Management and Exception Handling.
● Test Evidence and UAT documentation.
Technologies: MSSQL Server 2014, Middleware, JSON/XML, CSV
Microsoft SQL Server (MS SQL), IDoc, JSON, Representational State Transfer (REST)
8/2017 – 3/2018
Role description
Part of FE team, responsible for implementation of MicroStrategy Use Cases for the retail business.
Responsibilities include:
● Business validation of requirements, with RE & architecture teams.
● Solution concept workshops, with architecture & business teams.
● Implementation of MicroStrategy use cases (packages 2 & 3).
● Liaising between backend and frontend teams.
● Extensive development experience with MSTR Documents.
● Use of Panel Stacks, Selectors, Grids, and Graph components.
● Use of multiple datasets.
● Extensive experience with Visual Insights and OLAP metrics.
● Datasets with level and derived metrics.
Technical feats include:
● Use of Transaction Services.
● Mapping of Attributes (IDs, Forms).
● Parent-Child relationships & Hierarchies.
● Use of multiple Datasets, based on multiple Data Marts.
Advanced Topics include:
● Setting up MicroStrategy Job Prioritisations
● iCube Optimisation & Incremental Refresh reports.
Operational tasks include bi-weekly deployments.
MicroStrategy, Data Warehousing, Oracle applications
9/2015 – 2/2017
Role description
I was responsible for leading the BI and analytics function of the company, as a member of the management team. Close cooperation with other heads, team leads, and C-levels. Vendor management (MicroStrategy). Streamlined many data acquisition, processing, and KPI calculation challenges (e.g. payment processing). Built visualisations and dashboards, together with maths-intensive calculations for returns and portfolio performance.
Responsible for leading BI and Analytics function of Crosslend.
Close collaboration with Executives, Marketing, Operations, Finance, Product, Engineering and DevOps.
Enabled self-service BI, rolled out MicroStrategy.
Investor Fact sheets and pitch-decks.
Financial Metrics, IRRs, Annualized Net Returns (unadjusted) and Default Curves.
Marketing Performance dashboards and reports (per Channel).
Customer Insights for Operations and CC team.
Payment processing and overdue related KPIs.
Visualizations, simulations and correlations.
Successful closing of audits (positive opinion).
MicroStrategy, Data Warehousing, Business Intelligence (BI), MySQL, ETL
7/2014 – 8/2015
Role description
I was responsible for the Data Warehouse architecture and for managing the company's relationship with Exasol (service provider). Trained and hired DWH engineers, built the overall DWH architecture and infrastructure, integrated unstructured NoSQL data (MongoDB), modeled the company's core business tables, wrote a finite state machine (for IFRS-based classification), and successfully closed audits (a prerequisite for Series B funding).
Responsible for Data Warehouse Architecture and Data Engineering Team.
Managing Data Warehouse technology infrastructure and service providers.
Data Modelling company core Revenue and Accounting Fact tables.
Marketing data mart, performance data at campaign and keyword level (Hierarchy).
Finite state machine (for IFRS based classification) and Payment Waterfall calculations.
Data historisation design concepts.
Integration of unstructured NoSQL data (MongoDB).
Successful closing of audits and series B funding.
Tech-stack: Exasol, MongoDB, Postgres, Pentaho Kettle, Python, and Lua.
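The finite state machine for IFRS-based classification mentioned above can be sketched as a transition table folded over a stream of payment events. The states and events below are hypothetical simplifications for illustration, not the actual classification rules used.

```python
# Hypothetical loan/payment state machine; real IFRS classification
# rules are considerably more involved.
TRANSITIONS = {
    ("current", "payment_missed"): "overdue",
    ("overdue", "payment_received"): "current",
    ("overdue", "grace_expired"): "defaulted",
    ("defaulted", "debt_recovered"): "recovered",
}

def classify(events, state="current"):
    """Fold a sequence of payment events into a final classification state.
    Unknown (state, event) pairs leave the state unchanged."""
    for event in events:
        state = TRANSITIONS.get((state, event), state)
    return state
```

Encoding the rules as a `(state, event) -> state` table keeps the classification auditable: the full rule set is data, not branching code, which suits an audit-facing calculation.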
Online Analytical Processing, Data Warehousing, Open Source, PostgreSQL, MongoDB, ETL, database development, Lua scripting, Python
8/2012 – 6/2014
Role description
I was part of the ERP/MIS team, responsible for the customer analytics pipeline, carrying out a wide set of responsibilities and functions. Handled aggregation requirements using Hadoop (Java Map/Reduce) on an Oracle, Exasol, and Pentaho Kettle technology stack. Led the Oracle DWH migration to new hardware, rewrote legacy ETLs, migrated IBM Unica CRM in-house, and managed freelancers.
Responsible for Customer pipeline within Zalando BI.
Cohort trend analysis for customers (hyperlink removed).
Analysis of website click log files using Hadoop (Java Map/Reduce).
Design and Development of Customer Survey Data (Oracle PL/SQL).
Interfacing operational subset for forecast analysis (Exasol).
Migration of IBM Unica CRM in house. Redesigning CRM Data Model and simplifying ETLs.
Leading migration of Oracle DB to new HW, improving backup and recovery options.
Tech-stack: Hadoop, Oracle, Exasol, Pentaho Kettle, PostgreSQL, and BusinessObjects
Apache Hadoop, Data Warehousing, SAP BusinessObjects (BO), PostgreSQL, Oracle Database, ETL, CRM consulting (general), Enterprise Resource Planning
Certificates
SqlDBM
dbt Labs
Education
Lahore, Pakistan
About me
Further skills
● Data modelling, Data Vault 2.0, data standards, data governance: 2+ years of experience
● Data Vault 2.0 certified.
● SqlDBM Fundamentals certified.
● dbt Fundamentals certified.
● Oracle certified (11g DBA).
● MicroStrategy, Oracle, Exasol, Python
Currently I am expanding my skills in Databricks.
Personal details
- English (fluent)
- German (fluent)
- Urdu (native)
- European Union
Contact details