Senior Data Engineer / MLOps
- 0 references
- on request
- 28203 Bremen
- on request
- de | en | fr
- 11.06.2024
Short Introduction
Qualifications
Project & Professional Experience
6/2022 – 12/2022
Project Description
Cluster Migration of Internal Data Warehouse
As data volumes at eCommerce companies continue to grow and the number of data consumers within the
organization increases, old infrastructure can sometimes no longer keep up with the challenges. In this
particular case, the computing and warehousing cluster additionally had to be on-premise for data security
reasons. After new cluster infrastructure had been provisioned by an external provider, all data warehouse
and computing logic had to be migrated from the old infrastructure to the new one. An additional challenge
was to maintain backwards compatibility of the migrated processes at all times.
Key Achievements:
• Migration and deployment of 30+ Airflow DAGs with 20–50 tasks each on the new infrastructure
• Co-development of a Python client library for Apache Livy that is used by 100+ Airflow tasks
• Deployment of 20+ Apache Hive databases with 10–50 tables each across three data warehouse layers via
Ansible
• Code review of 5–10 merge requests per week
Technologies: Apache Airflow, Python, Apache Hive, Apache Spark, PySpark, Apache Livy, Apache Hadoop,
Ansible
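A client library like the one above essentially wraps Livy's REST API so that Airflow tasks can submit Spark work over plain HTTP. A minimal sketch, assuming Livy's standard `POST /batches` endpoint (the class name, URL, and payload fields here are illustrative; the actual internal library differs):

```python
import json
from urllib import request


class LivyBatchClient:
    """Minimal sketch of a Livy client: builds and submits a Spark
    batch job via Livy's POST /batches REST endpoint."""

    def __init__(self, base_url):
        self.base_url = base_url.rstrip("/")

    def build_payload(self, file, class_name=None, args=None):
        # Livy's /batches endpoint accepts "file", "className", "args", ...
        payload = {"file": file}
        if class_name:
            payload["className"] = class_name
        if args:
            payload["args"] = args
        return payload

    def submit(self, **kwargs):
        # Sends the payload to Livy; the response contains the batch id
        # and its initial state (e.g. "starting").
        body = json.dumps(self.build_payload(**kwargs)).encode()
        req = request.Request(
            f"{self.base_url}/batches",
            data=body,
            headers={"Content-Type": "application/json"},
        )
        with request.urlopen(req) as resp:
            return json.load(resp)
```

An Airflow task would then call `submit(file="local:/jobs/etl.py", args=[...])` and poll the returned batch id until completion.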
2/2020 – 6/2022
Project Description
Platform for Real Time Fraud Detection in eCommerce
In order to prevent financial and reputational loss on eCommerce platforms, automated detection of fraud
patterns in online shops is needed. The software should be able to scale out over multiple shop systems and data
sources. Further requirements are monitoring traffic in real time and incorporating expert knowledge alongside
machine learning models.
Key Achievements:
• Leading the design of the platform
• Implementation of a proof of concept from which 80% of the code made it into the first product iteration
• Technical lead for a team of 5 developers
• Successful deployment and zero-downtime operations on customer premises at around 15 million events
per day
• Design of a cloud-based testing environment that can be brought up in less than 15 minutes (Infrastructure
as Code) and handle up to 10 times the production workload
Technologies: Apache Flink, Apache Kafka, Redis, Terraform, AWS, Kubernetes, Helm, Docker, Datadog
6/2018 – 2/2021
Project Description
Web Tracking Event Pipeline with Snowplow on AWS
For an eCommerce platform it is crucial to have a detailed picture of customer behaviour on which business
decisions can be based, either in real time or from the data warehouse. This calls for a flexible, scalable,
and field-tested solution that can run in the cloud. Additionally, all browser events need custom enrichment
with business information from the backend in order to provide the necessary context, e.g. for "Add to
Cart" events.
Key Achievements:
• Integration of the Snowplow event pipeline into the cloud-based shop architecture
• Day-to-day operations of the event pipeline at ca. 4 million events per day
• Co-engineering of the custom enrichment in the webshop backend (ca. 1,000+ lines of code) and handover
of ownership to the backend team
• Setup of custom real-time event monitoring (< 1 s latency) with Elasticsearch and Kibana
• Setup of custom scheduling and deployment processes for 5 components of the Snowplow event pipeline
Technologies: Snowplow, Kubernetes, Amazon EMR, Amazon Kinesis, Amazon Redshift, Apache Airflow, Kibana,
Elasticsearch, NodeJS, GitLab CI
6/2018 – 1/2021
Project Description
Product Recommendation Engines: Collaborative Filtering and Item
Similarity with Neural Nets
To enrich the customers' shopping experience and to drive additional sales, the eCommerce platform should
be able to recommend additional products to customers. Two orthogonal strategies are employed: product
similarity based on neural network embeddings, and collaborative filtering based on user behaviour.
Additionally, performance monitoring for the recommendations is needed.
Key Achievements:
• Productionization of both models based on proofs of concept by an ML engineer, including data acquisition,
model execution, and data output
• Scheduling and operations of the productionized models, spanning 3 different code bases and more than 5
regularly scheduled jobs
• Operationalization of 10+ performance metrics across 5 dashboards for stakeholders
Technologies: Python Keras, Python Pandas, Amazon EMR, Apache Mahout, Amazon Redshift, Apache Airflow,
Apache Superset
6/2017 – 2/2021
Project Description
ETL Pipeline Architecture with Apache Airflow and Kubernetes
A data-driven company needs reliable and scalable infrastructure as a key component of its corporate
decision making. Engineers as well as analysts need to be enabled to create ETL processes and ad-hoc reports
without having to consult a data engineer. The data architecture of the company needs to provide
scalability, a clear separation between testing and production, and ease of use.
Key Achievements:
• Leading the conception of the cloud-based infrastructure based on the above requirements
• Initial training of 5 developers and onboarding of more than 10 developers since
• Initial setup and operation of Apache Airflow with ca. 10 jobs initially, scaling up to more than 100
regularly scheduled jobs at present
Technologies: Apache Airflow, Kubernetes, Docker, AWS, GitLab CI
2/2017 – 11/2018
Project Description
A/B Testing Platform
In order to enable an eCommerce organization to become a data-driven organization, there must be (among other
things) a framework in place to compare different versions of the website against each other. Many members of
the organization and its departments need to be able to create and conduct experiments without the assistance
of a data engineer. Another important factor for the framework was the use of Bayesian statistics.
Key Achievements:
• Leading the conception of the testing framework, including randomization logic, statistical modelling, and
graphical presentation in the frontend
• Implementation of a proof of concept for the statistical engine
• Implementation of production code for the frontend, backend, and statistical engine
• Training of stakeholders from 3 different departments in the methodology and statistical background of A/B
testing
Technologies: Python PyMC3, Python SciPy, Apache Spark, PySpark, Apache Airflow, Docker,
Kubernetes, VueJS, Redshift
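The core idea of a Bayesian statistical engine for A/B testing can be illustrated with a conjugate Beta-Binomial model: sample conversion rates from each variant's posterior and count how often B beats A. A minimal stdlib sketch with made-up numbers (the production engine used PyMC3 and a richer model):

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, samples=100_000, seed=42):
    """Monte Carlo estimate of P(rate_B > rate_A) under independent
    Beta(1, 1) priors, updated with the observed conversion counts."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(samples):
        # Posterior for each variant: Beta(1 + conversions, 1 + non-conversions)
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / samples


# e.g. variant A: 120/1000 conversions, variant B: 150/1000 conversions
p = prob_b_beats_a(120, 1000, 150, 1000)
```

Unlike a frequentist p-value, this posterior probability can be reported directly to non-technical stakeholders as "the chance that B is better than A".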
Certificates
Education
Kiel
About Me
Further Skills
TensorFlow, Apache Airflow, PySpark, PyMC3, Java, Spring, JUnit, Mockito, Maven, Ant, hybris, JavaScript, NodeJS, ExpressJS, VueJS, ChartJS, MySQL, PostgreSQL, Redis, Amazon Redshift, Cassandra, AWS, Kubernetes, Helm, Docker, Terraform, GitLab CI, Datadog, Apache Spark, Apache Flink, Apache Kafka, Amazon Kinesis, Snowplow, Object-Oriented Programming, Test-Driven Development, Functional Programming, Scrum Master, Linux
Personal Details
- German (native language)
- English (fluent)
- French (basic knowledge)
- European Union
Contact Details