Senior Data Engineer / MLOps
- 0 references
- on request
- 28203 Bremen
- on request
- de | en | fr
- 11.06.2024
Short Introduction
Qualifications
Project & Professional Experience
6/2022 – 12/2022
Project Description
Cluster Migration of Internal Data Warehouse
As data volumes at eCommerce companies continue to grow and the number of data consumers within the
organization increases, old infrastructure can sometimes no longer keep up with the challenges. In this
particular case, the computing and warehousing cluster additionally had to be on-premise for data security
reasons. After new cluster infrastructure had been provisioned by an external provider, all data warehouse
and computing logic had to be migrated from the old infrastructure to the new one. An additional challenge
was to maintain backwards compatibility of the migrated processes at all times.
Key Achievements:
• Migration and deployment of 30+ Airflow DAGs with 20–50 tasks each on the new infrastructure
• Co-development of a Python client library for Apache Livy that is used by 100+ Airflow tasks
• Deployment of 20+ Apache Hive databases with 10–50 tables each across three data warehouse layers via
Ansible
• Code review of 5–10 merge requests per week
Technologies: Apache Airflow, Python, Apache Hive, Apache Spark, PySpark, Apache Livy, Apache Hadoop,
Ansible
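A client library like the one above essentially wraps Livy's REST API so that Airflow tasks can submit Spark work over plain HTTP. A minimal sketch, assuming Livy's standard `POST /batches` endpoint (the class name, URL, and payload fields here are illustrative; the actual internal library differs):

```python
import json
from urllib import request


class LivyBatchClient:
    """Minimal sketch of a Livy client: builds and submits a Spark
    batch job via Livy's POST /batches REST endpoint."""

    def __init__(self, base_url):
        self.base_url = base_url.rstrip("/")

    def build_payload(self, file, class_name=None, args=None):
        # Livy's /batches endpoint accepts "file", "className", "args", ...
        payload = {"file": file}
        if class_name:
            payload["className"] = class_name
        if args:
            payload["args"] = args
        return payload

    def submit(self, **kwargs):
        # Sends the payload to Livy; the response contains the batch id
        # and its initial state (e.g. "starting").
        body = json.dumps(self.build_payload(**kwargs)).encode()
        req = request.Request(
            f"{self.base_url}/batches",
            data=body,
            headers={"Content-Type": "application/json"},
        )
        with request.urlopen(req) as resp:
            return json.load(resp)
```

An Airflow task would then call `submit(file="local:/jobs/etl.py", args=[...])` and poll the returned batch id until completion.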
2/2020 – 6/2022
Project Description
Platform for Real Time Fraud Detection in eCommerce
In order to prevent financial and reputational loss on eCommerce platforms, automated detection of fraud
patterns in online shops is needed. The software should be able to scale out over multiple shop systems and data
sources. Further requirements are monitoring traffic in real time and incorporating expert knowledge alongside
machine learning models.
Key Achievements:
• Leading the design of the platform
• Implementation of a proof of concept from which 80% of the code made it into the first product iteration
• Technical lead for a team of 5 developers
• Successful deployment and zero-downtime operations on customer premises at around 15 million events
per day
• Design of a cloud-based testing environment that can be brought up in less than 15 minutes (Infrastructure
as Code) and handle up to 10 times the production workload
Technologies: Apache Flink, Apache Kafka, Redis, Terraform, AWS, Kubernetes, Helm, Docker, Datadog
6/2018 – 2/2021
Project Description
Web Tracking Event Pipeline with Snowplow on AWS
For an eCommerce platform it is crucial to have a detailed picture of customer behaviour on which business
decisions can be based, either in real time or from the data warehouse. This calls for a flexible, scalable,
and field-tested solution that can run in the cloud. Additionally, all browser events need custom enrichment
with business information from the backend in order to provide the necessary context, e.g. for "Add to
Cart" events.
Key Achievements:
• Integration of the Snowplow event pipeline into the cloud-based shop architecture
• Day-to-day operations of the event pipeline at ca. 4 million events per day
• Co-engineering of the custom enrichment in the webshop backend (ca. 1,000+ lines of code) and handover
of ownership to the backend team
• Setup of custom real-time event monitoring (< 1 s latency) with Elasticsearch and Kibana
• Setup of custom scheduling and deployment processes for 5 components of the Snowplow event pipeline
Technologies: Snowplow, Kubernetes, Amazon EMR, Amazon Kinesis, Amazon Redshift, Apache Airflow, Kibana,
Elasticsearch, NodeJS, GitLab CI
6/2018 – 1/2021
Project Description
Product Recommendation Engines: Collaborative Filtering and Item
Similarity with Neural Nets
To enrich the customers' shopping experience and to drive additional sales, the eCommerce platform should
be able to recommend additional products to customers. Two orthogonal strategies are employed: product
similarity based on neural network embeddings, and collaborative filtering based on user behaviour.
Additionally, performance monitoring for the recommendations is needed.
Key Achievements:
• Productionization of both models based on proofs of concept by an ML engineer, including data acquisition,
model execution, and data output
• Scheduling and operations of the productionized models, spanning 3 different code bases and more than 5
regularly scheduled jobs
• Operationalization of 10+ performance metrics across 5 dashboards for stakeholders
Technologies: Python Keras, Python Pandas, Amazon EMR, Apache Mahout, Amazon Redshift, Apache Airflow,
Apache Superset
6/2017 – 2/2021
Project Description
ETL Pipeline Architecture with Apache Airflow and Kubernetes
A data-driven company needs reliable and scalable infrastructure as a key component of its corporate
decision making. Engineers as well as analysts need to be enabled to create ETL processes and ad-hoc reports
without having to consult a data engineer. The data architecture of the company needs to provide
scalability, a clear separation between testing and production, and ease of use.
Key Achievements:
• Leading the conception of the cloud-based infrastructure based on the above requirements
• Initial training of 5 developers and onboarding of more than 10 developers since
• Initial setup and operation of Apache Airflow with ca. 10 jobs initially, scaling up to more than 100
regularly scheduled jobs at present
Technologies: Apache Airflow, Kubernetes, Docker, AWS, GitLab CI
2/2017 – 11/2018
Project Description
A/B Testing Platform
In order to enable an eCommerce organization to become a data-driven organization, there must be (among other
things) a framework in place to compare different versions of the website against each other. Many members of
the organization and its departments need to be able to create and conduct experiments without the assistance
of a data engineer. Another important factor for the framework was the use of Bayesian statistics.
Key Achievements:
• Leading the conception of the testing framework, including randomization logic, statistical modelling, and
graphical presentation in the frontend
• Implementation of a proof of concept for the statistical engine
• Implementation of production code for the frontend, backend, and statistical engine
• Training of stakeholders from 3 different departments in the methodology and statistical background of A/B
testing
Technologies: Python PyMC3, Python SciPy, Apache Spark, PySpark, Apache Airflow, Docker,
Kubernetes, VueJS, Redshift
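The core idea of a Bayesian statistical engine for A/B testing can be illustrated with a conjugate Beta-Binomial model: sample conversion rates from each variant's posterior and count how often B beats A. A minimal stdlib sketch with made-up numbers (the production engine used PyMC3 and a richer model):

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, samples=100_000, seed=42):
    """Monte Carlo estimate of P(rate_B > rate_A) under independent
    Beta(1, 1) priors, updated with the observed conversion counts."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(samples):
        # Posterior for each variant: Beta(1 + conversions, 1 + non-conversions)
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / samples


# e.g. variant A: 120/1000 conversions, variant B: 150/1000 conversions
p = prob_b_beats_a(120, 1000, 150, 1000)
```

Unlike a frequentist p-value, this posterior probability can be reported directly to non-technical stakeholders as "the chance that B is better than A".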
Certificates
Education
Kiel
About Me
Further Skills
TensorFlow, Apache Airflow, PySpark, PyMC3, Java, Spring, JUnit, Mockito, Maven, Ant, hybris, JavaScript, NodeJS, ExpressJS, VueJS, ChartJS, MySQL, PostgreSQL, Redis, Amazon Redshift, Cassandra, AWS, Kubernetes, Helm, Docker, Terraform, GitLab CI, Datadog, Apache Spark, Apache Flink, Apache Kafka, Amazon Kinesis, Snowplow, Object-Oriented Programming, Test-Driven Development, Functional Programming, Scrum Master, Linux
Personal Details
- German (native language)
- English (fluent)
- French (basic knowledge)
- European Union
Contact Details