Freelance Machine Learning Engineer / Data Scientist / Data Engineer / Data Science Project Manager on freelance.de

Machine Learning Engineer / Data Scientist / Data Engineer / Data Science Project Manager

online
  • on request
  • 90763 Fürth
  • National
  • de  |  en  |  hu
  • 02.12.2024

Brief introduction

Highly skilled and experienced freelance machine learning engineer and consultant, specialized in state-of-the-art deep learning, machine learning, and data science, with a proven track record of delivering high-quality results in fast-paced environments.

Qualifications

  • Data Science: 10 years
  • Machine Learning: 10 years
  • Python: 10 years
  • Agile Methodology: 10 years
  • Amazon Web Services (AWS): 5 years
  • Big Data: 10 years
  • Elasticsearch: 2 years
  • Git: 5 years
  • Google Cloud: 2 years
  • Jenkins: 1 year
  • Jira: 10 years
  • Large Language Models: 1 year
  • MLOps: 1 year
  • Natural Language Processing: 7 years
  • Natural Language Understanding: 3 years
  • PyTorch: 7 years
  • Scrum: 10 years
  • SQL: 10 years
  • Transformer: 1 year

Project & Professional Experience

Machine Learning Engineer / Data Scientist for Search Engines
OTTO, Hamburg
8/2023 – ongoing (1 year, 5 months)
E-Commerce
Period

8/2023 – ongoing

Description

As a machine learning engineer and data scientist in the search team at OTTO, my main task is to use state-of-the-art machine learning techniques to improve the search experience for our customers. The Solr search engine, which processes 1,000 queries per second and supports around 20 million product variants 24/7, is central to OTTO's e-commerce platform. All improvements are extensively tested and validated through online experiments.

Learning to Select: Improved query precision by filtering out irrelevant results through comprehensive data-driven solutions on clickstream data. Also identified and removed fraudulent and bot-generated queries to improve model performance and data integrity.

Hybrid Search: Collaborated with two teams to develop a system that integrates both lexical and semantic search approaches to provide more relevant search results.
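The profile does not say how the lexical and semantic result lists were combined. As a minimal sketch of one common option, the following uses reciprocal rank fusion (RRF); the function name, the product ids, and the constant k=60 are illustrative assumptions, not details of the OTTO system.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default (assumption).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical product ids from a lexical and a semantic retriever.
lexical = ["p1", "p2", "p3"]
semantic = ["p3", "p1", "p4"]
fused = reciprocal_rank_fusion([lexical, semantic])
print(fused)
```

Products that rank well in both lists rise to the top, while products found by only one retriever are still kept further down.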

Advanced Spell Check: Designed, implemented, validated and brought to production a leading-edge spell checking system. This solution not only corrects customer spelling errors but also guides them towards the most relevant products.

Query Intent Detection: I also led the development of a customer query intent detection approach to identify non-product and navigation queries, and to recognize brand names and their context within search queries (Named entity recognition and classification).

Toolkit: AWS, GCP, BigQuery, Clickstream Data, fastText, Hugging Face Transformers, MLflow, OpenAI API, SageMaker, Airflow, Docker, Jenkins, Terraform, Grafana, Prometheus, Elasticsearch, Kibana, Confluence, Jira, Miro, Agile/Scrum, FastAPI, Poetry, Python, PyTorch, GitHub, Online Experiments/Testing, Solr, Pair Programming

Skills used

Large Language Models, Agile Methodology, Amazon Web Services (AWS), Big Data, Confluence, Data Science, Docker, Elasticsearch, Git, Google Cloud, Jenkins, Jira, Machine Learning, MLOps, Natural Language Processing, Natural Language Understanding, Python, PyTorch, Scrum, SQL, Transformer

Large Language Model (LLM) Integration Consultant
Client name anonymized, Fürth
2/2023 – 7/2023 (6 months)
IT & Development
Period

2/2023 – 7/2023

Description

As an external consultant, I helped startups use GPT and other large language models (LLMs). I provided training, evaluated use cases, assessed limitations such as security, performance, and accuracy, and explored options and alternatives to the OpenAI API.

Toolkit: Haystack, Hugging Face models, LangChain, Ollama, OpenAI API, Python

Skills used

Large Language Models, Data Science, LangChain, Machine Learning, Natural Language Processing, Natural Language Understanding, Python, PyTorch

Data Product Owner & Solution Architect / Machine Learning Consultant
RTL Deutschland, Köln
9/2021 – 1/2023 (1 year, 5 months)
Media industry
Period

9/2021 – 1/2023

Description

As a freelance consultant and expert in machine learning applications for content understanding, I supported the RTL Data team in building the next-generation multi-purpose platform "RTL+" in cooperation with Deezer, using visual (video), audio and text data. An integral part of my role was to manage and balance the needs and expectations of the various stakeholders involved in the project.

The primary goal of this project was to derive and provide additional metadata from the raw content that can be used by downstream applications such as search, recommendation, and personalization. The key challenge was to establish a clean, reliable, scalable, and production-ready state-of-the-art solution for a large number of building blocks and to create an efficient execution pipeline on top of it.

Video-based models: Aesthetic Ranking, Dominant Color Extraction, End Credits Detection, Face Detection, Image Quality Detection, Logo Detection, Mood Detection, Object Detection and Recognition, Place Prediction, Scene and Shot-Boundary Detection, and Shot Type Detection, using and optimizing both pre-trained and self-trained models.

Audio-based models and solutions: speech-to-text transcription of podcasts and other audio sources using Google's Speech-to-Text API and OpenAI's Whisper, as well as music identification.

NLP solutions: language detection (fastText), festivity detection, kids content detection, adult content detection, topic modeling (BERTopic), keyword extraction (KeyBERT) and text summarization.

Toolkit: Argo Workflows, Confluence, Elasticsearch, FFmpeg, GitLab CI/CD, Google BigQuery, Google Cloud Platform (GCP), Google Data Studio, Hugging Face models, Jira, Jupyter, MLflow, NumPy, Poetry, PyTorch, Python, SQL, Scrum, TensorFlow, Terraform, pandas

Skills used

Natural Language Understanding, Agile Methodology, Big Data, Computer Vision, Confluence, Data Science, Elasticsearch, Google Cloud, Jira, Machine Learning, Natural Language Processing, Product Owner, Python, PyTorch, Scrum, SQL, TensorFlow

Machine Learning Engineer / Data Scientist / Deep Learning Expert
adidas, Herzogenaurach
9/2017 – 6/2021 (3 years, 10 months)
E-Commerce / Fashion
Period

9/2017 – 6/2021

Description

As a freelance consultant and expert in Deep Learning, Machine Learning and Data Science, I specialized in fraud recognition, product recommendation systems, image recognition/classification, anomaly detection, time series analysis and NLP. I guided agile projects from conception to production and maintenance & optimization.

I focused on eCommerce solutions that leveraged consumer data, product master data & descriptions, product images and sales transactions.

Product Similarity: The goal of the product similarity solution was to improve downstream system performance by identifying similar or related products for a given product, which could then be used as a benchmark or replacement product. The similarity was determined using various modalities, including visual similarity (via an image autoencoder), consumer behavior (using clickstream data) and product descriptions (by using NLP transformer-based models).

Toolkit: AWS, Bitbucket, Confluence, Jenkins, Jira, Jupyter, PySpark, Python, Scrum, TensorFlow

Dynamic Pricing: The goal of this project was to identify poorly performing products at an early stage, uncover any potential product issues, and determine the right actions (such as an optimal price change) to boost performance. The ultimate goal was to gradually replace the existing solution.

Toolkit: AWS, Bitbucket, Confluence, Jenkins, Jira, Jupyter, Matplotlib, PySpark, Python, Scrum, TensorFlow, XGBoost

Consumer Lifetime Value (CLTV): I was responsible for the conception, implementation, and maintenance of the historical and future monetary value attributed to individual consumers. This included regular extensions and adaptations (e.g. for new markets/brands) and deep dive analyses into the model's most important features.

The models, which were based on consumer behavior data, ran in production and were updated on a weekly basis for all consumers. The results (KPIs) were intensively used in downstream systems and for marketing campaigns.

Toolkit: Bitbucket, Confluence, Exasol, Jenkins, Jira, Jupyter, Matplotlib, Python, SHAP, Scrum, XGBoost

Visual Product Embeddings: I was responsible for the conception and implementation of a variational autoencoder based on product images. The source images were filtered, downscaled, and prepared for a convolutional neural network (VAE) that generated embeddings capable of capturing design elements of a product image. These embeddings were used to find similar products and also fed into downstream models to improve product-based models. The solution ran in production and was updated with new images on a weekly basis.

Toolkit: AWS, Bitbucket, Confluence, Exasol, Jenkins, Jira, Jupyter, Keras, Matplotlib, OpenCV, PySpark, Python, SageMaker, Scrum, TensorFlow, Variational Autoencoder
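The similarity lookup over product embeddings described above can be sketched as a nearest-neighbour search by cosine similarity. This is only an illustration under assumptions: the profile does not state which distance measure or index was used, and the three-dimensional vectors and product ids below are made up (real VAE embeddings would be much larger).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Treat zero vectors as dissimilar to everything.
    return dot / (norm_a * norm_b)


def most_similar(query_id, embeddings, top_n=2):
    """Return the ids of the top_n products closest to query_id."""
    query = embeddings[query_id]
    scored = [
        (other_id, cosine_similarity(query, vec))
        for other_id, vec in embeddings.items()
        if other_id != query_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other_id for other_id, _ in scored[:top_n]]


# Hypothetical embeddings keyed by product id.
embeddings = {
    "shoe_a": [1.0, 0.0, 0.2],
    "shoe_b": [0.9, 0.1, 0.3],
    "shirt_c": [0.0, 1.0, 0.1],
}
neighbours = most_similar("shoe_a", embeddings, top_n=1)
print(neighbours)
```

A brute-force scan like this is adequate for small catalogues; for large ones an approximate nearest-neighbour index would typically take its place.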

Purchase Propensity Scores: I was responsible for the conception, implementation, and maintenance of a model for predicting consumer purchase intentions. The solution ran stably in production for several years, and its results made a significant contribution to marketing channels.

Toolkit: Bitbucket, Confluence, Exasol, Jira, Matplotlib, Python, SHAP, SQL, XGBoost

Skills used

Amazon Web Services (AWS), Agile Methodology, Big Data, Computer Vision, Confluence, Data Science, Exasol, Git, Jira, Keras, Machine Learning, MicroStrategy, Natural Language Processing, Python, PyTorch, Scrum, SQL, Tableau, TensorFlow

Machine Learning Engineer / Data Scientist / Deep Learning Expert
Client name anonymized, Nürnberg, Berlin, Karlsruhe
1/2014 – 8/2017 (3 years, 8 months)
E-Commerce
Period

1/2014 – 8/2017

Description

Machine Learning / Data Science Projects

Date: 2017
Technology: TensorFlow, Keras, Convolutional Neural Networks
Use case: Image detection, classification and metadata extraction for product images
Goal: Enrich product-description metadata from images and find outliers, helping content management teams improve data quality

Date: 2017
Technology: TensorFlow, Keras, Convolutional Neural Networks
Use case: Prototype for detecting product variants on an image
Goal: Reduce manual effort, fully automate and scale processes

Date: 2016
Technology: TensorFlow, LSTM
Use case: Audio signal analysis and synthesis using Deep Learning
Goal: Various experiments to deconstruct and construct audio signals

Date: 2016
Technology: TensorFlow, LSTM, Convolutional Neural Networks
Use case: Classification and anomaly detection using Deep Learning
Goal: Label transactions, find anomalies, reduce manual work

Date: 2015
Technology: Random Forest
Use case: Product recommendation using Random Forest algorithm
Goal: Product recommendation optimized for long-term revenue

Date: 2014
Technology: Regression
Use case: Fraud detection
Goal: Find similarities between new customer registrations to prevent multiple registrations (fraud/misuse)

Date: 2014
Technology: Apache Mahout
Use case: Product recommendation using Item-Based Collaborative Filtering
Goal: Recommend similar products to known users based on user behavior on a marketplace platform

Date: 2014
Technology: Apache Mahout
Use case: Product recommendation using k-nearest neighbors algorithm
Goal: Show similar products to unknown users on a marketplace platform
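
The item-based collaborative filtering named in the 2014 projects above can be sketched in a few lines. This is plain Python for illustration, not Apache Mahout's API; the user and item ids, the binary interaction data, and the cosine weighting are assumptions.

```python
import math
from collections import defaultdict


def item_similarities(interactions):
    """Cosine similarity between items from binary user-item interactions.

    interactions maps a user id to the set of item ids the user touched.
    """
    users_by_item = defaultdict(set)
    for user, items in interactions.items():
        for item in items:
            users_by_item[item].add(user)
    sims = defaultdict(dict)
    items = sorted(users_by_item)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            shared = len(users_by_item[a] & users_by_item[b])
            if shared:
                denom = math.sqrt(len(users_by_item[a]) * len(users_by_item[b]))
                sims[a][b] = sims[b][a] = shared / denom
    return sims


def recommend(user, interactions, sims, top_n=3):
    """Score unseen items by summed similarity to the user's items."""
    seen = interactions[user]
    scores = defaultdict(float)
    for item in seen:
        for other, sim in sims.get(item, {}).items():
            if other not in seen:
                scores[other] += sim
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


# Hypothetical marketplace interactions.
interactions = {
    "u1": {"lamp", "desk"},
    "u2": {"lamp", "desk", "chair"},
    "u3": {"lamp", "rug"},
}
sims = item_similarities(interactions)
recs = recommend("u1", interactions, sims)
print(recs)
```

Because the similarities depend only on items, they can be precomputed offline and reused for every known user.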

Skills used

Data Science, Big Data, SQL, Agile Methodology, Jira, Machine Learning, MySQL, Oracle Database, Project Lead / Team Lead, Project Management, Python, Scrum, Ubuntu

Certificates

Python Essentials for MLOps - Duke University
Coursera
2024
Databricks to Local LLMs - Duke University
Coursera
2024
Microsoft Azure Databricks for Data Engineering
Coursera
2023
deeplearning.ai - Machine Learning Engineering for Production (MLOps)
2022
deeplearning.ai - Natural Language Processing Specialization
2020
Machine Learning Engineer Nanodegree at Udacity
2018
Neural Networks and Deep Learning by deeplearning.ai on Coursera
2017
Deep Learning Nanodegree Foundation at Udacity
2017
Neural Networks for Machine Learning by University of Toronto on Coursera
2016
Machine Learning: Clustering & Retrieval by University of Washington on Coursera
2016
Machine Learning: Classification by University of Washington on Coursera
2016
Machine Learning With Big Data (2015) by University of California, San Diego on Coursera
2016
Machine Learning by Stanford University on Coursera
2016
Machine Learning: Regression by University of Washington on Coursera
2016
Introduction to Big Data Analytics (2015) by University of California, San Diego on Coursera
2015
Machine Learning Foundations: A Case Study Approach by University of Washington on Coursera
2015
Introduction to Big Data (2015) by University of California, San Diego on Coursera
2015
Certified Scrum-Master
2015
iSAQB® Certified Professional for Software Architecture
2015
Sun Certified Java Programmer
2010

About me

I have worked on projects for various clients in different industries, using my expertise to help organisations improve efficiency, reduce costs, and increase revenue through data-driven solutions.

Frameworks:
- Keras, PyTorch, scikit-learn, TensorFlow, XGBoost
- Conda/Anaconda, Jupyter, Matplotlib, NumPy, OpenCV, pandas, Plotly, Poetry
- MLflow, SageMaker, Vertex AI

Applications:
- Anomaly Detection, Audio Analysis and Synthesis, Clickstream Analysis, Computer Vision, Content Understanding, Data Analysis, Data Mining, Data Visualisation, Deep Learning, Dynamic Pricing, Fraud Detection, Image Processing, Image Recognition/Classification, Machine Learning, Natural Language Processing (NLP), Natural Language Understanding, Product Similarities, Recommendation Systems, Speech Recognition

Algorithms:
- Deep Neural Networks, Convolutional Neural Networks, LSTM, (Variational-)Autoencoder, Transformers
- Hyperparameter Tuning, Transfer Learning
- Model/Feature Analysis using SHAP
- Dimensionality Reduction (PCA, t-SNE, LDA, Autoencoder, UMAP)

Programming Skills:
- Python
- C/C++, Java, MATLAB/GNU Octave, PHP
- Clean Code, PyTest, Static Code Analysis, Unittest
- Jenkins, Git, GitHub, GitLab
- Software Development and Software Architecture
- Linux, macOS, Windows

Database skills:
- Apache Spark, BigQuery, Elasticsearch, Exasol, Graylog, Kibana, MS-SQL, MySQL, Oracle DB

Big Data:
- Amazon Web Services (AWS), EMR, SageMaker, Apache Spark
- Google Cloud Platform (GCP), BigTable, BigQuery
- Hadoop, PySpark
- FFmpeg for Video Processing

Agile-Tools:
- Bitbucket, Confluence, Jira, Slack, Teams, Trello

Additional skills

iSAQB® Domain-Driven Design (DDD)

Studied Electrical Engineering at FH Nürnberg
Studied Computer Science at Uni Erlangen

Personal details

Languages
  • German (native speaker)
  • English (fluent)
  • Hungarian (fluent)
Willingness to travel
National
Work permit
  • European Union
Remote work
preferred
Professional experience
25 years and 11 months (since 01/1999)
Project management
15 years
