Freelancer Data Architect / Data Strategist / Data Engineer / Data Scientist / Data & AI Management on freelance.de

Data Architect / Data Strategist / Data Engineer / Data Scientist / Data & AI Management

Last online a few days ago
  • €110/hour
  • 80687 Munich
  • Travel radius (up to 200 km)
  • en  |  de  |  fr
  • 11.09.2024

Summary

I led a team of up to 20 people (Data Architects, Data Engineers, Data Scientists, and front-end/back-end developers) to build a big data analytics and AI platform. I have hands-on practical experience with cloud and on-premises data technologies.

Qualifications

  • Apache Spark (4 years)
  • Big Data (3 years)
  • Data Mining (5 years)
  • Data Science (5 years)
  • Databricks (2 years)
  • Django
  • Document Retrieval
  • Internet of Things (IoT)
  • Java (general) (3 years)
  • Machine Learning
  • Microsoft Azure (4 years)
  • Salesforce.com

Project & Professional Experience

Data Scientist and Data Engineer for a Corporate OpenAI Chatbot and Azure AI Search RAG Development
Client name anonymized, ...
4/2024 – ongoing (9 months)
Oil and gas industry

Project description

Development of an internal corporate chatbot based on Azure RAG vectorization


Development of a RAG indexer pipeline that ingests EnBW corporate documents (e.g., PDFs, PowerPoint files, images) from SharePoint and MS Teams into Azure Blob Storage and the data lake.

Writing of Python code that uses Azure Form Recognizer, Azure Document Intelligence, and Azure AI Search to perform OCR text extraction from PDFs and images.
Implementation of semantic summarization of large documents using LangChain and OpenAI. Development of document chunking and data vectorization techniques to build a vector index that is used by the corporate chatbot.

Usage of Azure ML Studio to create an integrated vectorization indexing pipeline for the RAG system, based on Azure AI Search enrichment skillsets. Creation of Azure Functions to handle custom skillsets during document processing.

Development and extension of a Python-based frontend chatbot application using Django

Development of a CI/CD pipeline for the automatic end-to-end ingestion, cracking, chunking, enrichment, and vectorization of incoming documents from SharePoint and Azure Data Factory.
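The chunking step mentioned above can be pictured as a fixed-size sliding window with overlap, so that context is not lost at chunk boundaries before embedding. The sketch below is a hypothetical minimal version (function name and sizes are illustrative; the actual pipeline relied on Azure AI Search skillsets):

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks so context at chunk
    boundaries survives when the chunks are embedded individually."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # final chunk already reached the end of the text
    return chunks
```

Each chunk then shares its last `overlap` characters with the start of the next chunk, which is the property the vector index depends on.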

Skills used

Django, Document Retrieval, Microsoft Azure

Azure Data Architect & Data Scientist
Client name anonymized
7/2023 – 3/2024 (9 months)
Telecommunications

Project description

Real-Time Data Pipeline for a Marketing Use Case:
Creation of a data pipeline using Azure Data Factory, Auto Loader, Databricks Delta Live Tables, and a Kafka client, which ingests and re-organizes marketing data from Salesforce. Creation of a data pipeline that captures customer data from LinkedIn campaigns, new company followers from the LinkedIn company page, and customer profiles from LinkedIn Sales Navigator. This data is stored in the Databricks Lakehouse and used to build machine learning models for Next Best Action.

360° Contact Nurturing and Next Best Action ML Model:
Inference of contact interest and engagement from social media and Salesforce data (i.e., leads, campaigns, portfolio). Building of machine learning models for contact nurturing and Next Best Action, as well as a cross-selling product/portfolio recommendation system based on neural networks, XGBoost, and matrix factorization. This enables the marketing team to hand over quality leads to the sales team.
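As an illustration of the matrix factorization component of such a recommender, here is a bare-bones SGD sketch (all names and hyperparameters are made up; the production system combined neural networks, XGBoost, and matrix factorization):

```python
import random

def factorize(ratings, n_users, n_items, k=2, epochs=500, lr=0.01, reg=0.02):
    """Learn user/item latent factors from (user, item, rating) triples
    by stochastic gradient descent with L2 regularization."""
    rng = random.Random(0)  # fixed seed for reproducibility
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(P[u][f] * Q[i][f] for f in range(k))
            err = r - pred
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def predict(P, Q, u, i):
    """Predicted affinity of user u for item i (dot product of factors)."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))
```

Unseen user/item pairs get scores from the learned factors, which is what makes the approach usable for cross-selling recommendations.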

Skills used

Databricks, Data Mining, Data Science, Microsoft Azure, Salesforce.com

Data Scientist & Cloud Data Architect
Client name anonymized
7/2022 – 6/2023 (1 year)
Telecommunications

Project description

Dockerization of a machine-learning-based customer credit default rating application on the AWS cloud to support monthly credit ratings of customers.

Algorithmic extension and retraining of a customer credit rating application that comprises multiple classifiers and a regressor to predict whether a customer will default on their credit, and by how much.
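The classifier-plus-regressor structure described above can be sketched as a two-stage predictor: a classifier decides whether a customer defaults, and a regressor estimates the loss only for predicted defaulters. The features, thresholds, and coefficients below are purely illustrative, not the actual models:

```python
from dataclasses import dataclass

@dataclass
class Customer:
    debt_ratio: float      # monthly debt payments divided by income
    missed_payments: int   # payments missed in the last 12 months

def will_default(c: Customer) -> bool:
    """Stage 1: toy classifier deciding default / no default."""
    return c.debt_ratio > 0.5 or c.missed_payments >= 3

def loss_amount(c: Customer, exposure: float) -> float:
    """Stage 2: toy regressor estimating how much of the exposure is lost."""
    severity = min(1.0, 0.4 * c.debt_ratio + 0.1 * c.missed_payments)
    return exposure * severity

def rate_customer(c: Customer, exposure: float) -> float:
    """Combined rating: expected loss is zero unless default is predicted."""
    return loss_amount(c, exposure) if will_default(c) else 0.0
```

Keeping the two stages separate lets each model be retrained independently, which matches the retraining work described above.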

Skills used

Amazon Web Services (AWS), Databricks, Data Science, Microsoft Azure

Data Architect, Overall Data Quality Testing
Client name anonymized
11/2021 – 6/2022 (8 months)
Insurance

Project description

Creation of data pipelines and data flows using Azure Data Factory and Databricks Spark clusters.

Hands-on programming of customized data flow logic in PySpark, Scala, and Python to trigger dedicated Spark jobs that ensure the end-to-end movement of data from Azure Data Lake through several PostgreSQL databases to Power BI.
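Such end-to-end movement can be pictured as a chain of stage functions, each standing in for one dedicated Spark job. The stages below are illustrative placeholders (hypothetical field names, not the actual ADF/Databricks logic):

```python
def run_pipeline(record, stages):
    """Pass a record through an ordered list of stage functions,
    each standing in for one dedicated Spark job."""
    for stage in stages:
        record = stage(record)
    return record

# Illustrative stages for the lake -> PostgreSQL -> Power BI movement.
def clean(rec):
    """Drop null fields before loading into the relational store."""
    return {k: v for k, v in rec.items() if v is not None}

def enrich(rec):
    """Tag the record with its origin system (hypothetical field)."""
    return {**rec, "source": "azure_data_lake"}

def to_report(rec):
    """Flatten into a deterministic row shape for the BI layer."""
    return sorted(rec.items())
```

Chaining the stages keeps each transformation independently testable, mirroring how separate Spark jobs handle each hop.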

Skills used

Databricks, Microsoft Azure

Lead Data Scientist
Client name anonymized, Munich
8/2021 – 11/2021 (4 months)
Healthcare

Project description

Creation of a customized ResNet architecture for image classification. A wide variety of CNN, RNN, and word-embedding architectures was explored. Using the resulting image feature vectors, a multi-stage pipeline of models such as XGBoost, CNN, LSTM, custom ResNet, and U-Net was assembled.

Skills used

Data Science, Data Mining, Machine Learning, Neural Networks, Amazon Web Services (AWS)

Data Architect / Data Engineer
Client name anonymized, Munich
10/2020 – 7/2021 (10 months)
Banking

Project description

- Setup of a predictive maintenance cloud infrastructure with Azure IoT Hub
- Automation of stream data transfer from IoT devices to IoT Hub
- Implementation of a messaging queue to store IoT data in Azure Data Lake
- Creation of Databricks notebooks and Spark clusters for IoT sensor data analysis
- Use of PySpark, deep neural networks, and other data mining and machine learning techniques to analyze data
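The sensor-data analysis step can be illustrated with a simple trailing-window anomaly detector over an IoT reading stream. This is a pure-Python stand-in for the Databricks/PySpark jobs; window size and threshold are arbitrary:

```python
import statistics
from collections import deque

def detect_anomalies(readings, window=5, threshold=2.0):
    """Flag readings that deviate from the trailing mean by more than
    `threshold` times the trailing standard deviation."""
    history = deque(maxlen=window)  # sliding window of recent readings
    anomalies = []
    for idx, value in enumerate(readings):
        if len(history) == window:
            mean = statistics.fmean(history)
            stdev = statistics.pstdev(history)
            if stdev > 0 and abs(value - mean) > threshold * stdev:
                anomalies.append(idx)
        history.append(value)
    return anomalies
```

Because the statistics are computed only over the trailing window, the detector adapts to slow drift in the sensor baseline while still flagging sudden spikes.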

Skills used

Big Data, Data Mining, Internet of Things (IoT)

Data Architect (permanent position)
Client name anonymized, Munich
3/2020 – 2/2021 (1 year)
Banking

Project description

- Data hub concept creation and architecture on Azure Cloud
- Implementation of data ingestion pipelines in Azure Data Factory
- Implementation of Azure Functions to pre-process ingested data
- Secure orchestration and movement of data from ADF to Azure Data Lake using Azure Key Vault permissions
- Optimal partitioning of data lake data in Databricks Delta Lake format
- Creation of multiple Apache Spark data aggregation scripts in PySpark and Scala
- Setup of Databricks clusters, computation of business KPIs, and exposure of the results via a REST interface to Qlik and Power BI visualizations
- Setup of an Azure Active Directory tenant, enabling of Databricks AD passthrough, and management of Azure Apps and Cosmos DB

Skills used

Apache Spark, Big Data, Microsoft Azure

Data Scientist (permanent position)
Client name anonymized, Munich
4/2017 – 1/2019 (1 year, 10 months)
Banking

Project description

- Architecture and building of a machine learning Python API seamlessly integrated into the Java-based Spring Boot platform; the Python API uses a Flask/Gunicorn server
- Building of a machine learning smart alert system based on a combination of a self-developed outlier detection algorithm and publicly available outlier detection algorithms
- Building of a machine learning system for bank data prediction
- Use of XGBoost and an ensemble of other machine learning techniques for bank data forecasting
- Deep reinforcement learning for banking network optimization, using both convolutional and recurrent neural networks to train multiple agents
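The idea behind such a smart alert system, combining a self-developed detector with off-the-shelf ones, can be sketched as a majority vote over several simple detectors. All three detectors below are illustrative stand-ins, not the actual algorithms:

```python
import statistics

def zscore_outliers(xs, z=2.5):
    """Flag points far from the mean in standard-deviation units."""
    mu, sd = statistics.fmean(xs), statistics.pstdev(xs)
    return {i for i, x in enumerate(xs) if sd > 0 and abs(x - mu) > z * sd}

def iqr_outliers(xs, k=1.5):
    """Flag points outside the Tukey fences around the interquartile range."""
    q1, _, q3 = statistics.quantiles(xs, n=4)
    lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return {i for i, x in enumerate(xs) if x < lo or x > hi}

def range_outliers(xs, lo=0.0, hi=100.0):
    """A 'self-made' detector: domain-specific plausibility bounds."""
    return {i for i, x in enumerate(xs) if not lo <= x <= hi}

def smart_alerts(xs, min_votes=2):
    """Raise an alert only when at least `min_votes` detectors agree."""
    votes = {}
    for detector in (zscore_outliers, iqr_outliers, range_outliers):
        for i in detector(xs):
            votes[i] = votes.get(i, 0) + 1
    return sorted(i for i, v in votes.items() if v >= min_votes)
```

Requiring agreement between detectors is a common way to cut down false alerts, since each individual detector has a different failure mode.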

Skills used

Big Data, Data Mining, Data Science

Data Architect / Data Engineer / Software Developer (permanent position)
Client name anonymized, Munich
6/2016 – 7/2019 (3 years, 2 months)
Banking

Project description

- Building of an on-premises data centre

- Installation of on-premises clusters with numerous head and worker nodes running Ubuntu Linux

- Building of data ingestion pipelines to ingest data from file protocols (FTP/SFTP), relational databases (e.g., MySQL, PostgreSQL, Oracle, MS SQL Server), document databases (e.g., MongoDB), ERP and analytics systems (e.g., Greenplum, SAP), and file systems

- Architecture and building of a data prep layer in which imported data are aggregated using custom Spark functions in Scala, Spark SQL, and Spark UDFs

- Utilization of Hive and HiveContext to improve the speed of Spark data aggregations

- Performance tuning and optimal data structure setup of MongoDB to deliver KPI results over billions of data points within microseconds

- Building of microservices to access business KPI results from MongoDB and other internal databases

- Building of a Java and Spring Boot web-based application

Skills used

Apache Hadoop, Apache Spark, Apache Tomcat, Data Mining, Data Science, Java (allg.), Spring Framework

Education

Dr.-Ing. in Data Mining & Machine Learning
2015
RWTH Aachen University

Further skills

Big Data: Databricks Delta Lake, Lakehouse, Azure Data Factory, Data Lake, AAD, Hadoop, Spark, Kafka, AWS, data ingestion and integration

Machine Learning: PyTorch, reinforcement learning, deep neural network architectures, neural search architectures, data pipelines, scikit-learn, Weka

Web full stack: Java, Vue, Angular, JSP, MongoDB, MySQL

Mobile: Swift, iOS app development

Embedded Systems / IoT: Azure IoT Hub, Event Hub, Mosquitto

Personal details

Languages
  • English (native)
  • German (fluent)
  • French (basic)
Willingness to travel
Up to 200 km radius
Home office
Preferred
Profile views
1792
Age
47
Professional experience
20 years and 8 months (since 04/2004)
Project management
12 years
