Data Architect / Data Strategist / Data Engineer / Data Scientist / Data & AI Management
- 0 references
- 110€/Stunde
- 80687 München
- Radius (up to 200 km)
- en | de | fr
- 11.09.2024
Short introduction
Qualifications
Project & professional experience
4/2024 – ongoing
Activity description
Development of an internal corporate chatbot based on Azure RAG vectorization.
Development of a RAG indexer pipeline that ingests EnBW corporate documents (e.g., PDFs, PowerPoint files, images) from SharePoint and MS Teams into Azure Blob Storage and a data lake.
Writing of Python code that uses Azure Form Recognizer, Azure Document Intelligence, and Azure AI Search to perform OCR text extraction from PDFs and images.
Implementation of semantic summarization of large documents using LangChain and OpenAI. Development of document chunking and data vectorization techniques to build a vectorization index that is used by the corporate chatbot.
Usage of Azure ML Studio to create an integrated vectorization indexing pipeline for the RAG system, based on Azure AI Search enrichment skillsets. Creation of Azure Functions to handle custom skillsets during document processing.
Development and extension of a Python-based chatbot frontend application using Django.
Development of a CI/CD pipeline for the automatic end-to-end ingestion, cracking, chunking, enrichment, and vectorization of incoming documents from SharePoint and Azure Data Factory.
Django, Document Retrieval, Microsoft Azure
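The document chunking step for the vector index can be sketched in plain Python; the chunk size and overlap values below are illustrative assumptions, not the project's actual settings:

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into overlapping chunks for a vector index.

    Overlap keeps context that spans chunk boundaries retrievable.
    chunk_size and overlap are hypothetical defaults.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping an overlap
    return chunks
```

In a real RAG pipeline each chunk would then be embedded and written to the Azure AI Search index.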
7/2023 – 3/2024
Activity description
Real-Time Data Pipeline for a Marketing Use Case:
Creation of a data pipeline using Azure Data Factory, Auto Loader, Databricks Delta Live Tables, and a Kafka client, which ingests and re-organizes marketing data from Salesforce. Creation of a data pipeline that captures customer data from LinkedIn campaigns, new company followers from the LinkedIn company page, and customer profiles from LinkedIn Sales Navigator. This data is stored in the Databricks Lakehouse and used to build a machine learning model for Next Best Action.
360° Contact Nurturing and Next Best Action ML Model:
Inference of contact interest and engagement from social media and Salesforce data (i.e., leads, campaigns, portfolio). Building of machine learning models for Contact Nurturing and Next Best Action, as well as a cross-selling product/portfolio recommendation system based on neural networks, XGBoost, and matrix factorization. This enables the marketing team to hand over quality leads to the sales team.
Databricks, Data Mining, Data Science, Microsoft Azure, Salesforce.com
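The matrix-factorization component of such a recommendation system can be illustrated with a minimal pure-Python sketch; the SGD loop and all hyperparameters are illustrative, and in the project it sat alongside neural networks and XGBoost:

```python
import random

def factorize(ratings, n_users, n_items, k=2, lr=0.05, epochs=1000, seed=0):
    """Learn user/item latent vectors by SGD on squared error.

    ratings: list of (user, item, value) triples.
    Hyperparameters are illustrative defaults, not project settings.
    """
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(U[u][f] * V[i][f] for f in range(k))
            err = r - pred
            for f in range(k):
                uf, vf = U[u][f], V[i][f]  # update with pre-step values
                U[u][f] += lr * err * vf
                V[i][f] += lr * err * uf
    return U, V

def predict(U, V, u, i):
    """Score an item for a user as the latent-vector dot product."""
    return sum(a * b for a, b in zip(U[u], V[i]))
```

Ranking items by this score per contact is one simple way to feed a Next Best Action decision.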
7/2022 – 6/2023
Activity description
Dockerization of a machine-learning-based customer default credit rating financial application on the AWS Cloud to support monthly credit rating of customers.
Algorithmic extension and retraining of a customer credit rating application that comprises multiple classifiers and regressors to predict whether a customer will default on their credit, and by how much.
Amazon Web Services (AWS), Databricks, Data Science, Microsoft Azure
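The classifier-plus-regressor structure ("will the customer default, and by how much") can be illustrated with a toy two-stage model; the threshold rule and conditional mean below are stand-ins for the real ensemble, and the feature and field names are hypothetical:

```python
def fit_two_stage(samples):
    """Toy two-stage credit model: stage 1 classifies default vs. no
    default, stage 2 estimates the default amount.

    samples: list of (debt_ratio, defaulted, amount) tuples.
    The midpoint-threshold classifier and conditional-mean regressor
    are illustrative stand-ins for trained models.
    """
    pos = [s[0] for s in samples if s[1]]
    neg = [s[0] for s in samples if not s[1]]
    # Classifier: threshold halfway between the class means.
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    # Regressor: mean amount among defaulters.
    amounts = [s[2] for s in samples if s[1]]
    mean_amount = sum(amounts) / len(amounts)

    def predict(debt_ratio):
        will_default = debt_ratio > threshold
        return will_default, (mean_amount if will_default else 0.0)

    return predict
```

The real application would replace both stages with trained classifiers and regressors scored monthly per customer.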
11/2021 – 6/2022
Activity description
Creation of data pipelines and data flows using Azure Data Factory and Databricks Spark clusters.
Hands-on programming of customized data-flow logic in PySpark, Scala, and Python to trigger dedicated Spark jobs that ensure the end-to-end movement of data from Azure Data Lake through several PostgreSQL databases to Power BI.
Databricks, Microsoft Azure
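Customized data-flow logic of this kind can be pictured as a chain of transformation steps, mirroring chained Spark jobs; the step functions below are illustrative, not from the project:

```python
def run_flow(rows, steps):
    """Apply a sequence of transformation steps to a dataset in order,
    the way chained jobs move data through a flow (illustrative)."""
    for step in steps:
        rows = step(rows)
    return rows

# Illustrative steps: drop incomplete rows, then cast an amount field.
def drop_nulls(rows):
    return [r for r in rows if all(v is not None for v in r.values())]

def cast_amount(rows):
    return [{**r, "amount": float(r["amount"])} for r in rows]
```

In the real flow each step would be a dedicated Spark job rather than an in-memory function.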
8/2021 – 11/2021
Activity description
Creation of a customized ResNet architecture for image classification. A wide variety of CNN, RNN, and word-embedding architectures were explored. Using the resulting image feature vectors, a multi-stage pipeline of models such as XGBoost, CNN, LSTM, custom ResNet, and U-Net was assembled.
Skills used
Data Science, Data Mining, Machine Learning, Neural Networks, Amazon Web Services (AWS)
10/2020 – 7/2021
Activity description
- Setup of a predictive-maintenance cloud infrastructure with Azure IoT Hub
- Automation of stream data transfer from IoT devices to IoT Hub
- Implementation of a messaging queue to store IoT data in Azure Data Lake
- Creation of Databricks notebooks and Spark clusters for IoT sensor data analysis
- Use of PySpark, deep neural networks, and other data mining and machine learning techniques to analyse the data
Big Data, Data Mining, Internet of Things (IoT)
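One simple flavor of such sensor analysis is a trailing-window deviation check; the heuristic below is an illustrative stand-in for the actual models, and the window and factor values are assumptions:

```python
from statistics import mean, pstdev

def rolling_anomalies(readings, window=5, factor=3.0):
    """Flag readings deviating more than `factor` standard deviations
    from the trailing-window mean (illustrative predictive-maintenance
    heuristic; window/factor are hypothetical defaults)."""
    flagged = []
    for i in range(window, len(readings)):
        w = readings[i - window:i]
        m, s = mean(w), pstdev(w)
        if s and abs(readings[i] - m) > factor * s:
            flagged.append(i)
    return flagged
```

In production this check would run on the streamed IoT data, with flagged indices feeding maintenance alerts.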
3/2020 – 2/2021
Activity description
- Data hub concept creation and architecture on the Azure Cloud
- Implementation of data pipelines for ingestion in Azure Data Factory
- Implementation of Azure Functions to pre-process ingested data
- Secure orchestration and movement of data from ADF to Azure Data Lake using Azure Key Vault permissions
- Optimal partitioning of data lake data in Databricks Delta Lake format
- Creation of multiple Apache Spark data aggregation scripts in PySpark and Scala
- Setup of Databricks clusters, computation of business KPIs, and exposure of results via a REST interface to Qlik and Power BI visualizations
- Setup of an Azure Active Directory tenant, enabling of Databricks AD passthrough, and management of Azure Apps and Cosmos DB
Apache Spark, Big Data, Microsoft Azure
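Lake partitioning of this kind typically follows a Hive/Delta-style date layout so that queries can prune partitions; the path scheme below is an illustrative sketch (base path and field names are hypothetical):

```python
def partition_path(base, year, month, day):
    """Build a Hive/Delta-style date-partitioned path.

    Zero-padded month/day keep lexicographic order equal to date
    order, which is what partition pruning relies on.
    """
    return f"{base}/year={year}/month={month:02d}/day={day:02d}"
```

A query filtered on a date range then only touches the matching `year=/month=/day=` directories.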
4/2017 – 1/2019
Activity description
- Architecture and building of a machine learning Python API seamlessly integrated into the Spring Boot (Java) platform; the Python API uses a Flask/Gunicorn server
- Building of a machine-learning smart-alert system based on a combination of a self-developed outlier detection algorithm and publicly available outlier detection algorithms
- Building of a machine learning system for bank data prediction
- Use of XGBoost and an ensemble of other machine learning techniques for bank data forecasting
- Deep reinforcement learning for banking network optimization, using both convolutional and recurrent neural networks to train multiple agents
Big Data, Data Mining, Data Science
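The smart-alert idea of combining a self-made detector with off-the-shelf ones can be sketched as a small ensemble; the z-score and IQR rules and the agreement vote below are illustrative stand-ins, with hypothetical thresholds:

```python
from statistics import mean, pstdev

def zscore_outliers(values, thresh=2.5):
    """Indices whose z-score exceeds `thresh` (illustrative rule)."""
    m, s = mean(values), pstdev(values)
    return {i for i, v in enumerate(values) if s and abs(v - m) / s > thresh}

def iqr_outliers(values, k=1.5):
    """Indices outside the Tukey fences (crude quartiles by index)."""
    srt = sorted(values)
    n = len(srt)
    q1, q3 = srt[n // 4], srt[(3 * n) // 4]
    iqr = q3 - q1
    return {i for i, v in enumerate(values)
            if v < q1 - k * iqr or v > q3 + k * iqr}

def smart_alert(values):
    # Ensemble rule: alert only when both detectors agree.
    return sorted(zscore_outliers(values) & iqr_outliers(values))
```

Requiring agreement between detectors is one simple way to cut false alerts, which is the usual motivation for such ensembles.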
6/2016 – 7/2019
Activity description
- Building of an on-premise data centre
- Installation of on-premise clusters with numerous head and worker nodes using Ubuntu Linux
- Building of data ingestion pipelines to ingest data from file protocols (FTP/SFTP), relational databases (e.g., MySQL, PostgreSQL, Oracle, MS SQL Server), document databases (e.g., MongoDB), data warehouse and ERP systems (e.g., Greenplum, SAP), and file systems
- Architecture and building of a data prep layer in which imported data are aggregated using custom Spark functions in Scala, SparkSQL, and Spark UDFs
- Utilization of Hive and HiveContext to improve the speed of Spark data aggregation
- Performance tuning and optimal data structure setup of MongoDB to deliver KPI results over billions of data points within microseconds
- Building of microservices to access business KPI results from MongoDB and other internal databases
- Building of Java and Spring Boot web-based applications
Apache Hadoop, Apache Spark, Apache Tomcat, Data Mining, Data Science, Java (allg.), Spring Framework
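The KPI aggregation layer can be illustrated with a pure-Python stand-in for the Spark group-by; the field names are hypothetical and the real implementation used Spark functions in Scala/SparkSQL:

```python
from collections import defaultdict

def aggregate_kpis(rows, key, value):
    """Group rows by `key` and sum `value` — a toy stand-in for a
    Spark group-by/sum aggregation producing a business KPI."""
    out = defaultdict(float)
    for r in rows:
        out[r[key]] += r[value]
    return dict(out)
```

The aggregated results would then be stored (e.g., in MongoDB) and served through the microservice layer described above.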
Education
RWTH Aachen University
Further skills
Machine Learning: PyTorch, Reinforcement Learning, Deep Neural Network Architectures, Neural Search Architectures, Data Pipelines, Scikit-learn, Weka
Web Fullstack: Java, Vue, Angular, JSP, MongoDB, MySQL
Mobile: Swift, iOS App Development
Embedded Systems/IoT: Azure IoT Hub, Event Hub, Mosquitto
Personal details
- English (native)
- German (fluent)
- French (basic)