Data Scientist
- 0 references
- €75/hour
- 04360 Tuusula
- Near place of residence
- es | pt | en
- 30.06.2024
Brief introduction
Qualifications
Project & professional experience
8/2023 – 1/2024
Job description
Built a complete NLP/AI pipeline to extract information from documents.
Covered everything from text curation, extraction and pre-processing to data visualisation.
Fine-tuned and adapted open-source LLMs and Hugging Face code.
Amazon Web Services (AWS), Apache Spark, LangChain, Large Language Models, Machine learning, Natural Language Processing, Pandas, Python, PyTorch, Text extraction
6/2023 – 10/2024
Job description
NLP/AI compliance prototype using open-source LLMs and Hugging Face.
End-to-end pipeline from text extraction to exploratory UI.
Text extraction, Google Cloud, LangChain, Large Language Models, Machine learning, Pandas, PyTorch, Selenium
3/2022 – 3/2023
Job description
A recommendation system for a retail bank that suggests offerings based on transaction data and customer demographics. I implemented data extraction, feature construction and model formulation in Python, and extended a C++ matrix factorisation implementation to the specific sampling model. To support the development, operation and assessment of the recommender, I also built a graphical dashboard with D3.js that shows transaction patterns (categorisation, branch, time, customer category, volume, ...) together with the recommended offerings.
Skills used
Machine learning, C++, Python
1/2020 – 4/2020
Job description
Took care of the Airflow delivery orchestration code for the data products of a consumer internet behaviour monitoring startup.
Unglamorous, complex DAG code, written by people who had since left the company and sparsely documented, but essential to the top line.
Step-by-step grokking and refactoring, writing unit tests, fixing issues, and generalising the code to add flexibility in delivery cadence and post-processing. Performance optimisation of delivery time (including in the orchestration itself).
Apache Spark, Java (general), Python
5/2019 – 9/2019
Job description
Various bug fixes and improvements to the page view processing implementation of a consumer internet behaviour monitoring startup.
I also implemented a general PostgreSQL-to-Parquet exporter in Java.
Apache Spark, PostgreSQL, Java (general), Scala
12/2018 – 4/2019
Job description
Porting, unit and integration testing, and documentation of UDFs/UDAFs/UDTs written in Java by people who had left this consumer internet behaviour monitoring startup.
Mostly functions dealing with text and URL matching, information extraction from text and URLs, and supporting data structures (e.g. tries).
Apache Spark, Java (general), Scala
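As an illustration of the kind of supporting data structure mentioned above, here is a minimal sketch of a trie used for longest-prefix URL matching. It is written in Python rather than the Java of the original UDFs, and it is not the project's code: the names `TrieNode`, `PrefixTrie`, `insert` and `longest_match`, as well as the example URLs and labels, are hypothetical and chosen only for this example.

```python
from typing import Dict, Optional


class TrieNode:
    """One character position in the trie (hypothetical helper class)."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        # Set only on nodes where a stored prefix ends.
        self.label: Optional[str] = None


class PrefixTrie:
    """Character-level trie mapping URL prefixes to labels (illustrative only)."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, prefix: str, label: str) -> None:
        # Walk down the trie, creating nodes as needed, then mark the end node.
        node = self.root
        for ch in prefix:
            node = node.children.setdefault(ch, TrieNode())
        node.label = label

    def longest_match(self, text: str) -> Optional[str]:
        # Follow the text character by character and remember the label of
        # the deepest stored prefix seen so far.
        node = self.root
        best = node.label
        for ch in text:
            node = node.children.get(ch)
            if node is None:
                break
            if node.label is not None:
                best = node.label
        return best


# Example with made-up prefixes and labels.
trie = PrefixTrie()
trie.insert("example.com/", "site")
trie.insert("example.com/shop/", "shop")
result = trie.longest_match("example.com/shop/item/42")
```

A lookup costs time proportional to the length of the URL, independent of how many prefixes are stored, which is one common reason to prefer a trie over checking each prefix in turn.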
11/2018 – 3/2023
Job description
Several iterations of machine learning models, including:
- Loan default risk estimation
- Credit card repayment risk estimation
- Customer churn
- Demand elasticity and deposit rate optimisation
- Card skimming analysis
Using various methodologies including Bayesian probabilistic modelling (with PyMC), causal estimation, deep learning with time series (TensorFlow, RNN) and matrix factorisation.
Run in an AWS environment, with FastAPI serving and batch or continuous output to Redshift.
Machine learning, TensorFlow, Amazon Web Services (AWS)
6/2014 – 4/2019
Job description
System architecture, full initial implementation, further development, support and handover to a team for further development.
Designed and built an analytics backbone that continuously takes customer and product data from several loosely integrated proprietary systems into Kafka for streaming analysis, data fusion and population of a Redshift data warehouse and Elasticsearch.
Some interesting aspects:
- Hybrid onsite + AWS, with anonymisation before reaching AWS, and extensive encryption everywhere
- Continuous ingestion from the proprietary systems' Oracle databases using GoldenGate
- Streaming fusion of transaction data pieced from multiple sources
- Unified monitoring with Prometheus and Grafana across onsite and AWS. Comprehensive operational dashboard with processing volumes, lags, downtimes and critical resources
Apache Kafka, Oracle Database, Amazon Web Services (AWS), Elasticsearch
Certificates
Coursera
Joseph Pelrine
Education
Universidade Nova de Lisboa
Lisboa
About me
I've worked across industries including banking, fintech, cybersecurity, e-commerce, transportation, logistics and engineering. I'm equally comfortable writing code, leading a project, performing technical due diligence, conducting a workshop or coaching a team.
Personal data
- Spanish (native)
- Portuguese (native)
- English (fluent)
- French (basic)
- European Union