Freelance Big Data Engineer on freelance.de

Big Data Engineer

last online 2 days ago
  • rate on request
  • 61-306 Poznań
  • Europe
  • pl  |  en  |  de
  • 04.11.2024

Summary

Big Data Engineer with over 13 years of experience. A skilled problem solver passionate about the design, quality, and performance of developed solutions.

Skills

  • Amazon Web Services (AWS) – 4 years
  • Apache Cassandra
  • Apache Hadoop – 7 years
  • Apache Kafka
  • Apache Spark
  • Java – 6 years
  • Neo4j
  • Python – 6 years
  • Scala – 9 years
  • SQL

Project & Work Experience

Big Data Engineer
Nomura, New York
11/2022 – present (2 years, 1 month)
Banking

Description

Working on a Credit and Liquidity Risk Stress Testing system.

Programming languages: Scala, Python, Java.
Used technologies: Spark, Hadoop, Dremio, ActivePivot, Iceberg, AWS.

• Refactored existing ETL jobs to improve code reuse and make the code easier to
understand, test, and extend.
• Changed the partitioning of existing data sets.
• Achieved a significant performance improvement by optimizing existing Spark jobs:
- using DataFrame and Dataset APIs instead of RDDs
- using built-in Spark functions instead of custom row transformations
- using aggregations that support partial aggregation
- using broadcast joins
• Implemented a tool that verifies whether the differences in the number of records and
calculated measures between consecutive days fall within specified thresholds.
• Improved unit test coverage.
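
The day-over-day verification tool mentioned above can be sketched roughly as follows. This is an illustrative sketch only, not the actual project code: all metric names, threshold values, and the function name are hypothetical, and the real tool compared Spark outputs rather than plain dictionaries.

```python
# Illustrative sketch of a day-over-day consistency check: compare record
# counts and measure totals between two daily snapshots and flag any
# relative change that exceeds a configured threshold. All names and
# threshold values here are hypothetical.

def check_day_over_day(previous: dict, current: dict, thresholds: dict) -> list:
    """Return a list of (metric, relative_change) pairs that breach thresholds.

    `previous` and `current` map metric names (e.g. "record_count") to
    numeric values; `thresholds` maps the same names to the maximum
    allowed relative change (e.g. 0.05 for 5%).
    """
    breaches = []
    for metric, limit in thresholds.items():
        prev, curr = previous[metric], current[metric]
        if prev == 0:
            # Avoid division by zero: any change away from zero is a breach.
            if curr != 0:
                breaches.append((metric, float("inf")))
            continue
        change = abs(curr - prev) / abs(prev)
        if change > limit:
            breaches.append((metric, change))
    return breaches

yesterday = {"record_count": 1_000_000, "total_exposure": 250.0}
today = {"record_count": 1_080_000, "total_exposure": 251.0}
limits = {"record_count": 0.05, "total_exposure": 0.10}

print(check_day_over_day(yesterday, today, limits))
# → [('record_count', 0.08)]
```

A check like this can run as the last step of a daily pipeline and fail the run when any metric drifts beyond its threshold.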

Skills used

Apache Hadoop, Apache Spark, Java (allg.), Python, Scala, Amazon Web Services (AWS)

Big Data Engineer
Groupon, Chicago
6/2022 – 10/2022 (5 Monate)
IT & Development

Description

Worked on the migration from an on-prem Hadoop cluster to AWS.
Used technologies: Hadoop, EMR, Spark, Airflow, S3, Docker, Terraform.
Programming languages: Scala, Python.

• Converted multiple MapReduce jobs to Spark jobs.
• Updated Spark jobs to use DataFrames and Datasets instead of RDDs.
• Optimized existing Spark jobs.
• Created Airflow DAGs to schedule data processing.
• Deployed infrastructure using Terraform.

Skills used

Python, Scala, Amazon Web Services (AWS), Apache Hadoop, Apache Spark, Docker

Big Data Engineer
Nike, Hilversum
3/2021 – 6/2021 (4 Monate)
Consumer goods industry

Description

Used technologies: Hadoop, Spark, Hive, Docker, Kubernetes, Airflow, Terraform, AWS, MS SQL Server, Snowflake.
Programming language: Python.

• Implemented multiple PySpark jobs running on a Kubernetes cluster to transform data
from MS SQL Server and store it in S3.
• Implemented Airflow pipelines to schedule PySpark jobs and define dependencies between
them.
• Deployed infrastructure using Terraform.

Skills used

Apache Hadoop, Microsoft SQL-Server (MS SQL), Apache Spark, Docker, Snowflake, Amazon Web Services (AWS), Kubernetes, Python

Big Data Engineer
Adidas, Herzogenaurach
5/2020 – 6/2022 (2 years, 2 months)
Consumer goods industry

Description

Used technologies: Hadoop, Spark, Hive, Databricks, Docker, Kubernetes, EMR, S3, CloudFormation, Lambda, DynamoDB, Akka HTTP, Flask, Gunicorn.
Programming languages: Scala, Python.

• Designed and implemented a feature store for machine learning. Prepared a framework
for efficient calculation of thousands of different aggregate values (features) from terabytes
of data.
• Dockerized Spark applications to run them as containers on an EMR cluster in a more isolated
and standardized way.
• Implemented an application for serving a machine learning model as a REST API on a Kubernetes
cluster using Flask, Gunicorn, and the TensorFlow Serving API. Significantly improved the
response time of the API by using an approximate nearest neighbor search algorithm.
• Implemented a Lambda function to transform new objects created in an S3 bucket and store
records in a DynamoDB table.
• Implemented a REST API application using Akka HTTP to serve recommendations stored
in a DynamoDB table.
• Optimized existing Spark applications.
• Worked with data scientists to optimize their solutions and make them production-ready.
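
The core idea of a feature store framework like the one described above can be sketched as declaring features once and computing them per entity in a single pass. This is a hypothetical illustration: in the project the aggregations ran on Spark over terabytes of data, while here plain Python dictionaries stand in, and all feature and column names are made up.

```python
# Illustrative sketch of the feature-store idea: features are declared as
# (name, column, aggregation) triples and computed per entity in one pass.
# In the project this would be Spark aggregations; plain Python structures
# are used here purely for illustration, and all names are hypothetical.
from collections import defaultdict

FEATURES = [
    ("total_spend", "amount", sum),
    ("order_count", "amount", len),
    ("max_order", "amount", max),
]

def compute_features(rows: list, key: str) -> dict:
    """Group `rows` (dicts) by `key` and compute every declared feature per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    result = {}
    for entity, entity_rows in groups.items():
        result[entity] = {
            name: agg([r[col] for r in entity_rows])
            for name, col, agg in FEATURES
        }
    return result

orders = [
    {"customer": "c1", "amount": 30.0},
    {"customer": "c1", "amount": 70.0},
    {"customer": "c2", "amount": 10.0},
]
print(compute_features(orders, "customer"))
# → {'c1': {'total_spend': 100.0, 'order_count': 2, 'max_order': 70.0},
#    'c2': {'total_spend': 10.0, 'order_count': 1, 'max_order': 10.0}}
```

Keeping the feature definitions declarative is what makes it cheap to add new features: each triple translates mechanically into an aggregation expression.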

Skills used

Apache Hadoop, Apache Spark, Databricks, Docker, Python, Scala, Amazon Web Services (AWS), Kubernetes

Big Data Engineer
Nordea, Copenhagen
1/2018 – 4/2020 (2 Jahre, 4 Monate)
Banking

Description

Used technologies: Hadoop, Spark, Kafka, Hive, Flume, HBase, Oozie, Splunk, Ansible.
Programming languages: Scala, Python.

• Implemented report generators for the Core Banking Platform using Spark.
• Implemented Spark jobs for file compaction and repartitioning to improve the performance of
report generators and Hive queries.
• Implemented random data generators for verifying the performance of Spark
applications. Analyzed the outputs of performance tests and made the necessary improvements.
• Worked on the migration from the Cloudera to the MapR Hadoop distribution.
• Used Flume to read messages from Kafka, transform them, and persist them in HDFS and
HBase.
• Used Sqoop to ingest data from an Oracle database into HDFS.
• Implemented ETL jobs for transforming files in various formats into Avro format.
• Automated the deployment of applications using Ansible, which greatly reduced the number
of issues during production deployments.

Skills used

Apache Kafka, Python, Scala, Ansible, Apache Hadoop, Apache Spark

Big Data Engineer
Agata Tudek LDI, Warsaw
1/2018 – 2/2018 (2 Monate)
IT & Development

Description

Used technologies: Spark, Athena, S3, EMR, Neo4j.
Programming language: Python.

• Used PySpark and GraphFrames to run graph algorithms. Compared the performance
with Neo4j.
• Used PySpark to transform data stored in S3 and generate CSV files in order to import
them into Neo4j.

Skills used

Apache Spark, Python

Scala Software Engineer
Citi, Warsaw
7/2017 – 11/2017 (5 Monate)
Banking

Description

Developed an application for collecting risk data from various sources and processing it in
real time.

Used technologies: Hadoop, Spark Streaming, Kafka, Avro, Camel.

Skills used

Apache Hadoop, Apache Kafka, Apache Spark, Scala, Apache Camel

Java, Scala Software Engineer
Kantwert, Poznań
7/2015 – 8/2017 (2 Jahre, 2 Monate)
IT & Development

Description

Developed a system that uses data from public sources to infer "who-knows-who" relationships
and helps companies identify valuable relations among their existing customers.
Used technologies: Neo4j, Cassandra, Spark Streaming, Spark GraphX, Spring, Spray,
ActiveMQ, Docker, Redis, Solr.

Details:
• Designed and implemented an algorithm for inferring "knows" relationships between
persons using Spark.
• Designed and implemented an algorithm for finding the ultimate beneficial owner of a
company using Spark GraphX.
• Created a Neo4j server plugin for finding the shortest paths between nodes in the graph
using defined business rules.
• Implemented REST services that perform Cypher queries to retrieve data from
nodes and relationships.
• Implemented fast data import into Neo4j by writing directly to the database files using the
batch inserter API.
• Used Spark to transform data stored in Cassandra into a format that can easily be imported
into Neo4j.
• Designed and implemented synchronization between Cassandra and Neo4j using an event-driven architecture.
• Implemented node search in the graph using Cypher queries and a Lucene index.
• Data modeling.
• Configured and tuned the Neo4j database.
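
The rule-constrained shortest-path search from the Neo4j plugin bullet can be sketched as a breadth-first search that only follows edges permitted by a business rule. This is a hypothetical stand-in: the real plugin traversed a Neo4j graph via its server API, while here a plain adjacency list is used, and the example rule (only follow "knows" edges) is invented.

```python
# Illustrative sketch of shortest-path search constrained by a business
# rule: a BFS over an adjacency list {node: [(neighbor, edge_type), ...]}
# that only follows edges accepted by `edge_allowed`. Graph contents and
# the rule are hypothetical.
from collections import deque

def shortest_path(graph, start, goal, edge_allowed):
    """Return the shortest list of nodes from `start` to `goal`, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor, edge_type in graph.get(node, []):
            if neighbor not in visited and edge_allowed(edge_type):
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None

graph = {
    "alice": [("bob", "knows"), ("acme", "works_at")],
    "bob": [("carol", "knows")],
    "acme": [("carol", "employs")],
}
print(shortest_path(graph, "alice", "carol", lambda t: t == "knows"))
# → ['alice', 'bob', 'carol']
```

In the plugin the same filtering idea applies at traversal time, which is what distinguishes it from a plain `shortestPath` Cypher query.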

Skills used

Apache Spark, Docker, Java (allg.), Scala, Apache Solr

Java Software Engineer
PSI Polska, Poznań
7/2014 – 6/2015 (1 Jahr)
IT & Development

Description

Developed the PSIcarlos system for optimal planning and precise balancing of crude oil
transportation.

Used technologies: Spring, Hibernate, ActiveMQ, Oracle, Apache Tomcat.

Details:
• Designed and implemented new system functions based on defined requirements.
• Prepared technical documentation.
• Worked closely within an international team; discussed customer requirements.
• Provided technical support for system users.

Skills used

Oracle Database, Apache Tomcat, Hibernate (Java), Spring Framework

Java, C# Developer
PrimeSoft, Poznań
3/2014 – 5/2014 (3 Monate)
IT & Development

Description

Developed the V-Desk workflow system for document circulation.

Used technologies: WPF, WinForms, MS SQL.

Details:
• Developed (mainly optimized) a service for automatic text recognition in scanned
documents and for retrieving key information from them using regular expressions.
• Developed an application for document scanning and barcode recognition.
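
Extracting key fields from OCR'd text with regular expressions, as in the service above, can be sketched as follows. This is purely illustrative: the actual service was written in C#, and the field names and patterns here are hypothetical.

```python
# Illustrative sketch of pulling key fields out of OCR'd document text with
# regular expressions. The real service was written in C#; the patterns and
# field names here are hypothetical examples.
import re

PATTERNS = {
    "invoice_number": re.compile(r"Invoice\s*(?:No\.?|#)\s*:?\s*(\S+)", re.I),
    "total": re.compile(r"Total\s*:?\s*(\d+(?:\.\d{2})?)", re.I),
}

def extract_fields(text: str) -> dict:
    """Return the first match of every pattern found in `text`."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            fields[name] = match.group(1)
    return fields

scanned = "ACME Corp\nInvoice No: INV-2042\nTotal: 199.99 EUR"
print(extract_fields(scanned))
# → {'invoice_number': 'INV-2042', 'total': '199.99'}
```

Tolerant patterns (optional punctuation, case-insensitive matching) matter here because OCR output is noisy.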

Skills used

Microsoft SQL-Server (MS SQL), C#, Java (allg.)

C# Developer
V-TELL, Poznań
6/2012 – 2/2014 (1 Jahr, 9 Monate)
IT & Development

Description

Co-author of a call center system.
Used technologies: WCF, WPF, Mono, PostgreSQL, MongoDB, Asterisk.

Details:
• Designed a scalable system architecture.
• Developed multithreaded WCF services.
• Implemented calling in different modes by sending requests and handling events
via the AMI protocol of the Asterisk PBX.
• Implemented automatic calling by integrating the Asterisk PBX with the PostgreSQL
database to create dynamic call queues.
• Developed the predictive dialer algorithm that calculates the number of calls to be made
based on collected statistics, e.g. the percentage of answered calls and talk time.
• Built the mechanism for sending, mixing, compressing, and saving recorded calls to
the database.
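
The pacing arithmetic at the heart of a predictive dialer like the one described above can be sketched roughly as follows. The formula and the over-dial cap are hypothetical simplifications, not the project's actual algorithm, which also factored in statistics such as talk time.

```python
# Illustrative sketch of predictive-dialer pacing: estimate how many calls
# to place so that, given the historical answer rate, roughly one answered
# call arrives per free agent. The formula and the over-dial cap are
# hypothetical simplifications of the real algorithm.
import math

def calls_to_place(free_agents: int, answer_rate: float,
                   max_overdial: float = 1.5) -> int:
    """Number of outbound calls to start for the currently free agents.

    `answer_rate` is the historical fraction of calls that get answered
    (0 < answer_rate <= 1); `max_overdial` caps aggressiveness so
    abandoned calls stay rare.
    """
    if free_agents <= 0:
        return 0
    raw = free_agents / answer_rate            # expect ~free_agents answers
    capped = min(raw, free_agents * max_overdial)
    return math.ceil(capped)

print(calls_to_place(free_agents=4, answer_rate=0.5))  # → 6 (capped at 4 * 1.5)
print(calls_to_place(free_agents=4, answer_rate=1.0))  # → 4
```

The cap is the safety valve: dialing purely by 1/answer_rate risks answered calls with no agent available, so real dialers bound the over-dial ratio.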

Skills used

MongoDB, PostgreSQL, C#

Java Software Developer
Verax Systems, Poznań
7/2011 – 2/2013 (1 Jahr, 8 Monate)
IT & Development

Description

Developed Verax Network Management System.

Used technologies: Spring, Hibernate, Adobe Flex, Oracle, MS SQL.

Details:
• Implemented advanced plugins for problem detection and real-time monitoring of devices
and applications such as:
- PostgreSQL and MySQL databases
- the Active Directory service
- VMware ESX servers and virtual machines
- .NET applications
- Windows and Unix workstations
- Cisco, MRV, and Juniper routers and switches
- APC UPS devices
- devices of undetected type

• Created a module for monitoring changes in software installed on detected devices.

Skills used

Microsoft SQL-Server (MS SQL), Oracle Database, Hibernate (Java), Java (allg.), Spring Framework

Certificates

Professional Data Engineer (Google Cloud, 2023)
Knowledge Graphs - Foundations and Applications (openHPI, 2023)
Neo4j Certified Professional (Neo4j, 2023)
dbt Fundamentals (dbt Labs, 2023)
Hands On Essentials - Data Warehouse (Snowflake, 2023)
Databricks Certified Associate Developer for Apache Spark 3.0 (Databricks, 2022)
AWS Certified Big Data – Specialty (Amazon Web Services Training and Certification, 2019)
MapR Certified Spark Developer v2 (MapR Technologies, acquired by Hewlett Packard Enterprise in 2019; 2019)
Structuring Machine Learning Projects (Coursera, 2018)
Neural Networks and Deep Learning (Coursera, 2018)
Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization (Coursera, 2018)
Convolutional Neural Networks (Coursera, 2018)
Machine Learning (Coursera, 2016)

Personal details

Languages
  • Polish (native)
  • English (fluent)
  • German (basic)
Willingness to travel
Europe
Work permit
  • European Union
Home office
preferred
Profile views
129
Age
34
Work experience
13 years and 4 months (since 07/2011)
