Big Data Engineer
- on request
- 61-306 Poznań
- Europe
- pl | en | de
- 04.11.2024
Brief introduction
Qualifications
Project & professional experience
11/2022 – present
Role description
Working on a credit and liquidity risk stress-testing system.
Programming languages: Scala, Python, Java.
Used technologies: Spark, Hadoop, Dremio, ActivePivot, Iceberg, AWS.
• Refactored existing ETL jobs to improve code reuse and make the code easier to understand, test and extend.
• Changed the partitioning of existing datasets.
• Achieved a substantial performance improvement by optimizing existing Spark jobs (see the sketch below):
- using the DataFrame and Dataset APIs instead of RDDs
- using built-in Spark functions instead of custom row transformations
- using aggregations that support partial aggregation
- using broadcast joins
• Implemented a tool that verifies whether the differences in the number of records and
calculated measures between consecutive days fall within specified thresholds.
• Improved unit test coverage.
Apache Hadoop, Apache Spark, Java (general), Python, Scala, Amazon Web Services (AWS)
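A minimal PySpark sketch of the optimization pattern above; the table, column and path names are illustrative assumptions, not identifiers from the project:

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.functions import broadcast

    spark = SparkSession.builder.appName("risk-etl-sketch").getOrCreate()

    # Large fact table and a small dimension table (placeholder paths).
    exposures = spark.read.parquet("s3a://bucket/exposures/")
    ratings = spark.read.parquet("s3a://bucket/ratings/")

    daily = (
        exposures
        .join(broadcast(ratings), "counterparty_id")  # broadcast join: the big side is not shuffled
        .withColumn("weighted", F.col("exposure") * F.col("risk_weight"))  # built-in functions, no UDF
        .groupBy("business_date", "rating")
        .agg(F.sum("weighted").alias("total_exposure"))  # sum supports partial aggregation
    )

    daily.write.mode("overwrite").partitionBy("business_date").parquet("s3a://bucket/out/")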
6/2022 – 10/2022
Role description
Worked on the migration from an on-premises Hadoop cluster to AWS.
Used technologies: Hadoop, EMR, Spark, Airflow, S3, Docker, Terraform.
Programming languages: Scala, Python.
• Converted multiple MapReduce jobs to Spark jobs.
• Updated Spark jobs to use DataFrames and Datasets instead of RDDs.
• Optimized existing Spark jobs.
• Created Airflow DAGs to schedule data processing (see the sketch below).
• Deployed infrastructure using Terraform.
Python, Scala, Amazon Web Services (AWS), Apache Hadoop, Apache Spark, Docker
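A minimal sketch of such an Airflow DAG, submitting a Spark step to a running EMR cluster; the DAG id, schedule, Airflow Variable and script path are assumptions:

    from datetime import datetime
    from airflow import DAG
    from airflow.providers.amazon.aws.operators.emr import EmrAddStepsOperator

    SPARK_STEP = [{
        "Name": "daily-transform",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster",
                     "s3://my-bucket/jobs/daily_transform.py"],  # placeholder job script
        },
    }]

    with DAG(
        dag_id="daily_transform",
        start_date=datetime(2022, 6, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        # Submit the Spark job as an EMR step; the cluster id comes from an Airflow Variable.
        EmrAddStepsOperator(
            task_id="submit_spark_job",
            job_flow_id="{{ var.value.emr_cluster_id }}",
            steps=SPARK_STEP,
        )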
3/2021 – 6/2021
Role description
Used technologies: Hadoop, Spark, Hive, Docker, Kubernetes, Airflow, Terraform, AWS, MS SQL Server, Snowflake.
Programming language: Python.
• Implemented multiple PySpark jobs running on a Kubernetes cluster to transform data from MS SQL Server and store it in S3 (see the sketch below).
• Implemented Airflow pipelines to schedule PySpark jobs and define dependencies between
them.
• Deployed infrastructure using Terraform.
Apache Hadoop, Microsoft SQL-Server (MS SQL), Apache Spark, Docker, Snowflake, Amazon Web Services (AWS), Kubernetes, Python
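A minimal sketch of the MS SQL Server to S3 pattern from the first bullet; connection details, table, column and bucket names are placeholders:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("mssql-to-s3-sketch").getOrCreate()

    # Read one table over JDBC (requires the MS SQL JDBC driver on the classpath).
    orders = (
        spark.read.format("jdbc")
        .option("url", "jdbc:sqlserver://db-host:1433;databaseName=sales")
        .option("dbtable", "dbo.orders")
        .option("user", "etl_user")
        .option("password", "...")
        .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
        .load()
    )

    # Write to S3 as Parquet, partitioned by an assumed date column.
    orders.write.mode("overwrite").partitionBy("order_date").parquet("s3a://my-bucket/raw/orders/")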
5/2020 – 6/2022
Role description
Used technologies: Hadoop, Spark, Hive, Databricks, Docker, Kubernetes, EMR, S3, CloudFormation, Lambda, DynamoDB, Akka HTTP, Flask, Gunicorn.
Programming languages: Scala, Python.
• Designed and implemented a feature store for machine learning; prepared a framework for efficiently calculating thousands of different aggregate values (features) from terabytes of data.
• Dockerized Spark applications to run them as containers on an EMR cluster in a more isolated and standardized way.
• Implemented an application for serving a machine learning model as a REST API on a Kubernetes cluster using Flask, Gunicorn and the TensorFlow Serving API. Significantly improved the API's response time by using an approximate nearest neighbor search algorithm.
• Implemented a Lambda function that transforms new objects created in an S3 bucket and stores the records in a DynamoDB table (see the sketch below).
• Implemented a REST API application using Akka HTTP to serve recommendations stored in a DynamoDB table.
• Optimized existing Spark applications.
• Worked with data scientists to optimize their solutions and make them production-ready.
Apache Hadoop, Apache Spark, Databricks, Docker, Python, Scala, Amazon Web Services (AWS), Kubernetes
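A hedged sketch of the Lambda function mentioned above: it is triggered by S3 object creation, transforms the new object and stores records in DynamoDB. The table name and the newline-delimited JSON layout are assumptions:

    import json
    import boto3

    s3 = boto3.client("s3")
    table = boto3.resource("dynamodb").Table("recommendations")  # assumed table name

    def handler(event, context):
        # S3 "ObjectCreated" notifications arrive as a list of records.
        for record in event["Records"]:
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
            for line in body.splitlines():              # assumes one JSON record per line
                table.put_item(Item=json.loads(line))   # keys must match the table schema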
1/2018 – 4/2020
Role description
Used technologies: Hadoop, Spark, Kafka, Hive, Flume, HBase, Oozie, Splunk, Ansible.
Programming languages: Scala, Python.
• Implemented report generators for a core banking platform using Spark.
• Implemented Spark jobs for file compaction and repartitioning to improve the performance of report generators and Hive queries (see the sketch below).
• Implemented random data generators to verify the performance of Spark applications; analyzed performance-test outputs and made the necessary improvements.
• Worked on the migration from the Cloudera to the MapR Hadoop distribution.
• Used Flume to read messages from Kafka, transform them and persist them into HDFS and HBase.
• Used Sqoop to ingest data from an Oracle database into HDFS.
• Implemented ETL jobs that transform files of various formats into Avro.
• Automated application deployment using Ansible, which greatly reduced the number of issues during production deployments.
Apache Kafka, Python, Scala, Ansible, Apache Hadoop, Apache Spark
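The compaction jobs were written in Scala; this PySpark sketch, with placeholder paths and file counts, only illustrates the pattern:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("compaction-sketch").getOrCreate()

    # Read a partition that has accumulated many small files (placeholder path).
    events = spark.read.parquet("hdfs:///data/events/date=2019-01-01")

    # Rewrite it with a controlled number of output files so report generators
    # and Hive queries open far fewer files; pick the count per data volume.
    events.repartition(16).write.mode("overwrite").parquet(
        "hdfs:///data/events_compacted/date=2019-01-01"
    )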
1/2018 – 2/2018
Role description
Used technologies: Spark, Athena, S3, EMR, Neo4j.
Programming language: Python.
• Used PySpark and GraphFrames to run graph algorithms and compared the performance with Neo4j (see the sketch below).
• Used PySpark to transform data stored in S3 and generate CSV files for import into Neo4j.
Apache Spark, Python
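A minimal GraphFrames sketch of such a run, assuming placeholder paths and schemas; the graphframes package must be available to Spark:

    from pyspark.sql import SparkSession
    from graphframes import GraphFrame

    spark = SparkSession.builder.appName("graph-sketch").getOrCreate()
    spark.sparkContext.setCheckpointDir("/tmp/checkpoints")  # required by connectedComponents

    # The vertex DataFrame needs an "id" column; edges need "src" and "dst".
    vertices = spark.read.parquet("s3a://bucket/vertices/")
    edges = spark.read.parquet("s3a://bucket/edges/")

    g = GraphFrame(vertices, edges)
    components = g.connectedComponents()

    # Export as CSV, e.g. for a Neo4j import or a performance comparison.
    components.write.option("header", True).csv("s3a://bucket/export/components/")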
7/2017 – 11/2017
Role description
Developed an application for collecting risk data from various sources and processing it in real time.
Used technologies: Hadoop, Spark Streaming, Kafka, Avro, Camel.
Apache Hadoop, Apache Kafka, Apache Spark, Scala, Apache Camel
7/2015 – 8/2017
Role description
Developed a system that uses data from public sources to infer "who knows who" relationships and helps companies identify valuable relations among their existing customers.
Used technologies: Neo4j, Cassandra, Spark Streaming, Spark GraphX, Spring, Spray, ActiveMQ, Docker, Redis, Solr.
Details:
• Designed and implemented an algorithm for inferring "knows" relationships between persons using Spark.
• Designed and implemented an algorithm for finding the ultimate beneficial owner of a company using Spark GraphX.
• Created a Neo4j server plugin for finding shortest paths between graph nodes according to defined business rules (see the sketch below).
• Implemented REST services that run Cypher queries to retrieve data from nodes and relationships.
• Implemented fast data import into the Neo4j database by writing directly to the store files using the batch inserter API.
• Used Spark to transform data stored in Cassandra into a format that can easily be imported into the Neo4j database.
• Designed and implemented synchronization between Cassandra and Neo4j using an event-driven architecture.
• Implemented node search in the graph using Cypher queries and a Lucene index.
• Data modeling.
• Configured and tuned the Neo4j database.
Apache Spark, Docker, Java (general), Scala, Apache Solr
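The plugin itself was written against Neo4j's Java API; the sketch below only illustrates a rule-constrained shortest-path query, issued through the official Neo4j Python driver with placeholder labels, properties and depth limit:

    from neo4j import GraphDatabase

    driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "secret"))

    # Shortest "knows" path between two persons, capped at an assumed depth of 6.
    QUERY = """
    MATCH (a:Person {id: $src}), (b:Person {id: $dst}),
          p = shortestPath((a)-[:KNOWS*..6]-(b))
    RETURN [n IN nodes(p) | n.id] AS path
    """

    with driver.session() as session:
        for record in session.run(QUERY, src="p1", dst="p2"):
            print(record["path"])

    driver.close()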
7/2014 – 6/2015
Role description
Developed the PSIcarlos system for optimal planning and precise balancing of crude oil
transportation.
Used technologies: Spring, Hibernate, ActiveMQ, Oracle, Apache Tomcat.
Details:
• Designed and implemented new system functions based on defined requirements.
• Prepared technical documentation.
• Cooperated closely within an international team; discussed customer requirements.
• Provided technical support for system users.
Oracle Database, Apache Tomcat, Hibernate (Java), Spring Framework
3/2014 – 5/2014
Role description
Developed the V-Desk workflow system for document circulation.
Used technologies: WPF, WinForms, MS SQL.
Details:
• Developed (and mainly optimized) a service for automatic text recognition from scanned documents and for retrieving key information from them using regular expressions.
• Developed an application for document scanning and barcode recognition.
Microsoft SQL-Server (MS SQL), C#, Java (general)
6/2012 – 2/2014
Role description
Co-authored a call center system.
Used technologies: WCF, WPF, Mono, PostgreSQL, MongoDB, Asterisk.
Details:
• Designed scalable system architecture.
• Developed multithreaded WCF services.
• Implemented calling in different modes by sending requests to, and handling events from, the Asterisk PBX via the AMI protocol.
• Implemented automatic calling by integrating the Asterisk PBX with the PostgreSQL database to create dynamic call queues.
• Developed the predictive-dialer algorithm that calculates the number of calls to place based on collected statistics, e.g. the percentage of answered calls and average talk time (see the sketch below).
• Built the mechanism for sending, mixing, compressing and saving recorded calls to the database.
MongoDB, PostgreSQL, C#
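The system was written in C#; this Python sketch only illustrates the predictive-dialer calculation described above, with an assumed over-dial formula based on the observed answer rate:

    import math

    def calls_to_place(agents_becoming_free: int, answer_rate: float) -> int:
        """How many calls to dial so that roughly one answered call lands per
        agent about to become free, given the observed answer rate."""
        if answer_rate <= 0:
            return agents_becoming_free        # no statistics yet: dial 1:1
        # Over-dial by the inverse of the answer rate so enough calls get picked up.
        return math.ceil(agents_becoming_free / answer_rate)

    # Example: 5 agents about to become free, 40% of dialed calls get answered.
    print(calls_to_place(5, 0.4))  # -> 13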
7/2011 – 2/2013
Role description
Developed the Verax Network Management System.
Used technologies: Spring, Hibernate, Adobe Flex, Oracle, MS SQL.
Details:
• Implemented advanced plugins for problem detection and real-time monitoring of devices and applications such as:
- PostgreSQL and MySQL databases
- Active Directory service
- VMware ESX servers and virtual machines
- .NET applications
- Windows and Unix workstations
- Cisco, MRV and Juniper routers and switches
- APC UPS devices
- Devices of undetected type
• Created a module for monitoring changes in software installed on detected devices.
Microsoft SQL-Server (MS SQL), Oracle Database, Hibernate (Java), Java (general), Spring Framework
Certificates
Google Cloud
openHPI
Neo4j
dbt Labs
Snowflake
Databricks
Amazon Web Services Training and Certification
MapR Technologies (acquired by Hewlett Packard Enterprise in 2019)
Coursera
Coursera
Coursera
Coursera
Coursera
Personal data
- Polish (native)
- English (fluent)
- German (basic)
- European Union