Cloud Engineer
- 0 references
- on request
- 0356 OSLO
- Europe
- sl | en | no
- 06.11.2024
Short Introduction
Over 10 years of prior Oracle experience.
Qualifications
Project & Professional Experience
7/2023 – 1/2024
Role Description
The customer in the media industry had legacy services running in multiple AWS accounts, with little or no documentation available.
• Technical due diligence of customer’s multiple AWS accounts
• Overview of the services and data stored in the accounts (a Python library was created for this)
• Archiving 200 TB of S3 objects to Glacier Deep Archive with boto3 (see the sketch after this list)
• Technologies used:
o AWS: CloudFormation, CloudFront, CloudWatch, CodeCommit, Cost Explorer, EC2, IAM, Lambda, Lightsail, RDS, Route53, Secrets Manager, SageMaker, S3, S3 Glacier
o Other: boto3, Docker, Jupyter, PyCharm, Python, shell
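A minimal sketch of the archiving step, assuming the objects stay in their bucket and are moved to the Glacier Deep Archive storage class via an in-place copy (bucket and prefix names are placeholders):

    import boto3

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")

    BUCKET = "legacy-media-archive"   # placeholder bucket name
    PREFIX = "raw/"                   # placeholder prefix

    # Re-copy each object onto itself with the DEEP_ARCHIVE storage class.
    for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
        for obj in page.get("Contents", []):
            s3.copy_object(
                Bucket=BUCKET,
                Key=obj["Key"],
                CopySource={"Bucket": BUCKET, "Key": obj["Key"]},
                StorageClass="DEEP_ARCHIVE",
                MetadataDirective="COPY",
            )

Note that copy_object is limited to objects up to 5 GB; for larger objects a multipart copy or an S3 lifecycle rule would likely be used instead.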
Python, Cloud (general), IaaS, Amazon Web Services (AWS)
11/2021 – 7/2023
Role Description
Projects:
- Oracle EDM/AWS/Cinchy integration
Making data from Oracle EDM available in the data collaboration platform Cinchy. An AWS Lambda function periodically fetches data via EDM's API. The JSON objects land in an S3 bucket, and EventBridge triggers another Lambda to process the newly landed objects. The processed files are stored in another folder, which triggers EventBridge again and starts a further Lambda. This Lambda connects to Cinchy, creates the objects and synchronizes the tables.
Data is moved from S3 to Cinchy via presigned URLs generated by Lambda; API Gateway is used for authentication (a sketch follows below).
AWS SAM and CloudFormation build the infrastructure in AWS.
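A minimal sketch of the presigned-URL step, assuming a Lambda handler invoked behind API Gateway with a bucket/key pair in the request body and returning a time-limited download URL; the names and the event shape are illustrative assumptions:

    import json
    import boto3

    s3 = boto3.client("s3")

    def lambda_handler(event, context):
        # Assumed event shape: API Gateway proxy request with a JSON body
        # containing the bucket and key of the processed object.
        body = json.loads(event.get("body") or "{}")
        url = s3.generate_presigned_url(
            "get_object",
            Params={"Bucket": body["bucket"], "Key": body["key"]},
            ExpiresIn=900,  # URL valid for 15 minutes
        )
        return {"statusCode": 200, "body": json.dumps({"url": url})}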
• Creating a Python library for fetching data from Oracle EDM's REST API
• Creating infrastructure in AWS using AWS SAM and CloudFormation
• Writing pipelines in GitLab
• Working on secure connections between AWS accounts and various environments
• Technologies used:
o AWS: Lambda, IAM, API Gateway, Secrets Manager, KMS, SageMaker, S3, CloudWatch, EventBridge, CloudWatch Events
o Infrastructure as Code: AWS SAM, boto3, CloudFormation
o Other: GitLab, Jira, Docker, PyCharm, JupyterLab
- Snowflake/AWS integration
The bank introduced Snowflake in autumn 2021, and I was part of the group (10-15 people) integrating Snowflake with the existing AWS ecosystem. My assignments were mostly on the AWS side: integration and security between AWS and Snowflake, building data pipelines between RDS and S3, and administration of GitLab runners. Everything was automated using Terraform, boto3 or CloudFormation.
• Working on integrating AWS with Snowflake: automating private connections using PrivateLink
• Building pipelines between RDS and S3 using DMS and Lambda; the code was written in CloudFormation and boto3, and one source table had over 8 billion rows
• Creating Snowflake external functions using API Gateway, Lambda and Secrets Manager. I created an "External Function Creator", a Lambda function that simplifies the creation of Snowflake external functions for developers (see the sketch after this list)
• Creation and maintenance of runners for GitLab (DataOps.Live) using EC2, Secrets Manager, CloudFormation, boto3.
• Technologies used:
o AWS: VPC, EC2, Lambda, IAM, API Gateway, Secrets Manager, KMS, RDS, DMS, SageMaker, S3, CloudWatch, Route53, PrivateLink
o Infrastructure as Code: Terraform, boto3, CloudFormation
o Other: Snowflake, bitbucket, GitLab, dbt, DataOps.Live, Jira, Docker, PyCharm, VSCode
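A minimal sketch of the Lambda side of a Snowflake external function, assuming the standard request/response contract (Snowflake posts the rows through API Gateway as {"data": [[row_number, arg, ...], ...]} and expects the same shape back); the transformation is only a placeholder:

    import json

    def lambda_handler(event, context):
        # Snowflake sends the batch of rows in the request body under "data";
        # each row is [row_number, arg1, arg2, ...].
        rows = json.loads(event["body"])["data"]

        # Placeholder transformation: upper-case the first argument of each row.
        results = [[row[0], str(row[1]).upper()] for row in rows]

        # Snowflake expects {"data": [[row_number, result], ...]} back.
        return {"statusCode": 200, "body": json.dumps({"data": results})}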
Cloud Computing, Storage, Amazon Web Services (AWS), Linux Development, Python, Snowflake
6/2020 – 7/2021
Role Description
Details:
• In charge of creating and maintaining the customer's Python backend library for ingesting data, saving data, preparing datasets for analytics and creating jobs for updates from the Storage Account to the customer's databases.
• Architect behind data storage and data availability: containers in the Storage Account play the key role for cheap, secure and simple object storage. Parquet with Brotli compression is used for storing data; pandas and PyArrow are used for manipulating blobs (see the sketch after this list). Databricks was tested and partially used.
• Preparing automated environments for data scientists: Innovation's JupyterLab environment works out of the box for each data scientist, with datasets loaded via a single function call.
• Creating API services for text analysis using Python, Flask, Container Instances, Docker and Azure APIM.
• Making the customer's datasets available externally using Python and APIM.
• Architecting scheduled data updates using Logic Apps, Docker, Container Registry and Instances.
• In charge of automation of the environments using Azure DevOps, Terraform, Docker, Ansible, GitHub and Azure CLI, as well as in charge of maintenance of environments via Terraform.
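A minimal sketch of the blob-backed Parquet storage, assuming azure-storage-blob with a connection string taken from an environment variable; the container, blob and column names are placeholders:

    import io
    import os
    import pandas as pd
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string(os.environ["STORAGE_CONN_STR"])
    blob = service.get_blob_client(container="datasets", blob="customers/2021-06.parquet")

    # Write: serialize the DataFrame as Parquet with Brotli compression and upload it.
    df = pd.DataFrame({"customer_id": [1, 2], "score": [0.7, 0.4]})
    buf = io.BytesIO()
    df.to_parquet(buf, engine="pyarrow", compression="brotli")
    blob.upload_blob(buf.getvalue(), overwrite=True)

    # Read: download the blob and load it back into pandas.
    df_back = pd.read_parquet(io.BytesIO(blob.download_blob().readall()))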
Ansible, Databricks, Docker, IaaS, Linux (Kernel), Microsoft Azure, Python
2/2020 – 5/2020
Role Description
Responsibilities:
• Planning and moving infrastructure (Postgres, Docker) from local servers into AWS.
• Writing Terraform scripts for provisioning AWS services like RDS, VPC, EC2, ECS, EKS, Fargate, IAM, CloudWatch, CloudTrail, CloudMap, S3, Athena, Glue
• Creating roles, users, groups in IAM
• Creating dashboards, cron tasks and logs in CloudWatch
• Auditing AWS in CloudTrail
• Using boto3 for manipulating objects in S3
Details:
The customer decided to migrate from local infrastructure into the AWS cloud and to use automation tools to control the infrastructure and costs. My role was to migrate the local Postgres to RDS/Postgres and to move the Docker containers into AWS ECS, where the Fargate launch type was used and services and tasks were provisioned to do the same job as in the local environment.
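The provisioning itself was done with Terraform; purely as an illustration of the Fargate launch type, a boto3 sketch of launching a one-off task might look like this (cluster, task definition and network values are placeholders):

    import boto3

    ecs = boto3.client("ecs")

    # Launch a single task on Fargate; the task definition and networking
    # below are placeholders for values provisioned via Terraform.
    ecs.run_task(
        cluster="migration-cluster",
        launchType="FARGATE",
        taskDefinition="app-backend:1",
        count=1,
        networkConfiguration={
            "awsvpcConfiguration": {
                "subnets": ["subnet-0123456789abcdef0"],
                "securityGroups": ["sg-0123456789abcdef0"],
                "assignPublicIp": "DISABLED",
            }
        },
    )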
Python, Amazon Web Services (AWS)
1/2020 – 2/2020
Role Description
Responsibilities:
• Introducing HashiCorp tools like Terraform, Consul and Vault to the customer
• Defining and explaining architecture and best practices of the tools
Details:
The customer needed help with infrastructure as code in IBM Cloud using the HashiStack. The tools were introduced to the customer, a working environment in Docker was set up, and Consul was introduced to extend the dynamic provisioning of multiple environments.
IaaS
6/2019 – 9/2019
Role Description
Responsibilities:
• Writing architectural proposal
• Creating PoCs in Azure and Amazon Web Services
Details:
The customer was looking for a solution with Kafka as the central technology running on internal servers. Various bank systems would produce messages to Kafka, which are then mirrored to Azure Event Hubs, where Azure Databricks can execute business logic to process them. The results of the processing in Databricks are sent back to the Kafka cluster, where the bank systems act as consumers and fetch the processing results.
Several PoCs were built in AWS and Azure using Kafka and the following Azure services: Event Hubs, Blob Storage and Databricks. The programming language of choice was Scala (for communication with Kafka and in Databricks notebooks). Docker and Terraform were used to automate the infrastructure and ensure rapid development and testing.
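The PoCs themselves were written in Scala; purely as an illustration of the return path (consume mirrored events from Event Hubs, apply some business logic, produce the result back to Kafka), a Python sketch using azure-eventhub and confluent-kafka might look like this, with connection strings, names and the transformation being placeholders:

    from azure.eventhub import EventHubConsumerClient
    from confluent_kafka import Producer

    producer = Producer({"bootstrap.servers": "kafka.internal:9092"})  # placeholder broker

    def on_event(partition_context, event):
        payload = event.body_as_str()
        result = payload.upper()  # placeholder for the business logic
        producer.produce("processing-results", value=result.encode("utf-8"))
        producer.poll(0)

    consumer = EventHubConsumerClient.from_connection_string(
        conn_str="Endpoint=sb://...",  # placeholder Event Hubs connection string
        consumer_group="$Default",
        eventhub_name="mirrored-topic",
    )
    with consumer:
        consumer.receive(on_event=on_event, starting_position="-1")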
Databricks, Docker, IaaS, Apache Kafka
7/2018 – 2/2019
Role Description
Responsibilities:
• Writing Terraform and Ansible scripts for dynamic provisioning of HDP 2.6 and HDP 3.0 clusters
• Using the cloud infrastructures VMware and AWS
• Integration of IBM tools with HDP: Spectrum Scale (formerly GPFS), BigSql, Data Server Manager, SPSS Analytic Server
• Provisioning HDP clusters using the ansible-hortonworks GitHub repository
• Manual and automatic installation of HDP services such as Ambari, HDFS, YARN, Ranger, Hive, Spark, ...
• Configuration and administration of HDP clusters
• Embedding HashiCorp tools (Terraform, Consul, Vault, Packer) to automate cluster provisioning
Details:
The idea of IBM's customer was to automate the creation of environments for data storage and analysis. All secrets are stored in Vault and the cluster configuration is defined in Consul. The solution takes the configuration and builds an HDP cluster based on it. Everything is automated and dynamic: clusters for test, production, storage, analysis and so on are provisioned in various sizes.
Terraform scripts read the configuration values from Consul, store secrets in Vault and provision the infrastructure. Ansible scripts, called from Terraform, set up the HDP architecture on that infrastructure, delivering a configured and functional HDP cluster ready to use (a sketch of the Consul/Vault pattern follows below).
When provisioning a default cluster, a certain number of instances is created, an HDP cluster is installed on them, the connection to the existing Spectrum Scale storage is made and IBM's BigSql is installed. Different configurations in Consul allow creating different HDP clusters with different services according to the needs.
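The actual provisioning was done with Terraform and Ansible; purely as an illustration of the configuration/secrets pattern described above, a Python sketch using python-consul and hvac might look like this (addresses, key paths and secret names are placeholders):

    import os
    import consul
    import hvac

    # Read the cluster configuration from Consul's key/value store.
    c = consul.Consul(host="consul.internal")  # placeholder Consul address
    _, entry = c.kv.get("hdp/clusters/test/node_count")
    node_count = int(entry["Value"].decode("utf-8"))

    # Read the cluster secrets (e.g. the Ambari admin password) from Vault.
    vault = hvac.Client(url="https://vault.internal:8200", token=os.environ["VAULT_TOKEN"])
    secret = vault.secrets.kv.v2.read_secret_version(path="hdp/clusters/test")
    ambari_password = secret["data"]["data"]["ambari_admin_password"]

    print(f"Provisioning a {node_count}-node cluster ...")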
Apache Hadoop, DevOps (general), Software Architecture
1/2018 – 6/2018
Role Description
Responsibilities:
• AWS administrator: planning and building environments, automation of processes, pay as you go
• Collecting data sources and doing feature engineering
• Data analysis in Python (Pandas, NumPy) using Jupyter and PyCharm (see the sketch below)
• Introducing Data Science and AWS to the researchers, helping them use AWS, Jupyter, Linux, Python…
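A minimal sketch of the kind of feature engineering done in Jupyter, assuming a pandas DataFrame with a subject identifier, a timestamp and a numeric measurement (all column names and values are placeholders):

    import numpy as np
    import pandas as pd

    # Placeholder research dataset: one measurement per subject and date.
    df = pd.DataFrame({
        "subject_id": [1, 1, 2, 2],
        "measured_at": pd.to_datetime(["2018-01-01", "2018-02-01", "2018-01-15", "2018-03-01"]),
        "value": [10.0, 12.5, 7.0, 9.5],
    })

    # Simple derived features: calendar parts, a log transform and per-subject aggregates.
    df["month"] = df["measured_at"].dt.month
    df["log_value"] = np.log1p(df["value"])
    df["subject_mean"] = df.groupby("subject_id")["value"].transform("mean")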
Data Science, Python, Amazon Web Services (AWS)
11/2016 – 12/2017
Role Description
Responsibilities:
• Introducing Hadoop stack to the organization
• Designing and building Hadoop and Spark clusters in AWS
• Developing and driving the technical roadmap for data and development infrastructure
• Defining knowledge roadmaps for internal employees in the field of Hadoop
• Machine Learning (evolutionary algorithms, feature engineering, neural networks) using Python
• Testing new technologies
Details:
As a Big Data developer my focus was on all levels of the stack. I built two clusters in AWS: one a pure Hadoop cluster (HDP 2.6) and the other a Spark cluster with separate storage in S3; the latter launches on demand with dynamic resources. My tasks were architecture, maintenance and upgrades of the clusters. Both clusters relied heavily on Spark as the computational engine, where I mostly used Scala (for data integration), Python (for data science and ML) and Spark SQL.
Hive is the data warehouse on top of HDFS, providing users with an SQL API (see the PySpark sketch below).
Tested new visualization tools (Zeppelin, Druid, re:dash, Superset, ...) to find the best possible stack.
Key technology terms: Hortonworks, Ambari, HDFS, MapReduce2, YARN, ZooKeeper, Hive, Zeppelin, Spark, Storm, Ranger, Redis, Flume, Sqoop, Druid, scikit-learn, Jupyter.
PyCharm and Jupyter were used for the data science work. The main focus was on feature engineering, machine learning, evolutionary algorithms and neural networks.
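A minimal PySpark sketch of how the clusters expose data through Spark SQL and Hive, assuming Hive support is enabled on the cluster; the database and table names are placeholders:

    from pyspark.sql import SparkSession

    # Hive-enabled Spark session; on the cluster the metastore configuration
    # comes from the HDP installation.
    spark = (SparkSession.builder
             .appName("daily-aggregation")
             .enableHiveSupport()
             .getOrCreate())

    # Query a placeholder Hive table through the SQL API and write the result back.
    daily = spark.sql("""
        SELECT event_date, COUNT(*) AS events
        FROM analytics.raw_events
        GROUP BY event_date
    """)
    daily.write.mode("overwrite").saveAsTable("analytics.daily_event_counts")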
Apache Hadoop, Python, Scala
11/2015 – 10/2016
Role Description
Responsibilities:
• Big Data Full Stack Developer
• Researching, recommending and implementing Big Data technologies
• Developing and driving the technical roadmap for data and development infrastructure
• Administering Hortonworks & Apache Hadoop clusters in the cloud (OpenStack)
• Architecture, design and data modelling, and recommending the data architecture
• Preparing analytical and data visualization environments
• Data analysis using Spark SQL, PySpark, Java, Hive and SparkR in Zeppelin and RStudio
Details:
The University [...] is involved in a project owned by [...] (hyperlink removed), and the University's task is to provide infrastructure that covers researchers', students' and customers' needs when working on distributed systems with "Big Data" technologies.
My tasks include decision making around technologies, cluster architecture, cluster set-up on the Switch OpenStack cloud and cluster configuration, as well as testing, preparing and introducing the technologies to the users.
Key technology terms: OpenStack, Ambari, HDFS, MapReduce2, YARN, ZooKeeper, Hive, Zeppelin, Spark, Storm, Slider, Ranger, Redis, Elastic, Flume, MySQL, Sqoop, Cloudbreak.
The cluster computing framework offered to the users is Apache Spark, with a focus on Spark SQL, PySpark and SparkR.
The CLI, Apache Zeppelin or RStudio are available on client nodes for the users to interact with the cluster. Statistical tools like Gauss, Stata or MATLAB on Big Data technologies are under testing and evaluation.
I maintain 5 clusters; four of them are Hortonworks distributions and one is Apache. The main tool for administration is Ambari; some services (Zeppelin, Spark) are installed manually and maintained from the CLI.
With some eager students, we have started a Big Data Club at the University with the goal of bringing Big Data closer to the business students.
Apache Hadoop, Software Architecture, Python
Education
Norwegian Business School
Oslo, Norway
University of Ljubljana
Ljubljana, Slovenia
About Me
Over 4 years of experience with open-source Big Data technologies: installation, administration and configuration of Hadoop ecosystems (Apache and Hortonworks distributions) in AWS, building and configuring Spark clusters, and writing Spark code (Scala, PySpark and SparkR).
Additional Skills
• Azure: Storage Account (blob), Key Vault, VM, Container Registry and Instances, API Management, Logic Apps, Databricks, Azure API for Python, Service Principal, Event Hubs.
• Experience with Big Data technologies: YARN, MapReduce2, Hive, HUE, ZooKeeper, Pig, Oozie, Elastic, Kibana, Flume, Solr, Sqoop, Spark, Ambari, Zeppelin, Storm, Redis, Oracle Big Data Connectors, Oracle NoSQL.
Personal Data
- Slovenian (native)
- English (fluent)
- Norwegian (fluent)
- German (basic knowledge)
- European Union
- Switzerland