Data Engineer
As a Principal Data Engineer, your responsibilities will include:
- Design and build data pipelines to process terabytes of data
- Orchestrate data tasks in Airflow that run on Kubernetes/Hadoop for the ingestion, processing, and cleaning of data.
- Create Docker images for various applications and deploy them on Kubernetes
- Design and build best-in-class processes to clean and standardize data.
- Troubleshoot production issues in our Elastic Environment
- Tune and optimize data processes
- Advance the team's DataOps culture (CI/CD, orchestration, testing, monitoring) and build out standard development patterns
- Drive innovation by testing new technology and approaches to continually advance the capability of the data engineering function.
- Drive efficiencies in current engineering processes through standardization and migration of existing on-premises processes to the cloud
- Ensure data quality by building best-in-class data quality monitoring so that all data products exceed customer expectations.
Required Qualifications:
- Bachelor's degree in Computer Science or a related field.
- Good understanding of data modelling techniques, e.g. Data Vault, Kimball, star schema
- Excellent understanding of column-store RDBMS (Databricks, Snowflake, Redshift, Vertica, ClickHouse)
- Good experience handling real-time, near-real-time, and batch data ingestion
- Hands-on experience with the following technologies:
  - Developing processes in Spark
  - Writing complex SQL queries
  - Building ETL/data pipelines
  - Exposure to Kubernetes and Linux containers (e.g. Docker)
  - Related/complementary open source software platforms and languages (e.g. Scala, Python, Java, Linux)
- Proven track record of designing effective data strategies and leveraging modern data architectures that resulted in business value
- Experience building cloud-native data pipelines on AWS, Azure, or GCP following best practices in cloud deployments
- Strong DataOps experience (CI/CD, orchestration, testing, monitoring)
- Strong experience leading and developing data engineering teams
- Demonstrated interpersonal influence, collaboration, and listening skills
- Strong stakeholder management skills
- Excellent time management, organizational, and prioritization skills, with the ability to balance multiple priorities.
Preferred Qualifications:
- Experience with data tokenization techniques and tools, e.g. Datavant, Protegrity
- Experience with Azure Data Factory, Databricks, and Snowflake
- Experience with Apache Spark and the related big data stack and technologies (PySpark, Scala)
- Experience working with Apache Kafka, building appropriate producer/consumer apps
- Experience working with Kubernetes and Docker, and knowledge of cloud infrastructure automation and management (e.g. Terraform)
- Experience working on projects with agile/scrum methodologies
- Familiarity with production-quality ML and/or AI model development and deployment
- Healthcare industry knowledge and experience, with exposure to the EDI, HIPAA, HL7, and FHIR integration standards