- Participate in the customer's system design meetings and collect the functional/technical requirements.
- Build data pipelines for consumption by the Data Science team.
- Experience with ETL process and tools.
- Clear understanding of and experience with Python and PySpark, or Spark and Scala, along with Hive, Airflow, Impala, Hadoop, and RDBMS architecture.
- Experience in writing Python programs and SQL queries.
- Experience in SQL Query tuning.
- Experienced in Shell Scripting (Unix/Linux).
- Build and maintain data pipelines in Spark/PySpark with SQL and Python or Scala.
- Knowledge of cloud technologies (Azure, AWS, GCP, etc.).
- Good to have: knowledge of Kubernetes, CI/CD concepts, and Apache Kafka.
- Suggest and implement best practices in data integration.
- Guide the QA team in defining system integration tests as needed.
- Split the planned deliverables into tasks and assign them to the team.
- Maintain and deploy the ETL code, following the Agile methodology.
- Work on optimization wherever applicable.
- Good oral, written, and presentation skills.
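As a toy illustration of the extract/transform/load pipeline work the responsibilities above describe — a minimal sketch in plain Python with SQLite standing in for the warehouse (all table and column names here are hypothetical; a production pipeline would use Spark/PySpark as listed):

```python
import csv
import io
import sqlite3

# Hypothetical raw feed; a real pipeline would read from HDFS, S3, Kafka, etc.
RAW_CSV = """user_id,amount
1,10.50
2,3.25
1,4.75
"""

def extract(raw):
    """Extract: parse the raw CSV into dict records."""
    return list(csv.DictReader(io.StringIO(raw)))

def transform(rows):
    """Transform: cast types and drop malformed rows."""
    out = []
    for row in rows:
        try:
            out.append((int(row["user_id"]), float(row["amount"])))
        except (KeyError, ValueError):
            continue  # skip bad records rather than failing the batch
    return out

def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS txns (user_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO txns VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)

# Downstream consumption: aggregate per user with SQL.
total_by_user = dict(
    conn.execute("SELECT user_id, SUM(amount) FROM txns GROUP BY user_id")
)
```

The same extract/transform/load shape maps directly onto PySpark DataFrames, with a scheduler such as Airflow orchestrating the stages.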
Requirements
- Degree in Computer Science, IT, or a similar field; a Master's is a plus.
- Hands-on experience with Python and PySpark.
- Hands-on experience with Spark and Scala.
- Hands-on experience with Snowflake.
- Great numerical and analytical skills.
- Working knowledge of cloud platforms such as MS Azure, AWS, etc.
- Technical expertise in data models, data mining, and segmentation techniques.
Python and PySpark, or Spark and Scala; Snowflake