Overview:
The Databricks Spark Developer plays a crucial role in harnessing the power of data, using Databricks and Spark to develop and maintain efficient data pipelines. They are responsible for implementing scalable and reliable solutions that enable data-driven decision-making within the organization.
Key Responsibilities:
- Designing and implementing robust ETL processes using Databricks and Spark
- Developing and optimizing data pipelines for large-scale data processing
- Collaborating with data engineers and data scientists to support their data infrastructure needs
- Building and maintaining data warehouse solutions to support business analytics
- Performing data modeling and optimization to ensure efficient data storage and retrieval
- Troubleshooting and resolving performance issues with data infrastructure and pipelines
- Implementing security and data governance best practices within the data platform
- Automating data quality checks and ensuring data consistency and accuracy
- Collaborating with cross-functional teams to understand data requirements and deliver solutions
- Monitoring and maintaining the health of data pipelines and infrastructure
- Documenting technical design and architecture of data solutions
- Participating in code reviews and providing constructive feedback to peers
- Staying updated with the latest advancements in Databricks and Spark technologies
- Providing technical guidance and mentorship to junior team members
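The ETL and data-quality responsibilities above can be illustrated with a short sketch. The example below is a minimal, hypothetical pipeline in plain Python; function names such as `extract`, `transform`, and `run_quality_checks` are illustrative assumptions, not part of the role. A real Databricks job would express the same steps with the Spark DataFrame API (e.g. `spark.read`, `df.filter`, `df.write`).

```python
# Minimal extract-transform-load sketch with an automated quality check.
# Pure-Python stand-in: a production Databricks job would use the Spark
# DataFrame API for each of these steps.

def extract(rows):
    """Extract step: in practice, read from cloud storage or a source table."""
    return list(rows)

def transform(rows):
    """Transform step: normalize fields and drop malformed records."""
    cleaned = []
    for row in rows:
        if row.get("id") is None:
            continue  # drop rows missing a primary key
        cleaned.append({"id": int(row["id"]),
                        "name": str(row.get("name", "")).strip()})
    return cleaned

def run_quality_checks(rows):
    """Automated data-quality check: ids must be unique and non-null."""
    ids = [r["id"] for r in rows]
    assert len(ids) == len(set(ids)), "duplicate ids found"
    return rows

def load(rows, sink):
    """Load step: in practice, write to a Delta table or warehouse."""
    sink.extend(rows)
    return len(rows)

raw = [{"id": "1", "name": " Ada "},
       {"id": None, "name": "ghost"},
       {"id": "2", "name": "Grace"}]
warehouse = []
loaded = load(run_quality_checks(transform(extract(raw))), warehouse)
# loaded == 2; the row without an id was dropped by the transform step
```

The pipeline stages are kept as separate functions so each step can be tested and monitored independently, mirroring the troubleshooting and monitoring duties listed above.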
Required Qualifications:
- Bachelor's degree in Computer Science, Engineering, or a related field
- Proven experience in developing data pipelines using Databricks and Spark
- Proficiency in ETL processes and data warehousing concepts
- Strong SQL skills with the ability to write complex queries for data manipulation and analysis
- Advanced programming skills in Python for data processing and manipulation
- Experience in data modeling and optimizing data storage for performance
- Deep understanding of big data technologies and distributed computing concepts
- Ability to troubleshoot and optimize data pipeline performance for efficiency and reliability
- Knowledge of data governance, security, and compliance best practices
- Excellent communication and collaboration skills to work effectively in a team environment
- Strong analytical and problem-solving abilities to tackle complex data engineering challenges
- Ability to multitask and prioritize tasks in a fast-paced and dynamic work environment
- Experience with cloud platforms such as AWS, Azure, or GCP is a plus
- Certifications in Databricks and Spark-related technologies are desirable
- Experience in Agile development methodologies and version control systems
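As a small illustration of the combined SQL-and-Python skills listed above, the snippet below runs an aggregation query through Python's built-in `sqlite3` module. The table and column names are invented for the example; on Databricks, similar SQL could be submitted via `spark.sql()` against a Delta table.

```python
import sqlite3

# Illustrative query mixing Python and SQL: total amount per region.
# Table and column names are hypothetical stand-ins for warehouse tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("east", 100.0), ("west", 250.0), ("east", 50.0)],
)
rows = conn.execute(
    """
    SELECT region, SUM(amount) AS total
    FROM orders
    GROUP BY region
    ORDER BY total DESC
    """
).fetchall()
conn.close()
# rows == [("west", 250.0), ("east", 150.0)]
```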
Experience Required:
Data engineering resource with Databricks experience (Databricks develops a web-based platform for working with Spark), a Python coding background, and SQL.
Spark, Python, data infrastructure, data modeling