As an ETL Pentaho Engineer, you will be responsible for extracting and ingesting data from websites using web crawling tools. In this role, you will own the creation of these tools, services, and workflows to improve crawl/scrape analysis, reporting, and data management.
We will rely on you to test the data and the scrapes to ensure accuracy and quality. You will own the process of identifying and fixing scrape breakages, as well as scaling scrapes as needed.
Job Responsibilities:
- Perform data extraction and data loading, resolve errors, and verify filter criteria.
- Understand step inputs, outputs, and connections to the database.
- Understand jobs/transformations and how to inspect the data.
- Understand Salesforce basics: fields, objects, and lookups.
- Knowledge of database schemas and views.
- Provide technical knowledge/expertise to support requirements definition, design, and development of data warehouse components/subsystems, specifically Extract-Transform-Load (ETL) or Extract-Load-Transform (ELT) technologies and implementations, and/or Change Data Capture (CDC) technologies and implementations.
- Generate fallout reports, analyze bugs and errors, and understand how to produce the output file.
- Research, design, develop, and modify ETL and related database functions, implementing changes and enhancements to the system.
- Design ETL processes and develop source-to-target transformations and load processes.
Requirements:
- Bachelor's degree in computer science or a related field, or equivalent demonstrated experience
- Must have at least 2 years of experience with Pentaho
- Experience running large-scale web scrapes
- Experience with Linux/UNIX, HTTP, HTML, JavaScript, and networking
- Experience with techniques and tools for crawling, extracting, and processing data
- Great communication skills (written and spoken English)
Good to have:
- Familiarity with system monitoring/administration tools
- Familiarity with version control, open-source practices, and code review
- Familiarity with Python