The position exists to continuously enhance the efficiency of existing data pipeline jobs and frameworks, and to establish best practices for Big Data projects planned on the BI & Analytics platform.
Responsibilities:
- Review and tune Spark framework code to increase throughput (an illustrative sketch follows this list).
- Establish best practices and guidelines to be followed by tech teams while writing code in Spark.
- Create a framework to continuously monitor performance and identify resource-intensive jobs.
- Write code, services and components in Java, Apache Spark and Hadoop.
- Responsible for systems analysis, design, coding, unit testing, CI/CD and other SDLC activities.
- Proven experience working with Apache Spark streaming and batch frameworks.
- Proven experience in performance tuning Java and Spark-based applications is a must.
- Knowledge of working with different file formats such as Parquet, JSON and Avro.
- Data warehouse experience working with RDBMS and Hadoop; well versed in change data capture concepts and their implementation.
- Hands-on experience writing code to efficiently handle files on Hadoop.
- Ability to work on projects following a microservices-based architecture.
- Knowledge of S3 is a plus.
- Ability to work proactively, independently and with global teams.
- Strong communication skills; able to communicate effectively with stakeholders and present the outcomes of analysis.
- Experience working on projects following Agile methodology is preferred.
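As an illustration of the tuning and file-handling work described above, the minimal Spark (Java) sketch below rewrites many small Parquet files into fewer, larger ones, a common throughput fix on Hadoop/S3-backed pipelines. The paths, partition count and configuration values are assumptions for illustration only, not part of any actual codebase for this role.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ParquetCompactionJob {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("parquet-compaction-sketch")
                // Example tuning knob; real values depend on cluster sizing and data volume.
                .config("spark.sql.shuffle.partitions", "200")
                .getOrCreate();

        // Read small files from a landing zone (hypothetical path).
        Dataset<Row> input = spark.read().parquet("hdfs:///data/landing/events/");

        // Rewrite as fewer, larger Parquet files; the target partition count is an
        // assumption and would be sized to produce roughly 128-256 MB output files.
        input.repartition(64)
             .write()
             .mode("overwrite")
             .parquet("hdfs:///data/curated/events/");  // hypothetical path

        spark.stop();
    }
}
```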
Skills:
- 3 to 9 years of working experience, preferably in banking environments
- Expert knowledge of technologies such as Hadoop, Hive, Presto, Spark, Java, RDBMS (e.g. Teradata) and file storage (e.g. S3) is necessary.
- The candidate must have strong spoken and written communication skills.