In this role, you will work as a Data Engineer within the Data Analytics team, implementing an enterprise-wide strategic initiative on the Data platform with the following responsibilities:
- Build data ingestion pipelines to bring in a wide variety of data from multiple sources within the organization, as well as from social media and public data sources.
- Work with cross-functional teams to source data and make it available for downstream consumption.
- At least 5 years of broad experience working with enterprise IT applications using the technical skillsets below
- Must have experience with Azure Databricks, writing Hive queries, and NoSQL (Couchbase preferred, or any NoSQL database)
- Experience with at least one Hadoop distribution (Hortonworks / Cloudera / MapR) is an advantage
- Experience building batch data pipelines with Apache Spark (Spark SQL, Dataset / DataFrame API) or Hive Query Language (HQL); see the sketch after this list
- Knowledge of Big data ETL processing tools (Sqoop, Hive)
- Experience with Hive and Hadoop file formats (Avro / Parquet / ORC)
- Basic knowledge of scripting (shell / bash)
- Experience with Azure Data Factory (ADF), Azure Databricks (ADB), Azure Synapse (ADW), ADLS
- Experience working with multiple data sources, including relational databases (SQL Server / Oracle / DB2 / Netezza), NoSQL / document databases, and flat files
- Basic understanding of CI/CD tools such as Jenkins, JIRA, Bitbucket, and Artifactory
- Basic understanding of DevOps practices using Git version control
- Ability to debug, fine-tune, and optimize large-scale data processing jobs
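To illustrate the kind of batch pipeline work this role involves, below is a minimal PySpark sketch: it reads raw Parquet data, applies DataFrame API and Spark SQL transformations, and writes a partitioned Hive table. All paths, column names, and the table name are hypothetical placeholders, not an actual pipeline used by the team.

```python
# Illustrative PySpark batch pipeline: read Parquet, transform with the
# DataFrame API / Spark SQL, and write partitioned output. Paths, column
# names, and the table name are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("orders-daily-batch")   # hypothetical job name
    .enableHiveSupport()             # lets the job write Hive tables
    .getOrCreate()
)

# Read raw source data (path is a placeholder).
orders = spark.read.parquet("/data/raw/orders/")

# DataFrame API transformations: type the timestamp, derive a date column,
# and aggregate revenue per customer per day.
daily_revenue = (
    orders
    .withColumn("order_ts", F.to_timestamp("order_ts"))
    .withColumn("order_date", F.to_date("order_ts"))
    .groupBy("customer_id", "order_date")
    .agg(F.sum("amount").alias("daily_revenue"))
)

# The same logic expressed in Spark SQL against a temp view, for comparison.
orders.createOrReplaceTempView("orders")
daily_revenue_sql = spark.sql("""
    SELECT customer_id,
           to_date(order_ts) AS order_date,
           SUM(amount)       AS daily_revenue
    FROM orders
    GROUP BY customer_id, to_date(order_ts)
""")

# Write the curated output partitioned by date (table name is a placeholder).
(
    daily_revenue
    .write
    .mode("overwrite")
    .partitionBy("order_date")
    .saveAsTable("analytics.daily_revenue")
)

spark.stop()
```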
Good to have skills:
- Knowledge of messaging platforms such as Kafka / RabbitMQ
- Knowledge of stream processing using Spark Structured Streaming / Kafka Streams; see the sketch at the end of this list
- Experience with Zeppelin or Jupyter notebooks
- Knowledge of market tools for big data processing (Attunity / StreamSets / Talend / Apache NiFi)
- Basic understanding of visualization tools (Qlik / SAS / Power BI)
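The stream-processing skill above can likewise be illustrated with a minimal Spark Structured Streaming sketch that consumes JSON events from Kafka and appends them to a Parquet sink. The broker address, topic, schema, and paths are hypothetical placeholders, and running it also requires the spark-sql-kafka connector package on the classpath.

```python
# Illustrative Spark Structured Streaming job: consume JSON events from a
# Kafka topic and append them to a Parquet sink. Broker, topic, schema, and
# paths are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("events-stream").getOrCreate()

# Assumed schema for the JSON event payload.
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("user_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_ts", TimestampType()),
])

# Read from Kafka; the 'value' column arrives as bytes and is parsed as JSON.
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder broker
    .option("subscribe", "events")                        # placeholder topic
    .load()
    .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

# Append parsed events to a Parquet sink with checkpointing for recovery.
query = (
    events.writeStream
    .format("parquet")
    .option("path", "/data/curated/events/")              # placeholder path
    .option("checkpointLocation", "/checkpoints/events/") # placeholder path
    .outputMode("append")
    .start()
)

query.awaitTermination()
```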