Data Engineer

Capgemini
Singapore
Permanent, Full time
Last application: 21 Jan 21
Competitive package
Posted by: Swee Cheong Woo • Recruiter
In this role, you will work as a Data Engineer within the Data Analytics team, implementing an enterprise-wide strategic initiative on the Data platform, with the following responsibilities:

  • Build data ingestion pipelines to bring in a wide variety of data from multiple sources within the organization, as well as from social media and public data sources (a minimal sketch follows below).
  • Work with cross-functional teams to source data and make it available for downstream consumption.
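As an illustration only, the following PySpark sketch shows the kind of ingestion pipeline described above: it pulls a table from a relational source over JDBC and lands it as Parquet in ADLS. The hostnames, credentials, table, and paths are placeholders, not details from this posting.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ingest-orders").getOrCreate()

# Read a source table over JDBC (SQL Server shown; Oracle / DB2 / Netezza
# differ only in the URL and driver). All connection details are placeholders.
orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://src-db:1433;databaseName=sales")
    .option("dbtable", "dbo.orders")
    .option("user", "etl_user")       # in practice, fetch secrets from a vault
    .option("password", "<secret>")
    .load()
)

# Land the raw extract in the data lake, partitioned by load date
# so downstream jobs can prune partitions.
(
    orders.write.mode("overwrite")
    .partitionBy("order_date")
    .parquet("abfss://raw@datalake.dfs.core.windows.net/sales/orders")
)
```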
Must have skills:

  • At least 5 years of broad experience working with enterprise IT applications using the technical skillsets below
  • Must have experience in Azure Databricks, writing Hive queries, and NoSQL (Couchbase preferred, or any NoSQL store)
  • Experience with at least one Hadoop distribution (Hortonworks / Cloudera / MapR) will be an advantage
  • Experience building data pipelines using batch processing with Apache Spark (Spark SQL, Dataset / DataFrame API) or Hive Query Language (HQL); see the batch sketch after this list
  • Knowledge of big data ETL processing tools (Sqoop, Hive)
  • Experience with Hive and Hadoop file formats (Avro / Parquet / ORC)
  • Basic knowledge of scripting (shell / bash)
  • Experience with Azure Data Factory (ADF), Azure Databricks (ADB), Azure Synapse (ADW), ADLS
  • Experience working with multiple data sources, including relational databases (SQL Server / Oracle / DB2 / Netezza), NoSQL / document databases, and flat files
  • Basic understanding of CI/CD tools such as Jenkins, JIRA, Bitbucket, and Artifactory
  • Basic understanding of DevOps practices using Git version control
  • Ability to debug, fine-tune, and optimize large-scale data processing jobs
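For the batch-processing bullet above, here is a hedged sketch of one aggregation expressed both through the DataFrame API and as an HQL-style Spark SQL query, written out in ORC (one of the Hive file formats listed). Column names and paths are assumptions for illustration.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-daily-agg").getOrCreate()

# Read the raw Parquet extract produced by an ingestion job.
orders = spark.read.parquet("abfss://raw@datalake.dfs.core.windows.net/sales/orders")

# DataFrame API: daily revenue and order count per region.
daily = (
    orders.groupBy("order_date", "region")
    .agg(F.sum("amount").alias("revenue"), F.count("*").alias("order_count"))
)

# The same step as an HQL-style Spark SQL query.
orders.createOrReplaceTempView("orders")
daily_sql = spark.sql("""
    SELECT order_date, region,
           SUM(amount) AS revenue,
           COUNT(*)    AS order_count
    FROM orders
    GROUP BY order_date, region
""")

# Write the curated output in ORC, partitioned for downstream consumers.
(
    daily.write.mode("overwrite")
    .partitionBy("order_date")
    .orc("abfss://curated@datalake.dfs.core.windows.net/sales/daily_revenue")
)
```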

 

Good to have skills:

  • Knowledge of messaging platforms such as Kafka / RabbitMQ
  • Knowledge of stream processing using Spark Structured Streaming / Kafka Streams (see the streaming sketch after this list)
  • Experience with Zeppelin or Jupyter notebooks
  • Knowledge of market tools for big data processing (Attunity / StreamSets / Talend / Apache NiFi)
  • Basic understanding of visualization tools (Qlik / SAS / Power BI)
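For the stream-processing bullet above, a minimal Spark Structured Streaming sketch reading from Kafka might look as follows; the broker, topic, schema, and paths are assumptions, and the spark-sql-kafka connector must be on the classpath.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import (
    StructType, StructField, StringType, DoubleType, TimestampType,
)

spark = SparkSession.builder.appName("events-stream").getOrCreate()

# Assumed shape of the JSON events on the topic.
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

# Source: a Kafka topic (broker and topic names are placeholders).
raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .load()
)

# Kafka values arrive as bytes: decode to string, then parse the JSON payload.
events = (
    raw.select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

# Sink: append Parquet micro-batches to the lake; the checkpoint directory
# lets the query recover its offsets after a restart.
query = (
    events.writeStream.format("parquet")
    .option("path", "abfss://raw@datalake.dfs.core.windows.net/events")
    .option("checkpointLocation", "abfss://chk@datalake.dfs.core.windows.net/events")
    .outputMode("append")
    .start()
)
query.awaitTermination()
```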