Specialist, Data Platform Engineer

Date: Oct 1, 2021

Location: North York, ON, CA, M9L 1N7

Company: Apotex


About Apotex Inc.

Apotex Inc. is a proudly Canadian, global pharmaceutical company that produces high-quality, affordable medicines for patients around the world. Apotex employs almost 8,000 people worldwide in manufacturing, R&D, and commercial operations. Apotex Inc. exports to more than 100 countries and territories and operates in more than 45 countries, with a significant presence in Canada, the US, Mexico, and India. Through vertical integration, Apotex comprises multiple divisions and affiliates, including Apotex Inc., focused on generics; Apobiologix, a division of Apotex Inc. focused on biosimilar development; Aveva, an affiliate of Apotex Inc. and a fully integrated global developer and manufacturer of complete transdermal solutions; Apotex Consumer Products, a division of Apotex Inc. focused on brand-name products; and Global Active Pharmaceutical Ingredients (GAPI), a division of Apotex Inc. focused on the manufacturing of active pharmaceutical ingredients (API) for Apotex and third parties. For more information visit: www.apotex.com.


Job Summary

The Platform Data Engineer is responsible for expanding and optimizing the data pipeline architecture, as well as optimizing data flow and collection for cross-functional teams. The Platform Data Engineer will support data analysis on data initiatives and will ensure that an optimal data delivery architecture is consistent throughout ongoing projects.

Job Responsibilities
  • Design, build, optimize, and maintain data pipelines.
  • Optimize data ingestion infrastructure validation; develop an understanding of source systems and of source and target data model architecture artifacts.
  • Document technical data file standards in the data lake (Avro, ORC, Parquet, etc.) and HDFS.
  • Build the pipeline to push enterprise and semantic data to Azure Synapse Analytics.
  • Build the enterprise data lake/hub (raw, trusted/curated, and data provisioning layers) and the data warehouse (subject areas, logical and physical data marts, and data labs) for reporting and digital innovation capabilities.
  • Document Hive/Spark SQL metadata store standards and policies for accessing data in the data lake.
  • Assemble large, complex data sets that meet functional and non-functional business requirements.
  • Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
  • Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources
  • Build analytics tools that utilize the data pipeline to provide actionable insights into customer acquisition, operational efficiency and other key business performance metrics.
  • Work with stakeholders including the Executive, Product, Data and Design teams to assist with data-related technical issues and support their data infrastructure needs.
  • Create data tools for team members to assist them in building and optimizing analytics production.
  • Work with data and analytics experts to strive for greater functionality in our data systems.
  • Perform unit testing, promote tested pipelines to UAT and PROD environments, productionize code, and transition it to Prod Support Analyst teams.
  • Build data transformation routines to flatten and denormalize data into queryable data sets, and provide third-level production support.
  • Implement the directory structure and namespace for storing transformed and curated data, for both current-state and historical views.
  • Implement logic and data transformation scripts to convert data from raw to semantic/canonical form, and implement data quality rules using Hive/Spark SQL.
  • Implement Hive/Spark meta-layers and data frames, and enable query access to data files.
  • Works in a safe manner collaborating as a team member to achieve all outcomes.
  • Demonstrate behaviours that exhibit our organizational values: Collaboration, Courage, Perseverance, and Passion.
  • Ensure personal adherence to all compliance programs, including the Global Business Ethics and Compliance Program, Global Quality policies and procedures, Safety and Environment policies, and HR policies.
  • All other relevant duties as assigned.
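To illustrate the kind of work described in the responsibilities above — implementing data quality rules as Hive/Spark SQL — the sketch below expresses one such rule as a SQL predicate over a curated table. Python's built-in sqlite3 module stands in for a Hive/Spark SQL engine purely for illustration; the table, columns, and thresholds are hypothetical, not part of this posting.

```python
import sqlite3

# Hypothetical curated table standing in for a table in the data lake's trusted layer.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE batch_records (batch_id TEXT, potency_pct REAL, qc_status TEXT)"
)
conn.executemany(
    "INSERT INTO batch_records VALUES (?, ?, ?)",
    [("B001", 99.2, "PASS"), ("B002", None, "PASS"), ("B003", 101.7, "FAIL")],
)

# A data quality rule expressed as a SQL predicate, as one would write it in
# Hive QL or Spark SQL: potency must be present and fall within 95-105%.
rule = """
    SELECT batch_id,
           CASE WHEN potency_pct IS NOT NULL
                 AND potency_pct BETWEEN 95.0 AND 105.0
                THEN 'valid' ELSE 'invalid' END AS dq_flag
    FROM batch_records
"""
results = dict(conn.execute(rule).fetchall())
print(results)  # {'B001': 'valid', 'B002': 'invalid', 'B003': 'valid'}
```

In a real pipeline the same predicate would run via Spark SQL against data lake files (e.g., Parquet), with the flags written back to a curated-layer table rather than printed.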
Job Requirements
  • Education
    • Minimum undergraduate degree in a related field (Computer Science, Engineering, or Sciences), or equivalent related work experience.
  • Knowledge, Skills and Abilities
    • Knowledge of EDP architecture, reference data models (ADRM), Hadoop, HDFS, and Linux directory and file management concepts and code syntax
    • Good understanding of Azure data storage options (ADLS, HDInsight, structure, and namespaces)
    • Knowledge of file standards (Avro, Parquet, ORC, etc.), Hive metastores, Spark, Scala, Python, and other processes for persisting data frames to access data
    • Knowledge of Hive, Spark (Scala/Python), and data lake infrastructure tools and techniques
    • Experience with Data Quality tools that can be used to convert data quality business rule logic into HIVE or Spark query language for execution in the Data Lake
    • Experience with ETL/ELT -type data integration tools that can be used to author HIVE QL or Spark SQL code through a visual interface (i.e., low-code / no-code techniques for authoring data transformation pipelines)
    • Experience using Data Ingestion/ ETL / ELT or Change Data Capture (CDC) software in an HDFS environment
    • Experience using a variety of data stores, including Azure Data Lakes, SQL Database, Azure Data Warehouse, and Azure Synapse
    • Experience with data warehouses, data mart creation, and data mart access control and data provisioning
    • Experience in implementing the Enterprise Data Models, Business Data Models, Logical and Physical Data Marts Models, and Sandboxes
    • Knowledge of modern batch and real-time file transfer protocols (e.g., Kafka, Apache NiFi, Storm) a plus
    • Knowledge of and experience with SAP data ingestion tools, modern data lake management, and Azure
  • Experience
    • 5+ years of data modelling experience
    • 10+ years of hands-on experience working with one or more of the following: SQL, Oracle, ETL, and database diagnostic tools
    • Experience with Informatica tool sets (Enterprise Data Catalog, Data Quality, and AXON Data Governance)
    • Azure Data Engineer certification is preferred




COVID-19 Update:

We have adapted our recruitment strategy to ensure our staff and applicants are safe by conducting our interviews and onboarding online.

Other measures Apotex has put into place include (but are not limited to):

  • staggering employee shifts to reduce the size of work groups
  • modifying our cafeteria space to enhance social distancing
  • implementing additional cleaning and sanitization routines
  • robust self-assessment and screening tools
  • non-surgical masks for employees working in GMP areas
  • travel restrictions
  • strict visitor screening protocol

It is important to note that while these are our protocols at this time, they are subject to change based on guidelines and regulations put in place by local and global government agencies.

For up-to-date information about Apotex’s ongoing efforts in response to COVID-19, visit: https://www1.apotex.com/global/covid-19-update or follow us on LinkedIn and Twitter.



At Apotex, we are committed to fostering an inclusive, accessible work environment, where all employees feel valued, respected and supported.
Apotex offers accommodation for applicants with disabilities as part of its recruitment process. If you are contacted to arrange for an interview or testing, please advise us if you require an accommodation.