Machine Learning Engineer - Prodigy Labs, a UST Company
  • Toronto, Ontario, Canada
  • via JobGet
Job Description

We are looking for a Machine Learning Engineer to join our team and help build systems that accelerate the development and deployment of machine learning models, especially large language models (LLMs). You will partner closely with machine learning researchers and internal users to understand requirements, and apply your own domain expertise to build high-performance, reusable APIs.

The ideal candidate has strong ML fundamentals and can apply them in real production settings. In particular, this role has a core focus on optimizing inference and fine-tuning for LLMs. The candidate should also be comfortable with infrastructure and large-scale system design, and with diagnosing both model performance issues and system failures.

Responsibilities
You will:
• Architect and enable distributed compute, aligning workloads to small, mid-range, and high-end GPUs.
• Leverage appropriate storage hardware and data formats to improve read and re-read efficiency.
• Identify and remediate latency contributors, especially I/O bottlenecks, inefficient data shuffling, and under- or over-utilized compute.
• Scale models through distributed training using data and model parallelism techniques.
• Parallelize inference processing to improve prediction latency.
• Provide subject-matter expertise in graph and vector databases for a variety of use cases, such as knowledge graphs and retrieval-augmented generation (RAG).
• Implement LLM observability and monitoring solutions.

Required Education and Experience
• Degree in Computer Science or Engineering
• Prior experience with:
  • Docker, Kubernetes, and containerization
  • Distributed systems
  • Databricks ML
  • Machine learning engineering
  • Cloud (Azure preferred)
  • Python and SQL (expert level)

Preference will be given to candidates who, in addition to the required experience, have:
• Experience or expertise with LLM fine-tuning, LLMOps, model evaluation, and prompt engineering
• Experience with (or knowledge of) MosaicML and the Ray framework
• Experience with LangChain or LlamaIndex
• Experience with any vector database

Job Specifications

Authorities, Impact, Risk
• Influence the data, AI, and cloud journey for the bank.
• Influence the sustainability roadmap for the bank.

Impact
• Revenue generation through new business for alternative data

Innovation
• 6 years of AI, big data, and cloud expertise
• 3-4 years of alternative data experience

Risk
• Mitigate reputational risk through AI-driven data quality, ensuring that the highest-quality data and services are offered to clients

Mandatory Skillsets
• 2+ years of experience building machine learning training pipelines or inference services in a production setting
• Experience with LLM deployment, fine-tuning, training, prompt engineering, etc.
• Experience with LLM inference latency optimization techniques, e.g. kernel fusion, quantization, and dynamic batching
• Experience with CUDA, model compilers, and other model-specific optimizations

Preferred
• Experience working with a cloud technology stack (e.g. Azure or AWS)
• Experience building, deploying, and monitoring complex microservice architectures
• Experience with Python, Docker, Kubernetes, and infrastructure as code (e.g. Terraform)
• Experience with LLMs and MLOps
• Experience with distributed notebook environments such as Databricks
• Experience building AI-driven data quality frameworks and other data governance tools and capabilities
• Experience building metadata-driven AI and statistical models for repeatable insight generation
• Experience building front-to-back data pipelines comprising data ingestion, enrichment, data quality, analytics, and reporting
• Experience with Agile development methodology
• Experience with company KPIs and backtesting alternative data factors against company KPIs
• Experience with NLP techniques and transfer learning models such as BERT
• Experience using Hugging Face model artifacts
