Booking NL

Machine Learning Engineer (for independent contractors)

Posted May 7, 2026
Project ID: 12179-1
Location
Amsterdam, NH
Hours/week
40 hrs/week
Timeline
1 year
Starts: May 7, 2026
Ends: May 6, 2027
Payrate range
50–115 €/hr

About the role

As a Machine Learning Engineer on the ML Production (RS) team, you will design, build and operate the core backend services that power Booking.com's company-wide ML inference platform. You'll work mainly on JVM-based services: optimizing core ML inference capabilities, running workloads on Kubernetes both on-prem and on AWS EKS, and maintaining our Graphite/Grafana-based observability stack, ensuring the platform is reliable, efficient and easy for other teams to integrate with.

You'll collaborate closely with ML engineers, scientists and platform teams to:

  • run and optimize model inference workloads,
  • evolve our cloud‑native and hybrid infrastructure,
  • and drive MLOps best practices for model deployment, monitoring and model lifecycle management.

Key responsibilities

  • Design, implement and operate high‑throughput, low‑latency ML serving services in Scala, Java and other JVM languages.
  • Profile and optimize CPU/GPU and memory usage for model inference services; run performance benchmarks, load tests and capacity experiments to keep latency and cost under control.
  • Build and maintain distributed systems for online and offline predictions, including APIs, async/batch jobs and client libraries.
  • Develop and run services on Kubernetes (AWS EKS/BKS), including containerization, deployment pipelines, autoscaling and rollout strategies.
  • Own and improve cron‑based and scheduled jobs (e.g. housekeeping, maintenance, batch predictions, data migrations) running in Kubernetes and cloud environments.
  • Implement robust observability (Graphite metrics, logging, alerts and Grafana dashboards) for inference services, experiments and critical platform components.
  • Contribute to MLOps practices: model versioning, rollout strategies (shadowing, canaries, traffic shifting), health checks, automated tests and CI/CD for ML services.
  • Participate in on‑call and incident response, drive root‑cause analysis and implement long‑term reliability and resilience improvements.
  • Collaborate with ML practitioners and product teams to understand their use cases, translate them into robust serving solutions, and provide guidance on the best ways to use RS.
  • Contribute to technical design docs, runbooks and standards, and share knowledge through reviews, mentoring and internal talks.

Required qualifications

  • Solid professional experience (typically 5+ years) as an ML or software engineer building and operating production services.
  • Familiarity with MLOps concepts and tooling: model deployment, monitoring, experimentation, CI/CD for ML services.
  • Strong programming skills in Java and/or Scala (or another JVM language), and good knowledge of concurrent programming and performance tuning.
  • Proven experience with distributed systems (e.g. microservices, RPC, caching, queues, streaming) and with designing for reliability and scalability.
  • Good understanding of CPU/GPU/memory constraints, profiling, and performance benchmarking of server-side applications.
  • Hands-on experience running services on Kubernetes (preferably AWS EKS or a similar managed K8s offering): containerization, deployments, rollbacks, autoscaling.
  • Experience writing and maintaining cron jobs and scheduled workloads (e.g. for batch predictions, housekeeping, data pipelines) in a production environment.
  • Practical experience with observability tooling, ideally Graphite for metrics and Grafana for dashboards and alerting.
  • Comfortable working in a Linux-based environment and with common cloud primitives (networking, load balancers, IAM, storage).
  • Strong communication skills and the ability to collaborate with ML scientists, engineers and product stakeholders.
  • A drive to continuously improve how we and our users work, from tooling and automation to processes and workflows, and the conviction to see those improvements through.


Nice to have

  • Experience with ML serving platforms (e.g. in-house platforms similar to RS, SageMaker endpoints, Ray, or other managed ML platforms).
  • Experience with one or more popular ML serving engines (e.g. Triton, TorchServe, ONNX Runtime).
  • Experience with hybrid-cloud or large bare‑metal + cloud deployments and the related cost/performance trade‑offs.
  • Exposure to Spark or other distributed compute engines used for batch or async inference.
  • Familiarity with ML monitoring / model observability tools.
  • Background in Machine Learning Engineering (feature pipelines, model formats, etc.) or Site Reliability Engineering for data/ML platforms.
