Production Support Engineer

Now Book It
Department: Product Design
Type: Remote
Region: Australia
Location: Sydney, New South Wales, Australia
Experience: Entry level
Estimated Salary: A$90,000–A$130,000
Skills: Datadog, Azure, AWS, Kubernetes, EKS, .NET, C#, TypeScript, PostHog, SQL, Terraform, Claude Code

Job Description

Posted on: June 2, 2026

About Now Book It

Now Book It is a hospitality SaaS platform serving 11,000+ venue operators across Australia, New Zealand, Canada, and the US. We help restaurants fill seats, manage demand, monetise events, and retain diners through a reservation and venue management platform built on flat subscription pricing, no per-diner commissions, and genuine venue ownership of guest data. We're mid-transformation: actively modernising our stack, embedding AI-augmented tooling across engineering, and building the platform foundations to support significant product growth.

Why this role matters

Production stability is not an afterthought at Now Book It — it's a commercial imperative. With 11,000+ venues relying on the platform to seat diners and run their businesses, every incident that goes undetected or unresolved slowly erodes the trust we've spent years building. This role exists to make sure that doesn't happen.

The Production Support Engineer owns the health of the live environment. You'll be the person with the deepest operational picture of what's actually happening in production at any given moment: what's degraded, what's trending the wrong way, what's signal and what's noise, and what needs to be escalated versus resolved quietly before anyone notices. That takes a combination of technical depth, good judgement, and a low tolerance for leaving problems for someone else.

Critically, this role is designed to protect the throughput of the Core and Grow engineering pods. When production issues pull engineers off delivery work, velocity suffers and the transformation slows. Your job is to absorb that load, drive resolution, and hand back clean context so product engineers can stay focused. You'll have high autonomy to shape how monitoring, alerting, and incident response work at Now Book It (NBI), and the expectation that you'll use it.

Reports to: Platform, Data and Security Lead.

Key Responsibilities

Production Stability and Incident Management

  • Own the live environment: monitor platform health across services, proactively identify degradation patterns, and act before issues become incidents.
  • Lead investigation and resolution of production incidents, working across the stack from frontend behaviour to backend services, infrastructure, and third-party integrations.
  • Act as Level 3 escalation for the Australian customer support team, providing technical depth and clear resolution paths for issues that cannot be resolved at earlier tiers.
  • Maintain and improve runbooks, post-incident reviews, and resolution documentation so that knowledge compounds and repeat incidents are eliminated, not just closed.

Observability and Platform Health Reporting

  • Own and evolve NBI's observability posture using Datadog for infrastructure and application monitoring, and PostHog for product analytics and usage signals.
  • Design and maintain dashboards, alerts, and health reports that give engineering leadership and the support teams a clear, real-time picture of platform performance.
  • Identify gaps in current observability coverage and drive improvements without waiting to be asked.
  • Contribute to SLA and uptime reporting with accurate, well-contextualised data.

AI-Augmented Support Operations

  • Leverage AI tooling, including Claude Code, as a core part of how you work: accelerating investigation, identifying patterns across incidents, and drafting resolution paths faster than manual analysis allows.
  • Identify repeating support patterns that are candidates for automation or self-serve resolution, and work with engineering pods to reduce recurring load.
  • Bring a continuous improvement mindset: if something takes you an hour today, you should be working to reduce it to minutes.

About you

You're a technically grounded engineer who finds genuine satisfaction in keeping complex systems stable and well-understood. You operate well with autonomy, communicate clearly under pressure, and have a low tolerance for leaving ambiguity in production. You're curious enough to dig past the obvious cause of an incident, disciplined enough to write the post-mortem properly when it's over, and pragmatic enough to reach for AI tooling when it gets you to the answer faster.

Key Skills and Experience

Essential

  • ✅ Hands-on experience in a production support, site reliability, or platform engineering role within a SaaS environment
  • ✅ Proficiency with Datadog or a comparable observability platform (dashboards, alerting, log analysis, APM)
  • ✅ Experience investigating and resolving incidents across application, infrastructure, and integration layers
  • ✅ Solid understanding of cloud infrastructure across both Azure and AWS, including serverless patterns and containerised workloads (Kubernetes / EKS)
  • ✅ Comfortable reading and debugging .NET / C# codebases or TypeScript services without needing to be a specialist in either
  • ✅ Demonstrated use of AI tooling (such as Claude Code or equivalent) as a core part of engineering or support workflow, not an occasional aid
  • ✅ Strong written communication: clear incident updates, post-mortems, and escalation notes under pressure
  • ✅ Experience working directly with customer-facing support functions as a technical escalation tier

Highly Desirable

  • ✔ Experience with PostHog or equivalent product analytics tooling
  • ✔ Familiarity with React or React Native frontends sufficient for browser-side debugging
  • ✔ Working knowledge of SQL (MSSQL or PostgreSQL) for data-level investigation
  • ✔ Experience with Terraform or infrastructure-as-code for understanding environment configuration
  • ✔ Prior experience in a hospitality, marketplace, or high-availability consumer SaaS environment

What's in it for you

You'll own something real. This isn't a role where you watch dashboards and wait for tickets — you'll shape how NBI thinks about and operates its live environment, with the mandate and autonomy to build something genuinely better. You'll be working on a platform that restaurants across Australia, New Zealand, Canada, and the US depend on every service, every day.

You'll join the team during an active and deliberate transformation, which means the decisions you make now about observability, alerting, and incident response will become the foundation the business scales on. That's a meaningful technical legacy to contribute to.

The tooling is modern and improving: Datadog, PostHog, GitLab, Linear, Claude Code. The team is experienced, fully remote, and distributed across Australia's major cities.

Why join us

Now Book It is built around the belief that the best outcomes come when people have clear purpose, genuine autonomy, and the support to do their best work. We're a team that cares about craft, moves with intent, and is honest about what we're building toward.

We're committed to building a team that reflects the diversity of the communities and industries we serve. We welcome applications from people of all backgrounds, experiences, and perspectives.

Originally posted on LinkedIn
