Telecom · Data Annotation Service · AU / NZ

Speech transcription and speaker diarisation at call-centre scale

Time-aligned transcripts, speaker labels, and acoustic-event tags for 12,000 hours of contact-centre audio – the training spine of an ASR + analytics stack.

12K hours transcribed
4.1% word error rate

Challenge

A contact-centre analytics platform serving banks and telcos in Australia and New Zealand needed bespoke training data for an in-house ASR and speaker diarisation stack. Their off-the-shelf cloud transcription was struggling with accented English, code-mixing, and overlapping speech.

Privacy requirements ruled out exporting audio to a generic vendor, and their compliance team had blocked three previous proposals on data-residency grounds.

Approach

We provisioned an onshore annotation pipeline: audio stayed in the client's Sydney-region storage throughout, and NDA-bound annotators worked on it through a hardened virtual desktop (VDI). Annotation covered word-level timestamps, speaker labels, sentiment tags, and acoustic events (silence, hold music, IVR transitions).
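
To make the annotation layers concrete, here is a minimal Python sketch of what one annotated call record could look like. It is illustrative only: the class names, field names, speaker labels, and the call-level sentiment tag are assumptions, not the schema used on the project.

```python
from dataclasses import dataclass, field
from typing import List

# Event kinds named in the case study; the identifiers are illustrative.
ACOUSTIC_EVENT_KINDS = {"silence", "hold_music", "ivr_transition"}


@dataclass
class Word:
    text: str
    start: float   # seconds from the start of the call
    end: float
    speaker: str   # hypothetical label, e.g. "agent" or "customer"


@dataclass
class AcousticEvent:
    kind: str
    start: float
    end: float


@dataclass
class CallAnnotation:
    call_id: str
    words: List[Word] = field(default_factory=list)
    events: List[AcousticEvent] = field(default_factory=list)
    sentiment: str = "neutral"  # assumed call-level tag; granularity is not stated


def validate(ann: CallAnnotation) -> List[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    prev_start = 0.0
    for i, w in enumerate(ann.words):
        if w.end < w.start:
            problems.append(f"word {i} ends before it starts")
        # Only start times must be ordered: overlapping speech means one
        # speaker's word may begin before another speaker's word has ended.
        if w.start < prev_start:
            problems.append(f"word {i} starts before the previous word")
        prev_start = w.start
    for i, e in enumerate(ann.events):
        if e.kind not in ACOUSTIC_EVENT_KINDS:
            problems.append(f"event {i} has unknown kind {e.kind!r}")
        if e.end < e.start:
            problems.append(f"event {i} ends before it starts")
    return problems
```

Keeping words and acoustic events on the same time axis lets overlapping speech and hold periods be represented without forcing them into a single linear transcript.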

A weekly calibration session with the client's ML team kept the labelling guidelines aligned with their evolving model behaviour.

Outcome

We delivered 12,000 hours of transcribed and diarised audio over six months, with a measured WER of 4.1% on a 500-hour validation hold-out, beating the client's 5.5% acceptance threshold.
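
For readers unfamiliar with the metric, word error rate is conventionally defined as (substitutions + deletions + insertions) divided by the number of words in the reference transcript. The sketch below computes it with a word-level edit distance. It is a generic illustration, not the scoring tool used on this project, and it assumes both transcripts are already tokenised and normalised in the same way.

```python
from typing import List


def word_error_rate(reference: List[str], hypothesis: List[str]) -> float:
    """WER = word-level edit distance / number of reference words."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(reference)


# Illustrative sentences, not project data.
reference = "please hold while i transfer you".split()
hypothesis = "please hold while transfer to you".split()
wer = word_error_rate(reference, hypothesis)  # 2 edits over 6 reference words
```

Under this definition, an acceptance check like the one described is a simple comparison of the measured rate against the agreed threshold, for example 0.041 <= 0.055.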

Their production ASR replaced the third-party cloud service three months ahead of plan, cutting per-minute transcription cost by 64% and unblocking a contractual analytics deliverable for a top-three Australian bank.
