Speaker diarization

This tutorial is also available as a Jupyter notebook that can be cloned directly from GitHub. See the installation guide for instructions to run this tutorial locally on Windows, Linux or macOS.

Speaker diarization is the process of partitioning an audio stream containing human speech into homogeneous segments according to the identity of each speaker. It can enhance the readability of an automatic speech transcription by structuring the audio stream into speaker turns and, when used together with speaker recognition systems, by providing the speaker’s true identity. It is used to answer the question “who spoke when?”

image.png

With the increasing number of broadcasts, meeting recordings and voice mail collected every year, speaker diarization has received much attention from the speech community. Speaker diarization is an essential feature for a speech recognition system to enrich the transcription with speaker labels.

Speaker diarization is used to increase transcript readability and to better understand what a conversation is about. It can help extract important points or action items from the conversation and identify who said what. It also helps to identify how many speakers are in the audio.

In this tutorial, we consider how to build a speaker diarization pipeline using pyannote.audio and OpenVINO. pyannote.audio is an open-source toolkit written in Python for speaker diarization. Based on the PyTorch deep learning framework, it provides a set of trainable end-to-end neural building blocks that can be combined and jointly optimized to build speaker diarization pipelines. You can find more information about pyannote pretrained models in the model card, repo and paper.

Prerequisites

!pip install -r requirements.txt
Looking in links: https://download.pytorch.org/whl/torch_stable.html
Ignoring torchaudio: markers 'sys_platform == "darwin"' don't match your environment
Collecting git+https://github.com/eaidova/pyannote-audio.git@hub0.10 (from -r requirements.txt (line 4))
  Cloning https://github.com/eaidova/pyannote-audio.git (to revision hub0.10) to /tmp/pip-req-build-3eu9xm1r
  Running command git clone --filter=blob:none --quiet https://github.com/eaidova/pyannote-audio.git /tmp/pip-req-build-3eu9xm1r
  Running command git checkout -b hub0.10 --track origin/hub0.10
  Switched to a new branch 'hub0.10'
  Branch 'hub0.10' set up to track remote branch 'hub0.10' from 'origin'.
  Resolved https://github.com/eaidova/pyannote-audio.git to commit 09d786a66604dde7f5e2f296effad9ab4176aee4
  Running command git submodule update --init --recursive -q
  Preparing metadata (setup.py) ... done
Collecting torchaudio==0.13.1+cpu
  Using cached https://download.pytorch.org/whl/cpu/torchaudio-0.13.1%2Bcpu-cp38-cp38-linux_x86_64.whl (4.0 MB)
Requirement already satisfied: torch==1.13.1 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from torchaudio==0.13.1+cpu->-r requirements.txt (line 3)) (1.13.1+cpu)
Requirement already satisfied: typing-extensions in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from torch==1.13.1->torchaudio==0.13.1+cpu->-r requirements.txt (line 3)) (4.5.0)
Collecting asteroid-filterbanks<0.5,>=0.4
  Using cached asteroid_filterbanks-0.4.0-py3-none-any.whl (29 kB)
Collecting backports.cached_property
  Using cached backports.cached_property-1.0.2-py3-none-any.whl (6.1 kB)
Collecting einops<0.4.0,>=0.3
  Using cached einops-0.3.2-py3-none-any.whl (25 kB)
Collecting hmmlearn<0.3,>=0.2.7
  Using cached hmmlearn-0.2.8-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (217 kB)
Requirement already satisfied: huggingface_hub>=0.7 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from pyannote.audio==2.0.1->-r requirements.txt (line 4)) (0.13.1)
Requirement already satisfied: networkx<3.0,>=2.6 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from pyannote.audio==2.0.1->-r requirements.txt (line 4)) (2.8.2)
Collecting omegaconf<3.0,>=2.1
  Using cached omegaconf-2.3.0-py3-none-any.whl (79 kB)
Collecting pyannote.core<5.0,>=4.4
  Using cached pyannote.core-4.5-py3-none-any.whl (60 kB)
Collecting pyannote.database<5.0,>=4.1.1
  Using cached pyannote.database-4.1.3-py3-none-any.whl (41 kB)
Collecting pyannote.metrics<4.0,>=3.2
  Using cached pyannote.metrics-3.2.1-py3-none-any.whl (51 kB)
Collecting pyannote.pipeline<3.0,>=2.3
  Using cached pyannote.pipeline-2.3-py3-none-any.whl (30 kB)
Collecting pytorch_lightning<1.7,>=1.5.4
  Using cached pytorch_lightning-1.6.5-py3-none-any.whl (585 kB)
Collecting pytorch_metric_learning<2.0,>=1.0.0
  Using cached pytorch_metric_learning-1.7.3-py3-none-any.whl (112 kB)
Collecting semver<3.0,>=2.10.2
  Using cached semver-2.13.0-py2.py3-none-any.whl (12 kB)
Collecting singledispatchmethod
  Using cached singledispatchmethod-1.0-py2.py3-none-any.whl (4.7 kB)
Collecting soundfile<0.11,>=0.10.2
  Using cached SoundFile-0.10.3.post1-py2.py3-none-any.whl (21 kB)
Collecting speechbrain<0.6,>=0.5.12
  Using cached speechbrain-0.5.13-py3-none-any.whl (498 kB)
Collecting torch_audiomentations>=0.11.0
  Using cached torch_audiomentations-0.11.0-py3-none-any.whl (47 kB)
Requirement already satisfied: torchmetrics<1.0,>=0.6 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from pyannote.audio==2.0.1->-r requirements.txt (line 4)) (0.11.3)
Requirement already satisfied: numpy in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from asteroid-filterbanks<0.5,>=0.4->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (1.23.4)
Requirement already satisfied: scipy>=0.19 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from hmmlearn<0.3,>=0.2.7->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (1.9.1)
Requirement already satisfied: scikit-learn>=0.16 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from hmmlearn<0.3,>=0.2.7->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (1.2.2)
Requirement already satisfied: filelock in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from huggingface_hub>=0.7->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (3.9.0)
Requirement already satisfied: packaging>=20.9 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from huggingface_hub>=0.7->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (23.0)
Requirement already satisfied: pyyaml>=5.1 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from huggingface_hub>=0.7->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (6.0)
Requirement already satisfied: requests in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from huggingface_hub>=0.7->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (2.28.1)
Requirement already satisfied: tqdm>=4.42.1 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from huggingface_hub>=0.7->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (4.65.0)
Collecting antlr4-python3-runtime==4.9.*
  Using cached antlr4_python3_runtime-4.9.3-py3-none-any.whl
Collecting sortedcontainers>=2.0.4
  Using cached sortedcontainers-2.4.0-py2.py3-none-any.whl (29 kB)
Collecting simplejson>=3.8.1
  Using cached simplejson-3.18.3-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (135 kB)
Requirement already satisfied: typer[all]>=0.2.1 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from pyannote.database<5.0,>=4.1.1->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (0.7.0)
Requirement already satisfied: pandas>=0.19 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from pyannote.database<5.0,>=4.1.1->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (1.3.5)
Collecting tabulate>=0.7.7
  Using cached tabulate-0.9.0-py3-none-any.whl (35 kB)
Requirement already satisfied: sympy>=1.1 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from pyannote.metrics<4.0,>=3.2->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (1.11.1)
Requirement already satisfied: matplotlib>=2.0.0 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from pyannote.metrics<4.0,>=3.2->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (3.5.2)
Collecting docopt>=0.6.2
  Using cached docopt-0.6.2-py2.py3-none-any.whl
Collecting optuna>=1.4
  Using cached optuna-3.1.0-py3-none-any.whl (365 kB)
Collecting pyDeprecate>=0.3.1
  Using cached pyDeprecate-0.3.2-py3-none-any.whl (10 kB)
Requirement already satisfied: tensorboard>=2.2.0 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from pytorch_lightning<1.7,>=1.5.4->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (2.9.1)
Requirement already satisfied: protobuf<=3.20.1 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from pytorch_lightning<1.7,>=1.5.4->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (3.19.6)
Requirement already satisfied: fsspec[http]!=2021.06.0,>=2021.05.0 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from pytorch_lightning<1.7,>=1.5.4->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (2023.3.0)
Requirement already satisfied: cffi>=1.0 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from soundfile<0.11,>=0.10.2->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (1.15.1)
Requirement already satisfied: joblib in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from speechbrain<0.6,>=0.5.12->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (1.2.0)
Collecting hyperpyyaml
  Using cached HyperPyYAML-1.1.0-py3-none-any.whl (15 kB)
Requirement already satisfied: sentencepiece in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from speechbrain<0.6,>=0.5.12->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (0.1.97)
Collecting julius<0.3,>=0.2.3
  Using cached julius-0.2.7-py3-none-any.whl
Collecting torch-pitch-shift>=1.2.2
  Using cached torch_pitch_shift-1.2.2-py3-none-any.whl (5.0 kB)
Requirement already satisfied: librosa>=0.6.0 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from torch_audiomentations>=0.11.0->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (0.10.0)
Requirement already satisfied: pycparser in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from cffi>=1.0->soundfile<0.11,>=0.10.2->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (2.21)
Requirement already satisfied: aiohttp!=4.0.0a0,!=4.0.0a1 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from fsspec[http]!=2021.06.0,>=2021.05.0->pytorch_lightning<1.7,>=1.5.4->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (3.8.4)
Requirement already satisfied: msgpack>=1.0 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from librosa>=0.6.0->torch_audiomentations>=0.11.0->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (1.0.5)
Requirement already satisfied: decorator>=4.3.0 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from librosa>=0.6.0->torch_audiomentations>=0.11.0->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (5.1.1)
Collecting librosa>=0.6.0
  Using cached librosa-0.9.2-py3-none-any.whl (214 kB)
Requirement already satisfied: numba>=0.45.1 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from librosa>=0.6.0->torch_audiomentations>=0.11.0->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (0.56.4)
Requirement already satisfied: audioread>=2.1.9 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from librosa>=0.6.0->torch_audiomentations>=0.11.0->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (3.0.0)
Collecting resampy>=0.2.2
  Using cached resampy-0.4.2-py3-none-any.whl (3.1 MB)
Requirement already satisfied: pooch>=1.0 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from librosa>=0.6.0->torch_audiomentations>=0.11.0->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (1.7.0)
Requirement already satisfied: python-dateutil>=2.7 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from matplotlib>=2.0.0->pyannote.metrics<4.0,>=3.2->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (2.8.2)
Requirement already satisfied: fonttools>=4.22.0 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from matplotlib>=2.0.0->pyannote.metrics<4.0,>=3.2->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (4.39.0)
Requirement already satisfied: pillow>=6.2.0 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from matplotlib>=2.0.0->pyannote.metrics<4.0,>=3.2->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (9.4.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from matplotlib>=2.0.0->pyannote.metrics<4.0,>=3.2->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (1.4.4)
Requirement already satisfied: pyparsing>=2.2.1 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from matplotlib>=2.0.0->pyannote.metrics<4.0,>=3.2->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (2.4.7)
Requirement already satisfied: cycler>=0.10 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from matplotlib>=2.0.0->pyannote.metrics<4.0,>=3.2->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (0.11.0)
Collecting cmaes>=0.9.1
  Using cached cmaes-0.9.1-py3-none-any.whl (21 kB)
Requirement already satisfied: colorlog in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from optuna>=1.4->pyannote.pipeline<3.0,>=2.3->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (6.7.0)
Collecting alembic>=1.5.0
  Using cached alembic-1.10.2-py3-none-any.whl (212 kB)
Collecting sqlalchemy>=1.3.0
  Using cached SQLAlchemy-2.0.5.post1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.8 MB)
Requirement already satisfied: pytz>=2017.3 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from pandas>=0.19->pyannote.database<5.0,>=4.1.1->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (2022.7.1)
Requirement already satisfied: threadpoolctl>=2.0.0 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from scikit-learn>=0.16->hmmlearn<0.3,>=0.2.7->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (3.1.0)
Requirement already satisfied: mpmath>=0.19 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from sympy>=1.1->pyannote.metrics<4.0,>=3.2->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (1.3.0)
Requirement already satisfied: werkzeug>=1.0.1 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from tensorboard>=2.2.0->pytorch_lightning<1.7,>=1.5.4->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (2.2.3)
Requirement already satisfied: google-auth-oauthlib<0.5,>=0.4.1 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from tensorboard>=2.2.0->pytorch_lightning<1.7,>=1.5.4->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (0.4.6)
Requirement already satisfied: setuptools>=41.0.0 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from tensorboard>=2.2.0->pytorch_lightning<1.7,>=1.5.4->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (67.6.0)
Requirement already satisfied: markdown>=2.6.8 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from tensorboard>=2.2.0->pytorch_lightning<1.7,>=1.5.4->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (3.4.1)
Requirement already satisfied: absl-py>=0.4 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from tensorboard>=2.2.0->pytorch_lightning<1.7,>=1.5.4->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (1.4.0)
Requirement already satisfied: wheel>=0.26 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from tensorboard>=2.2.0->pytorch_lightning<1.7,>=1.5.4->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (0.38.4)
Requirement already satisfied: tensorboard-plugin-wit>=1.6.0 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from tensorboard>=2.2.0->pytorch_lightning<1.7,>=1.5.4->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (1.8.1)
Requirement already satisfied: google-auth<3,>=1.6.3 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from tensorboard>=2.2.0->pytorch_lightning<1.7,>=1.5.4->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (2.16.2)
Requirement already satisfied: grpcio>=1.24.3 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from tensorboard>=2.2.0->pytorch_lightning<1.7,>=1.5.4->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (1.51.3)
Requirement already satisfied: tensorboard-data-server<0.7.0,>=0.6.0 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from tensorboard>=2.2.0->pytorch_lightning<1.7,>=1.5.4->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (0.6.1)
Requirement already satisfied: charset-normalizer<3,>=2 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from requests->huggingface_hub>=0.7->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (2.1.1)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from requests->huggingface_hub>=0.7->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (1.26.14)
Requirement already satisfied: certifi>=2017.4.17 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from requests->huggingface_hub>=0.7->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (2022.12.7)
Requirement already satisfied: idna<4,>=2.5 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from requests->huggingface_hub>=0.7->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (3.4)
Collecting primePy>=1.3
  Using cached primePy-1.3-py3-none-any.whl (4.0 kB)
Requirement already satisfied: click<9.0.0,>=7.1.1 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from typer[all]>=0.2.1->pyannote.database<5.0,>=4.1.1->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (8.1.3)
Collecting shellingham<2.0.0,>=1.3.0
  Using cached shellingham-1.5.0.post1-py2.py3-none-any.whl (9.4 kB)
Requirement already satisfied: colorama<0.5.0,>=0.4.3 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from typer[all]>=0.2.1->pyannote.database<5.0,>=4.1.1->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (0.4.6)
Collecting rich<13.0.0,>=10.11.0
  Using cached rich-12.6.0-py3-none-any.whl (237 kB)
Collecting ruamel.yaml>=0.17.8
  Using cached ruamel.yaml-0.17.21-py3-none-any.whl (109 kB)
Requirement already satisfied: yarl<2.0,>=1.0 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]!=2021.06.0,>=2021.05.0->pytorch_lightning<1.7,>=1.5.4->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (1.8.2)
Requirement already satisfied: frozenlist>=1.1.1 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]!=2021.06.0,>=2021.05.0->pytorch_lightning<1.7,>=1.5.4->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (1.3.3)
Requirement already satisfied: async-timeout<5.0,>=4.0.0a3 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]!=2021.06.0,>=2021.05.0->pytorch_lightning<1.7,>=1.5.4->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (4.0.2)
Requirement already satisfied: multidict<7.0,>=4.5 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]!=2021.06.0,>=2021.05.0->pytorch_lightning<1.7,>=1.5.4->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (6.0.4)
Requirement already satisfied: aiosignal>=1.1.2 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]!=2021.06.0,>=2021.05.0->pytorch_lightning<1.7,>=1.5.4->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (1.3.1)
Requirement already satisfied: attrs>=17.3.0 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]!=2021.06.0,>=2021.05.0->pytorch_lightning<1.7,>=1.5.4->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (22.2.0)
Collecting Mako
  Using cached Mako-1.2.4-py3-none-any.whl (78 kB)
Requirement already satisfied: importlib-metadata in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from alembic>=1.5.0->optuna>=1.4->pyannote.pipeline<3.0,>=2.3->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (6.0.0)
Requirement already satisfied: importlib-resources in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from alembic>=1.5.0->optuna>=1.4->pyannote.pipeline<3.0,>=2.3->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (5.12.0)
Requirement already satisfied: six>=1.9.0 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from google-auth<3,>=1.6.3->tensorboard>=2.2.0->pytorch_lightning<1.7,>=1.5.4->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (1.16.0)
Requirement already satisfied: rsa<5,>=3.1.4 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from google-auth<3,>=1.6.3->tensorboard>=2.2.0->pytorch_lightning<1.7,>=1.5.4->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (4.9)
Requirement already satisfied: cachetools<6.0,>=2.0.0 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from google-auth<3,>=1.6.3->tensorboard>=2.2.0->pytorch_lightning<1.7,>=1.5.4->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (5.3.0)
Requirement already satisfied: pyasn1-modules>=0.2.1 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from google-auth<3,>=1.6.3->tensorboard>=2.2.0->pytorch_lightning<1.7,>=1.5.4->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (0.2.8)
Requirement already satisfied: requests-oauthlib>=0.7.0 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from google-auth-oauthlib<0.5,>=0.4.1->tensorboard>=2.2.0->pytorch_lightning<1.7,>=1.5.4->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (1.3.1)
Requirement already satisfied: llvmlite<0.40,>=0.39.0dev0 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from numba>=0.45.1->librosa>=0.6.0->torch_audiomentations>=0.11.0->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (0.39.1)
Requirement already satisfied: platformdirs>=2.5.0 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from pooch>=1.0->librosa>=0.6.0->torch_audiomentations>=0.11.0->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (3.1.0)
Requirement already satisfied: pygments<3.0.0,>=2.6.0 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from rich<13.0.0,>=10.11.0->typer[all]>=0.2.1->pyannote.database<5.0,>=4.1.1->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (2.14.0)
Collecting commonmark<0.10.0,>=0.9.0
  Using cached commonmark-0.9.1-py2.py3-none-any.whl (51 kB)
Collecting ruamel.yaml.clib>=0.2.6
  Using cached ruamel.yaml.clib-0.2.7-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (555 kB)
Collecting greenlet!=0.4.17
  Using cached greenlet-2.0.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (618 kB)
Requirement already satisfied: MarkupSafe>=2.1.1 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from werkzeug>=1.0.1->tensorboard>=2.2.0->pytorch_lightning<1.7,>=1.5.4->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (2.1.2)
Requirement already satisfied: zipp>=0.5 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from importlib-metadata->alembic>=1.5.0->optuna>=1.4->pyannote.pipeline<3.0,>=2.3->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (3.15.0)
Requirement already satisfied: pyasn1<0.5.0,>=0.4.6 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from pyasn1-modules>=0.2.1->google-auth<3,>=1.6.3->tensorboard>=2.2.0->pytorch_lightning<1.7,>=1.5.4->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (0.4.8)
Requirement already satisfied: oauthlib>=3.0.0 in /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-358/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<0.5,>=0.4.1->tensorboard>=2.2.0->pytorch_lightning<1.7,>=1.5.4->pyannote.audio==2.0.1->-r requirements.txt (line 4)) (3.2.2)
Building wheels for collected packages: pyannote.audio
  Building wheel for pyannote.audio (setup.py) ... done
  Created wheel for pyannote.audio: filename=pyannote.audio-2.0.1-py2.py3-none-any.whl size=385875 sha256=ffbfdb61f34bbe32f41f7d20bb88127cc2f6284985161419c10b1371c1b87893
  Stored in directory: /tmp/pip-ephem-wheel-cache-izbggpyx/wheels/ae/30/7f/ca7d7c12e5da1c4ccd533bc8c2b6472a829b3bf5b1effab96b
Successfully built pyannote.audio
Installing collected packages: sortedcontainers, singledispatchmethod, primePy, einops, docopt, commonmark, antlr4-python3-runtime, tabulate, simplejson, shellingham, semver, ruamel.yaml.clib, rich, pyDeprecate, omegaconf, Mako, greenlet, cmaes, backports.cached_property, torchaudio, sqlalchemy, soundfile, ruamel.yaml, pyannote.core, julius, asteroid-filterbanks, torch-pitch-shift, resampy, pytorch_metric_learning, pyannote.database, hyperpyyaml, hmmlearn, alembic, speechbrain, pyannote.metrics, optuna, librosa, torch_audiomentations, pytorch_lightning, pyannote.pipeline, pyannote.audio
  Attempting uninstall: rich
    Found existing installation: rich 13.3.2
    Uninstalling rich-13.3.2:
      Successfully uninstalled rich-13.3.2
  Attempting uninstall: soundfile
    Found existing installation: soundfile 0.12.1
    Uninstalling soundfile-0.12.1:
      Successfully uninstalled soundfile-0.12.1
  Attempting uninstall: librosa
    Found existing installation: librosa 0.10.0
    Uninstalling librosa-0.10.0:
      Successfully uninstalled librosa-0.10.0
  Attempting uninstall: pytorch_lightning
    Found existing installation: pytorch-lightning 1.9.4
    Uninstalling pytorch-lightning-1.9.4:
      Successfully uninstalled pytorch-lightning-1.9.4
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
ppgan 2.1.0 requires librosa==0.8.1, but you have librosa 0.9.2 which is incompatible.
ppgan 2.1.0 requires numba==0.53.1, but you have numba 0.56.4 which is incompatible.
Successfully installed Mako-1.2.4 alembic-1.10.2 antlr4-python3-runtime-4.9.3 asteroid-filterbanks-0.4.0 backports.cached_property-1.0.2 cmaes-0.9.1 commonmark-0.9.1 docopt-0.6.2 einops-0.3.2 greenlet-2.0.2 hmmlearn-0.2.8 hyperpyyaml-1.1.0 julius-0.2.7 librosa-0.9.2 omegaconf-2.3.0 optuna-3.1.0 primePy-1.3 pyDeprecate-0.3.2 pyannote.audio-2.0.1 pyannote.core-4.5 pyannote.database-4.1.3 pyannote.metrics-3.2.1 pyannote.pipeline-2.3 pytorch_lightning-1.6.5 pytorch_metric_learning-1.7.3 resampy-0.4.2 rich-12.6.0 ruamel.yaml-0.17.21 ruamel.yaml.clib-0.2.7 semver-2.13.0 shellingham-1.5.0.post1 simplejson-3.18.3 singledispatchmethod-1.0 sortedcontainers-2.4.0 soundfile-0.10.3.post1 speechbrain-0.5.13 sqlalchemy-2.0.5.post1 tabulate-0.9.0 torch-pitch-shift-1.2.2 torch_audiomentations-0.11.0 torchaudio-0.13.1+cpu

Prepare pipeline

Traditional speaker diarization systems can be generalized into a five-step process:

* Feature extraction: transform the raw waveform into audio features such as a mel spectrogram.
* Voice activity detection: identify the chunks in the audio where some voice activity was observed. As we are not interested in silence and noise, we ignore those irrelevant chunks.
* Speaker change detection: identify the speaker change points in the conversation present in the audio.
* Speech turn representation: encode each subchunk by creating feature representations.
* Speech turn clustering: cluster the subchunks based on their vector representation. Different clustering algorithms could be applied based on the availability of the cluster count (k) and the embedding process of the previous step.
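To make the voice activity detection step more concrete, here is a minimal sketch that marks frames as speech when their short-time energy exceeds a fixed threshold. It is only an illustration: the function names, the frame length, the threshold and the synthetic signal are all assumptions made for this example, and pyannote.audio uses trained neural models rather than an energy threshold.

```python
import math


def frame_energies(samples, frame_len):
    """Split the signal into non-overlapping frames and return the mean energy of each."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def detect_speech(samples, sample_rate, frame_len=160, threshold=0.01):
    """Return (start_s, end_s) regions whose frame energy exceeds the threshold."""
    regions, start = [], None
    energies = frame_energies(samples, frame_len)
    for idx, energy in enumerate(energies):
        active = energy > threshold
        if active and start is None:
            start = idx  # a new active region begins
        elif not active and start is not None:
            regions.append((start * frame_len / sample_rate, idx * frame_len / sample_rate))
            start = None
    if start is not None:  # the signal ended while still active
        regions.append((start * frame_len / sample_rate, len(energies) * frame_len / sample_rate))
    return regions


# Synthetic stand-in for audio (an assumption): 0.5 s silence, 1 s tone, 0.5 s silence
SAMPLE_RATE = 1600
silence = [0.0] * (SAMPLE_RATE // 2)
tone = [0.5 * math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
signal = silence + tone + silence

speech_regions = detect_speech(signal, SAMPLE_RATE)
print(speech_regions)
```

With this synthetic input, only the one-second tone in the middle of the signal is reported as active. Real recordings contain noise and varying loudness, which is why a fixed threshold is rarely sufficient in practice.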

The final output is a set of clusters, each grouping subchunks of the audio stream. Each cluster can be given an anonymous identifier (such as speaker_a) and then mapped back onto the audio stream to create a speaker-aware audio timeline.
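As a rough illustration of this last mapping step, the sketch below turns per-segment cluster ids into anonymous labels and merges back-to-back segments of the same speaker into a single turn. The helper build_timeline, the segment boundaries and the cluster ids are hypothetical values chosen for the example, not output of pyannote.audio.

```python
from string import ascii_lowercase


def build_timeline(segments, cluster_ids):
    """Map cluster ids to anonymous labels and merge consecutive segments of one speaker."""
    labels = {}
    timeline = []
    for (start, end), cluster in zip(segments, cluster_ids):
        # Labels are assigned in order of first appearance: speaker_a, speaker_b, ...
        # (this toy version supports at most 26 speakers)
        if cluster not in labels:
            labels[cluster] = f"speaker_{ascii_lowercase[len(labels)]}"
        label = labels[cluster]
        if timeline and timeline[-1][2] == label and timeline[-1][1] == start:
            # Same speaker continues without a gap: extend the previous turn
            timeline[-1] = (timeline[-1][0], end, label)
        else:
            timeline.append((start, end, label))
    return timeline


# Hypothetical segments (in seconds) and the cluster id assigned to each one
segments = [(0.0, 2.0), (2.0, 3.5), (3.5, 6.0), (7.0, 9.0)]
cluster_ids = [1, 1, 0, 1]

timeline = build_timeline(segments, cluster_ids)
for start, end, label in timeline:
    print(f"{start:4.1f}s - {end:4.1f}s  {label}")

num_speakers = len({label for _, _, label in timeline})
```

The number of distinct labels in the timeline also gives the number of speakers detected in the recording.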

The diagram below shows a typical speaker diarization pipeline:

diarization_pipeline

From a simplified point of view, speaker diarization is a combination of speaker segmentation and speaker clustering. The first aims at finding speaker change points in an audio stream. The second aims at grouping together speech segments based on speaker characteristics.
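The two stages can be sketched on toy data as follows. This is a simplified illustration under strong assumptions: each frame is described by a single hypothetical pitch value, change points are found with a fixed jump threshold, and clustering is a greedy assignment to fixed centroids. Real systems, including pyannote.audio, rely on learned segmentation models and speaker embeddings instead.

```python
def segment(features, jump=30.0):
    """Speaker segmentation: cut wherever consecutive frame features differ by more than `jump`."""
    boundaries = [0]
    for i in range(1, len(features)):
        if abs(features[i] - features[i - 1]) > jump:
            boundaries.append(i)
    boundaries.append(len(features))
    # Pair up boundaries into (start_frame, end_frame) segments, end exclusive
    return list(zip(boundaries[:-1], boundaries[1:]))


def cluster(segments, features, tolerance=30.0):
    """Speaker clustering: assign each segment to the first cluster whose centroid is close enough."""
    centroids, assignment = [], []
    for start, end in segments:
        mean = sum(features[start:end]) / (end - start)
        for idx, centroid in enumerate(centroids):
            if abs(mean - centroid) <= tolerance:
                assignment.append(idx)
                break
        else:
            # No existing cluster is close: open a new one (centroids are not updated later)
            centroids.append(mean)
            assignment.append(len(centroids) - 1)
    return assignment


# Hypothetical per-frame pitch estimates (Hz) for a two-speaker exchange
pitch = [120, 122, 118, 121, 210, 208, 212, 119, 123, 120, 209, 211]

segments = segment(pitch)
speakers = cluster(segments, pitch)
print(segments)
print(speakers)
```

Segmentation finds four speech turns, and clustering recognizes that the first and third turns belong to one speaker while the second and fourth belong to another.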

To instantiate a speaker diarization pipeline with the pyannote.audio library, import the Pipeline class and call its from_pretrained method, passing either a path to a directory with the pipeline configuration or a model identifier from the Hugging Face Hub.

Note: This tutorial uses philschmid/pyannote-speaker-diarization-endpoint, an unofficial copy of the model provided for demo purposes only. The original model (pyannote/speaker-diarization) requires you to accept the model license before downloading or using its weights; visit the pyannote/speaker-diarization model page to read and accept the license before you proceed. To use this model, you must be a registered user of the 🤗 Hugging Face Hub, and you will need an access token for the code below to run. For more information on access tokens, refer to this section of the documentation. You can log in to the Hugging Face Hub in the notebook environment using the following code:

# Log in to the Hugging Face Hub to get access to the pretrained model
from huggingface_hub import notebook_login, whoami

try:
    # whoami() raises an error when no access token is available
    whoami()
    print('Authorization token already provided')
except OSError:
    notebook_login()

from pyannote.audio import Pipeline

# Download and instantiate the pretrained diarization pipeline from the Hub
pipeline = Pipeline.from_pretrained("philschmid/pyannote-speaker-diarization-endpoint")
2023-03-09 23:19:46.810884: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.

Load test audio file

import sys

# Make the shared notebook helpers importable
sys.path.append("../utils")

from notebook_utils import download_file

test_data_url = "https://github.com/pyannote/pyannote-audio/raw/develop/tutorials/assets/sample.wav"

sample_file = 'sample.wav'
download_file(test_data_url, sample_file)
# pyannote.audio accepts a mapping with a resource identifier ('uri') and the audio path ('audio')
AUDIO_FILE = {'uri': sample_file.replace('.wav', ''), 'audio': sample_file}
sample.wav:   0%|          | 0.00/938k [00:00<?, ?B/s]
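import IPython.display as ipd
import librosa
import librosa.display
import matplotlib.pyplot as plt

# Load the audio (librosa resamples to 22,050 Hz mono by default) and plot its waveform
audio, sr = librosa.load(sample_file)
plt.figure(figsize=(14, 5))
librosa.display.waveshow(audio, sr=sr)

# Render an inline audio player for the sample
ipd.Audio(sample_file)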