Document Entity Extraction with OpenVINO

This tutorial is also available as a Jupyter notebook that can be cloned directly from GitHub. See the installation guide for instructions to run this tutorial locally on Windows, Linux or macOS. To run it without installing anything, launch it on Binder.


This demo shows Named Entity Recognition (NER) in text with OpenVINO. It uses a small BERT-large-like model that was distilled from a larger BERT-large model and quantized to INT8 on the SQuAD v1.1 training set. The model comes from Open Model Zoo. The final part of this notebook shows live inference results for sample inputs and the entity template. The notebook shows how to create the following pipeline:


import time
import json

import numpy as np
import tokens_bert as tokens

from openvino.runtime import Core
from openvino.runtime import Dimension

The model

Download the model

Use omz_downloader, which is a command-line tool from the openvino-dev package. It automatically creates a directory structure and downloads the selected model. If the model is already downloaded, this step is skipped.

You can download and use any of the following models: bert-large-uncased-whole-word-masking-squad-0001, bert-large-uncased-whole-word-masking-squad-int8-0001, bert-small-uncased-whole-word-masking-squad-0001, bert-small-uncased-whole-word-masking-squad-0002, bert-small-uncased-whole-word-masking-squad-int8-0002. To switch models, change the model name in the code below. Also change the precision, because FP16-INT8 is available only for the INT8 models. All of these models are already converted to OpenVINO Intermediate Representation (OpenVINO IR), so there is no need to use omz_converter.
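The sketch below makes the resulting directory layout concrete. It rebuilds the IR file paths from the same three settings and checks whether both files are already on disk. The helper names `ir_paths` and `is_downloaded` are hypothetical. The layout `<base_model_dir>/intel/<model_name>/<precision>/` is inferred from the download log in this notebook, not taken from the omz_downloader documentation.

```python
from pathlib import Path


def ir_paths(base_model_dir, model_name, precision):
    """Return the (xml, bin) paths of an Intel OMZ model.

    Assumed layout (inferred from the download log):
    <base_model_dir>/intel/<model_name>/<precision>/<model_name>.{xml,bin}
    """
    model_dir = Path(base_model_dir) / "intel" / model_name / precision
    return model_dir / f"{model_name}.xml", model_dir / f"{model_name}.bin"


def is_downloaded(base_model_dir, model_name, precision):
    """True only if both IR files (topology and weights) already exist."""
    xml_path, bin_path = ir_paths(base_model_dir, model_name, precision)
    return xml_path.is_file() and bin_path.is_file()


xml_path, bin_path = ir_paths(
    "model", "bert-small-uncased-whole-word-masking-squad-int8-0002", "FP16-INT8")
print(xml_path.as_posix())
# model/intel/bert-small-uncased-whole-word-masking-squad-int8-0002/FP16-INT8/bert-small-uncased-whole-word-masking-squad-int8-0002.xml
```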

# A directory where the model will be downloaded.
base_model_dir = "model"

# The desired precision.
precision = "FP16-INT8"

# A model name as named in Open Model Zoo.
model_name = "bert-small-uncased-whole-word-masking-squad-int8-0002"

model_path = f"{base_model_dir}/intel/{model_name}/{precision}/{model_name}.xml"
model_weights_path = f"{base_model_dir}/intel/{model_name}/{precision}/{model_name}.bin"

download_command = f"omz_downloader " \
                   f"--name {model_name} " \
                   f"--precision {precision} " \
                   f"--output_dir {base_model_dir} " \
                   f"--cache_dir {base_model_dir}"
! $download_command
################|| Downloading bert-small-uncased-whole-word-masking-squad-int8-0002 ||################

========== Downloading model/intel/bert-small-uncased-whole-word-masking-squad-int8-0002/vocab.txt

========== Downloading model/intel/bert-small-uncased-whole-word-masking-squad-int8-0002/FP16-INT8/bert-small-uncased-whole-word-masking-squad-int8-0002.xml

========== Downloading model/intel/bert-small-uncased-whole-word-masking-squad-int8-0002/FP16-INT8/bert-small-uncased-whole-word-masking-squad-int8-0002.bin

Load the model for Entity Extraction with Dynamic Shape

The input to the entity extraction model is text of varying length, so the model needs dynamic input shapes. Hence:

  1. The input dimension with a dynamic shape needs to be specified before loading the entity extraction model.

  2. A dynamic shape is specified by assigning -1 to the input dimension or by setting the upper bound of the input dimension, for example with Dimension(1, 384).

In this notebook, the upper bound of the dynamic input, and therefore the longest input allowed, is 384 tokens. That is 380 tokens for the content, 1 for the entity, and 3 special tokens ([CLS] and two [SEP] separators). Assigning a bounded dynamic shape with Dimension(<lower bound>, <upper bound>) is highly recommended (in this case, Dimension(1, 384)), because it lets OpenVINO 2022.1 use memory more efficiently.
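The token budget is simple arithmetic. The sketch below assumes the `[CLS] entity [SEP] context [SEP]` layout used by this notebook. `sequence_length` and `fits_model` are hypothetical helpers, not part of OpenVINO.

```python
MAX_SEQ_LEN = 384       # upper bound used in Dimension(1, 384)
NUM_SPECIAL_TOKENS = 3  # [CLS], [SEP] after the entity, [SEP] after the context


def sequence_length(num_entity_tokens, num_context_tokens):
    """Length of input_ids for one [CLS] entity [SEP] context [SEP] sequence."""
    return num_entity_tokens + num_context_tokens + NUM_SPECIAL_TOKENS


def fits_model(num_entity_tokens, num_context_tokens, max_len=MAX_SEQ_LEN):
    """True if the sequence length lies inside the bounds [1, max_len]."""
    length = sequence_length(num_entity_tokens, num_context_tokens)
    return 1 <= length <= max_len


print(sequence_length(1, 380))  # 384
print(fits_model(1, 380))       # True
print(fits_model(2, 380))       # False
```

If the entity query takes more than one token, the room left for the context shrinks by the same amount.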

# Initialize OpenVINO Runtime.
ie_core = Core()
# Read the network and corresponding weights from the file.
model = ie_core.read_model(model=model_path, weights=model_weights_path)

# Assign dynamic shapes to every input layer on the last dimension.
for input_layer in model.inputs:
    input_shape = input_layer.partial_shape
    input_shape[1] = Dimension(1, 384)
    model.reshape({input_layer: input_shape})

# Compile the model for CPU.
compiled_model = ie_core.compile_model(model=model, device_name="CPU")

# Get input names of nodes.
input_keys = list(compiled_model.inputs)

Input keys are the network input nodes, and each node's any_name gives its name. In the case of the BERT-large-like model, there are 4 inputs.

[i.any_name for i in input_keys]
['input_ids', 'attention_mask', 'token_type_ids', 'position_ids']


NLP models usually take a list of tokens as standard input. A token is a word, or a piece of a word, mapped to an integer ID. To build the proper input, you need the vocabulary that defines this mapping. You also define special tokens, such as the separators, and a confidence score threshold. The content itself is plain text.
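BERT vocabularies map whole words and word pieces to IDs; continuation pieces are prefixed with `##`. The sketch below illustrates greedy longest-match-first WordPiece splitting against a made-up toy vocabulary. It shows the general technique only and is not necessarily how the `tokens_bert` helper module works internally. `wordpiece_tokenize` and `toy_vocab` are hypothetical.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first WordPiece split of one lowercase word.

    Pieces after the first carry the '##' prefix, as in BERT vocabularies.
    Returns a list of vocabulary IDs.
    """
    ids = []
    start = 0
    while start < len(word):
        end = len(word)
        piece_id = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece
            if piece in vocab:
                piece_id = vocab[piece]
                break
            end -= 1
        if piece_id is None:
            # No piece matched: the whole word maps to the unknown token.
            return [vocab[unk_token]]
        ids.append(piece_id)
        start = end
    return ids


# Hypothetical toy vocabulary; the notebook reads real IDs from vocab.txt.
toy_vocab = {"[UNK]": 0, "[CLS]": 1, "[SEP]": 2, "santa": 3, "clara": 4,
             "head": 5, "##quarter": 6, "##ed": 7}

print(wordpiece_tokenize("santa", toy_vocab))          # [3]
print(wordpiece_tokenize("headquartered", toy_vocab))  # [5, 6, 7]
print(wordpiece_tokenize("xyz", toy_vocab))            # [0]
```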

# A path to a vocabulary file.
vocab_file_path = "data/vocab.txt"

# Create a dictionary with words and their indices.
vocab = tokens.load_vocab_file(vocab_file_path)

# Define special tokens.
cls_token = vocab["[CLS]"]
sep_token = vocab["[SEP]"]

# Set a confidence score threshold.
confidence_threshold = 0.4


The main input (input_ids) of the BERT model used here consists of two parts, entity tokens and context tokens, separated by special tokens. You also need to provide:

  - attention_mask: a sequence of integer values marking which positions of the input are valid,
  - token_type_ids: a sequence of integer values that splits input_ids into the entity part and the context part,
  - position_ids: a sequence of integer values from 0 to the input length minus one (special tokens included), giving the position index of each input token.

For more information, refer to the Input section of the BERT model documentation.
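As a worked example, the four sequences can be built with plain lists for a two-token entity and a three-token context. The IDs 101 and 102 are placeholders for `[CLS]` and `[SEP]`, and the other IDs are hypothetical. In the notebook, the real values are read from the vocabulary file. The construction follows the rules listed above.

```python
# Placeholder IDs, chosen only to make the layout visible.
CLS, SEP = 101, 102
entity_tokens = [7, 8]         # hypothetical entity query of two tokens
context_tokens = [20, 21, 22]  # hypothetical context of three tokens

input_ids = [CLS] + entity_tokens + [SEP] + context_tokens + [SEP]
# Every position holds a real token, so the mask is all ones.
attention_mask = [1] * len(input_ids)
# Segment 0: [CLS] + entity + first [SEP]; segment 1: context + last [SEP].
token_type_ids = [0] * (len(entity_tokens) + 2) + [1] * (len(context_tokens) + 1)
# One position index per token, starting at 0.
position_ids = list(range(len(input_ids)))

for name, seq in [("input_ids", input_ids), ("attention_mask", attention_mask),
                  ("token_type_ids", token_type_ids), ("position_ids", position_ids)]:
    print(f"{name:>15}: {seq}")
```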

# Build the dictionary of model inputs for one entity-context pair.
def prepare_input(entity_tokens, context_tokens):
    input_ids = [cls_token] + entity_tokens + [sep_token] + \
        context_tokens + [sep_token]
    # 1 for every valid position.
    attention_mask = [1] * len(input_ids)
    # 0 for [CLS], the entity tokens and the first [SEP]; 1 for the context part.
    token_type_ids = [0] * (len(entity_tokens) + 2) + \
        [1] * (len(context_tokens) + 1)

    # Create an input to feed the model.
    input_dict = {
        "input_ids": np.array([input_ids], dtype=np.int32),
        "attention_mask": np.array([attention_mask], dtype=np.int32),
        "token_type_ids": np.array([token_type_ids], dtype=np.int32),
    }

    # Some models require additional position_ids.
    if "position_ids" in [i_key.any_name for i_key in input_keys]:
        position_ids = np.arange(len(input_ids))
        input_dict["position_ids"] = np.array([position_ids], dtype=np.int32)

    return input_dict


The network returns raw scores (logits) for the start and end positions. Use the softmax function to turn them into probability distributions. Then, find the best entity span in the context, which is the one with the highest score, and return the score and the context range of the extracted entity.
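The same computation can be tried on made-up logits without running the model. This minimal NumPy sketch uses hypothetical function names `softmax` and `best_span` and works in three steps:

- It applies a numerically stable softmax to the start and end logits.
- It scores every start-end pair by the product of the two probabilities.
- It keeps only pairs where the end is not before the start and is at most 16 tokens after it, the same limit this notebook uses.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def best_span(start_logits, end_logits, max_span=16):
    """Return (score, start, end) maximizing P(start) * P(end)
    subject to start <= end <= start + max_span."""
    p_start = softmax(start_logits)
    p_end = softmax(end_logits)
    # score_mat[i, j] = p_start[i] * p_end[j]
    score_mat = np.outer(p_start, p_end)
    # triu drops pairs with j < i; tril(k=max_span) drops pairs with j - i > max_span.
    score_mat = np.tril(np.triu(score_mat), max_span)
    start, end = np.unravel_index(np.argmax(score_mat), score_mat.shape)
    return float(score_mat[start, end]), int(start), int(end)


# Hypothetical logits for a five-token context.
start_logits = np.array([0.1, 3.0, 0.2, 0.0, -1.0])
end_logits = np.array([0.0, 0.5, 2.5, 0.1, -1.0])
score, start, end = best_span(start_logits, end_logits)
print(start, end, round(score, 3))
```

Here the highest start logit is at position 1 and the highest end logit is at position 2. That pair already satisfies both constraints, so the search returns it.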

def postprocess(output_start, output_end, entity_tokens,
                context_tokens_start_end, input_size):

    def get_score(logits):
        # Softmax; subtracting the maximum avoids overflow in np.exp.
        out = np.exp(logits - np.max(logits, axis=-1, keepdims=True))
        return out / out.sum(axis=-1, keepdims=True)

    # Get start-end scores for the context.
    score_start = get_score(output_start)
    score_end = get_score(output_end)

    # Index of the first context token in a tensor.
    context_start_idx = len(entity_tokens) + 2
    # Index of last+1 context token in a tensor.
    context_end_idx = input_size - 1

    # Find the product of all start-end combinations to find the best one.
    max_score, max_start, max_end = find_best_entity_window(
        start_score=score_start, end_score=score_end,
        context_start_idx=context_start_idx, context_end_idx=context_end_idx
    )

    # Convert to context text start-end index.
    max_start = context_tokens_start_end[max_start][0]
    max_end = context_tokens_start_end[max_end][1]

    return max_score, max_start, max_end

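def find_best_entity_window(start_score, end_score,
                            context_start_idx, context_end_idx):
    context_len = context_end_idx - context_start_idx
    # score_mat[i, j] = P(start = i) * P(end = j) over the context tokens.
    score_mat = np.matmul(
        start_score[context_start_idx:context_end_idx].reshape(
            (context_len, 1)),
        end_score[context_start_idx:context_end_idx].reshape(
            (1, context_len)),
    )
    # Reset candidates that end before they start.
    score_mat = np.triu(score_mat)
    # Reset long candidates (end more than 16 tokens after start).
    score_mat = np.tril(score_mat, 16)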
    # find the best start-end pair
    max_s, max_e = divmod(score_mat.flatten().argmax(), score_mat.shape[1])
    max_score = score_mat[max_s, max_e]

    return max_score, max_s, max_e

First, convert the context and the entity to lists of tokens. Then, run inference on the combined input and take the part of the context with the highest score as the extracted entity.
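The extracted entity is returned as a slice of the original context. The slice bounds come from the per-token (start, end) character offsets that the tokenizer produces. The sketch below uses a hypothetical context and hand-written offsets; `span_to_text` is a hypothetical helper that mirrors the index lookup in `postprocess`.

```python
# Hypothetical context and per-token (start, end) character offsets.
context = "Headquartered in Santa Clara, California."
token_offsets = [(0, 13), (14, 16), (17, 22), (23, 28), (28, 29), (30, 40), (40, 41)]


def span_to_text(context, token_offsets, start_token, end_token):
    """Map an inclusive token span to the matching substring of the context."""
    char_start = token_offsets[start_token][0]  # first character of the start token
    char_end = token_offsets[end_token][1]      # one past the last character of the end token
    return context[char_start:char_end]


print(span_to_text(context, token_offsets, 2, 3))  # Santa Clara
print(span_to_text(context, token_offsets, 5, 5))  # California
```

Tokenization runs on `context.lower()`, but the slice is taken from `context`. The offsets therefore line up only if lowercasing does not change the string length, which holds for ASCII text like the samples here.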

def get_best_entity(entity, context, vocab):
    # Convert the context string to tokens.
    context_tokens, context_tokens_end = tokens.text_to_tokens(
        text=context.lower(), vocab=vocab)
    # Convert the entity string to tokens.
    entity_tokens, _ = tokens.text_to_tokens(text=entity.lower(), vocab=vocab)

    network_input = prepare_input(entity_tokens, context_tokens)
    input_size = len(context_tokens) + len(entity_tokens) + 3

    # OpenVINO inference.
    output_start_key = compiled_model.output("output_s")
    output_end_key = compiled_model.output("output_e")
    result = compiled_model(network_input)

    # Postprocess the result getting the score and context range for the answer.
    score_start_end = postprocess(output_start=result[output_start_key][0],
                                  output_end=result[output_end_key][0],
                                  entity_tokens=entity_tokens,
                                  context_tokens_start_end=context_tokens_end,
                                  input_size=input_size)

    # Return the extracted part of the context and its score.
    return context[score_start_end[1]:score_start_end[2]], score_start_end[0]

Set the Entity Recognition Template

Only the entities with a prediction confidence score of at least 0.4 are captured in the final output. You can change this by setting the confidence_threshold variable above. The natural entity recognition template supports the following entity types: building, company, persons, city, state, height, floor, and address.
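The threshold check keeps a candidate when its score is greater than or equal to the threshold. The sketch below builds the same kind of JSON record as the notebook from hypothetical candidates. The entity names and scores are invented for illustration.

```python
import json

CONFIDENCE_THRESHOLD = 0.4  # same default as confidence_threshold in the notebook

# Hypothetical (type, entity, score) candidates, one per template field.
candidates = [
    ("company", "Acme Corp", 0.91),
    ("city", "Springfield", 0.38),
    ("state", "Oregon", 0.40),
]

extract = [
    {"Entity": entity, "Type": field, "Score": f"{score:.2f}"}
    for field, entity, score in candidates
    if score >= CONFIDENCE_THRESHOLD  # a score equal to the threshold is kept
]
print(json.dumps({"Extraction": extract}, indent=4))
```

Only the "city" candidate falls below 0.4 and is dropped; the "state" candidate sits exactly at the threshold and is kept.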

template = ["building", "company", "persons", "city",
            "state", "height", "floor", "address"]
def run_analyze_entities(context):
    print(f"Context: {context}\n", flush=True)

    if len(context) == 0:
        print("Error: Empty context or outside paragraphs")
        return

    # The model accepts at most 380 context tokens (see the upper bound above).
    context_tokens, _ = tokens.text_to_tokens(text=context.lower(), vocab=vocab)
    if len(context_tokens) > 380:
        print("Error: The context is too long for this particular model. "
              "Try with context shorter than 380 tokens.")
        return

    # Measure the processing time.
    start_time = time.perf_counter()
    extract = []
    for field in template:
        entity_to_find = field + "?"
        entity, score = get_best_entity(entity=entity_to_find,
                                        context=context, vocab=vocab)
        if score >= confidence_threshold:
            extract.append({"Entity": entity, "Type": field,
                            "Score": f"{score:.2f}"})
    end_time = time.perf_counter()
    res = {"Extraction": extract, "Time": f"{end_time - start_time:.2f}s"}
    print("\nJSON Output:")
    print(json.dumps(res, sort_keys=False, indent=4))

Run on Simple Text

Sample 1

Change source_text to your own text, with entities supported by the template, to perform entity extraction. Only one input text is processed at a time, and the longer the context, the longer the extraction takes. The model is very limited and sensitive to the input and the predefined template, so the answer depends on whether the entity type is supported by the template. The model tries to extract entities even when they are not supported by the template, and in such cases the results are effectively random.

Sample source: Intel on Wikipedia

source_text = "Intel Corporation is an American multinational and technology" \
    " company headquartered in Santa Clara, California."
run_analyze_entities(source_text)
Context: Intel Corporation is an American multinational and technology company headquartered in Santa Clara, California.

JSON Output:
{
    "Extraction": [
        {
            "Entity": "Intel Corporation",
            "Type": "company",
            "Score": "0.51"
        },
        {
            "Entity": "Intel",
            "Type": "persons",
            "Score": "0.45"
        },
        {
            "Entity": "Santa Clara",
            "Type": "city",
            "Score": "0.96"
        },
        {
            "Entity": "California",
            "Type": "state",
            "Score": "0.99"
        }
    ],
    "Time": "0.07s"
}

Sample 2

Sample source: Intel on Wikipedia

source_text = "Intel was founded in Mountain View, California, " \
    "in 1968 by Gordon E. Moore, a chemist, and Robert Noyce, " \
    "a physicist and co-inventor of the integrated circuit."
run_analyze_entities(source_text)
Context: Intel was founded in Mountain View, California, in 1968 by Gordon E. Moore, a chemist, and Robert Noyce, a physicist and co-inventor of the integrated circuit.

JSON Output:
{
    "Extraction": [
        {
            "Entity": "Intel",
            "Type": "company",
            "Score": "0.98"
        },
        {
            "Entity": "Gordon E. Moore, a chemist, and Robert Noyce",
            "Type": "persons",
            "Score": "0.83"
        },
        {
            "Entity": "Mountain View",
            "Type": "city",
            "Score": "0.79"
        },
        {
            "Entity": "California",
            "Type": "state",
            "Score": "0.98"
        }
    ],
    "Time": "0.05s"
}

Sample 3

Sample source: a converted paragraph (from here)

source_text = "The Robert Noyce Building in Santa Clara, California, " \
    "is the headquarters for Intel Corporation. It was constructed in 1992 " \
    "and is located at 2200 Mission College Boulevard - 95054. It has an " \
    "estimated height of 22.20 meters and 6 floors above ground."
run_analyze_entities(source_text)
Context: The Robert Noyce Building in Santa Clara, California, is the headquarters for Intel Corporation. It was constructed in 1992 and is located at 2200 Mission College Boulevard - 95054. It has an estimated height of 22.20 meters and 6 floors above ground.

JSON Output:
{
    "Extraction": [
        {
            "Entity": "Robert Noyce Building",
            "Type": "building",
            "Score": "0.46"
        },
        {
            "Entity": "Intel Corporation",
            "Type": "company",
            "Score": "0.62"
        },
        {
            "Entity": "Santa Clara",
            "Type": "city",
            "Score": "0.50"
        },
        {
            "Entity": "California",
            "Type": "state",
            "Score": "0.99"
        },
        {
            "Entity": "22.20 meters",
            "Type": "height",
            "Score": "0.72"
        },
        {
            "Entity": "2200 Mission College Boulevard - 95054",
            "Type": "address",
            "Score": "0.86"
        }
    ],
    "Time": "0.05s"
}