Document Entity Extraction with OpenVINO

This tutorial is also available as a Jupyter notebook that can be cloned directly from GitHub. See the installation guide for instructions to run this tutorial locally on Windows, Linux or macOS. To run without installing anything, click the launch binder button.


This demo shows named entity recognition from text with OpenVINO. We use a small BERT-large-like model, distilled and quantized to INT8 on the SQuAD v1.1 training set from a larger BERT-large model. The model comes from Open Model Zoo. At the bottom of this notebook, you will see live inference results from your inputs and templates. In this notebook, we show how to create the following pipeline:


import time
import json

import numpy as np
import tokens_bert as tokens

from openvino.runtime import Core
from openvino.runtime import Dimension

The model

Download the model

We use omz_downloader, which is a command-line tool from the openvino-dev package. omz_downloader automatically creates a directory structure and downloads the selected model. If the model is already downloaded, this step is skipped.

You can download and use any of the following models: bert-large-uncased-whole-word-masking-squad-0001, bert-large-uncased-whole-word-masking-squad-int8-0001, bert-small-uncased-whole-word-masking-squad-0001, bert-small-uncased-whole-word-masking-squad-0002, bert-small-uncased-whole-word-masking-squad-int8-0002; just change the model name below. All of these models are already converted to OpenVINO Intermediate Representation (IR), so there is no need to use omz_converter.

# directory where model will be downloaded
base_model_dir = "model"

# desired precision
precision = "FP16-INT8"

# model name as named in Open Model Zoo
model_name = "bert-small-uncased-whole-word-masking-squad-int8-0002"

model_path = f"model/intel/{model_name}/{precision}/{model_name}.xml"
model_weights_path = f"model/intel/{model_name}/{precision}/{model_name}.bin"

download_command = f"omz_downloader " \
                   f"--name {model_name} " \
                   f"--precision {precision} " \
                   f"--output_dir {base_model_dir} " \
                   f"--cache_dir {base_model_dir}"
! $download_command
################|| Downloading bert-small-uncased-whole-word-masking-squad-int8-0002 ||################

========== Downloading model/intel/bert-small-uncased-whole-word-masking-squad-int8-0002/vocab.txt

========== Downloading model/intel/bert-small-uncased-whole-word-masking-squad-int8-0002/FP16-INT8/bert-small-uncased-whole-word-masking-squad-int8-0002.xml

========== Downloading model/intel/bert-small-uncased-whole-word-masking-squad-int8-0002/FP16-INT8/bert-small-uncased-whole-word-masking-squad-int8-0002.bin

Load the model for Entity Extraction with Dynamic Shape

The input to the entity extraction model is text with varying content sizes, i.e., dynamic input shapes. Hence:

  1. The input dimension with a dynamic shape needs to be specified before loading the entity extraction model.

  2. A dynamic shape is specified either by assigning -1 to the input dimension or by setting an upper bound for the input dimension using, for example, Dimension(1, 384).

In the scope of this notebook, the upper bound of the dynamic input is known: the longest input text allowed is 384 tokens, i.e., 380 tokens for the content + 1 for the entity + 3 special (separation) tokens. It is therefore recommended to assign the dynamic shape with an upper bound, i.e., Dimension(1, 384), so that memory is used more efficiently with OpenVINO 2022.1.

# initialize inference engine
ie_core = Core()
# read the network and corresponding weights from file
model = ie_core.read_model(model=model_path, weights=model_weights_path)

# assign dynamic shapes to every input layer on the last dimension
for input_layer in model.inputs:
    input_shape = input_layer.partial_shape
    input_shape[1] = Dimension(1, 384)
    model.reshape({input_layer: input_shape})

# compile the model for the CPU
compiled_model = ie_core.compile_model(model=model, device_name="CPU")

# get input nodes
input_keys = list(compiled_model.inputs)

Input keys are the names of the network input nodes. In the case of the BERT-large-like model, we have four inputs.

[i.any_name for i in input_keys]
['input_ids', 'attention_mask', 'token_type_ids', 'position_ids']


NLP models usually take a list of tokens as standard input. A token is a single word converted to some integer. To provide the proper input, we need the vocabulary for such mapping. We also define some special tokens like separators and a function to load the content. Content is loaded from simple text.

# path to vocabulary file
vocab_file_path = "data/vocab.txt"

# create dictionary with words and their indices
vocab = tokens.load_vocab_file(vocab_file_path)

# define special tokens
cls_token = vocab["[CLS]"]
sep_token = vocab["[SEP]"]

# set a confidence score threshold
confidence_threshold = 0.4
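
The tokens_bert helper module is not shown in this notebook. As an illustration only, a minimal sketch of what its load_vocab_file function presumably does (BERT vocabularies store one token per line, and a token's id is its line index) could look like this; the function name load_vocab_file_sketch is made up for this example:

```python
def load_vocab_file_sketch(vocab_file_path):
    # map each token (one per line) to its line index, as BERT vocabularies do
    vocab = {}
    with open(vocab_file_path, "r", encoding="utf-8") as f:
        for index, line in enumerate(f):
            vocab[line.strip()] = index
    return vocab
```

With such a mapping, vocab["[CLS]"] and vocab["[SEP]"] above resolve to the integer ids of the special tokens.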


The main input (input_ids) to the BERT model consists of two parts: entity tokens and context tokens, separated by special tokens. We also need to provide: attention_mask, which is a sequence of integer values representing the mask of valid values in the input; token_type_ids, which is a sequence of integer values representing the segmentation of input_ids into entity and context; and position_ids, which is a sequence of integer values from 0 to the length of the input, extended by separation tokens, representing the position index of each input token. To learn more about the inputs, please read this.

# generator of a sequence of inputs
def prepare_input(entity_tokens, context_tokens):
    input_ids = [cls_token] + entity_tokens + [sep_token] + \
        context_tokens + [sep_token]
    # 1 for any index
    attention_mask = [1] * len(input_ids)
    # 0 for entity tokens, 1 for context part
    token_type_ids = [0] * (len(entity_tokens) + 2) + \
        [1] * (len(context_tokens) + 1)

    # create input to feed the model
    input_dict = {
        "input_ids": np.array([input_ids], dtype=np.int32),
        "attention_mask": np.array([attention_mask], dtype=np.int32),
        "token_type_ids": np.array([token_type_ids], dtype=np.int32),
    }

    # some models require additional position_ids
    if "position_ids" in [i_key.any_name for i_key in input_keys]:
        position_ids = np.arange(len(input_ids))
        input_dict["position_ids"] = np.array([position_ids], dtype=np.int32)

    return input_dict
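
To make the layout concrete, here is a self-contained toy example of the same construction with made-up token ids (101 and 102 stand in for [CLS] and [SEP], following the standard BERT uncased vocabulary; the entity and context ids are arbitrary):

```python
import numpy as np

# toy ids; 101/102 stand in for [CLS]/[SEP] (standard BERT uncased indices)
cls_id, sep_id = 101, 102
entity_ids = [2194]          # one entity token, e.g. "company"
context_ids = [4100, 2003]   # two context tokens

# [CLS] entity [SEP] context [SEP]
input_ids = [cls_id] + entity_ids + [sep_id] + context_ids + [sep_id]
attention_mask = [1] * len(input_ids)
token_type_ids = [0] * (len(entity_ids) + 2) + [1] * (len(context_ids) + 1)

print(np.array([input_ids]).shape)  # batch of one 6-token sequence: (1, 6)
print(token_type_ids)               # entity part 0s, context part 1s
```

The segment ids mark the [CLS], entity, and first [SEP] tokens with 0 and the context plus final [SEP] with 1, matching the split prepare_input produces.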


The results from the network are raw (logits). We need to use the softmax function to get the probability distribution. Then, we are looking for the best entity extraction in the current part of the context (the highest score) and we return the score and the context range for the extracted entity.

def postprocess(output_start, output_end, entity_tokens,
                context_tokens_start_end, input_size):

    def get_score(logits):
        out = np.exp(logits)
        return out / out.sum(axis=-1)

    # get start-end scores for context
    score_start = get_score(output_start)
    score_end = get_score(output_end)

    # index of first context token in tensor
    context_start_idx = len(entity_tokens) + 2
    # index of last+1 context token in tensor
    context_end_idx = input_size - 1

    # find product of all start-end combinations to find the best one
    max_score, max_start, max_end = find_best_entity_window(
        start_score=score_start, end_score=score_end,
        context_start_idx=context_start_idx, context_end_idx=context_end_idx)

    # convert to context text start-end index
    max_start = context_tokens_start_end[max_start][0]
    max_end = context_tokens_start_end[max_end][1]

    return max_score, max_start, max_end

def find_best_entity_window(start_score, end_score,
                            context_start_idx, context_end_idx):
    context_len = context_end_idx - context_start_idx
    score_mat = np.matmul(
        start_score[context_start_idx:context_end_idx].reshape(
            (context_len, 1)),
        end_score[context_start_idx:context_end_idx].reshape(
            (1, context_len)))
    # reset candidates with end before start
    score_mat = np.triu(score_mat)
    # reset long candidates (>16 words)
    score_mat = np.tril(score_mat, 16)
    # find the best start-end pair
    max_s, max_e = divmod(score_mat.flatten().argmax(), score_mat.shape[1])
    max_score = score_mat[max_s, max_e]

    return max_score, max_s, max_e
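
The triangular masking can be checked on a tiny hand-made score matrix: the outer product scores every start-end pair, np.triu zeroes candidates whose end precedes their start, and (for longer contexts than this toy one) np.tril(..., 16) would additionally zero spans longer than 16 tokens:

```python
import numpy as np

start = np.array([[0.1], [0.8], [0.1]])  # column vector of start scores
end = np.array([[0.1, 0.2, 0.7]])        # row vector of end scores

score_mat = np.matmul(start, end)        # outer product: all start-end pairs
score_mat = np.triu(score_mat)           # drop pairs with end before start

# best remaining pair: start token 1, end token 2 (score 0.8 * 0.7)
max_s, max_e = divmod(score_mat.flatten().argmax(), score_mat.shape[1])
print(max_s, max_e)
```

The highest product picks the window spanning the strongest start and the strongest end that does not precede it.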

Firstly, we need to create a list of tokens from the context and the entity. Then, we are looking for the best extracted entity by trying different parts of the context. The best extracted entity should come with the highest score.

def get_best_entity(entity, context, vocab):
    # convert context string to tokens
    context_tokens, context_tokens_end = tokens.text_to_tokens(
        text=context.lower(), vocab=vocab)
    # convert entity string to tokens
    entity_tokens, _ = tokens.text_to_tokens(text=entity.lower(), vocab=vocab)

    network_input = prepare_input(entity_tokens, context_tokens)
    input_size = len(context_tokens) + len(entity_tokens) + 3

    # openvino inference
    output_start_key = compiled_model.output("output_s")
    output_end_key = compiled_model.output("output_e")
    result = compiled_model(network_input)

    # postprocess the result getting the score and context range for the answer
    score_start_end = postprocess(output_start=result[output_start_key][0],
                                  output_end=result[output_end_key][0],
                                  entity_tokens=entity_tokens,
                                  context_tokens_start_end=context_tokens_end,
                                  input_size=input_size)

    # return the part of the context, which is already an answer
    return context[score_start_end[1]:score_start_end[2]], score_start_end[0]

Set the Entity Recognition Template

Only entities with a prediction confidence score above 0.4 will be captured in the final output. This threshold is controlled by the variable confidence_threshold set above. The entity recognition template supports the following sample entities: building, company, persons, city, state, height, floor, and address.

template = ["building", "company", "persons", "city",
            "state", "height", "floor", "address"]
def run_analyze_entities(context):
    print(f"Context: {context}\n", flush=True)

    if len(context) == 0:
        print("Error: Empty context or outside paragraphs")
        return

    if len(context) > 380:
        print("Error: The context is too long for this particular model. "
              "Try with context shorter than 380 words.")
        return

    # measure processing time
    start_time = time.perf_counter()
    extract = []
    for field in template:
        entity_to_find = field + "?"
        entity, score = get_best_entity(entity=entity_to_find,
                                        context=context,
                                        vocab=vocab)
        if score >= confidence_threshold:
            extract.append({"Entity": entity, "Type": field,
                            "Score": f"{score:.2f}"})
    end_time = time.perf_counter()
    res = {"Extraction": extract, "Time": f"{end_time - start_time:.2f}s"}
    print("\nJSON Output:")
    print(json.dumps(res, sort_keys=False, indent=4))

Run on Simple Text

Sample 1

Change sources to your own text, supported by the template, to perform entity extraction. Only one input text is supported at a time. Usually, you need to wait a few seconds for the entities to be extracted; the longer the context, the longer the waiting time. The model is very limited and sensitive to the input and the predefined template. The answer can depend on whether the entity is supported by the template or not. The model will try to extract entities even if they are not supported by the template, in which case you may see random results.

Sample source: Intel - Wikipedia (from here)

source_text = "Intel Corporation is an American multinational and technology" \
    " company headquartered in Santa Clara, California."
Context: Intel Corporation is an American multinational and technology company headquartered in Santa Clara, California.

JSON Output:
{
    "Extraction": [
        {
            "Entity": "Intel Corporation",
            "Type": "company",
            "Score": "0.51"
        },
        {
            "Entity": "Intel",
            "Type": "persons",
            "Score": "0.45"
        },
        {
            "Entity": "Santa Clara",
            "Type": "city",
            "Score": "0.96"
        },
        {
            "Entity": "California",
            "Type": "state",
            "Score": "0.99"
        }
    ],
    "Time": "0.07s"
}

Sample 2

Sample source: Intel - Wikipedia (from here)

source_text = "Intel was founded in Mountain View, California, " \
    "in 1968 by Gordon E. Moore, a chemist, and Robert Noyce, " \
    "a physicist and co-inventor of the integrated circuit."
Context: Intel was founded in Mountain View, California, in 1968 by Gordon E. Moore, a chemist, and Robert Noyce, a physicist and co-inventor of the integrated circuit.

JSON Output:
{
    "Extraction": [
        {
            "Entity": "Intel",
            "Type": "company",
            "Score": "0.98"
        },
        {
            "Entity": "Gordon E. Moore, a chemist, and Robert Noyce",
            "Type": "persons",
            "Score": "0.83"
        },
        {
            "Entity": "Mountain View",
            "Type": "city",
            "Score": "0.79"
        },
        {
            "Entity": "California",
            "Type": "state",
            "Score": "0.98"
        }
    ],
    "Time": "0.05s"
}

Sample 3

Sample source: Converted Paragraph (from here)

source_text = "The Robert Noyce Building in Santa Clara, California, " \
    "is the headquarters for Intel Corporation. It was constructed in 1992 " \
    "and is located at 2200 Mission College Boulevard - 95054. It has an " \
    "estimated height of 22.20 meters and 6 floors above ground."
Context: The Robert Noyce Building in Santa Clara, California, is the headquarters for Intel Corporation. It was constructed in 1992 and is located at 2200 Mission College Boulevard - 95054. It has an estimated height of 22.20 meters and 6 floors above ground.

JSON Output:
{
    "Extraction": [
        {
            "Entity": "Robert Noyce Building",
            "Type": "building",
            "Score": "0.46"
        },
        {
            "Entity": "Intel Corporation",
            "Type": "company",
            "Score": "0.62"
        },
        {
            "Entity": "Santa Clara",
            "Type": "city",
            "Score": "0.50"
        },
        {
            "Entity": "California",
            "Type": "state",
            "Score": "0.99"
        },
        {
            "Entity": "22.20 meters",
            "Type": "height",
            "Score": "0.72"
        },
        {
            "Entity": "2200 Mission College Boulevard - 95054",
            "Type": "address",
            "Score": "0.86"
        }
    ],
    "Time": "0.05s"
}