Adapters
An adapter is a class that converts raw network inference output into a representation format suitable for subsequent postprocessing and metric calculation. Adapters may have configurable parameters. The adapter and, if necessary, its parameters are set in the configuration file.
Setting an Adapter in the Configuration File
Adapters are specified in the launchers section of the configuration file, so each launcher can use its own adapter.
You can set the adapter for a topology in two ways:
Define the adapter as a string.
adapter: classification
Define the adapter as a dictionary, using type: to set the adapter name. This approach lets you set additional adapter parameters if required.
adapter:
  type: reid
  grn_workaround: False
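To show where the adapter entry sits, here is a minimal sketch of a launchers section that uses both forms. The framework values and the surrounding layout are assumptions made for illustration and are not taken from this page; only the adapter entries follow the two forms described above.

```yaml
# Sketch only: every key except `adapter` is assumed for illustration.
launchers:
  - framework: openvino        # assumed launcher name
    adapter: classification    # form 1: adapter given as a string
  - framework: openvino        # assumed launcher name
    adapter:                   # form 2: adapter given as a dictionary
      type: reid
      grn_workaround: False
```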
Supported Adapters
AccuracyChecker supports the following adapters:
classification - converts the output of a classification model to a ClassificationPrediction representation.
  argmax_output - indicates that the model output is an ArgMax layer.
  block - processes the whole batch as a single data block.
  classification_output - target output layer name.
  fixed_output - in block mode, enables gathering data from a part of the full layer output.
  fixed_output_index - index into the layer output array to gather data from.
  label_as_array - produces the ClassificationPrediction label as an array.
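As a sketch of how the classification parameters above might be combined, the fragment below selects a specific output layer and requests array labels. The layer name is hypothetical, and treating label_as_array as a boolean flag is an assumption based on its description.

```yaml
adapter:
  type: classification
  classification_output: prob   # hypothetical output layer name
  label_as_array: True          # assumed to be a boolean flag
```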
segmentation - converts the output of a semantic segmentation model to a SegmentationPrediction representation.
  make_argmax - allows applying the argmax operation to output values.
segmentation_one_class - converts the output of a semantic segmentation model to a SegmentationPrediction representation. It is suitable for cases when the model output is the probability of each pixel belonging to the foreground class.
  threshold - minimum probability threshold for valid class membership.
anomaly_segmentation - converts the output of an anomaly segmentation model to an AnomalySegmentationPrediction representation.
  threshold - minimum probability threshold for valid class membership.
tiny_yolo_v1 - converts the output of the Tiny YOLO v1 model to a DetectionPrediction representation.
reid - converts the output of a reidentification model to a ReIdentificationPrediction representation.
  grn_workaround - enables processing the output with an added Global Region Normalization layer (optional, default True).
  joining_method - method used to join embeddings (optional, supported methods are sum and concatenation, default sum).
  target_out - target output layer name (optional, if not provided, the first output in the model is used).
  keep_shape - allows keeping the initial shape of the predicted embedding (optional, default False, which means that the model output is flattened).
yolo_v2 - converts the output of YOLO v2 family models to a DetectionPrediction representation.
  classes - number of detection classes (default 20).
  anchors - anchor values provided as a comma-separated list or as one of the precomputed sets:
    yolo_v2 - [1.3221, 1.73145, 3.19275, 4.00944, 5.05587, 8.09892, 9.47112, 4.84053, 11.2364, 10.0071]
    tiny_yolo_v2 - [1.08, 1.19, 3.42, 4.41, 6.63, 11.38, 9.42, 5.11, 16.62, 10.52]
  coords - number of bbox coordinates (default 4).
  num - num parameter from the DarkNet configuration file (default 5).
  cells - number of cells across width and height (default 13).
  raw_output - enables additional preprocessing for the raw YOLO output format (default False).
  output_format - sets the output layer format:
    BHW - boxes first (default, also the default for generated IRs).
    HWB - boxes last.
    Applicable only if the network output is not a 3D (4D with batch) tensor.
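The yolo_v2 parameters above can be sketched as an adapter entry like the following. It uses the precomputed tiny_yolo_v2 anchor set and spells out the documented defaults for the remaining values, so in practice most of these lines could presumably be omitted.

```yaml
adapter:
  type: yolo_v2
  anchors: tiny_yolo_v2   # precomputed anchor set listed above
  classes: 20             # documented default
  coords: 4               # documented default
  num: 5                  # documented default
  cells: 13               # documented default
```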
yolo_v3 - converts the output of YOLO v3 family models to a DetectionPrediction representation.
  classes - number of detection classes (default 80).
  anchors - anchor values provided as a comma-separated list or as one of the precomputed sets:
    yolo_v3 - [10.0, 13.0, 16.0, 30.0, 33.0, 23.0, 30.0, 61.0, 62.0, 45.0, 59.0, 119.0, 116.0, 90.0, 156.0, 198.0, 373.0, 326.0]
    tiny_yolo_v3 - [10.0, 14.0, 23.0, 27.0, 37.0, 58.0, 81.0, 82.0, 135.0, 169.0, 344.0, 319.0]
  coords - number of bbox coordinates (default 4).
  num - num parameter from the DarkNet configuration file (default 3).
  anchor_mask - mask of the anchors used for each output layer (optional, if not provided, the default way of selecting anchors is used).
  threshold - minimal objectness score for valid detections (default 0.001).
  outputs - the list of output layer names.
  raw_output - enables additional preprocessing for the raw YOLO output format (default False).
  output_format - sets the output layer format: boxes first (BHW, the default, also the default for generated IRs) or boxes last (HWB). Applicable only if the network output is not a 3D (4D with batch) tensor.
  cells - sets the grid size for each layer, in the order of the outputs field. Works only with do_reshape=True or when the output tensor does not have 3 dimensions.
  do_reshape - forces reshaping the output tensor to the [B,Cy,Cx] or [Cy,Cx,B] format, depending on the output_format value ([B,Cy,Cx] by default). You may need to specify the cells value.
  transpose - transposes the output tensor to the specified format (optional).
  multiple_labels - allows multiple labels for detected objects (default False).
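A minimal sketch of a yolo_v3 adapter entry built from the parameters above. The output layer names are hypothetical placeholders, and the number of outputs shown is an assumption for a two-output model.

```yaml
adapter:
  type: yolo_v3
  anchors: tiny_yolo_v3   # precomputed anchor set listed above
  classes: 80             # documented default
  threshold: 0.001        # documented default
  outputs:
    - conv_out_1          # hypothetical output layer name
    - conv_out_2          # hypothetical output layer name
```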
yolo_v3_onnx - converts the output of the ONNX YOLO v3 model to DetectionPrediction.
  boxes_out - the name of the output layer with bounding boxes.
  scores_out - the name of the output layer with detection scores for each class and box pair.
  indices_out - the name of the output layer with index triplets (class_id, score_id, bbox_id).
yolo_v3_tf2 - converts the output of TensorFlow 2 YOLO v3 with embedded box decoding to DetectionPrediction.
  outputs - the list of output layer names.
  score_threshold - minimal accepted score for valid boxes (optional, default 0).
yolo_v5 - converts the output of YOLO v5 family models to a DetectionPrediction representation. The parameters are the same as for the yolo_v3 models.
yolof - converts the output of the YOLOF model to a DetectionPrediction representation. The parameters are the same as for the yolo_v3 models.
yolor - converts the output of the YOLOR model to a DetectionPrediction representation.
  output_name - name of the output layer.
  threshold - minimal objectness score for valid detections (optional, default 0.001).
  num - num parameter from the DarkNet configuration file (optional, default 5).
yolox - converts the output of the YOLOX model to a DetectionPrediction representation.
  output_name - name of the output layer (optional).
  threshold - minimal objectness score for valid detections (optional, default 0.001).
  num - num parameter from the DarkNet configuration file (optional, default 5).
yolo_v8_detection - converts the output of YOLO v8 family models pretrained for object detection to DetectionPrediction.
  conf_threshold - minimal confidence for filtering valid detections (optional, default 0.25).
  multi_label - allows multiple labels for the same box coordinates (optional, default True).
lpr - converts the output of a license plate recognition model to a CharacterRecognitionPrediction representation.
aocr - converts the output of an attention-ocr model to CharacterRecognitionPrediction.
  output_blob - name of the output layer with predicted labels or string (optional, if not provided, the first output found is used).
  labels - list of supported tokens for decoding raw labels (optional, the default configuration is the ASCII charmap; this parameter is ignored if the model contains the decoding part).
  eos_index - index of the end-of-string token in labels (optional, default 2, ignored if the model contains the decoding part).
  to_lower_case - allows converting decoded characters to lower case (optional, default True).
ppocr - converts the output of a PaddlePaddle CRNN-like model to CharacterRecognitionPrediction.
  vocabulary_file - file with recognition symbols for decoding.
  remove_duplicates - allows removing duplicated symbols (optional, default True).
ssd - converts the output of an SSD model to a DetectionPrediction representation.
ssd_mxnet - converts the output of SSD-based models from the MXNet framework to a DetectionPrediction representation.
pytorch_ssd_decoder - converts the output of an SSD model from PyTorch without an embedded decoder.
  scores_out - name of the output layer with bounding box scores.
  boxes_out - name of the output layer with bounding box coordinates.
  confidence_threshold - lower bound for valid box scores (optional, default 0.05).
  nms_threshold - overlap threshold for NMS (optional, default 0.5).
  keep_top_k - maximal number of boxes to keep (optional, default 200).
  feat_size - feature sizes in the format [feature_width, feature_height], …
  do_softmax - boolean flag that controls whether softmax is applied to detection scores (optional, default True).
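The pytorch_ssd_decoder parameters above could be written as the following adapter entry. This is a sketch: the layer names are hypothetical, the thresholds repeat the documented defaults, and feat_size is left out because its full format is not given here.

```yaml
adapter:
  type: pytorch_ssd_decoder
  scores_out: scores           # hypothetical output layer name
  boxes_out: boxes             # hypothetical output layer name
  confidence_threshold: 0.05   # documented default
  nms_threshold: 0.5           # documented default
  keep_top_k: 200              # documented default
```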
ssd_onnx - converts the output of an SSD-based model from PyTorch with a NonMaxSuppression layer.
  labels_out - name of the output layer with labels, or a regular expression for finding it.
  scores_out - name of the output layer with scores, or a regular expression for finding it. Optional: it can be omitted if your model concatenates scores with box coordinates.
  bboxes_out - name of the output layer with bboxes, or a regular expression for finding it.
ssd_tf - converts the output of an SSD-based model from the TensorFlow framework to a DetectionPrediction representation.
  labels_out - name of the output layer with labels, or a regular expression for finding it.
  scores_out - name of the output layer with scores, or a regular expression for finding it.
  bboxes_out - name of the output layer with bboxes, or a regular expression for finding it.
tf_object_detection - converts the output of detection models from the TensorFlow object detection API to DetectionPrediction.
  classes_out - name of the output layer with predicted classes.
  boxes_out - name of the output layer with predicted box coordinates in the format [y0, x0, y1, x1].
  scores_out - name of the output layer with detection scores.
  num_detections_out - name of the output layer that contains the number of valid detections.
faster_rcnn_onnx - converts the output of the ONNX Faster RCNN model to DetectionPrediction.
  labels_out - name of the output layer with labels, optional if labels are concatenated with boxes and scores (only the boxes output is provided and it has shape [N, 6]).
  scores_out - name of the output layer with scores, optional if scores are concatenated with boxes (the boxes output has shape [N, 5]).
  bboxes_out - name of the output layer with bboxes.
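For tf_object_detection, all four output names are set explicitly. The sketch below uses hypothetical layer names; substitute the names that your exported model actually exposes.

```yaml
adapter:
  type: tf_object_detection
  classes_out: detection_classes       # hypothetical output layer name
  boxes_out: detection_boxes           # hypothetical; boxes in [y0, x0, y1, x1]
  scores_out: detection_scores         # hypothetical output layer name
  num_detections_out: num_detections   # hypothetical output layer name
```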
retinanet - converts the output of a RetinaNet-based model.
  loc_out - name of the output layer with bounding box deltas.
  class_out - name of the output layer with classification probabilities.
retinanet_multihead - converts the output of a RetinaNet model with multiple level outputs.
  boxes_outputs - list of outputs with boxes.
  class_outputs - list of outputs with class probabilities. Important note: the number of boxes outputs and class outputs should be equal.
  ratios - the list of ratios for anchor generation (optional, default [1.0, 2.0, 0.5]).
  pre_nms_top_k - number of top boxes kept before NMS is applied (optional, default 1000).
  post_nms_top_k - final number of detections after NMS is applied (optional, default 100).
  nms_threshold - threshold for NMS (optional, default 0.5).
  min_conf - minimal confidence threshold for detections (optional, default 0.05).
retinanet_tf2 - converts the output of a RetinaNet-based model from the TensorFlow 2 official implementation.
  boxes_outputs - list of outputs with boxes.
  class_outputs - list of outputs with class probabilities. Important note: the number of boxes outputs and class outputs should be equal.
  aspect_ratios - the list of aspect ratios for anchor generation (optional, default [1.0, 2.0, 0.5]).
  min_level - minimal pyramid level (optional, default 3).
  max_level - maximal pyramid level (optional, default 7).
  num_scales - number of anchor scales (optional, default 3).
  anchor_size - size of the anchor box (optional, default 4).
  pre_nms_top_k - number of top boxes kept before NMS is applied (optional, default 5000).
  total_size - final number of detections after NMS is applied (optional, default 100).
  nms_threshold - threshold for NMS (optional, default 0.5).
  score_threshold - minimal confidence threshold for detections (optional, default 0.05).
rfcn_class_agnostic - converts the output of the Caffe RFCN model with the class-agnostic bounding box regression approach.
  cls_out - the name of the output layer with detected probabilities for each class. The layer shape is [num_boxes, num_classes], where num_boxes is the number of predicted boxes and num_classes is the number of classes in the dataset including background.
  bbox_out - the name of the output layer with detected box deltas. The layer shape is [num_boxes, 8], where num_boxes is the number of predicted boxes and 8 is the number of bounding box coordinates (4 for background + 4 for foreground).
  roid_out - the name of the output layer with regions of interest.
ppdetection - converts the output of PaddlePaddle object detection models to DetectionPrediction.
  boxes_out - the name of the output layer with predicted boxes in the format [[label, score, x_min, y_min, x_max, y_max] …
  num_boxes_out - the name of the output layer with the number of predicted boxes for each image in the batch.
face_person_detection - converts the output of a face and person detection model with 2 detection outputs to ContainerPrediction, where the values of the face_out and person_out parameters are used to identify each DetectionPrediction in the container.
  face_out - face detection output layer name.
  person_out - person detection output layer name.
person_attributes - converts the output of a person attributes recognition model to MultiLabelRecognitionPrediction.
  attributes_recognition_out - output layer name with attribute scores (optional, used if your model has more than one output).
vehicle_attributes - converts the output of a vehicle attributes recognition model to ContainerPrediction, where the values of the color_out and type_out parameters are used to identify each ClassificationPrediction in the container.
  color_out - vehicle color attribute output layer name.
  type_out - vehicle type attribute output layer name.
head_pose - converts the output of a head pose estimation model to ContainerPrediction, where the names of the angle_pitch, angle_yaw and angle_roll parameters are used to identify each RegressionPrediction in the container.
  angle_pitch - output layer name for the pitch angle.
  angle_yaw - output layer name for the yaw angle.
  angle_roll - output layer name for the roll angle.
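As a sketch of the head_pose entry above, each parameter maps one angle to an output layer. The layer names below are hypothetical; per the description, the parameter names themselves (angle_pitch, angle_yaw, angle_roll) identify the resulting RegressionPrediction objects in the container.

```yaml
adapter:
  type: head_pose
  angle_pitch: pitch_output   # hypothetical output layer name
  angle_yaw: yaw_output       # hypothetical output layer name
  angle_roll: roll_output     # hypothetical output layer name
```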
age_gender - converts the output of an age and gender recognition model to ContainerPrediction with a ClassificationPrediction named gender for gender recognition, and a ClassificationPrediction named age_classification and a RegressionPrediction named age_error for age recognition.
  age_out - output layer name for age recognition.
  gender_out - output layer name for gender recognition.
age_recognition - converts the output of an age recognition model to ContainerPrediction with a ClassificationPrediction named age_classification and a RegressionPrediction named age_error for age recognition.
  age_out - output layer name for age recognition (optional).
action_detection - converts the output of a model for person detection and action recognition tasks to ContainerPrediction with a DetectionPrediction for class-agnostic metric calculation and an ActionDetectionPrediction for action recognition. The representations in the container are named class_agnostic_prediction and action_prediction respectively.
  priorbox_out - name of the layer containing prior boxes in SSD format.
  loc_out - name of the layer containing box coordinates in SSD format.
  main_conf_out - name of the layer containing detection confidences.
  add_conf_out_prefix - prefix used to generate the names of the layers containing action confidences if the topology has several such layers, or the layer name itself.
  add_conf_out_count - number of layers with action confidences (optional, can be omitted if action confidences are contained in one layer).
  num_action_classes - number of classes for action recognition.
  detection_threshold - minimal detection confidence level for valid detections.
  actions_scores_threshold - minimal action confidence level for valid detections.
  action_scale - scale for correct action score calculation.
image_processing - converts the output of a network for single image processing to ImageProcessingPrediction.
  reverse_channels - allows switching output image channels, e.g. RGB to BGR (optional, default False).
  mean - value or list of channel-wise values added to the result to get values in the range [0, 255] (optional, default 0).
  std - value or list of channel-wise values by which the result is multiplied to get values in the range [0, 255] (optional, default 255).
  Important: mean and std are usually the same values that were used in preprocessing; here they are used to revert those preprocessing operations. The order of actions:
    1. Multiply by std.
    2. Add mean.
    3. Reverse channels, if this option is enabled.
  target_out - target model output layer name, for cases when the model has several outputs.
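A short sketch of an image_processing entry using the documented defaults for mean and std. With these values, a model output in the range [0, 1] is mapped back to [0, 255], since each value x becomes x * 255 + 0; enabling reverse_channels here is only an illustrative choice.

```yaml
adapter:
  type: image_processing
  std: 255                 # documented default; applied first (multiply)
  mean: 0                  # documented default; applied second (add)
  reverse_channels: True   # illustrative; applied last
```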
super_resolution - converts the output of a single image super resolution network to SuperResolutionPrediction.
  reverse_channels - allows switching output image channels, e.g. RGB to BGR (optional, default False).
  mean - value or list of channel-wise values added to the result to get values in the range [0, 255] (optional, default 0).
  std - value or list of channel-wise values by which the result is multiplied to get values in the range [0, 255] (optional, default 255).
  cast_to_uint8 - casts output image pixels to the [0, 255] range.
  Important: mean and std are usually the same values that were used in preprocessing; here they are used to revert those preprocessing operations. The order of actions:
    1. Multiply by std.
    2. Add mean.
    3. Reverse channels, if this option is enabled.
  target_out - super resolution model output layer name, for cases when the model has several outputs.
multi_target_super_resolution - converts the output of a super resolution network with multiple outputs to ContainerPrediction with a SuperResolutionPrediction for each output.
  reverse_channels - allows switching output image channels, e.g. RGB to BGR (optional, default False).
  mean - value or list of channel-wise values added to the result to get values in the range [0, 255] (optional, default 0).
  std - value or list of channel-wise values by which the result is multiplied to get values in the range [0, 255] (optional, default 255).
  cast_to_uint8 - casts output image pixels to the [0, 255] range.
  Important: mean and std are usually the same values that were used in preprocessing; here they are used to revert those preprocessing operations. The order of actions:
    1. Multiply by std.
    2. Add mean.
    3. Reverse channels, if this option is enabled.
  target_mapping - dictionary where the keys are meaningful names for the solved tasks, used as keys inside ContainerPrediction, and the values are output layer names.
super_resolution_yuv - converts the output of a super resolution model that returns its output in YUV format to SuperResolutionPrediction. Each output layer contains only 1 channel.
  y_output - Y channel output layer.
  u_output - U channel output layer.
  v_output - V channel output layer.
  target_color - target color space for the super resolution image; bgr and rgb are supported (optional, default bgr).
landmarks_regression - converts the output of a landmarks regression model to FacialLandmarksPrediction or HandLandmarksPrediction.
  landmarks_out - landmarks output layer.
  landmarks_step - number of coordinates per landmark (optional, default 2).
  is_hand_landmarks - enables conversion to HandLandmarksPrediction instead of FacialLandmarksPrediction (optional, default False).
pixel_link_text_detection - converts the output of a PixelLink-like model for text detection to TextDetectionPrediction.
  pixel_class_out - name of the layer containing information related to text/no-text classification for each pixel.
  pixel_link_out - name of the layer containing information related to linkage between pixels and their neighbors.
  pixel_class_confidence_threshold - confidence threshold for a valid segmentation mask (optional, default 0.8).
  pixel_link_confidence_threshold - confidence threshold for valid pixel links (optional, default 0.8).
  min_area - minimal area for a valid text prediction (optional, default 0).
  min_height - minimal height for a valid text prediction (optional, default 0).
ctpn_text_detection - converts the output of a CTPN-like model for text detection to TextDetectionPrediction.
  cls_prob_out - name of the output layer with class probabilities.
  bbox_pred_out - name of the output layer with predicted boxes.
  min_size - minimal valid size of detected text proposals (optional, default 8).
  min_ratio - minimal width / height ratio for a valid text line (optional, default 0.5).
  line_min_score - minimal confidence for a text line (optional, default 0.9).
  text_proposals_width - minimal width for a text proposal (optional, default 16).
  min_num_proposals - minimal number of text proposals (optional, default 2).
  pre_nms_top_n - number of top proposals kept before NMS is applied (optional, default 12000).
  post_nms_top_n - number of top proposals kept after NMS is applied (optional, default 1000).
  nms_threshold - overlap threshold for NMS (optional, default 0.7).
east_text_detection - converts the output of an EAST-like model for text detection to TextDetectionPrediction.
  score_map_out - the name of the output layer that contains the score map.
  geometry_map_out - the name of the output layer that contains the geometry map.
  score_map_threshold - threshold for the score map (optional, default 0.8).
  nms_threshold - threshold for text boxes NMS (optional, default 0.2).
  box_threshold - minimal confidence threshold for text boxes (optional, default 0.1).
craft_text_detection - converts the output of a CRAFT-like model for text detection to TextDetectionPrediction.
  score_out - the name of the output layer that contains the score map.
  text_threshold - text confidence threshold (optional, default 0.7).
  link_threshold - link confidence threshold (optional, default 0.4).
  low_text - text low-bound score (optional, default 0.4).
ppocr_det - converts the output of the PPOCR text detection model to TextDetectionPrediction.
  threshold - segmentation bitmap threshold (optional, default 0.3).
  box_threshold - confidence threshold for filtering predicted boxes (optional, default 0.7).
  max_candidates - maximum number of detected candidates to consider (optional, default 1000).
  unclip_ratio - unclip ratio (optional, default 2).
  min_size - minimum box size (optional, default 3).
facial_landmarks_detection - converts the output of a face landmark detection model to FacialLandmarksHeatMapPrediction.
human_pose_estimation - converts the output of a human pose estimation model to PoseEstimationPrediction.
  part_affinity_fields_out - name of the output layer with keypoint pairwise relations (part affinity fields).
  keypoints_heatmap_out - name of the output layer with keypoint heatmaps.
  The output layers can be omitted if the model has only one output layer that is the concatenation of these two.
human_pose_estimation_openpose - converts the output of an OpenPose-like model for human pose estimation to PoseEstimationPrediction.
  part_affinity_fields_out - name of the output layer with keypoint pairwise relations (part affinity fields).
  keypoints_heatmap_out - name of the output layer with keypoint heatmaps.
  upscale_factor - upscaling factor for heatmaps and part affinity fields before post-processing.
human_pose_estimation_ae - converts the output of an Associative Embedding-like model for human pose estimation to PoseEstimationPrediction.
  heatmaps_out - name of the output layer with keypoint heatmaps.
  nms_heatmaps_out - name of the output layer with keypoint heatmaps after non-maximum suppression.
  embeddings_out - name of the output layer with embedding (tag) maps.
beam_search_decoder - implementation of the CTC Beam Search decoder for symbol sequence recognition, converting the model output to CharacterRecognitionPrediction.
  beam_size - size of the beam to use during decoding (default 10).
  blank_label - index of the CTC blank label.
  softmaxed_probabilities - indicates that the model uses softmax for the output layer (default False).
  logits_output - name of the network output layer to use in the decoder.
  custom_label_map - alphabet as a dict of strings. Must include the blank symbol for the CTC algorithm (optional, if provided in dataset_meta or vocabulary_file).
  vocabulary_file - file with the model vocabulary, represented as a txt file where each label is located on its own line (optional).
ctc_greedy_search_decoder - implementation of the CTC Greedy Search decoder for symbol sequence recognition, converting the model output to CharacterRecognitionPrediction.
  blank_label - index of the CTC blank label (default 0).
  logits_output - name of the network output layer (optional).
  custom_label_map - alphabet as a dict of strings. Must include the blank symbol for the CTC algorithm (optional, if provided in dataset_meta or vocabulary_file).
  vocabulary_file - file with the model vocabulary, represented as a txt file where each label is located on its own line (optional).
  shift_labels - shifts label map ids by 1 if the label map is represented without the blank label at position zero (optional, default False).
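The ctc_greedy_search_decoder parameters above might be combined as in this sketch. The layer name and the vocabulary file name are hypothetical; the file is assumed to follow the one-label-per-line format described above.

```yaml
adapter:
  type: ctc_greedy_search_decoder
  blank_label: 0               # documented default
  logits_output: logits        # hypothetical output layer name
  vocabulary_file: vocab.txt   # hypothetical file, one label per line
```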
simple_decoder - the simplest decoder for text recognition models; converts class indices to the given letters and cuts the output at the first occurrence of eos_label.
  eos_label - label that finishes decoding (optional, default [s]).
  start_label - label that starts decoding (optional).
  custom_label_map - label map (if not provided by the dataset meta).
  start_index - start index in the predicted data (optional, default 0).
  do_lower - allows converting predicted data to lower case (optional, default False).
  vocabulary_file - file with decoding labels (optional).
ctc_beam_search_decoder - Python implementation of the CTC beam search decoder without LM for speech recognition.
ctc_greedy_decoder - CTC greedy decoder for speech recognition.
ctc_beam_search_decoder_with_lm - Python implementation of the CTC beam search decoder with an n-gram language model in kenlm binary format for speech recognition.
  beam_size - size of the beam to use during decoding (default 10).
  logarithmic_prob - set to True to indicate that the network gives natural-logarithmic probabilities. Default is False for plain probabilities (after softmax).
  probability_out - name of the network output with character probabilities (required).
  alphabet - alphabet as a list of strings. Include an empty string for the CTC blank symbol. Default is space + 26 English letters + apostrophe + blank.
  sep - word separator character. Use an empty string for a character-based LM. Default is space.
  lm_file - path to the LM in binary kenlm format, relative to model_attributes or models. Default is beam search without LM.
  lm_alpha - LM alpha: weight factor for the LM score (required when using LM).
  lm_beta - LM beta: score bonus for each additional word, in log_e units (required when using LM).
  lm_oov_score - replaces the LM score for out-of-vocabulary words with this value (default -1000, ignored without LM).
  lm_vocabulary_offset - start of the vocabulary strings section in the LM file. Default is to not filter candidate words using the vocabulary (ignored without LM).
  lm_vocabulary_length - size in bytes of the vocabulary strings section in the LM file (ignored without LM).
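A sketch of a ctc_beam_search_decoder_with_lm entry with a language model enabled, which makes lm_alpha and lm_beta required. The output name and LM file name are hypothetical, and the alpha and beta values are arbitrary illustrative numbers, not recommended settings.

```yaml
adapter:
  type: ctc_beam_search_decoder_with_lm
  beam_size: 10             # documented default
  probability_out: output   # hypothetical network output name (required)
  lm_file: lm.kenlm         # hypothetical path, relative to model_attributes or models
  lm_alpha: 0.75            # illustrative value only
  lm_beta: 1.0              # illustrative value only
```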
fast_ctc_beam_search_decoder_with_lm - CTC beam search decoder with an n-gram language model in kenlm binary format for speech recognition. It depends on the ctcdecode_numpy Python module located in the <omz_dir>/demos/speech_recognition_deepspeech_demo/python/ctcdecode-numpy/ directory.
  beam_size - size of the beam to use during decoding (default 10).
  logarithmic_prob - set to True to indicate that the network gives natural-logarithmic probabilities. Default is False for plain probabilities (after softmax).
  probability_out - name of the network output with character probabilities (required).
  alphabet - alphabet as a list of strings. Include an empty string for the CTC blank symbol. Default is space + 26 English letters + apostrophe + blank.
  sep - set to the empty string for a character-based LM. Default is space.
  lm_file - path to the LM in binary kenlm format, relative to model_attributes or models. Default is beam search without LM.
  lm_alpha - LM alpha: weight factor for the LM score (required when using LM).
  lm_beta - LM beta: score bonus for each additional word, in log_e units (required when using LM).
wav2vec - decodes the output of the Wav2Vec model to CharacterRecognitionPrediction.
  alphabet - list of supported tokens for converting token ids.
  pad_token - token that represents padding in the alphabet; wav2vec uses this token as the CTC blank (optional, default <pad>).
  words_delimeter - token that represents the delimiter between words in a sequence (optional, default |).
  group_tokens - allows replacing repeated tokens with one (optional, default True).
  lower_case - allows converting the result to lower case (optional, default False).
  cleanup_whitespaces - allows merging extra whitespace into one (optional, default True).
gaze_estimation - converts the output of a gaze estimation model to GazeVectorPrediction.
hit_ratio_adapter - converts the output of the NCF model to HitRatioPrediction.
brain_tumor_segmentation - converts the output of a brain tumor segmentation model to BrainTumorSegmentationPrediction.
  segmentation_out - segmentation output layer name (optional, if not provided, the first output blob is used by default).
  make_argmax - allows applying the argmax operation to output values (default False).
  label_order - sets the mapping from output classes to dataset classes. For example, label_order: [3,1,2] means that the class with id 3 in the model output matches the class with id 1 in the dataset, the class with id 1 in the model output matches the class with id 2 in the dataset, and the class with id 2 in the model output matches the class with id 3 in the dataset.
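The label_order example above would appear in an adapter entry roughly as follows. This is a sketch that reuses the mapping from the text; enabling make_argmax is an illustrative choice for a model that outputs per-class scores.

```yaml
adapter:
  type: brain_tumor_segmentation
  make_argmax: True         # illustrative; documented default is False
  label_order: [3, 1, 2]    # model class 3 -> dataset 1, 1 -> 2, 2 -> 3
```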
nmt - converts the output of a neural machine translation model to MachineTranslationPrediction.
  vocabulary_file - file that contains the vocabulary for converting model-predicted indexes to words (e.g. vocab.bpe.32000.de). The path can be prefixed with the --models argument.
  eos_index - index of the end-of-string symbol in the vocabulary (optional, used to cut off empty predictions when the launcher does not support dynamic output shapes).
bert_question_answering_embedding - converts the output of a BERT model trained to produce embedding vectors to QuestionAnsweringEmbeddingPrediction.
narnmt - converts the output of a non-autoregressive neural machine translation model to MachineTranslationPrediction.
  vocabulary_file - file that contains the vocabulary for converting model-predicted indexes to words (e.g. vocab.json). The path can be prefixed with the --models argument.
  merges_file - file that contains merges for converting model-predicted indexes to words (e.g. merges.txt). The path can be prefixed with the --models argument.
  output_name - name of the model output layer, if needed (optional).
  sos_symbol - string representation of the start_of_sentence symbol (default <s>).
  eos_symbol - string representation of the end_of_sentence symbol (default </s>).
  pad_symbol - string representation of the pad symbol (default <pad>).
  remove_extra_symbols - removes sos/eos/pad symbols from the predicted string (default True).
bert_question_answering - converts the output of a BERT model trained for the question answering task to QuestionAnsweringPrediction.
bidaf_question_answering - converts the output of a BiDAF model trained for the question answering task to QuestionAnsweringPrediction.
  start_pos_output - name of the output layer with the answer start position.
  end_pos_output - name of the output layer with the answer end position.
bert_classification - converts the output of a BERT model trained for the text classification task to ClassificationPrediction.
  num_classes - number of predicted classes.
  classification_out - name of the output layer with classification probabilities (optional, if not provided, the first output blob is used by default).
  single_score - indicates that the model returns a single value representing the class id or the probability of belonging to class 1 in the binary classification case (optional, default False).
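A minimal sketch of a bert_classification entry for a two-class model that relies on the default first output. The class count is an illustrative assumption, and single_score is shown at its documented default.

```yaml
adapter:
  type: bert_classification
  num_classes: 2        # illustrative value for a binary task
  single_score: False   # documented default
```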
bert_ner - converts the output of a BERT model trained for the named entity recognition task to SequenceClassificationPrediction.
  classification_out - name of the output layer with classification probabilities (optional, if not provided, the first output blob is used by default).
human_pose_estimation_3d - converts the output of a 3D human pose estimation model to PoseEstimation3dPrediction.
  features_3d_out - name of the output layer with 3D coordinate maps.
  keypoints_heatmap_out - name of the output layer with keypoint heatmaps.
  part_affinity_fields_out - name of the output layer with keypoint pairwise relations (part affinity fields).
ctdet - converts the output of the CenterNet object detection model to DetectionPrediction.
  center_heatmap_out - name of the output layer with center point heatmaps.
  width_height_out - name of the output layer with object sizes.
  regression_out - name of the regression output with the offset prediction.
mask_rcnn - converting raw outputs of Mask-RCNN to combination of DetectionPrediction and CoCoInstanceSegmentationPrediction.
  classes_out - name of output layer with information about classes (optional, if your model has detection_output layer as output).
  scores_out - name of output layer with bbox scores (optional, if your model has detection_output layer as output).
  boxes_out - name of output layer with bboxes (optional, if your model has detection_output layer as output).
  raw_masks_out - name of output layer with raw instances masks.
  num_detections_out - name of output layer with the number of valid detections (used in MaskRCNN models trained with TF Object Detection API).
  detection_out - SSD-like detection output layer name (optional, if your model has scores_out, boxes_out and classes_out).
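The parameters above describe two alternative ways of wiring a Mask-RCNN model. The sketch below shows the variant with separate class, score and box outputs; every layer name in it is a hypothetical placeholder.

```yaml
adapter:
  type: mask_rcnn
  # Variant with separate outputs (all layer names are hypothetical).
  classes_out: classes
  scores_out: scores
  boxes_out: boxes
  raw_masks_out: raw_masks
  # For a model with an SSD-like detection output, the three parameters
  # classes_out, scores_out and boxes_out would be replaced by a single
  # detection_out entry.
```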
mask_rcnn_with_text - converting raw outputs of Mask-RCNN with additional Text Recognition head to TextDetectionPrediction.
  classes_out - name of output layer with information about classes.
  scores_out - name of output layer with bbox scores.
  boxes_out - name of output layer with bboxes.
  raw_masks_out - name of output layer with raw instances masks.
  texts_out - name of output layer with texts.
  confidence_threshold - confidence threshold that is used to filter out detected instances.
yolact - converting raw outputs of Yolact model to combination of DetectionPrediction and CoCoInstanceSegmentationPrediction.
  loc_out - name of output layer which contains box locations (optional if boxes decoding is embedded into the model).
  prior_out - name of output layer which contains prior boxes (optional if boxes decoding is embedded into the model).
  boxes_out - name of output layer which contains decoded output boxes (optional if the model has prior and loc outputs for boxes decoding).
  conf_out - name of output layer which contains confidence scores for all classes for each box.
  mask_out - name of output layer which contains instance masks.
  proto_out - name of output layer which contains proto for masks calculation.
  confidence_threshold - confidence threshold that is used to filter out detected instances (optional, default 0.05).
  max_detections - maximum number of detections used for metrics calculation (optional, default 100).
class_agnostic_detection - converting ‘boxes’ [n, 5] output of detection model to DetectionPrediction representation.
  output_blob - name of output layer with bboxes.
  scale - scalar value or list with 2 values to normalize bbox coordinates.
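A short sketch of how scale might be given as a list. The layer name and both numeric values are hypothetical, and the order of the two values is an assumption that the text above does not specify.

```yaml
adapter:
  type: class_agnostic_detection
  output_blob: boxes     # hypothetical layer name
  # Hypothetical normalization values; a single scalar is also accepted.
  scale: [1024, 768]
```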
mono_depth - converting output of monocular depth estimation model to DepthEstimationPrediction.
inpainting - converting output of Image Inpainting model to ImageInpaintingPrediction representation.
style_transfer - converting output of Style Transfer model to StyleTransferPrediction representation.
retinaface - converting output of RetinaFace model to DetectionPrediction or representation container with DetectionPrediction, AttributeDetectionPrediction, FacialLandmarksPrediction (depends on provided set of outputs).
  scores_outputs - the list of names for output layers with face detection scores, ordered by stride: 32, 16, 8.
  bboxes_outputs - the list of names for output layers with face detection boxes, ordered by stride: 32, 16, 8.
  landmarks_outputs - the list of names for output layers with predicted facial landmarks, ordered by stride: 32, 16, 8 (optional, if not provided, only DetectionPrediction will be generated).
  type_scores_outputs - the list of names for output layers with attributes detection scores, ordered by stride: 32, 16, 8 (optional, if not provided, only DetectionPrediction will be generated).
  nms_threshold - overlap threshold for NMS (optional, default 0.5).
  keep_top_k - maximal number of boxes which should be kept (optional).
  include_boundaries - allows including boundaries for NMS (optional, default False).
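Because the retinaface output lists must follow the 32-, 16-, 8-stride order, a configuration sketch may help. All layer names below are hypothetical. Only the required score and box outputs are set, so per the description above only DetectionPrediction would be generated.

```yaml
adapter:
  type: retinaface
  # Hypothetical layer names, listed for strides 32, 16, 8 in that order.
  scores_outputs:
    - cls_prob_stride32
    - cls_prob_stride16
    - cls_prob_stride8
  bboxes_outputs:
    - bbox_pred_stride32
    - bbox_pred_stride16
    - bbox_pred_stride8
  nms_threshold: 0.5     # documented default, shown explicitly
```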
retinaface-pytorch - converting output of RetinaFace PyTorch model to DetectionPrediction or representation container with DetectionPrediction, FacialLandmarksPrediction (depends on provided set of outputs).
  scores_output - name for output layer with face detection score.
  bboxes_output - name for output layer with face detection boxes.
  landmarks_output - name for output layer with predicted facial landmarks (optional, if not provided, only DetectionPrediction will be generated).
  nms_threshold - overlap threshold for NMS (optional, default 0.4).
  keep_top_k - maximal number of boxes which should be kept (optional, default 750).
  include_boundaries - allows including boundaries for NMS (optional, default False).
  confidence_threshold - confidence threshold that is used to filter out detected instances (optional, default 0.02).
faceboxes - converting output of FaceBoxes model to DetectionPrediction representation.
  scores_out - name of output layer with bounding boxes scores.
  boxes_out - name of output layer with bounding boxes coordinates.
prnet - converting output of PRNet model for 3D landmarks regression task to FacialLandmarks3DPrediction.
  landmarks_ids_file - the file with indices for landmarks extraction from position heatmap (optional; default values are used if not provided).
person_vehicle_detection - converts output of person vehicle detection model to DetectionPrediction representation. Adapter merges scores, groups predictions into people and vehicles, and assigns labels accordingly.
  iou_threshold - IOU threshold value for NMS operation.
face_detection - converts output of face detection model to DetectionPrediction representation. Operation is performed by mapping model output to the defined anchors, window scales, window translates, and window lengths to generate a list of face candidates.
  score_threshold - score threshold value used to discern whether a face is valid.
  layer_names - target output layer base names.
  anchor_sizes - anchor sizes for each base output layer.
  window_scales - window scales for each base output layer.
  window_lengths - window lengths for each base output layer.
face_detection_refinement - converts output of face detection refinement model to DetectionPrediction representation. Adapter refines candidates generated by the previous stage model.
  threshold - score threshold used to decide whether a face candidate is valid.
attribute_classification - converts output of attributes classification model to ContainerPrediction which contains multiple ClassificationPrediction for attributes with their scores.
  output_layer_map - dictionary where keys are output layer names of attribute classification model and values are the names of attributes.
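The output_layer_map dictionary could be written as in the sketch below. Both the output layer names and the attribute names are hypothetical.

```yaml
adapter:
  type: attribute_classification
  output_layer_map:
    # <output layer name>: <attribute name> (both hypothetical)
    age_out: age
    gender_out: gender
```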
regression - converting output of regression model to RegressionPrediction representation.
  keep_shape - allows keeping the shape of a predicted multi-dimensional array (optional, default False).
multi_output_regression - converting raw output features to RegressionPrediction for regression with gt data.
  output - list of target output names.
  ignore_batch - whether to ignore the output batch size. When processing online video streams, the output batch size is ignored (default False).
mixed - converts outputs of any model to ContainerPrediction which contains multiple types of predictions.
  adapters - dictionary where each key is an output name and each value is an adapter config map that includes the output_blob key to associate the model output with this adapter.
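A sketch of the nested structure that the adapters dictionary describes. The keys and layer names are hypothetical, and the choice of regression and classification as the nested adapter types is only an illustration.

```yaml
adapter:
  type: mixed
  adapters:
    # Each key is an output name; each value is a regular adapter config
    # plus output_blob. All names below are hypothetical.
    age_out:
      type: regression
      output_blob: age_out
    gender_out:
      type: classification
      output_blob: gender_out
```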
person_vehilce_detection_refinement - converts output of person vehicle detection refinement model to DetectionPrediction representation. Adapter refines proposals generated by the previous stage model.
head_detection - converts output of head detection model to DetectionPrediction representation. Operation is performed by mapping model output to the defined anchors, window scales, window translates, and window lengths to generate a list of head candidates.
  score_threshold - score threshold value used to discern whether a head is valid.
  anchor_sizes - anchor sizes for each base output layer.
  window_scales - window scales for each base output layer.
  window_lengths - window lengths for each base output layer.
face_recognition_quality_assessment - converts output of face recognition quality assessment model to QualityAssessmentPrediction representation.
duc_segmentation - converts output of DUC semantic segmentation model to DUCSegmentationAdapter representation.
  ds_rate - specifies downsample rate.
  cell_width - specifies cell width to extract predictions.
  label_num - specifies number of output label classes.
stacked_hourglass - converts output of Stacked Hourglass Networks for single human pose estimation to PoseEstimationPrediction.
  score_map_output - name of the output layer for getting the score map (optional; the default output blob will be used if not provided).
dna_seq_beam_search - converts output of DNA sequencing model to DNASequencePrediction using beam search decoding.
  beam_size - beam size for CTC Beam Search (optional, default 5).
  threshold - beam cut threshold (optional, default 1e-3).
  output_blob - name of output layer with sequence prediction (optional, will be automatically selected from model if not provided).
dna_seq_crf_beam_search - converts output of DNA sequencing CRF model to DNASequencePrediction using beam search decoding.
  output_blob - name of output layer with sequence prediction (optional, will be automatically selected from model if not provided).
pwcnet - converts output of PWCNet network to OpticalFlowPrediction.
  flow_out - target output layer name.
salient_object_detection - converts output of salient object detection model to SalientRegionPrediction.
  salient_map_output - target output layer for getting salience map (optional; if not provided, the default output blob will be used).
two_stage_detection - converts output of 2-stage detector to DetectionPrediction.
  boxes_out - output with bounding boxes in the format BxNx[x_min, y_min, width, height], where B - network batch size, N - number of detected boxes.
  cls_out - output with classification probabilities in format [BxNxC], where B - network batch size, N - number of detected boxes, C - number of classes.
dumb_decoder - converts audio recognition model output to CharacterRecognitionPrediction.
  alphabet - list of supported tokens. If the vocabulary is very large, you can use vocabulary_file instead: a txt file with the list of accepted tokens (each token should be on its own line, represented as token_id).
  blank_token_id - token_id for blank token (optional, used for blank label filtering after decoding).
  eos_token_id - token_id for end of string (optional, used for eos token filtering after decoding).
  replace_underscore - allows replacing the underscore symbol with white space after decoding.
  uppercase - produce prediction in uppercase (default True).
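A configuration sketch for dumb_decoder with a small inline alphabet. The token list and the blank token id are hypothetical values chosen only to show the shape of the settings, and writing the tokens as quoted strings is an assumption.

```yaml
adapter:
  type: dumb_decoder
  # Hypothetical alphabet; a real model defines its own token set.
  alphabet: ["_", "a", "b", "c"]
  blank_token_id: 0        # hypothetical id of the blank token
  replace_underscore: True
  uppercase: True          # documented default, shown explicitly
```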
detr - converts output of DETR models family to DetectionPrediction.
  scores_out - output layer name with detection scores logits.
  boxes_out - output layer name with detection boxes coordinates in [Cx, Cy, W, H] format, where Cx - x coordinate of box center, Cy - y coordinate of box center, W, H - width and height respectively.
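A minimal detr configuration sketch. The two layer names are hypothetical placeholders for the score logits and box outputs of a model.

```yaml
adapter:
  type: detr
  scores_out: pred_logits   # hypothetical name of the score logits output
  boxes_out: pred_boxes     # hypothetical name of the [Cx, Cy, W, H] boxes output
```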
ultra_lightweight_face_detection - converts output of Ultra-Lightweight Face Detection models to DetectionPrediction representation.
  scores_out - name of output layer with bounding boxes scores.
  boxes_out - name of output layer with bounding boxes coordinates.
  score_threshold - minimal accepted score for valid boxes (optional, default 0.7).
trimap - converts greyscale model output to ImageProcessingPrediction. Replaces pixel values in cut and keep zones with 0 and 1 respectively. All other postprocessing is inherited from the image_processing adapter.
background_matting - converts output of background matting model to BackgroundMattingPrediction.
noise_suppression - converts output of audio denoising model to NoiseSuppressionPrediction.
  output_blob - name of output layer with processed signal (optional; if not provided, the first output found in the model will be used).
kaldi_latgen_faster_mapped - decodes output of Kaldi* automatic speech recognition model using lattice generation approach with transition model to CharacterRecognitionPrediction. Important note: this adapter requires a Kaldi* installation (we recommend using commit 67db30cc) and the path to a directory with the compiled executable apps: latgen-faster-mapped, lattice-scale, lattice-add-penalty, lattice-best-path. The directory path can be provided using the --kaldi_bin_dir command-line argument or the KALDI_BIN_DIR environment variable.
  fst_file - Weighted Finite-State Transducers (WFST) state graph file.
  words_file - words table file.
  transition_model_file - transition model file.
  beam - beam size (optional, default 1).
  lattice_beam - lattice beam size (optional, default 1).
  allow_partial - allow partial decoding (optional, default False).
  acoustic_scale - acoustic scale for decoding (optional, default 0.1).
  min_active - min active paths for decoding (optional, default 200).
  max_active - max active paths for decoding (optional, default 7000).
  inverse_acoustic_scale - inverse acoustic scale for lattice scaling (optional, default 0).
  word_insertion_penalty - add word insertion penalty to the lattice. Penalties are negative log-probs, base e, and are added to the language model’s part of the cost (optional, default 0).
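A configuration sketch for kaldi_latgen_faster_mapped. The three file names are hypothetical placeholders; the numeric values are the documented defaults, written out explicitly.

```yaml
adapter:
  type: kaldi_latgen_faster_mapped
  # Hypothetical file names; use the files that belong to your Kaldi model.
  fst_file: HCLG.fst
  words_file: words.txt
  transition_model_file: final.mdl
  # Documented defaults, shown explicitly.
  beam: 1
  lattice_beam: 1
  allow_partial: False
  acoustic_scale: 0.1
  min_active: 200
  max_active: 7000
  inverse_acoustic_scale: 0
  word_insertion_penalty: 0
```

The directory with the Kaldi executables is not part of the adapter section; as noted above, it is passed through --kaldi_bin_dir or KALDI_BIN_DIR.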
kaldi_feat_regression - converts output features from Kaldi models to RegressionPrediction, merging the features of the whole matrix and, if necessary, deprocessing them according to the context window size.
  target_out - name of target output layer for regression (optional; if not provided, the first output will be used).
  flattenize - flatten the output features (optional, default False).
quantiles_predictor - converts output of Time Series Forecasting models to TimeSeriesForecastingQuantilesPrediction.
  quantiles - predictions[i]->quantile[i] mapping.
  output_name - name of output node to convert.
mask_to_binary_classification - converts output of model represented as segmentation mask to ArgMaxClassificationPrediction. The class label is calculated by comparing the maximal probability in the mask with the given threshold.
  threshold - probability threshold for label 1 (optional, default 0.5).
ssd_multilabel - converting output of SSD-based model where multiple labels can correspond to one box to DetectionPrediction representation.
  scores_out - name of output layer with bounding boxes scores.
  boxes_out - name of output layer with bounding boxes coordinates.
  confidence_threshold - lower bound for valid boxes scores (optional, default 0.01).
  nms_threshold - overlap threshold for NMS (optional, default 0.45).
  keep_top_k - maximal number of boxes which should be kept during NMS (optional, default 200).
  diff_coord_order - indicates that the coordinate order differs from the commonly used format [x0, y0, x1, y1]. If the value is True, the format of coordinates is [y0, x0, y1, x1] (optional, default False).
  max_detections - maximal number of boxes which should be kept (optional).
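A configuration sketch for ssd_multilabel. The two layer names are hypothetical; the thresholds are the documented defaults, and diff_coord_order is switched on to illustrate a model that emits [y0, x0, y1, x1] boxes.

```yaml
adapter:
  type: ssd_multilabel
  scores_out: scores        # hypothetical layer name
  boxes_out: boxes          # hypothetical layer name
  confidence_threshold: 0.01
  nms_threshold: 0.45
  keep_top_k: 200
  # Set to True only if the model outputs [y0, x0, y1, x1] coordinates.
  diff_coord_order: True
```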
background_matting_with_pha_and_fgr - converts output of background matting models which predict foreground and alpha to BackgroundMattingPrediction.
  alpha_out - name of output layer with alpha.
  foreground_out - name of output layer with foreground.
nanodet - converting output of NanoDet models family to DetectionPrediction representation.
  num_classes - number of predicted classes (optional, default 80).
  confidence_threshold - lower bound for valid boxes scores (optional, default 0.05).
  nms_threshold - overlap threshold for NMS (optional, default 0.6).
  max_detections - maximal number of boxes which should be kept (optional, default 100).
  reg_max - maximal value of integral set (optional, default 7).
  strides - strides of input multi-level feature maps (optional, default [8, 16, 32]).
  is_legacy - indicates that a legacy NanoDet model is used (optional, default False).
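Since every nanodet parameter is optional, the sketch below simply writes out the documented defaults, which is a convenient starting point when only one or two of them need to change.

```yaml
adapter:
  type: nanodet
  # All values below are the documented defaults.
  num_classes: 80
  confidence_threshold: 0.05
  nms_threshold: 0.6
  max_detections: 100
  reg_max: 7
  strides: [8, 16, 32]
  is_legacy: False
```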
palm_detection - converting output of palm detection model to DetectionPrediction representation.
  scores_out - name of scores model output.
  boxes_out - name of boxes model output.
  num_anchor_layers - number of layers for anchors calculation (optional, default 4).
  strides - strides of input multi-level feature maps (optional, default [8, 16, 16, 16]).
  min_scale - minimal scale for anchors calculation (optional, default 0.1484375).
  max_scale - maximal scale for anchors calculation (optional, default 0.75).
  input_size_width - width of a model input image (optional, default 128).
  input_size_height - height of a model input image (optional, default 128).
  reduce_boxes_in_lowest_layer - reduce size of anchors in lowest layer (optional, default False).
  aspect_ratios - aspect ratios for multi-level feature maps (optional, default [1]).
  inteprolated_scale_aspect_ratio - aspect ratio for interpolated scale (optional, default 1).
  fixed_anchor_size - produces anchors with fixed size (optional, default True).
  sigmoid_score - score output is sigmoid (optional, default True).
  score_clipping_thresh - score clipping threshold (optional, default 100).
  reverse_output_order - boxes output data order is (x,y) instead of (y,x) (optional, default True).
  keypoint_coord_offset - offset of keypoints coordinates in boxes output (optional, default 4).
  num_keypoints - number of keypoints in boxes output (optional, default 7).
  num_values_per_keypoint - number of coordinates per keypoint (optional, default 2).
  scales - detection box scales for x, y, w, h (optional, default [128, 128, 128, 128]).
  min_score_thresh - lower bound for valid boxes scores (optional, default 0.5).
  apply_exp_on_box_size - box sizes are arguments of the exponent (optional, default False).
  num_classes - number of detection classes (optional, default 1).
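A configuration sketch for palm_detection that sets the two output names and spells out a few of the documented anchor defaults. The layer names are hypothetical; parameters that are omitted keep the defaults listed above.

```yaml
adapter:
  type: palm_detection
  scores_out: classificators   # hypothetical layer name
  boxes_out: regressors        # hypothetical layer name
  # Documented defaults, shown explicitly.
  num_anchor_layers: 4
  strides: [8, 16, 16, 16]
  min_scale: 0.1484375
  max_scale: 0.75
  input_size_width: 128
  input_size_height: 128
  min_score_thresh: 0.5
```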