StringTensorUnpack
Versioned name: StringTensorUnpack-15
Category: Type
Short description: StringTensorUnpack operation transforms a given batch of strings into three tensors - two storing begin and end indices of the strings and another containing the concatenated string data, respectively.
Detailed description
Consider an input string tensor containing values ["Intel", "OpenVINO"].
The operator will transform the tensor into three outputs:
- begins = [0, 5]
  - begins[0] is equal to 0, because the first string starts at the beginning index.
  - begins[1] is equal to 5, because the length of the string “Intel” is equal to 5.
  - begins.shape is equal to [2], because the input is a batch of 2 strings.
- ends = [5, 13]
  - ends[0] is equal to 5, because the length of the string “Intel” is equal to 5.
  - ends[1] is equal to 13, because the length of the string “OpenVINO” is 8, and it needs to be summed up with the length of the string “Intel”.
  - ends.shape is equal to [2], because the input is a batch of 2 strings.
- symbols = “IntelOpenVINO”
  - symbols contains the concatenated string data encoded in UTF-8 bytes, interpretable using begins and ends.
  - symbols.shape is equal to [13], because it is the length of the concatenated input strings.
When defining begins and ends, the notation [a, b) is used. This means that the range starts with a and includes all values up to, but not including, b. That is why in the example given the last byte of “IntelOpenVINO” has index 12, but the ends vector contains 13.
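The transformation described above can be sketched in plain Python. This is an illustrative reference only, not the OpenVINO implementation; the function name unpack_strings is hypothetical, and the sketch assumes a 1D batch whose offsets count UTF-8 bytes.

```python
def unpack_strings(strings):
    """Illustrative sketch: split a 1D batch of strings into begins, ends, symbols.

    Assumption: offsets are byte positions in the UTF-8 encoded data,
    and each [begin, end) range is half-open.
    """
    begins, ends = [], []
    symbols = bytearray()
    for s in strings:
        begins.append(len(symbols))        # string starts where the previous one ended
        symbols.extend(s.encode("utf-8"))  # append the raw UTF-8 bytes
        ends.append(len(symbols))          # exclusive end offset
    return begins, ends, bytes(symbols)


begins, ends, symbols = unpack_strings(["Intel", "OpenVINO"])
print(begins)   # [0, 5]
print(ends)     # [5, 13]
print(symbols)  # b'IntelOpenVINO'
```

An empty string produces equal begin and end offsets, so it contributes no bytes to symbols.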
Inputs
1: data - ND tensor of type string. Required.
Outputs
1: begins - ND tensor of non-negative integer numbers of type int32 and of the same shape as data input.
2: ends - ND tensor of non-negative integer numbers of type int32 and of the same shape as data input.
3: symbols - 1D tensor of concatenated string data encoded in UTF-8 bytes, of type u8 and size equal to the sum of the lengths of each string from the data input.
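To show how begins and ends make symbols interpretable, the following Python sketch reconstructs the original strings from the three outputs. The helper name decode_strings is hypothetical, and the sketch assumes that the offsets index bytes in symbols, as the u8 type suggests, so a multi-byte character spans more than one position.

```python
def decode_strings(begins, ends, symbols):
    """Illustrative sketch: rebuild strings from half-open byte ranges [begin, end)."""
    return [bytes(symbols[b:e]).decode("utf-8") for b, e in zip(begins, ends)]


symbols = "IntelOpenVINO".encode("utf-8")
restored = decode_strings([0, 5], [5, 13], symbols)
print(restored)  # ['Intel', 'OpenVINO']

# Under the byte-offset assumption, "é" occupies two bytes in UTF-8.
accented = "é!".encode("utf-8")
print(len(accented))                               # 3
print(decode_strings([0, 2], [2, 3], accented))    # ['é', '!']
```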
Examples
Example 1: 1D input
For input = ["Intel", "OpenVINO"]
<layer ... type="StringTensorUnpack" ... >
<input>
<port id="0" precision="STRING">
<dim>2</dim> <!-- batch of strings -->
</port>
</input>
<output>
<port id="0" precision="I32">
<dim>2</dim> <!-- begins = [0, 5] -->
</port>
<port id="1" precision="I32">
<dim>2</dim> <!-- ends = [5, 13] -->
</port>
<port id="2" precision="U8">
<dim>13</dim> <!-- symbols = "IntelOpenVINO" encoded in a UTF-8 array -->
</port>
</output>
</layer>
Example 2: input with an empty string
For input = ["OMZ", "", "GenAI", " ", "2024"]
<layer ... type="StringTensorUnpack" ... >
<input>
<port id="0" precision="STRING">
<dim>5</dim> <!-- batch of strings -->
</port>
</input>
<output>
<port id="0" precision="I32">
<dim>5</dim> <!-- begins = [0, 3, 3, 8, 9] -->
</port>
<port id="1" precision="I32">
<dim>5</dim> <!-- ends = [3, 3, 8, 9, 13] -->
</port>
<port id="2" precision="U8">
<dim>13</dim> <!-- symbols = "OMZGenAI 2024" encoded in a UTF-8 array -->
</port>
</output>
</layer>
Example 3: 2D input
For input = [["Intel", "OpenVINO"], ["OMZ", "GenAI"]]
<layer ... type="StringTensorUnpack" ... >
<input>
<port id="0" precision="STRING">
<dim>2</dim>
<dim>2</dim>
</port>
</input>
<output>
<port id="0" precision="I32">
<dim>2</dim> <!-- begins = [[0, 5], [13, 16]] -->
<dim>2</dim>
</port>
<port id="1" precision="I32">
<dim>2</dim> <!-- ends = [[5, 13], [16, 21]] -->
<dim>2</dim>
</port>
<port id="2" precision="U8">
<dim>21</dim> <!-- symbols = "IntelOpenVINOOMZGenAI" encoded in a UTF-8 array -->
</port>
</output>
</layer>
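The offsets in the 2D example are consistent with visiting the elements row by row. The Python sketch below reproduces them under that assumption; the function name unpack_2d is hypothetical, and nested lists stand in for a 2D string tensor.

```python
def unpack_2d(rows):
    """Illustrative sketch: unpack a 2D nested list of strings in row-major order.

    Assumption: elements are concatenated row by row, and begins/ends keep
    the same nested shape as the input.
    """
    begins, ends = [], []
    symbols = bytearray()
    for row in rows:
        row_begins, row_ends = [], []
        for s in row:
            row_begins.append(len(symbols))    # start offset of this string
            symbols.extend(s.encode("utf-8"))  # append its UTF-8 bytes
            row_ends.append(len(symbols))      # exclusive end offset
        begins.append(row_begins)
        ends.append(row_ends)
    return begins, ends, bytes(symbols)


begins_2d, ends_2d, symbols_2d = unpack_2d([["Intel", "OpenVINO"], ["OMZ", "GenAI"]])
print(begins_2d)   # [[0, 5], [13, 16]]
print(ends_2d)     # [[5, 13], [16, 21]]
print(symbols_2d)  # b'IntelOpenVINOOMZGenAI'
```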