Horizontal Text Detection in Real-Time#
This demo presents a use case with a client written in python which captures camera frames and performs text spotting analysis via gRPC requests to OVMS. The client visualizes the results as a boxes depicted on the original image frames using OpenCV in real-time. The client can work efficiently also over slow internet connection with long latency thanks to image data compression and parallel execution for multiple frames.
Download horizontal text detection model from OpenVINO Model Zoo#
curl -L --create-dir https://storage.openvinotoolkit.org/repositories/open_model_zoo/2022.1/models_bin/2/horizontal-text-detection-0001/FP32/horizontal-text-detection-0001.bin -o horizontal-text-detection-0001/1/horizontal-text-detection-0001.bin https://storage.openvinotoolkit.org/repositories/open_model_zoo/2022.1/models_bin/2/horizontal-text-detection-0001/FP32/horizontal-text-detection-0001.xml -o horizontal-text-detection-0001/1/horizontal-text-detection-0001.xml
tree horizontal-text-detection-0001
horizontal-text-detection-0001
└── 1
├── horizontal-text-detection-0001.bin
└── horizontal-text-detection-0001.xml
Start the OVMS container:#
docker run -d -u $(id -u):$(id -g) -v $(pwd)/horizontal-text-detection-0001:/model -p 9000:9000 openvino/model_server:latest \
--model_path /model --model_name text --port 9000 --layout NHWC:NCHW
Run the client#
Clone the repository and enter horizontal_text_detection directory
git clone https://github.com/openvinotoolkit/model_server.git
cd model_server/demos/horizontal_text_detection/python
Install required packages:
pip3 install -r requirements.txt
Start the client
python3 horizontal_text_detection.py --grpc_address localhost --grpc_port 9000
You can also change the camera ID:
python3 horizontal_text_detection.py --grpc_address localhost --grpc_port 9000 --video_source 0
Or choose to work with video file as well:
python3 horizontal_text_detection.py --grpc_address localhost --grpc_port 9000 --video_source ~/video.mp4
Example output:
Initializing requesting thread index: 0
Initializing requesting thread index: 1
Initializing requesting thread index: 2
Initializing requesting thread index: 3
Launching requesting thread index: 0
Launching requesting thread index: 1
Launching requesting thread index: 2
Launching requesting thread index: 3
ThreadID: 0; Current FPS: 31.25; Average FPS: 25.64; Average latency: 140.98ms
ThreadID: 1; Current FPS: 31.23; Average FPS: 25.67; Average latency: 136.36ms
ThreadID: 2; Current FPS: 29.41; Average FPS: 25.70; Average latency: 130.88ms
ThreadID: 3; Current FPS: 30.30; Average FPS: 25.73; Average latency: 135.65ms
...
NOTE: Video source is cropped to 704x704 resolution to match model input size.
Recognize Detected Text with OCR Pipeline#
Optical Character Recognition (OCR) pipeline based on horizontal text detection model, text recognition combined with a custom node implementation can be used with the same python script used before. OCR pipeline provides location of detected text boxes on the image and additionally recognized text for each box.
Prepare workspace to run the demo#
To successfully deploy OCR pipeline you need to have a workspace that contains:
horizontal text detection and text recognition models
Custom node for image processing
Configuration file
Clone the repository and enter horizontal_text_detection directory
git clone https://github.com/openvinotoolkit/model_server.git
cd model_server/demos/horizontal_text_detection/python
You can prepare the workspace that contains all the above by just running make
command.
Since custom node used in this demo is included in OpenVINO Model Server image you can either use the custom node from the image, or build one.
If you just want to quickly run this demo and use already compiled custom node, run:
make
Directory structure (without custom node)#
Once the make
procedure is finished, you should have workspace
directory ready with the following content.
workspace/
├── config.json
├── horizontal-text-detection-0001
│ └── 1
│ ├── horizontal-text-detection-0001.bin
│ └── horizontal-text-detection-0001.xml
└── text-recognition-0014
└── 1
├── text-recognition-0014.bin
└── text-recognition-0014.xml
If you modified the custom node or for some other reason, you want to have it compiled and then attached to the container, run:
make BUILD_CUSTOM_NODE=true BASE_OS=ubuntu
Directory structure (with custom node)#
Once the make
procedure is finished, you should have workspace
directory ready with the following content.
workspace/
├── config.json
├── horizontal-text-detection-0001
│ └── 1
│ ├── horizontal-text-detection-0001.bin
│ └── horizontal-text-detection-0001.xml
├── lib
│ └── libcustom_node_horizontal_ocr.so
└── text-recognition-0014
└── 1
├── text-recognition-0014.bin
└── text-recognition-0014.xml
Deploying OVMS#
Deploy OVMS with faces analysis pipeline using the following command:
docker run -p 9000:9000 -d -v ${PWD}/workspace:/workspace openvino/model_server --config_path /workspace/config.json --port 9000
Sending Request to the Server#
Install python dependencies:
pip3 install -r requirements.txt
Start the client
python3 horizontal_text_detection.py --grpc_address localhost --grpc_port 9000 --use_case ocr
You can also change the camera ID:
python3 horizontal_text_detection.py --grpc_address localhost --grpc_port 9000 --use_case ocr --video_source 0
Or choose to work with video file as well:
python3 horizontal_text_detection.py --grpc_address localhost --grpc_port 9000 --use_case ocr --video_source ~/video.mp4
Example output:
Initializing requesting thread index: 0
Initializing requesting thread index: 1
Initializing requesting thread index: 2
Initializing requesting thread index: 3
Launching requesting thread index: 0
Launching requesting thread index: 1
Launching requesting thread index: 2
Launching requesting thread index: 3
ThreadID: 0; Current FPS: 31.25; Average FPS: 25.64; Average latency: 140.98ms
ThreadID: 1; Current FPS: 31.23; Average FPS: 25.67; Average latency: 136.36ms
ThreadID: 2; Current FPS: 29.41; Average FPS: 25.70; Average latency: 130.88ms
ThreadID: 3; Current FPS: 30.30; Average FPS: 25.73; Average latency: 135.65ms
...
RTSP Client#
Build docker image containing rtsp client along with its dependencies The rtsp client app needs to have access to RTSP stream to read from and write to.
Example rtsp server mediamtx
docker run --rm -d -p 8080:8554 -e RTSP_PROTOCOLS=tcp bluenviron/mediamtx:latest
Then write to the server using ffmpeg, example using video or camera
ffmpeg -stream_loop -1 -i ./video.mp4 -f rtsp -rtsp_transport tcp rtsp://localhost:8080/channel1
ffmpeg -f dshow -i video="HP HD Camera" -f rtsp -rtsp_transport tcp rtsp://localhost:8080/channel1
Build the docker image with the python client for video stream reading an remote analysis:
docker build ../../common/stream_client/ -t rtsp_client
Start the client#
Command
docker run -v $(pwd):/workspace rtsp_client --help
usage: client.py [-h] [--grpc_address GRPC_ADDRESS]
[--input_stream INPUT_STREAM] [--output_stream OUTPUT_STREAM]
[--model_name MODEL_NAME] [--width WIDTH] [--height HEIGHT]
[--input_name INPUT_NAME] [--verbose] [--benchmark]
[--limit_stream_duration LIMIT_STREAM_DURATION]
[--limit_frames LIMIT_FRAMES]
options:
-h, --help show this help message and exit
--grpc_address GRPC_ADDRESS
Specify url to grpc service
--input_stream INPUT_STREAM
Url of input rtsp stream
--output_stream OUTPUT_STREAM
Url of output rtsp stream
--model_name MODEL_NAME
Name of the model
--width WIDTH Width of model's input image
--height HEIGHT Height of model's input image
--input_name INPUT_NAME
Name of the model's input
--verbose Should client dump debug information
--benchmark Should client collect processing times
--limit_stream_duration LIMIT_STREAM_DURATION
Limit how long client should run
--limit_frames LIMIT_FRAMES
Limit how many frames should be processed
Usage example
docker run --network="host" -v $(pwd):/workspace rtsp_client --grpc_address localhost:9000 --input_stream 'rtsp://localhost:8080/channel1' --output_stream 'rtsp://localhost:8080/channel2'
Then read rtsp stream using ffplay
ffplay -pixel_format yuv420p -video_size 704x704 -rtsp_transport tcp rtsp://localhost:8080/channel2
One might as well use prerecorded video and schedule it for inference. Replace horizontal_text.mp4 with your video file.
docker run --network="host" -v $(pwd):/workspace rtsp_client --grpc_address localhost:9000 --input_stream 'workspace/horizontal_text.mp4' --output_stream 'workspace/output.mp4'