A few months ago I tried an OCR framework, darknet-ocr. I ran it on CPU at the time: recognition quality was good, but at roughly 10 s per image it was a bit slow. I had no GPU on hand, so I tested the GPU build on an online GPU platform, but something went wrong and the GPU run was even slower than the CPU one (see the project's Issues), so I set it aside.
These past couple of days a small requirement came up and I thought of it again. I heard a colleague had a server with 1080 Ti cards sitting idle in the machine room with a dead OS, so I reinstalled it with Ubuntu 18.04, set up CUDA and cuDNN, and tested the GPU build: a recognition takes about 3-4 s, which is acceptable. Below are my deployment notes.
Environment
Server: Dell T630
OS: Ubuntu 18.04
PS: When installing Ubuntu on this T630, I wrote the stock ISO image to a USB stick, pressed F2 on boot to enter setup and set the BIOS boot mode to UEFI, then on the next boot pressed F11 to pick the USB stick and got into the installer. I installed the Server edition; when I previously tried the Desktop edition, the screen came up garbled after installation and was unusable.
PS2: All commands below were run as root (I switched with sudo -i).
Update the system
# Update the package lists
apt update
# Upgrade the system and install some dependency packages
apt upgrade && apt install bash build-essential ca-certificates cmake coreutils gcc g++ git gettext libavc1394-dev libc6-dev libffi-dev libpng-dev libwebp-dev make musl openssl python3 python3-dev python3-pip unzip zlib1g-dev libsm6 libxext6 libxrender-dev
Install the GPU driver
# ubuntu-drivers devices was not available, so install it first
apt install ubuntu-drivers-common
# List the available drivers
ubuntu-drivers devices
Output
root@t630:/home/dong# ubuntu-drivers devices
== /sys/devices/pci0000:00/0000:00:02.0/0000:04:00.0 ==
modalias : pci:v000010DEd00001B06sv000010DEsd0000120Fbc03sc00i00
vendor : NVIDIA Corporation
model : GP102 [GeForce GTX 1080 Ti]
driver : nvidia-driver-440-server - distro non-free
driver : nvidia-driver-455 - distro non-free recommended
driver : nvidia-driver-450 - distro non-free
driver : nvidia-driver-450-server - distro non-free
driver : nvidia-driver-390 - distro non-free
driver : nvidia-driver-418-server - distro non-free
driver : xserver-xorg-video-nouveau - distro free builtin
Install the driver marked "recommended".
# Install the driver
apt install nvidia-driver-455 nvidia-settings nvidia-prime
Reboot the system after the installation finishes.
# Verify the driver installation
nvidia-smi
Output
root@t630:~# nvidia-smi
Fri Dec 4 06:47:13 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.38 Driver Version: 455.38 CUDA Version: 11.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... Off | 00000000:02:00.0 Off | N/A |
| 22% 33C P0 59W / 250W | 0MiB / 11178MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 108... Off | 00000000:04:00.0 Off | N/A |
| 22% 37C P0 61W / 250W | 0MiB / 11178MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 GeForce GTX 108... Off | 00000000:83:00.0 Off | N/A |
| 22% 38C P0 59W / 250W | 0MiB / 11178MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 GeForce GTX 108... Off | 00000000:84:00.0 Off | N/A |
| 20% 35C P0 60W / 250W | 0MiB / 11178MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Note: the output above shows "CUDA Version: 11.1", so here is a bit of background on how CUDA versions work.
1. CUDA has two APIs: the Runtime API and the Driver API.
The CUDA version printed by nvidia-smi is the Driver API version, while nvcc reports the Runtime API version.
2. Installing CUDA actually installs three components: the NVIDIA driver, the toolkit, and the samples.
The NVIDIA driver controls the GPU hardware;
the toolkit contains the nvcc compiler and related tools;
the samples (i.e. the SDK) contain many example programs such as device query, bandwidth test, and so on.
The Driver API mentioned above comes with the NVIDIA driver, while the Runtime API is installed by the CUDA toolkit, which is why the two version numbers can differ.
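Not part of the deployment, but if you want to see the two versions side by side, here is a minimal Python sketch using ctypes; it assumes libcuda.so.1 (installed with the NVIDIA driver) and libcudart.so (installed with the toolkit, found via the LD_LIBRARY_PATH set up in the CUDA section below) can be loaded.
import ctypes

driver_ver = ctypes.c_int()
runtime_ver = ctypes.c_int()

# Driver API version comes from libcuda (shipped with the NVIDIA driver)
ctypes.CDLL("libcuda.so.1").cuDriverGetVersion(ctypes.byref(driver_ver))
# Runtime API version comes from libcudart (shipped with the CUDA toolkit)
ctypes.CDLL("libcudart.so").cudaRuntimeGetVersion(ctypes.byref(runtime_ver))

# Both values are encoded as major * 1000 + minor * 10, e.g. 10000 -> 10.0
fmt = lambda v: "{}.{}".format(v // 1000, (v % 1000) // 10)
print("Driver API :", fmt(driver_ver.value))
print("Runtime API:", fmt(runtime_ver.value))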
Install CUDA
I first installed CUDA 10.2, but compiling darknet failed with errors; after switching to CUDA 10.0 the errors went away.
Download page: https://developer.nvidia.com/cuda-toolkit-archive
Download path: [CUDA 10.0 - Linux - x86_64 - Ubuntu - runfile(local)]
sh cuda_10.0.130_410.48_linux.run
The installer asks a few questions:
- Install the bundled NVIDIA driver? (No, the driver is already installed)
- Create the /usr/local/cuda symlink? (Yes)
After installation, add the following to the bottom of ~/.bashrc:
export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
Make the configuration take effect
source ~/.bashrc
Check the CUDA version
nvcc --version
# or
cat /usr/local/cuda/version.txt
Install cuDNN
NVIDIA cuDNN is a GPU-accelerated library for deep neural networks.
Download page: https://developer.nvidia.com/rdp/cudnn-download (registration required)
Nothing special to watch out for; just download the cuDNN build that matches your CUDA version. I used the deb packages here.
# Install cuDNN for CUDA 10.0
dpkg -i libcudnn7_7.6.5.32-1+cuda10.0_amd64.deb
dpkg -i libcudnn7-dev_7.6.5.32-1+cuda10.0_amd64.deb
dpkg -i libcudnn7-doc_7.6.5.32-1+cuda10.0_amd64.deb
Verify that cuDNN is installed correctly
cp -r /usr/src/cudnn_samples_v7/ $HOME
cd $HOME/cudnn_samples_v7/mnistCUDNN
make clean && make
./mnistCUDNN
Output
root@t630:~/cudnn_samples_v7/mnistCUDNN# ./mnistCUDNN
cudnnGetVersion() : 7605 , CUDNN_VERSION from cudnn.h : 7605 (7.6.5)
Host compiler version : GCC 7.5.0
There are 4 CUDA capable devices on your machine :
device 0 : sms 28 Capabilities 6.1, SmClock 1582.0 Mhz, MemSize (Mb) 11178, MemClock 5505.0 Mhz, Ecc=0, boardGroupID=0
device 1 : sms 28 Capabilities 6.1, SmClock 1582.0 Mhz, MemSize (Mb) 11178, MemClock 5505.0 Mhz, Ecc=0, boardGroupID=1
device 2 : sms 28 Capabilities 6.1, SmClock 1582.0 Mhz, MemSize (Mb) 11178, MemClock 5505.0 Mhz, Ecc=0, boardGroupID=2
device 3 : sms 28 Capabilities 6.1, SmClock 1582.0 Mhz, MemSize (Mb) 11178, MemClock 5505.0 Mhz, Ecc=0, boardGroupID=3
Using device 0
Testing single precision
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm ...
Fastest algorithm is Algo 1
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.021504 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.040960 time requiring 3464 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.044032 time requiring 57600 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.074752 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.075744 time requiring 207360 memory
Resulting weights from Softmax:
0.0000000 0.9999399 0.0000000 0.0000000 0.0000561 0.0000000 0.0000012 0.0000017 0.0000010 0.0000000
Loading image data/three_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 0.9999288 0.0000000 0.0000711 0.0000000 0.0000000 0.0000000 0.0000000
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 0.9999820 0.0000154 0.0000000 0.0000012 0.0000006
Result of classification: 1 3 5
Test passed!
Testing half precision (math in single precision)
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm ...
Fastest algorithm is Algo 1
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.018432 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.029728 time requiring 3464 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.043008 time requiring 28800 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.072704 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.073600 time requiring 207360 memory
Resulting weights from Softmax:
0.0000001 1.0000000 0.0000001 0.0000000 0.0000563 0.0000001 0.0000012 0.0000017 0.0000010 0.0000001
Loading image data/three_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 0.0000714 0.0000000 0.0000000 0.0000000 0.0000000
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 1.0000000 0.0000154 0.0000000 0.0000012 0.0000006
Result of classification: 1 3 5
Test passed!
Seeing Test passed! means cuDNN was installed successfully.
Build OpenCV
Download page: https://codeload.github.com/opencv/opencv/zip/4.0.1
Upload the downloaded opencv-4.0.1.zip to the server.
During the build, the cmake script tries to download ippicv from GitHub; I couldn't download it directly from the server, so it has to be fetched in advance.
# Unpack
unzip opencv-4.0.1.zip && cd opencv-4.0.1
# Put the pre-downloaded ippicv archive somewhere on the server (e.g. file:/home/dong/Downloads/)
# Edit the cmake script to change the ippicv download location
vim 3rdparty/ippicv/ippicv.cmake
# On line 47, change "https://raw.githubusercontent.com/opencv/opencv_3rdparty/${IPPICV_COMMIT}/ippicv/" to the location chosen above, i.e. "file:/home/dong/Downloads/"
# Build OpenCV (run from inside the opencv-4.0.1 directory)
mkdir build && cd build
cmake -D CMAKE_BUILD_TYPE=RELEASE \
-D CMAKE_INSTALL_PREFIX=/usr/local \
-D BUILD_opencv_python2=OFF \
-D BUILD_opencv_python3=ON \
-D PYTHON3_EXECUTABLE=/usr/bin/python3 \
-D PYTHON3_INCLUDE_DIR=/usr/include/python3.6m \
-D PYTHON3_LIBRARY=/usr/lib/x86_64-linux-gnu/libpython3.6m.so \
-D PYTHON_NUMPY_PATH=/usr/local/lib/python3.6/dist-packages/numpy/ .. \
&& make -j2 && make install
After the build finishes, check the OpenCV version
/usr/local/bin/opencv_version
Output
4.0.1
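Since the build enabled the Python 3 bindings, it's also worth checking that the cv2 module can be imported; a minimal check, run with python3:
# Sanity check for the freshly built Python 3 bindings
import cv2
print(cv2.__version__)  # should print 4.0.1 if the bindings were installed correctly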
Use darknet-ocr
Build darknet
# Clone the repository
git clone https://github.com/chineseocr/darknet-ocr.git
# Enter the project directory
cd darknet-ocr
# Build darknet
cd darknet && cp Makefile-GPU Makefile && make
Place the model files
- Download ocr.weights from http://59.110.234.163:9990/static/models/darknet-ocr/models/ocr/chinese/ and put it in the project's models/ocr/chinese/ directory
- Download text.weights from http://59.110.234.163:9990/static/models/darknet-ocr/models/text/ and put it in the project's models/text/ directory
The models directory tree should end up looking like this:
.
├── ocr
│ ├── chinese
│ │ ├── ocr.cfg
│ │ ├── ocr.json
│ │ └── ocr.weights
│ ├── ...
└── text
├── README.md
├── text.cfg
└── text.weights
Edit the config.py configuration file
GPU=True
Install the Python dependencies
pip3 install -r requirements.txt -i http://mirrors.aliyun.com/pypi/simple --trusted-host mirrors.aliyun.com
Run the service
python3 app.py 8080
Output
root@t630:/home/dong/darknet-ocr# python3 app.py 8080
learning_rate: Using default '0.001000'
momentum: Using default '0.900000'
policy: Using default 'constant'
max_batches: Using default '0'
layer filters size input output
0 conv 64 3 x 3 / 1 256 x 32 x 1 -> 256 x 32 x 64 0.009 BFLOPs
1 max 2 x 2 / 2 x 2 256 x 32 x 64 -> 128 x 16 x 64
2 conv 128 3 x 3 / 1 128 x 16 x 64 -> 128 x 16 x 128 0.302 BFLOPs
3 max 2 x 2 / 2 x 2 128 x 16 x 128 -> 64 x 8 x 128
4 conv 256 3 x 3 / 1 64 x 8 x 128 -> 64 x 8 x 256 0.302 BFLOPs
5 conv 256 3 x 3 / 1 64 x 8 x 256 -> 64 x 8 x 256 0.604 BFLOPs
6 max 2 x 2 / 2 x 1 64 x 8 x 256 -> 63 x 4 x 256
Unused field: 'strideW = 1'
Unused field: 'strideH = 2'
7 conv 512 3 x 3 / 1 63 x 4 x 256 -> 63 x 4 x 512 0.595 BFLOPs
8 conv 512 3 x 3 / 1 63 x 4 x 512 -> 63 x 4 x 512 1.189 BFLOPs
9 max 2 x 2 / 2 x 1 63 x 4 x 512 -> 62 x 2 x 512
Unused field: 'strideW = 1'
Unused field: 'strideH = 2'
10 conv 512 2 x 2 / 1 62 x 2 x 512 -> 61 x 1 x 512 0.128 BFLOPs
11 conv 11316 1 x 1 / 1 61 x 1 x 512 -> 61 x 1 x11316 0.707 BFLOPs
Loading weights from models/ocr/chinese/ocr.weights...Done!
learning_rate: Using default '0.001000'
momentum: Using default '0.900000'
policy: Using default 'constant'
max_batches: Using default '0'
layer filters size input output
0 conv 64 3 x 3 / 1 32 x 32 x 3 -> 32 x 32 x 64 0.004 BFLOPs
1 conv 64 3 x 3 / 1 32 x 32 x 64 -> 32 x 32 x 64 0.075 BFLOPs
2 max 2 x 2 / 2 x 2 32 x 32 x 64 -> 16 x 16 x 64
3 conv 128 3 x 3 / 1 16 x 16 x 64 -> 16 x 16 x 128 0.038 BFLOPs
4 conv 128 3 x 3 / 1 16 x 16 x 128 -> 16 x 16 x 128 0.075 BFLOPs
5 max 2 x 2 / 2 x 2 16 x 16 x 128 -> 8 x 8 x 128
6 conv 256 3 x 3 / 1 8 x 8 x 128 -> 8 x 8 x 256 0.038 BFLOPs
7 conv 256 3 x 3 / 1 8 x 8 x 256 -> 8 x 8 x 256 0.075 BFLOPs
8 conv 256 3 x 3 / 1 8 x 8 x 256 -> 8 x 8 x 256 0.075 BFLOPs
9 max 2 x 2 / 2 x 2 8 x 8 x 256 -> 4 x 4 x 256
10 conv 512 3 x 3 / 1 4 x 4 x 256 -> 4 x 4 x 512 0.038 BFLOPs
11 conv 512 3 x 3 / 1 4 x 4 x 512 -> 4 x 4 x 512 0.075 BFLOPs
12 conv 512 3 x 3 / 1 4 x 4 x 512 -> 4 x 4 x 512 0.075 BFLOPs
13 max 2 x 2 / 2 x 2 4 x 4 x 512 -> 2 x 2 x 512
14 conv 512 3 x 3 / 1 2 x 2 x 512 -> 2 x 2 x 512 0.019 BFLOPs
15 conv 512 3 x 3 / 1 2 x 2 x 512 -> 2 x 2 x 512 0.019 BFLOPs
16 conv 512 3 x 3 / 1 2 x 2 x 512 -> 2 x 2 x 512 0.019 BFLOPs
17 conv 512 3 x 3 / 1 2 x 2 x 512 -> 2 x 2 x 512 0.019 BFLOPs
18 conv 40 1 x 1 / 1 2 x 2 x 512 -> 2 x 2 x 40 0.000 BFLOPs
Loading weights from models/text/text.weights...Done!
Once it shows that the model weights are loaded, open http://[server-IP]:8080/text in a browser to try the demo.
Extras
To make API testing easier, I added a small endpoint that accepts a Base64-encoded image and returns the recognition result.
# OCR test endpoint, added to the project's app.py (which already provides
# request/jsonify, numpy as np, cv2, base64_to_PIL, text_ocr and the config values)
@app.route('/ocr/text_extract', methods=['POST', 'OPTIONS'])
def text_extract():
    params = request.get_json()
    base64_image = params.get("base64_image")
    # decode the Base64 string into a PIL image, then convert to an OpenCV BGR array
    pil_image = base64_to_PIL(base64_image)
    np_image = np.array(pil_image)
    np_image = cv2.cvtColor(np_image, cv2.COLOR_RGB2BGR)
    # run text detection + recognition with the thresholds from config.py
    data = text_ocr(np_image, scale, maxScale, TEXT_LINE_SCORE)
    return jsonify({"message": "success", "code": 200, "data": data})
POST the following JSON body to the endpoint:
{
"base64_image": "data:image/jpeg;base64,/9j/4AAQSkZJRg..."
}
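For completeness, here is a minimal client sketch using the requests library; test.jpg is just a hypothetical sample image, and the service is assumed to be running locally on port 8080.
# Minimal client sketch for the endpoint above
import base64
import requests

# read a hypothetical sample image and Base64-encode it
with open("test.jpg", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

resp = requests.post(
    "http://127.0.0.1:8080/ocr/text_extract",
    json={"base64_image": "data:image/jpeg;base64," + b64},
)
print(resp.json())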
References