Text Recognition with darknet-ocr

Published: 2020-12-05

Tags: OCR



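A small requirement came up over the last couple of days, and it reminded me of darknet-ocr. I heard that a colleague had a server with GTX 1080 Ti cards sitting idle in the machine room with a broken OS, so I reinstalled Ubuntu 18.04 on it, installed CUDA and cuDNN, and tested the GPU build. Recognizing an image takes roughly 3 to 4 seconds, which is acceptable. The deployment notes follow.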


Server: Dell T630

OS: Ubuntu 18.04

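PS: To install Ubuntu on this T630, I wrote the stock image to a USB stick, pressed F2 to enter System Setup and set the BIOS boot mode to UEFI, then pressed F11 on the next boot and chose the USB stick, which brought up the installer. I installed the Server edition. When I had tried the Desktop edition earlier, the display was garbled and unusable once the system was installed.

PS2: I ran all of the commands below as root, after switching with sudo -i.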


# Update the package list
apt update

# Upgrade the system and install the required packages
apt upgrade && apt install bash build-essential ca-certificates cmake coreutils gcc g++ git gettext libavc1394-dev libc6-dev libffi-dev libpng-dev libwebp-dev make musl openssl python3 python3-dev python3-pip unzip zlib1g-dev libsm6 libxext6 libxrender-dev


# The ubuntu-drivers command was not found, so install it
apt install ubuntu-drivers-common

# List the available drivers
ubuntu-drivers devices


root@t630:/home/dong# ubuntu-drivers devices
== /sys/devices/pci0000:00/0000:00:02.0/0000:04:00.0 ==
modalias : pci:v000010DEd00001B06sv000010DEsd0000120Fbc03sc00i00
vendor   : NVIDIA Corporation
model    : GP102 [GeForce GTX 1080 Ti]
driver   : nvidia-driver-440-server - distro non-free
driver   : nvidia-driver-455 - distro non-free recommended
driver   : nvidia-driver-450 - distro non-free
driver   : nvidia-driver-450-server - distro non-free
driver   : nvidia-driver-390 - distro non-free
driver   : nvidia-driver-418-server - distro non-free
driver   : xserver-xorg-video-nouveau - distro free builtin

Install the driver flagged as recommended, which is nvidia-driver-455 in the list above.
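If the driver choice needs to be scripted, the recommended entry can be picked out of the `ubuntu-drivers devices` output. This is only a sketch: the function name `recommended_driver` is made up for illustration, and it assumes the line layout shown in the listing above.

```python
import re

# Sample lines copied from the `ubuntu-drivers devices` listing above.
SAMPLE = """\
driver   : nvidia-driver-440-server - distro non-free
driver   : nvidia-driver-455 - distro non-free recommended
driver   : nvidia-driver-450 - distro non-free
"""


def recommended_driver(output):
    """Return the package name on the line that ends in 'recommended', or None."""
    pattern = re.compile(r"\s*driver\s*:\s*(\S+)\s+-\s+.*\brecommended\s*$")
    for line in output.splitlines():
        match = pattern.match(line)
        if match:
            return match.group(1)
    return None


if __name__ == "__main__":
    print(recommended_driver(SAMPLE))
```

The returned name could then be passed to `apt install`, but it is worth reading the list by eye first, since the recommended driver is not always the one you want on a server.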

# Install the driver
apt install nvidia-driver-455 nvidia-settings nvidia-prime


# Verify the driver installation


root@t630:~# nvidia-smi
Fri Dec  4 06:47:13 2020       
| NVIDIA-SMI 455.38       Driver Version: 455.38       CUDA Version: 11.1     |
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  GeForce GTX 108...  Off  | 00000000:02:00.0 Off |                  N/A |
| 22%   33C    P0    59W / 250W |      0MiB / 11178MiB |      0%      Default |
|                               |                      |                  N/A |
|   1  GeForce GTX 108...  Off  | 00000000:04:00.0 Off |                  N/A |
| 22%   37C    P0    61W / 250W |      0MiB / 11178MiB |      0%      Default |
|                               |                      |                  N/A |
|   2  GeForce GTX 108...  Off  | 00000000:83:00.0 Off |                  N/A |
| 22%   38C    P0    59W / 250W |      0MiB / 11178MiB |      0%      Default |
|                               |                      |                  N/A |
|   3  GeForce GTX 108...  Off  | 00000000:84:00.0 Off |                  N/A |
| 20%   35C    P0    60W / 250W |      0MiB / 11178MiB |      0%      Default |
|                               |                      |                  N/A |
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|  No running processes found                                                 |

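Note: the output above shows "CUDA Version: 11.1", so a few words on CUDA versions are in order.

1. CUDA has two APIs: the Runtime API and the Driver API.

nvidia-smi reports the CUDA Driver API version, that is, the newest CUDA version the installed driver supports, while nvcc reports the CUDA Runtime API version of the installed toolkit. This is why nvidia-smi can show 11.1 even when an older toolkit is installed.

2. A CUDA installation consists of three main components: the NVIDIA driver, the toolkit, and the samples.

  • The NVIDIA driver controls the GPU hardware;

  • the toolkit includes the nvcc compiler, among other things;

  • the samples (or SDK) contain many example programs, such as device query and bandwidth tests.

The CUDA Driver API mentioned above is installed with the NVIDIA driver, while the CUDA Runtime API is installed with the CUDA toolkit.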
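The relationship between the two version numbers can be checked with a few lines of Python. This is a rough sketch: the function names are hypothetical, the nvidia-smi line is taken from the output above, and the nvcc line is assumed to follow the usual "release X.Y" format, since the article does not show nvcc output.

```python
import re


def driver_cuda_version(nvidia_smi_output):
    """Extract 'CUDA Version: X.Y' from nvidia-smi output as (X, Y), or None."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def toolkit_cuda_version(nvcc_output):
    """Extract 'release X.Y' from `nvcc --version` output as (X, Y), or None."""
    match = re.search(r"release\s+(\d+)\.(\d+)", nvcc_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def toolkit_supported(driver, toolkit):
    """A toolkit can run when the driver supports its version or a newer one."""
    return toolkit <= driver


# Header line from the nvidia-smi output above.
SMI_LINE = "| NVIDIA-SMI 455.38       Driver Version: 455.38       CUDA Version: 11.1     |"
# Assumed nvcc output line for a CUDA 10.0 toolkit.
NVCC_LINE = "Cuda compilation tools, release 10.0, V10.0.130"

if __name__ == "__main__":
    driver = driver_cuda_version(SMI_LINE)
    toolkit = toolkit_cuda_version(NVCC_LINE)
    print(driver, toolkit, toolkit_supported(driver, toolkit))
```

Tuples are compared element by element, so (10, 0) <= (11, 1) holds and a CUDA 10.0 toolkit is fine under a driver that reports 11.1.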


I first installed CUDA 10.2, but compiling darknet failed with errors. After switching to CUDA 10.0 the errors went away.


Download: [CUDA 10.0 - Linux - x86_64 - Ubuntu - runfile(local)]

sh cuda_10.0.130_410.48_linux.run


  • Install the graphics driver? (no)
  • Create the /usr/local/cuda symlink? (yes)

After the installation finishes, append the following to the end of ~/.bashrc:

export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}


source ~/.bashrc
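The `${PATH:+:${PATH}}` form in the two export lines appends ":" plus the old value only when the variable is already set and non-empty, which avoids a stray trailing colon. A small Python sketch of the same rule (the helper name `prepend_path` is made up for illustration):

```python
def prepend_path(new_dir, existing):
    """Mimic NEW${VAR:+:${VAR}}: add ':' + existing only when existing is non-empty."""
    return new_dir + (":" + existing if existing else "")


if __name__ == "__main__":
    # Variable unset or empty: no trailing colon is produced.
    print(prepend_path("/usr/local/cuda/lib64", ""))
    # Variable already set: the new directory goes in front.
    print(prepend_path("/usr/local/cuda/bin", "/usr/bin:/bin"))
```

The trailing colon matters mostly for LD_LIBRARY_PATH, where an empty entry is treated as the current directory.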

Check the CUDA version

nvcc --version

# or
cat /usr/local/cuda/version.txt

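Install cuDNN

NVIDIA cuDNN is a GPU-accelerated library for deep neural networks.

Download: https://developer.nvidia.com/rdp/cudnn-download (registration required)

There is nothing special to watch for: just download the cuDNN build that matches your CUDA version. I downloaded the deb packages.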

# Install cuDNN for CUDA 10.0
dpkg -i libcudnn7_7.6.5.32-1+cuda10.0_amd64.deb
dpkg -i libcudnn7-dev_7.6.5.32-1+cuda10.0_amd64.deb
dpkg -i libcudnn7-doc_7.6.5.32-1+cuda10.0_amd64.deb
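Since the only real pitfall here is a mismatch between the cuDNN package and the installed CUDA version, the check can be automated from the package filename. This is a sketch that assumes the naming scheme of the three files above; the function names and the regular expression are illustrative, not anything NVIDIA documents.

```python
import re

# Pattern inferred from names like libcudnn7_7.6.5.32-1+cuda10.0_amd64.deb
DEB_PATTERN = re.compile(
    r"^(libcudnn\d+(?:-dev|-doc)?)_(\d+(?:\.\d+)*)-\d+\+cuda(\d+\.\d+)_(\w+)\.deb$"
)


def parse_cudnn_deb(filename):
    """Split a cuDNN .deb filename into package, cuDNN version, CUDA version, arch."""
    match = DEB_PATTERN.match(filename)
    if not match:
        return None
    package, cudnn_version, cuda_version, arch = match.groups()
    return {
        "package": package,
        "cudnn_version": cudnn_version,
        "cuda_version": cuda_version,
        "arch": arch,
    }


def matches_cuda(filename, cuda_version):
    """True when the package was built for the given CUDA version, e.g. '10.0'."""
    info = parse_cudnn_deb(filename)
    return info is not None and info["cuda_version"] == cuda_version


if __name__ == "__main__":
    name = "libcudnn7_7.6.5.32-1+cuda10.0_amd64.deb"
    print(parse_cudnn_deb(name), matches_cuda(name, "10.0"))
```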

Verify that cuDNN is installed correctly

cp -r /usr/src/cudnn_samples_v7/ $HOME
cd $HOME/cudnn_samples_v7/mnistCUDNN
make clean && make


root@t630:~/cudnn_samples_v7/mnistCUDNN# ./mnistCUDNN 
cudnnGetVersion() : 7605 , CUDNN_VERSION from cudnn.h : 7605 (7.6.5)
Host compiler version : GCC 7.5.0
There are 4 CUDA capable devices on your machine :
device 0 : sms 28  Capabilities 6.1, SmClock 1582.0 Mhz, MemSize (Mb) 11178, MemClock 5505.0 Mhz, Ecc=0, boardGroupID=0
device 1 : sms 28  Capabilities 6.1, SmClock 1582.0 Mhz, MemSize (Mb) 11178, MemClock 5505.0 Mhz, Ecc=0, boardGroupID=1
device 2 : sms 28  Capabilities 6.1, SmClock 1582.0 Mhz, MemSize (Mb) 11178, MemClock 5505.0 Mhz, Ecc=0, boardGroupID=2
device 3 : sms 28  Capabilities 6.1, SmClock 1582.0 Mhz, MemSize (Mb) 11178, MemClock 5505.0 Mhz, Ecc=0, boardGroupID=3
Using device 0

Testing single precision
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm ...
Fastest algorithm is Algo 1
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.021504 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.040960 time requiring 3464 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.044032 time requiring 57600 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.074752 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.075744 time requiring 207360 memory
Resulting weights from Softmax:
0.0000000 0.9999399 0.0000000 0.0000000 0.0000561 0.0000000 0.0000012 0.0000017 0.0000010 0.0000000 
Loading image data/three_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 0.9999288 0.0000000 0.0000711 0.0000000 0.0000000 0.0000000 0.0000000 
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 0.9999820 0.0000154 0.0000000 0.0000012 0.0000006 

Result of classification: 1 3 5

Test passed!

Testing half precision (math in single precision)
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm ...
Fastest algorithm is Algo 1
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.018432 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.029728 time requiring 3464 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.043008 time requiring 28800 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.072704 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.073600 time requiring 207360 memory
Resulting weights from Softmax:
0.0000001 1.0000000 0.0000001 0.0000000 0.0000563 0.0000001 0.0000012 0.0000017 0.0000010 0.0000001 
Loading image data/three_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 0.0000714 0.0000000 0.0000000 0.0000000 0.0000000 
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 1.0000000 0.0000154 0.0000000 0.0000012 0.0000006 

Result of classification: 1 3 5

Test passed!

Seeing "Test passed!" means that cuDNN was installed successfully.
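For an unattended setup, the same check can be done on the captured output of the sample. A minimal sketch, assuming the sample prints "Test passed!" once per precision mode as in the log above; `cudnn_sample_passed` and `run_sample` are hypothetical helper names.

```python
import subprocess


def cudnn_sample_passed(output, expected_runs=2):
    """True when 'Test passed!' appears at least once per expected test run."""
    return output.count("Test passed!") >= expected_runs


def run_sample(binary="./mnistCUDNN"):
    """Run the compiled sample from its own directory and check its output."""
    result = subprocess.run(
        [binary], stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.returncode == 0 and cudnn_sample_passed(result.stdout)


if __name__ == "__main__":
    # Must be started inside the mnistCUDNN directory after running make.
    print(run_sample())
```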

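Build OpenCV


Upload the downloaded opencv.4.0.1.zip to the server.

During the build, the script also downloads ippicv from GitHub. That download did not work from my server, so the file has to be fetched in advance.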


# Unzip
unzip opencv.4.0.1.zip && cd opencv.4.0.1

# Put the downloaded ippicv archive in a directory on the server (for example file:/home/dong/Downloads/)

# Edit the cmake script to change the ippicv download location
vim 3rdparty/ippicv/ippicv.cmake

# On line 47, replace "https://raw.githubusercontent.com/opencv/opencv_3rdparty/${IPPICV_COMMIT}/ippicv/" with the directory where ippicv was placed, in this example "file:/home/dong/Downloads/"

# Build OpenCV (current directory: opencv.4.0.1)
mkdir build && cd build

cmake \
      -D CMAKE_INSTALL_PREFIX=/usr/local \
      -D BUILD_opencv_python2=OFF \
      -D BUILD_opencv_python3=ON \
      -D PYTHON3_EXECUTABLE=/usr/bin/python3 \
      -D PYTHON3_INCLUDE_DIR=/usr/include/python3.6m \
      -D PYTHON3_LIBRARY=/usr/lib/x86_64-linux-gnu/libpython3.6m.so \
      -D PYTHON_NUMPY_PATH=/usr/local/lib/python3.6/dist-packages/numpy/ .. \
      && make -j2 && make install
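The manual edit of 3rdparty/ippicv/ippicv.cmake can also be scripted, which helps when the build has to be repeated. This sketch replaces whatever quoted raw.githubusercontent.com URL ending in `ippicv/` it finds, rather than relying on line 47. The function names are hypothetical, and the sample line is only an assumed shape of the real file, so check your own copy before running it.

```python
import re
from pathlib import Path

# Matches the quoted upstream download URL, whatever sits between the repo and "ippicv/".
UPSTREAM_URL = re.compile(
    r'"https://raw\.githubusercontent\.com/opencv/opencv_3rdparty/[^"]*ippicv/"'
)


def patch_ippicv_url(cmake_text, local_url):
    """Return cmake_text with the upstream ippicv URL replaced by local_url."""
    patched, count = UPSTREAM_URL.subn(lambda _m: '"' + local_url + '"', cmake_text)
    if count == 0:
        raise ValueError("upstream ippicv URL not found; edit the file by hand")
    return patched


def patch_file(path, local_url):
    """Rewrite ippicv.cmake in place, keeping a .bak copy of the original."""
    cmake_file = Path(path)
    original = cmake_file.read_text()
    Path(str(cmake_file) + ".bak").write_text(original)
    cmake_file.write_text(patch_ippicv_url(original, local_url))


if __name__ == "__main__":
    patch_file("3rdparty/ippicv/ippicv.cmake", "file:/home/dong/Downloads/")
```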

After the build completes, check the OpenCV version.
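One way to do this check is from Python, since the build above enables the Python 3 bindings. A minimal sketch (the helper name `opencv_version` is made up here); it reports the version string exposed by the `cv2` module, or None when the bindings cannot be imported.

```python
import importlib


def opencv_version():
    """Return cv2.__version__, or None when the Python bindings are not importable."""
    try:
        cv2 = importlib.import_module("cv2")
    except ImportError:
        return None
    return cv2.__version__


if __name__ == "__main__":
    version = opencv_version()
    print(version if version else "cv2 is not importable; check the Python3 build options")
```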




Using darknet-ocr

Build darknet

# Clone the repository
git clone https://github.com/chineseocr/darknet-ocr.git

# Enter the project directory
cd darknet-ocr

# Build darknet
cd darknet && cp Makefile-GPU Makefile && make


Directory structure of the model files (tree output)

├── ocr
│   ├── chinese
│   │   ├── ocr.cfg
│   │   ├── ocr.json
│   │   └── ocr.weights
│   ├── ...
└── text
    ├── README.md
    ├── text.cfg
    └── text.weights
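Before starting the server it can save time to confirm that the model files are where the loader expects them. The sketch below checks only the files visible in the tree above. It assumes the tree is rooted at a directory called `models`, which is inferred from the paths in the load log, and the function name is hypothetical.

```python
from pathlib import Path

# Files listed in the tree above, relative to the assumed models directory.
EXPECTED_FILES = [
    "ocr/chinese/ocr.cfg",
    "ocr/chinese/ocr.json",
    "ocr/chinese/ocr.weights",
    "text/text.cfg",
    "text/text.weights",
]


def missing_model_files(models_dir="models"):
    """Return the expected model files that do not exist under models_dir."""
    root = Path(models_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_model_files()
    print("all model files present" if not missing else "missing: " + ", ".join(missing))
```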

Edit the config.py configuration file



pip3 install -r requirements.txt -i  http://mirrors.aliyun.com/pypi/simple --trusted-host mirrors.aliyun.com


python3 app.py 8080


root@t630:/home/dong/easyocr# python3 server.py 
learning_rate: Using default '0.001000'
momentum: Using default '0.900000'
policy: Using default 'constant'
max_batches: Using default '0'
layer     filters    size                  input                output
    0 conv     64  3 x 3 / 1   256 x  32 x   1   ->   256 x  32 x  64  0.009 BFLOPs
    1 max          2 x 2 / 2 x 2   256 x  32 x  64   ->   128 x  16 x  64
    2 conv    128  3 x 3 / 1   128 x  16 x  64   ->   128 x  16 x 128  0.302 BFLOPs
    3 max          2 x 2 / 2 x 2   128 x  16 x 128   ->    64 x   8 x 128
    4 conv    256  3 x 3 / 1    64 x   8 x 128   ->    64 x   8 x 256  0.302 BFLOPs
    5 conv    256  3 x 3 / 1    64 x   8 x 256   ->    64 x   8 x 256  0.604 BFLOPs
    6 max          2 x 2 / 2 x 1    64 x   8 x 256   ->    63 x   4 x 256
Unused field: 'strideW = 1'
Unused field: 'strideH = 2'
    7 conv    512  3 x 3 / 1    63 x   4 x 256   ->    63 x   4 x 512  0.595 BFLOPs
    8 conv    512  3 x 3 / 1    63 x   4 x 512   ->    63 x   4 x 512  1.189 BFLOPs
    9 max          2 x 2 / 2 x 1    63 x   4 x 512   ->    62 x   2 x 512
Unused field: 'strideW = 1'
Unused field: 'strideH = 2'
   10 conv    512  2 x 2 / 1    62 x   2 x 512   ->    61 x   1 x 512  0.128 BFLOPs
   11 conv  11316  1 x 1 / 1    61 x   1 x 512   ->    61 x   1 x11316  0.707 BFLOPs
Loading weights from models/ocr/chinese/ocr.weights...Done!
learning_rate: Using default '0.001000'
momentum: Using default '0.900000'
policy: Using default 'constant'
max_batches: Using default '0'
layer     filters    size                  input                output
    0 conv     64  3 x 3 / 1    32 x  32 x   3   ->    32 x  32 x  64  0.004 BFLOPs
    1 conv     64  3 x 3 / 1    32 x  32 x  64   ->    32 x  32 x  64  0.075 BFLOPs
    2 max          2 x 2 / 2 x 2    32 x  32 x  64   ->    16 x  16 x  64
    3 conv    128  3 x 3 / 1    16 x  16 x  64   ->    16 x  16 x 128  0.038 BFLOPs
    4 conv    128  3 x 3 / 1    16 x  16 x 128   ->    16 x  16 x 128  0.075 BFLOPs
    5 max          2 x 2 / 2 x 2    16 x  16 x 128   ->     8 x   8 x 128
    6 conv    256  3 x 3 / 1     8 x   8 x 128   ->     8 x   8 x 256  0.038 BFLOPs
    7 conv    256  3 x 3 / 1     8 x   8 x 256   ->     8 x   8 x 256  0.075 BFLOPs
    8 conv    256  3 x 3 / 1     8 x   8 x 256   ->     8 x   8 x 256  0.075 BFLOPs
    9 max          2 x 2 / 2 x 2     8 x   8 x 256   ->     4 x   4 x 256
   10 conv    512  3 x 3 / 1     4 x   4 x 256   ->     4 x   4 x 512  0.038 BFLOPs
   11 conv    512  3 x 3 / 1     4 x   4 x 512   ->     4 x   4 x 512  0.075 BFLOPs
   12 conv    512  3 x 3 / 1     4 x   4 x 512   ->     4 x   4 x 512  0.075 BFLOPs
   13 max          2 x 2 / 2 x 2     4 x   4 x 512   ->     2 x   2 x 512
   14 conv    512  3 x 3 / 1     2 x   2 x 512   ->     2 x   2 x 512  0.019 BFLOPs
   15 conv    512  3 x 3 / 1     2 x   2 x 512   ->     2 x   2 x 512  0.019 BFLOPs
   16 conv    512  3 x 3 / 1     2 x   2 x 512   ->     2 x   2 x 512  0.019 BFLOPs
   17 conv    512  3 x 3 / 1     2 x   2 x 512   ->     2 x   2 x 512  0.019 BFLOPs
   18 conv     40  1 x 1 / 1     2 x   2 x 512   ->     2 x   2 x  40  0.000 BFLOPs
Loading weights from models/text/text.weights...Done!

Once the log shows that the model files have finished loading, open http://[server IP address]:8080/text in a browser to try the demo.


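To make the service easier to call during testing, I wrote an endpoint that takes a Base64-encoded image and returns the recognition result.

# OCR test endpoint
# app, base64_to_PIL, text_ocr, scale, maxScale and TEXT_LINE_SCORE are
# expected to be defined or imported elsewhere in the server script.
import cv2
import numpy as np
from flask import jsonify, request

@app.route('/ocr/text_extract', methods=['POST', 'OPTIONS'])
def text_extract():
    # Read the Base64-encoded image from the JSON request body
    params = request.get_json()
    base64_image = params.get("base64_image")
    # Decode to a PIL image, then convert RGB to OpenCV's BGR channel order
    pil_image = base64_to_PIL(base64_image)
    np_image = np.array(pil_image)
    np_image = cv2.cvtColor(np_image, cv2.COLOR_RGB2BGR)
    # Run text detection and recognition
    data = text_ocr(np_image, scale, maxScale, TEXT_LINE_SCORE)

    return jsonify({ "message": "success", "code": 200, "data": data })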

JSON body to POST to the endpoint

{
    "base64_image": "..."
}
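A small client for this endpoint might look like the following. It is a sketch using only the standard library; `build_payload` and `post_image` are hypothetical names, and whether the server wants a bare Base64 string or a data-URI prefix depends on how `base64_to_PIL` is written, so the bare form used here is an assumption.

```python
import base64
import json
import urllib.request


def build_payload(image_bytes):
    """Wrap raw image bytes in the JSON structure the endpoint reads."""
    return {"base64_image": base64.b64encode(image_bytes).decode("ascii")}


def post_image(url, image_bytes, timeout=30):
    """POST the image as JSON and return the decoded JSON response."""
    body = json.dumps(build_payload(image_bytes)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Replace the placeholder host with the real server address before running.
    with open("test.jpg", "rb") as image_file:
        print(post_image("http://127.0.0.1:8080/ocr/text_extract", image_file.read()))
```

The Content-Type header matters because Flask's `request.get_json()` only parses the body when the request is marked as JSON.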

