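A few months ago I tested an OCR framework, darknet-ocr. I ran it on a CPU at the time: recognition quality was good, but a single image took about 10 s, which was a bit slow. I had no GPU on hand, so I tried the GPU version on an online GPU platform. For reasons I never figured out, it took longer on the GPU than on the CPU (see Issues), so I set it aside.
Over the past couple of days a small requirement came up and I thought of it again. I heard a colleague had a 1080 Ti server sitting idle in the machine room with a broken OS, so I reinstalled it with Ubuntu 18.04, installed CUDA and cuDNN, and tested the GPU version. Recognizing an image now takes roughly 3 to 4 s, which is acceptable. The deployment notes follow.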
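Environment
Server: Dell T630
OS: Ubuntu 18.04
PS: To install Ubuntu on this T630, I wrote the stock image to a USB stick, pressed F2 to enter System Setup and set the BIOS boot mode to UEFI, then rebooted and pressed F11 to boot from the USB stick, which brought up the installer. I installed the Server edition; an earlier attempt with the Desktop edition left the display garbled and unusable once the system was installed.
PS2: I ran all the commands below as root, after switching with sudo -i.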
Update the system
# Refresh the package list
apt update
# Upgrade the system and install some dependencies
apt upgrade && apt install bash build-essential ca-certificates cmake coreutils gcc g++ git gettext libavc1394-dev libc6-dev libffi-dev libpng-dev libwebp-dev make musl openssl python3 python3-dev python3-pip unzip zlib1g-dev libsm6 libxext6 libxrender-dev
Install the GPU driver
# The ubuntu-drivers command was not found, so install it first
apt install ubuntu-drivers-common
# List the available drivers
ubuntu-drivers devices
Output
root@t630:/home/dong# ubuntu-drivers devices
== /sys/devices/pci0000:00/0000:00:02.0/0000:04:00.0 ==
modalias : pci:v000010DEd00001B06sv000010DEsd0000120Fbc03sc00i00
vendor : NVIDIA Corporation
model : GP102 [GeForce GTX 1080 Ti]
driver : nvidia-driver-440-server - distro non-free
driver : nvidia-driver-455 - distro non-free recommended
driver : nvidia-driver-450 - distro non-free
driver : nvidia-driver-450-server - distro non-free
driver : nvidia-driver-390 - distro non-free
driver : nvidia-driver-418-server - distro non-free
driver : xserver-xorg-video-nouveau - distro free builtin
Install the driver flagged as recommended.
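Picking the recommended entry out of the list can also be scripted. The following is only a minimal sketch: the function name `recommended_driver` is made up for illustration, and it assumes the `driver : <package> - <flags>` line layout seen in the output above.

```python
import re

def recommended_driver(ubuntu_drivers_output):
    """Return the package on the 'driver' line flagged 'recommended', or None."""
    for line in ubuntu_drivers_output.splitlines():
        # Expected layout: "driver : nvidia-driver-455 - distro non-free recommended"
        match = re.match(r"\s*driver\s*:\s*(\S+)\s+-\s+(.*)$", line)
        if match and "recommended" in match.group(2).split():
            return match.group(1)
    return None

# Three lines copied from the output above
sample = """\
driver : nvidia-driver-440-server - distro non-free
driver : nvidia-driver-455 - distro non-free recommended
driver : xserver-xorg-video-nouveau - distro free builtin
"""
print(recommended_driver(sample))  # prints: nvidia-driver-455
```

On the server itself, the text passed to the function would be the captured output of `ubuntu-drivers devices`.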
# Install the driver
apt install nvidia-driver-455 nvidia-settings nvidia-prime
Reboot the system after the installation.
# Verify the driver installation
nvidia-smi
Output
root@t630:~# nvidia-smi
Fri Dec 4 06:47:13 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.38 Driver Version: 455.38 CUDA Version: 11.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... Off | 00000000:02:00.0 Off | N/A |
| 22% 33C P0 59W / 250W | 0MiB / 11178MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 108... Off | 00000000:04:00.0 Off | N/A |
| 22% 37C P0 61W / 250W | 0MiB / 11178MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 GeForce GTX 108... Off | 00000000:83:00.0 Off | N/A |
| 22% 38C P0 59W / 250W | 0MiB / 11178MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 GeForce GTX 108... Off | 00000000:84:00.0 Off | N/A |
| 20% 35C P0 60W / 250W | 0MiB / 11178MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
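Note: the output above shows "CUDA Version: 11.1", so here are a few words on CUDA versions.
1. CUDA has two APIs: the Runtime API and the Driver API.
The version printed by nvidia-smi is the CUDA Driver API version, that is, the highest CUDA version the installed driver supports. The version printed by nvcc is the CUDA Runtime API version of the installed toolkit. The two may differ, as long as the driver's version is not lower than the toolkit's.
2. The CUDA installer can install three major components: the NVIDIA driver, the toolkit, and the samples.
The NVIDIA driver controls the GPU hardware;
the toolkit includes the nvcc compiler, among other things;
the samples (or SDK) include many example programs, such as device query and bandwidth test.
The CUDA Driver API mentioned above is installed with the NVIDIA driver, while the CUDA Runtime API is installed with the CUDA toolkit.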
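To make the distinction between the two versions concrete, here is a rough sketch that extracts both numbers and compares them. The function names are hypothetical, and the `nvcc --version` line is an assumed example of what a CUDA 10.0 toolkit prints rather than output captured on this server.

```python
import re

def driver_cuda_version(nvidia_smi_output):
    """Highest CUDA version the driver supports, e.g. (11, 1), or None."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_output)
    return (int(match.group(1)), int(match.group(2))) if match else None

def toolkit_cuda_version(nvcc_output):
    """Toolkit (runtime) version from `nvcc --version`, e.g. (10, 0), or None."""
    match = re.search(r"release\s+(\d+)\.(\d+)", nvcc_output)
    return (int(match.group(1)), int(match.group(2))) if match else None

def driver_supports_toolkit(driver, toolkit):
    """True when both versions are known and the driver is at least as new."""
    return driver is not None and toolkit is not None and driver >= toolkit

# Header line copied from the nvidia-smi output above
smi_line = "| NVIDIA-SMI 455.38 Driver Version: 455.38 CUDA Version: 11.1 |"
# Assumed version line of `nvcc --version` for a CUDA 10.0 toolkit
nvcc_line = "Cuda compilation tools, release 10.0, V10.0.130"

driver = driver_cuda_version(smi_line)
toolkit = toolkit_cuda_version(nvcc_line)
print(driver, toolkit, driver_supports_toolkit(driver, toolkit))
# prints: (11, 1) (10, 0) True
```

Versions are compared as integer tuples rather than strings, so that 10.2 sorts below 11.1.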
Install CUDA
I first installed CUDA 10.2, but compiling darknet failed with errors, so I switched to CUDA 10.0 and the errors went away.
Download: https://developer.nvidia.com/cuda-toolkit-archive
Download path: [CUDA 10.0 - Linux - x86_64 - Ubuntu - runfile(local)]
sh cuda_10.0.130_410.48_linux.run
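The installer asks a few questions:
- Install the graphics driver? (no, the driver is already installed)
- Create the /usr/local/cuda symlink? (yes)
After the installation, append the following to the end of ~/.bashrc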
export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
Apply the configuration
source ~/.bashrc
Check the CUDA version
nvcc --version
# or
cat /usr/local/cuda/version.txt
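Install cuDNN
NVIDIA cuDNN is a GPU-accelerated library for deep neural networks.
Download: https://developer.nvidia.com/rdp/cudnn-download (registration required)
Nothing special to watch for here: just pick the cuDNN build that matches your CUDA version. I downloaded the deb packages.
# Install cuDNN for CUDA 10.0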
dpkg -i libcudnn7_7.6.5.32-1+cuda10.0_amd64.deb
dpkg -i libcudnn7-dev_7.6.5.32-1+cuda10.0_amd64.deb
dpkg -i libcudnn7-doc_7.6.5.32-1+cuda10.0_amd64.deb
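Because the CUDA version is encoded in each package file name, the match between cuDNN and CUDA can be checked before running dpkg. The sketch below assumes the naming pattern of the three files above (`<package>_<cudnn version>-<revision>+cuda<cuda version>_<arch>.deb`); `parse_cudnn_deb` and `matches_cuda` are hypothetical helpers, not part of any NVIDIA tool.

```python
import re

# Pattern inferred from names such as libcudnn7_7.6.5.32-1+cuda10.0_amd64.deb
DEB_PATTERN = re.compile(
    r"(libcudnn\d+(?:-dev|-doc)?)_([\d.]+)-\d+\+cuda([\d.]+)_(\w+)\.deb$"
)

def parse_cudnn_deb(filename):
    """Split a cuDNN deb file name into its parts, or return None."""
    match = DEB_PATTERN.match(filename)
    if not match:
        return None
    package, cudnn, cuda, arch = match.groups()
    return {"package": package, "cudnn": cudnn, "cuda": cuda, "arch": arch}

def matches_cuda(filename, cuda_version):
    """True if the package was built for the given CUDA version string."""
    info = parse_cudnn_deb(filename)
    return info is not None and info["cuda"] == cuda_version

for name in (
    "libcudnn7_7.6.5.32-1+cuda10.0_amd64.deb",
    "libcudnn7-dev_7.6.5.32-1+cuda10.0_amd64.deb",
    "libcudnn7-doc_7.6.5.32-1+cuda10.0_amd64.deb",
):
    print(name, matches_cuda(name, "10.0"))
```

All three files report True for "10.0"; a package built for another CUDA release would report False.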
Verify that cuDNN is installed correctly
cp -r /usr/src/cudnn_samples_v7/ $HOME
cd $HOME/cudnn_samples_v7/mnistCUDNN
make clean && make
./mnistCUDNN
Output
root@t630:~/cudnn_samples_v7/mnistCUDNN# ./mnistCUDNN
cudnnGetVersion() : 7605 , CUDNN_VERSION from cudnn.h : 7605 (7.6.5)
Host compiler version : GCC 7.5.0
There are 4 CUDA capable devices on your machine :
device 0 : sms 28 Capabilities 6.1, SmClock 1582.0 Mhz, MemSize (Mb) 11178, MemClock 5505.0 Mhz, Ecc=0, boardGroupID=0
device 1 : sms 28 Capabilities 6.1, SmClock 1582.0 Mhz, MemSize (Mb) 11178, MemClock 5505.0 Mhz, Ecc=0, boardGroupID=1
device 2 : sms 28 Capabilities 6.1, SmClock 1582.0 Mhz, MemSize (Mb) 11178, MemClock 5505.0 Mhz, Ecc=0, boardGroupID=2
device 3 : sms 28 Capabilities 6.1, SmClock 1582.0 Mhz, MemSize (Mb) 11178, MemClock 5505.0 Mhz, Ecc=0, boardGroupID=3
Using device 0
Testing single precision
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm ...
Fastest algorithm is Algo 1
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.021504 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.040960 time requiring 3464 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.044032 time requiring 57600 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.074752 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.075744 time requiring 207360 memory
Resulting weights from Softmax:
0.0000000 0.9999399 0.0000000 0.0000000 0.0000561 0.0000000 0.0000012 0.0000017 0.0000010 0.0000000
Loading image data/three_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 0.9999288 0.0000000 0.0000711 0.0000000 0.0000000 0.0000000 0.0000000
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 0.9999820 0.0000154 0.0000000 0.0000012 0.0000006
Result of classification: 1 3 5
Test passed!
Testing half precision (math in single precision)
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm ...
Fastest algorithm is Algo 1
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.018432 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.029728 time requiring 3464 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.043008 time requiring 28800 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.072704 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.073600 time requiring 207360 memory
Resulting weights from Softmax:
0.0000001 1.0000000 0.0000001 0.0000000 0.0000563 0.0000001 0.0000012 0.0000017 0.0000010 0.0000001
Loading image data/three_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 0.0000714 0.0000000 0.0000000 0.0000000 0.0000000
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 1.0000000 0.0000154 0.0000000 0.0000012 0.0000006
Result of classification: 1 3 5
Test passed!
Seeing "Test passed!" means cuDNN was installed successfully.
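If this check needs to run unattended, the sample's output can be scanned instead of read by eye. This is a small sketch with a hypothetical helper name; it assumes, based on the log above, that the sample runs twice (single and half precision) and should classify the three test images as 1 3 5 each time.

```python
import re

def mnist_cudnn_passed(output, expected_runs=2):
    """True if every precision run classified 1 3 5 and reported success."""
    results = re.findall(r"Result of classification:\s*(.+)", output)
    passes = output.count("Test passed!")
    return (
        passes == expected_runs
        and len(results) == expected_runs
        and all(result.split() == ["1", "3", "5"] for result in results)
    )

# Shortened version of the log above: one block per precision mode
sample_log = """\
Testing single precision
Result of classification: 1 3 5
Test passed!
Testing half precision (math in single precision)
Result of classification: 1 3 5
Test passed!
"""
print(mnist_cudnn_passed(sample_log))  # prints: True
```

In practice the argument would be the captured standard output of ./mnistCUDNN.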
Build OpenCV
Download: https://codeload.github.com/opencv/opencv/zip/4.0.1
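Upload the downloaded opencv-4.0.1.zip to the server.
During the build, the script also downloads ippicv from GitHub. That download failed on my machine, so it has to be fetched in advance.
# Unpack
unzip opencv-4.0.1.zip && cd opencv-4.0.1
# Put the downloaded ippicv archive in a directory on the server (e.g. file:/home/dong/Downloads/)
# Edit the cmake script to change the ippicv download location
vim 3rdparty/ippicv/ippicv.cmake
# On line 47, replace "https://raw.githubusercontent.com/opencv/opencv_3rdparty/${IPPICV_COMMIT}/ippicv/" with the directory where ippicv was just placed, in this example "file:/home/dong/Downloads/"
# Build OpenCV (currently in the opencv-4.0.1 directory)
mkdir build && cd build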
cmake -D CMAKE_BUILD_TYPE=RELEASE \
-D CMAKE_INSTALL_PREFIX=/usr/local \
-D BUILD_opencv_python2=OFF \
-D BUILD_opencv_python3=ON \
-D PYTHON3_EXECUTABLE=/usr/bin/python3 \
-D PYTHON3_INCLUDE_DIR=/usr/include/python3.6m \
-D PYTHON3_LIBRARY=/usr/lib/x86_64-linux-gnu/libpython3.6m.so \
-D PYTHON_NUMPY_PATH=/usr/local/lib/python3.6/dist-packages/numpy/ .. \
&& make -j2 && make install
After the build finishes, check the OpenCV version
/usr/local/bin/opencv_version
Output
4.0.1
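The manual edit of ippicv.cmake described in the build steps can also be scripted, which helps when the build has to be repeated. The sketch below is an assumption-laden illustration: `use_local_ippicv` is a made-up helper, and the remote URL string is what I believe OpenCV 4.0.1 ships in that file, so check it against your copy before relying on it.

```python
# Remote download prefix assumed to appear in 3rdparty/ippicv/ippicv.cmake
REMOTE_PREFIX = (
    "https://raw.githubusercontent.com/opencv/opencv_3rdparty/"
    "${IPPICV_COMMIT}/ippicv/"
)

def use_local_ippicv(cmake_text, local_dir):
    """Point the ippicv download at a local directory instead of GitHub."""
    if REMOTE_PREFIX not in cmake_text:
        raise ValueError("remote ippicv URL not found; check the file by hand")
    # Same form as used in this post, e.g. file:/home/dong/Downloads/
    local_url = "file:" + local_dir.rstrip("/") + "/"
    return cmake_text.replace(REMOTE_PREFIX, local_url)

# Tiny stand-in for the relevant part of ippicv.cmake
sample_cmake = 'URL\n  "' + REMOTE_PREFIX + '"\n'
print(use_local_ippicv(sample_cmake, "/home/dong/Downloads"))
```

To apply it for real, read 3rdparty/ippicv/ippicv.cmake, pass its contents through the function, and write the result back before running cmake. Raising an error when the URL is missing avoids silently leaving the file unchanged.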
Using darknet-ocr
Build darknet
# Clone the repository
git clone https://github.com/chineseocr/darknet-ocr.git
# Enter the project directory
cd darknet-ocr
# Build darknet with the GPU Makefile
cd darknet && cp Makefile-GPU Makefile && make
Place the model files
- Go to http://59.110.234.163:9990/static/models/darknet-ocr/models/ocr/chinese/ and download ocr.weights into the project's models/ocr/chinese/ directory
- Go to http://59.110.234.163:9990/static/models/darknet-ocr/models/text/ and download text.weights into the project's models/text/ directory
Directory tree under models/
.
├── ocr
│   ├── chinese
│   │   ├── ocr.cfg
│   │   ├── ocr.json
│   │   └── ocr.weights
│   ├── ...
└── text
    ├── README.md
    ├── text.cfg
    └── text.weights
Edit the config.py configuration file
GPU=True
Install the dependencies
pip3 install -r requirements.txt -i http://mirrors.aliyun.com/pypi/simple --trusted-host mirrors.aliyun.com
Run the service
python3 app.py 8080
Output
root@t630:/home/dong/easyocr# python3 server.py
learning_rate: Using default '0.001000'
momentum: Using default '0.900000'
policy: Using default 'constant'
max_batches: Using default '0'
layer filters size input output
0 conv 64 3 x 3 / 1 256 x 32 x 1 -> 256 x 32 x 64 0.009 BFLOPs
1 max 2 x 2 / 2 x 2 256 x 32 x 64 -> 128 x 16 x 64
2 conv 128 3 x 3 / 1 128 x 16 x 64 -> 128 x 16 x 128 0.302 BFLOPs
3 max 2 x 2 / 2 x 2 128 x 16 x 128 -> 64 x 8 x 128
4 conv 256 3 x 3 / 1 64 x 8 x 128 -> 64 x 8 x 256 0.302 BFLOPs
5 conv 256 3 x 3 / 1 64 x 8 x 256 -> 64 x 8 x 256 0.604 BFLOPs
6 max 2 x 2 / 2 x 1 64 x 8 x 256 -> 63 x 4 x 256
Unused field: 'strideW = 1'
Unused field: 'strideH = 2'
7 conv 512 3 x 3 / 1 63 x 4 x 256 -> 63 x 4 x 512 0.595 BFLOPs
8 conv 512 3 x 3 / 1 63 x 4 x 512 -> 63 x 4 x 512 1.189 BFLOPs
9 max 2 x 2 / 2 x 1 63 x 4 x 512 -> 62 x 2 x 512
Unused field: 'strideW = 1'
Unused field: 'strideH = 2'
10 conv 512 2 x 2 / 1 62 x 2 x 512 -> 61 x 1 x 512 0.128 BFLOPs
11 conv 11316 1 x 1 / 1 61 x 1 x 512 -> 61 x 1 x11316 0.707 BFLOPs
Loading weights from models/ocr/chinese/ocr.weights...Done!
learning_rate: Using default '0.001000'
momentum: Using default '0.900000'
policy: Using default 'constant'
max_batches: Using default '0'
layer filters size input output
0 conv 64 3 x 3 / 1 32 x 32 x 3 -> 32 x 32 x 64 0.004 BFLOPs
1 conv 64 3 x 3 / 1 32 x 32 x 64 -> 32 x 32 x 64 0.075 BFLOPs
2 max 2 x 2 / 2 x 2 32 x 32 x 64 -> 16 x 16 x 64
3 conv 128 3 x 3 / 1 16 x 16 x 64 -> 16 x 16 x 128 0.038 BFLOPs
4 conv 128 3 x 3 / 1 16 x 16 x 128 -> 16 x 16 x 128 0.075 BFLOPs
5 max 2 x 2 / 2 x 2 16 x 16 x 128 -> 8 x 8 x 128
6 conv 256 3 x 3 / 1 8 x 8 x 128 -> 8 x 8 x 256 0.038 BFLOPs
7 conv 256 3 x 3 / 1 8 x 8 x 256 -> 8 x 8 x 256 0.075 BFLOPs
8 conv 256 3 x 3 / 1 8 x 8 x 256 -> 8 x 8 x 256 0.075 BFLOPs
9 max 2 x 2 / 2 x 2 8 x 8 x 256 -> 4 x 4 x 256
10 conv 512 3 x 3 / 1 4 x 4 x 256 -> 4 x 4 x 512 0.038 BFLOPs
11 conv 512 3 x 3 / 1 4 x 4 x 512 -> 4 x 4 x 512 0.075 BFLOPs
12 conv 512 3 x 3 / 1 4 x 4 x 512 -> 4 x 4 x 512 0.075 BFLOPs
13 max 2 x 2 / 2 x 2 4 x 4 x 512 -> 2 x 2 x 512
14 conv 512 3 x 3 / 1 2 x 2 x 512 -> 2 x 2 x 512 0.019 BFLOPs
15 conv 512 3 x 3 / 1 2 x 2 x 512 -> 2 x 2 x 512 0.019 BFLOPs
16 conv 512 3 x 3 / 1 2 x 2 x 512 -> 2 x 2 x 512 0.019 BFLOPs
17 conv 512 3 x 3 / 1 2 x 2 x 512 -> 2 x 2 x 512 0.019 BFLOPs
18 conv 40 1 x 1 / 1 2 x 2 x 512 -> 2 x 2 x 40 0.000 BFLOPs
Loading weights from models/text/text.weights...Done!
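Once the log shows that both model files have been loaded, open http://[server IP address]:8080/text in a browser to try the demo.
Addendum
To make API testing easier, I wrote an endpoint that takes a Base64-encoded image and returns the recognition result.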
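# OCR test endpoint
# Assumes app, base64_to_PIL, text_ocr, scale, maxScale and TEXT_LINE_SCORE
# are already defined in the server script.
import cv2
import numpy as np
from flask import jsonify, request

@app.route('/ocr/text_extract', methods=['POST', 'OPTIONS'])
def text_extract():
    params = request.get_json()
    base64_image = params.get("base64_image")
    # Decode the Base64 string into a PIL image, then into a NumPy array
    pil_image = base64_to_PIL(base64_image)
    np_image = np.array(pil_image)
    # PIL yields RGB, while OpenCV works with BGR
    np_image = cv2.cvtColor(np_image, cv2.COLOR_RGB2BGR)
    data = text_ocr(np_image, scale, maxScale, TEXT_LINE_SCORE)
    return jsonify({"message": "success", "code": 200, "data": data})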
JSON body to POST to the endpoint
{
"base64_image": "..."
}
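For completeness, here is one way a client might build and send that JSON body using only the Python standard library. It is a sketch under assumptions: the helper names are invented, the URL in the usage note is a placeholder, and it sends plain Base64 without a data-URI prefix, which may or may not be what the server-side base64_to_PIL helper expects.

```python
import base64
import json
import urllib.request

def build_payload(image_bytes):
    """Encode raw image bytes into the JSON body shown above."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"base64_image": encoded}).encode("utf-8")

def build_request(server_url, image_bytes):
    """Create a POST request carrying the Base64-encoded image."""
    return urllib.request.Request(
        server_url,
        data=build_payload(image_bytes),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def ocr_image(server_url, image_path, timeout=30):
    """Send an image file to the endpoint and return the parsed JSON reply."""
    with open(image_path, "rb") as image_file:
        request = build_request(server_url, image_file.read())
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call would look like `ocr_image("http://[server IP address]:8080/ocr/text_extract", "sample.png")`, where both the address and the file name are placeholders.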
References