PaddleOCR decomposes an algorithm into the following parts and modularizes each one, which makes it easier to develop new algorithms.
The sections below introduce each part in turn and explain how to add the modules that a new algorithm requires.
Data loading and processing are composed of different modules that handle image reading, data augmentation and label production. This part is under ppocr/data. The files and folders are explained below:
ppocr/data/
├── imaug # Scripts for image reading, data augmentation and label production
│ ├── label_ops.py # Modules that transform the label
│ ├── operators.py # Modules that transform the image
│ ├──.....
├── __init__.py
├── lmdb_dataset.py # The dataset that reads the lmdb
└── simple_dataset.py # The dataset that reads files saved in the form of `image_path\tgt` (image path and ground truth separated by a tab)
PaddleOCR has a large number of built-in modules for image operations. For a module that is not built in, you can add your own by writing a class of the following form:
class MyModule:
    def __init__(self, *args, **kwargs):
        # your init code
        pass

    def __call__(self, data):
        img = data['image']
        label = data['label']
        # your process code

        data['image'] = img
        data['label'] = label
        return data
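As a concrete illustration of the template above, the following sketch fills in a small label-transforming module. The class name `TruncateLabel` and its `max_len` parameter are hypothetical and are not built-in PaddleOCR modules; the sketch only assumes that each sample is a dict with `image` and `label` keys, as in the template.

```python
class TruncateLabel:
    """Hypothetical module: strip whitespace and cap the label length."""

    def __init__(self, max_len=25, **kwargs):
        # max_len is an illustrative parameter that would be set in the config file
        self.max_len = max_len

    def __call__(self, data):
        label = data['label'].strip()
        # Cut labels that are longer than max_len characters
        data['label'] = label[:self.max_len]
        return data


sample = {'image': None, 'label': '  hello world  '}
result = TruncateLabel(max_len=5)(sample)
print(result['label'])  # hello
```

The module receives the whole sample dict and returns it, so it can be chained with other modules that read or write the same keys.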
The data processing modules are listed in the config file and executed in sequence, in the order in which they appear in the list. For example:
# angle class data process
transforms:
  - DecodeImage: # load image
      img_mode: BGR
      channel_first: False
  - MyModule:
      args1: args1
      args2: args2
  - KeepKeys:
      keep_keys: [ 'image', 'label' ] # dataloader will return list in this order
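The sequential execution described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not PaddleOCR's actual loader code: the `REGISTRY` dict, `build_ops`, `run_ops`, `AddPrefix` and `KeepKeysSketch` are all hypothetical names, and the config is assumed to be already parsed from YAML into a list of single-key dicts.

```python
class AddPrefix:
    """Hypothetical module that prepends a prefix to the label."""

    def __init__(self, prefix='', **kwargs):
        self.prefix = prefix

    def __call__(self, data):
        data['label'] = self.prefix + data['label']
        return data


class KeepKeysSketch:
    """Stand-in for KeepKeys: return the selected values as a list, in order."""

    def __init__(self, keep_keys, **kwargs):
        self.keep_keys = keep_keys

    def __call__(self, data):
        return [data[key] for key in self.keep_keys]


# Hypothetical registry that maps config names to classes
REGISTRY = {'AddPrefix': AddPrefix, 'KeepKeys': KeepKeysSketch}


def build_ops(op_configs):
    """Turn a list of {name: params} entries into module instances."""
    ops = []
    for entry in op_configs:
        (name, params), = entry.items()
        ops.append(REGISTRY[name](**(params or {})))
    return ops


def run_ops(data, ops):
    """Apply the modules one after another, in list order."""
    for op in ops:
        data = op(data)
    return data


# Parsed form of a `transforms:` list from the config file
config = [
    {'AddPrefix': {'prefix': 'a'}},
    {'AddPrefix': {'prefix': 'b'}},
    {'KeepKeys': {'keep_keys': ['image', 'label']}},
]
print(run_ops({'image': 'img', 'label': 'x'}, build_ops(config)))
# ['img', 'bax']
```

The output shows that list order matters: the second `AddPrefix` runs on the result of the first, and the final module decides which values are returned and in what order.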
The network part builds the network. PaddleOCR divides the network into four parts, which are under ppocr/modeling. Data entering the network passes through these four parts in sequence (transforms -> backbones -> necks -> heads).
├── architectures # Code for building network
├── transforms # Image Transformation Module
├── backbones # Feature extraction module
├── necks # Feature enhancement module
└── heads # Output module
PaddleOCR has built-in modules commonly used by algorithms such as DB, EAST, SAST, CRNN and Attention. For modules that are not built in, you can add them as follows. The four parts are added in the same way; take backbones as an example:
import paddle
import paddle.nn as nn
import paddle.nn.functional as F


class MyBackbone(nn.Layer):
    def __init__(self, *args, **kwargs):
        super(MyBackbone, self).__init__()
        # your init code
        self.conv = nn.xxxx

    def forward(self, inputs):
        # your network forward
        y = self.conv(inputs)
        return y
After adding the modules for the four parts of the network, you only need to set them in the configuration file to use them, for example:
Architecture:
  model_type: rec
  algorithm: CRNN
  Transform:
    name: MyTransform
    args1: args1
    args2: args2
  Backbone:
    name: MyBackbone
    args1: args1
  Neck:
    name: MyNeck
    args1: args1
  Head:
    name: MyHead
    args1: args1
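To make the link between this configuration and the four-part network more tangible, here is a minimal sketch of how such a config could be turned into a model. It assumes that the `name` key selects the class and the remaining keys become constructor arguments. Everything in it (`MODULES`, `build_module`, `BaseModelSketch`, and the `Scale` and `Sum` stubs that operate on plain lists instead of tensors) is hypothetical, and PaddleOCR's actual builder code may differ.

```python
class Scale:
    """Stub layer: multiplies every value by a factor."""

    def __init__(self, factor=1, **kwargs):
        self.factor = factor

    def __call__(self, x):
        return [v * self.factor for v in x]


class Sum:
    """Stub head: reduces the features to a single value."""

    def __init__(self, **kwargs):
        pass

    def __call__(self, x):
        return sum(x)


# Hypothetical registry; real modules would be nn.Layer subclasses
MODULES = {'Scale': Scale, 'Sum': Sum}


def build_module(config):
    """Instantiate the class named by `name` with the remaining keys."""
    config = dict(config)  # copy so the caller's dict is left unchanged
    cls = MODULES[config.pop('name')]
    return cls(**config)


class BaseModelSketch:
    ORDER = ('Transform', 'Backbone', 'Neck', 'Head')

    def __init__(self, arch_config):
        # In this sketch, parts that are absent from the config are skipped
        self.parts = [build_module(arch_config[key])
                      for key in self.ORDER if arch_config.get(key)]

    def __call__(self, x):
        # Data passes through the parts in the fixed order above
        for part in self.parts:
            x = part(x)
        return x


arch = {
    'model_type': 'rec',
    'Backbone': {'name': 'Scale', 'factor': 2},
    'Neck': {'name': 'Scale', 'factor': 3},
    'Head': {'name': 'Sum'},
}
print(BaseModelSketch(arch)([1, 2, 3]))  # 36
```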
Post-processing decodes the network output to obtain text boxes or recognized text. This part is under ppocr/postprocess. PaddleOCR has built-in post-processing modules for algorithms such as DB, EAST, SAST, CRNN and Attention. Modules that are not built in can be added as follows:
import paddle


class MyPostProcess:
    def __init__(self, *args, **kwargs):
        # your init code
        pass

    def __call__(self, preds, label=None, *args, **kwargs):
        if isinstance(preds, paddle.Tensor):
            preds = preds.numpy()
        # your preds decode code
        preds = self.decode_preds(preds)
        if label is None:
            return preds
        # your label decode code
        label = self.decode_label(label)
        return preds, label

    def decode_preds(self, preds):
        # your preds decode code
        pass

    def decode_label(self, label):
        # your label decode code
        pass
After the post-processing module is added, you only need to set it in the configuration file to use it, for example:
PostProcess:
  name: MyPostProcess
  args1: args1
  args2: args2
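As one possible way to fill in `decode_preds` and `decode_label`, the sketch below implements greedy CTC decoding, the kind of decoding typically used with CRNN-style recognizers. It is an illustration on plain Python lists rather than PaddleOCR's own implementation. The class name `GreedyCTCDecode`, the `characters` parameter and the convention that class index 0 is the CTC blank are assumptions made for this example.

```python
class GreedyCTCDecode:
    """Hypothetical post-process: greedy CTC decoding on plain Python lists.

    Assumes class index 0 is the CTC blank and index i maps to characters[i - 1].
    """

    def __init__(self, characters, **kwargs):
        self.characters = characters

    def __call__(self, preds, label=None, *args, **kwargs):
        texts = [self.decode_preds(seq) for seq in preds]
        if label is None:
            return texts
        return texts, [self.decode_label(ids) for ids in label]

    def decode_preds(self, probs):
        # probs holds one list of class scores per time step
        ids = [max(range(len(step)), key=step.__getitem__) for step in probs]
        chars, prev = [], None
        for idx in ids:
            # Skip repeated indices and blanks (index 0)
            if idx != prev and idx != 0:
                chars.append(self.characters[idx - 1])
            prev = idx
        return ''.join(chars)

    def decode_label(self, ids):
        return ''.join(self.characters[i - 1] for i in ids)


preds = [[
    [0.1, 0.8, 0.1],    # a
    [0.1, 0.7, 0.2],    # a (repeat, collapsed)
    [0.9, 0.05, 0.05],  # blank
    [0.2, 0.6, 0.2],    # a (new character after the blank)
    [0.1, 0.2, 0.7],    # b
    [0.1, 0.3, 0.6],    # b (repeat, collapsed)
]]
post = GreedyCTCDecode(characters='ab')
print(post(preds))            # ['aab']
print(post(preds, [[1, 2]]))  # (['aab'], ['ab'])
```

The blank between the second and third `a` is what allows a doubled character to survive the collapsing step.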
The loss function measures the distance between the network output and the label. This part is under ppocr/losses. PaddleOCR has built-in loss function modules for algorithms such as DB, EAST, SAST, CRNN and Attention. Modules that are not built in can be added as follows:
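import paddle
from paddle import nn


class MyLoss(nn.Layer):
    def __init__(self, **kwargs):
        super(MyLoss, self).__init__()
        # your init code, e.g. create the loss function used in forward
        pass

    def forward(self, predicts, batch):
        # batch comes from the dataloader; batch[1] is the label
        label = batch[1]
        # your loss code; self.loss is a placeholder that must be defined in __init__
        loss = self.loss(input=predicts, label=label)
        return {'loss': loss}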
After the loss function module is added, you only need to set it in the configuration file to use it, for example:
Loss:
  name: MyLoss
  args1: args1
  args2: args2
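To show the calling convention of a loss module without requiring Paddle, the following sketch computes a softmax cross-entropy on plain Python lists and returns it under the `loss` key. The class name `CrossEntropySketch` is hypothetical. A real module would subclass `nn.Layer` and use Paddle operations on tensors, so treat this only as an illustration of the inputs and of the returned dict.

```python
import math


class CrossEntropySketch:
    """Hypothetical loss on plain lists; a real one would subclass nn.Layer."""

    def __init__(self, **kwargs):
        pass

    def __call__(self, predicts, batch):
        labels = batch[1]  # batch[0] is the image, batch[1] the label
        total = 0.0
        for logits, target in zip(predicts, labels):
            # log-sum-exp with the maximum subtracted for numerical stability
            peak = max(logits)
            log_sum = peak + math.log(sum(math.exp(v - peak) for v in logits))
            # negative log-probability of the target class
            total += log_sum - logits[target]
        # mean over the batch, returned in a dict under the 'loss' key
        return {'loss': total / len(labels)}


loss_fn = CrossEntropySketch()
out = loss_fn([[0.0, 0.0]], [None, [0]])
print(round(out['loss'], 4))  # 0.6931, i.e. ln(2) for two equally likely classes
```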
The metric part measures the performance of the network on the current batch. This part is under ppocr/metrics. PaddleOCR has built-in evaluation modules for tasks such as detection, classification and recognition. Modules that are not built in can be added as follows:
class MyMetric(object):
    def __init__(self, main_indicator='acc', **kwargs):
        # main_indicator is used to select the best model
        self.main_indicator = main_indicator
        self.reset()

    def __call__(self, preds, batch, *args, **kwargs):
        # preds is the output of the post-process
        # batch is the output of the dataloader
        labels = batch[1]
        cur_correct_num = 0
        cur_all_num = 0
        # your metric code: update cur_correct_num and cur_all_num here
        self.correct_num += cur_correct_num
        self.all_num += cur_all_num
        return {'acc': cur_correct_num / cur_all_num, }

    def get_metric(self):
        """
        return metrics {
                 'acc': 0,
                 'norm_edit_dis': 0,
            }
        """
        acc = self.correct_num / self.all_num
        self.reset()
        return {'acc': acc}

    def reset(self):
        # reset metric
        self.correct_num = 0
        self.all_num = 0
After the metric module is added, you only need to set it in the configuration file to use it, for example:
Metric:
  name: MyMetric
  main_indicator: acc
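The difference between the per-batch value returned by `__call__` and the accumulated value returned by `get_metric` can be seen in a small filled-in version of the template. The class name `ExactMatchAcc` is hypothetical, and the sketch assumes that `preds` is a list of decoded strings and `batch[1]` a list of ground-truth strings.

```python
class ExactMatchAcc:
    """Hypothetical metric: fraction of predictions equal to their label."""

    def __init__(self, main_indicator='acc', **kwargs):
        self.main_indicator = main_indicator
        self.reset()

    def __call__(self, preds, batch, *args, **kwargs):
        labels = batch[1]
        correct = sum(p == t for p, t in zip(preds, labels))
        self.correct_num += correct
        self.all_num += len(labels)
        # Accuracy of the current batch only; max() guards against empty batches
        return {'acc': correct / max(len(labels), 1)}

    def get_metric(self):
        # Accuracy over every batch seen since the last reset
        acc = self.correct_num / max(self.all_num, 1)
        self.reset()
        return {'acc': acc}

    def reset(self):
        self.correct_num = 0
        self.all_num = 0


metric = ExactMatchAcc()
print(metric(['ab', 'cd'], [None, ['ab', 'xx']]))  # {'acc': 0.5}
print(metric(['ef'], [None, ['ef']]))              # {'acc': 1.0}
print(metric.get_metric())                         # acc is 2 / 3 over both batches
```

Because `get_metric` calls `reset`, the counters start from zero again for the next evaluation round.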
The optimizer is used to train the network. The optimizer part also contains the network regularization and learning rate decay modules. This part is under ppocr/optimizer. PaddleOCR has built-in commonly used optimizer modules such as Momentum, Adam and RMSProp, common learning rate decay modules such as Linear, Cosine, Step and Piecewise, and common regularization modules such as L1Decay and L2Decay.
Modules that are not built in can be added as follows; take optimizer as an example:
from paddle import optimizer as optim


class MyOptim(object):
    def __init__(self, learning_rate=0.001, *args, **kwargs):
        self.learning_rate = learning_rate

    def __call__(self, parameters):
        # It is recommended to wrap the built-in optimizer of paddle
        opt = optim.XXX(
            learning_rate=self.learning_rate,
            parameters=parameters)
        return opt
After the optimizer module is added, you only need to set it in the configuration file to use it, for example:
Optimizer:
  name: MyOptim
  args1: args1
  args2: args2
  lr:
    name: Cosine
    learning_rate: 0.001
  regularizer:
    name: 'L2'
    factor: 0
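For intuition about what a `Cosine` learning rate setting does, the sketch below evaluates the basic cosine decay formula in plain Python. It is an assumption-laden illustration: the function name `cosine_lr` and the `total_steps` parameter are hypothetical, and PaddleOCR's own `Cosine` module may add options such as warmup that are not modeled here.

```python
import math


def cosine_lr(step, total_steps, base_lr=0.001):
    """Cosine decay from base_lr at step 0 down to 0 at total_steps."""
    progress = min(step, total_steps) / total_steps
    return 0.5 * base_lr * (1 + math.cos(math.pi * progress))


for step in (0, 50, 100):
    print(step, round(cosine_lr(step, 100), 6))
# 0 0.001
# 50 0.0005
# 100 0.0
```

The rate falls slowly at first, fastest around the midpoint, and flattens out again as it approaches zero.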