教你如何用Pytorch搭建數(shù)英混合驗(yàn)證碼圖片識(shí)別模型

更新時(shí)間：2024年04月19日 12:37:40 作者：西北一條蟲(chóng)

大家都知道checkpoints存放的是模型文件，data存放的是數(shù)據(jù)集，本文給大家分享如何利用Pytorch搭建數(shù)英混合驗(yàn)證碼圖片識(shí)別模型包括普通卷積模塊，深度可分離卷積模塊，空間通道注意力模塊，殘差模塊，感興趣的朋友跟隨小編一起看看吧

一、數(shù)據(jù)集生成（create_data.py）

利用captcha模塊，大小寫(xiě)26位字母和0-9十個(gè)數(shù)字共62個(gè)字符，以每個(gè)字符為開(kāi)頭、后三位字符隨機(jī)選取的方式生成500張圖片，一共大約62*500張圖片數(shù)據(jù)集。

import os
import random
import sys
from captcha.image import ImageCaptcha
from tqdm import tqdm
# 用于生成驗(yàn)證碼的字符集
content_eng = '0123456789QWERTYUIOPASDFGHJKLZXCVBNMqwertyuiopasdfghjklzxcvbnm'
content_numb = '0123456789'
char_set_eng = list(content_eng)
char_set_numb = list(content_numb)
# 驗(yàn)證碼的長(zhǎng)度，每個(gè)驗(yàn)證碼由4個(gè)數(shù)字組成
CAPTCHA_LEN = 4
# 驗(yàn)證碼圖片的存放路徑
CAPTCHA_IMAGE_PATH = 'data/numb'
CAPTCHA_IMAGE_ENG_PATH = 'data/en'
def create_captcha(captcha_text, path):
    image = ImageCaptcha()
    img = image.generate_image(captcha_text)
    ImageCaptcha.create_noise_dots(img, color='yellow', width=3, number=30)
    ImageCaptcha.create_noise_curve(img, color='blue')
    img.save(path)
# 生成英文和數(shù)字驗(yàn)證碼圖片
def generate_en_captcha_image(charSet=char_set_eng, captchaImgPath=CAPTCHA_IMAGE_ENG_PATH, numbs=500):
    k = 0
    total = 1
    char_list = list(charSet)
    char_dict = dict(zip(range(len(char_list)), char_list))
    charSetLen = len(charSet)
    if not os.path.exists(captchaImgPath):
        os.makedirs(captchaImgPath)
    for i in range(charSetLen):
        total += numbs
    for i in tqdm(range(charSetLen)):
        for _ in range(numbs):
            chars = random.choices(char_list, k=3)
            captcha_text = str(char_list[i]) + ''.join(chars)
            file_path = captchaImgPath + captcha_text + '.jpg'
            try:
                create_captcha(captcha_text, file_path)
            except:
                pass
            k += 1

二、數(shù)據(jù)預(yù)處理 (utils.py)

讀取圖片并灰度化，將圖片長(zhǎng)寬統(tǒng)一成 [60, 160]，并進(jìn)行數(shù)據(jù)增強(qiáng)

class CaptchaSet(Dataset):
    def __init__(self, mode='train', root_path='data/en', split_size=0.8, size=[60, 160], seed=666, char_set='en'):
        super(CaptchaSet, self).__init__()
        self.paths = os.listdir(root_path)
        random.seed(seed)
        random.shuffle(self.paths)
        self.images = [os.path.join(root_path, img) for img in self.paths]
        self.labels = [img.split('.')[0] for img in self.paths]
        if char_set == 'en':
            chars = '0123456789QWERTYUIOPASDFGHJKLZXCVBNMqwertyuiopasdfghjklzxcvbnm'
            self.char_list = list(chars)
        if char_set == 'numb':
            chars = '0123456789'
            self.char_list = list(chars)
        self.char_dict = dict(zip(self.char_list, range(len(self.char_list))))
        idxs = int(len(self.images)*split_size)
        if mode == 'train':
            self.images = self.images[:idxs]
            self.labels = self.labels[:idxs]
        if mode == 'val':
            self.images = self.images[idxs:]
            self.labels = self.labels[idxs:]
        self.transform = transforms.Compose([
            lambda x: Image.open(x).convert('RGB'),
            transforms.Grayscale(),
            transforms.RandomRotation(0.1),
            transforms.RandomAffine(0.1),
            transforms.Resize(size),
            transforms.ToTensor(),
        ])
    def __getitem__(self, idx):
        img = self.images[idx]
        img = self.transform(img)
        label = self.labels[idx]
        label = [int(self.char_dict[i]) for i in label]
        # label = [int(i) for i in list(label)]
        label = torch.Tensor(label).long()
        return img, label
    def __len__(self):
        return len(self.images)

三、模型搭建 (models.py)

數(shù)據(jù)經(jīng)過(guò)模型的輸入輸出形狀如下

數(shù)據(jù)輸入維度：[batchsize, 1, h, w] # h, w 代表圖片的長(zhǎng)和寬

數(shù)據(jù)輸出維度：[batchsize, 4, n_classes] # n_classes 代碼字符類(lèi)別數(shù)量

模型中構(gòu)造了普通卷積模塊，深度可分離卷積模塊，空間通道注意力模塊，殘差模塊。

利用空間通道注意力學(xué)習(xí)字符的分布位置，最后直接輸出每個(gè)字符的類(lèi)別。

各個(gè)模塊代碼如下:

1)普通卷積模塊

class ConvBlock(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1, padding=1):
        super(ConvBlock, self).__init__()
        self.sequential = nn.Sequential(
            nn.Conv2d(
                in_channels=in_ch,
                out_channels=out_ch,
                kernel_size=kernel_size,
                stride=stride,
                padding=padding),
            nn.InstanceNorm2d(out_ch),
            nn.ReLU(inplace=True)
        )
    def forward(self, x):
        x = self.sequential(x)
        return x

2) 深度可分離卷積模塊

class DepthConv(nn.Module):
    def __init__(self, in_ch, kernel_size=3, stride=1, padding=1):
        super(DepthConv, self).__init__()
        self.depth_conv = nn.Conv2d(in_ch,
                                    in_ch,
                                    kernel_size,
                                    stride,
                                    padding,
                                    groups=in_ch,
                                    )
    def forward(self, x):
        x = self.depth_conv(x)
        return x
class DepthConvBlock(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1, padding=1):
        super(DepthConvBlock, self).__init__()
        self.depth = DepthConv(in_ch,
                               kernel_size=kernel_size,
                               stride=stride,
                               padding=padding)
        self.sequential = nn.Sequential(
            nn.Conv2d(in_channels=in_ch,
                      out_channels=out_ch,
                      kernel_size=1,
                      stride=1,
                      padding=0),
            nn.InstanceNorm2d(out_ch),
            nn.ReLU(inplace=True)
        )
    def forward(self, x):
        x = self.depth(x)
        x = self.sequential(x)
        return x

3) 空間通道注意力模塊：

class ChannelAttention(nn.Module):
    '''
    func: 實(shí)現(xiàn)通道Attention.
    parameters:
        in_channels: input的通道數(shù), input.size = (batch,channel,w,h) if batch_first else (channel,batch,,w,h)
        reduction: 默認(rèn)4. 即在FC的時(shí),存在in_channels --> in_channels//reduction --> in_channels的轉(zhuǎn)換
        batch_first: 默認(rèn)True.如input為channel_first，則batch_first = False
    '''
    def __init__(self, in_channels, reduction=4, batch_first=True):
        super(ChannelAttention, self).__init__()
        self.batch_first = batch_first
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)
        self.sharedMLP = nn.Sequential(
            nn.Conv2d(in_channels, in_channels // reduction, kernel_size=1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels // reduction, in_channels, kernel_size=1, bias=False),
        )
        self.sigmoid = nn.Sigmoid()
    def forward(self, x):
        if not self.batch_first:
            x = x.permute(1, 0, 2, 3)
        avgout = self.sharedMLP(self.avg_pool(x)) #size = (batch,in_channels,1,1)
        maxout = self.sharedMLP(self.max_pool(x)) #size = (batch,in_channels,1,1)
        w = self.sigmoid(avgout + maxout) #通道權(quán)重  size = (batch,in_channels,1,1)
        out = x * w.expand_as(x) #返回通道注意力后的值 size = (batch,in_channels,w,h)
        if not self.batch_first:
            out = out.permute(1, 0, 2, 3) #size = (channel,batch,w,h)
        return out
class SpatialAttention(nn.Module):
    '''
    func: 實(shí)現(xiàn)空間Attention.
    parameters:
        kernel_size: 卷積核大小, 可選3,5,7,
        batch_first: 默認(rèn)True.如input為channel_first，則batch_first = False
    '''
    def __init__(self, kernel_size=3, batch_first = True):
        super(SpatialAttention, self).__init__()
        assert kernel_size in (3, 5, 7), "kernel size must be 3 or 7"
        padding = kernel_size // 2
        self.batch_first = batch_first
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=padding, bias=False)
        self.sigmoid = nn.Sigmoid()
    def forward(self, x):
        if not self.batch_first:
            x = x.permute(1, 0, 2, 3)  #size = (batch,channels,w,h)
        avgout = torch.mean(x, dim=1, keepdim=True) #size = (batch,1,w,h)
        maxout, _ = torch.max(x, dim=1, keepdim=True)  #size = (batch,1,w,h)
        x1 = torch.cat([avgout, maxout], dim=1)    #size = (batch,2,w,h)
        x1 = self.conv(x1)    #size = (batch,1,w,h)
        w = self.sigmoid(x1)   #size = (batch,1,w,h)
        out = x * w            #size = (batch,channels,w,h)
        if not self.batch_first:
            out = out.permute(1, 0, 2, 3) #size = (channels,batch,w,h)
        return out
class CBAtten_Res(nn.Module):
    '''
    func:channel attention + spatial attention + resnet
    parameters:
        in_channels: input的通道數(shù), input.size = (batch,in_channels,w,h) if batch_first else (in_channels,batch,,w,h);
        out_channels: 輸出的通道數(shù)
        kernel_size: 默認(rèn)3, 可選[3,5,7]
        stride: 默認(rèn)2, 即改變out.size --> (batch,out_channels,w/stride, h/stride).
                一般情況下，out_channels = in_channels * stride
        reduction: 默認(rèn)4. 即在通道atten的FC的時(shí),存在in_channels --> in_channels//reduction --> in_channels的轉(zhuǎn)換
        batch_first:默認(rèn)True.如input為channel_first，則batch_first = False
    '''
    def __init__(self, in_channels, out_channels, kernel_size=3,
                 stride=2, reduction=4, batch_first=True):
        super(CBAtten_Res, self).__init__()
        self.batch_first = batch_first
        self.reduction = reduction
        self.padding = kernel_size // 2
        #h/2, w/2
        self.max_pool = nn.MaxPool2d(3, stride=stride, padding=self.padding)
        self.conv_res = nn.Conv2d(in_channels, out_channels,
                                  kernel_size=1,
                                  stride=1,
                                  bias=True)
        #h/2, w/2
        self.conv1 = nn.Conv2d(in_channels, out_channels,
                               kernel_size=kernel_size,
                               stride=stride,
                               padding=self.padding,
                               bias=True)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        self.ca = ChannelAttention(out_channels, reduction=self.reduction,
                                   batch_first=self.batch_first)
        self.sa = SpatialAttention(kernel_size=kernel_size,
                                   batch_first=self.batch_first)
    def forward(self, x):
        if not self.batch_first:
            x = x.permute(1, 0, 2, 3)  #size = (batch,in_channels,w,h)
        residual = x
        out = self.conv1(x)   #size = (batch,out_channels,w/stride,h/stride)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.ca(out)
        out = self.sa(out)  #size = (batch,out_channels,w/stride,h/stride)
        residual = self.max_pool(residual)  #size = (batch,in_channels,w/stride,h/stride)
        residual = self.conv_res(residual)  #size = (batch,out_channels,w/stride,h/stride)
        out += residual                       #殘差
        out = self.relu(out)                    #size = (batch,out_channels,w/stride,h/stride)
        if not self.batch_first:
            out = out.permute(1, 0, 2, 3)       #size = (out_channels,batch,w/stride,h/stride)
        return out

4）殘差模塊

class IRBlock(nn.Module):
    """
    IRB殘差塊: ConvBlock, DepthWiseConv, InstanceNorm2d, LeakyReLU, Conv2d, InstanceNorm2d
    rate: 輸入通道數(shù)乘以rate，要變換的通道數(shù)
    輸入與輸出維度保持不變
    """
    def __init__(self, in_ch, rate=2, kernel_size=1, stride=1, padding=0):
        super(IRBlock, self).__init__()
        res_ch = in_ch * rate
        self.conv1 = ConvBlock(in_ch, res_ch, kernel_size=kernel_size, stride=stride, padding=padding)
        self.dw1 = DepthConv(res_ch)
        self.sequential = nn.Sequential(
            nn.InstanceNorm2d(res_ch),
            nn.LeakyReLU(),
            nn.Conv2d(res_ch, in_ch, kernel_size=1, stride=1, padding=0),
            nn.InstanceNorm2d(in_ch)
        )
        self.down_conv = False
        if stride > 1:
            self.down_conv = nn.Conv2d(in_ch, in_ch, kernel_size=kernel_size, stride=stride, padding=padding)
    def forward(self, x):
        out = self.conv1(x)
        out = self.dw1(out)
        if self.down_conv:
            x = self.down_conv(x)
        out = self.sequential(out) + x
        return out

5）利用各個(gè)模塊搭建模型

class Net1(nn.Module):
    def __init__(self, in_ch=1, out_ch=4, n_classes=10):
        super(Net1, self).__init__()
        self.sequential = nn.Sequential(
            ConvBlock(in_ch, 64, kernel_size=3, stride=1, padding=1),          # [b, 1, 160, 60]
            ConvBlock(64, 64, kernel_size=1, stride=1, padding=0),         # /2
            CBAtten_Res(64, 64, kernel_size=3, reduction=1, stride=2),
            ConvBlock(64, 128, kernel_size=3, stride=1, padding=1),
            DepthConvBlock(128, 128, kernel_size=1, stride=1, padding=0),
            ConvBlock(128, 128, kernel_size=3, stride=1, padding=1),         # /2
            CBAtten_Res(128, 128, kernel_size=3, reduction=1, stride=2),
            ConvBlock(128, 256, kernel_size=1, stride=1, padding=0),
            IRBlock(256, 2),
            IRBlock(256, 2),
            IRBlock(256, 2),
            IRBlock(256, 2),
            ConvBlock(256, 256, kernel_size=1, stride=1, padding=0),
            CBAtten_Res(256, 256, kernel_size=3, reduction=1, stride=2),
            ConvBlock(256, 512, kernel_size=3, stride=1, padding=1),
            DepthConvBlock(512, 512, kernel_size=1, stride=1, padding=0),
            CBAtten_Res(512, 512, kernel_size=3, reduction=1, stride=1),
        )
        self.avg = nn.AdaptiveMaxPool2d((6, 16))        # [b, 512, 16, 6]
        self.linear1 = nn.Linear(96, out_ch)
        self.linear2 = nn.Linear(512, n_classes)
        self.drop = nn.Dropout(0.3)
        self.softmax = nn.Softmax(dim=2)
    def forward(self, x):
        out = self.sequential(x)
        out = self.avg(out)             # [b, 512, 16, 6]
        b, c, h, w = out.size()
        out = out.view((b, c, -1))          # [b, 512, 96]
        out = self.drop(out)
        out = self.linear1(out)              # [b, 4, 10]
        out = torch.transpose(out, 1, 2)
        out = self.linear2(out)
        out = self.softmax(out)
        return out
    def initialize(self):
        for m in self.modules():
            if isinstance(m, nn.Linear):
                nn.init.normal_(m.weight.data)
                nn.init.zeros_(m.bias.data)
            if isinstance(m, nn.Conv2d):
                nn.init.normal_(m.weight.data)
                nn.init.zeros_(m.bias.data)

模型參數(shù)量，權(quán)重占比信息：

四、模型訓(xùn)練（trian.py）

Loss：采用交叉熵?fù)p失，對(duì)每個(gè)位置預(yù)測(cè)的字符分別計(jì)算交叉熵，最后求和。

def loss3d(input, target, criteon):
    total_loss = torch.Tensor([0.])
    total_loss = total_loss.to(torch.device('cuda' if torch.cuda.is_available() else 'cpu'))
    total_loss = total_loss[0]
    for idx, _ in enumerate(range(len(input))):
        pred = input[idx]
        label = target[idx]
        loss = criteon(pred, label)
        total_loss += loss
    return total_loss / len(input)

訓(xùn)練代碼如下:

def train(net_path, n_classes=62, epochs=50, batch_size=32, lr=1e-4, root_path='data/en'):
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    if os.path.exists(net_path):    
        net_dict = torch.load(net_path)
        model = net_dict['model']
        best_acc = net_dict['best_acc']
    else:
        model = Net1(n_classes).to(device)
        best_acc = 0
    char_set = os.path.split(root_path)[-1]
    train_set = CaptchaSet(mode='train', root_path=root_path, char_set=char_set)
    train_laoder = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    val_set = CaptchaSet(mode='val', root_path=root_path, char_set=char_set)
    val_loader = DataLoader(val_set, batch_size=batch_size, shuffle=False)
    model = model.to(device)
    criteon = nn.CrossEntropyLoss().to(device)
    optim = torch.optim.Adam(model.parameters(), lr=lr)
    vis = Visdom()
    char_dict = train_set.char_dict
    char_dict = {str(key): value for value, key in char_dict.items()}
    for epoch in tqdm(range(1, epochs+1)):
        train_correct = 0
        train_result = 0
        val_correct = 0
        val_result = 0
        model.train()
        for i, (data, label) in enumerate(train_laoder):
            data, label = data.to(device), label.to(device)
            pred = model(data)
            # pred = pred[0]
            # label = label[0]
            train_loss = loss3d(pred, label, criteon)
            optim.zero_grad()
            train_loss.backward()
            optim.step()
            preds = torch.argmax(pred, dim=2)
            correct, result = calculate(preds, label)
            train_correct += correct
            train_result += result
            if i % 100 == 0:
                print('epoch:%s, step: %s, train_loss: %s' % (epoch, i, train_loss.mean().detach().cpu().item()))
        train_acc = train_correct / train_result
        model.eval()
        for data, label in val_loader:
            data, label = data.to(device), label.to(device)
            pred = model(data)
            val_loss = loss3d(pred, label, criteon)
            preds = torch.argmax(pred, dim=2)
            correct, result = calculate(preds, label)
            val_correct += correct
            val_result += result
        val_acc = val_correct / val_result
        if val_acc > best_acc:
            best_acc = val_acc
            net_dict = {
                'model': model,
                'char_dict': char_dict,
                'best_acc': best_acc,
            }
            torch.save(net_dict, 'best_net.h5')
        print('epoch: %s, train_loss: %s, train_acc: %s, val_loss: %s, val_acc: %s, best_acc: %s' % (epoch,
                                                                                                     train_loss.detach().cpu().item(),
                                                                                                     train_acc,
                                                                                                     val_loss.detach().cpu().item(),
                                                                                                     val_acc,
                                                                                                     best_acc
                                                                                                 ))
        data = data*255
        vis.images(data[:8], win='x')
        pred_text = preds[:8]
        pred_text = [[char_dict[str(char.item())] for char in chars] for chars in pred_text.detach().cpu()]
        label_text = label[:8]
        label_text = [[char_dict[str(char.item())] for char in chars] for chars in label_text.detach().cpu()]
        vis.text(str(pred_text), win='y')
        vis.text(str(label_text), win='true')
        net_dict = {
            'model': model,
            'char_dict': char_dict,
            'best_acc': best_acc,
        }
        torch.save(net_dict, 'net.h5')

經(jīng)過(guò)訓(xùn)練，在大小寫(xiě)識(shí)別錯(cuò)誤也算錯(cuò)誤的情況下，準(zhǔn)確度在百分之90以上，如果忽略大小寫(xiě)，則準(zhǔn)確度會(huì)更高。純數(shù)字驗(yàn)證碼識(shí)別準(zhǔn)確度在百分之98以上。

五、模型應(yīng)用 (predict.py)

python predict.py  -f data/en/0A3s.jpg

識(shí)別結(jié)果:

到此這篇關(guān)于教你如何用Pytorch搭建數(shù)英混合驗(yàn)證碼圖片識(shí)別模型的文章就介紹到這了,更多相關(guān)Pytorch數(shù)英驗(yàn)證碼內(nèi)容請(qǐng)搜索腳本之家以前的文章或繼續(xù)瀏覽下面的相關(guān)文章希望大家以后多多支持腳本之家！

您可能感興趣的文章: