
How to Build a Mixed Digit-and-Letter CAPTCHA Recognition Model with PyTorch

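 Updated: 2024-04-19 12:37:40   Author: 西北一條蟲  
In this project, checkpoints holds the model files and data holds the dataset. This article shows how to build a mixed digit-and-letter CAPTCHA image recognition model with PyTorch, covering a standard convolution block, a depthwise separable convolution block, a spatial and channel attention block, and a residual block.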

The project structure is as follows

checkpoints holds the model files, and data holds the dataset

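1. Generating the dataset (create_data.py)

The captcha module is used with a 62-character set: 26 uppercase letters, 26 lowercase letters, and the digits 0-9. For each character, 500 images are generated with that character in first position and the remaining three characters chosen at random, giving roughly 62*500 images in total.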

import os
import random
import sys
from captcha.image import ImageCaptcha
from tqdm import tqdm
# Character sets used to generate CAPTCHAs
content_eng = '0123456789QWERTYUIOPASDFGHJKLZXCVBNMqwertyuiopasdfghjklzxcvbnm'
content_numb = '0123456789'
char_set_eng = list(content_eng)
char_set_numb = list(content_numb)
# CAPTCHA length: each CAPTCHA consists of 4 characters
CAPTCHA_LEN = 4
# Directories where the CAPTCHA images are saved
CAPTCHA_IMAGE_PATH = 'data/numb'
CAPTCHA_IMAGE_ENG_PATH = 'data/en'
def create_captcha(captcha_text, path):
    image = ImageCaptcha()
    img = image.generate_image(captcha_text)
    ImageCaptcha.create_noise_dots(img, color='yellow', width=3, number=30)
    ImageCaptcha.create_noise_curve(img, color='blue')
    img.save(path)
# Generate CAPTCHA images containing letters and digits
def generate_en_captcha_image(charSet=char_set_eng, captchaImgPath=CAPTCHA_IMAGE_ENG_PATH, numbs=500):
    k = 0
    total = 1
    char_list = list(charSet)
    char_dict = dict(zip(range(len(char_list)), char_list))
    charSetLen = len(charSet)
    if not os.path.exists(captchaImgPath):
        os.makedirs(captchaImgPath)
    for i in range(charSetLen):
        total += numbs
    for i in tqdm(range(charSetLen)):
        for _ in range(numbs):
            chars = random.choices(char_list, k=3)
            captcha_text = str(char_list[i]) + ''.join(chars)
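            # Use os.path.join so the file is written inside captchaImgPath
            file_path = os.path.join(captchaImgPath, captcha_text + '.jpg')
            try:
                create_captcha(captcha_text, file_path)
            except Exception:
                # Skip an image that fails to render instead of aborting the run
                pass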
            k += 1
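
The "roughly 62*500" figure follows from how the labels are drawn. As a minimal sketch using only the standard library (the function name `make_labels` and the `seed` parameter are illustrative and not part of create_data.py), the labelling scheme can be reproduced without rendering any images:

```python
import random

CHARSET = '0123456789QWERTYUIOPASDFGHJKLZXCVBNMqwertyuiopasdfghjklzxcvbnm'

def make_labels(charset=CHARSET, per_char=500, length=4, seed=None):
    """Return CAPTCHA texts in which every character leads `per_char` labels."""
    rng = random.Random(seed)
    labels = []
    for lead in charset:
        for _ in range(per_char):
            # The remaining characters are drawn at random, with replacement
            tail = ''.join(rng.choices(charset, k=length - 1))
            labels.append(lead + tail)
    return labels

labels = make_labels(per_char=500, seed=0)
unique_labels = set(labels)
print(len(labels), len(unique_labels))
```

Because each image is saved under its own text as the file name, a repeated label overwrites the earlier file, so the number of files on disk can be slightly below 62*500.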

2. Data preprocessing (utils.py)

Each image is read and converted to grayscale, resized to a uniform [60, 160] (height, width), and passed through data augmentation

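import os
import random
import torch
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms
class CaptchaSet(Dataset):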
    def __init__(self, mode='train', root_path='data/en', split_size=0.8, size=[60, 160], seed=666, char_set='en'):
        super(CaptchaSet, self).__init__()
        self.paths = os.listdir(root_path)
        random.seed(seed)
        random.shuffle(self.paths)
        self.images = [os.path.join(root_path, img) for img in self.paths]
        self.labels = [img.split('.')[0] for img in self.paths]
        if char_set == 'en':
            chars = '0123456789QWERTYUIOPASDFGHJKLZXCVBNMqwertyuiopasdfghjklzxcvbnm'
            self.char_list = list(chars)
        if char_set == 'numb':
            chars = '0123456789'
            self.char_list = list(chars)
        self.char_dict = dict(zip(self.char_list, range(len(self.char_list))))
        idxs = int(len(self.images)*split_size)
        if mode == 'train':
            self.images = self.images[:idxs]
            self.labels = self.labels[:idxs]
        if mode == 'val':
            self.images = self.images[idxs:]
            self.labels = self.labels[idxs:]
        self.transform = transforms.Compose([
            lambda x: Image.open(x).convert('RGB'),
            transforms.Grayscale(),
            transforms.RandomRotation(0.1),
            transforms.RandomAffine(0.1),
            transforms.Resize(size),
            transforms.ToTensor(),
        ])
    def __getitem__(self, idx):
        img = self.images[idx]
        img = self.transform(img)
        label = self.labels[idx]
        label = [int(self.char_dict[i]) for i in label]
        # label = [int(i) for i in list(label)]
        label = torch.Tensor(label).long()
        return img, label
    def __len__(self):
        return len(self.images)
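
The label handling in `__getitem__` can be tried on its own. The following is a small sketch in plain Python (the helper names `encode`, `decode`, and `label_from_filename` are illustrative, not from utils.py) that mirrors how the file name becomes a list of class indices:

```python
CHARSET = '0123456789QWERTYUIOPASDFGHJKLZXCVBNMqwertyuiopasdfghjklzxcvbnm'

# Same mapping as self.char_dict: character -> class index
char_to_idx = {ch: i for i, ch in enumerate(CHARSET)}
# Inverse mapping, useful for turning predictions back into text
idx_to_char = {i: ch for ch, i in char_to_idx.items()}

def label_from_filename(name):
    """The label is the file name without its extension."""
    return name.split('.')[0]

def encode(text):
    return [char_to_idx[ch] for ch in text]

def decode(indices):
    return ''.join(idx_to_char[i] for i in indices)

encoded = encode(label_from_filename('0A3s.jpg'))
print(encoded)          # [0, 20, 3, 47]
print(decode(encoded))  # 0A3s
```

In the dataset class the index list is then wrapped in a long tensor so it can be used as a classification target.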

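3. Building the model (models.py)

The input and output shapes of the model are as follows.

Input shape: [batchsize, 1, h, w] # h and w are the image height and width

Output shape: [batchsize, 4, n_classes] # n_classes is the number of character classes

The model is built from a standard convolution block, a depthwise separable convolution block, a spatial and channel attention block, and a residual block.

Spatial and channel attention is used to learn where the characters are located, and the final layer directly outputs the class of each character.

The code for each block is as follows:

1) Standard convolution block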

import torch
import torch.nn as nn
class ConvBlock(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1, padding=1):
        super(ConvBlock, self).__init__()
        self.sequential = nn.Sequential(
            nn.Conv2d(
                in_channels=in_ch,
                out_channels=out_ch,
                kernel_size=kernel_size,
                stride=stride,
                padding=padding),
            nn.InstanceNorm2d(out_ch),
            nn.ReLU(inplace=True)
        )
    def forward(self, x):
        x = self.sequential(x)
        return x

2) Depthwise separable convolution block

class DepthConv(nn.Module):
    def __init__(self, in_ch, kernel_size=3, stride=1, padding=1):
        super(DepthConv, self).__init__()
        self.depth_conv = nn.Conv2d(in_ch,
                                    in_ch,
                                    kernel_size,
                                    stride,
                                    padding,
                                    groups=in_ch,
                                    )
    def forward(self, x):
        x = self.depth_conv(x)
        return x
class DepthConvBlock(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1, padding=1):
        super(DepthConvBlock, self).__init__()
        self.depth = DepthConv(in_ch,
                               kernel_size=kernel_size,
                               stride=stride,
                               padding=padding)
        self.sequential = nn.Sequential(
            nn.Conv2d(in_channels=in_ch,
                      out_channels=out_ch,
                      kernel_size=1,
                      stride=1,
                      padding=0),
            nn.InstanceNorm2d(out_ch),
            nn.ReLU(inplace=True)
        )
    def forward(self, x):
        x = self.depth(x)
        x = self.sequential(x)
        return x
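
To see why the depthwise-then-pointwise split is attractive, it helps to count parameters. The sketch below is plain arithmetic, not part of models.py; it assumes biases are enabled, as in the blocks above, and ignores InstanceNorm2d, which has no learnable parameters with its default settings:

```python
def conv_params(in_ch, out_ch, kernel_size):
    """Weights plus biases of a standard Conv2d."""
    return kernel_size * kernel_size * in_ch * out_ch + out_ch

def depthwise_separable_params(in_ch, out_ch, kernel_size):
    """Depthwise conv (groups=in_ch) followed by a 1x1 pointwise conv."""
    depthwise = kernel_size * kernel_size * in_ch + in_ch
    pointwise = in_ch * out_ch + out_ch
    return depthwise + pointwise

standard = conv_params(128, 128, 3)
separable = depthwise_separable_params(128, 128, 3)
print(standard, separable)  # 147584 17792
```

For a 3x3 kernel with 128 input and output channels, the separable version uses roughly one eighth of the parameters of the standard convolution.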

3) Spatial and channel attention block:

class ChannelAttention(nn.Module):
    '''
    func: channel attention.
    parameters:
        in_channels: number of input channels; input.size = (batch,channel,w,h) if batch_first else (channel,batch,w,h)
        reduction: default 4. The shared MLP maps in_channels --> in_channels//reduction --> in_channels
        batch_first: default True. Set batch_first = False if the input is channel-first
    '''
    def __init__(self, in_channels, reduction=4, batch_first=True):
        super(ChannelAttention, self).__init__()
        self.batch_first = batch_first
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)
        self.sharedMLP = nn.Sequential(
            nn.Conv2d(in_channels, in_channels // reduction, kernel_size=1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels // reduction, in_channels, kernel_size=1, bias=False),
        )
        self.sigmoid = nn.Sigmoid()
    def forward(self, x):
        if not self.batch_first:
            x = x.permute(1, 0, 2, 3)
        avgout = self.sharedMLP(self.avg_pool(x)) #size = (batch,in_channels,1,1)
        maxout = self.sharedMLP(self.max_pool(x)) #size = (batch,in_channels,1,1)
        w = self.sigmoid(avgout + maxout) #channel weights, size = (batch,in_channels,1,1)
        out = x * w.expand_as(x) #input scaled by the channel weights, size = (batch,in_channels,w,h)
        if not self.batch_first:
            out = out.permute(1, 0, 2, 3) #size = (channel,batch,w,h)
        return out
class SpatialAttention(nn.Module):
    '''
    func: spatial attention.
    parameters:
        kernel_size: convolution kernel size, one of 3, 5, 7
        batch_first: default True. Set batch_first = False if the input is channel-first
    '''
    def __init__(self, kernel_size=3, batch_first = True):
        super(SpatialAttention, self).__init__()
        assert kernel_size in (3, 5, 7), "kernel size must be 3, 5 or 7"
        padding = kernel_size // 2
        self.batch_first = batch_first
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=padding, bias=False)
        self.sigmoid = nn.Sigmoid()
    def forward(self, x):
        if not self.batch_first:
            x = x.permute(1, 0, 2, 3)  #size = (batch,channels,w,h)
        avgout = torch.mean(x, dim=1, keepdim=True) #size = (batch,1,w,h)
        maxout, _ = torch.max(x, dim=1, keepdim=True)  #size = (batch,1,w,h)
        x1 = torch.cat([avgout, maxout], dim=1)    #size = (batch,2,w,h)
        x1 = self.conv(x1)    #size = (batch,1,w,h)
        w = self.sigmoid(x1)   #size = (batch,1,w,h)
        out = x * w            #size = (batch,channels,w,h)
        if not self.batch_first:
            out = out.permute(1, 0, 2, 3) #size = (channels,batch,w,h)
        return out
class CBAtten_Res(nn.Module):
    '''
    func: channel attention + spatial attention + resnet
    parameters:
        in_channels: number of input channels; input.size = (batch,in_channels,w,h) if batch_first else (in_channels,batch,w,h)
        out_channels: number of output channels
        kernel_size: default 3, one of [3,5,7]
        stride: default 2, which changes out.size --> (batch,out_channels,w/stride, h/stride).
                Typically out_channels = in_channels * stride
        reduction: default 4. The channel-attention MLP maps in_channels --> in_channels//reduction --> in_channels
        batch_first: default True. Set batch_first = False if the input is channel-first
    '''
    def __init__(self, in_channels, out_channels, kernel_size=3,
                 stride=2, reduction=4, batch_first=True):
        super(CBAtten_Res, self).__init__()
        self.batch_first = batch_first
        self.reduction = reduction
        self.padding = kernel_size // 2
        #h/2, w/2
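        # MaxPool2d requires padding <= kernel_size // 2, so the 3x3 pool uses padding=1;
        # its output size still matches conv1 for kernel_size 3, 5 and 7
        self.max_pool = nn.MaxPool2d(3, stride=stride, padding=1)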
        self.conv_res = nn.Conv2d(in_channels, out_channels,
                                  kernel_size=1,
                                  stride=1,
                                  bias=True)
        #h/2, w/2
        self.conv1 = nn.Conv2d(in_channels, out_channels,
                               kernel_size=kernel_size,
                               stride=stride,
                               padding=self.padding,
                               bias=True)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        self.ca = ChannelAttention(out_channels, reduction=self.reduction,
                                   batch_first=self.batch_first)
        self.sa = SpatialAttention(kernel_size=kernel_size,
                                   batch_first=self.batch_first)
    def forward(self, x):
        if not self.batch_first:
            x = x.permute(1, 0, 2, 3)  #size = (batch,in_channels,w,h)
        residual = x
        out = self.conv1(x)   #size = (batch,out_channels,w/stride,h/stride)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.ca(out)
        out = self.sa(out)  #size = (batch,out_channels,w/stride,h/stride)
        residual = self.max_pool(residual)  #size = (batch,in_channels,w/stride,h/stride)
        residual = self.conv_res(residual)  #size = (batch,out_channels,w/stride,h/stride)
        out += residual                       #residual connection
        out = self.relu(out)                    #size = (batch,out_channels,w/stride,h/stride)
        if not self.batch_first:
            out = out.permute(1, 0, 2, 3)       #size = (out_channels,batch,w/stride,h/stride)
        return out

4) Residual block

class IRBlock(nn.Module):
    """
    IRB residual block: ConvBlock, DepthWiseConv, InstanceNorm2d, LeakyReLU, Conv2d, InstanceNorm2d
    rate: the input channel count is multiplied by rate to get the expanded channel count
    Input and output dimensions are unchanged
    """
    def __init__(self, in_ch, rate=2, kernel_size=1, stride=1, padding=0):
        super(IRBlock, self).__init__()
        res_ch = in_ch * rate
        self.conv1 = ConvBlock(in_ch, res_ch, kernel_size=kernel_size, stride=stride, padding=padding)
        self.dw1 = DepthConv(res_ch)
        self.sequential = nn.Sequential(
            nn.InstanceNorm2d(res_ch),
            nn.LeakyReLU(),
            nn.Conv2d(res_ch, in_ch, kernel_size=1, stride=1, padding=0),
            nn.InstanceNorm2d(in_ch)
        )
        self.down_conv = False
        if stride > 1:
            self.down_conv = nn.Conv2d(in_ch, in_ch, kernel_size=kernel_size, stride=stride, padding=padding)
    def forward(self, x):
        out = self.conv1(x)
        out = self.dw1(out)
        if self.down_conv:
            x = self.down_conv(x)
        out = self.sequential(out) + x
        return out

5) Assembling the model from the blocks

class Net1(nn.Module):
    def __init__(self, in_ch=1, out_ch=4, n_classes=10):
        super(Net1, self).__init__()
        self.sequential = nn.Sequential(
            ConvBlock(in_ch, 64, kernel_size=3, stride=1, padding=1),          # input [b, 1, 60, 160]
            ConvBlock(64, 64, kernel_size=1, stride=1, padding=0),         # /2
            CBAtten_Res(64, 64, kernel_size=3, reduction=1, stride=2),
            ConvBlock(64, 128, kernel_size=3, stride=1, padding=1),
            DepthConvBlock(128, 128, kernel_size=1, stride=1, padding=0),
            ConvBlock(128, 128, kernel_size=3, stride=1, padding=1),         # /2
            CBAtten_Res(128, 128, kernel_size=3, reduction=1, stride=2),
            ConvBlock(128, 256, kernel_size=1, stride=1, padding=0),
            IRBlock(256, 2),
            IRBlock(256, 2),
            IRBlock(256, 2),
            IRBlock(256, 2),
            ConvBlock(256, 256, kernel_size=1, stride=1, padding=0),
            CBAtten_Res(256, 256, kernel_size=3, reduction=1, stride=2),
            ConvBlock(256, 512, kernel_size=3, stride=1, padding=1),
            DepthConvBlock(512, 512, kernel_size=1, stride=1, padding=0),
            CBAtten_Res(512, 512, kernel_size=3, reduction=1, stride=1),
        )
        self.avg = nn.AdaptiveMaxPool2d((6, 16))        # adaptive max pooling despite the name, output [b, 512, 6, 16]
        self.linear1 = nn.Linear(96, out_ch)
        self.linear2 = nn.Linear(512, n_classes)
        self.drop = nn.Dropout(0.3)
        self.softmax = nn.Softmax(dim=2)
    def forward(self, x):
        out = self.sequential(x)
        out = self.avg(out)             # [b, 512, 6, 16]
        b, c, h, w = out.size()
        out = out.view((b, c, -1))          # [b, 512, 96]
        out = self.drop(out)
        out = self.linear1(out)              # [b, 512, 4]
        out = torch.transpose(out, 1, 2)     # [b, 4, 512]
        out = self.linear2(out)              # [b, 4, n_classes]
        out = self.softmax(out)
        return out
    def initialize(self):
        for m in self.modules():
            if isinstance(m, (nn.Linear, nn.Conv2d)):
                nn.init.normal_(m.weight.data)
                # The attention convolutions use bias=False, so bias can be None
                if m.bias is not None:
                    nn.init.zeros_(m.bias.data)
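
The shapes inside Net1 can be checked by hand with the usual convolution output-size formula. Here is a short sketch in plain Python (the helper names `conv_out` and `trace` are illustrative), assuming a 60x160 input and that only the four CBAtten_Res blocks, with strides 2, 2, 2 and 1, change the spatial size:

```python
def conv_out(size, kernel=3, stride=1):
    """Output length of a convolution with padding = kernel // 2."""
    pad = kernel // 2
    return (size + 2 * pad - kernel) // stride + 1

def trace(h=60, w=160, strides=(2, 2, 2, 1)):
    """Follow the spatial size through the strided CBAtten_Res blocks."""
    for s in strides:
        h, w = conv_out(h, 3, s), conv_out(w, 3, s)
    return h, w

feature_hw = trace()
pooled_hw = (6, 16)                    # target of nn.AdaptiveMaxPool2d((6, 16))
flat = pooled_hw[0] * pooled_hw[1]     # input features of linear1
print(feature_hw, flat)  # (8, 20) 96
```

The 8x20 feature map is pooled to 6x16 and flattened to 96 values per channel, which is why `linear1` takes 96 input features and maps them to the 4 character positions.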

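Model parameter count and weight distribution:

4. Training the model (train.py)

Loss: cross-entropy. The cross-entropy is computed separately for the character predicted at each position and the results are then combined: nn.CrossEntropyLoss averages over the four positions of a sample, and loss3d averages over the samples in the batch.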

def loss3d(input, target, criteon):
    # input: [batch, 4, n_classes], target: [batch, 4]
    # Accumulate on the same device and dtype as the predictions
    total_loss = input.new_zeros(())
    for pred, label in zip(input, target):
        # pred: [4, n_classes], label: [4]; criteon averages over the 4 positions
        total_loss = total_loss + criteon(pred, label)
    return total_loss / len(input)
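
To make the per-position loss concrete, the sketch below computes the same quantity for a single sample in plain Python. It is an illustration only (the names `cross_entropy` and `captcha_loss` are not from the article) and follows the standard definition that nn.CrossEntropyLoss uses, which applies log-softmax to its input internally:

```python
import math

def cross_entropy(logits, target):
    """-log softmax(logits)[target], computed with a stable log-sum-exp."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[target]

def captcha_loss(position_logits, targets):
    """Average the cross-entropy over the character positions of one sample."""
    losses = [cross_entropy(l, t) for l, t in zip(position_logits, targets)]
    return sum(losses) / len(losses)

# An untrained model that scores all 62 classes equally at each of the 4 positions
uniform = [[0.0] * 62 for _ in range(4)]
loss = captcha_loss(uniform, [0, 20, 3, 47])
print(round(loss, 4))  # 4.1271, i.e. log(62)
```

A loss near log(62) therefore means the model is doing no better than guessing uniformly over the 62 characters.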

The training code is as follows:

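import os
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from tqdm import tqdm
from visdom import Visdom
from models import Net1
from utils import CaptchaSet
# calculate(preds, label) is called below; its definition is not listed in this article
def train(net_path, n_classes=62, epochs=50, batch_size=32, lr=1e-4, root_path='data/en'):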
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    if os.path.exists(net_path):    
        net_dict = torch.load(net_path)
        model = net_dict['model']
        best_acc = net_dict['best_acc']
    else:
        # Pass n_classes by keyword: the first positional argument of Net1 is in_ch
        model = Net1(n_classes=n_classes).to(device)
        best_acc = 0
    char_set = os.path.split(root_path)[-1]
    train_set = CaptchaSet(mode='train', root_path=root_path, char_set=char_set)
    train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    val_set = CaptchaSet(mode='val', root_path=root_path, char_set=char_set)
    val_loader = DataLoader(val_set, batch_size=batch_size, shuffle=False)
    model = model.to(device)
    criteon = nn.CrossEntropyLoss().to(device)
    optim = torch.optim.Adam(model.parameters(), lr=lr)
    vis = Visdom()
    char_dict = train_set.char_dict
    char_dict = {str(key): value for value, key in char_dict.items()}
    for epoch in tqdm(range(1, epochs+1)):
        train_correct = 0
        train_result = 0
        val_correct = 0
        val_result = 0
        model.train()
        for i, (data, label) in enumerate(train_loader):
            data, label = data.to(device), label.to(device)
            pred = model(data)
            # pred = pred[0]
            # label = label[0]
            train_loss = loss3d(pred, label, criteon)
            optim.zero_grad()
            train_loss.backward()
            optim.step()
            preds = torch.argmax(pred, dim=2)
            correct, result = calculate(preds, label)
            train_correct += correct
            train_result += result
            if i % 100 == 0:
                print('epoch:%s, step: %s, train_loss: %s' % (epoch, i, train_loss.mean().detach().cpu().item()))
        train_acc = train_correct / train_result
        model.eval()
        # Gradients are not needed during validation
        with torch.no_grad():
            for data, label in val_loader:
                data, label = data.to(device), label.to(device)
                pred = model(data)
                val_loss = loss3d(pred, label, criteon)
                preds = torch.argmax(pred, dim=2)
                correct, result = calculate(preds, label)
                val_correct += correct
                val_result += result
        val_acc = val_correct / val_result
        if val_acc > best_acc:
            best_acc = val_acc
            net_dict = {
                'model': model,
                'char_dict': char_dict,
                'best_acc': best_acc,
            }
            torch.save(net_dict, 'best_net.h5')
        print('epoch: %s, train_loss: %s, train_acc: %s, val_loss: %s, val_acc: %s, best_acc: %s' % (epoch,
                                                                                                     train_loss.detach().cpu().item(),
                                                                                                     train_acc,
                                                                                                     val_loss.detach().cpu().item(),
                                                                                                     val_acc,
                                                                                                     best_acc
                                                                                                 ))
        data = data*255
        vis.images(data[:8], win='x')
        pred_text = preds[:8]
        pred_text = [[char_dict[str(char.item())] for char in chars] for chars in pred_text.detach().cpu()]
        label_text = label[:8]
        label_text = [[char_dict[str(char.item())] for char in chars] for chars in label_text.detach().cpu()]
        vis.text(str(pred_text), win='y')
        vis.text(str(label_text), win='true')
        net_dict = {
            'model': model,
            'char_dict': char_dict,
            'best_acc': best_acc,
        }
        torch.save(net_dict, 'net.h5')

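After training, accuracy is above 90% when confusing upper and lower case counts as an error; if case is ignored, accuracy is higher. For digit-only CAPTCHAs, recognition accuracy is above 98%.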
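
The difference between the strict and the case-insensitive figure can be illustrated on decoded strings. This is a hedged sketch: the article does not say whether accuracy is measured per character or per whole CAPTCHA, so the function below (`captcha_accuracy`, a made-up name) assumes whole-CAPTCHA matches, and the sample strings are invented for illustration:

```python
def captcha_accuracy(preds, labels, ignore_case=False):
    """Fraction of CAPTCHAs whose full text is predicted correctly."""
    if ignore_case:
        preds = [p.lower() for p in preds]
        labels = [t.lower() for t in labels]
    correct = sum(p == t for p, t in zip(preds, labels))
    return correct / len(labels)

# Hypothetical predictions: the second differs only in case, the third is wrong
preds = ['0A3s', 'xK9p', 'Qw12']
labels = ['0A3s', 'xk9p', 'Qw13']

strict = captcha_accuracy(preds, labels)
relaxed = captcha_accuracy(preds, labels, ignore_case=True)
print(strict, relaxed)
```

Predictions that differ from the label only in letter case are counted as errors by the strict metric and as hits by the relaxed one, which is why the second number can only be equal or higher.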

5. Applying the model (predict.py)

python predict.py  -f data/en/0A3s.jpg
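
predict.py itself is not listed here. As a minimal sketch of the decoding step such a script needs, assuming the model output for one image is a 4 x n_classes table of scores and that `char_dict` maps the index as a string to its character (the form saved in the checkpoint by the training code), the prediction can be turned into text like this; `decode_prediction` is an illustrative name:

```python
def decode_prediction(scores, char_dict):
    """Pick the highest-scoring class at each position and map it to a character."""
    text = []
    for row in scores:
        best = max(range(len(row)), key=row.__getitem__)  # argmax over classes
        text.append(char_dict[str(best)])
    return ''.join(text)

# Toy example with a digit-only character set
char_dict = {str(i): ch for i, ch in enumerate('0123456789')}
scores = [[0.0] * 10 for _ in range(4)]
for pos, digit in enumerate([7, 0, 4, 2]):
    scores[pos][digit] = 1.0

decoded_text = decode_prediction(scores, char_dict)
print(decoded_text)  # 7042
```

With a real checkpoint the scores would come from the model and `char_dict` from the saved dictionary, but the argmax-and-lookup step stays the same.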

Recognition result:

