Cartoonizing GIF Animations and Videos with Python
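Preface
I've tinkered with the model some more so that it can also cartoonize GIF animations and videos. After all, if it works on a single image, it should work on video too. So, following the punch I threw at the dimensional wall last time, here come two more kicks.
Project GitHub repository: GitHub link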
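Environment dependencies
Besides the dependencies from the referenced article, a few more are needed. requirements.txt is as follows:
If anything else about the environment is unclear, see the article linked in the preface, which explains it in detail.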
Core code
Without further ado, here is the GIF code first.
Cartoonizing GIF animations
The implementation is as follows:
```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time    : 2021/12/5 18:10
# @Author  : 劍客阿良_ALiang
# @Site    :
# @File    : gif_cartoon_tool.py

import os
import uuid

import imageio
import numpy as np
import torch
import torch.nn.functional as F
from PIL import Image, ImageSequence
from torch import nn
from torchvision.transforms.functional import to_tensor, to_pil_image


# -------------------------- hy add 01 --------------------------
class ConvNormLReLU(nn.Sequential):
    """Padding -> Conv2d -> GroupNorm -> LeakyReLU."""

    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1, padding=1, pad_mode="reflect", groups=1, bias=False):
        pad_layer = {
            "zero": nn.ZeroPad2d,
            "same": nn.ReplicationPad2d,
            "reflect": nn.ReflectionPad2d,
        }
        if pad_mode not in pad_layer:
            raise NotImplementedError

        super(ConvNormLReLU, self).__init__(
            pad_layer[pad_mode](padding),
            nn.Conv2d(in_ch, out_ch, kernel_size=kernel_size, stride=stride, padding=0, groups=groups, bias=bias),
            nn.GroupNorm(num_groups=1, num_channels=out_ch, affine=True),
            nn.LeakyReLU(0.2, inplace=True)
        )


class InvertedResBlock(nn.Module):
    """Inverted residual block: 1x1 expansion, depthwise conv, 1x1 projection."""

    def __init__(self, in_ch, out_ch, expansion_ratio=2):
        super(InvertedResBlock, self).__init__()

        self.use_res_connect = in_ch == out_ch
        bottleneck = int(round(in_ch * expansion_ratio))
        layers = []
        if expansion_ratio != 1:
            layers.append(ConvNormLReLU(in_ch, bottleneck, kernel_size=1, padding=0))

        # dw (depthwise)
        layers.append(ConvNormLReLU(bottleneck, bottleneck, groups=bottleneck, bias=True))
        # pw (pointwise)
        layers.append(nn.Conv2d(bottleneck, out_ch, kernel_size=1, padding=0, bias=False))
        layers.append(nn.GroupNorm(num_groups=1, num_channels=out_ch, affine=True))

        self.layers = nn.Sequential(*layers)

    def forward(self, input):
        out = self.layers(input)
        if self.use_res_connect:
            out = input + out
        return out


class Generator(nn.Module):
    def __init__(self):
        super().__init__()

        self.block_a = nn.Sequential(
            ConvNormLReLU(3, 32, kernel_size=7, padding=3),
            ConvNormLReLU(32, 64, stride=2, padding=(0, 1, 0, 1)),
            ConvNormLReLU(64, 64)
        )

        self.block_b = nn.Sequential(
            ConvNormLReLU(64, 128, stride=2, padding=(0, 1, 0, 1)),
            ConvNormLReLU(128, 128)
        )

        self.block_c = nn.Sequential(
            ConvNormLReLU(128, 128),
            InvertedResBlock(128, 256, 2),
            InvertedResBlock(256, 256, 2),
            InvertedResBlock(256, 256, 2),
            InvertedResBlock(256, 256, 2),
            ConvNormLReLU(256, 128),
        )

        self.block_d = nn.Sequential(
            ConvNormLReLU(128, 128),
            ConvNormLReLU(128, 128)
        )

        self.block_e = nn.Sequential(
            ConvNormLReLU(128, 64),
            ConvNormLReLU(64, 64),
            ConvNormLReLU(64, 32, kernel_size=7, padding=3)
        )

        self.out_layer = nn.Sequential(
            nn.Conv2d(32, 3, kernel_size=1, stride=1, padding=0, bias=False),
            nn.Tanh()
        )

    def forward(self, input, align_corners=True):
        out = self.block_a(input)
        half_size = out.size()[-2:]
        out = self.block_b(out)
        out = self.block_c(out)

        # align_corners=False upsamples by exactly 2x, so the output can be a few
        # pixels smaller than the input when its size is not a multiple of 4
        if align_corners:
            out = F.interpolate(out, half_size, mode="bilinear", align_corners=True)
        else:
            out = F.interpolate(out, scale_factor=2, mode="bilinear", align_corners=False)
        out = self.block_d(out)

        if align_corners:
            out = F.interpolate(out, input.size()[-2:], mode="bilinear", align_corners=True)
        else:
            out = F.interpolate(out, scale_factor=2, mode="bilinear", align_corners=False)
        out = self.block_e(out)

        out = self.out_layer(out)
        return out


# -------------------------- hy add 02 --------------------------
def handle(gif_path: str, output_dir: str, type: int, device='cpu'):
    _ext = os.path.basename(gif_path).strip().split('.')[-1]
    # Select the pretrained weights by type
    if type == 1:
        _checkpoint = './weights/paprika.pt'
    elif type == 2:
        _checkpoint = './weights/face_paint_512_v1.pt'
    elif type == 3:
        _checkpoint = './weights/face_paint_512_v2.pt'
    elif type == 4:
        _checkpoint = './weights/celeba_distill.pt'
    else:
        raise Exception('type not support')
    os.makedirs(output_dir, exist_ok=True)
    net = Generator()
    net.load_state_dict(torch.load(_checkpoint, map_location="cpu"))
    net.to(device).eval()
    result = os.path.join(output_dir, '{}.{}'.format(uuid.uuid1().hex, _ext))
    img = Image.open(gif_path)
    out_images = []
    with torch.no_grad():
        for frame in ImageSequence.Iterator(img):
            frame = frame.convert("RGB")
            # Scale pixels from [0, 1] to [-1, 1], the range the generator works in
            image = to_tensor(frame).unsqueeze(0) * 2 - 1
            out = net(image.to(device), False).cpu()
            # Map the tanh output from [-1, 1] back to [0, 1]
            out = out.squeeze(0).clip(-1, 1) * 0.5 + 0.5
            # Convert each PIL frame to a NumPy array for imageio
            out_images.append(np.asarray(to_pil_image(out)))
    imageio.mimsave(result, out_images, fps=15)
    return result


if __name__ == '__main__':
    print(handle('samples/gif/128.gif', 'samples/gif_result/', 3, 'cuda'))
```
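Code notes:
1. The main handle method takes, in order: the GIF path, the output directory, the type, and the device (default cpu; pass cuda to run on the GPU).
2. The type selects the model weights: 1 = paprika, 2 = face_paint_512_v1, 3 = face_paint_512_v2, 4 = celeba_distill. Type 3 works best, since it renders portraits more vividly.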
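In handle, the type-to-weights mapping from item 2 is a chain of if/elif branches. As a sketch of an alternative layout (not the author's code), the same mapping can live in a dictionary. The helper name resolve_checkpoint is hypothetical; the paths are the ones handle uses.

```python
# Checkpoint paths as used by handle(); keys are the `type` argument
CHECKPOINTS = {
    1: './weights/paprika.pt',
    2: './weights/face_paint_512_v1.pt',
    3: './weights/face_paint_512_v2.pt',  # recommended for portraits
    4: './weights/celeba_distill.pt',
}


def resolve_checkpoint(model_type: int) -> str:
    """Hypothetical helper: map a model type to its weights file."""
    try:
        return CHECKPOINTS[model_type]
    except KeyError:
        raise ValueError('type not supported: {}'.format(model_type)) from None


print(resolve_checkpoint(3))  # ./weights/face_paint_512_v2.pt
```

With this layout, supporting another weights file means adding one dictionary entry, and the valid types are available as sorted(CHECKPOINTS).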
Let's run it
Here is the GIF I prepared:
The execution output:
Let's look at the result:
Ha, pretty fun.
Cartoonizing videos
The implementation is as follows:
```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time    : 2021/12/5 0:26
# @Author  : 劍客阿良_ALiang
# @Site    :
# @File    : video_cartoon_tool.py

import os
import uuid

import cv2
import numpy as np
import torch
import torch.nn.functional as F
from ffmpy import FFmpeg
from PIL import ImageEnhance
from torch import nn
from torchvision.transforms.functional import to_tensor, to_pil_image


# -------------------------- hy add 01 --------------------------
class ConvNormLReLU(nn.Sequential):
    """Padding -> Conv2d -> GroupNorm -> LeakyReLU."""

    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1, padding=1, pad_mode="reflect", groups=1, bias=False):
        pad_layer = {
            "zero": nn.ZeroPad2d,
            "same": nn.ReplicationPad2d,
            "reflect": nn.ReflectionPad2d,
        }
        if pad_mode not in pad_layer:
            raise NotImplementedError

        super(ConvNormLReLU, self).__init__(
            pad_layer[pad_mode](padding),
            nn.Conv2d(in_ch, out_ch, kernel_size=kernel_size, stride=stride, padding=0, groups=groups, bias=bias),
            nn.GroupNorm(num_groups=1, num_channels=out_ch, affine=True),
            nn.LeakyReLU(0.2, inplace=True)
        )


class InvertedResBlock(nn.Module):
    """Inverted residual block: 1x1 expansion, depthwise conv, 1x1 projection."""

    def __init__(self, in_ch, out_ch, expansion_ratio=2):
        super(InvertedResBlock, self).__init__()

        self.use_res_connect = in_ch == out_ch
        bottleneck = int(round(in_ch * expansion_ratio))
        layers = []
        if expansion_ratio != 1:
            layers.append(ConvNormLReLU(in_ch, bottleneck, kernel_size=1, padding=0))

        # dw (depthwise)
        layers.append(ConvNormLReLU(bottleneck, bottleneck, groups=bottleneck, bias=True))
        # pw (pointwise)
        layers.append(nn.Conv2d(bottleneck, out_ch, kernel_size=1, padding=0, bias=False))
        layers.append(nn.GroupNorm(num_groups=1, num_channels=out_ch, affine=True))

        self.layers = nn.Sequential(*layers)

    def forward(self, input):
        out = self.layers(input)
        if self.use_res_connect:
            out = input + out
        return out


class Generator(nn.Module):
    def __init__(self):
        super().__init__()

        self.block_a = nn.Sequential(
            ConvNormLReLU(3, 32, kernel_size=7, padding=3),
            ConvNormLReLU(32, 64, stride=2, padding=(0, 1, 0, 1)),
            ConvNormLReLU(64, 64)
        )

        self.block_b = nn.Sequential(
            ConvNormLReLU(64, 128, stride=2, padding=(0, 1, 0, 1)),
            ConvNormLReLU(128, 128)
        )

        self.block_c = nn.Sequential(
            ConvNormLReLU(128, 128),
            InvertedResBlock(128, 256, 2),
            InvertedResBlock(256, 256, 2),
            InvertedResBlock(256, 256, 2),
            InvertedResBlock(256, 256, 2),
            ConvNormLReLU(256, 128),
        )

        self.block_d = nn.Sequential(
            ConvNormLReLU(128, 128),
            ConvNormLReLU(128, 128)
        )

        self.block_e = nn.Sequential(
            ConvNormLReLU(128, 64),
            ConvNormLReLU(64, 64),
            ConvNormLReLU(64, 32, kernel_size=7, padding=3)
        )

        self.out_layer = nn.Sequential(
            nn.Conv2d(32, 3, kernel_size=1, stride=1, padding=0, bias=False),
            nn.Tanh()
        )

    def forward(self, input, align_corners=True):
        out = self.block_a(input)
        half_size = out.size()[-2:]
        out = self.block_b(out)
        out = self.block_c(out)

        if align_corners:
            out = F.interpolate(out, half_size, mode="bilinear", align_corners=True)
        else:
            out = F.interpolate(out, scale_factor=2, mode="bilinear", align_corners=False)
        out = self.block_d(out)

        if align_corners:
            out = F.interpolate(out, input.size()[-2:], mode="bilinear", align_corners=True)
        else:
            out = F.interpolate(out, scale_factor=2, mode="bilinear", align_corners=False)
        out = self.block_e(out)

        out = self.out_layer(out)
        return out


# -------------------------- hy add 02 --------------------------
def handle(video_path: str, output_dir: str, type: int, fps: int, device='cpu'):
    _ext = os.path.basename(video_path).strip().split('.')[-1]
    # Select the pretrained weights by type
    if type == 1:
        _checkpoint = './weights/paprika.pt'
    elif type == 2:
        _checkpoint = './weights/face_paint_512_v1.pt'
    elif type == 3:
        _checkpoint = './weights/face_paint_512_v2.pt'
    elif type == 4:
        _checkpoint = './weights/celeba_distill.pt'
    else:
        raise Exception('type not support')
    os.makedirs(output_dir, exist_ok=True)
    # Extract the original audio track
    _audio = extract(video_path, output_dir, 'wav')
    net = Generator()
    net.load_state_dict(torch.load(_checkpoint, map_location="cpu"))
    net.to(device).eval()
    result = os.path.join(output_dir, '{}.{}'.format(uuid.uuid1().hex, _ext))
    capture = cv2.VideoCapture(video_path)
    size = (int(capture.get(cv2.CAP_PROP_FRAME_WIDTH)), int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    print(size)
    videoWriter = cv2.VideoWriter(result, cv2.VideoWriter_fourcc(*'mp4v'), fps, size)
    cul = 0
    with torch.no_grad():
        while True:
            ret, frame = capture.read()
            if not ret:
                break
            # OpenCV decodes frames as BGR; the model expects RGB
            frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            image = to_tensor(frame).unsqueeze(0) * 2 - 1
            out = net(image.to(device), False).cpu()
            out = out.squeeze(0).clip(-1, 1) * 0.5 + 0.5
            out = to_pil_image(out)
            # Increase contrast by a factor of 2
            contrast_enhancer = ImageEnhance.Contrast(out)
            img_enhanced_image = contrast_enhancer.enhance(2)
            enhanced_image = np.asarray(img_enhanced_image)
            # The generator can return a slightly smaller frame when the input size is
            # not a multiple of 4; VideoWriter expects every frame to match `size`
            if enhanced_image.shape[1::-1] != size:
                enhanced_image = cv2.resize(enhanced_image, size)
            # Convert back to BGR before handing the frame to OpenCV
            videoWriter.write(cv2.cvtColor(enhanced_image, cv2.COLOR_RGB2BGR))
            cul += 1
            print('Frame {}'.format(cul))
    capture.release()
    videoWriter.release()
    # Add the original audio back to the new video
    _final_video = video_add_audio(result, _audio, output_dir)
    return _final_video


# -------------------------- hy add 03 --------------------------
def extract(video_path: str, tmp_dir: str, ext: str):
    file_name = '.'.join(os.path.basename(video_path).split('.')[0:-1])
    print('File name: {}, extracting audio'.format(file_name))
    if ext == 'mp3':
        return _run_ffmpeg(video_path, os.path.join(tmp_dir, '{}.{}'.format(uuid.uuid1().hex, ext)), 'mp3')
    if ext == 'wav':
        return _run_ffmpeg(video_path, os.path.join(tmp_dir, '{}.{}'.format(uuid.uuid1().hex, ext)), 'wav')


def _run_ffmpeg(video_path: str, audio_path: str, format: str):
    # -f forces the output format; -vn drops the video stream
    ff = FFmpeg(inputs={video_path: None}, outputs={audio_path: '-f {} -vn'.format(format)})
    print(ff.cmd)
    ff.run()
    return audio_path


# Add audio to a video
def video_add_audio(video_path: str, audio_path: str, output_dir: str):
    _ext_video = os.path.basename(video_path).strip().split('.')[-1]
    _ext_audio = os.path.basename(audio_path).strip().split('.')[-1]
    if _ext_audio not in ['mp3', 'wav']:
        raise Exception('audio format not support')
    _codec = 'copy'
    if _ext_audio == 'wav':
        _codec = 'aac'
    result = os.path.join(
        output_dir, '{}.{}'.format(
            uuid.uuid4(), _ext_video))
    # Video from input 0, audio from input 1; copy the video stream and
    # stop at the end of the shorter stream
    ff = FFmpeg(
        inputs={video_path: None, audio_path: None},
        outputs={result: '-map 0:v -map 1:a -c:v copy -c:a {} -shortest'.format(_codec)})
    print(ff.cmd)
    ff.run()
    return result


if __name__ == '__main__':
    print(handle('samples/video/981.mp4', 'samples/video_result/', 3, 25, 'cuda'))
```
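Code notes:
1. The main method takes, in order: the video path, the output directory, the type, fps (the output frame rate), and the device (default cpu; pass cuda for GPU mode). fps should match the source video's frame rate, otherwise the cartoonized video runs faster or slower than the original audio.
2. The type selects the model; 3 works best, since it renders portraits more vividly.
3. Design: first extract the audio from the video, then process the video frame by frame and write the results to a new video, and finally merge the new video with the original audio.
For how to extract audio from a video, see my other article: Extracting audio from a video with Python.
For how to merge audio into a video, see my other article: Adding audio to a video with Python.
4. Intermediate files (the extracted WAV audio and the silent cartoonized video) are written to the output directory and are not cleaned up; modify the code to remove them if needed.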
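Items 3 and 4 together describe a pipeline: extract the audio, render a silent cartoon video, merge the two, and clean up. The sketch below is not the author's code. It assumes an ffmpeg binary on PATH and calls it through subprocess instead of ffmpy, building the same arguments that _run_ffmpeg and video_add_audio pass for the WAV case. Intermediates go into a tempfile.TemporaryDirectory, which is deleted automatically. The names extract_audio_cmd, merge_audio_cmd, cartoonize_with_audio, and render_silent_video are hypothetical; render_silent_video stands in for the frame-by-frame loop in handle.

```python
import os
import subprocess
import tempfile
from typing import Callable, List


def extract_audio_cmd(video_path: str, audio_path: str) -> List[str]:
    # Same options as _run_ffmpeg(..., 'wav'): force WAV output, drop the video stream
    return ['ffmpeg', '-i', video_path, '-f', 'wav', '-vn', audio_path]


def merge_audio_cmd(silent_video: str, audio_path: str, out_path: str) -> List[str]:
    # Same options as video_add_audio for WAV input: video from input 0,
    # audio from input 1, copy the video, encode audio as AAC, stop at the shorter stream
    return ['ffmpeg', '-i', silent_video, '-i', audio_path,
            '-map', '0:v', '-map', '1:a', '-c:v', 'copy', '-c:a', 'aac',
            '-shortest', out_path]


def cartoonize_with_audio(video_path: str, out_path: str,
                          render_silent_video: Callable[[str, str], None],
                          run=subprocess.run) -> str:
    # Intermediates live in a temporary directory that is removed on exit,
    # even if one of the steps raises
    with tempfile.TemporaryDirectory() as tmp_dir:
        audio_path = os.path.join(tmp_dir, 'audio.wav')
        silent_path = os.path.join(tmp_dir, 'silent.mp4')
        run(extract_audio_cmd(video_path, audio_path), check=True)
        render_silent_video(video_path, silent_path)  # hypothetical frame loop
        run(merge_audio_cmd(silent_path, audio_path, out_path), check=True)
    return out_path
```

Because run is a parameter, the command sequence can be checked without ffmpeg installed by passing a function that only records its arguments. The final output path should lie outside the temporary directory, since everything inside it is deleted.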
Let's verify
Below is a screenshot of the video I prepared; I'll upload the video to GitHub.
Execution output
Screenshot of the result
Pretty good, right?
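Summary
This time there's actually plenty to sum up. First, a few problems with the project's current models.
1. I tested quite a few images. Overall, the models don't cartoonize Asian face shapes very well, while European and American faces come out better. That points to insufficient training data, which is understandable: building annotated data specifically for cartoonization is a headache just to think about. So I'd recommend checking regularly whether the project has released newer models.
2. If a video has burned-in subtitles, the subtitles get cartoonized too. Footage where the video and subtitles are kept separate will give better results.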