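Making a Word-Cloud Video with Python: A Detailed Walkthrough

Updated: 2021-04-19 17:33:19  Author: 黑白的黒

This article walks through making a word-cloud video with Python. It explains the idea behind each step and gives detailed code, which makes it a good practice project.

Third-party libraries used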

Package         Version
--------------- ---------
baidu-aip       2.2.18.0
jieba           0.42.1
moviepy         1.0.3
numpy           1.20.2
opencv-python   4.5.1.48
Pillow          8.2.0
requests        2.25.1
wordcloud       1.8.1
you-get         0.4.1520

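Scraping Bilibili danmaku

Approach

Request the video's cid from its BV ID, use the cid to request the danmaku (bullet comment) file, then extract the comment text with a regular expression and save the result locally for the later steps. The code is short and the idea is simple, so it is not explained further here.

Implementation

cid request URL: https://api.bilibili.com/x/web-interface/view?bvid=

Danmaku request URL: https://api.bilibili.com/x/v1/dm/list.so?oid=

Reference code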

    import json
    import re

    import requests


    class DanmakuSpider:  # placeholder name; the original shows only the methods
        @classmethod
        def get_cid(cls, bv):
            """Return the cid of the video with the given BV ID, as a string."""
            url = "https://api.bilibili.com/x/web-interface/view?bvid=" + str(bv)
            response = requests.get(url)
            data = json.loads(response.text)
            cid = data['data']['cid']
            return str(cid)

        @classmethod
        def get_barrage(cls, bv, to_file_path):
            """Download the danmaku of a video and save them one per line."""
            headers = {
                "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36",
            }
            cid = cls.get_cid(bv)
            response = requests.get("https://api.bilibili.com/x/v1/dm/list.so?oid=" + cid, headers=headers)
            html_doc = response.content.decode('utf-8')
            # Each danmaku is an element of the form <d ...>text</d>
            regex = re.compile("<d.*?>(.*?)</d>")
            danmu = regex.findall(html_doc)
            with open(to_file_path, "w", encoding="utf_8") as f:
                for line in danmu:
                    f.write(line)
                    f.write("\n")
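
To see what the regular expression in get_barrage captures, here is a small offline sketch. The XML string is a made-up sample in the shape of the danmaku file (the attribute values are illustrative, not real data), and the function names are hypothetical. It also shows the standard-library XML parser as an alternative technique, which additionally decodes entities such as `&amp;`.

```python
import re
import xml.etree.ElementTree as ET

# Made-up sample shaped like the danmaku XML; values are illustrative only
SAMPLE_XML = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<i><chatid>1</chatid>'
    '<d p="1.5,1,25,16777215,0,0,a,1">hello</d>'
    '<d p="2.0,1,25,16777215,0,0,b,2">A &amp; B</d>'
    '</i>'
)


def extract_with_regex(xml_text):
    """Same pattern as the reference code: returns the raw text between the tags."""
    return re.findall(r"<d.*?>(.*?)</d>", xml_text)


def extract_with_parser(xml_text):
    """Alternative: let the XML parser find the <d> elements and decode entities."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [d.text or "" for d in root.iter("d")]


print(extract_with_regex(SAMPLE_XML))
print(extract_with_parser(SAMPLE_XML))
```

The regex version keeps `&amp;` as written in the file, while the parser version returns the decoded `&`. For a word cloud the difference rarely matters, but it is worth knowing when the saved text is reused elsewhere.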

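Downloading the video

Approach

Use the third-party open-source tool you-get to download the video.

[Figure: getting video information]

[Figure: downloading the video]

Audio and video sites supported by you-get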

Site	URL	Videos?	Images?	Audios?
YouTube	https://www.youtube.com/	✓
Twitter	https://twitter.com/	✓	✓
VK	http://vk.com/	✓	✓
Vine	https://vine.co/	✓
Vimeo	https://vimeo.com/	✓
Veoh	http://www.veoh.com/	✓
Tumblr	https://www.tumblr.com/	✓	✓	✓
TED	http://www.ted.com/	✓
SoundCloud	https://soundcloud.com/			✓
SHOWROOM	https://www.showroom-live.com/	✓
Pinterest	https://www.pinterest.com/		✓
MTV81	http://www.mtv81.com/	✓
Mixcloud	https://www.mixcloud.com/			✓
Metacafe	http://www.metacafe.com/	✓
Magisto	http://www.magisto.com/	✓
Khan Academy	https://www.khanacademy.org/	✓
Internet Archive	https://archive.org/	✓
Instagram	https://instagram.com/	✓	✓
InfoQ	http://www.infoq.com/presentations/	✓
Imgur	http://imgur.com/		✓
Heavy Music Archive	http://www.heavy-music.ru/			✓
Freesound	http://www.freesound.org/			✓
Flickr	https://www.flickr.com/	✓	✓
FC2 Video	http://video.fc2.com/	✓
Facebook	https://www.facebook.com/	✓
eHow	http://www.ehow.com/	✓
Dailymotion	http://www.dailymotion.com/	✓
Coub	http://coub.com/	✓
CBS	http://www.cbs.com/	✓
Bandcamp	http://bandcamp.com/			✓
AliveThai	http://alive.in.th/	✓
interest.me	http://ch.interest.me/tvn	✓
755 ナナゴーゴー	http://7gogo.jp/	✓	✓
niconico ニコニコ動畫	http://www.nicovideo.jp/	✓
163 網易視頻 網易云音樂	http://v.163.com/ http://music.163.com/	✓		✓
56網	http://www.56.com/	✓
AcFun	http://www.acfun.cn/	✓
Baidu 百度貼吧	http://tieba.baidu.com/	✓	✓
爆米花網	http://www.baomihua.com/	✓
bilibili 嗶哩嗶哩	http://www.bilibili.com/	✓	✓	✓
豆瓣	http://www.douban.com/	✓		✓
斗魚	http://www.douyutv.com/	✓
鳳凰視頻	http://v.ifeng.com/	✓
風行網	http://www.fun.tv/	✓
iQIYI 愛奇藝	http://www.iqiyi.com/	✓
激動網	http://www.joy.cn/	✓
酷6網	http://www.ku6.com/	✓
酷狗音樂	http://www.kugou.com/			✓
酷我音樂	http://www.kuwo.cn/			✓
樂視網	http://www.le.com/	✓
荔枝FM	http://www.lizhi.fm/			✓
懶人聽書	http://www.lrts.me/			✓
秒拍	http://www.miaopai.com/	✓
MioMio彈幕網	http://www.miomio.tv/	✓
MissEvan 貓耳FM	http://www.missevan.com/			✓
痞客邦	https://www.pixnet.net/	✓
PPTV聚力	http://www.pptv.com/	✓
齊魯網	http://v.iqilu.com/	✓
QQ 騰訊視頻	http://v.qq.com/	✓
企鵝直播	http://live.qq.com/	✓
Sina 新浪視頻微博秒拍視頻	http://video.sina.com.cn/ http://video.weibo.com/	✓
Sohu 搜狐視頻	http://tv.sohu.com/	✓
Tudou 土豆	http://www.tudou.com/	✓
陽光衛視	http://www.isuntv.com/	✓
Youku 優(yōu)酷	http://www.youku.com/	✓
戰(zhàn)旗TV	http://www.zhanqi.tv/lives	✓
央視網	http://www.cntv.cn/	✓
Naver 네이버	http://tvcast.naver.com/	✓
芒果TV	http://www.mgtv.com/	✓
火貓TV	http://www.huomao.com/	✓
陽光寬頻網	http://www.365yg.com/	✓
西瓜視頻	https://www.ixigua.com/	✓
新片場	https://www.xinpianchang.com/	✓
快手	https://www.kuaishou.com/	✓	✓
抖音	https://www.douyin.com/	✓
TikTok	https://www.tiktok.com/	✓
中國體育(TV)	http://v.zhibo.tv/ http://video.zhibo.tv/	✓
知乎	https://www.zhihu.com/	✓

# Show video information
you-get -i https://www.bilibili.com/video/BV1f4411M7QC
# Download the video
you-get --format=flv -o E:\Desktop\output https://www.bilibili.com/video/BV1f4411M7QC
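
If the download should be triggered from the same Python script as the other steps, the two commands above can be wrapped with the standard `subprocess` module. This is a minimal sketch: the helper names are hypothetical, and only the `-i`, `--format` and `-o` options shown above are used.

```python
import subprocess


def build_you_get_command(url, output_dir=None, fmt=None, info_only=False):
    """Assemble the you-get command line as a list of arguments."""
    cmd = ["you-get"]
    if info_only:
        cmd.append("-i")                       # show video information only
    if fmt:
        cmd.append("--format={}".format(fmt))  # e.g. "flv"
    if output_dir:
        cmd.extend(["-o", output_dir])         # output directory
    cmd.append(url)
    return cmd


def run_you_get(url, **kwargs):
    """Run you-get; raises CalledProcessError if it exits with an error."""
    subprocess.run(build_you_get_command(url, **kwargs), check=True)
```

For example, `run_you_get("https://www.bilibili.com/video/BV1f4411M7QC", output_dir=r"E:\Desktop\output", fmt="flv")` corresponds to the download command above.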

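Clipping video and audio, and extracting audio

Approach

The requirement here is simple: cut a segment out of a video or audio file and save it.

moviepy is a third-party Python library that can cut and concatenate video, cut, concatenate, and extract audio, and merge audio with video.

Reference code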

    from moviepy.editor import AudioFileClip, VideoFileClip  # moviepy 1.0.3


    class MediaEditor:  # placeholder name; the original shows only the methods
        @classmethod
        def cut_video(cls, origin_file_path, to_file_path, start, end):
            """
            Cut a segment out of a video.
            :param origin_file_path: path of the source video file
            :param to_file_path: path to save the result to
            :param start: start time
            :param end: end time
            """
            clip = VideoFileClip(origin_file_path).subclip(start, end)
            clip.write_videofile(to_file_path)

        @classmethod
        def cut_audio(cls, origin_file_path, to_file_path, start, end):
            """
            Cut a segment out of an audio file.
            :param origin_file_path: path of the source audio file
            :param to_file_path: path to save the result to
            :param start: start time
            :param end: end time
            """
            clip = AudioFileClip(origin_file_path).subclip(start, end)
            clip.write_audiofile(to_file_path)

        @classmethod
        def get_audio_from_video(cls, video_file_path, to_file_path):
            """
            Extract the audio track of a video.
            :param video_file_path: path of the video file
            :param to_file_path: path of the audio file to write
            """
            video = VideoFileClip(video_file_path)
            video.audio.write_audiofile(to_file_path)

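Extracting video frames

Approach

Open the video file with opencv-python (cv2), read it frame by frame, and save each frame to a folder.

[Figure: extracted video frames]

Reference code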

    import os

    import cv2


    class VideoSplitter:  # placeholder name; the original shows only the method
        @classmethod
        def split(cls, from_file_path, to_folder_path, frames=0):
            """
            Read a video frame by frame and save the frames as images.
            :param from_file_path: path of the video
            :param to_folder_path: folder to save the frames to
            :param frames: number of frames to save; 0 saves all frames
            """
            vc = cv2.VideoCapture(from_file_path)  # open the video file with cv2
            c = 0
            while vc.isOpened():
                if 0 < frames == c:  # the requested number of frames has been saved
                    break
                ret, frame = vc.read()  # read the next frame
                if not ret:  # end of the video, or the frame could not be read
                    break
                cv2.imwrite(os.path.join(to_folder_path, '{}.jpg'.format(c)), frame)
                c += 1
                print('Image {} saved'.format(c))
            vc.release()

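Image binarization

Approach

There are two ways to binarize the images here: one uses opencv, the other uses the portrait segmentation API of Baidu AI Cloud.

Each has its strengths and weaknesses:

opencv is fast, but it can only binarize the whole image and cannot reliably isolate the main subject. It suits only images with a solid-color background and clear outlines; when the image has a busy background or other distracting content, the result is not good enough to serve as a word-cloud mask.
Baidu's portrait segmentation API can cut the person out of the image and binarize the person alone, but it is slow (processing is slow and the API limits concurrency): a thousand images often take one to two hours.

So in practice, choose between the two according to the video.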
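
To make the limitation of whole-image thresholding concrete, here is a minimal numpy sketch. It uses a plain global threshold, which is simpler than the adaptive threshold in the reference code, and a tiny synthetic "frame"; the function name and pixel values are made up for illustration.

```python
import numpy as np


def binarize(gray, threshold=128):
    """Global threshold: pixels >= threshold become 255, the rest 0."""
    return np.where(gray >= threshold, 255, 0).astype(np.uint8)


# Synthetic 4x6 grayscale frame: a bright subject (columns 1-2) and an
# equally bright background object (column 5).
frame = np.array([
    [10, 200, 200, 10, 10, 220],
    [10, 200, 200, 10, 10, 220],
    [10, 200, 200, 10, 10, 10],
    [10, 10, 10, 10, 10, 10],
], dtype=np.uint8)

mask = binarize(frame)
print(mask)
```

Both bright regions end up white, so thresholding alone cannot tell the subject from the background. A segmentation model labels the person explicitly, which is what the Baidu API provides.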

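The two methods produce the following results (figure 1 is cv2, figure 2 is Baidu portrait segmentation):

[Figure: binarization with cv2]

[Figure: binarization with Baidu portrait segmentation]

Reference code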

    import base64

    import cv2
    import numpy as np
    from aip import AipBodyAnalysis  # from the baidu-aip package


    class Binarizer:  # placeholder name; the original shows only the methods
        # Assumption: the original does not show how cls.client is created.
        # Replace the placeholders with your own Baidu AI Cloud credentials.
        client = AipBodyAnalysis('APP_ID', 'API_KEY', 'SECRET_KEY')

        @classmethod
        def binary_option_cv2(cls, from_file_path, to_file_path):
            """
            Binarize an image with cv2 and save it.
            :param from_file_path: path of the source image
            :param to_file_path: path of the binarized image
            """
            img = cv2.imread(from_file_path)
            gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
            new_gray = np.uint8((255 * (gray / 255.0) ** 1.4))  # gamma correction
            dst = cv2.adaptiveThreshold(new_gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 15, 1)
            dst = cv2.medianBlur(dst, 5)  # median filter to remove speckle noise
            cv2.imwrite(to_file_path, dst)

        @classmethod
        def binary_option_baidu(cls, from_file_path, to_file_path):
            """
            Binarize an image with Baidu portrait segmentation and save it.
            :param from_file_path: path of the source image
            :param to_file_path: path of the binarized image
            """
            height, width, _ = cv2.imread(from_file_path).shape
            with open(from_file_path, 'rb') as fp:
                image = fp.read()
            res = cls.client.bodySeg(image)
            labelmap = base64.b64decode(res['labelmap'])
            labelimg = np.frombuffer(labelmap, np.uint8)  # raw bytes to a uint8 array
            labelimg = cv2.imdecode(labelimg, 1)
            labelimg = cv2.resize(labelimg, (width, height), interpolation=cv2.INTER_NEAREST)
            img_new = np.where(labelimg == 1, 255, labelimg)  # map label 1 (person) to 255
            cv2.imwrite(to_file_path, img_new)

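Generating word-cloud images

Approach

Use the wordcloud library, with the Bilibili danmaku scraped earlier as the word-cloud content and the binarized images as masks.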

[Figure: word-cloud result]

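Stitching original and word-cloud frames and compositing the video

Approach

Stitch the images with numpy, then use cv2 to write the stitched frames to a video stream and save it.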
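
A minimal sketch of the stitching step, with synthetic arrays standing in for an original frame and its word-cloud frame (the sizes are arbitrary). `np.hstack` requires both images to have the same height and channel count, and the result is twice as wide, which is why the reference code doubles the width passed to the video writer.

```python
import numpy as np

# Stand-ins for cv2.imread results: height x width x 3 (BGR) uint8 arrays
original = np.zeros((360, 640, 3), dtype=np.uint8)
word_cloud = np.full((360, 640, 3), 255, dtype=np.uint8)

combined = np.hstack((original, word_cloud))  # place the frames side by side
height, width, _ = combined.shape
print(height, width)
```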

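To keep the video aligned with the audio track, generate the video at the same frame rate as the original. The original frame rate can be read in a media player or obtained with cv2.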

[Figure: stitching result]

Reference code

    import os

    import cv2
    import numpy as np


    class VideoComposer:  # placeholder name; the original shows only the method
        @classmethod
        def joint(cls, origin_folder, word_cloud_folder, to_file_path):
            """
            Stitch the images in batch and composite them into a video.
            :param origin_folder: folder of the original frames
            :param word_cloud_folder: folder of the word-cloud images
            :param to_file_path: path to save the video to
            """
            num_list = [int(str(i).split('.')[0]) for i in os.listdir(origin_folder)]
            fps = 30  # video frame rate; adjust it to match the original video
            first = cv2.imread(os.path.join(origin_folder, '{}.jpg'.format(num_list[0])))
            height, width, _ = first.shape  # frame height and width
            width = width * 2  # two frames side by side
            # Create the video writer
            video_writer = cv2.VideoWriter(to_file_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (width, height))
            for i in sorted(num_list):  # numeric order keeps the frames in sequence
                name = '{}.jpg'.format(i)
                ori_jpg = os.path.join(origin_folder, name)
                word_jpg = os.path.join(word_cloud_folder, name)
                ori_arr = cv2.imread(ori_jpg)
                word_arr = cv2.imread(word_jpg)
                # Stitch the two frames horizontally with numpy
                com_arr = np.hstack((ori_arr, word_arr))
                video_writer.write(com_arr)  # write the stitched frame to the video stream
                print("{} written to the video stream".format(ori_jpg))
            video_writer.release()  # finalize the output file
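
The reference code converts the file names to integers before sorting. The short sketch below (file names made up) shows why: sorted as strings, 10.jpg comes before 2.jpg, which would scramble the frame order.

```python
names = ["0.jpg", "1.jpg", "2.jpg", "10.jpg", "11.jpg"]

as_strings = sorted(names)  # lexicographic order
as_numbers = sorted(names, key=lambda name: int(name.split(".")[0]))  # frame order

print(as_strings)
print(as_numbers)
```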

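Merging audio and video and exporting

Approach

As in the stitching step above, two inputs are combined into one output: here moviepy attaches the audio track to the video.

Reference code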

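    from moviepy.editor import AudioFileClip, VideoFileClip  # moviepy 1.0.3


    class AudioVideoMerger:  # placeholder name; the original shows only the method
        @classmethod
        def set_audio_for_video(cls, video_file_path, audio_file_path, to_file_path):
            """
            Merge an audio track into a video.
            :param video_file_path: path of the video file
            :param audio_file_path: path of the audio file
            :param to_file_path: path to save the result to
            """
            video = VideoFileClip(video_file_path)
            audio = AudioFileClip(audio_file_path)
            new_video = video.set_audio(audio)
            new_video.write_videofile(to_file_path)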

Final result