
Automatically Generating Year of the Tiger Acrostic Poems with Python and PaddleNLP

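 Updated: 2022-01-23 14:21:05   Author: Livingbody
This article shows how to use Python and PaddleNLP to generate acrostic poems for the Year of the Tiger automatically. The sample code is explained in detail, so interested readers can follow along and try it themselves.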

I. Data Processing

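This project uses a classical Chinese poetry dataset as the training set. The encoder receives the first character (the head) of each couplet, and the decoder uses the encoder's information to generate the whole couplet. To keep consecutive couplets coherent, the encoder input also includes the preceding couplets, placed before the head character. For example: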

“白日依山盡,黃河入海流。欲窮千里目,更上一層樓。” yields two samples:

Sample 1: encoder input, “白”; decoder input, “白日依山盡,黃河入海流。”

Sample 2: encoder input, “白日依山盡,黃河入海流。欲”; decoder input, “欲窮千里目,更上一層樓。”
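
The sample construction described above can be sketched in plain Python. This is only an illustration, not the project's code: the function name `make_samples` is hypothetical, and the sketch assumes a poem is a single string whose couplets each end with '。'.

```python
def make_samples(poem):
    """Split a poem into (prefix, head, target) triples.

    Assumption for this sketch: couplets are terminated by '。'.
    """
    couplets = [c for c in poem.split("。") if c]
    samples = []
    for i, couplet in enumerate(couplets):
        prefix = "。".join(couplets[:i])  # all couplets before this one
        head = couplet[0]                 # first character of this couplet
        samples.append((prefix, head, couplet + "。"))
    return samples


poem = "白日依山盡,黃河入海流。欲窮千里目,更上一層樓。"
samples = make_samples(poem)
for prefix, head, target in samples:
    print(repr(prefix), head, target)
```

The encoder sees the prefix followed by the head character, and the decoder is trained to produce the target couplet.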

1. Upgrading paddlenlp

!pip install -U paddlenlp
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting paddlenlp
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/17/9b/4535ccf0e96c302a3066bd2e4d0f44b6b1a73487c6793024475b48466c32/paddlenlp-2.2.3-py3-none-any.whl (1.2MB)
Requirement already satisfied, skipping upgrade: h5py in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from paddlenlp) (2.9.0)
Requirement already satisfied, skipping upgrade: colorlog in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from paddlenlp) (4.1.0)
Requirement already satisfied, skipping upgrade: colorama in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from paddlenlp) (0.4.4)
Requirement already satisfied, skipping upgrade: seqeval in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from paddlenlp) (1.2.2)
Requirement already satisfied, skipping upgrade: jieba in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from paddlenlp) (0.42.1)
Requirement already satisfied, skipping upgrade: multiprocess in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from paddlenlp) (0.70.11.1)
Requirement already satisfied, skipping upgrade: six in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from h5py->paddlenlp) (1.16.0)
Requirement already satisfied, skipping upgrade: numpy>=1.7 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from h5py->paddlenlp) (1.20.3)
Requirement already satisfied, skipping upgrade: scikit-learn>=0.21.3 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from seqeval->paddlenlp) (0.24.2)
Requirement already satisfied, skipping upgrade: dill>=0.3.3 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from multiprocess->paddlenlp) (0.3.3)
Requirement already satisfied, skipping upgrade: scipy>=0.19.1 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from scikit-learn>=0.21.3->seqeval->paddlenlp) (1.6.3)
Requirement already satisfied, skipping upgrade: threadpoolctl>=2.0.0 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from scikit-learn>=0.21.3->seqeval->paddlenlp) (2.1.0)
Requirement already satisfied, skipping upgrade: joblib>=0.11 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from scikit-learn>=0.21.3->seqeval->paddlenlp) (0.14.1)
Installing collected packages: paddlenlp
  Found existing installation: paddlenlp 2.1.1
    Uninstalling paddlenlp-2.1.1:
      Successfully uninstalled paddlenlp-2.1.1
Successfully installed paddlenlp-2.2.3

2. Extracting the head characters

import re

# For every poem read from the file, collect the head character of each couplet
poems_samples = []
poems_prefix = []
poems_heads = []
# Use a context manager so the file is closed after reading
with open("./data/data70759/poems_zh.txt", encoding="utf8") as poems_file:
    for line in poems_file:
        # Couplets end with '。'; split the poem on it
        line_ = re.sub('。', ' ', line)
        line_ = line_.split()
        # Build one training sample per couplet
        for i, p in enumerate(line_):
            poems_heads.append(p[0])
            poems_prefix.append('。'.join(line_[:i]))
            poems_samples.append(p + '。')


# Print the first 20 samples
for i in range(20):
    print("poems heads:{}, poems_prefix: {}, poems:{}".format(poems_heads[i], poems_prefix[i], poems_samples[i]))
poems heads:欲, poems_prefix: , poems:欲出未出光辣達,千山萬山如火發。
poems heads:須, poems_prefix: 欲出未出光辣達,千山萬山如火發, poems:須臾走向天上來,逐卻殘星趕卻月。
poems heads:未, poems_prefix: , poems:未離海底千山黑,才到天中萬國明。
poems heads:滿, poems_prefix: , poems:滿目江山四望幽,白云高卷嶂煙收。
poems heads:日, poems_prefix: 滿目江山四望幽,白云高卷嶂煙收, poems:日回禽影穿疏木,風遞猿聲入小樓。
poems heads:遠, poems_prefix: 滿目江山四望幽,白云高卷嶂煙收。日回禽影穿疏木,風遞猿聲入小樓, poems:遠岫似屏橫碧落,斷帆如葉截中流。
poems heads:片, poems_prefix: , poems:片片飛來靜又閑,樓頭江上復山前。
poems heads:飄, poems_prefix: 片片飛來靜又閑,樓頭江上復山前, poems:飄零盡日不歸去,帖破清光萬里天。
poems heads:因, poems_prefix: , poems:因登巨石知來處,勃勃元生綠蘚痕。
poems heads:靜, poems_prefix: 因登巨石知來處,勃勃元生綠蘚痕, poems:靜即等閑藏草木,動時頃刻徧乾坤。
poems heads:橫, poems_prefix: 因登巨石知來處,勃勃元生綠蘚痕。靜即等閑藏草木,動時頃刻徧乾坤, poems:橫天未必朋元惡,捧日還曾瑞至尊。
poems heads:不, poems_prefix: 因登巨石知來處,勃勃元生綠蘚痕。靜即等閑藏草木,動時頃刻徧乾坤。橫天未必朋元惡,捧日還曾瑞至尊, poems:不獨朝朝在巫峽,楚王何事謾勞魂。
poems heads:若, poems_prefix: , poems:若教作鎮居中國,爭得泥金在泰山。
poems heads:才, poems_prefix: , poems:才聞暖律先偷眼,既待和風始展眉。
poems heads:嚼, poems_prefix: , poems:嚼處春冰敲齒冷,咽時雪液沃心寒。
poems heads:蒙, poems_prefix: , poems:蒙君知重惠瓊實,薄起金刀釘玉深。
poems heads:深, poems_prefix: , poems:深妝玉瓦平無垅,亂拂蘆花細有聲。
poems heads:片, poems_prefix: , poems:片逐銀蟾落醉觥。
poems heads:巧, poems_prefix: , poems:巧剪銀花亂,輕飛玉葉狂。
poems heads:寒, poems_prefix: , poems:寒艷芳姿色盡明。

3. Building the vocabulary

# Build the vocabulary with PaddleNLP. Poem lines are short, so each single character is treated as one token.
from paddlenlp.data import Vocab

vocab = Vocab.build_vocab(poems_samples, unk_token="<unk>", pad_token="<pad>", bos_token="<", eos_token=">")
vocab_size = len(vocab)

print("vocab size", vocab_size)
print("word to idx:", vocab.token_to_idx)
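
To make the idea of a character-level vocabulary concrete, here is a rough pure-Python sketch. It is not how `paddlenlp.data.Vocab` is implemented, and the id ordering is an assumption made for this example; the helper name `build_char_vocab` is hypothetical.

```python
from collections import Counter


def build_char_vocab(samples, specials=("<pad>", "<unk>", "<", ">")):
    """Map every character seen in `samples` to an integer id."""
    counts = Counter(ch for text in samples for ch in text)
    # Special tokens take the first ids (ordering assumed for this sketch)
    token_to_idx = {tok: i for i, tok in enumerate(specials)}
    # Most frequent characters first; ties are broken by the character itself
    for ch, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if ch not in token_to_idx:
            token_to_idx[ch] = len(token_to_idx)
    return token_to_idx


toy_vocab = build_char_vocab(["ab。", "ac。"])
print(toy_vocab)
```

Because each character is its own token, iterating over a poem string directly yields the token sequence, which is what the `tokenizer` lambdas in this article rely on.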

4. Defining the dataset

# Define the data reader
from paddle.io import Dataset, BatchSampler, DataLoader
import numpy as np

class PoemDataset(Dataset):
    def __init__(self, poems_data, poems_heads, poems_prefix, vocab, encoder_max_len=128, decoder_max_len=32):
        super(PoemDataset, self).__init__()
        self.poems_data = poems_data
        self.poems_heads = poems_heads
        self.poems_prefix = poems_prefix
        self.vocab = vocab
        self.tokenizer = lambda x: [vocab.token_to_idx[x_] for x_ in x]
        self.encoder_max_len = encoder_max_len
        self.decoder_max_len = decoder_max_len

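    def __getitem__(self, idx):
        # Use the vocab stored on the instance rather than the global variable
        eos_id = self.vocab.token_to_idx[self.vocab.eos_token]
        bos_id = self.vocab.token_to_idx[self.vocab.bos_token]
        pad_id = self.vocab.token_to_idx[self.vocab.pad_token]
        # Make sure the encoder and decoder sequences do not exceed their maximum lengths
        poet = self.poems_data[idx][:self.decoder_max_len - 2]  # -2 leaves room for bos_id and eos_id
        prefix = self.poems_prefix[idx][- (self.encoder_max_len - 3):]  # -3 leaves room for bos_id, eos_id and the head character
        # Encode the input and output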

        sample = [bos_id] + self.tokenizer(poet) + [eos_id]
        prefix = self.tokenizer(prefix) if prefix else []
        heads = prefix + [bos_id] + self.tokenizer(self.poems_heads[idx]) + [eos_id] 
        sample_len = len(sample)
        heads_len = len(heads)
        sample = sample + [pad_id] * (self.decoder_max_len - sample_len)
        heads = heads + [pad_id] * (self.encoder_max_len - heads_len)
        mask = [1] * (sample_len - 1) + [0] * (self.decoder_max_len - sample_len) # -1 to make equal to out[2]
        out = [np.array(d, "int64") for d in [heads, heads_len, sample, sample, mask]]
        out[2] = out[2][:-1]
        out[3] = out[3][1:, np.newaxis]
        return out

    def shape(self):
        return [([None, self.encoder_max_len], 'int64', 'src'),
                ([None, 1], 'int64', 'src_length'),
                ([None, self.decoder_max_len - 1],'int64', 'trg')], \
               [([None, self.decoder_max_len - 1, 1], 'int64', 'label'),
                ([None, self.decoder_max_len - 1], 'int64', 'trg_mask')]


    def __len__(self):
        return len(self.poems_data)

dataset = PoemDataset(poems_samples, poems_heads, poems_prefix, vocab)
batch_sampler = BatchSampler(dataset, batch_size=2048)
data_loader = DataLoader(dataset, batch_sampler=batch_sampler)
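
The decoder side of `__getitem__` follows the usual teacher-forcing layout: the decoder input is the padded sequence without its last position, the label is the same sequence shifted left by one, and the mask marks which label positions hold real tokens. The sketch below reproduces that logic with plain lists; the function name and the toy ids are made up for illustration.

```python
def make_decoder_arrays(token_ids, bos_id, eos_id, pad_id, max_len):
    """Return (trg, label, mask), each of length max_len - 1."""
    seq = [bos_id] + token_ids[:max_len - 2] + [eos_id]
    n = len(seq)
    seq = seq + [pad_id] * (max_len - n)
    trg = seq[:-1]   # decoder input: everything except the last position
    label = seq[1:]  # target: the same sequence shifted left by one
    # 1 where the label is a real token (including eos), 0 on padding
    mask = [1] * (n - 1) + [0] * (max_len - n)
    return trg, label, mask


trg, label, mask = make_decoder_arrays(
    [5, 6, 7], bos_id=2, eos_id=3, pad_id=0, max_len=8)
print(trg)
print(label)
print(mask)
```

At step t the decoder is fed `trg[t]` and is expected to predict `label[t]`; positions where `mask[t]` is 0 should not contribute to the loss.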

II. Defining and Training the Model

1. Model definition

from Seq2Seq.models import Seq2SeqModel
from paddlenlp.metrics import Perplexity
from Seq2Seq.loss import CrossEntropyCriterion
import paddle
from paddle.static import InputSpec

# Hyperparameters
lr = 1e-6
max_epoch = 20
models_save_path = "./checkpoints"

encoder_attrs = {"vocab_size": vocab_size, "embed_dim": 200, "hidden_size": 128, "num_layers": 4, "dropout": .2,
                    "direction": "bidirectional", "mode": "GRU"}
decoder_attrs = {"vocab_size": vocab_size, "embed_dim": 200, "hidden_size": 128, "num_layers": 4, "direction": "forward",
                    "dropout": .2, "mode": "GRU", "use_attention": True}

# inputs shape and label shape
inputs_shape, labels_shape = dataset.shape()
inputs_list = [InputSpec(input_shape[0], input_shape[1], input_shape[2]) for input_shape in inputs_shape]
labels_list = [InputSpec(label_shape[0], label_shape[1], label_shape[2]) for label_shape in labels_shape]

net = Seq2SeqModel(encoder_attrs, decoder_attrs)
model = paddle.Model(net, inputs_list, labels_list)

model.load("./final_models/model")

opt = paddle.optimizer.Adam(learning_rate=lr, parameters=model.parameters())

model.prepare(opt, CrossEntropyCriterion(), Perplexity())
W0122 21:03:30.616776   166 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0122 21:03:30.620450   166 device_context.cc:465] device: 0, cuDNN Version: 7.6.
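
The model is evaluated with perplexity. As a reminder of what that number means, the sketch below computes the textbook definition, the exponential of the mean negative log-likelihood over unmasked positions. It is an illustration under that assumption and not necessarily the exact computation inside `paddlenlp.metrics.Perplexity`; the function name is hypothetical.

```python
import math


def masked_perplexity(token_log_probs, mask):
    """exp(mean negative log-likelihood) over positions where mask is 1."""
    total = sum(-lp * m for lp, m in zip(token_log_probs, mask))
    count = sum(mask)
    return math.exp(total / count)


# Log-probabilities the model assigned to the correct token at each step
log_probs = [math.log(0.5), math.log(0.25), math.log(0.9)]
ppl = masked_perplexity(log_probs, [1, 1, 0])  # the last step is padding
print(ppl)
```

A lower perplexity means the model assigns higher probability to the reference characters; a model that always predicted the right character with probability 1 would score 1.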

2. Model training

# Training takes a long time; a trained model is already provided (./final_models/model)
model.fit(train_data=data_loader, epochs=max_epoch, eval_freq=1, save_freq=5, save_dir=models_save_path, shuffle=True)

3. Saving the model

# Save
model.save("./final_models/model")

III. Generating Acrostic Poems

import warnings

def post_process_seq(seq, bos_idx, eos_idx, output_bos=False, output_eos=False):
    """
    Post-process the decoded sequence.
    """
    eos_pos = len(seq) - 1
    for i, idx in enumerate(seq):
        if idx == eos_idx:
            eos_pos = i
            break
    seq = [idx for idx in seq[:eos_pos + 1]
           if (output_bos or idx != bos_idx) and (output_eos or idx != eos_idx)]
    return seq

# Class used to generate the greeting poems
from paddlenlp.data.tokenizer import JiebaTokenizer

class GenPoems():
    # content (str): the str to generate poems, like "恭喜發(fā)財(cái)"
    # vocab: the instance of paddlenlp.data.vocab.Vocab
    # model: the Inference Model
    def __init__(self, vocab, model):
        self.bos_id = vocab.token_to_idx[vocab.bos_token]
        self.eos_id = vocab.token_to_idx[vocab.eos_token]
        self.pad_id = vocab.token_to_idx[vocab.pad_token]
        self.tokenizer = lambda x: [vocab.token_to_idx[x_] for x_ in x]
        self.model = model
        self.vocab = vocab

    def gen(self, content, max_len=128):
        # max_len is the encoder_max_len in Seq2Seq Model.
        out = []
        # Strip punctuation once, before iterating, so it is never used as a head character
        content = re.sub("([。,])", '', content)
        for w in content:
            if w in self.vocab.token_to_idx:
                # Encoder input: previously generated ids, then <bos> head <eos>
                heads = out[- (max_len - 3):] + [self.bos_id] + self.tokenizer(w) + [self.eos_id]
                len_heads = len(heads)
                heads = heads + [self.pad_id] * (max_len - len_heads)
                x = paddle.to_tensor([heads], dtype="int64")
                len_x = paddle.to_tensor([len_heads], dtype='int64')
                pred = self.model.predict_batch(inputs = [x, len_x])[0]
                out += self._get_results(pred)[0]
            else:
                warnings.warn("{} is not in vocab list, so it is skipped.".format(w))
        out = ''.join([self.vocab.idx_to_token[idx] for idx in out])
        return out
    
    def _get_results(self, pred):
        pred = pred[:, :, np.newaxis] if len(pred.shape) == 2 else pred
        pred = np.transpose(pred, [0, 2, 1])
        outs = []
        for beam in pred[0]:
            id_list = post_process_seq(beam, self.bos_id, self.eos_id)
            outs.append(id_list)
        return outs

# Load the inference model
from Seq2Seq.models import Seq2SeqInferModel
import paddle

encoder_attrs = {"vocab_size": vocab_size, "embed_dim": 200, "hidden_size": 128, "num_layers": 4, "dropout": .2,
                    "direction": "bidirectional", "mode": "GRU"}
decoder_attrs = {"vocab_size": vocab_size, "embed_dim": 200, "hidden_size": 128, "num_layers": 4, "direction": "forward",
                    "dropout": .2, "mode": "GRU", "use_attention": True}

infer_model = paddle.Model(Seq2SeqInferModel(encoder_attrs,
                                             decoder_attrs,
                                             bos_id=vocab.token_to_idx[vocab.bos_token],
                                             eos_id=vocab.token_to_idx[vocab.eos_token],
                                             beam_size=10,
                                             max_out_len=256))
infer_model.load("./final_models/model")
# Send New Year greetings
# Of course, it works for a love confession too
generator = GenPoems(vocab, infer_model)

content = "生龍活虎"
poet = generator.gen(content)
for line in poet.strip().split('。'):
    # Skip the empty string left after the final '。' instead of swallowing every exception
    if line:
        print("{}\t{}。".format(line[0], line))

Output

生    生涯不可見,何處不相逢。
龍    龍虎不知何處,人間不見人間。
活    活人不是人間事,不覺人間不可識。
虎    虎豹相逢不可尋,不知何處不相識。
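
A quick way to sanity-check generated text is to confirm that it really is an acrostic, that is, that the first characters of its couplets spell out the requested phrase. The helper below is a small sketch written for this article's output format (couplets separated by '。'); the name `is_acrostic` is hypothetical.

```python
def is_acrostic(poem, content, sep="。"):
    """Return True if the couplets of `poem` start with the characters of `content`."""
    lines = [ln for ln in poem.split(sep) if ln]
    return len(lines) == len(content) and all(
        ln[0] == ch for ln, ch in zip(lines, content))


generated = ("生涯不可見,何處不相逢。龍虎不知何處,人間不見人間。"
             "活人不是人間事,不覺人間不可識。虎豹相逢不可尋,不知何處不相識。")
print(is_acrostic(generated, "生龍活虎"))
```

Such a check is useful when a head character is missing from the vocabulary and gets skipped, because the resulting poem is then shorter than the input phrase.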

Summary

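This project showed how to train a model that generates acrostic poems. The results suggest that the model has acquired some ability to compose poem lines. However, limited by the size of the training set and the training time, the generated lines still leave a lot of room for improvement. The model will be optimized further in the future, so stay tuned.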
