
How PyTorch's torch.nn.Embedding() implements a word embedding layer

Updated: 2024-02-27 15:49:11  Author: #苦行僧

This article explains how PyTorch's torch.nn.Embedding() implements a word embedding layer. I hope it serves as a useful reference; corrections for any errors or omissions are welcome.

Implementing a word embedding layer with torch.nn.Embedding()

nn.Embedding() is the word embedding layer commonly used in NLP. The layer's weights are the word vectors themselves: they are initialized randomly and, being trainable parameters, are updated and optimized throughout training.
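As a minimal sketch of "randomly initialized, then updated during training", the snippet below takes one SGD step on a toy loss. The sizes, the learning rate, and the squared-sum loss are assumptions chosen purely for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # make the random initialization repeatable

embedding = nn.Embedding(num_embeddings=5, embedding_dim=3)
optimizer = torch.optim.SGD(embedding.parameters(), lr=0.1)

before = embedding.weight.detach().clone()

# Look up tokens 1 and 3, compute a toy loss, and take one optimization step.
indices = torch.tensor([1, 3])
loss = embedding(indices).pow(2).sum()
optimizer.zero_grad()
loss.backward()
optimizer.step()

after = embedding.weight.detach()
changed_rows = (before != after).any(dim=1)

print(embedding.weight.requires_grad)  # True: the weights are trainable
print(changed_rows)  # tensor([False,  True, False,  True, False])
```

Only the rows that were actually looked up receive a gradient, so with plain SGD only those word vectors move.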

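nn.Embedding is essentially a matrix class. When created, it holds a randomly initialized matrix whose number of rows is the vocabulary size and whose number of columns is the length of the vector that represents each vocabulary entry. You choose that vector dimension according to how complex the items you want to represent are.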

Once the class is instantiated, you can look up the vector for any vocabulary entry by its index.
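The lookup described above can be sketched as follows. The vocabulary size, the vector dimension, and the index values are arbitrary assumptions for the example.

```python
import torch
import torch.nn as nn

vocab_size, embedding_dim = 10, 4
embedding = nn.Embedding(vocab_size, embedding_dim)

# The weight matrix has one row per vocabulary entry.
print(embedding.weight.shape)  # torch.Size([10, 4])

# A batch of 2 sentences with 3 token indices each.
batch = torch.tensor([[1, 2, 3], [4, 5, 6]])
vectors = embedding(batch)
print(vectors.shape)  # torch.Size([2, 3, 4])

# The layer returns the same values as indexing the weight rows directly.
same = torch.equal(vectors, embedding.weight[batch])
print(same)  # True
```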

因?yàn)檩斎氲木渥娱L(zhǎng)度不一，有的長(zhǎng)有的短。

長(zhǎng)了截?cái)?，不夠長(zhǎng)補(bǔ)齊(我文中用’'填充，然后在nn.embedding層將其補(bǔ)0，也就是用它來(lái)表示無(wú)意義的詞，這樣在后面的max-pooling層也就自然而然會(huì)把其過(guò)濾掉，這樣就不用擔(dān)心他會(huì)影響識(shí)別。)
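The truncate-or-pad step can be sketched like this. The helper name `pad_or_truncate`, the toy vocabulary, and `max_len = 4` are hypothetical choices for illustration, not part of the original article.

```python
import torch


def pad_or_truncate(tokens, max_len, pad_token='<PAD>'):
    """Return exactly max_len tokens: cut long inputs, pad short ones."""
    tokens = tokens[:max_len]
    return tokens + [pad_token] * (max_len - len(tokens))


# Toy vocabulary (assumed for the example).
word_to_id = {'hello': 0, '<PAD>': 1, 'world': 2}
sentences = ['hello world', 'hello world hello world hello']
max_len = 4

batch_ids = [
    [word_to_id[token] for token in pad_or_truncate(sentence.split(), max_len)]
    for sentence in sentences
]
print(batch_ids)  # [[0, 2, 1, 1], [0, 2, 0, 2]]

# Every row now has the same length, so the batch fits in one tensor.
batch = torch.tensor(batch_ids)
print(batch.shape)  # torch.Size([2, 4])
```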

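Here is how to use it.

nn.Embedding() has three main parameters:

The first parameter, num_embeddings, is the vocabulary size.
The second parameter, embedding_dim, is the number of dimensions used to represent each symbol.
The third parameter, padding_idx, is the vocabulary index of the symbol that should be mapped to a zero vector. In the example below, the two trailing '<PAD>' tokens both come out as zeros.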

import torch
import torch.nn as nn

# Vocabulary
word_to_id = {'hello':0, '<PAD>':1,'world':2}
embeds = nn.Embedding(len(word_to_id), 4,padding_idx=word_to_id['<PAD>'])

text = 'hello world <PAD> <PAD>'
hello_idx = torch.LongTensor([word_to_id[i] for i in text.split()])
# Look up the embedding to get the word vectors
hello_embed = embeds(hello_idx)
print(hello_embed)

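In the output below, each row is the embedding vector of one word in the sentence. Every word has 4 dimensions, and the last two all-zero vectors are the meaningless padding entries.

So the embedding layer is equivalent to multiplying the index-encoded sentence, written as one-hot rows, by the layer's trainable weight matrix. The product is the word embedding result.

Output: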

tensor([[-1.1436,  1.4588, -1.2755,  0.0077],
        [-0.9600, -1.9986, -1.1087, -0.1520],
        [ 0.0000,  0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000,  0.0000]], grad_fn=<EmbeddingBackward0>)
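The claim that an embedding lookup equals a one-hot matrix multiplied by the weight matrix can be checked with a small sketch. It reuses the 3-word vocabulary layout from the example above. The one-hot encoding via `F.one_hot` is an assumption about how to express "index-encoded", and the final gradient check is an extra illustration of what padding_idx does.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

embeds = nn.Embedding(3, 4, padding_idx=1)
idx = torch.tensor([0, 2, 1, 1])  # 'hello world <PAD> <PAD>'

# One-hot encode the indices and multiply by the weight matrix.
one_hot = F.one_hot(idx, num_classes=3).float()  # shape (4, 3)
via_matmul = one_hot @ embeds.weight             # shape (4, 4)
via_lookup = embeds(idx)

matches = torch.allclose(via_matmul, via_lookup)
print(matches)  # True

# With padding_idx set, the lookup path sends no gradient to the '<PAD>' row.
via_lookup.sum().backward()
pad_grad = embeds.weight.grad[1]
print(pad_grad)  # tensor([0., 0., 0., 0.])
```

In practice the layer does an index lookup rather than a real matrix product, which avoids building the large one-hot matrix.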

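You can also use nn.Embedding.from_pretrained() to load pretrained word vectors such as word2vec or GloVe. By default the loaded vectors are frozen (freeze=True). Pass freeze=False if you want them to keep updating during training, which can help the model converge faster.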
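A minimal sketch of loading pretrained vectors follows. The 3x3 tensor is a made-up stand-in for real word2vec or GloVe vectors, which you would normally read from a file.

```python
import torch
import torch.nn as nn

# Stand-in for pretrained vectors: one row per vocabulary entry (assumed values).
pretrained = torch.tensor([[0.1, 0.2, 0.3],
                           [0.0, 0.0, 0.0],   # row for '<PAD>'
                           [0.7, 0.8, 0.9]])

# Default: the vectors are frozen and will not be updated by the optimizer.
frozen = nn.Embedding.from_pretrained(pretrained, padding_idx=1)

# freeze=False keeps fine-tuning the vectors during training.
trainable = nn.Embedding.from_pretrained(pretrained, freeze=False, padding_idx=1)

print(frozen.weight.requires_grad)     # False
print(trainable.weight.requires_grad)  # True

vector = trainable(torch.tensor([2]))
print(vector.shape)  # torch.Size([1, 3])
```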

This article uses only the plain nn.Embedding().

In practice, nn.Embedding() is created in the network's initialization code, where the layers are built.

For example:

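import torch
import torch.nn as nn
import torch.nn.functional as F


class Network(nn.Module):
    def __init__(self, nvocab, embed, num_classes, pad_idx):
        # nvocab: vocabulary size; embed: embedding dimension;
        # num_classes: number of output classes;
        # pad_idx: index of '<PAD>' in the vocabulary, e.g. word2idx['<PAD>']
        super().__init__()
        self.filter_sizes = (2, 3, 4)
        self.embed = embed
        self.num_filters = 256
        self.num_classes = num_classes
        self.n_vocab = nvocab
        # padding_idx maps '<PAD>' to a zero vector because it carries no meaning;
        # the max-pooling below will then tend to filter it out
        self.embedding = nn.Embedding(self.n_vocab, self.embed, padding_idx=pad_idx)
        # One convolution per filter size, each spanning the full embedding width
        self.convs = nn.ModuleList(
            [nn.Conv2d(1, self.num_filters, (k, self.embed)) for k in self.filter_sizes])

        self.dropout = nn.Dropout(0.5)
        self.fc = nn.Linear(self.num_filters * len(self.filter_sizes), self.num_classes)

    def conv_and_pool(self, x, conv):
        # (batch, 1, seq_len, embed) -> (batch, num_filters, seq_len - k + 1)
        x = F.relu(conv(x)).squeeze(3)
        # Max over the sequence axis -> (batch, num_filters)
        x = F.max_pool1d(x, x.size(2)).squeeze(2)
        return x

    def forward(self, x):
        out = self.embedding(x)   # (batch, seq_len, embed)
        out = out.unsqueeze(1)    # add a channel axis for Conv2d
        out = torch.cat([self.conv_and_pool(out, conv) for conv in self.convs], 1)
        out = self.dropout(out)
        out = self.fc(out)
        return out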
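To make the tensor shapes in the forward pass concrete, here is a standalone sketch that traces one batch through the embedding, one convolution, and the max-pooling step. All sizes (batch of 2, sequence length 10, 16 filters, filter height 3) are assumptions picked for the example, not values from the network above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

batch_size, seq_len = 2, 10
vocab_size, embed_dim = 50, 8
num_filters, k = 16, 3

embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
conv = nn.Conv2d(1, num_filters, (k, embed_dim))

x = torch.randint(0, vocab_size, (batch_size, seq_len))  # token indices

emb = embedding(x)                    # (2, 10, 8)
emb4d = emb.unsqueeze(1)              # (2, 1, 10, 8): add a channel axis
feat = F.relu(conv(emb4d))            # (2, 16, 8, 1): 10 - 3 + 1 = 8 positions
feat = feat.squeeze(3)                # (2, 16, 8)
pooled = F.max_pool1d(feat, feat.size(2)).squeeze(2)  # (2, 16)

for name, t in [('emb', emb), ('emb4d', emb4d), ('feat', feat), ('pooled', pooled)]:
    print(name, tuple(t.shape))
```

Because the pooling window covers the whole sequence axis, each filter contributes a single value per sentence regardless of sentence length.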