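python tiktoken: Introduction, Installation, and Usage
Updated: 2023-10-20 14:28:21 Author: 一個處女座的程序猿
tiktoken is a third-party Python module recently open-sourced by OpenAI. It implements the BPE (byte pair encoding) tokenizer algorithm and is heavily optimized for runtime performance. This article introduces tiktoken and covers how to install and use it.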
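Introduction to tiktoken
tiktoken is a fast BPE tokenizer for use with OpenAI models.
1. Performance: tiktoken is 3 to 6 times faster than a comparable open-source tokenizer.
Installing tiktoken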
pip install tiktoken
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple tiktoken
C:\Windows\system32>pip install -i https://pypi.tuna.tsinghua.edu.cn/simple tiktoken
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting tiktoken
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/91/cf/7f3b821152f7abb240950133c60c394f7421a5791b020cedb190ff7a61b4/tiktoken-0.5.1-cp39-cp39-win_amd64.whl (760 kB)
     |████████████████████████████████| 760 kB 726 kB/s
Requirement already satisfied: regex>=2022.1.18 in d:\programdata\anaconda3\lib\site-packages (from tiktoken) (2022.3.15)
Requirement already satisfied: requests>=2.26.0 in d:\programdata\anaconda3\lib\site-packages (from tiktoken) (2.31.0)
Requirement already satisfied: charset-normalizer<4,>=2 in d:\programdata\anaconda3\lib\site-packages (from requests>=2.26.0->tiktoken) (2.0.12)
Requirement already satisfied: urllib3<3,>=1.21.1 in d:\programdata\anaconda3\lib\site-packages (from requests>=2.26.0->tiktoken) (1.26.9)
Requirement already satisfied: idna<4,>=2.5 in d:\programdata\anaconda3\lib\site-packages (from requests>=2.26.0->tiktoken) (3.3)
Requirement already satisfied: certifi>=2017.4.17 in d:\programdata\anaconda3\lib\site-packages (from requests>=2.26.0->tiktoken) (2021.10.8)
Installing collected packages: tiktoken
Successfully installed tiktoken-0.5.1
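After the install finishes, it can be handy to confirm from Python which version ended up in the active environment. The snippet below is a minimal sketch using only the standard library's importlib.metadata; the helper names installed_version and describe_install are made up for this illustration and are not part of tiktoken.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


def describe_install(package_name="tiktoken"):
    """Build a short human-readable status line for the given package."""
    version = installed_version(package_name)
    if version is None:
        return f"{package_name} is not installed"
    return f"{package_name} {version} is installed"
```

Calling print(describe_install()) in the environment used above should report the tiktoken version that pip installed.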
Using tiktoken
1. Basic usage
(1) A fast BPE tokenizer for OpenAI models
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
assert enc.decode(enc.encode("hello world")) == "hello world"

# To get the tokeniser corresponding to a specific model in the OpenAI API:
enc = tiktoken.encoding_for_model("gpt-4")
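A common use of the encode/decode pair shown above is counting tokens in a piece of text, or trimming text to a token budget. The following is a minimal sketch under stated assumptions: the helper names count_tokens and truncate_to_tokens are hypothetical (they are not tiktoken functions), and the default encoding "cl100k_base" is simply the one used in the example above. Any object with encode and decode methods can be passed in instead.

```python
def _default_encoding():
    # Imported lazily so the helpers can also be used with another encoder object.
    import tiktoken
    return tiktoken.get_encoding("cl100k_base")


def count_tokens(text, encoding=None):
    """Return how many tokens `text` occupies under the given encoding."""
    if encoding is None:
        encoding = _default_encoding()
    return len(encoding.encode(text))


def truncate_to_tokens(text, max_tokens, encoding=None):
    """Keep at most `max_tokens` tokens of `text` and decode them back to a string.

    Note: cutting in the middle of a multi-byte character may yield a
    replacement character at the end of the decoded result.
    """
    if encoding is None:
        encoding = _default_encoding()
    tokens = encoding.encode(text)
    return encoding.decode(tokens[:max_tokens])
```

With tiktoken installed, count_tokens("hello world") uses the cl100k_base encoding. To match a specific model, pass tiktoken.encoding_for_model("gpt-4") as the encoding argument.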
(2) Code that helps visualize the BPE process
from tiktoken._educational import *

# Train a BPE tokeniser on a small amount of text
enc = train_simple_encoding()

# Visualise how the GPT-4 encoder encodes text
enc = SimpleBytePairEncoding.from_tiktoken("cl100k_base")
enc.encode("hello world aaaaaaaaaaaa")
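To make the idea behind BPE training more concrete, here is a deliberately simplified sketch in pure Python. It is not tiktoken's implementation: it skips the regex pre-splitting that real encoders apply and works on the whole byte sequence at once. The function names (most_frequent_pair, merge_pair, train_toy_bpe) are invented for this illustration. The core loop repeatedly finds the most frequent adjacent pair of tokens and replaces it with a new token id.

```python
from collections import Counter


def most_frequent_pair(tokens):
    """Return the most common adjacent pair of tokens, or None if none exists."""
    pairs = Counter(zip(tokens, tokens[1:]))
    if not pairs:
        return None
    return pairs.most_common(1)[0][0]


def merge_pair(tokens, pair, new_token):
    """Replace each non-overlapping occurrence of `pair` with `new_token`."""
    merged = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(new_token)
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def train_toy_bpe(text, num_merges):
    """Learn up to `num_merges` merges, starting from the raw UTF-8 bytes."""
    tokens = list(text.encode("utf-8"))  # initial tokens are byte values 0..255
    merges = {}
    next_id = 256  # new token ids start after the byte range
    for _ in range(num_merges):
        pair = most_frequent_pair(tokens)
        if pair is None:
            break
        merges[pair] = next_id
        tokens = merge_pair(tokens, pair, next_id)
        next_id += 1
    return tokens, merges
```

For example, train_toy_bpe("hello world aaaaaaaaaaaa", 3) returns the compressed token list together with the learned merge table, which shows how a long run of repeated characters collapses into fewer tokens.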