C++ Boost Tokenizer: A Detailed Guide

Updated: 2022-11-11 16:09:26  Author: 無水先生

Boost is the collective name for a set of portable C++ libraries, distributed as source code, that extend the C++ standard library. Boost complements the standard library and is one of the driving forces behind the C++ standardization process.

Introduction

The Boost.Tokenizer library lets you iterate over the tokens (partial expressions) in a string by interpreting certain characters as separators.

Example 1

Iterating over the tokens in a string with boost::tokenizer

#include <boost/tokenizer.hpp>
#include <string>
#include <iostream>
int main()
{
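  // Tokenizer over std::string; char_separator supplies the separator rules
  typedef boost::tokenizer<boost::char_separator<char>> tokenizer;
  std::string s = "Boost C++ Libraries";
  tokenizer tok{s};  // default separator: whitespace and punctuation
  for (tokenizer::iterator it = tok.begin(); it != tok.end(); ++it)
    std::cout << *it << '\n';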
}

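Boost.Tokenizer defines a class template named boost::tokenizer in boost/tokenizer.hpp. Its template parameter is a class that determines where one token ends and the next begins. Example 1 uses boost::char_separator, which by default interprets whitespace and punctuation as separators.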

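With its default template arguments, the tokenizer is initialized with a string of type std::string. The member functions begin() and end() let you access the tokenizer like a container. The iterators yield the tokens of the string that was used to initialize the tokenizer. How the tokens are determined depends on the class passed as the template parameter.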

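Because boost::char_separator interprets whitespace and punctuation as separators by default, Example 1 prints Boost, C, +, +, and Libraries. A default-constructed boost::char_separator uses std::isspace() and std::ispunct() to identify separator characters. Boost.Tokenizer distinguishes between separators that are output as tokens and separators that are suppressed. By default, whitespace is suppressed and punctuation is output.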

Example 2

Initializing boost::char_separator to customize the iteration

#include <boost/tokenizer.hpp>
#include <string>
#include <iostream>
int main()
{
  typedef boost::tokenizer<boost::char_separator<char>> tokenizer;
  std::string s = "Boost C++ Libraries";
  boost::char_separator<char> sep{" "};  // suppress spaces; output no separators
  tokenizer tok{s, sep};
  for (const auto &t : tok)
    std::cout << t << '\n';
}

To prevent punctuation from being interpreted as separators, initialize a boost::char_separator object and pass it to the tokenizer.

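The constructor of boost::char_separator accepts up to three parameters, of which only the first is required. The first parameter lists the separator characters that are suppressed. Example 2, like Example 1, treats spaces as separators.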

The second parameter lists the separators that are output as tokens. If it is omitted, no separators are output, so the program now prints Boost, C++, and Libraries.

Example 3

Emulating the default behavior with boost::char_separator

#include <boost/tokenizer.hpp>
#include <string>
#include <iostream>
int main()
{
  typedef boost::tokenizer<boost::char_separator<char>> tokenizer;
  std::string s = "Boost C++ Libraries";
  boost::char_separator<char> sep{" ", "+"};  // suppress spaces; output each '+'
  tokenizer tok{s, sep};
  for (const auto &t : tok)
    std::cout << t << '\n';
}

If a plus sign is passed as the second parameter, Example 3 produces the same output as Example 1.

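The third parameter determines whether empty tokens are output. When two separators occur back to back, the token between them is empty. By default, such empty tokens are not output; the third parameter changes this behavior.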

Example 4

Initializing boost::char_separator to output empty tokens

#include <boost/tokenizer.hpp>
#include <string>
#include <iostream>
int main()
{
  typedef boost::tokenizer<boost::char_separator<char>> tokenizer;
  std::string s = "Boost C++ Libraries";
  // Third argument: also output the empty tokens between adjacent separators
  boost::char_separator<char> sep{" ", "+", boost::keep_empty_tokens};
  tokenizer tok{s, sep};
  for (const auto &t : tok)
    std::cout << t << '\n';
}

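Example 4 outputs two additional empty tokens. The first lies between the two plus signs, the second between the second plus sign and the space that follows it.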

Example 5

Boost.Tokenizer with wide strings

#include <boost/tokenizer.hpp>
#include <string>
#include <iostream>
int main()
{
  // Extra template arguments: iterator type and token type for std::wstring
  typedef boost::tokenizer<boost::char_separator<wchar_t>,
    std::wstring::const_iterator, std::wstring> tokenizer;
  std::wstring s = L"Boost C++ Libraries";
  boost::char_separator<wchar_t> sep{L" "};
  tokenizer tok{s, sep};
  for (const auto &t : tok)
    std::wcout << t << '\n';
}

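Example 5 iterates over a string of type std::wstring. To support this string type, the tokenizer must be instantiated with additional template arguments. boost::char_separator must likewise be instantiated with wchar_t.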

In addition to boost::char_separator, Boost.Tokenizer provides two more classes for identifying tokens.

Example 6

Parsing CSV data with boost::escaped_list_separator

#include <boost/tokenizer.hpp>
#include <string>
#include <iostream>
int main()
{
  typedef boost::tokenizer<boost::escaped_list_separator<char>> tokenizer;
  std::string s = "Boost,\"C++ Libraries\"";
  tokenizer tok{s};
  for (const auto &t : tok)
    std::cout << t << '\n';
}

boost::escaped_list_separator reads multiple values separated by commas, a format commonly known as CSV (comma-separated values). It also handles double quotes and escape sequences. The output of Example 6 is therefore Boost and C++ Libraries.

The second class is boost::offset_separator. It must be instantiated explicitly, and the resulting object must be passed as the second argument to the constructor of boost::tokenizer.

Example 7

Iterating over tokens with boost::offset_separator

#include <boost/tokenizer.hpp>
#include <string>
#include <iostream>
int main()
{
  typedef boost::tokenizer<boost::offset_separator> tokenizer;
  std::string s = "Boost_C++_Libraries";
  int offsets[] = {5, 5, 9};  // length of each token, in characters
  boost::offset_separator sep{offsets, offsets + 3};  // iterator range of lengths
  tokenizer tok{s, sep};
  for (const auto &t : tok)
    std::cout << t << '\n';
}

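boost::offset_separator specifies the positions in the string at which the tokens end. Example 7 specifies that the first token ends after 5 characters, the second after a further 5 characters, and the third after the following 9 characters. The output is Boost, _C++_, and Libraries.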
