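Learning R: A Comprehensive Overview of Rcpp Basics

Updated: 2021-11-06 15:31:46  Author: EmissaryDX

This article gives a comprehensive overview of Rcpp for R users, covering configuration, the common data types and how to create them, and other basics. Readers who need it may find it a useful reference.

1. Configuration and Notes

Dirk's book Seamless R and C++ Integration with Rcpp was published in 2013, when the Rcpp Attributes feature had not yet been accepted on CRAN, so writing and calling Rcpp functions was still fairly tedious at the time. Rcpp Attributes (2016) greatly simplifies the process ("provides an even more direct connection between C++ and R"): it keeps inline functions and adds the sourceCpp function for using external .cpp files. In other words, we can store a C++ function in a .cpp file and then, from an R script, load it with sourceCpp much as we would use source, after which the C++ function can be called from R.

For example, to use the functions defined in a file named test.cpp from an R script, we can do the following: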

library(Rcpp)
Sys.setenv("PKG_CXXFLAGS"="-std=c++11")
sourceCpp("test.cpp")

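The second line requests that the file be compiled with the C++11 standard.

In test.cpp, include the header Rcpp.h and place each function that should be exported to R directly after //[[Rcpp::export]]. If an exported function needs to call other C++ functions, those helpers can be defined before the //[[Rcpp::export]] line.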

#include <Rcpp.h>
using namespace Rcpp;
//[[Rcpp::export]]

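For linear algebra, Rcpp offers the RcppArmadillo and RcppEigen packages. To use one of them, declare the dependency at the top of the source file, for example // [[Rcpp::depends(RcppArmadillo)]], and include the corresponding header: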

// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
#include <Rcpp.h>
using namespace Rcpp;
using namespace arma;
// [[Rcpp::export]]

For the basics of C++, see here.

2. Common Data Types

Keyword	Description
int/double/bool/String/auto	integer / floating point / Boolean / string / automatic type deduction (C++11)
IntegerVector	integer vector
NumericVector	numeric vector (elements are of type double)
ComplexVector	complex vector (the author is not sure)
LogicalVector	logical vector; an R logical can take three values (TRUE, FALSE, NA), whereas a C++ bool has only two (true or false). Converting R's NA to a C++ bool yields true.
CharacterVector	character vector
ExpressionVector	vectors of expression types
RawVector	vectors of type raw
IntegerMatrix	integer matrix
NumericMatrix	numeric matrix (elements are of type double)
LogicalMatrix	logical matrix
CharacterMatrix	character matrix
List aka GenericVector	list; like an R list, its elements can be of any data type
DataFrame	data frame; inside Rcpp, a data frame is implemented as a list
Function	function type
Environment	environment type; can be used to reference functions in an R environment or in other R packages, and to manipulate variables in an R environment
RObject	a type that R can recognize
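The note in the LogicalVector row, that NA turns into true, can be illustrated without Rcpp. The sketch below assumes R's internal encoding, in which a logical is stored as an int and NA is the smallest representable int (INT_MIN). The names kNaLogical, to_bool and is_na_logical are hypothetical helpers made up for this illustration; they are not part of Rcpp.

```cpp
#include <climits>

// Assumption: R stores logicals as int and encodes NA as INT_MIN.
// kNaLogical stands in for R's NA_LOGICAL so the sketch compiles without R headers.
const int kNaLogical = INT_MIN;

// Plain int -> bool conversion: any nonzero value, including NA, becomes true.
bool to_bool(int r_logical) {
  return r_logical != 0;
}

// An explicit NA check has to happen before the value is narrowed to bool.
bool is_na_logical(int r_logical) {
  return r_logical == kNaLogical;
}
```

Under that assumption, to_bool(kNaLogical) is true, which is why code that cares about missing values should test for NA before converting to bool.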

Notes:

Some R objects can be converted to Rcpp objects with as<Some_RcppObject>(Some_RObject). For example:
fit a linear model in R (the result is a List) and pass it to a C++ function

>mod=lm(Y~X);

NumericVector resid = as<NumericVector>(mod["residuals"]);
NumericVector fitted = as<NumericVector>(mod["fitted.values"]);

as<some_STL_vector>(Some_RcppVector) converts a NumericVector into a std::vector. For example:

std::vector<double> vec;
vec = as<std::vector<double>>(x);

Inside a function, wrap() converts a std::vector into a NumericVector. For example:

arma::vec long_vec(16,arma::fill::randn);
vector<double> long_vec2 = conv_to<vector<double>>::from(long_vec);
NumericVector output = wrap(long_vec2);

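When a function returns, wrap() can be used to convert C++ STL types into types that R recognizes. Examples appear in the input and output examples section later on.

Most of the data types above can be used directly as a function's return value and are converted to R objects automatically; the author excludes Environment from this and is unsure about Function.

Arithmetic and comparison operators: +, -, *, /, ++, --, pow(x,p), <, <=, >, >=, ==, !=. Logical operators: &&, ||, !.

3. Creating Common Data Types

//1. Vector
NumericVector V1(n);// creates a numeric vector V1 of length n, with every element initialized to 0
NumericVector V2=NumericVector::create(1, 2, 3); // creates a numeric vector V2 holding the three values 1, 2, 3
LogicalVector V3=LogicalVector::create(true,false,NA_LOGICAL);// creates a logical vector V3; converted to an R object it holds TRUE, FALSE, NA
//2. Matrix
NumericMatrix M1(nrow,ncol);// creates an nrow*ncol numeric matrix, with every element initialized to 0
//3. Multidimensional Array
NumericVector out=NumericVector(Dimension(2,2,3));// creates a 2*2*3 multidimensional array (the author has not found a use for it)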
//4. List
NumericMatrix y1(2,2);
NumericVector y2(5);
List L=List::create(Named("y1")=y1,
                    Named("y2")=y2);

//5. DataFrame
NumericVector a=NumericVector::create(1,2,3);
CharacterVector b=CharacterVector::create("a","b","c");
std::vector<std::string> c(3);
c[0]="A";c[1]="B";c[2]="C";
DataFrame DF=DataFrame::create(Named("col1")=a,
                               Named("col2")=b,
                               Named("col3")=c);

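4. Element Access for Common Data Types

Element access	Description
[n]	For vectors and lists, accesses the n-th element. For matrices, the columns are first stacked one below another into a single long column vector (column-major order), and the n-th element of that vector is accessed. Unlike R, n starts at 0.
(i,j)	For matrices, accesses element (i,j). Unlike R, i and j start at 0. Unlike vectors, parentheses are used here.
List["name1"]/DataFrame["name2"]	Accesses the element named name1 in a List / the column named name2 in a DataFrame.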
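The relationship between [n] and (i,j) for matrices follows from the column stacking described in the table. As a minimal sketch in plain C++ (no Rcpp required), assuming the matrix is held in a std::vector<double> in column-major order, the zero-based position of (i,j) is i + j * nrow. The helper names linear_index and element_at are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: maps a zero-based (i, j) position of a matrix with
// `nrow` rows to the zero-based position in its column-major storage.
std::size_t linear_index(std::size_t i, std::size_t j, std::size_t nrow) {
  return i + j * nrow;
}

// Reads element (i, j) from a matrix stored column by column in `data`.
double element_at(const std::vector<double>& data, std::size_t nrow,
                  std::size_t i, std::size_t j) {
  return data[linear_index(i, j, nrow)];
}
```

For a 2 x 3 matrix stored as {1, 2, 3, 4, 5, 6}, the columns are {1, 2}, {3, 4} and {5, 6}, so element (1, 2) sits at position 1 + 2 * 2 = 5 and has the value 6.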

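5. Member Functions

Member function	Description
X.size()	returns the length of X; applies to vectors and matrices (a matrix is treated as its vectorized form)
X.push_back(a)	appends a to the end of X; applies to vectors
X.push_front(b)	inserts b at the front of X; applies to vectors
X.ncol()	returns the number of columns of X
X.nrow()	returns the number of rows of X

6. Syntactic Sugar

6.1 Arithmetic and Logical Operators

+, -, *, /, pow(x,p), <, <=, >, >=, ==, !=, !

All of the operators above are vectorized.

6.2. Common Functions

is_na()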
Produces a logical sugar expression of the same length. Each element of the result expression evaluates to TRUE if the corresponding input is a missing value, or FALSE otherwise.

seq_len()
seq_len( 10 ) will generate an integer vector from 1 to 10 (note: not from 0 to 9), which is very useful in conjunction with sapply() and lapply().

pmin(a,b) and pmax(a,b)
a and b are two vectors. pmin() (or pmax()) compares the i-th elements of a and b and returns the smaller (larger) one.

ifelse()
ifelse( x > y, x+y, x-y ) means if x>y is true, then do the addition; otherwise do the subtraction.
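To make the element-wise meaning of pmin() and ifelse() concrete, here is a minimal plain-C++ sketch of what the two sugar expressions above compute. It uses std::vector<double> instead of Rcpp vectors, assumes both inputs have the same length, and does not treat missing values specially. The function names pmin_loop and ifelse_loop are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Element-wise minimum, the plain-loop analogue of pmin(a, b).
// Assumes a and b have the same length.
std::vector<double> pmin_loop(const std::vector<double>& a,
                              const std::vector<double>& b) {
  std::vector<double> out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    out[i] = std::min(a[i], b[i]);
  }
  return out;
}

// Plain-loop analogue of ifelse(x > y, x + y, x - y).
std::vector<double> ifelse_loop(const std::vector<double>& x,
                                const std::vector<double>& y) {
  std::vector<double> out(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    out[i] = (x[i] > y[i]) ? x[i] + y[i] : x[i] - y[i];
  }
  return out;
}
```

With x = {3, 1} and y = {2, 5}, pmin_loop gives {2, 1} and ifelse_loop gives {5, -4}: the first pair satisfies x > y and is added, the second does not and is subtracted.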

sapply()
sapply applies a C++ function to each element of the given expression to create a new expression. The type of the resulting expression is deduced by the compiler from the result type of the function.

The function can be a free C++ function such as the overload generated by the template function below:

template <typename T>
T square( const T& x){
    return x * x ;
}
sapply( seq_len(10), square<int> ) ;

Alternatively, the function can be a functor whose type has a nested type called result_type

template <typename T>
struct square : std::unary_function<T,T> {  // supplies result_type; deprecated in C++11, removed in C++17
    T operator()(const T& x){
        return x * x ;
    }
};
sapply( seq_len(10), square<int>() ) ;

lapply()
lapply is similar to sapply except that the result is always a list expression (an expression of type VECSXP).

sign()

Other functions

Math functions: abs(), acos(), asin(), atan(), beta(), ceil(), ceiling(), choose(), cos(), cosh(), digamma(), exp(), expm1(), factorial(), floor(), gamma(), lbeta(), lchoose(), lfactorial(), lgamma(), log(), log10(), log1p(), pentagamma(), psigamma(), round(), signif(), sin(), sinh(), sqrt(), tan(), tanh(), tetragamma(), trigamma(), trunc().
Summary functions: mean(), min(), max(), sum(), sd(), and (for vectors) var()
Summary functions that return vectors: cumsum(), diff(), pmin(), and pmax()
Search functions: match(), self_match(), which_max(), which_min()
Duplicate-handling functions: duplicated(), unique()

7. STL

Rcpp can use the data structures and algorithms of the C++ Standard Template Library (STL). Rcpp can also use the data structures and algorithms in Boost.

7.1. Iterators

Only a single example is given here; for details, see C++ Primer, or here.

#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
double sum3(NumericVector x) {
  double total = 0;
  NumericVector::iterator it;
  for(it = x.begin(); it != x.end(); ++it) {
    total += *it;
  }
  return total;
}
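The explicit iterator loop in sum3 can also be expressed with a standard algorithm. As a sketch in plain C++, assuming the data has already been copied into a std::vector<double> (for instance with as<std::vector<double>>(x) as shown earlier), std::accumulate from <numeric> walks the same begin/end range. The name sum_accumulate is hypothetical.

```cpp
#include <numeric>
#include <vector>

// Sums a range via its begin/end iterators, like the explicit loop in sum3.
double sum_accumulate(const std::vector<double>& x) {
  // The initial value 0.0 (a double) makes the accumulation happen in double.
  return std::accumulate(x.begin(), x.end(), 0.0);
}
```

Passing 0 instead of 0.0 as the initial value would make the accumulator an int and truncate every partial sum, which is an easy mistake to make with this algorithm.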

7.2. Algorithms

The <algorithm> header provides many algorithms (which can be used together with iterators); for details, see here.

For example, we could write a basic Rcpp version of findInterval() that takes two arguments, a vector of values and a vector of breaks, and locates the bin that each x falls into.

#include <algorithm>
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
IntegerVector findInterval2(NumericVector x, NumericVector breaks) {
  IntegerVector out(x.size());
  NumericVector::iterator it, pos;
  IntegerVector::iterator out_it;
  for(it = x.begin(), out_it = out.begin(); it != x.end(); 
      ++it, ++out_it) {
    pos = std::upper_bound(breaks.begin(), breaks.end(), *it);
    *out_it = std::distance(breaks.begin(), pos);
  }
  return out;
}

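7.3. Data Structures

The data structures provided by the STL can be used as well. Rcpp knows how to convert STL data structures into R data structures, so they can be returned directly from a function without converting them by hand.
For details, see here.

7.3.1. Vectors

For details, see here.

Creation
vector<int>, vector<bool>, vector<double>, vector<String>

Element access
Use the standard [] notation to access elements.

Adding elements
Use .push_back() to append elements.

Storage allocation
If the length of the vector is known in advance, use .reserve() to allocate enough storage.

Example: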

The following code implements run length encoding (rle()). It produces two vectors of output: a vector of values, and a vector lengths giving how many times each element is repeated. It works by looping through the input vector x comparing each value to the previous: if it's the same, then it increments the last value in lengths; if it's different, it adds the value to the end of values, and sets the corresponding length to 1.

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
List rleC(NumericVector x) {
  std::vector<int> lengths;
  std::vector<double> values;

  // Initialise first value
  int i = 0;
  double prev = x[0];
  values.push_back(prev);
  lengths.push_back(1);

  NumericVector::iterator it;
  for(it = x.begin() + 1; it != x.end(); ++it) {
    if (prev == *it) {
      lengths[i]++;
    } else {
      values.push_back(*it);
      lengths.push_back(1);

      i++;
      prev = *it;
    }
  }
  return List::create(
    _["lengths"] = lengths, 
    _["values"] = values
  );
}

7.3.2. Sets

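See link 1, link 2, and link 3.

The STL set std::set does not allow duplicate elements, whereas std::multiset does. Sets are important for detecting duplicates and for identifying distinct elements (like unique, duplicated, or in).

Ordered set: std::set and std::multiset.

Unordered set: std::unordered_set
Unordered sets are generally faster, because they use a hash table rather than a tree.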
unordered_set<int>, unordered_set<bool>, etc
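As a minimal sketch of how a set supports unique- and duplicated-style operations, the plain-C++ functions below use std::unordered_set<int> on a std::vector<int>. The names duplicated_flags and unique_values are hypothetical, and the code illustrates the idea only; it is not how R or Rcpp implements these functions.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Analogue of R's duplicated(): out[i] is true when x[i] already occurred earlier.
std::vector<bool> duplicated_flags(const std::vector<int>& x) {
  std::unordered_set<int> seen;
  std::vector<bool> out(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    // insert() returns a pair; .second is false when the value was already present.
    out[i] = !seen.insert(x[i]).second;
  }
  return out;
}

// Analogue of R's unique(): keeps the first occurrence of each value, in input order.
std::vector<int> unique_values(const std::vector<int>& x) {
  std::unordered_set<int> seen;
  std::vector<int> out;
  for (std::size_t i = 0; i < x.size(); ++i) {
    if (seen.insert(x[i]).second) {
      out.push_back(x[i]);
    }
  }
  return out;
}
```

For the input {1, 2, 1, 3, 2}, duplicated_flags marks the third and fifth elements, and unique_values returns {1, 2, 3}. The set is used only for membership tests, so its lack of ordering does not affect the order of the results.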

7.3.3. Maps

Maps are closely related to table() and match().

Ordered map: std::map

Unordered map: std::unordered_map

Since maps have a value and a key, you need to specify both types when initialising a map:

map<double, int>, unordered_map<int, double>.
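The connection to table() can be sketched with a std::map<double, int> that counts occurrences. This is plain C++ on a std::vector<double>; in an Rcpp function the input would presumably come from a NumericVector. The name tabulate_values is hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Analogue of R's table(): counts how often each value occurs.
// std::map keeps its keys sorted, so iterating over the result visits
// the distinct values in increasing order.
std::map<double, int> tabulate_values(const std::vector<double>& x) {
  std::map<double, int> counts;
  for (std::size_t i = 0; i < x.size(); ++i) {
    // operator[] value-initializes a missing key to 0 before the increment.
    ++counts[x[i]];
  }
  return counts;
}
```

Swapping std::map for std::unordered_map would keep the counts the same but give up the sorted key order in exchange for hash-table lookups.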

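8. Interacting with the R Environment

Through Environment, Rcpp can access the variables and loaded functions in the current R global environment and can modify variables in the global environment. Environment can also be used to obtain functions from other R packages and use them in Rcpp.

Getting a function from another R package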

Rcpp::Environment stats("package:stats");
Rcpp::Function rnorm = stats["rnorm"];
return rnorm(10, Rcpp::Named("sd", 100.0));

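Getting and modifying a variable in the R global environment
Suppose the R global environment contains a vector x=c(1,2,3) and we want to change its value from Rcpp.

Rcpp::Environment global = Rcpp::Environment::global_env();// get the global environment and store it in the Environment variable global
Rcpp::NumericVector tmp = global["x"];// get x
tmp=pow(tmp,2);// square each element
global["x"]=tmp;// assign the new value to x in the global environment

Getting a function loaded in the R global environment
Suppose the global environment contains an R function funR, defined as: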

x=c(1,2,3);
funR<-function(x){
  return (-x);
}

together with the R variable x=c(1,2,3). We want to call this function from Rcpp and apply it to the vector x.

#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector funC() {
  Rcpp::Environment global =
    Rcpp::Environment::global_env();
  Rcpp::Function funRinC = global["funR"];
  Rcpp::NumericVector tmp = global["x"];
  return funRinC(tmp);
}

9. Creating R Packages with Rcpp

See this article:

How to Create R Packages with Rcpp and RcppArmadillo

10. Input and Output Examples

How to pass arrays

To pass a high-dimensional array, store it as a vector and attach the dimension information. There are two ways to do this:

Setting the dimensions via .attr("dim")

A NumericVector can carry dimension information, so an array can be returned to R through a NumericVector whose dimensions are set via .attr("dim").

// Dimension accepts at most three dimensions
output.attr("dim") = Dimension(3,4,2);
// assigning a vector to .attr("dim") allows more than three dimensions
NumericVector dim = NumericVector::create(2,2,2,2);
output.attr("dim") = dim;

Example:

// returns a 3*3*2 array
RObject func(){
  arma::vec long_vec(18,arma::fill::randn);
  vector<double> long_vec2 = conv_to<vector<double>>::from(long_vec);
  NumericVector output = wrap(long_vec2);
  output.attr("dim")=Dimension(3,3,2);
  return wrap(output);
}

// returns a 2*2*2*2 array
// note the use of conv_to<>::from()
RObject func(){
  arma::vec long_vec(16,arma::fill::randn);
  vector<double> long_vec2 = conv_to<vector<double>>::from(long_vec);
  NumericVector output = wrap(long_vec2);
  NumericVector dim = NumericVector::create(2,2,2,2);
  output.attr("dim")=dim;
  return wrap(output);
}

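Storing the dimensions in a separate vector and setting them on the R side afterwards (for example with attr(x, "dim") <- d)

Returning a one-dimensional STL vector from a function

Automatically converted to an R vector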

vector<double> func(NumericVector x){
  vector<double> vec;
  vec = as<vector<double>>(x);
  return vec;
}
NumericVector func(NumericVector x){
  vector<double> vec;
  vec = as<vector<double>>(x);
  return wrap(vec);
}
RObject func(NumericVector x){
  vector<double> vec;
  vec = as<vector<double>>(x);
  return wrap(vec);
}

Returning a two-dimensional STL vector from a function

Automatically converted to an R list, in which each element is a vector.

vector<vector<double>> func(NumericVector x) {
  vector<vector<double>> mat;
  for (int i=0;i!=3;++i){
    mat.push_back(as<vector<double>>(x));
  }
  return mat;
}
RObject func(NumericVector x) {
  vector<vector<double>> mat;
  for (int i=0;i!=3;++i){
    mat.push_back(as<vector<double> >(x));
  }
  return wrap(mat);
}

Returning an Armadillo matrix, cube, or field

Automatically converted to an R matrix

NumericMatrix func(){
  arma::mat A(3,4,arma::fill::randu);
  return wrap(A);
}
arma::mat func(){
  arma::mat A(3,4,arma::fill::randu);
  return A;
}

Automatically converted to a three-dimensional R array

arma::cube func(){
  arma::cube A(3,4,5,arma::fill::randu);
  return A;
}
RObject func(){
  arma::cube A(3,4,5,arma::fill::randu);
  return wrap(A);
}

Automatically converted to an R list in which each element stores an R vector that carries dimension information (this can be checked with .Internal(inspect())).

RObject func() {
  arma::cube A(3,4,2,arma::fill::randu);
  arma::cube B(3,4,2,arma::fill::randu);
  arma::field <arma::cube> F(2,1);
  F(0)=A;
  F(1)=B;
  return wrap(F);
}