A Summary of Common data.frame Operations in R

Updated: 2021-04-21 09:02:36  Author: HuskySir

This article summarizes common operations on data.frame in R and is intended as a practical reference.

Preface: data.frame is the data format I have used most while learning R recently, so this article summarizes its common operations. Most of the functions come from the dplyr package, written by Hadley Wickham, which is mainly used for cleaning and tidying data.

I. Creating

Creating a data.frame is straightforward: call the data.frame() function. This article builds a data frame of student scores, which most of the following operations work on; the scores are generated randomly.

> library(dplyr)       # load the dplyr package
> options(digits = 0)  # print numbers without decimals
> set.seed(1)          # set the random seed
> df <- data.frame(ID = 1:12,                                 # student ID
+                  Class = rep(c(1,2,3),4),                   # class
+                  Chinese = runif(12,min = 0,max = 100),     # Chinese score
+                  Math = runif(12,min = 0,max = 100),        # Math score
+                  English = runif(12,min = 0,max = 100))     # English score
> for (i in 1:ncol(df)) {
+   df[,i] <- as.integer(df[,i])  # convert each column to integer
+ }

The resulting df is:

> df
   ID Class Chinese Math English
1   1     1      26   68      26
2   2     2      37   38      38
3   3     3      57   76       1
4   4     1      90   49      38
5   5     2      20   71      86
6   6     3      89   99      34
7   7     1      94   38      48
8   8     2      66   77      59
9   9     3      62   93      49
10 10     1       6   21      18
11 11     2      20   65      82
12 12     3      17   12      66
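
The column-by-column loop can also be written more compactly. The following is a minimal sketch of one alternative, not the author's code: it rebuilds the same data with the same seed and converts every column with lapply(). The name df_alt is only illustrative.

```r
set.seed(1)  # same seed as above so the random scores match
df_alt <- data.frame(ID = 1:12,
                     Class = rep(c(1, 2, 3), 4),
                     Chinese = runif(12, min = 0, max = 100),
                     Math = runif(12, min = 0, max = 100),
                     English = runif(12, min = 0, max = 100))

# Assigning into df_alt[] replaces every column but keeps the data.frame structure
df_alt[] <- lapply(df_alt, as.integer)

str(df_alt)  # every column is now of type int
```

Assigning into df_alt[] rather than df_alt matters here: a plain df_alt <- lapply(...) would leave a list instead of a data frame.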

II. Querying

1. Querying a row or a column

Use data.frame[row_number,] or data.frame[,column_number]

data.frame[row_number,] returns a data frame

whereas data.frame[,column_number] returns a vector of that column's type

> df[2,]
  ID Class Chinese Math English
2  2     2      37   38      38
> df[,4]
 [1] 68 38 76 49 71 99 38 77 93 21 65 12

A column can also be queried with data.frame$column_name

> df$Chinese
 [1] 26 37 57 90 20 89 94 66 62  6 20 17

data.frame[column_number] returns a data frame that contains only that column
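
The return types described above can be checked directly with class(). This is a small sketch on a made-up data frame (df_demo is an illustrative name, not part of the article's data); drop = FALSE is a standard argument of the [ operator for data frames.

```r
df_demo <- data.frame(ID = 1:3, Math = c(68L, 38L, 76L))  # small illustrative data frame

class(df_demo[2, ])                # "data.frame": a one-row data frame
class(df_demo[, 2])                # "integer": a plain vector
class(df_demo[2])                  # "data.frame": a one-column data frame
class(df_demo[, 2, drop = FALSE])  # "data.frame": drop = FALSE keeps the frame
class(df_demo[["Math"]])           # "integer": same result as df_demo$Math
```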

To find the rows that satisfy a condition, use which(); the result is a data frame

> df[which(df$ID == 4),]
  ID Class Chinese Math English
4  4     1      90   49      38

2. Querying a single value

Use data.frame[row_number,column_number] or data.frame[row_number,'column_name']

> df[3,4]
[1] 76
> df[3,'Math']
[1] 76

To find the values that satisfy a condition, use which()

> df[which(df$Chinese == 57),'Math'] # Math score of the student whose Chinese score is 57
[1] 76
> df[which(df$Class == 2),'English'] # English scores of the students in class 2
[1] 38 86 59 82
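
One reason to wrap the condition in which() is missing data. The sketch below uses a made-up data frame (scores, with one NA) to compare which() against a bare logical index; the data and names are assumptions for illustration only.

```r
scores <- data.frame(ID = 1:4,
                     Chinese = c(57L, NA, 20L, 57L),
                     Math = c(76L, 50L, 71L, 93L))

# which() ignores NA comparisons and returns only the matching positions
with_which <- scores[which(scores$Chinese == 57), "Math"]
with_which    # 76 93

# A bare logical index keeps an NA placeholder for the row whose score is NA
with_logical <- scores[scores$Chinese == 57, "Math"]
with_logical  # 76 NA 93
```

When the column has no missing values, both forms return the same result.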

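III. Modifying

1. Modifying a row or a column

> df[1,] <- c(1,2,65,59,73)  # replace the first row
# replace the English scores
> df[,'English'] <- c(23,45,67,87,34,46,87,95,43,76,23,94)

The result after these changes (student 1's English score is first changed from 26 to 73, then to 23):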

> df
   ID Class Chinese Math English
1   1     2      65   59      23
2   2     2      37   38      45
3   3     3      57   76      67
4   4     1      90   49      87
5   5     2      20   71      34
6   6     3      89   99      46
7   7     1      94   38      87
8   8     2      66   77      95
9   9     3      62   93      43
10 10     1       6   21      76
11 11     2      20   65      23
12 12     3      17   12      94
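
Replacing a row with c() works here because every column is numeric. With mixed column types, c() first coerces all values to a single type, which then changes the column types. The sketch below shows this on a made-up data frame (df_mixed and its Name column are assumptions for illustration) and uses list() to keep each column's type.

```r
df_mixed <- data.frame(ID = 1:3,
                       Name = c("a", "b", "c"),
                       Score = c(90L, 80L, 70L),
                       stringsAsFactors = FALSE)

bad <- df_mixed
bad[1, ] <- c(1, "z", 65)         # c() turns every value into a character string
class(bad$Score)                  # "character"

good <- df_mixed
good[1, ] <- list(1L, "z", 65L)   # list() keeps one type per column
class(good$Score)                 # "integer"
```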

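2. Modifying a single value

Assign the new value directly to the single-value query expression shown above

> df[3,'Chinese'] <- 65 # change student 3's Chinese score to 65
# raise every Chinese score below 20 to 20
> df[which(df$Chinese < 20),'Chinese'] <- 20
> df
   ID Class Chinese Math English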
1   1     2      65   59      23
2   2     2      37   38      45
3   3     3      65   76      67
4   4     1      90   49      87
5   5     2      20   71      34
6   6     3      89   99      46
7   7     1      94   38      87
8   8     2      66   77      95
9   9     3      62   93      43
10 10     1      20   21      76
11 11     2      20   65      23
12 12     3      20   12      94
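
The conditional replacement can be written in a couple of equivalent ways. This is a sketch on made-up data (grades is an illustrative name): one version indexes the column with a logical vector, the other uses pmax() to apply a lower bound.

```r
grades <- data.frame(ID = 1:4, Chinese = c(65, 6, 17, 90))

# Logical indexing on the column itself
capped <- grades
capped$Chinese[capped$Chinese < 20] <- 20

# pmax() takes the element-wise maximum, which has the same effect here
floored <- grades
floored$Chinese <- pmax(floored$Chinese, 20)

identical(capped, floored)  # TRUE
```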

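3. Renaming rows and columns

rownames() and colnames() return the row and column names of a data frame. rownames(data.frame)[row_number] or colnames(data.frame)[column_number] returns the name at the given position, and assigning to that expression changes the name

> colnames(df)               # list the column names
[1] "ID"      "Class"   "Chinese" "Math"    "English"
> colnames(df)[4]            # name of the 4th column
[1] "Math"
> colnames(df)[4] <- "math"  # rename the 4th column to math
# set all column names at once
> colnames(df) <- c("ID","Class","Chinese","Math","English")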
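
Renaming by position breaks if the column order changes, so it can be safer to look the column up by name. The sketch below shows a base R version and dplyr's rename(), which takes new_name = old_name; df_names is a made-up data frame for illustration.

```r
library(dplyr)

df_names <- data.frame(ID = 1:2, Math = c(68, 38))

# Base R: find the column by its current name, so its position does not matter
base_renamed <- df_names
names(base_renamed)[names(base_renamed) == "Math"] <- "math"

# dplyr: rename(new_name = old_name) returns a modified copy
dplyr_renamed <- rename(df_names, math = Math)

names(base_renamed)   # "ID" "math"
names(dplyr_renamed)  # "ID" "math"
```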

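IV. Deleting

To delete rows or columns, select the rows or columns that should remain and assign the result back to the variable. A - in front of a row or column number excludes that row or column. To keep df intact for the following examples, the selections here are assigned to another variable; to actually delete, assign the result back to df itself, as shown in parentheses in the comments

# keep columns 1, 3 and 5 of df  ( df <- df[,c(1,3,5)] )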
> df.tmp <- df[,c(1,3,5)]
> df.tmp
   ID Chinese English
1   1      65      23
2   2      37      45
3   3      65      67
4   4      90      87
5   5      20      34
6   6      89      46
7   7      94      87
8   8      66      95
9   9      62      43
10 10      20      76
11 11      20      23
12 12      20      94
# delete row 3 of df ( df <- df[-3,] )
> df.tmp <- df[-3,]
> df.tmp
   ID Class Chinese Math English
1   1     2      65   59      23
2   2     2      37   38      45
4   4     1      90   49      87
5   5     2      20   71      34
6   6     3      89   99      46
7   7     1      94   38      87
8   8     2      66   77      95
9   9     3      62   93      43
10 10     1      20   21      76
11 11     2      20   65      23
12 12     3      20   12      94
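
Rows and columns can also be removed by name or by condition rather than by number. The sketch below uses a made-up data frame (df_del) and also shows a known pitfall of combining - with which(): when nothing matches, the negative index is empty and every row is dropped.

```r
df_del <- data.frame(ID = 1:4, Class = c(1, 2, 3, 1), Math = c(68, 38, 76, 49))

# Drop a column by name
no_math <- df_del[, names(df_del) != "Math"]

# Drop rows by condition with logical negation
kept <- df_del[!(df_del$Class == 3), ]

# Pitfall: which() returns integer(0) when nothing matches,
# and a negative empty index selects zero rows
none_left <- df_del[-which(df_del$Class == 9), ]
nrow(none_left)                        # 0
nrow(df_del[!(df_del$Class == 9), ])   # 4
```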

V. Adding

1. Adding rows

data.frame[new_row_number,] <- row_values

> df[13,] <- c(13,2,62,19,38) # add a 13th row
> df
   ID Class Chinese Math English
1   1     2      65   59      23
2   2     2      37   38      45
3   3     3      65   76      67
4   4     1      90   49      87
5   5     2      20   71      34
6   6     3      89   99      46
7   7     1      94   38      87
8   8     2      66   77      95
9   9     3      62   93      43
10 10     1      20   21      76
11 11     2      20   65      23
12 12     3      20   12      94
13 13     2      62   19      38

To duplicate rows, repeat their row numbers in the index

> df <- df[c(1,1:12),]      # duplicate row 1 once
> df
    ID Class Chinese Math English
1    1     2      65   59      23
1.1  1     2      65   59      23
2    2     2      37   38      45
3    3     3      65   76      67
4    4     1      90   49      87
5    5     2      20   71      34
6    6     3      89   99      46
7    7     1      94   38      87
8    8     2      66   77      95
9    9     3      62   93      43
10  10     1      20   21      76
11  11     2      20   65      23
12  12     3      20   12      94

The rep() function makes it easy to duplicate many rows

> df <- df[rep(1:12,each = 2),]     # duplicate every row once
> df
     ID Class Chinese Math English
1     1     2      65   59      23
1.1   1     2      65   59      23
2     2     2      37   38      45
2.1   2     2      37   38      45
3     3     3      65   76      67
3.1   3     3      65   76      67
4     4     1      90   49      87
4.1   4     1      90   49      87
5     5     2      20   71      34
5.1   5     2      20   71      34
6     6     3      89   99      46
6.1   6     3      89   99      46
7     7     1      94   38      87
7.1   7     1      94   38      87
8     8     2      66   77      95
8.1   8     2      66   77      95
9     9     3      62   93      43
9.1   9     3      62   93      43
10   10     1      20   21      76
10.1 10     1      20   21      76
11   11     2      20   65      23
11.1 11     2      20   65      23
12   12     3      20   12      94
12.1 12     3      20   12      94
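
Duplicated rows receive row names such as 1.1 and 2.1. If plain sequential row names are preferred, they can be reset afterwards. A small sketch on made-up data (df_dup is an illustrative name):

```r
df_dup <- data.frame(ID = 1:3, Math = c(68, 38, 76))
df_dup <- df_dup[rep(1:nrow(df_dup), each = 2), ]  # duplicate every row once

rownames(df_dup)          # "1" "1.1" "2" "2.1" "3" "3.1"
rownames(df_dup) <- NULL  # reset to sequential row names
rownames(df_dup)          # "1" "2" "3" "4" "5" "6"
```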

The rbind() function can also be used; an example appears later

2. Adding columns

data.frame$new_column_name <- column_values

> df$Physics <- c(23,34,67,23,56,67,78,23,54,56,67,34)
> df
   ID Class Chinese Math English Physics
1   1     2      65   59      23      23
2   2     2      37   38      45      34
3   3     3      65   76      67      67
4   4     1      90   49      87      23
5   5     2      20   71      34      56
6   6     3      89   99      46      67
7   7     1      94   38      87      78
8   8     2      66   77      95      23
9   9     3      62   93      43      54
10 10     1      20   21      76      56
11 11     2      20   65      23      67
12 12     3      20   12      94      34

data.frame[,new_column_number] <- column_values

> df[,7] <- c(1:12)
> df
   ID Class Chinese Math English Physics V7
1   1     2      65   59      23      23  1
2   2     2      37   38      45      34  2
3   3     3      65   76      67      67  3
4   4     1      90   49      87      23  4
5   5     2      20   71      34      56  5
6   6     3      89   99      46      67  6
7   7     1      94   38      87      78  7
8   8     2      66   77      95      23  8
9   9     3      62   93      43      54  9
10 10     1      20   21      76      56 10
11 11     2      20   65      23      67 11
12 12     3      20   12      94      34 12
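
Adding a column by number produces a default name such as V7. Adding it by name avoids a later rename. The sketch below compares the two on a made-up data frame; df_cols and the column name Rank are assumptions for illustration.

```r
df_cols <- data.frame(ID = 1:3, Math = c(68, 38, 76))

by_index <- df_cols
by_index[, 3] <- 1:3        # the new column gets the default name "V3"

by_name <- df_cols
by_name[["Rank"]] <- 1:3    # the new column is named explicitly

names(by_index)  # "ID" "Math" "V3"
names(by_name)   # "ID" "Math" "Rank"
```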

The mutate() function from the dplyr package can also be used

> mutate(df,Chemistry = Chinese + Math + English + Physics)
   ID Class Chinese Math English Physics V7 Chemistry
1   1     2      65   59      23      23  1       170
2   2     2      37   38      45      34  2       154
3   3     3      65   76      67      67  3       275
4   4     1      90   49      87      23  4       249
5   5     2      20   71      34      56  5       181
6   6     3      89   99      46      67  6       301
7   7     1      94   38      87      78  7       297
8   8     2      66   77      95      23  8       261
9   9     3      62   93      43      54  9       252
10 10     1      20   21      76      56 10       173
11 11     2      20   65      23      67 11       175
12 12     3      20   12      94      34 12       160
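
Note that mutate() returns a new data frame and leaves its input unchanged, so the result has to be assigned to keep the new column. A minimal sketch on made-up data (df_mut, Total and Mean are illustrative names):

```r
library(dplyr)

df_mut <- data.frame(ID = 1:3, Chinese = c(65, 37, 65), Math = c(59, 38, 76))

mutate(df_mut, Total = Chinese + Math)  # prints the result; df_mut is unchanged
ncol(df_mut)                            # 3

# Assign the result to keep it; later columns may use earlier ones
df_mut <- mutate(df_mut, Total = Chinese + Math, Mean = Total / 2)
df_mut$Total                            # 124 75 141
```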

The cbind() function can also be used; an example appears later

VI. Common dplyr functions

> df  # original data
   ID Class Chinese Math English
1   1     2      65   59      23
2   2     2      37   38      45
3   3     3      65   76      67
4   4     1      90   49      87
5   5     2      20   71      34
6   6     3      89   99      46
7   7     1      94   38      87
8   8     2      66   77      95
9   9     3      62   93      43
10 10     1      20   21      76
11 11     2      20   65      23
12 12     3      20   12      94

1. arrange(): sorting

arrange(.data, ...)
arrange(.data, ..., .by_group = FALSE)

> arrange(df,Chinese)  # sort by Chinese score, ascending
   ID Class Chinese Math English
1   5     2      20   71      34
2  10     1      20   21      76
3  11     2      20   65      23
4  12     3      20   12      94
5   2     2      37   38      45
6   9     3      62   93      43
7   1     2      65   59      23
8   3     3      65   76      67
9   8     2      66   77      95
10  6     3      89   99      46
11  4     1      90   49      87
12  7     1      94   38      87

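The first argument is the data frame to sort, followed by the sort variables in decreasing order of priority. For example, to sort by Chinese score and then by Math score:

> arrange(df,Chinese,Math)  # sort by Chinese, then Math, both ascending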
   ID Class Chinese Math English
1  12     3      20   12      94
2  10     1      20   21      76
3  11     2      20   65      23
4   5     2      20   71      34
5   2     2      37   38      45
6   9     3      62   93      43
7   1     2      65   59      23
8   3     3      65   76      67
9   8     2      66   77      95
10  6     3      89   99      46
11  4     1      90   49      87
12  7     1      94   38      87

To sort in descending order, use the desc() function

> arrange(df,desc(Chinese))  # sort by Chinese score, descending
   ID Class Chinese Math English
1   7     1      94   38      87
2   4     1      90   49      87
3   6     3      89   99      46
4   8     2      66   77      95
5   1     2      65   59      23
6   3     3      65   76      67
7   9     3      62   93      43
8   2     2      37   38      45
9   5     2      20   71      34
10 10     1      20   21      76
11 11     2      20   65      23
12 12     3      20   12      94
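
Ascending and descending keys can be mixed in one call. The sketch below sorts made-up data (df_sort) by class ascending and by Chinese score descending within each class, and shows an equivalent base R version with order() for comparison.

```r
library(dplyr)

df_sort <- data.frame(ID = 1:6,
                      Class = c(2, 1, 2, 1, 3, 3),
                      Chinese = c(65, 90, 20, 94, 89, 20))

# Class ascending, then Chinese descending within each class
by_dplyr <- arrange(df_sort, Class, desc(Chinese))

# Base R equivalent: negate a numeric key to reverse its direction
by_base <- df_sort[order(df_sort$Class, -df_sort$Chinese), ]

by_dplyr$ID  # 4 2 1 3 5 6
```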

2. distinct(): removing duplicates

distinct(.data, ..., .keep_all = FALSE)

> df1 <- df[rep(1:nrow(df),each = 2),] # duplicate every row of df once
> df1
     ID Class Chinese Math English
1     1     2      65   59      23
1.1   1     2      65   59      23
2     2     2      37   38      45
2.1   2     2      37   38      45
3     3     3      65   76      67
3.1   3     3      65   76      67
4     4     1      90   49      87
4.1   4     1      90   49      87
5     5     2      20   71      34
5.1   5     2      20   71      34
6     6     3      89   99      46
6.1   6     3      89   99      46
7     7     1      94   38      87
7.1   7     1      94   38      87
8     8     2      66   77      95
8.1   8     2      66   77      95
9     9     3      62   93      43
9.1   9     3      62   93      43
10   10     1      20   21      76
10.1 10     1      20   21      76
11   11     2      20   65      23
11.1 11     2      20   65      23
12   12     3      20   12      94
12.1 12     3      20   12      94
> df1 <- distinct(df1)  # remove duplicate rows
> df1
   ID Class Chinese Math English
1   1     2      65   59      23
2   2     2      37   38      45
3   3     3      65   76      67
4   4     1      90   49      87
5   5     2      20   71      34
6   6     3      89   99      46
7   7     1      94   38      87
8   8     2      66   77      95
9   9     3      62   93      43
10 10     1      20   21      76
11 11     2      20   65      23
12 12     3      20   12      94
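
distinct() can also deduplicate on selected columns only, which is where the .keep_all argument from the signature above comes in. A sketch on made-up data (df_dis is an illustrative name):

```r
library(dplyr)

df_dis <- data.frame(ID = 1:5,
                     Class = c(1, 2, 1, 3, 2),
                     Chinese = c(90, 37, 94, 89, 66))

# Only the Class column is returned, one row per distinct value
only_class <- distinct(df_dis, Class)
only_class$Class   # 1 2 3

# .keep_all = TRUE keeps all columns of the first row found for each Class
first_rows <- distinct(df_dis, Class, .keep_all = TRUE)
first_rows$ID      # 1 2 4
```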

3. group_by(): grouping; summarise(): summarizing

group_by(.data, ..., add = FALSE, .drop = FALSE)
ungroup(x, ...)
summarise(.data, ...)

group_by() and summarise() are often used together to compute something for each group. This is also a good place to introduce the pipe operator %>%, which passes the value on its left to the expression on its right as the first argument of that function call

> df %>%
+   group_by(Class) %>%
+   summarise(max = max(Chinese)) # highest Chinese score in each Class
# A tibble: 3 x 2
  Class   max
  <dbl> <dbl>
1     1    94
2     2    66
3     3    89
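
summarise() can compute several statistics per group in one call. The sketch below uses made-up data; df_grp and the output column names n_students, mean_chinese and max_chinese are assumptions chosen for illustration. n() is dplyr's helper for the group size.

```r
library(dplyr)

df_grp <- data.frame(Class = c(1, 1, 2, 2, 3),
                     Chinese = c(90, 94, 37, 66, 89))

per_class <- df_grp %>%
  group_by(Class) %>%
  summarise(n_students = n(),              # number of rows in the group
            mean_chinese = mean(Chinese),  # average score
            max_chinese = max(Chinese))    # highest score

per_class$n_students    # 2 2 1
per_class$mean_chinese  # 92.0 51.5 89.0
```

Writing x %>% f(y) is the same as calling f(x, y), so the pipeline above could also be written as nested function calls.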

4. filter(): filtering

filter(.data, ..., .preserve = FALSE)

Selects the rows that satisfy a condition (the result is a data frame)

> df %>%
+   group_by(Class) %>%
+   filter(Chinese == max(Chinese))  # the student with the highest Chinese score in each class
# A tibble: 3 x 5
# Groups:   Class [3]
     ID Class Chinese  Math English
  <dbl> <dbl>   <dbl> <dbl>   <dbl>
1     6     3      89    99      46
2     7     1      94    38      87
3     8     2      66    77      95
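
filter() accepts several conditions at once. The following sketch on made-up data (df_fil) shows conditions combined with AND and with OR, and adds ungroup() after a grouped filter so that later steps are no longer applied per group.

```r
library(dplyr)

df_fil <- data.frame(ID = 1:5,
                     Class = c(1, 2, 2, 3, 2),
                     Math = c(49, 38, 71, 99, 77))

# Conditions separated by commas must all hold (logical AND)
both <- filter(df_fil, Class == 2, Math > 60)
both$ID    # 3 5

# Use | for logical OR
either <- filter(df_fil, Class == 3 | Math < 40)
either$ID  # 2 4

# Grouped filter: best Math score per class, then drop the grouping
top <- df_fil %>%
  group_by(Class) %>%
  filter(Math == max(Math)) %>%
  ungroup()
top$ID     # 1 4 5
```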

5. select(): selecting columns

select(.data, ...)

> select(df,ID,Chinese,Math,English) # keep the ID, Chinese, Math and English columns of df
   ID Chinese Math English
1   1      65   59      23
2   2      37   38      45
3   3      65   76      67
4   4      90   49      87
5   5      20   71      34
6   6      89   99      46
7   7      94   38      87
8   8      66   77      95
9   9      62   93      43
10 10      20   21      76
11 11      20   65      23
12 12      20   12      94
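
Besides listing columns one by one, select() supports exclusion, ranges and helper functions such as starts_with(). A short sketch on made-up data (df_sel is an illustrative name):

```r
library(dplyr)

df_sel <- data.frame(ID = 1:2, Class = c(1, 2),
                     Chinese = c(65, 37), Math = c(59, 38), English = c(23, 45))

names(select(df_sel, -Class))                # "ID" "Chinese" "Math" "English"
names(select(df_sel, Chinese:English))       # "Chinese" "Math" "English"
names(select(df_sel, ID, starts_with("M")))  # "ID" "Math"
```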

6. rbind() and cbind(): combining

rbind() combines data frames by rows and cbind() combines them by columns. Both are base R functions rather than part of dplyr

# create a new data frame df1
> df1 <- data.frame(ID = 13,Class = 2,
+                   Chinese = 65,Math = 26,English = 84)
> df1
  ID Class Chinese Math English
1 13     2      65   26      84
> rbind(df,df1)  # combine df and df1 by rows
   ID Class Chinese Math English
1   1     2      65   59      23
2   2     2      37   38      45
3   3     3      65   76      67
4   4     1      90   49      87
5   5     2      20   71      34
6   6     3      89   99      46
7   7     1      94   38      87
8   8     2      66   77      95
9   9     3      62   93      43
10 10     1      20   21      76
11 11     2      20   65      23
12 12     3      20   12      94
13 13     2      65   26      84
> df2 <- data.frame(Biological = c(65,15,35,59,64,34,29,46,32,95,46,23))  # create a new data frame df2
> df2
   Biological
1          65
2          15
3          35
4          59
5          64
6          34
7          29
8          46
9          32
10         95
11         46
12         23
> cbind(df,df2)  # combine df and df2 by columns
   ID Class Chinese Math English Biological
1   1     2      65   59      23         65
2   2     2      37   38      45         15
3   3     3      65   76      67         35
4   4     1      90   49      87         59
5   5     2      20   71      34         64
6   6     3      89   99      46         34
7   7     1      94   38      87         29
8   8     2      66   77      95         46
9   9     3      62   93      43         32
10 10     1      20   21      76         95
11 11     2      20   65      23         46
12 12     3      20   12      94         23
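
rbind() requires both data frames to have the same column names. When they differ, dplyr's bind_rows() is one option: it matches columns by name and fills the gaps with NA. A sketch with two made-up data frames (a and b are illustrative names):

```r
library(dplyr)

a <- data.frame(ID = 1:2, Math = c(59, 38))
b <- data.frame(ID = 3, Physics = 67)

# rbind(a, b) would stop with an error because the column names differ
combined <- bind_rows(a, b)

names(combined)   # "ID" "Math" "Physics"
combined$Math     # 59 38 NA
combined$Physics  # NA NA 67
```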

7. Join functions: joining

inner_join(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"),...)
left_join(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...)
right_join(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"),...)
full_join(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...)
semi_join(x, y, by = NULL, copy = FALSE, ...)
nest_join(x, y, by = NULL, copy = FALSE, keep = FALSE, name = NULL,...)
anti_join(x, y, by = NULL, copy = FALSE, ...)

There are many join functions; only left_join() is demonstrated here

# create a new data frame Class (一班, 二班, 三班 mean class 1, class 2, class 3)
> Class <- data.frame(Class = c(1,2,3),class = c('一班','二班','三班'))
> Class
  Class class
1     1  一班
2     2  二班
3     3  三班
> left_join(df,Class,by = 'Class') # left join df with the Class data frame on the Class variable
   ID Class Chinese Math English class
1   1     2      65   59      23    二班
2   2     2      37   38      45    二班
3   3     3      65   76      67    三班
4   4     1      90   49      87    一班
5   5     2      20   71      34    二班
6   6     3      89   99      46    三班
7   7     1      94   38      87    一班
8   8     2      66   77      95    二班
9   9     3      62   93      43    三班
10 10     1      20   21      76    一班
11 11     2      20   65      23    二班
12 12     3      20   12      94    三班

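left_join() keeps every row of df and attaches the matching class column from the Class data frame; a Class value that appears only in the Class data frame would not be kept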
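
The difference between the join types only shows up when some keys have no match. The sketch below uses two made-up data frames (students, classes and the class_name column are assumptions for illustration) in which class 4 has no name and class 5 has no students.

```r
library(dplyr)

students <- data.frame(ID = 1:4, Class = c(1, 2, 3, 4))
classes <- data.frame(Class = c(1, 2, 3, 5),
                      class_name = c("A", "B", "C", "E"))

left  <- left_join(students, classes, by = "Class")   # all students; class 4 gets NA
inner <- inner_join(students, classes, by = "Class")  # only matched keys
full  <- full_join(students, classes, by = "Class")   # every key from both sides
anti  <- anti_join(students, classes, by = "Class")   # students without a match

nrow(left)   # 4
nrow(inner)  # 3
nrow(full)   # 5
anti$ID      # 4
```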

The above is based on my personal experience and is offered as a reference. If anything is wrong or incomplete, corrections are welcome.
