

Getting Started with Go: Regular Expressions

Updated: 2022-04-25 10:52:43 Author: 宇宙之一粟

This article introduces regular expressions in Go for beginners, walking through the main functions of the standard regexp package with worked examples.

Introduction

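In computing, we often need to check whether a particular pattern of characters occurs inside another string. Regular expressions describe such patterns with a special syntax, which is then used to search a given string for the matching characters.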

If the pattern matches, that is, if the described substring is found in the target string, the search succeeds; otherwise it fails.

What is a regular expression

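A regular expression (regex) is a special sequence of characters that defines a search pattern for matching text. Go's standard library provides the regexp package, which supports common operations such as filtering, modifying, replacing, validating, and extracting text.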

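Regular expressions are used both for searching text and for more advanced text manipulation. They are built into tools such as grep and sed, text editors such as vi and emacs, and programming languages such as Go, Java, and Python. Go's regexp package accepts the RE2 syntax, which largely follows the syntax established in those popular languages. RE2 is roughly a subset of PCRE, with various caveats: most notably, it has no backreferences or lookaround assertions, and in exchange Go guarantees that matching runs in time linear in the size of the input.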
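To see these caveats in practice, the minimal sketch below feeds a few patterns to regexp.Compile: an ordinary group with an optional quantifier is accepted, while a backreference and a lookahead are rejected with an error. The helper name tryCompile is hypothetical, and the exact error text may vary between Go versions.

```go
package main

import (
	"fmt"
	"regexp"
)

// tryCompile is a hypothetical helper: it returns the error (if any) that
// Go's RE2-based regexp package reports for pattern.
func tryCompile(pattern string) error {
	_, err := regexp.Compile(pattern)
	return err
}

func main() {
	patterns := []string{
		`fox(es)?`,  // group + optional quantifier: supported
		`(\w)\1`,    // backreference: not supported by RE2
		`fox(?=es)`, // lookahead: not supported by RE2
	}
	for _, p := range patterns {
		if err := tryCompile(p); err != nil {
			fmt.Printf("%s rejected: %v\n", p, err)
		} else {
			fmt.Printf("%s accepted\n", p)
		}
	}
}
```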

MatchString function

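The regexp.MatchString function reports whether the string passed as its second argument contains any match of the regular expression pattern passed as its first argument. It also returns an error if the pattern cannot be parsed.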

package main
import (
	"fmt"
	"log"
	"regexp"
)
func main() {
	words := [...]string{"Seven", "even", "Maven", "Amen", "eleven"}
	for _, word := range words {
		found, err := regexp.MatchString(".even", word)
		if err != nil {
			log.Fatal(err)
		}
		if found {
			fmt.Printf("%s matches\n", word)
		} else {
			fmt.Printf("%s does not match\n", word)
		}
	}
}

Running this code prints:

Seven matches
even does not match
Maven does not match
Amen does not match
eleven matches

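However, the editor flags the loop above with a warning: calling regexp.MatchString inside a loop parses the pattern again on every iteration, which performs poorly. The recommended fix is to compile the pattern once with regexp.Compile.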
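One more detail of the output above: "eleven" matches because MatchString is unanchored and finds ".even" anywhere in the string (here, in "leven"). To require the whole word to match, the pattern can be anchored with ^ and $. The sketch below compiles both patterns once with regexp.MustCompile (described in the following sections); the variable names partialRe and wholeRe are illustrative.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	partialRe = regexp.MustCompile(`.even`)   // may match anywhere in the input
	wholeRe   = regexp.MustCompile(`^.even$`) // must match the entire input
)

func main() {
	words := []string{"Seven", "even", "Maven", "Amen", "eleven"}
	for _, word := range words {
		fmt.Printf("%s: partial=%t whole=%t\n",
			word, partialRe.MatchString(word), wholeRe.MatchString(word))
	}
}
```

With the anchored pattern, only "Seven" (one character followed by "even") still matches.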

Compile function

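The regexp.Compile function parses a regular expression and, if successful, returns a *regexp.Regexp object that can be used to match text. Compiling a pattern once and reusing the object avoids re-parsing it on every match, which makes repeated matching faster.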

The regexp.MustCompile function is a convenience wrapper: it compiles the regular expression like Compile, but panics if the expression cannot be parsed instead of returning an error.
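The difference shows up only with an invalid pattern. The minimal sketch below uses a deliberately broken pattern (an unbalanced parenthesis) and shows that Compile returns an error while MustCompile panics; the helper names compileErr and mustCompilePanics are hypothetical, and recover is used only to observe the panic.

```go
package main

import (
	"fmt"
	"regexp"
)

// compileErr returns the error Compile reports for pattern (nil if valid).
func compileErr(pattern string) error {
	_, err := regexp.Compile(pattern)
	return err
}

// mustCompilePanics reports whether MustCompile panics for pattern.
func mustCompilePanics(pattern string) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	regexp.MustCompile(pattern)
	return false
}

func main() {
	const bad = `fox(es` // unbalanced parenthesis: not a valid pattern
	fmt.Println("Compile error:", compileErr(bad))
	fmt.Println("MustCompile panics on bad pattern:", mustCompilePanics(bad))
	fmt.Println("MustCompile panics on good pattern:", mustCompilePanics(`fox(es)?`))
}
```

In practice, Compile suits patterns that come from user input, while MustCompile suits fixed patterns written in the source.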

package main
import (
	"fmt"
	"log"
	"regexp"
)
func main() {
	words := [...]string{"Seven", "even", "Maven", "Amen", "eleven"}
	re, err := regexp.Compile(".even")
	if err != nil {
		log.Fatal(err)
	}
	for _, word := range words {
		found := re.MatchString(word)
		if found {
			fmt.Printf("%s matches\n", word)
		} else {
			fmt.Printf("%s does not match\n", word)
		}
	}
}

In this example, we use a compiled regular expression.

re, err := regexp.Compile(".even")

That is, we compile the regular expression once with Compile, before the loop. We then call the MatchString method on the returned Regexp object:

found := re.MatchString(word)

Running the program produces the same output:

Seven matches
even does not match
Maven does not match
Amen does not match
eleven matches

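MustCompile function

The same program can be written with MustCompile, which needs no error check: an invalid pattern causes a panic instead.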

package main
import (
	"fmt"
	"regexp"
)
func main() {
	words := [...]string{"Seven", "even", "Maven", "Amen", "eleven"}
	re := regexp.MustCompile(".even")
	for _, word := range words {
		found := re.MatchString(word)
		if found {
			fmt.Printf("%s matches\n", word)
		} else {
			fmt.Printf("%s does not match\n", word)
		}
	}
}
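A common idiom, which the regexp documentation itself mentions, is to use MustCompile to initialize a package-level variable, so the pattern is compiled exactly once at program start and any mistake in it surfaces immediately. A *regexp.Regexp is also safe for concurrent use by multiple goroutines. In the sketch below, evenRe and hasEven are illustrative names, not part of the original example.

```go
package main

import (
	"fmt"
	"regexp"
)

// evenRe is compiled once, when the package is initialized.
var evenRe = regexp.MustCompile(`.even`)

// hasEven is a hypothetical helper that reuses the precompiled pattern.
func hasEven(word string) bool {
	return evenRe.MatchString(word)
}

func main() {
	for _, word := range []string{"Seven", "even", "eleven"} {
		fmt.Printf("%s: %t\n", word, hasEven(word))
	}
}
```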

FindAllString function

The FindAllString method returns a slice of all successive (non-overlapping) matches of the regular expression.

package main
import (
	"fmt"
	"os"
	"regexp"
)
func main() {
	var content = `Foxes are omnivorous mammals belonging to several genera
of the family Canidae. Foxes have a flattened skull, upright triangular ears,
a pointed, slightly upturned snout, and a long bushy tail. Foxes live on every
continent except Antarctica. By far the most common and widespread species of
fox is the red fox.`
	// (?i) makes the match case-insensitive; (es)? makes "es" optional.
	re := regexp.MustCompile("(?i)fox(es)?")
	found := re.FindAllString(content, -1)
	// Check for "no match" before using the result.
	if found == nil {
		fmt.Printf("no match found\n")
		os.Exit(1)
	}
	fmt.Printf("%q\n", found)
	for _, word := range found {
		fmt.Printf("%s\n", word)
	}
}

In this example, we find all occurrences of the word fox, including its plural form.

re := regexp.MustCompile("(?i)fox(es)?")

The (?i) flag makes the regular expression case-insensitive. The group (es)? means the characters "es" may occur zero times or once.

found := re.FindAllString(content, -1)

We use FindAllString to find all occurrences of the defined regular expression. The second parameter is the maximum number of matches to return; -1 means return all possible matches.

Output:

["Foxes" "Foxes" "Foxes" "fox" "fox"]
Foxes
Foxes
Foxes
fox
fox
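The second argument of FindAllString acts as a limit: a negative value returns every match, while n >= 0 returns at most n matches, so 0 returns nil. The sketch below illustrates this on a made-up sentence; foxRe and findFoxes are illustrative names.

```go
package main

import (
	"fmt"
	"regexp"
)

var foxRe = regexp.MustCompile(`(?i)fox(es)?`)

// findFoxes returns at most n matches (all of them when n < 0).
func findFoxes(text string, n int) []string {
	return foxRe.FindAllString(text, n)
}

func main() {
	text := "Foxes and a fox met a FOX"
	for _, n := range []int{-1, 2, 0} {
		fmt.Printf("n=%d -> %q\n", n, findFoxes(text, n))
	}
}
```

With n = -1 all three matches come back, with n = 2 only "Foxes" and "fox", and with n = 0 the result is nil.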

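FindAllStringIndex function

FindAllStringIndex works like FindAllString, but instead of the matched text it returns, for each match, a two-element slice holding the match's start and end index.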

package main
import (
	"fmt"
	"regexp"
)
func main() {
	var content = `Foxes are omnivorous mammals belonging to several genera
of the family Canidae. Foxes have a flattened skull, upright triangular ears,
a pointed, slightly upturned snout, and a long bushy tail. Foxes live on every
continent except Antarctica. By far the most common and widespread species of
fox is the red fox.`
	re := regexp.MustCompile("(?i)fox(es)?")
	idx := re.FindAllStringIndex(content, -1)
	for _, j := range idx {
		match := content[j[0]:j[1]]
		fmt.Printf("%s at %d:%d\n", match, j[0], j[1])
	}
}

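In this example, we find every occurrence of the word fox in the text, together with its position. Each pair j holds the start and end byte offsets of a match, so content[j[0]:j[1]] is the matched text. With content exactly as shown above, the program prints:

Foxes at 0:5
Foxes at 80:85
Foxes at 194:199
fox at 292:295
fox at 307:310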
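Because these indices are byte offsets into the UTF-8 string, they differ from character positions once the text contains non-ASCII characters, such as Chinese. The sketch below uses a made-up string and FindStringIndex (the single-match variant) to show the difference; foxIndex is an illustrative helper.

```go
package main

import (
	"fmt"
	"regexp"
	"unicode/utf8"
)

var foxRe = regexp.MustCompile(`fox`)

// foxIndex returns the [start, end) byte offsets of the first "fox", or nil.
func foxIndex(s string) []int {
	return foxRe.FindStringIndex(s)
}

func main() {
	content := "狐狸 fox" // each Chinese character takes 3 bytes in UTF-8
	loc := foxIndex(content)
	if loc == nil {
		fmt.Println("no match")
		return
	}
	fmt.Println("byte offsets:", loc)
	fmt.Println("character offset of start:", utf8.RuneCountInString(content[:loc[0]]))
	fmt.Println("match:", content[loc[0]:loc[1]])
}
```

Here the match starts at byte 7 but at character 3, so slicing the string with the returned offsets is correct, while treating them as character positions is not.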

Split function

The Split method slices a string into substrings separated by matches of the regular expression. It returns a slice of the substrings between those matches.

package main
import (
	"fmt"
	"log"
	"regexp"
	"strconv"
)
func main() {
	var data = `22, 1, 3, 4, 5, 17, 4, 3, 21, 4, 5, 1, 48, 9, 42`
	sum := 0
	// Raw string literal: "\s" is not a valid escape inside a "..." literal.
	re := regexp.MustCompile(`,\s*`)
	vals := re.Split(data, -1)
	for _, val := range vals {
		n, err := strconv.Atoi(val)
		if err != nil {
			log.Fatal(err)
		}
		sum += n
	}
	fmt.Println(sum)
}

In this example, we have a list of comma-separated values. We extract the values from the string and compute their sum.

re := regexp.MustCompile(`,\s*`)

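The regular expression consists of a comma followed by any number of whitespace characters (\s matches spaces, tabs, newlines, and similar characters). It is written as a raw string literal in backquotes, because "\s" is an invalid escape sequence inside an ordinary double-quoted Go string and would not compile.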

vals := re.Split(data, -1)

Split returns a slice of the values.

for _, val := range vals {
	n, err := strconv.Atoi(val)
	if err != nil {
		log.Fatal(err)
	}
	sum += n
}

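We iterate over the slice and compute the sum. The slice holds strings, so we use the strconv.Atoi function to convert each one to an integer, checking the error before adding the value.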

Running the code prints:

189
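Like FindAllString, Split takes a limit: with n > 0 it returns at most n substrings, the last one being the unsplit remainder. Input that ends with a separator also yields a trailing empty string, which strconv.Atoi in the program above would reject. The sketch below demonstrates both cases on shortened, made-up input.

```go
package main

import (
	"fmt"
	"regexp"
)

var sepRe = regexp.MustCompile(`,\s*`)

func main() {
	data := "22, 1, 3, 4"
	fmt.Printf("%q\n", sepRe.Split(data, -1))     // every substring
	fmt.Printf("%q\n", sepRe.Split(data, 2))      // at most 2; the rest stays unsplit
	fmt.Printf("%q\n", sepRe.Split("1, 2,", -1))  // trailing separator leaves ""
}
```

The three calls return ["22" "1" "3" "4"], ["22" "1, 3, 4"], and ["1" "2" ""].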

Capture groups in Go regular expressions

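Parentheses () create a capture group. A group lets us apply a quantifier to the whole group, or restrict alternation to part of the regular expression. To retrieve the text matched by capture groups (Go calls them subexpressions), we use the FindStringSubmatch method.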

package main
import (
	"fmt"
	"regexp"
)
func main() {
	websites := [...]string{"webcode.me", "zetcode.com", "freebsd.org", "netbsd.org"}
	// Raw string literal: "\w" and "\." are invalid escapes inside a "..." literal.
	re := regexp.MustCompile(`(\w+)\.(\w+)`)
	for _, website := range websites {
		parts := re.FindStringSubmatch(website)
		// parts[0] is the whole match; parts[1] and parts[2] are the two groups.
		for _, part := range parts {
			fmt.Println(part)
		}
		fmt.Println("---------------------")
	}
}

In this example, we use groups to split each domain name into two parts.

re := regexp.MustCompile(`(\w+)\.(\w+)`)

We define two groups with parentheses.

parts := re.FindStringSubmatch(website)

FindStringSubmatch returns a slice of strings holding the whole match followed by the text matched by each capture group.

Running the code:

$ go run capturegroups.go
webcode.me
webcode
me
---------------------
zetcode.com
zetcode
com
---------------------
freebsd.org
freebsd
org
---------------------
netbsd.org
netbsd
org
---------------------
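Groups can also be named with the (?P<name>...) syntax and looked up by name instead of by position, and FindAllStringSubmatch returns the groups for every match in a longer text. The sketch below combines both; it assumes Go 1.15 or later for Regexp.SubexpIndex, and splitDomains is an illustrative helper applied to made-up input.

```go
package main

import (
	"fmt"
	"regexp"
)

// domainRe names its two groups "name" and "tld".
var domainRe = regexp.MustCompile(`(?P<name>\w+)\.(?P<tld>\w+)`)

// splitDomains returns a name/TLD pair for every domain found in text.
func splitDomains(text string) [][2]string {
	nameIdx := domainRe.SubexpIndex("name")
	tldIdx := domainRe.SubexpIndex("tld")
	var out [][2]string
	for _, m := range domainRe.FindAllStringSubmatch(text, -1) {
		// m[0] is the whole match; m[nameIdx] and m[tldIdx] are the named groups.
		out = append(out, [2]string{m[nameIdx], m[tldIdx]})
	}
	return out
}

func main() {
	for _, p := range splitDomains("see webcode.me and zetcode.com") {
		fmt.Printf("name=%s tld=%s\n", p[0], p[1])
	}
}
```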

Replacing strings with regular expressions

Matches can be replaced with ReplaceAllString. The method returns the modified string.
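The replacement string is not always taken literally: ReplaceAllString expands $1, ${1}, or ${name} to the text of the corresponding capture group. The minimal sketch below, on made-up input, swaps the two parts of each domain; ReplaceAllLiteralString is shown for contrast because it inserts the replacement verbatim. The braces matter, since in the $name form the name is read as long as possible ($1x means ${1x}, not ${1}x).

```go
package main

import (
	"fmt"
	"regexp"
)

var swapRe = regexp.MustCompile(`(\w+)\.(\w+)`)

func main() {
	s := "webcode.me zetcode.com"
	// ${2} and ${1} expand to the second and first capture groups.
	fmt.Println(swapRe.ReplaceAllString(s, "${2}.${1}"))
	// No $ expansion: the replacement text is copied as-is.
	fmt.Println(swapRe.ReplaceAllLiteralString(s, "${2}.${1}"))
}
```

The first call yields "me.webcode com.zetcode"; the second leaves the literal text "${2}.${1}" in place of each match.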

package main
import (
	"fmt"
	"io"
	"log"
	"net/http"
	"regexp"
	"strings"
)
func main() {
	resp, err := http.Get("http://webcode.me")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	// io.ReadAll replaces the deprecated ioutil.ReadAll (Go 1.16+).
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	content := string(body)
	re := regexp.MustCompile("<[^>]*>")
	replaced := re.ReplaceAllString(content, "")
	fmt.Println(strings.TrimSpace(replaced))
}

The example reads the HTML of a web page and strips its HTML tags with a regular expression.

resp, err := http.Get("http://webcode.me")

We issue a GET request with the Get function from the net/http package.

body, err := io.ReadAll(resp.Body)

We read the body of the response.

re := regexp.MustCompile("<[^>]*>")

This pattern defines a regular expression that matches HTML tags: a <, then any run of characters other than >, then a closing >.

replaced := re.ReplaceAllString(content, "")

We remove all tags with the ReplaceAllString method.
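The same tag-stripping step can be tried offline on a fixed string, which also makes it easy to see the limits of such a simple pattern. In the sketch below, stripTags is an illustrative helper and the HTML snippets are made up; the second call shows that a > inside an attribute value ends the "tag" too early, so a real HTML parser is the safer choice for arbitrary pages.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// tagRe: "<", then any run of characters other than ">", then ">".
var tagRe = regexp.MustCompile(`<[^>]*>`)

// stripTags removes everything that looks like a tag and trims the result.
func stripTags(html string) string {
	return strings.TrimSpace(tagRe.ReplaceAllString(html, ""))
}

func main() {
	page := "<html><body>\n<p>Hello <b>there</b>!</p>\n</body></html>"
	fmt.Println(stripTags(page))
	// A naive pattern is easily fooled, e.g. by ">" inside an attribute value.
	fmt.Printf("%q\n", stripTags(`<a title="a>b">link</a>`))
}
```

The first call prints "Hello there!", while the second leaves the fragment b">link behind.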

ReplaceAllStringFunc function

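ReplaceAllStringFunc returns a copy of the string in which every match of the regular expression has been replaced by the return value of the given function, which receives the matched text.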

package main
import (
	"fmt"
	"regexp"
	"strings"
)
func main() {
	content := "an old eagle"
	re := regexp.MustCompile(`[^aeiou]`)
	fmt.Println(re.ReplaceAllStringFunc(content, strings.ToUpper))
}

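In this example, we apply strings.ToUpper to every character that is not a lowercase vowel: the pattern [^aeiou] matches consonants and spaces alike (spaces are unaffected by ToUpper). The program prints aN oLD eaGLe.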
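The replacement function can do more than change case. The sketch below, with an illustrative helper doubleNumbers and made-up input, parses every run of ASCII digits matched by \d+ and replaces it with twice its value; a match that does not fit in an int is left unchanged.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

var numRe = regexp.MustCompile(`\d+`)

// doubleNumbers replaces every run of digits with twice its value.
func doubleNumbers(s string) string {
	return numRe.ReplaceAllStringFunc(s, func(m string) string {
		n, err := strconv.Atoi(m)
		if err != nil { // e.g. a digit run too long for int: keep it as-is
			return m
		}
		return strconv.Itoa(2 * n)
	})
}

func main() {
	fmt.Println(doubleNumbers("3 apples and 10 pears"))
}
```

The program prints "6 apples and 20 pears".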