
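Python Skill Building: Regular Expressions Explained with Examples

Updated: 2025-03-05 09:55:08  Author: 杰仔正在努力

This article introduces regular expressions in Python. It covers basic syntax, single-character matching, repetition counts, character sets, boundary matching, groups, and greedy versus non-greedy matching, and uses practical examples to show how regular expressions are used to match and process strings.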

1. Getting to know regular expressions
2. Matching single characters with regular expressions
3. Matching repetition counts
4. Matching character sets with regular expressions
5. Boundary matching
6. Groups
7. Greedy and non-greedy matching
Summary

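1. Getting to know regular expressions

What is a regular expression
A regular expression (commonly abbreviated regex, regexp, or RE) is a concept from
computer science: a pattern that describes a set of strings.
What regular expressions are for
A regular expression is a single string that describes and matches a whole family of strings following some syntactic rule.
Many text editors use regular expressions to search for and replace text that fits a pattern.
Characteristics of regular expressions
They are very flexible, logical, and expressive;
they can express complex string operations quickly and concisely.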

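Using regular expressions in Python: the findall method

To use regular expressions in Python, import the re module. The basic call looks like this:

re.findall(pattern, string, flags=0)

Function parameters

pattern: the regular expression to search for
string: the string to search in
flags: optional modifiers that change how the pattern is matched; the default 0 means no flags

Commonly used flags values

re.I (re.IGNORECASE): case-insensitive matching
re.M (re.MULTILINE): ^ and $ also match at the start and end of each line
re.S (re.DOTALL): . also matches a newline
re.X (re.VERBOSE): whitespace and comments are allowed inside the pattern
re.A (re.ASCII): \w, \d, \s, and \b match ASCII characters only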

Example: using findall()

import re

text = "hello,my name is jie"

# findall returns every non-overlapping match as a list
result = re.findall("jie", text)
print(result)

Output

['jie']
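
The flags parameter is easier to judge with a concrete case. The sketch below is only an illustration: the sample text is made up for this example, and it shows how re.IGNORECASE and re.MULTILINE change what findall() returns.

```python
import re

# Hypothetical sample text, chosen so that case and line starts both matter.
text = "Hello\nhello world\nHELLO again"

# Without flags the match is case-sensitive.
plain = re.findall("hello", text)

# re.IGNORECASE (short form re.I) ignores letter case.
ignore_case = re.findall("hello", text, flags=re.IGNORECASE)

# re.MULTILINE (re.M) lets ^ match at the start of every line.
# Flags are combined with the | operator.
line_starts = re.findall("^hello", text, flags=re.IGNORECASE | re.MULTILINE)

print(plain)        # ['hello']
print(ignore_case)  # ['Hello', 'hello', 'HELLO']
print(line_starts)  # ['Hello', 'hello', 'HELLO']
```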

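Using regular expressions in Python: the match method

re.match tries to match the pattern at the very start of the string. If the pattern does not match there, match() returns None.

import re

text = "hello,my name is jie"

# re.match only succeeds if the pattern matches at index 0
match = re.match("hello", text)
if match:
    print(match.group(0))

Output

hello

group(0) returns the entire match. When the pattern contains capturing groups, group(n) with n starting at 1 returns the text captured by the n-th group.
groups() returns all captured groups at once as a tuple.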
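
To make the difference between group(0), group(n), and groups() concrete, here is a small sketch. The pattern with two capturing groups is an assumption added for illustration and does not come from the example above.

```python
import re

text = "hello,my name is jie"

# Two capturing groups: the greeting at the start and the name at the end.
match = re.match(r"(\w+),my name is (\w+)", text)

if match:
    whole = match.group(0)       # the entire matched text
    first = match.group(1)       # text captured by the first group
    second = match.group(2)      # text captured by the second group
    all_groups = match.groups()  # tuple of all captured groups

    print(whole)       # hello,my name is jie
    print(first)       # hello
    print(second)      # jie
    print(all_groups)  # ('hello', 'jie')
```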

The re.search method

re.search scans the whole string and returns the first successful match, or None if nothing matches.

import re 
s = 'hello world hello' 
result = re.search('hello', s) 
print(result.group(0))
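
The following sketch contrasts match() and search() on the same string and shows one way to guard against a None result. The variable names are chosen only for this illustration.

```python
import re

s = "hello world hello"

# match only tries at index 0, so "world" is not found.
at_start = re.match("world", s)

# search scans forward and stops at the first hit.
anywhere = re.search("world", s)

# search reports only the first of the two "hello" occurrences.
first_hello = re.search("hello", s)

print(at_start)            # None
print(anywhere.span())     # (6, 11)
print(first_hello.span())  # (0, 5)

# Check for None before calling group() to avoid an AttributeError.
result = at_start.group(0) if at_start else "no match"
print(result)              # no match
```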

2. Matching single characters with regular expressions

Match every digit in a string

import re

text = "12hellowordhello12"

# a raw string keeps the backslash in \d intact
result = re.findall(r"\d", text)
print(result)

Output

['1', '2', '1', '2']

Match every non-digit character in a string

import re

text = "12hellowordhello12"

# \D is the opposite of \d: any character that is not a digit
result = re.findall(r"\D", text)
print(result)

Output

['h', 'e', 'l', 'l', 'o', 'w', 'o', 'r', 'd', 'h', 'e', 'l', 'l', 'o']

Match a form feed character

import re

# chr(12) is the form feed character
text = "12hellowordhello12" + chr(12)

result = re.findall(r"\f", text)
print(result)

Output

['\x0c']

Match newline characters

import re

text = "hello word\nmy name is jie"
# the newline escape is \n (backslash), not /n
result = re.findall(r"\n", text)
print(result)

Output

['\n']
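
Besides \d, \D, \f, and \n, a few other single-character classes come up often. The sketch below uses a made-up sample string to show \w, \W, \s, and the dot.

```python
import re

# Hypothetical sample: letters, an underscore, digits, a space, a hyphen, a tab.
text = "a_1 b-2\tc"

word_chars = re.findall(r"\w", text)  # letters, digits, and underscore
non_word = re.findall(r"\W", text)    # everything that \w does not match
whitespace = re.findall(r"\s", text)  # spaces, tabs, newlines, and similar
any_char = re.findall(r".", "a\nb")   # . matches anything except a newline by default

print(word_chars)  # ['a', '_', '1', 'b', '2', 'c']
print(non_word)    # [' ', '-', '\t']
print(whitespace)  # [' ', '\t']
print(any_char)    # ['a', 'b']
```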

3. Matching repetition counts

Match zero or more times (*)

import re

s = "hello world helloo hell"
print(re.findall('hello*', s))

['hello', 'helloo', 'hell']

Match one or more times (+)

import re

s = "hello world helloo hell"
print(re.findall('hello+', s))

['hello', 'helloo']

Match zero or one time (?)

import re

s = "hello world helloo hell"
print(re.findall('hello?', s))

['hello', 'hello', 'hell']

Match exactly n times ({n})

import re

s = "hello world helloo hell helloo hellooo helloo helloo"
print(re.findall('hello{2}', s))

['helloo', 'helloo', 'helloo', 'helloo', 'helloo']

Match at least n times ({n,})

import re

s = "hello world helloo hell helloo hellooo helloo helloo"
print(re.findall('hello{2,}', s))

['helloo', 'helloo', 'hellooo', 'helloo', 'helloo']

Match at least n and at most m times ({n,m})

import re

s = "hello world helloo hell helloo hellooo helloo helloo"
print(re.findall('hello{2,3}', s))

['helloo', 'helloo', 'hellooo', 'helloo', 'helloo']
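
In all of the examples above the quantifier applies only to the element directly before it, which is the single character o. To repeat a longer sequence, wrap it in a group. The sketch below uses a made-up string to show the difference.

```python
import re

# Hypothetical sample string for this illustration.
s = "lolo hellolo hellooo"

# {2} applies only to the character "o", so this looks for "loo".
single_char = re.findall(r"lo{2}", s)

# A non-capturing group makes {2} repeat the whole pair "lo".
whole_group = re.findall(r"(?:lo){2}", s)

print(single_char)  # ['loo']
print(whole_group)  # ['lolo', 'lolo']
```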

4. Matching character sets with regular expressions

For a contiguous range, use a hyphen (-) inside square brackets

import re

text = "110,120,130,230,250,160"
result = re.findall("1[1-9]0", text)
print(result)

['110', '120', '130', '160']

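To match anything outside a range, negate the set with ^ as its first character

import re

text = "110,120,130,230,250,160"
# no entry has a middle character outside 1-9, so nothing matches
result = re.findall("1[^1-9]0", text)
print(result)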

[]
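
Because the negated example above returns an empty list, here is a second sketch in which the negated set does find something. The codes string is invented for illustration.

```python
import re

# Hypothetical data: the middle character is sometimes not a digit.
codes = "110,1a0,1-0,160,1A0"

# Several ranges can be listed in one set: any ASCII letter in the middle.
letters = re.findall(r"1[a-zA-Z]0", codes)

# ^ as the first character negates the set: any middle character except a digit.
non_digits = re.findall(r"1[^0-9]0", codes)

print(letters)     # ['1a0', '1A0']
print(non_digits)  # ['1a0', '1-0', '1A0']
```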

5. Boundary matching

Match the start of the whole string

import re

text = "hello jiejie"

result = re.findall("^he", text)
print(result)

['he']

Match the end of the whole string

import re

text = "hello jiejie e e e"

# only the final "e" sits directly before the end of the string
result = re.findall("e$", text)
print(result)

['e']

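Match the start of a word

import re

text = "hello jiejie  hel"

# \b marks a word boundary; the raw string keeps it from becoming a backspace
result = re.findall(r'\bhe', text)
print(result)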

['he', 'he']
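
The same anchors can be used in a few more ways. This sketch, with sample text made up for the purpose, shows \b at the end of a word, its opposite \B, and how re.MULTILINE changes ^.

```python
import re

# Hypothetical two-line sample text.
text = "hello jiejie hel\nhe said"

# \b after the pattern anchors it to the end of a word.
word_ends = re.findall(r"ie\b", text)

# \B is the opposite: the position must not be a word boundary.
inside_words = re.findall(r"\Bie", text)

# By default ^ matches only at the very start of the string.
start_only = re.findall(r"^he", text)

# With re.MULTILINE it also matches right after each newline.
each_line = re.findall(r"^he", text, flags=re.MULTILINE)

print(word_ends)     # ['ie']
print(inside_words)  # ['ie', 'ie']
print(start_only)    # ['he']
print(each_line)     # ['he', 'he']
```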

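6. Groups

What is a group

An expression enclosed in parentheses () is a "group". The text matched by a group is saved and can be referred to later in the same pattern with a backreference such as \1 or \2 (Python's re module accepts \1 through \99).

Capturing groups:
- A group written with plain parentheses () is a capturing group
- A capturing group saves the text it matched so that it can be referenced in later processing
Non-capturing groups:
- A group written as (?:...) is a non-capturing group
- A non-capturing group does not save the matched text; it is used only for grouping and logic

Suppose a string contains a date such as "2023-10-01" and we want to capture the year, month, and day separately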

import re

# 捕獲組示例
text1 = "Today's date is 2023-10-01."
pattern1 = r'(\d{4})-(\d{2})-(\d{2})'
match1 = re.search(pattern1, text1)
if match1:
    year = match1.group(1)
    month = match1.group(2)
    day = match1.group(3)
    print(f'Year: {year}, Month: {month}, Day: {day}')

# Output
Year: 2023, Month: 10, Day: 01

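Code walkthrough

text1: the input string, which contains a date.
pattern1: the regular expression that matches the date format.
- (\d{4}): matches four digits (the year) and captures them as group 1.
- (\d{2}): matches two digits (the month) and captures them as group 2.
- (\d{2}): matches two digits (the day) and captures them as group 3.
re.search(pattern1, text1): searches text1 for the first substring that matches pattern1.
match1.group(1): returns the first captured group (the year).
match1.group(2): returns the second captured group (the month).
match1.group(3): returns the third captured group (the day).
print(f'Year: {year}, Month: {month}, Day: {day}'): prints the captured year, month, and day.

Suppose a string contains a phone number in the format "123-456-7890" and we want to match that format without capturing each part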

import re

text = "Phone number: 123-456-7890."
pattern = r'(?:\d{3}-){2}\d{4}'

match = re.search(pattern, text)
if match:
    print(f'Matched phone number: {match.group(0)}')

# Output
Matched phone number: 123-456-7890

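text: the input string, which contains a phone number.
pattern: the regular expression that matches the phone number format.
- (?:\d{3}-): matches three digits followed by a hyphen without capturing them (a non-capturing group).
- {2}: repeats the preceding non-capturing group twice.
- \d{4}: matches four digits.
re.search(pattern, text): searches text for the first substring that matches pattern.
match.group(0): returns the entire matched substring (the phone number).
print(f'Matched phone number: {match.group(0)}'): prints the matched phone number.

Suppose a string contains some repeated words and we want to find them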

import re

text = "This is a test test of repeated repeated words words."
pattern = r'\b(\w+)\b\s+\1\b'

matches = re.findall(pattern, text, re.IGNORECASE)
if matches:
    print(f'Repeated words: {matches}')

# Output
Repeated words: ['test', 'repeated', 'words']

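text: the input string, which contains repeated words.
pattern: the regular expression that matches a repeated word.
- \b: a word boundary.
- (\w+): matches one or more word characters (letters, digits, or underscore) and captures them as group 1.
- \b: a word boundary.
- \s+: matches one or more whitespace characters.
- \1: a backreference to group 1, which forces the second word to be the same as the first.
- \b: a word boundary.
re.findall(pattern, text, re.IGNORECASE): finds every match of pattern in text, ignoring case. Because the pattern contains exactly one capturing group, findall returns the text of that group rather than the whole match.
matches: the list of repeated words that were found.
print(f'Repeated words: {matches}'): prints all repeated words.

Suppose a string contains a date such as "2023-10-01" and we want to capture the year, month, and day separately, this time using named groups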

import re

text = "Today's date is 2023-10-01."
pattern = r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'

match = re.search(pattern, text)
if match:
    year = match.group('year')
    month = match.group('month')
    day = match.group('day')
    print(f'Year: {year}, Month: {month}, Day: {day}')

# Output
Year: 2023, Month: 10, Day: 01

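text: the input string, which contains a date.
pattern: the regular expression that matches the date format.
- (?P<year>\d{4}): matches four digits (the year) and captures them as the group named year.
- (?P<month>\d{2}): matches two digits (the month) and captures them as the group named month.
- (?P<day>\d{2}): matches two digits (the day) and captures them as the group named day.
re.search(pattern, text): searches text for the first substring that matches pattern.
match.group('year'): returns the group named year.
match.group('month'): returns the group named month.
match.group('day'): returns the group named day.
print(f'Year: {year}, Month: {month}, Day: {day}'): prints the captured year, month, and day.

Summary:

Capturing group: written as (...), saves the text it matched
Non-capturing group: written as (?:...), used only for grouping and logic
Backreference: \1, \2, and so on refer to the capturing group with that number
Named group: written as (?P<name>...), lets a captured group be referenced by name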
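
Groups are also useful when rewriting text. The sketch below reuses the date pattern from this section and shows two standard parts of the re module, groupdict() and re.sub(). The target format day/month/year is just an assumption for the example.

```python
import re

text = "Today's date is 2023-10-01."
pattern = r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'

# groupdict() returns every named group as a dictionary.
match = re.search(pattern, text)
parts = match.groupdict() if match else {}

# In a replacement string, \g<name> refers to a named group
# and \1, \2, ... refer to groups by number.
day_first = re.sub(pattern, r'\g<day>/\g<month>/\g<year>', text)
by_number = re.sub(pattern, r'\3/\2/\1', text)

print(parts)      # {'year': '2023', 'month': '10', 'day': '01'}
print(day_first)  # Today's date is 01/10/2023.
print(by_number)  # Today's date is 01/10/2023.
```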

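7. Greedy and non-greedy matching

Greedy matching

By default, quantifiers are greedy: they match as many characters as possible. The greedy quantifiers are:

*: matches the preceding expression zero or more times
+: matches the preceding expression one or more times
?: matches the preceding expression zero or one time
{m,n}: matches the preceding expression at least m and at most n times

Suppose a string contains some HTML tags and we want to extract the content inside them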

import re

text = '<div>Hello</div><div>World</div>'
pattern = r'<div>(.*)</div>'

matches = re.findall(pattern, text)
print(matches)  

# Output
['Hello</div><div>World']

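In this example .* is greedy and matches as many characters as possible, so the result is everything between the first <div> and the last </div>.

Non-greedy matching

Non-greedy matching (also called lazy matching) means the quantifier matches as few characters as possible. Add a ? after a quantifier to make it non-greedy:

*?: matches the preceding expression zero or more times, as few times as possible
+?: matches the preceding expression one or more times, as few times as possible
??: matches the preceding expression zero or one time, as few times as possible
{m,n}?: matches the preceding expression at least m and at most n times, as few times as possible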

import re

text = '<div>Hello</div><div>World</div>'
pattern = r'<div>(.*?)</div>'

matches = re.findall(pattern, text)
print(matches)  

# Output
['Hello', 'World']
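
One caveat with the lazy pattern is that . does not match a newline by default. The sketch below uses a made-up two-line input to show the effect, along with two possible workarounds: the re.DOTALL flag and a negated character set. It is an illustration only and does not try to parse HTML properly.

```python
import re

# Hypothetical input in which the second block spans two lines.
text = '<div>Hello</div>\n<div>multi\nline</div>'

# . stops at a newline, so the second block is missed.
lazy = re.findall(r'<div>(.*?)</div>', text)

# re.DOTALL lets . match newlines as well.
lazy_dotall = re.findall(r'<div>(.*?)</div>', text, flags=re.DOTALL)

# A negated set stops at the next "<" and needs no lazy quantifier.
negated = re.findall(r'<div>([^<]*)</div>', text)

print(lazy)         # ['Hello']
print(lazy_dotall)  # ['Hello', 'multi\nline']
print(negated)      # ['Hello', 'multi\nline']
```

The negated set only works here because the content contains no nested tags. For real HTML a dedicated parser is usually the safer choice.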