
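Java and Python Regular Expressions Explained

Updated: 2017-12-27 15:12:32 Submitted by: mdxy-dxy

A regular expression is built from metacharacters and their combinations. A carefully constructed regular expression can match strings of almost any shape and carry out complex string-processing tasks.

Regular expression syntax and common metacharacters: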


The common metacharacters are:

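Take care when using the backslash: if a metacharacter that begins with '\' coincides with a string escape sequence, write '\\' instead, or use a raw string by prefixing the string literal with 'r' or 'R'. Raw strings save you from typing doubled backslashes and are mainly used for regular expressions and file paths. A raw string literal cannot end with a single '\'; if the string must end with a backslash, write that final backslash as '\\' in an ordinary string literal.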
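The backslash rules above are easy to get wrong, so here is a minimal sketch of them. The sample text and the Windows-style path are made up for illustration; only the standard `re` module is assumed.

```python
import re

# In an ordinary string literal '\b' is the backspace character (0x08),
# so the regex engine never sees the word-boundary metacharacter.
plain = '\bcat\b'
# Doubling the backslash, or using a raw string, passes '\b' through to re.
doubled = '\\bcat\\b'
raw = r'\bcat\b'

text = 'a cat sat'
plain_hit = re.search(plain, text)      # looks for backspace + 'cat' + backspace
doubled_hit = re.search(doubled, text)  # looks for the word 'cat'
raw_hit = re.search(raw, text)          # same pattern as `doubled`

# A raw string literal cannot end in a single backslash, so the trailing
# backslash is written in an ordinary literal and concatenated.
path = r'C:\temp\new' + '\\'

print(plain_hit, doubled_hit.group(), raw_hit.group(), path)
```

The raw form is usually the easier one to read, because the pattern in the source code is exactly the pattern the regex engine receives.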

\ : Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, 'n' matches the character "n", '\n' matches a newline, the sequence '\\' matches "\", and "\(" matches "(".
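The four cases in that description can be checked directly. This is a small sketch; the `sample` string is invented so that it contains each character exactly once.

```python
import re

# The characters of sample are: f ( x ) \ n <newline>
sample = 'f(x)\\n\n'

letter_n = re.findall(r'n', sample)     # the plain letter n
newline = re.findall(r'\n', sample)     # the newline character
backslash = re.findall(r'\\', sample)   # one literal backslash
paren = re.findall(r'\(', sample)       # a literal '(' rather than a group opener

print(letter_n, newline, backslash, paren)
```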

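Common regular expression patterns:

'[a-zA-Z0-9]': matches a single letter or digit
'[^abc]': matches any single character other than a, b, or c
'p(ython|erl)': matches "python" or "perl"
'(pattern)*': matches pattern zero or more times
'(pattern)+': matches pattern one or more times
'(pattern){m,n}': matches pattern m to n times
'(a|b)*c': matches zero or more occurrences of a or b, immediately followed by c
'^[a-zA-Z]{1}([a-zA-Z0-9\._]){4,19}$': matches a string of 5 to 20 characters that starts with a letter and otherwise contains only letters, digits, '.', and '_'
'^(\w){6,20}$': matches a string of 6 to 20 word characters
'^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$': matches the dotted form of an IPv4 address (each group is 1 to 3 digits; the 0-255 range is not checked)
'^[a-zA-Z]+$': checks that a string contains only English letters
'\w+@(\w+\.)+\w+$': matches a simple email address
'[\u4e00-\u9fa5]': matches a Chinese character
'^\d{18}$|^\d{15}$': matches an 18-digit or 15-digit ID card number
'\d{4}-\d{1,2}-\d{1,2}': matches a date in year-month-day form
'^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[,._]).{8,}$': checks for a strong password (at least 8 characters, including a lowercase letter, an uppercase letter, a digit, and one of ',', '.', '_')
'(.)\\1+': matches any character followed by one or more consecutive repetitions of itself (equivalently r'(.)\1+' as a raw string)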
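A few of the patterns in this list can be wrapped in small checks to see how they behave. This is only a sketch: the function names `is_dotted_quad`, `is_strong_password`, and `repeated_runs` are hypothetical helpers introduced for illustration, and the strong-password pattern is written with balanced parentheses.

```python
import re

def is_dotted_quad(text):
    """Format check only: four groups of 1-3 digits. It does not enforce 0-255."""
    return re.fullmatch(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', text) is not None

def is_strong_password(text):
    """At least 8 characters with a lowercase letter, an uppercase letter,
    a digit, and one of the characters , . _"""
    pattern = r'^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[,._]).{8,}$'
    return re.match(pattern, text) is not None

def repeated_runs(text):
    """Return every run of a single character repeated consecutively."""
    return [m.group(0) for m in re.finditer(r'(.)\1+', text)]

print(is_dotted_quad('192.168.1.1'))
print(is_dotted_quad('999.1.1.1'))   # passes the format check despite 999
print(is_strong_password('Passw0rd_'))
print(repeated_runs('aabbbcd'))
```

Note that the dotted-quad check accepts '999.1.1.1', which is why the pattern should be treated as a format check rather than a full address validator.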

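Common functions of the re module:

compile(pattern[, flags]): creates a pattern object
search(pattern, string[, flags]): scans the whole string for the pattern and returns a match object or None
match(pattern, string[, flags]): matches the pattern at the start of the string and returns a match object or None
findall(pattern, string[, flags]): returns a list of all matches of the pattern
split(pattern, string[, maxsplit=0]): splits the string wherever the pattern matches
sub(pat, repl, string[, count=0]): replaces every match of pat in the string with repl
escape(string): escapes all special regular expression characters in the string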
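As a rough illustration of compile, split, sub, and escape from the list above, the following sketch runs them on a made-up string. The variable names are arbitrary.

```python
import re

text = 'alpha, beta;gamma  delta'

# compile() builds a reusable pattern object.
separator = re.compile(r'[,;\s]+')

words = separator.split(text)                   # split on every run of separators
first_two = separator.split(text, maxsplit=1)   # stop after the first split
joined = separator.sub('-', text)               # replace every separator run
one_swap = separator.sub('-', text, count=1)    # replace only the first run

# escape() makes metacharacters literal, so '.' no longer means "any character".
literal = re.escape('a.b')
dot_hits = re.findall(literal, 'a.b axb')

print(words, first_two, joined, one_swap, dot_hits)
```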

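Differences between match, search, and findall

match looks only at the start of the string or at a specified position; the pattern must occur exactly there;
search scans the whole string, or the string from a specified position onward, and returns the first match;
findall finds every substring that matches the regular expression and returns them as a list.
The position arguments are available on the methods of a compiled pattern object.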
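The difference is easiest to see on one string. This is a minimal sketch with an invented sample; it uses a compiled pattern so that the optional start position can be shown as well.

```python
import re

text = 'one 22 three 4444'
number = re.compile(r'\d+')

at_start = number.match(text)       # the text does not begin with a digit
anywhere = number.search(text)      # first match anywhere in the text
everything = number.findall(text)   # every match, as a list of strings
from_pos = number.match(text, 4)    # match anchored at index 4 instead of 0

print(at_start, anywhere.group(), everything, from_pos.span())
```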

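Subpatterns and match objects

When match or search succeeds, it returns a match object. The main methods of a match object are:

group(): returns the content matched by one or more subpatterns
groups(): returns a tuple containing the content matched by all subpatterns
groupdict(): returns a dictionary containing the content matched by all named subpatterns
start(): returns the start position of the specified subpattern
end(): returns the end position of the specified subpattern, that is, the index just past its last character
span(): returns a tuple (start, end) for the specified subpattern

Code demonstration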

>>> import re
>>> m = re.match(r'(\w+) (\w+)','Isaac Newton,physicist')
>>> m.group(0)
'Isaac Newton'
>>> m.group(1)
'Isaac'
>>> m.group(2)
'Newton'
>>> m.group(1,2)
('Isaac', 'Newton')
>>> m = re.match(r'(?P<first_name>\w+) (?P<last_name>\w+)', 'Malcolm Reynolds')
>>> m.group('first_name')
'Malcolm'
>>> m.group('last_name')
'Reynolds'
>>> m.groups()
('Malcolm', 'Reynolds')
>>> m.groupdict()
{'first_name': 'Malcolm', 'last_name': 'Reynolds'}
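
The session above exercises group(), groups(), and groupdict(). The position methods start(), end(), and span() can be sketched in the same way; the sample string and the group names `first` and `last` are made up for this example.

```python
import re

source = 'Dr. Isaac Newton'
m = re.search(r'(?P<first>\w+) (?P<last>\w+)', source)

whole = m.span()                          # (start, end) of the whole match
first = m.span('first')                   # span of a named subpattern
last = (m.start('last'), m.end('last'))   # the same information, piece by piece

# end() is one past the last matched character, so slicing recovers the text.
sliced = source[m.start():m.end()]

print(whole, first, last, sliced)
```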

Verifying and understanding the subpattern extension syntax

>>> import re
>>> exampleString = '''There should be one--and preferably only one--obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never. 
Although never never is often better than right now.'''
>>> pattern = re.compile(r'(?<=\w\s)never(?=\s\w)')
>>> matchResult = pattern.search(exampleString)
>>> matchResult.span()
(171, 176)
>>> pattern = re.compile(r'(?<=\w\s)never')
>>> matchResult = pattern.search(exampleString)
>>> matchResult.span()
(154, 159)
>>> pattern = re.compile(r'(?:is\s)better(\sthan)')
>>> matchResult = pattern.search(exampleString)
>>> matchResult.span()
(139, 153)
>>> matchResult.group(0)
'is better than'
>>> matchResult.group(1)
' than'
>>> pattern = re.compile(r'(?i)\bn\w+\b')  # global flags such as (?i) must come first
>>> index = 0
>>> while True:
    matchResult = pattern.search(exampleString, index)
    if not matchResult:
        break
    print(matchResult.group(0), ':', matchResult.span(0))
    index = matchResult.end(0)

not : (90, 93)
Now : (135, 138)
never : (154, 159)
never : (171, 176)
never : (177, 182)
now : (210, 213)
>>> pattern = re.compile(r'(?<!not\s)be\b')
>>> index = 0
>>> while True:
    matchResult = pattern.search(exampleString, index)
    if not matchResult:
        break
    print(matchResult.group(0), ':', matchResult.span(0))
    index = matchResult.end(0)

be : (13, 15)
>>> exampleString[13:20]
'be one-'
>>> pattern = re.compile(r'(\b\w*(?P<f>\w+)(?P=f)\w*\b)')
>>> index = 0
>>> while True:
    matchResult = pattern.search(exampleString, index)
    if not matchResult:
        break
    print(matchResult.group(0), ':', matchResult.group(2))
    index = matchResult.end(0) + 1

unless : s
better : t
better : t
>>> s = 'aabc abbcd abccd abbcd abcdd'
>>> p = re.compile(r'(\b\w*(?P<f>\w+)(?P=f)\w*\b)')
>>> p.findall(s)
[('aabc', 'a'), ('abbcd', 'b'), ('abccd', 'c'), ('abbcd', 'b'), ('abcdd', 'd')]
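
The manual loops in this session (search from an index, then advance the index) can also be written with finditer(), which yields successive non-overlapping match objects. The following is a sketch on a shortened, made-up version of the sample text, not a replacement for the session above.

```python
import re

text = 'Now is better than never.\nAlthough never is often better than right now.'
pattern = re.compile(r'\bn\w+\b', re.IGNORECASE)

# Each item is a match object, so group() and span() work as before.
hits = [(m.group(0), m.span()) for m in pattern.finditer(text)]

for word, span in hits:
    print(word, ':', span)
```

Passing re.IGNORECASE to compile() is an alternative to the inline (?i) flag and avoids any concern about where the flag sits in the pattern.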

That is all for this overview of Python regular expressions.
