
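Summary of Commonly Used Python Regular Expression Functions

Updated: 2017-06-24 08:56:52 Author: 世界看我我看世界

This article introduces the commonly used Python regular expression functions and, through examples, summarizes what each function does, how to use it, and what to watch out for. Readers who need it can use it as a reference.

The functions are covered one by one below, each with examples.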

re.match()

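Function prototype:

match(pattern, string, flags=0) Try to apply the pattern at the start of the string,
returning a match object, or None if no match was found.

Purpose:

re.match tries to match a pattern starting at the beginning of the string. If the match succeeds, it returns a match object; otherwise it returns None.

Parameters:

pattern: the regular expression to match
string: the string to match against
flags: flags that control how the regular expression matches, such as case sensitivity and multi-line matching.

The group() and groups() methods of the match object return the matched results.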

group()

group(...)    group([group1, ...]) -> str or tuple.
    Return subgroup(s) of the match by indices or names.
    For 0 returns the entire match.

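Returns the string captured by one or more groups. When several arguments are given, the result is a tuple. Each argument can be a group number or a group name. Number 0 stands for the entire matched substring, and group() with no arguments is the same as group(0). A group that captured nothing returns None, and a group that captured several times returns its last capture.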
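To make these rules concrete, here is a small sketch. The date-like pattern and the group names year, month and day are made up for illustration; only re.match and group() come from the re module.

```python
import re

# Hypothetical pattern: year and month are required, the day part is optional.
m = re.match(r"(?P<year>\d{4})-(?P<month>\d{2})(-(?P<day>\d{2}))?", "2017-06")

whole = m.group()               # no argument is the same as group(0)
by_index = m.group(1)           # look a group up by number
by_name = m.group("month")      # or by name
several = m.group(1, "month")   # several arguments give a tuple
missing = m.group("day")        # a group that did not take part gives None

# A group that repeats keeps only its last capture.
last = re.match(r"(\w\.)+", "a.b.c.").group(1)

print(whole)     # 2017-06
print(by_index)  # 2017
print(by_name)   # 06
print(several)   # ('2017', '06')
print(missing)   # None
print(last)      # c.
```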

groups()

groups(...)    groups([default=None]) -> tuple.
    Return a tuple containing all the subgroups of the match, from 1.
    The default argument is used for groups
    that did not participate in the match

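Returns the strings captured by all groups as a tuple, equivalent to calling group(1, 2, ..., last). A group that captured nothing is replaced by the default value, which is None unless another value is passed.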
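The default argument is easiest to see with an optional group. The following is a minimal sketch; the pattern for a number with an optional fractional part is only an illustration.

```python
import re

# The second group is optional, so it does not participate for the input "24".
m = re.match(r"(\d+)\.?(\d+)?", "24")

plain = m.groups()          # missing groups become None
filled = m.groups("0")      # or the default value that was passed in
same = m.group(1, 2)        # groups() is equivalent to group(1, 2, ..., last)

print(plain)   # ('24', None)
print(filled)  # ('24', '0')
print(same)    # ('24', None)
```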

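Example

import re
line = "This is the last one"
res = re.match(r'(.*) is (.*?) .*', line, re.M | re.I)
if res:
    print("res.group() : ", res.group())
    print("res.group(1) : ", res.group(1))
    print("res.group(2) : ", res.group(2))
    print("res.groups() : ", res.groups())
else:
    print("No match!!")

re.M|re.I: these two flags mean multi-line matching and case-insensitive matching. Joined with |, both take effect together. re.M only changes where ^ and $ match, so it makes no visible difference for this particular pattern.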
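The effect of each flag is easier to see on a pattern that actually depends on it. The sketch below uses a made-up two-line string; re.findall is used for the multi-line cases because re.match only ever tries position 0.

```python
import re

text = "first line\nSecond LINE"

# Without flags, matching is case-sensitive, so this fails.
no_flag = re.match(r"FIRST", text)

# re.I (IGNORECASE) lets letters match regardless of case.
ignore_case = re.match(r"FIRST", text, re.I)

# re.M (MULTILINE) lets ^ match at the start of every line.
line_starts = re.findall(r"^\w+", text, re.M)

# Flags are combined with the bitwise OR operator.
both = re.findall(r"^second", text, re.M | re.I)

print(no_flag)              # None
print(ignore_case.group())  # first
print(line_starts)          # ['first', 'Second']
print(both)                 # ['Second']
```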

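A closer look:

>>> re.match(r'.*','.*g3jl\nok').group()
'.*g3jl'

The . (dot) matches any single character except a newline, and * (asterisk) matches the preceding item zero or more times. Used together, they match any number of characters other than a newline, which is why the match above stops just before \n.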

1.
>>> re.match(r'.*..', '..').group()
'..'
2.
>>> re.match(r'.*g.','.*g3jlok').group()
'.*g3'
3.
>>> re.match(r'.*...', '..').group()
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'

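Why do the first two examples produce a result? In the first example, the .* in .*.. ends up matching zero characters, and the trailing .. matches the two dots in the string. In the second example, .* ends up matching the two characters .* at the start of the string, g matches the g that follows, and the final dot matches 3.
Why does the third example fail? Even when .* matches zero characters, the three dots that follow need three characters, and the string has only two, so the match fails.
These examples show that a match is possible only when the string has at least as many characters as the pattern requires. They also show how .* backtracks. It is greedy, so it first consumes as many characters as it can. If the rest of the pattern then fails, it gives back one character at a time and retries, until the whole pattern matches or no option is left. The result is the longest .* that still lets the whole pattern succeed.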
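Putting .* inside a capturing group shows how many characters it keeps after backtracking. This is a small sketch; the strings are made up, and the lazy form .*? is included only for contrast.

```python
import re

# Same pattern as example 1, with .* captured: it ends up holding nothing.
kept_nothing = re.match(r"(.*)..", "..").group(1)

# .* grabs "abcd" first, then gives back two characters so that .. can match.
split_up = re.match(r"(.*)(..)", "abcd").groups()

# Greedy .* runs to the last ">", lazy .*? stops at the first one.
greedy = re.match(r"<.*>", "<a><b>").group()
lazy = re.match(r"<.*?>", "<a><b>").group()

print(repr(kept_nothing))  # ''
print(split_up)            # ('ab', 'cd')
print(greedy)              # <a><b>
print(lazy)                # <a>
```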

Matching a Python identifier (a letter or underscore first, then any number of word characters; \w already includes the underscore):

>>> re.match(r'[a-zA-Z_]\w*','_1name1').group()
'_1name1'
>>> re.match(r'[a-zA-Z_]\w*','_name1').group()
'_name1'
>>> re.match(r'[a-zA-Z_]\w*','num').group()
'num'
>>> re.match(r'[a-zA-Z_]\w*','1num').group()
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
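Because re.match only anchors the start, a string such as 'num-1' would still match on its prefix 'num'. One way to validate a whole string is re.fullmatch (Python 3.4+). The sketch below is only an approximation: the names IDENTIFIER and is_identifier are made up here, and reserved keywords are not excluded.

```python
import re

# Hypothetical helper names; the pattern is the same idea as in the examples above.
IDENTIFIER = re.compile(r"[A-Za-z_]\w*")

def is_identifier(name):
    """Return True if the whole string looks like an identifier."""
    # fullmatch succeeds only if the pattern covers the entire string.
    return IDENTIFIER.fullmatch(name) is not None

print(is_identifier("_1name1"))  # True
print(is_identifier("1num"))     # False
print(is_identifier("num-1"))    # False
```

With re.match alone, the avoidable AttributeError is also worth guarding against by checking the result for None before calling group().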

re.search()

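Function prototype:

search(pattern, string, flags=0) Scan through string looking for a match to the pattern,
returning a match object, or None if no match was found.

Purpose:

Scans the whole string and returns a match object for the first successful match. If nothing matches, it returns None.

Parameters:

pattern: the regular expression to match
string: the string to match against
flags: flags that control how the regular expression matches, such as case sensitivity and multi-line matching.

As with re.match, use the group() and groups() methods to get the matched results.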

>>> re.search(r'[abc]\*\d{2}','12a*23Gb*12ad').group()
'a*23'

The result shows that re.search returns only the first successful match, 'a*23'. The later substring 'b*12' also fits the pattern, but re.search stops at the first hit.
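A short comparison, reusing the same pattern and string, shows the difference between stopping at the first hit and collecting every hit with re.findall:

```python
import re

text = '12a*23Gb*12ad'
pattern = r'[abc]\*\d{2}'

# re.search stops at the first match.
first = re.search(pattern, text).group()

# re.findall keeps scanning and returns every non-overlapping match.
all_matches = re.findall(pattern, text)

print(first)        # a*23
print(all_matches)  # ['a*23', 'b*12']
```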

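Difference between re.match and re.search

re.match matches only at the beginning of the string. If the beginning does not fit the pattern, the match fails and the function returns None. re.search scans the whole string until it finds a match, and returns None only if there is no match anywhere.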

>>> re.match(r'(.*)(are)',"Cats are smarter than dogs").group(2)
'are'
>>> re.search(r'(are)+',"Cats are smarter than dogs").group()
'are'

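Both examples extract 'are'. re.match needs the leading (.*) to consume the text in front of it, while re.search finds it directly.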
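The difference shows up most clearly when the same bare pattern is given to both functions. The following sketch reuses the sentence above; the two-line string is an added illustration.

```python
import re

text = "Cats are smarter than dogs"

at_start = re.match(r"are", text)           # "are" is not at position 0
anywhere = re.search(r"are", text).span()   # found further along

# Even with re.M, re.match only tries the very start of the string,
# while re.search with ^ can match at the start of a later line.
multi = "Cats\nare smart"
match_multi = re.match(r"are", multi, re.M)
search_multi = re.search(r"^are", multi, re.M).start()

print(at_start)      # None
print(anywhere)      # (5, 8)
print(match_multi)   # None
print(search_multi)  # 5
```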

re.sub()

Python's re module provides re.sub() to replace matches in a string. If nothing matches, the string is returned unchanged.

Function prototype:

sub(pattern, repl, string, count=0, flags=0)    Return the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in string by the
    replacement repl. repl can be either a string or a callable;
    if a string, backslash escapes in it are processed. If it is
    a callable, it's passed the match object and must return
    a replacement string to be used.

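Parameters:

pattern: the regular expression to match
repl: the replacement, either a string or a callable that takes a match object and returns the replacement string
string: the string in which replacements are made
count: the maximum number of replacements. 0 replaces every match, 1 replaces only the first, and so on. It must be a non-negative integer and defaults to 0.
flags: flags that control how the regular expression matches, such as case sensitivity and multi-line matching.

Example

Replace the last 4 digits of a mobile phone number with 0

>>> re.sub(r'\d{4}$','0000','13549876489')
'13549870000'

Remove the trailing comment from a line of code

>>> re.sub(r'#.*$','', 'num = 0 #a number')
'num = 0 '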
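The prototype also allows a callable as repl, a count limit, and backreferences in the replacement string. Here is a small sketch of each; the helper name double and the sample strings are made up for illustration.

```python
import re

def double(match):
    """Hypothetical callback: receives the match object, returns the new text."""
    return str(int(match.group()) * 2)

# A callable repl is called once per match.
doubled = re.sub(r"\d+", double, "3 apples and 10 pears")

# count=1 replaces only the first match.
once = re.sub(r"\d", "#", "a1b2c3", count=1)

# \1 and \2 in a string repl refer to the captured groups.
swapped = re.sub(r"(\w+)@(\w+)", r"\2@\1", "user@host")

# re.subn also reports how many replacements were made.
result, n = re.subn(r"\d", "#", "a1b2c3")

print(doubled)    # 6 apples and 20 pears
print(once)       # a#b2c3
print(swapped)    # host@user
print(result, n)  # a#b#c# 3
```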

re.split()

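Function prototype:

split(pattern, string, maxsplit=0, flags=0) Split the source string by the occurrences of the pattern,
returning a list containing the resulting substrings.

Purpose:

Splits a string at every substring matched by the given regular expression and returns the pieces as a list.

Parameters:

pattern: the regular expression to match
string: the string to split
maxsplit: the maximum number of splits; 0 (the default) means no limit
flags: flags that control how the regular expression matches, such as case sensitivity and multi-line matching.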
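A minimal sketch of re.split follows. The separator pattern and the sample strings are made up for illustration.

```python
import re

text = "a, b;c  d"

# Split on any run of commas, semicolons or whitespace.
parts = re.split(r"[,;\s]+", text)

# maxsplit=2 stops after two splits; the rest stays in the last element.
limited = re.split(r"[,;\s]+", text, maxsplit=2)

# If the pattern has a capturing group, the separators are kept in the result.
with_seps = re.split(r"([,;])", "a,b;c")

print(parts)      # ['a', 'b', 'c', 'd']
print(limited)    # ['a', 'b', 'c  d']
print(with_seps)  # ['a', ',', 'b', ';', 'c']
```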

re.findall()

Function prototype:

findall(pattern, string, flags=0)    Return a list of all non-overlapping matches in the string.
    If one or more groups are present in the pattern, return a
    list of groups; this will be a list of tuples if the pattern
    has more than one group.
    Empty matches are included in the result.

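Purpose:

Finds every match in the string and returns the matches as a list. The list elements take one of the following forms:

When the pattern contains several pairs of parentheses (), each list element is a tuple of strings. The tuple has as many strings as there are pairs of parentheses, ordered by where each opening parenthesis '(' appears, and each string is the text matched by the corresponding parenthesized subpattern.
When the pattern contains exactly one pair of parentheses, each list element is a string holding the text matched by the subpattern inside the parentheses. (Note: the string contains only what the parentheses matched, not what the whole pattern matched.)
When the pattern contains no parentheses, each string in the list is the text matched by the whole pattern.

Parameters:

pattern: the regular expression to match
string: the string to search
flags: flags that control how the regular expression matches, such as case sensitivity and multi-line matching.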

Examples:

1. Match every word in the string that contains 'oo'

# no parentheses in the pattern
>>> re.findall(r'\w*oo\w*', 'woo this foo is too')
['woo', 'foo', 'too']

The result shows that when the pattern has no parentheses, each string in the list is the text matched by the whole pattern.

2. Get every run of digits in the string

# exactly one pair of parentheses in the pattern
>>> re.findall(r'.*?(\d+).*?','adsd12343.jl34d5645fd789')
['12343', '34', '5645', '789']

The result shows that when the pattern has exactly one pair of parentheses, each list element is a string holding the text matched by the subpattern inside the parentheses.

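3. Extract every valid domain name from the string

# several pairs of parentheses in the pattern
>>> add = 'https://www.net.com.edu//action=?asdfsd and other https://www.baidu.com//a=b'
>>> re.findall(r'((w{3}\.)(\w+\.)+(com|edu|cn|net))',add)
[('www.net.com.edu', 'www.', 'com.', 'edu'), ('www.baidu.com', 'www.', 'baidu.', 'com')]

The result shows that when the pattern has several pairs of parentheses, each element of the returned list is a tuple built from one successful match, holding the text matched by every parenthesized group in the pattern. The repeated group (\w+\.)+ contributes only its last capture, which is why the first tuple contains 'com.' and not 'net.'.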
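If only the full domain is wanted, one option is to switch the helper groups to non-capturing groups, written (?:...). With no capturing group left, findall returns the whole match as a plain string. This is a sketch of that alternative, using the same string:

```python
import re

add = 'https://www.net.com.edu//action=?asdfsd and other https://www.baidu.com//a=b'

# (?:...) groups for repetition and alternation without capturing anything.
domains = re.findall(r'w{3}\.(?:\w+\.)+(?:com|edu|cn|net)', add)

print(domains)  # ['www.net.com.edu', 'www.baidu.com']
```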

re.finditer()

Function prototype:

finditer(pattern, string, flags=0) Return an iterator over all non-overlapping matches in the string. For each match, the iterator
returns a match object.
Empty matches are included in the result.

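Purpose:

Like re.findall(), it finds every substring that matches, but it returns an iterator instead of a list holding all the results. The iterator yields one match object per result, which saves memory and is typically useful when there are many matches, much like the difference between range and xrange in Python 2.

Parameters:

pattern: the regular expression to match
string: the string to search
flags: flags that control how the regular expression matches, such as case sensitivity and multi-line matching.

For example, match every run of digits in a string:

>>> for i in re.finditer(r'\d+','one12two34three56four'):
...     print(i.group(), end=' ')
...
12 34 56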
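Since finditer yields match objects and not plain strings, each result also carries its position. A brief sketch with the same input:

```python
import re

text = 'one12two34three56four'

# Each item is a match object, so both the text and its span are available.
found = [(m.group(), m.span()) for m in re.finditer(r'\d+', text)]

# The iterator is lazy: matches are produced one at a time on demand.
it = re.finditer(r'\d+', text)
first = next(it).group()

print(found)  # [('12', (3, 5)), ('34', (8, 10)), ('56', (15, 17))]
print(first)  # 12
```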

start()

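Returns the start position of the match. For example:

>>> re.search(r'\d+', 'asdf13df234').start()
4

Note that index positions are counted from 0.

end()

Returns the position just after the end of the match. For example:

>>> re.search(r'\d+', 'asdf13df234').end()
6

span()

Returns the matched range as a half-open interval: the start is included and the end is excluded. For example: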

>>> re.search(r'\d+', 'asdf13df234').span()
(4, 6)
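start(), end() and span() also accept a group number, and the half-open range lines up with Python slicing. The sketch below adds a capturing group to the pattern purely for illustration.

```python
import re

text = 'asdf13df234'
m = re.search(r'[a-z]+(\d+)', text)

whole_span = m.span()     # range of the entire match
group_span = m.span(1)    # range of group 1 only

# Slicing with start/end gives back exactly what the group captured.
sliced = text[m.start(1):m.end(1)]

print(whole_span)  # (0, 6)
print(group_span)  # (4, 6)
print(sliced)      # 13
```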

re.compile()

Function prototype:

compile(pattern, flags=0) Compile a regular expression pattern, returning a pattern object.

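Purpose:

Compiles a regular expression and returns the compiled pattern object.
Regular expressions that are used often can be compiled once into pattern objects and reused, which can save some overhead. For example:
A sentence contains five English words of varying length, separated by spaces. Match all five words.

>>> s = "this is  a python test"
>>> p = re.compile(r'\w+')  # compile the regular expression into a pattern object
>>> res = p.findall(s)  # use the pattern object to match the content
>>> print(res)
['this', 'is', 'a', 'python', 'test']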
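A compiled pattern object offers the same operations as the module-level functions, with the flags fixed at compile time. The following is a small sketch; the variable name word and the sample strings are made up.

```python
import re

# Flags are passed once, when the pattern is compiled.
word = re.compile(r'[a-z]+', re.I)

words = word.findall("This is  a Python test")
head = word.match("Python 3").group()

# Pattern methods accept an optional start position as the second argument.
later = word.search("This is", 4).group()

print(words)         # ['This', 'is', 'a', 'Python', 'test']
print(head)          # Python
print(later)         # is
print(word.pattern)  # [a-z]+
```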


I hope this article is helpful for your Python programming.
