A Detailed Guide to Python's re Module

Updated: 2019-07-26 11:06:08  Author: bainianminguo

This article walks through how to use Python's re module, with example code for each method. It is intended as a reference for study or day-to-day work.

I. Special characters in regular expressions

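Regular expression
^       matches the start of the string (the start of each line with re.MULTILINE)
$       matches the end of the string (the end of each line with re.MULTILINE)
.       matches any single character except a newline
[]      matches any one of the characters inside the brackets
[^]     matches any one character not listed inside the brackets
[-]     matches any single character in the given range
?       matches the preceding item zero or one time
+       matches the preceding item one or more times
*       matches the preceding item zero or more times
{n}     matches the preceding item exactly n times
{m,n}   matches the preceding item at least m and at most n times
{n,}    matches the preceding item at least n times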
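
As a quick check of the table above, the following sketch runs several of these metacharacters through re.findall. The sample string and variable names are made up for illustration and do not come from the original article.

```python
import re

text = "cat cot cut c9t ct caat"

# [aou]: exactly one of the listed characters
one_of = re.findall(r"c[aou]t", text)
print(one_of)         # ['cat', 'cot', 'cut']

# [^a-z]: one character that is not a lowercase letter
not_lower = re.findall(r"c[^a-z]t", text)
print(not_lower)      # ['c9t']

# ?, + and *: zero or one, one or more, zero or more of the preceding item
zero_or_one = re.findall(r"ca?t", text)
one_or_more = re.findall(r"ca+t", text)
zero_or_more = re.findall(r"ca*t", text)
print(zero_or_one)    # ['cat', 'ct']
print(one_or_more)    # ['cat', 'caat']
print(zero_or_more)   # ['cat', 'ct', 'caat']

# {n} and {m,n}: explicit repetition counts
exactly_two = re.findall(r"ca{2}t", text)
one_to_two = re.findall(r"ca{1,2}t", text)
print(exactly_two)    # ['caat']
print(one_to_two)     # ['cat', 'caat']

# ^ and $: anchor the match to the start or end of the string
at_start = re.findall(r"^cat", text)
at_end = re.findall(r"caat$", text)
print(at_start)       # ['cat']
print(at_end)         # ['caat']
```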

II. Methods of the re module

1. Matching methods

a. The findall method

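# findall searches the string for the pattern and returns all matching substrings as a list.
# If nothing matches, it returns an empty list; if exactly one substring matches, it returns
# a one-element list. Either way the result can be iterated over directly without error,
# which removes special-case handling and keeps the code simpler.

# re.findall() returns every substring that matches the pattern

import re

re_str = "hello this is python 2.7.13 and python 3.4.5"
 
pattern = r"python [0-9]\.[0-9]\.[0-9]"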
res = re.findall(pattern=pattern,string=re_str)
print(res)
 
# ['python 2.7.1', 'python 3.4.5']
 
pattern = r"python [0-9]\.[0-9]\.[0-9]{2,}"
res = re.findall(pattern=pattern,string=re_str)
print(res)
 
# ['python 2.7.13']
 
 
pattern = r"python[0-9]\.[0-9]\.[0-9]{2,}"
res = re.findall(pattern=pattern,string=re_str)
print(res)
 
# []
 
# re.findall() returns a list: the matched substrings if there are matches, otherwise an empty list
 
re_str = "hello this is python 2.7.13 and Python 3.4.5"
 
pattern = r"python [0-9]\.[0-9]\.[0-9]"
res = re.findall(pattern=pattern,string=re_str,flags=re.IGNORECASE)
print(res)
 
# ['python 2.7.1', 'Python 3.4.5']
 
# flags=re.IGNORECASE makes the match case-insensitive

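b. Using compiled regular expressions

# It is common to compile a pattern once with re.compile and reuse the resulting object.
# The re module already caches recently compiled patterns, so the saving comes mainly from
# skipping that lookup on every call; on large inputs or in tight loops, measure it yourself.
re_str = "hello this is python 2.7.13 and Python 3.4.5"
re_obj = re.compile(pattern=r"python [0-9]\.[0-9]\.[0-9]", flags=re.IGNORECASE)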
res = re_obj.findall(re_str)
print(res)
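
To test the performance claim yourself, a rough sketch like the one below compares the module-level call with a precompiled pattern using timeit. The function names and the iteration count are arbitrary choices for illustration, and the timings will vary by machine and Python version, so no expected numbers are shown.

```python
import re
import timeit

line = "hello this is python 2.7.13 and Python 3.4.5"
pattern = r"python [0-9]\.[0-9]\.[0-9]"
compiled = re.compile(pattern, flags=re.IGNORECASE)

def module_level():
    # re.findall looks the pattern up in re's internal cache on every call
    return re.findall(pattern, line, flags=re.IGNORECASE)

def precompiled():
    # the compiled object is reused directly
    return compiled.findall(line)

# Both approaches must return the same matches
same_result = module_level() == precompiled()
print(same_result)

t_module = timeit.timeit(module_level, number=20_000)
t_compiled = timeit.timeit(precompiled, number=20_000)
print(f"re.findall:       {t_module:.4f}s")
print(f"compiled.findall: {t_compiled:.4f}s")
```

Any difference measured here reflects per-call overhead rather than the matching itself, since both paths run the same compiled pattern.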

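c. The match method

# match is similar to the str.startswith method, but more powerful and expressive because it
# takes a regular expression. match tries the pattern only at the beginning of the string.
# On success it returns a match object (shown as _sre.SRE_Match on older Python 3 releases
# and as re.Match on Python 3.7+); on failure it returns None. For plain prefix checks it is
# used almost exactly like startswith. For example, to test whether a string starts with
# "what", or starts with digits: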

s_true = "what is a boy"
s_false = "What is a boy"
re_obj = re.compile("what")
 
print(re_obj.match(string=s_true))
# <_sre.SRE_Match object; span=(0, 4), match='what'>
 
print(re_obj.match(string=s_false))
# None
 
s_true = "123what is a boy"
s_false = "what is a boy"
 
re_obj = re.compile(r"\d+")
 
print(re_obj.match(s_true))
# <_sre.SRE_Match object; span=(0, 3), match='123'>
 
print(re_obj.match(s_true).start())
# 0
print(re_obj.match(s_true).end())
# 3
print(re_obj.match(s_true).string)
# 123what is a boy
print(re_obj.match(s_true).group())
# 123
 
 
print(re_obj.match(s_false))
# None

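d. The search method

# search also returns a match object on success. The difference from match is that match only
# tries the pattern at the start of the string, while search scans forward and can match at
# any position. Both return a match object on success and None on failure. Note that search
# returns only the first match: even if the string contains several matches, only the first
# one is returned. To get all of them, the simplest option is findall; finditer also works.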
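
The difference between match, search and findall can be sketched as follows. The sample string is borrowed from the other examples in this article; the variable names are illustrative.

```python
import re

re_str = "what is a different between python 2.7.14 and python 3.5.4"
re_obj = re.compile(r"\d+\.\d+\.\d+")

# match only tries position 0, and the string does not start with a version number
m_match = re_obj.match(re_str)
print(m_match)            # None

# search scans forward and stops at the first match
m_search = re_obj.search(re_str)
print(m_search.group())   # 2.7.14
print(m_search.span())    # (35, 41)

# findall collects every match
all_versions = re_obj.findall(re_str)
print(all_versions)       # ['2.7.14', '3.5.4']
```

Because search returns None when nothing matches, check the result before calling group() on it.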

e. The finditer method

# finditer returns an iterator; each item it yields is a match object, as in the example below

re_str = "what is a different between python 2.7.14 and python 3.5.4"
 
re_obj = re.compile(r"\d{1,}\.\d{1,}\.\d{1,}")
 
for i in re_obj.finditer(re_str):
  print(i)
 
# <_sre.SRE_Match object; span=(35, 41), match='2.7.14'>
# <_sre.SRE_Match object; span=(53, 58), match='3.5.4'>

2. Modification methods

a. The sub method

# re's sub method is similar to str.replace, but it accepts a regular expression, so it covers a wider range of use cases

re_str = "what is a different between python 2.7.14 and python 3.5.4"
 
re_obj = re.compile(r"\d{1,}\.\d{1,}\.\d{1,}")
 
print(re_obj.sub("a.b.c",re_str,count=1))
# what is a different between python a.b.c and python 3.5.4
 
print(re_obj.sub("a.b.c",re_str,count=2))
# what is a different between python a.b.c and python a.b.c
 
print(re_obj.sub("a.b.c",re_str))
# what is a different between python a.b.c and python a.b.c
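
As one illustration of that wider range, the replacement passed to sub can be a function that receives each match object, and subn additionally reports how many replacements were made. This sketch goes slightly beyond the original example; the helper name major_only is hypothetical.

```python
import re

re_str = "what is a different between python 2.7.14 and python 3.5.4"
re_obj = re.compile(r"\d+\.\d+\.\d+")

def major_only(m):
    # m.group() is the full matched version, e.g. "2.7.14"
    return m.group().split(".")[0] + ".x"

# A callable replacement is invoked once per match
shortened = re_obj.sub(major_only, re_str)
print(shortened)
# what is a different between python 2.x and python 3.x

# subn returns a (new_string, number_of_replacements) tuple
new_str, n = re_obj.subn("a.b.c", re_str)
print(n)   # 2
```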

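b. The split method

# re's split method does the same job as str.split: it splits a string into a list of
# substrings. The difference is that re's split can use a regular expression as the delimiter.
# The example below splits on ".", space, ":" and "!" and returns a list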

re_str = "what is a different between python 2.7.14 and python 3.5.4 USA:NewYork!Zidan.FRA"
 
re_obj = re.compile("[. :!]")
 
print(re_obj.split(re_str))
# ['what', 'is', 'a', 'different', 'between', 'python', '2', '7', '14', 'and', 'python', '3', '5', '4', 'USA', 'NewYork', 'Zidan', 'FRA']

c. Case-insensitive matching

# For case-insensitive matching, pass flags=re.IGNORECASE when compiling the pattern
 
# re.compile(pattern, flags=re.IGNORECASE)
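
A minimal runnable sketch of the flag in use is shown below. It also shows the inline form (?i), which is a standard re feature not mentioned in the original text; the variable names are illustrative.

```python
import re

re_str = "hello this is python 2.7.13 and Python 3.4.5"

# Default behaviour: case-sensitive
sensitive = re.compile(r"python").findall(re_str)
print(sensitive)     # ['python']

# flags=re.IGNORECASE (short alias: re.I) ignores case
insensitive = re.compile(r"python", flags=re.IGNORECASE).findall(re_str)
print(insensitive)   # ['python', 'Python']

# The same effect with an inline flag at the start of the pattern
inline = re.compile(r"(?i)python").findall(re_str)
print(inline)        # ['python', 'Python']
```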

d. Non-greedy matching

# Greedy matching always matches the longest possible string; non-greedy matching matches
# the shortest possible one. To make a quantifier non-greedy, append a ? to it.
 
# In the example below, note the two periods in the string
s = "Beautiful is better than ugly.Explicit is better than implicit."
 
 
re_obj = re.compile(r"Beautiful.*\.")
 
print(re_obj.findall(s))
# ['Beautiful is better than ugly.Explicit is better than implicit.']
 
re_obj = re.compile(r"Beautiful.*?\.")
 
print(re_obj.findall(s))
# ['Beautiful is better than ugly.']
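
The same ? suffix works on every quantifier, not only on *. The sketch below uses made-up sample strings to show the effect on .* and on {m,n}.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy: .* runs to the last '>' in the string
greedy_tags = re.findall(r"<.*>", text)
print(greedy_tags)   # ['<b>bold</b> and <i>italic</i>']

# Non-greedy: .*? stops at the first '>' it can
lazy_tags = re.findall(r"<.*?>", text)
print(lazy_tags)     # ['<b>', '</b>', '<i>', '</i>']

digits = "123456"

# {2,4} takes as many digits as allowed; {2,4}? takes as few as allowed
greedy_digits = re.findall(r"\d{2,4}", digits)
lazy_digits = re.findall(r"\d{2,4}?", digits)
print(greedy_digits)  # ['1234', '56']
print(lazy_digits)    # ['12', '34', '56']
```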

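e. What effect do parentheses in a pattern have?

To match a literal parenthesis, escape it with a backslash. Look closely at the example below, and note that calling group(1) on the object returned by search raises an error, because the pattern contains no capturing group.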

import re
s = "=aa1239d&&& 0a ()--"
 
obj = re.compile(r"\(\)")
# search
rep = obj.search(s)
print(rep)
# <_sre.SRE_Match object; span=(15, 17), match='()'>
# print(rep.group(1))
# IndexError: no such group
print(rep.group())
# ()

# findall
 
rep = obj.findall(s)
print(rep)
# ['()']

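To return the text matched inside the parentheses, leave the parentheses unescaped so that they form a capturing group. findall then returns the strings matched by the group, search(...).group() returns the string matched by the whole pattern, search(...).group(1) returns the string matched by the first group, search(...).group(2) the string matched by the second group, and so on.

s = "=aa1239d&&& 0a ()--"
rep = re.compile(r"\w+(&+)")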
 
print(rep.findall(s))
# ['&&&']
print(rep.search(s).group())
# aa1239d&&&
print(rep.search(s).group(1))
# &&&
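
The example above has a single group. With several groups, findall returns a list of tuples and group(n) selects the n-th group, as this sketch shows. The sample string is made up, and the non-capturing form (?:...) is a standard re feature added here for contrast.

```python
import re

s = "python 2.7.14 and python 3.5.4"
rep = re.compile(r"(\d+)\.(\d+)\.(\d+)")

# With more than one group, findall returns one tuple per match
tuples = rep.findall(s)
print(tuples)        # [('2', '7', '14'), ('3', '5', '4')]

m = rep.search(s)
print(m.group())     # 2.7.14  (whole match)
print(m.group(1))    # 2       (first group)
print(m.group(2))    # 7       (second group)
print(m.groups())    # ('2', '7', '14')

# (?:...) groups without capturing, so findall returns whole matches again
whole = re.findall(r"(?:\d+\.)+\d+", s)
print(whole)         # ['2.7.14', '3.5.4']
```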

That is all for this article. I hope it helps with your study.
