A Complete Guide to re.findall() in Python Regular Expressions

Updated: 2025-09-16 11:09:59 Author: doris8204

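This article explains Python's re.findall() function, which extracts every regular-expression match from a string. It covers how groups change the return value, the flags parameter (case-insensitive matching, multiline mode, Unicode support and so on), and practical uses such as extracting email addresses, URLs and log entries, with notes on performance and on the different return formats.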

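Basic syntax

re.findall(pattern, string, flags=0)

Parameters:

pattern: the regular expression pattern
string: the string to search
flags: optional flags such as re.IGNORECASE or re.MULTILINE

Return value:

If the pattern has no groups, a list of all matched substrings
If the pattern has exactly one group, a list of the strings captured by that group
If the pattern has more than one group, a list of tuples, one tuple of group contents per match
If nothing matches, an empty list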

1. Basic lookups

Example 1: find all numbers

import re
text = "There are 3 apples and 5 oranges"
numbers = re.findall(r'\d+', text)
print(numbers)  # Output: ['3', '5']

Example 2: find all words

text = "Hello world! Python is awesome."
words = re.findall(r'\w+', text)
print(words)  # Output: ['Hello', 'world', 'Python', 'is', 'awesome']

2. Behavior when the pattern contains groups

Example 3: no groups

text = "John: 30, Jane: 25, Bob: 42"
ages = re.findall(r': \d+', text)
print(ages)  # Output: [': 30', ': 25', ': 42']

Example 4: one group (only the group contents are returned)

text = "John: 30, Jane: 25, Bob: 42"
ages = re.findall(r': (\d+)', text)
print(ages)  # Output: ['30', '25', '42']

Example 5: several groups

text = "John: 30, Jane: 25, Bob: 42"
info = re.findall(r'(\w+): (\d+)', text)
print(info)  # Output: [('John', '30'), ('Jane', '25'), ('Bob', '42')]

3. More advanced regular expression features

Example 6: non-greedy matching

html = "<div>content1</div><div>content2</div>"
contents = re.findall(r'<div>(.*?)</div>', html)
print(contents)  # Output: ['content1', 'content2']

Example 7: character classes

text = "Colors: red, green, BLUE, Yellow"
colors = re.findall(r'[a-zA-Z]+', text)
print(colors)  # Output: ['Colors', 'red', 'green', 'BLUE', 'Yellow']

Example 8: word boundaries

text = "cat category concatenate"
words = re.findall(r'\bcat\b', text)
print(words)  # Output: ['cat']

4. Using flags

Example 9: ignoring case

text = "Apple orange BANANA Grape"
fruits = re.findall(r'[a-z]+', text, re.IGNORECASE)
print(fruits)  # Output: ['Apple', 'orange', 'BANANA', 'Grape']

Example 10: multiline mode

text = """First line
Second line
Third line"""
lines = re.findall(r'^\w+', text, re.MULTILINE)
print(lines)  # Output: ['First', 'Second', 'Third']

5. Practical use cases

Example 11: extracting email addresses

text = "Contact us at info@example.com or support@test.org"
emails = re.findall(r'[\w\.-]+@[\w\.-]+', text)
print(emails)  # Output: ['info@example.com', 'support@test.org']

Example 12: extracting URLs

text = "Visit https://www.example.com or http://test.org"
urls = re.findall(r'https?://[^\s/$.?#].[^\s]*', text)
print(urls)  # Output: ['https://www.example.com', 'http://test.org']

Example 13: parsing a log file

log = """
2023-04-15 10:00:00 INFO System started
2023-04-15 10:01:23 ERROR Database connection failed
2023-04-15 10:02:45 WARNING Disk space low
"""
errors = re.findall(r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} ERROR (.+)', log)
print(errors)  # Output: ['Database connection failed']

Example 14: extracting the contents of HTML tags

html = "<h1>Title</h1><p>Paragraph 1</p><p>Paragraph 2</p>"
paragraphs = re.findall(r'<p>(.*?)</p>', html)
print(paragraphs)  # Output: ['Paragraph 1', 'Paragraph 2']

Example 15: extracting phone numbers

text = "Call 123-456-7890 or 987.654.3210"
phones = re.findall(r'\d{3}[-.]\d{3}[-.]\d{4}', text)
print(phones)  # Output: ['123-456-7890', '987.654.3210']

Example 16: extracting currency amounts

text = "Prices: $12.99, €9.99, ￥1000"
prices = re.findall(r'[\$€￥]\d+\.?\d*', text)
print(prices)  # Output: ['$12.99', '€9.99', '￥1000']

Example 17: extracting dates

text = "Dates: 2023-04-15, 15/04/2023, 04.15.2023"
dates = re.findall(r'\d{4}[-/.]\d{2}[-/.]\d{2}|\d{2}[-/.]\d{2}[-/.]\d{4}', text)
print(dates)  # Output: ['2023-04-15', '15/04/2023', '04.15.2023']

6. Advanced techniques

Example 18: named groups

text = "User: john_doe, Age: 30; User: jane_smith, Age: 25"
# findall ignores the group names and still returns plain tuples
users = re.findall(r'User: (?P<name>\w+), Age: (?P<age>\d+)', text)
print(users)  # Output: [('john_doe', '30'), ('jane_smith', '25')]
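
Since findall() discards the group names, one way to keep them is to iterate with re.finditer() and call groupdict() on each match object. The following is a minimal sketch of that idea, reusing the text from Example 18; the variable name users_by_name is only illustrative.

```python
import re

text = "User: john_doe, Age: 30; User: jane_smith, Age: 25"
pattern = re.compile(r'User: (?P<name>\w+), Age: (?P<age>\d+)')

# groupdict() maps each group name to the text that group captured
users_by_name = [m.groupdict() for m in pattern.finditer(text)]
print(users_by_name)
# [{'name': 'john_doe', 'age': '30'}, {'name': 'jane_smith', 'age': '25'}]
```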

Example 19: a more complex pattern

text = "Coordinates: (12.345, -67.890), (0, 42.123)"
coords = re.findall(r'\(([^,]+),\s*([^)]+)\)', text)
print(coords)  # Output: [('12.345', '-67.890'), ('0', '42.123')]

Example 20: positive lookahead

text = "apple orange banana grape"
fruits_before_banana = re.findall(r'\w+(?=\sbanana)', text)
print(fruits_before_banana)  # Output: ['orange']

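7. Caveats

re.findall() returns a list of strings or a list of tuples, not match objects
When the pattern contains capturing groups, only the group contents are returned, not the whole match
For large texts, consider re.finditer(), which yields matches lazily instead of building the whole list in memory
Complex regular expressions can hurt performance
For structured data such as HTML/XML, prefer a dedicated parser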
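
The second caveat often surprises people who add parentheses only for grouping. A non-capturing group, written (?:...), groups without changing what findall() returns. The sketch below illustrates the difference on a made-up sample string.

```python
import re

text = "2023-04-15 and 2024-12-01"

# One capturing group: findall returns only what the group captured
captured = re.findall(r'\d{4}-(\d{2})-\d{2}', text)
print(captured)  # ['04', '12']

# A repeated capturing group keeps only its last repetition
repeated = re.findall(r'\d{4}(-\d{2}){2}', text)
print(repeated)  # ['-15', '-01']

# Non-capturing group: same grouping, but the whole match is returned
whole = re.findall(r'\d{4}(?:-\d{2}){2}', text)
print(whole)  # ['2023-04-15', '2024-12-01']
```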

Example 21: comparison with re.finditer()

text = "Error 404, Error 500, Warning 302"
# Using findall
codes = re.findall(r'Error (\d+)', text)
print(codes)  # Output: ['404', '500']
# Using finditer
for match in re.finditer(r'Error (\d+)', text):
    print(f"Found at {match.start()}-{match.end()}: {match.group(1)}")
# Output:
# Found at 0-9: 404
# Found at 11-20: 500
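
Because re.finditer() returns an iterator, it can also be stopped early. As a small sketch (the sample text and the limit of two matches are assumptions chosen for illustration), itertools.islice takes only the first few matches without building the full list:

```python
import re
from itertools import islice

text = "Error 404, Error 500, Error 503, Warning 302"

# finditer produces matches one at a time as they are requested
matches = re.finditer(r'Error (\d+)', text)

# Consume only the first two matches; scanning stops after the second one
first_two = [m.group(1) for m in islice(matches, 2)]
print(first_two)  # ['404', '500']
```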

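8. Performance tips

Precompile frequently used patterns:

pattern = re.compile(r'\d+')
numbers = pattern.findall(text)

Prefer specific patterns over broad ones:

# Less good: .* is greedy and matches almost anything
re.findall(r'.*:\s*(.*)', text)
# Better
re.findall(r'\w+:\s*(\w+)', text)

Avoid excessive backtracking:

# Nested quantifiers like this can cause severe slowdowns on input that fails to match
re.findall(r'(a+)+$', text)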
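
One common remedy is to remove the nested quantifier. The sketch below, on a deliberately short input, shows that a single a+ accepts the same string as (a+)+ here; the nested form only becomes costly on long runs of 'a' followed by a character that prevents the match.

```python
import re

text = "aaaa"

# Nested quantifier: the run of 'a's can be split between the loops in many ways
nested = re.findall(r'(a+)+$', text)
# Single quantifier: the same run can be matched in only one way
flat = re.findall(r'(a+)$', text)
print(nested, flat)  # ['aaaa'] ['aaaa']
```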

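The flags parameter of Python's re.findall() in detail

The flags parameter of re.findall() modifies how the regular expression matches, so that it can adapt to different text-processing needs. The sections below describe each flag and its practical uses.

1. Overview of common flags

Flag constant	Short form	Description
`re.IGNORECASE`	`re.I`	Ignore case
`re.MULTILINE`	`re.M`	Multiline mode, affects `^` and `$`
`re.DOTALL`	`re.S`	Makes `.` match every character, including newline
`re.VERBOSE`	`re.X`	Allows whitespace and comments in the pattern for readability
`re.ASCII`	`re.A`	Makes `\w`, `\W`, `\b`, `\B`, `\d`, `\D`, `\s`, `\S` match ASCII characters only
`re.LOCALE`	`re.L`	Makes `\w`, `\W`, `\b`, `\B` depend on the current locale (bytes patterns only)
`re.UNICODE`	`re.U`	Makes `\w`, `\W`, `\b`, `\B`, `\d`, `\D`, `\s`, `\S` match Unicode characters (the default for str patterns in Python 3)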
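
Most of these flags can also be written inline at the start of the pattern, using the lowercase letter of the short form, for example (?i) for re.I and (?m) for re.M. The sketch below compares the two spellings on a made-up sample string.

```python
import re

text = "Apple banana\napple pie"

# Flags passed as the third argument
by_argument = re.findall(r'^apple', text, re.I | re.M)
# The same flags written inline at the start of the pattern
inline = re.findall(r'(?im)^apple', text)
print(by_argument, inline)  # ['Apple', 'apple'] ['Apple', 'apple']
```

The inline form is handy when the pattern is stored as a plain string, for example in a configuration file, and there is no place to pass a flags argument.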

2. Each flag in detail, with examples

2.1 re.IGNORECASE (re.I) - ignore case

Effect: makes matching case-insensitive

Example:

import re
text = "Apple banana ORANGE Grape"
# Without IGNORECASE
result = re.findall(r'apple', text)
print(result)  # []
# With IGNORECASE
result = re.findall(r'apple', text, re.IGNORECASE)
print(result)  # ['Apple']
# Match every fruit name regardless of case
fruits = re.findall(r'[a-z]+', text, re.I)
print(fruits)  # ['Apple', 'banana', 'ORANGE', 'Grape']

2.2 re.MULTILINE (re.M) - multiline mode

Effect: changes ^ and $ so that they match at the start and end of every line, not only of the whole string

Example:

text = """First line
Second line
Third line"""
# Without MULTILINE
result = re.findall(r'^\w+', text)
print(result)  # ['First']
# With MULTILINE
result = re.findall(r'^\w+', text, re.MULTILINE)
print(result)  # ['First', 'Second', 'Third']
# Match the last word of each line
result = re.findall(r'\w+$', text, re.M)
print(result)  # ['line', 'line', 'line']

2.3 re.DOTALL (re.S) - dot matches everything

Effect: makes . match every character, including newline

Example:

text = """Start
Middle
End"""
# Without DOTALL
result = re.findall(r'Start.*End', text)
print(result)  # []
# With DOTALL
result = re.findall(r'Start.*End', text, re.DOTALL)
print(result)  # ['Start\nMiddle\nEnd']
# Extract the body of a multi-line comment
html = """<!--
This is a multi-line
HTML comment
-->"""
comment = re.findall(r'<!--(.*?)-->', html, re.DOTALL)
print(comment)  # ['\nThis is a multi-line\nHTML comment\n']

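2.4 re.VERBOSE (re.X) - verbose mode

Effect: allows whitespace and comments inside the regular expression to make it more readable

Example:

# A more complex phone-number pattern
# No ^ or $ anchors: findall has to locate numbers inside a longer string
phone_re = re.compile(r'''
    (\+\d{1,3})?        # international prefix (optional)
    [-\s]?              # separator
    (\d{3})             # first 3 digits
    [-\s]?              # separator
    (\d{3,4})           # middle 3 or 4 digits
    [-\s]?              # separator
    (\d{4})             # last 4 digits
    ''', re.VERBOSE)
text = "Phone numbers: +86-138-1234-5678, 010-87654321"
numbers = phone_re.findall(text)
print(numbers)  # [('+86', '138', '1234', '5678'), ('', '010', '8765', '4321')]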
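
One consequence of verbose mode is that a literal space in the pattern is ignored too, unless it is escaped or placed inside a character class. The following sketch, on a made-up sample string, shows the same pattern with and without re.VERBOSE.

```python
import re

text = "3 apples, 5 oranges"

# Normal mode: the space in the pattern matches a literal space
plain = re.findall(r'\d+ \w+', text)
# Verbose mode: the unescaped space is dropped, so the pattern becomes \d+\w+
dropped = re.findall(r'\d+ \w+', text, re.VERBOSE)
# A space inside a character class is kept even in verbose mode
kept = re.findall(r'\d+ [ ] \w+', text, re.VERBOSE)

print(plain)    # ['3 apples', '5 oranges']
print(dropped)  # []
print(kept)     # ['3 apples', '5 oranges']
```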

2.5 re.ASCII (re.A) - ASCII mode

Effect: makes \w, \W, \b, \B, \d, \D, \s, \S match ASCII characters only

Example:

text = "Python3 中文 Español café"
# Default mode (Unicode)
result = re.findall(r'\w+', text)
print(result)  # ['Python3', '中文', 'Español', 'café']
# ASCII mode
result = re.findall(r'\w+', text, re.ASCII)
print(result)  # ['Python3', 'Espa', 'ol', 'caf']
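
The same flag also affects \d, which by default matches decimal digits from any script. As a small sketch, the sample string below mixes ASCII digits with fullwidth digits (written as escapes so the code stays unambiguous).

```python
import re

# ASCII digits followed by fullwidth digits U+FF11..U+FF13
text = "123 \uff11\uff12\uff13"

unicode_digits = re.findall(r'\d+', text)
ascii_digits = re.findall(r'\d+', text, re.ASCII)

print(len(unicode_digits))  # 2: both runs count as digits by default
print(ascii_digits)         # ['123']
```

When only 0-9 should be accepted, for instance before calling code that expects ASCII input, re.ASCII or an explicit [0-9] class avoids the surprise.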

2.6 re.UNICODE (re.U) - Unicode mode

Effect: makes \w, \W, \b, \B, \d, \D, \s, \S match Unicode characters (the default in Python 3)

Example:

text = "Русский 中文 ελληνικά"
# UNICODE mode is already the default
result = re.findall(r'\w+', text)
print(result)  # ['Русский', '中文', 'ελληνικά']
# Passing UNICODE explicitly gives the same result
result = re.findall(r'\w+', text, re.UNICODE)
print(result)  # ['Русский', '中文', 'ελληνικά']

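2.7 re.LOCALE (re.L) - locale mode

Effect: makes \w, \W, \b, \B depend on the current locale (not recommended; it works only with bytes patterns and 8-bit locales)

Example:

import locale
# Set a German locale with an 8-bit encoding (it must be installed on the system)
locale.setlocale(locale.LC_ALL, 'de_DE.ISO8859-1')
# re.LOCALE is rejected for str patterns in Python 3, so use bytes
text = "straße café".encode('latin-1')
result = re.findall(rb'\w+', text, re.LOCALE)
print(result)  # expected with this locale: [b'stra\xdfe', b'caf\xe9']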

3. Combining several flags

Several flags can be combined with the bitwise OR operator (|)

Example:

text = """Name: John
AGE: 30
name: Jane
age: 25"""
# IGNORECASE and MULTILINE together
results = re.findall(r'^name:\s*(\w+)', text, re.I | re.M)
print(results)  # ['John', 'Jane']
# Parse multi-line configuration entries
config = """[Server]
host = example.com
port = 8080
timeout = 30"""
settings = re.findall(
    r'^(\w+)\s*=\s*(.*)$', 
    config, 
    re.MULTILINE | re.VERBOSE
)
print(settings)  # [('host', 'example.com'), ('port', '8080'), ('timeout', '30')]

4. Practical use cases

4.1 Parsing a log file (multiline mode)

log = """2023-04-15 10:00:00 [INFO] System started
2023-04-15 10:01:23 [ERROR] Database connection failed
2023-04-15 10:02:45 [WARN] Disk space low"""
# Extract every error entry together with its timestamp
errors = re.findall(
    r'^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[ERROR\] (.*)$',
    log,
    re.MULTILINE
)
print(errors)  # [('2023-04-15 10:01:23', 'Database connection failed')]

4.2 Extracting HTML content (dot matches everything)

html = """<div>
    <p>First paragraph</p>
    <p>Second paragraph</p>
</div>"""
# Extract the contents of every paragraph
paragraphs = re.findall(
    r'<p>(.*?)</p>',
    html,
    re.DOTALL  # lets . match newlines
)
print(paragraphs)  # ['First paragraph', 'Second paragraph']
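
For anything beyond simple snippets, a real parser is more robust than a regular expression, as noted earlier. The following is a minimal sketch using the standard library's html.parser on the same HTML; the class name ParagraphCollector is made up for this illustration.

```python
from html.parser import HTMLParser

class ParagraphCollector(HTMLParser):
    """Collect the text found inside <p> elements (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self._in_p = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'p':
            self._in_p = True
            self.paragraphs.append('')

    def handle_endtag(self, tag):
        if tag == 'p':
            self._in_p = False

    def handle_data(self, data):
        # Only text that appears between <p> and </p> is kept
        if self._in_p:
            self.paragraphs[-1] += data

html = """<div>
    <p>First paragraph</p>
    <p>Second paragraph</p>
</div>"""

collector = ParagraphCollector()
collector.feed(html)
collector.close()
print(collector.paragraphs)  # ['First paragraph', 'Second paragraph']
```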

4.3 Multilingual text processing (Unicode mode)

text = "English: hello, 中文: 你好, Français: bonjour"
# Extract every run of non-ASCII characters
words = re.findall(
    r'[^\x00-\x7F]+',  # matches non-ASCII characters
    text,
    re.UNICODE
)
print(words)  # ['中文', '你好', 'ç']

4.4 Complex pattern matching (verbose mode)

# Match dates written with different separators
date_re = re.compile(r'''
    ^
    (?:20\d{2}|19\d{2})  # year 1900-2099
    [-/.]                # separator
    (?:0[1-9]|1[0-2])    # month 01-12
    [-/.]                # separator
    (?:0[1-9]|[12][0-9]|3[01])  # day 01-31
    $''', re.VERBOSE)
dates = ["2023-04-15", "1999/12/31", "2000.01.01"]
valid_dates = [d for d in dates if date_re.search(d)]
print(valid_dates)  # ['2023-04-15', '1999/12/31', '2000.01.01']
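
A pattern like this checks only the shape of the string, so it would also accept an impossible date such as 2023-02-31. One way to add a calendar check is datetime.strptime from the standard library. The helper name is_real_date and the three accepted layouts below are assumptions made for this sketch.

```python
from datetime import datetime

def is_real_date(value):
    """Return True if value is a real calendar date in one of three layouts."""
    for fmt in ('%Y-%m-%d', '%Y/%m/%d', '%Y.%m.%d'):
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            # Wrong layout or impossible date: try the next format
            continue
    return False

candidates = ["2023-04-15", "2023-02-31", "1999/12/31"]
real_dates = [d for d in candidates if is_real_date(d)]
print(real_dates)  # ['2023-04-15', '1999/12/31']
```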

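5. Caveats

Scope of flags: flags passed as an argument apply to the whole regular expression
Combining flags: several flags can be combined, but watch how they interact
Performance: some flags may affect performance; passing re.UNICODE explicitly does not, because it is already the default for str patterns
Precompiling: compile frequently used regular expressions before use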
```
pattern = re.compile(r'your_pattern', flags=re.I | re.M)
results = pattern.findall(text)
```
Python 3 default behavior: in Python 3, re.UNICODE is enabled by default for str patterns
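
One detail worth knowing when precompiling: the flags belong to re.compile(), not to the compiled pattern's findall(). Pattern.findall() takes (string, pos, endpos), so a flag passed there is silently interpreted as a start position. The sketch below shows the effect on a made-up sample string.

```python
import re

text = "apple Apple"

# Flags are given to re.compile(); the compiled pattern remembers them
case_insensitive = re.compile(r'apple', flags=re.I)
both = case_insensitive.findall(text)
print(both)  # ['apple', 'Apple']

# Pattern.findall(string, pos, endpos): the second argument is a position
case_sensitive = re.compile(r'apple')
normal = case_sensitive.findall(text)
mistaken = case_sensitive.findall(text, re.I)  # re.I has the value 2, used as pos
print(normal)    # ['apple']
print(mistaken)  # [] because the search starts at index 2
```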

Used well, these flags let you write more powerful and flexible regular expressions for a wide range of text-processing tasks.
