A Basic Guide to sqlparse, a SQL Parsing Library for Python

Updated: 2024-08-05 09:32:25  Author: 牧碼文

sqlparse is a non-validating SQL parser for Python. It supports parsing, splitting and formatting SQL statements. This article walks through its basic operations.

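Preface

sqlparse is a Python library: a non-validating SQL parser that parses SQL statements and provides a simple API for accessing the parsed structure. It can help you parse complex SQL queries, extract information from them, and perform basic analysis and manipulation of SQL statements.

I. Basic functions

sqlparse's __init__.py exposes four basic functions.

1. parse(sql)

Parses a string containing one or more SQL statements into Python objects. These objects form a tree of tokens that plays the role of an abstract syntax tree (AST).
Source: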

def parse(sql, encoding=None):
    """Parse sql and return a list of statements.

    :param sql: A string containing one or more SQL statements.
    :param encoding: The encoding of the statement (optional).
    :returns: A tuple of :class:`~sqlparse.sql.Statement` instances.
    """
    return tuple(parsestream(sql, encoding))

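The input is split into individual statements at the statement separator (;) and returned as a tuple of Statement objects. Each statement can then be walked recursively to reach every value it contains.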

import sqlparse

SQL = """CREATE TABLE foo (
                 id integer primary key comment 'id_comm',
                 title varchar(200) not null comment 'id_comm',
                 description text comment 'id_comm');"""

parsed = sqlparse.parse(SQL)[0]

print(parsed)

2. format(sql)

Formats SQL code and returns the formatted code as a string. Source:

def format(sql, encoding=None, **options):
    """Format *sql* according to *options*.

    Available options are documented in :ref:`formatting`.

    In addition to the formatting options this function accepts the
    keyword "encoding" which determines the encoding of the statement.

    :returns: The formatted SQL statement as string.
    """

Parameters:

sql: the SQL statement string to format.
reindent=True: re-indents the SQL statement so that its clauses line up.
keyword_case='upper': converts SQL keywords to upper case. Accepted values are 'lower', 'upper' and 'capitalize'.
Other optional parameters include indent_width (the number of spaces per indentation level, 2 by default) and wrap_after (the column limit after which lists are wrapped), which customize the output further.

import sqlparse

sql = """select * from tbl where id > 10;"""

# Use a name other than "format" so the Python built-in is not shadowed
formatted = sqlparse.format(sql, reindent=True, keyword_case='upper')

print(formatted)

# SELECT *
# FROM tbl
# WHERE id > 10;

3. split()

Splits a string of SQL into individual statements at the statement separator and returns them as a list of strings. Source:

def split(sql, encoding=None):
    """Split *sql* into single statements.

    :param sql: A string containing one or more SQL statements.
    :param encoding: The encoding of the statement (optional).
    :returns: A list of strings.
    """

import sqlparse

sql = """select * from tbl where id > 10;select * from tbl where id > 20;"""

statements = sqlparse.split(sql)

print(statements)
# ['select * from tbl where id > 10;', 'select * from tbl where id > 20;']

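4. parsestream()

Similar to parse, but parses SQL from a stream. It is intended for SQL code read from a file-like input (such as a file or a network connection) rather than from a string that has been loaded in one piece. It returns a generator, so statements are produced one at a time instead of being collected into a tuple up front, which helps when processing large SQL files or continuous input.
Source: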

def parsestream(stream, encoding=None):
    """Parses sql statements from file-like object.

    :param stream: A file-like object.
    :param encoding: The encoding of the stream contents (optional).
    :returns: A generator of :class:`~sqlparse.sql.Statement` instances.
    """

import sqlparse

with open('../static/pre_sql.sql', 'r', encoding='utf-8') as file:
    # Use parsestream (not parse) for a file-like object
    for statement in sqlparse.parsestream(file):
        print(statement)

II. Token

Source:

class Token:
    """Base class for all other classes in this module.

    It represents a single token and has two instance attributes:
    ``value`` is the unchanged value of the token and ``ttype`` is
    the type of the token.
    """
    
    def __init__(self, ttype, value):
        value = str(value)
        self.value = value
        self.ttype = ttype
        self.parent = None
        self.is_group = False
        self.is_keyword = ttype in T.Keyword
        self.is_whitespace = self.ttype in T.Whitespace
        self.normalized = value.upper() if self.is_keyword else value

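sqlparse.sql.Token: the most basic token class. It represents an atomic part of a SQL statement, such as a word or a symbol. As the constructor above shows, it has the following attributes:

value: the actual text of the token, such as the keyword SELECT or an identifier like a table name.
ttype: the type of the token, one of the types defined in sqlparse.tokens, such as Keyword, Name or Punctuation.
parent: the group that contains the token (None until the token has been placed in a group).
is_group, is_keyword, is_whitespace: flags describing the token.
normalized: the upper-cased value for keywords, otherwise the unchanged value.
Note that the constructor shown above does not store the token's position in the original SQL text.
Related classes and concepts
sqlparse.sql.Identifier: represents an identifier in SQL, such as a table name or a column name. It offers helper methods such as get_name(), get_real_name() and get_alias().
Keyword: SQL keywords such as SELECT, FROM and WHERE. This is a token type in sqlparse.tokens (the ttype of a plain Token), not a class in sqlparse.sql.
Punctuation: punctuation in SQL, such as commas and semicolons. This is also a token type in sqlparse.tokens.
sqlparse.sql.Comment: represents comments in SQL, either line comments (-- …) or block comments (/* … */).
sqlparse.sql.Comparison: contains a comparison operator (such as =, != or >) together with the operands on both sides, which is useful when analyzing more complex expressions.
sqlparse.sql.Statement: represents a whole SQL statement. It is a tree made up of Tokens and token groups, which makes it easy to traverse the structure of the statement recursively.
This is where the SQL parsing process comes in:

SQL -> lexer (lexical analysis) -> token stream -> parser (syntactic analysis) -> abstract syntax tree (AST) -> tree traversal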

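Every parsed statement has a tokens attribute, a list of its direct child tokens (flatten() can be used to iterate over all leaf tokens as a generator). Each token carries type information. The available token types are: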

# Special token types
Text = Token.Text
Whitespace = Text.Whitespace
Newline = Whitespace.Newline
Error = Token.Error
# Text that doesn't belong to this lexer (e.g. HTML in PHP)
Other = Token.Other

# Common token types for source code
Keyword = Token.Keyword
Name = Token.Name
Literal = Token.Literal
String = Literal.String
Number = Literal.Number
Punctuation = Token.Punctuation
Operator = Token.Operator
Comparison = Operator.Comparison
Wildcard = Token.Wildcard
Comment = Token.Comment
Assignment = Token.Assignment

# Generic types for non-source code
Generic = Token.Generic
Command = Generic.Command

# String and some others are not direct children of Token.
# alias them:
Token.Token = Token
Token.String = String
Token.Number = Number

# SQL specific tokens
DML = Keyword.DML
DDL = Keyword.DDL
CTE = Keyword.CTE

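Text: the basic text type, usually used for ordinary text in a SQL statement.
Whitespace: whitespace characters, including spaces and tabs, which separate the parts of a SQL statement.
Newline: the newline character, which marks the start of a new line.
Error: text that could not be recognized or that is erroneous.
Other: text that does not belong to the current lexer, for example another language embedded around SQL (such as HTML in PHP).
Keyword: SQL keywords such as SELECT, FROM and WHERE.
DML: Data Manipulation Language keywords such as INSERT, UPDATE, DELETE and SELECT.
DDL: Data Definition Language keywords such as CREATE, ALTER and DROP.
CTE: the Common Table Expression keyword, WITH.
Name: names of database objects, such as table names and column names.
Literal: literal values written directly in the SQL.
String: string literals such as 'example string'.
Number: numeric literals such as 42 and 3.14.
Punctuation: punctuation such as commas and parentheses, used to separate or enclose parts of the SQL.
Operator: operators such as +, -, * and /.
Comparison: comparison operators such as =, !=, < and >.
Wildcard: the wildcard *, as in SELECT *.
Comment: single-line or multi-line SQL comments.
Assignment: the assignment operator, such as := in some SQL dialects.
Generic: a generic type for content that is not source code.
Command: commands, possibly SQL commands or interactive shell commands.

More specific subtypes and groupings that appear in practice include:

Whitespace: whitespace characters (spaces, tabs, newlines and so on)
Keyword: SQL keywords (SELECT, FROM, WHERE and so on)
Name: identifiers (table names, column names and so on). Backtick-quoted names, such as MySQL table and column names, are also reported as Name.
String.Single: single-quoted string literals
String.Symbol: double-quoted strings (used for identifiers in some SQL dialects)
Identifier: identifiers in SQL, including table names, column names and database names. This is a grouping class in sqlparse.sql rather than a token type.
Compound tokens: groups that contain several child tokens, used for more complex structures such as CASE statements and WHEN conditions (for example sqlparse.sql.Case).
Number.Integer: integers
Number.Float: floating-point numbers
Number.Hexadecimal: hexadecimal numbers
Operator: operators (=, <>, +, - and so on)
Punctuation: punctuation (commas, semicolons, parentheses and so on)
Comment.Single: single-line comments
Comment.Multiline: multi-line comments
Wildcard: the wildcard (*)
Function: function calls (COUNT(), MAX() and so on), grouped as sqlparse.sql.Function
DML, DDL, DCL and so on: higher-level categories of keywords (data manipulation language, data definition language, data control language)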

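III. Other types

Some of the names above are token types, stored in a token's ttype attribute.

Others are not token types. Where, IdentifierList, Identifier, Parenthesis, Comment and similar names are grouping classes, and the ttype of their instances is None.

import sqlparse

sql = 'select 1 as id, name, case when name = "" then 3 else 4 end as score from tbl where id > 10 limit 100'

tokens = sqlparse.parse(sql)[0].tokens

for token in tokens:
    print(f"{type(token)}::{token.ttype}::", token)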
# <class 'sqlparse.sql.Token'>::Token.Keyword.DML:: select
# <class 'sqlparse.sql.Token'>::Token.Text.Whitespace::  
# <class 'sqlparse.sql.IdentifierList'>::None:: 1 as id, name, case when name = "" then 3 else 4 end as score
# <class 'sqlparse.sql.Token'>::Token.Text.Whitespace::  
# <class 'sqlparse.sql.Token'>::Token.Keyword:: from
# <class 'sqlparse.sql.Token'>::Token.Text.Whitespace::  
# <class 'sqlparse.sql.Identifier'>::None:: tbl
# <class 'sqlparse.sql.Token'>::Token.Text.Whitespace::  
# <class 'sqlparse.sql.Where'>::None:: where id > 10 
# <class 'sqlparse.sql.Token'>::Token.Keyword:: limit
# <class 'sqlparse.sql.Token'>::Token.Text.Whitespace::  
# <class 'sqlparse.sql.Token'>::Token.Literal.Number.Integer:: 100

When a query selects several columns or reads from several tables, they are wrapped in an IdentifierList. A single table is wrapped in an Identifier, the filter condition in a Where, parentheses in a Parenthesis, and comments in a Comment.

IV. Example: extracting all queried columns and table names

import re

import sqlparse

sql = 'insert into table inser_tbl partition (dt = dt) select 1 as id, name, case when (name = "" or name = "") then 3 else 4 end as score from tbl where id > 10 limit 100'

tokens = sqlparse.parse(sql)[0].tokens

# Matches "as <alias>"; the word boundary keeps the "as" inside "case" from matching
ALIAS_RE = re.compile(r'\bas\s+(.*)', re.I)

cols = []      # columns of the select list
tbls = []      # target table(s) of the insert
froms = []     # source table(s) after from
wheres = []    # where clauses
last_key = ''  # most recent insert / select / from keyword
for token in tokens:
    # Skip spaces and newlines
    if token.is_whitespace:
        continue
    # Remember which clause we are in (case-insensitive)
    if token.is_keyword and token.value.lower() in ('insert', 'select', 'from'):
        last_key = token.value.lower()
    # Columns and tables: a list of identifiers or a single identifier
    elif isinstance(token, (sqlparse.sql.IdentifierList, sqlparse.sql.Identifier)):
        if isinstance(token, sqlparse.sql.IdentifierList):
            identifiers = token.get_identifiers()
        else:
            identifiers = [token]
        for identifier in identifiers:
            name = identifier.value
            if last_key == 'select':
                # Keep only the alias when the column has one
                match = ALIAS_RE.search(name)
                cols.append(match.group(1).strip() if match else name)
            elif last_key == 'from':
                froms.append(name)
            else:
                tbls.append(name)
    # Filter condition
    elif isinstance(token, sqlparse.sql.Where):
        wheres.append(token.value)
print("cols:", cols)
print("tbls:", tbls)
print("froms:", froms)
print("wheres:", wheres)

# cols: ['id', 'name', 'score']
# tbls: ['inser_tbl']
# froms: ['tbl']
# wheres: ['where id > 10 ']
