A Complete Guide to Parsing XML Files in Python

Updated: 2024-02-10 09:48:22  Author: AllardZhao

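This article introduces ways to parse XML files in Python. It should serve as a useful reference, and I hope it helps; if anything is wrong or incomplete, corrections are welcome.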

How do you parse a simple XML document?

A practical example

XML is a very widely used markup language that provides a uniform way to describe an application's structured data:

    <?xml version="1.0" encoding="utf-8" ?>
    <data>
        <country name="Liechtenstein">
            <rank updated="yes">2</rank>
            <year>2008</year>
            <gdppc>141100</gdppc>
            <neighbor name="Austria" direction="E"/>
            <neighbor name="Switzerland" direction="W"/>
        </country>
    </data>

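How do you parse an XML document in Python?

Solution

Use xml.etree.ElementTree from the standard library; its parse function parses an XML document.

Code demonstration

(1) Parsing an XML document with parse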

from xml.etree.ElementTree import parse
 
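# Open in binary mode so the parser honors the encoding declared in the file
with open('demo.xml', 'rb') as f:
    # The first argument is the input source (a filename or file object); parse returns an ElementTree object
    et = parse(f)
# Get the root node from the ElementTree
root = et.getroot()
print(root)
# View the tag
print(root.tag)
# View the attributes
print(root.attrib)
# View the text with surrounding whitespace stripped (root.text can be None)
print((root.text or '').strip())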
 
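# Traverse the element tree
# Get the node's child elements; getchildren() was deprecated and then removed in Python 3.9, so use list(root)
children = list(root)
print(children)
# Get an attribute of each child element
for child in root:
    print(child.get('name'))
# Given a bare tag name, find, findall and iterfind only search
# the direct children of the current element; they do not search grandchildren.
# Find a child element by tag; find returns the first match (or None if there is none)
print(root.find('country'))
# findall returns a list of all matching elements
print(root.findall('country'))
# When a list is not needed, iterfind returns an iterable (a generator object) instead
print(root.iterfind('country'))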
 
for e in root.iterfind('country'):
    print(e.get('name'))
 
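# iter finds rank tags no matter how deep they are nested
# With no argument, it lists the current node itself and every element beneath it
print(list(root.iter()))
# Recursively find the descendants whose tag is rank
print(list(root.iter('rank')))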
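
To make the difference between find/findall and iter concrete, here is a minimal self-contained sketch. It parses an inline string with fromstring rather than reading demo.xml. The SAMPLE document is an assumption for illustration: it is the Liechtenstein entry shown earlier plus a hypothetical Singapore entry.

```python
from xml.etree.ElementTree import fromstring

# Hypothetical sample: the article's Liechtenstein entry plus a Singapore
# entry added for illustration (demo.xml itself is not shown in full).
SAMPLE = """
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
</data>
"""

# fromstring returns the root Element directly (no ElementTree wrapper)
root = fromstring(SAMPLE)

# A bare tag name only matches direct children, so rank is not found from root
direct_rank = root.find('rank')

# iter walks the whole subtree, so it reaches the rank elements two levels down
ranks = [int(e.text) for e in root.iter('rank')]

# Direct children are found by findall as usual
names = [c.get('name') for c in root.findall('country')]

print(direct_rank)  # None
print(ranks)        # [2, 5]
print(names)        # ['Liechtenstein', 'Singapore']
```

Because find returns None when nothing matches, it is worth checking the result before reading .text or calling .get on it.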

(2) Advanced usage of findall

from xml.etree.ElementTree import parse
 
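# Open in binary mode so the parser honors the encoding declared in the file
with open('demo.xml', 'rb') as f:
    # The first argument is the input source; parse returns an ElementTree object
    et = parse(f)
# Get the root node from the ElementTree
root = et.getroot()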
 
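# * matches every child; country/* selects all grandchildren of root that sit under a country
print(root.findall('country/*'))
# Find descendants at any depth; . is the current node and .. is the parent
print(root.findall('.//rank'))
# Select the parent of every rank element
print(root.findall('.//rank/..'))
# [@attrib] selects elements that have the given attribute
print(root.findall('country[@name]'))
# [@attrib='value'] selects elements whose attribute has the given value
# (matches only if demo.xml contains such an entry)
print(root.findall('country[@name="Singapore"]'))
# [tag] selects elements that have a child with the given tag
print(root.findall('country[rank]'))
# [tag='text'] selects elements that have a child with the given tag whose text equals the given value
print(root.findall('country[rank="5"]'))
# [position] selects elements by position among the matches; positions start at 1
print(root.findall('country[1]'))
print(root.findall('country[2]'))
# last() selects the last one
print(root.findall('country[last()]'))
# Select the second to last
print(root.findall('country[last()-1]'))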
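
The sample XML shown at the top of the article contains only one country, so several of the queries above return an empty list against it. The following sketch runs the same kinds of predicates against a hypothetical three-country document. Both the inline data and the names helper are assumptions made for illustration.

```python
from xml.etree.ElementTree import fromstring

# Hypothetical three-country sample, trimmed to the parts the queries need
root = fromstring(
    '<data>'
    '<country name="Liechtenstein"><rank>2</rank></country>'
    '<country name="Singapore"><rank>5</rank></country>'
    '<country name="Panama"><rank>69</rank></country>'
    '</data>'
)


def names(path):
    """Return the name attribute of every element matched by path."""
    return [e.get('name') for e in root.findall(path)]


print(names('country[@name="Singapore"]'))  # ['Singapore']
print(names('country[rank="5"]'))           # ['Singapore']
print(names('country[1]'))                  # ['Liechtenstein']
print(names('country[last()]'))             # ['Panama']
print(names('country[last()-1]'))           # ['Singapore']
# Parents of every rank element, i.e. all three countries
print(names('.//rank/..'))
```

findall always returns a list, so a predicate that matches nothing yields [] rather than raising an error.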