python爬蟲之BeautifulSoup 使用select方法詳解
本文介紹了python爬蟲之BeautifulSoup 使用select方法詳解 ,分享給大家。具體如下:
<html><head><title>The Dormouse's story</title></head> <body> <p class="title" name="dromouse"><b>The Dormouse's story</b></p> <p class="story">Once upon a time there were three little sisters; and their names were <a rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" class="sister" id="link1"><!-- Elsie --></a>, <a rel="external nofollow" rel="external nofollow" rel="external nofollow" class="sister" id="link2">Lacie</a> and <a rel="external nofollow" rel="external nofollow" rel="external nofollow" class="sister" id="link3">Tillie</a>; and they lived at the bottom of a well.</p> <p class="story">...</p> """
我們在寫 CSS 時,標簽名不加任何修飾,類名前加點,id名前加 #,在這里我們也可以利用類似的方法來篩選元素,用到的方法是 soup.select(),返回類型是 list
(1)通過標簽名查找
print soup.select('title') #[<title>The Dormouse's story</title>] print soup.select('a') #[<a class="sister" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" id="link1"><!-- Elsie --></a>, <a class="sister" rel="external nofollow" rel="external nofollow" rel="external nofollow" id="link2">Lacie</a>, <a class="sister" rel="external nofollow" rel="external nofollow" rel="external nofollow" id="link3">Tillie</a>] print soup.select('b') #[<b>The Dormouse's story</b>]
(2)通過類名查找
print soup.select('.sister') #[<a class="sister" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" id="link1"><!-- Elsie --></a>, <a class="sister" rel="external nofollow" rel="external nofollow" rel="external nofollow" id="link2">Lacie</a>, <a class="sister" rel="external nofollow" rel="external nofollow" rel="external nofollow" id="link3">Tillie</a>]
(3)通過 id 名查找
print soup.select('#link1') #[<a class="sister" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" id="link1"><!-- Elsie --></a>]
(4)組合查找
組合查找即和寫 class 文件時,標簽名與類名、id名進行的組合原理是一樣的,例如查找 p 標簽中,id 等于 link1的內(nèi)容,二者需要用空格分開
print soup.select('p #link1') #[<a class="sister" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" id="link1"><!-- Elsie --></a>]
直接子標簽查找
print soup.select("head > title") #[<title>The Dormouse's story</title>]
(5)屬性查找
查找時還可以加入屬性元素,屬性需要用中括號括起來,注意屬性和標簽屬于同一節(jié)點,所以中間不能加空格,否則會無法匹配到。
print soup.select("head > title") #[<title>The Dormouse's story</title>] print soup.select('a[ rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" ]') #[<a class="sister" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" id="link1"><!-- Elsie --></a>]
同樣,屬性仍然可以與上述查找方式組合,不在同一節(jié)點的空格隔開,同一節(jié)點的不加空格
print soup.select('p a[ rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" ]') #[<a class="sister" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" id="link1"><!-- Elsie --></a>]
以上就是本文的全部內(nèi)容,希望對大家的學習有所幫助,也希望大家多多支持腳本之家。
相關文章
python連接clickhouse數(shù)據(jù)庫的兩種方式小結(jié)
這篇文章主要介紹了python連接clickhouse數(shù)據(jù)庫的兩種方式小結(jié),具有很好的參考價值,希望對大家有所幫助。如有錯誤或未考慮完全的地方,望不吝賜教2022-05-05python類型強制轉(zhuǎn)換long to int的代碼
python的int型最大值和系統(tǒng)有關,32位和64位系統(tǒng)結(jié)果是不同的,分別為2的31次方減1和2的63次方減1,可以通過sys.maxint查看此值2013-02-02你應該知道的Python3.6、3.7、3.8新特性小結(jié)
這篇文章主要介紹了你應該知道的Python3.6、3.7、3.8新特性小結(jié),文中通過示例代碼介紹的非常詳細,對大家的學習或者工作具有一定的參考學習價值,需要的朋友們下面隨著小編來一起學習學習吧2020-05-05