What to do when Python's requests cannot fetch a page's source
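While scraping http://skell.sketchengine.eu recently, I found that requests could not retrieve the full content of the page. Instead, I used selenium to open the page in a simulated browser, read the page source from it, and parsed that with BeautifulSoup to extract the example sentences. To keep the loop going from one word to the next, the loop body calls refresh(), so that once the browser has been given a new URL, a refresh updates the page content. To make retrieval more reliable, the script waits 2 seconds after each refresh, which lowers the chance of reading the page before its content is there. To reduce the chance of being blocked, we also set a Chrome User-Agent. The code is as follows: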
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from bs4 import BeautifulSoup
import time, re

# Path to the ChromeDriver executable
path = Service("D:\\MyDrivers\\chromedriver.exe")
# Run Chrome without showing a browser window
chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-gpu')
# Send a regular desktop Chrome User-Agent (the Chrome switch is --user-agent=...)
chrome_options.add_argument('--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36')
# Create the Chrome instance
driver = webdriver.Chrome(service=path, options=chrome_options)
lst = ["happy", "help", "evening", "great", "think", "adapt"]
for word in lst:
    url = "https://skell.sketchengine.eu/#result?lang=en&query=" + word + "&f=concordance"
    driver.get(url)
    # Refresh the page so it loads the data for the new URL
    driver.refresh()
    time.sleep(2)
    # page_source returns the source of the rendered page
    resp = driver.page_source
    # Parse the source
    soup = BeautifulSoup(resp, "html.parser")
    table = soup.find_all("td")
    # Header line for this word: "<word>的例子" means "examples of <word>"
    with open("eps.txt", 'a+', encoding='utf-8') as f:
        f.write(f"\n{word}的例子\n")
    for i in table[0:6]:
        text = i.text
        # Collapse runs of whitespace into a single space
        new = re.sub(r"\s+", " ", text)
        # Append to the text file, starting each numbered item on a new line
        with open("eps.txt", 'a+', encoding='utf-8') as f:
            f.write(re.sub(r"^(\d+\.)", r"\n\1", new))
# Close the browser and end the ChromeDriver session
driver.quit()
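1. To speed things up, Chrome runs without a visible window (headless mode), configured through chrome.options.
2. Finally, re regular expressions clean up the formatting.
3. The slice table[0:6] picks up the first three sentences. The final output is shown below; each word's group begins with the header line the script writes, "<word>的例子" (Chinese for "examples of <word>").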
happy的例子
1. This happy mood lasted roughly until last autumn.
2. The lodging was neither convenient nor happy .
3. One big happy family "fighting communism".
help的例子
1. Applying hot moist towels may help relieve discomfort.
2. The intense light helps reproduce colors more effectively.
3. My survival route are self help books.
evening的例子
1. The evening feast costs another $10.
2. My evening hunt was pretty flat overall.
3. The area nightclubs were active during evenings .
great的例子
1. The three countries represented here are three great democracies.
2. Our three different tour guides were great .
3. Your receptionist "crew" is great !
think的例子
1. I said yes immediately without thinking everything through.
2. This book was shocking yet thought provoking.
3. He thought "disgusting" was more appropriate.
adapt的例子
1. The novel has been adapted several times.
2. There are many ways plants can adapt .
3. They must adapt quickly to changing deadlines.
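The cleanup step can be tried on its own, without a browser. The following is a minimal sketch and not part of the original script: the helper name clean_cell and the sample cell texts are made up for illustration, and it assumes each sentence spans two td cells (the number, then the text), which is what slicing table[0:6] to get three sentences suggests.

```python
import re


def clean_cell(text: str) -> str:
    """Collapse whitespace, then start a numbered cell on a new line."""
    # Any run of spaces, tabs or newlines becomes a single space.
    collapsed = re.sub(r"\s+", " ", text)
    # A cell that begins with "1.", "2.", ... gets a newline in front of it.
    return re.sub(r"^(\d+\.)", r"\n\1", collapsed)


# Hypothetical cell texts standing in for two <td> cells of one result row.
cells = ["1.", "  This happy   mood\n lasted roughly until last autumn. "]
line = "".join(clean_cell(c) for c in cells)
print(repr(line))
```

Because the second pattern is anchored with ^, only the cell holding the number is affected; digits that appear inside a sentence are left alone.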
Addendum: after some optimization, the example sentences are scraped more quickly. The code is as follows:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from bs4 import BeautifulSoup
import time, re
import os

# Path to the ChromeDriver executable
path = Service("D:\\MyDrivers\\chromedriver.exe")
# Run Chrome without showing a browser window
chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-gpu')
# Send a regular desktop Chrome User-Agent (the Chrome switch is --user-agent=...)
chrome_options.add_argument('--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36')

def get_wordlist():
    # Read one word per line from wordlist.txt
    wordlist = []
    with open("wordlist.txt", 'r', encoding='utf-8') as f:
        lines = f.readlines()
        for line in lines:
            word = line.strip()
            wordlist.append(word)
    return wordlist

def main(lst):
    # Create the Chrome instance
    driver = webdriver.Chrome(service=path, options=chrome_options)
    for word in lst:
        url = "https://skell.sketchengine.eu/#result?lang=en&query=" + word + "&f=concordance"
        driver.get(url)
        driver.refresh()
        time.sleep(2)
        # page_source returns the source of the rendered page
        resp = driver.page_source
        # Parse the source
        soup = BeautifulSoup(resp, "html.parser")
        table = soup.find_all("td")
        # Header line for this word: "<word>的例子" means "examples of <word>"
        with open("examples.txt", 'a+', encoding='utf-8') as f:
            f.writelines(f"\n{word}的例子\n")
        for i in table[0:6]:
            text = i.text
            new = re.sub(r"\s+", " ", text)
            # Write to examples.txt, the same file as the header and the one opened at the end
            with open("examples.txt", 'a+', encoding='utf-8') as f:
                f.write(new)
                # f.writelines(re.sub(r"(\.\s)(\d+\.)", r"\1\n\2", new))
    # Close the browser and end the ChromeDriver session
    driver.quit()

if __name__ == "__main__":
    lst = get_wordlist()
    main(lst)
    # Open the result file with its default application (Windows only)
    os.startfile("examples.txt")
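As for why requests alone falls short here, one likely reason is visible in the URL itself: the search word sits after the # sign, in the URL fragment, and HTTP clients do not send the fragment to the server. The page then appears to fill in its results with JavaScript in the browser, which is an inference from the behaviour described above and not something the site documents here. The sketch below uses only the standard library to show which part of the URL a plain HTTP request would cover; the function name split_for_request is made up for illustration.

```python
from urllib.parse import urlsplit


def split_for_request(url):
    """Return (URL an HTTP client actually requests, fragment kept in the browser)."""
    parts = urlsplit(url)
    # Everything after '#' is the fragment; it is not transmitted in the request.
    sent = parts._replace(fragment="").geturl()
    return sent, parts.fragment


url = "https://skell.sketchengine.eu/#result?lang=en&query=happy&f=concordance"
sent, kept_client_side = split_for_request(url)
print(sent)              # the part a plain HTTP request covers
print(kept_client_side)  # the part only code running in the browser can see
```

Under that reading, every word in the list maps to the same server-side request, and only a real browser, such as the one selenium drives, gets to act on the part after the # sign.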