Python crawler: source code for saving Tophub (今日熱榜) trending data to txt files

Updated: 2021-02-23 10:27:08  Author: 一個超會寫Bug的安太狼

This article presents commented Python source code that scrapes the trending lists on Tophub and saves them to .txt files. It may be a useful reference for study or work.

Tophub (今日熱榜): https://tophub.today/


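Data scraped and output format: for each board on a category page, the script records the platform name, the list type, and the last-update time, followed by one record per entry with its rank, title, popularity, and link.

After scraping, the results are saved as .txt files, one per category (news.txt, tech.txt, ent.txt, community.txt, shopping.txt, finance.txt).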
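To make the saved layout concrete, here is a minimal sketch of how one board could be rendered as text before it is written to disk. The function name `render_board`, the English field labels, and the sample values are assumptions used for illustration only; they are not taken from the script or from real scraped data.

```python
def render_board(platform, list_type, updated, entries):
    """Return the text for one board: a header line plus one record per entry.

    `entries` is a list of (rank, title, popularity, link) tuples.
    """
    lines = [
        "Platform: {}  List type: {}  Last updated: {}".format(platform, list_type, updated),
        "------------",
    ]
    for rank, title, hot, link in entries:
        lines.append("  Rank: {}".format(rank))
        lines.append("  Title: {}".format(title))
        lines.append("  Popularity: {}".format(hot))
        lines.append("  Link: {}".format(link))
        lines.append("  ------------")
    return "\n".join(lines) + "\n"


# Hypothetical sample values, not real scraped data
sample = render_board(
    "ExampleSite", "Hot list", "5 minutes ago",
    [("1", "Example title", "12345", "https://example.com/a")],
)
print(sample)
```

Building the whole block as a string first keeps formatting separate from file handling, which makes the layout easy to check without touching the network.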

Source code with comments:

import requests
from bs4 import BeautifulSoup

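def download_page(url):
  # Send a browser-like User-Agent so the request looks like a normal page visit
  headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36"}
  try:
    r = requests.get(url, timeout=30, headers=headers)
    r.raise_for_status()  # treat 4xx/5xx responses as failures
    return r.text
  except requests.RequestException as e:
    # Return an empty page rather than an error message that would be parsed as HTML
    print("failed to download {}: {}".format(url, e))
    return ""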


def get_content(html,tag):
  # Text templates: one record per entry, one header per board
  output = """  Rank: {}\n  Title: {} \n  Popularity: {}\n  Link: {}\n  ------------\n"""
  output2 = """Platform: {}  List type: {}  Last updated: {}\n------------\n"""
  soup = BeautifulSoup(html, 'html.parser')
  con = soup.find('div', attrs={'class': 'bc-cc'})
  if con is None:
    # Download failed or the page layout changed; nothing to parse
    print("no board container found for {}".format(tag))
    return
  for board in con.find_all('div', class_='cc-cd'):
    platform = board.find('div', class_='cc-cd-lb').get_text()  # platform name
    updated = board.find('div', class_='i-h').get_text()  # last-update time
    links = board.find('div', class_='cc-cd-cb-l').find_all('a')  # all entry links
    list_type = board.find('span', class_='cc-cd-sb-st').get_text()  # list type
    save_txt(tag, output2.format(platform, list_type, updated))
    # Write each entry right away so entries from earlier boards are not repeated
    for a in links:
      rank = a.find('span', class_='s').get_text()
      title = a.find('span', class_='t').get_text()
      hot = a.find('span', class_='e').get_text()
      save_txt(tag, output.format(rank, title, hot, a['href']))


def save_txt(tag,*args):
  # Append every text block to <tag>.txt, opening the file only once per call
  with open(tag+'.txt', 'a', encoding='utf-8') as f:
    for text in args:
      f.write(text)


def main():
  # Categories: general, tech, entertainment, community, shopping, finance
  page=['news','tech','ent','community','shopping','finance']
  for tag in page:
    url = 'https://tophub.today/c/{}'.format(tag)
    html = download_page(url)
    get_content(html,tag)

if __name__ == '__main__':
  main()
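Because `save_txt` opens the files in append mode, running the script a second time adds another copy of every board to the same file. One possible way around this, sketched below under the assumption that a fresh snapshot per run is wanted, is to collect the text for a category first and then overwrite the file in a single write. `write_snapshot` and its `out_dir` parameter are hypothetical names, not part of the original script.

```python
import tempfile
from pathlib import Path


def write_snapshot(tag, blocks, out_dir="."):
    """Overwrite <out_dir>/<tag>.txt with the given text blocks and return the path."""
    path = Path(out_dir) / (tag + ".txt")
    # Mode 'w' truncates the file, so a rerun replaces the previous snapshot
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(blocks)
    return path


# Usage with placeholder text instead of real scraped data
with tempfile.TemporaryDirectory() as tmp:
    write_snapshot("tech", ["first run\n"], out_dir=tmp)
    path = write_snapshot("tech", ["second run\n"], out_dir=tmp)
    content = path.read_text(encoding="utf-8")

print(content)
```

If a history of runs is wanted instead, appending as the script does is the simpler choice; including a date in the file name would be another option.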

