A Python code example: simulating a Taobao login and totaling your Taobao spending
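The figure on Alipay's ten-year bill is a little scary, but it lumps together too many categories. I only wanted to know how much I had spent on Taobao itself, so I wrote a script that totals Taobao order spending for any date range. Judging by the result, I am actually fairly frugal on Taobao.
The script simulates a browser login and then parses the "Purchased Items" (已买到的宝贝) page to collect the selected orders and the items in them.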
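The "simulated browser login" idea can be sketched with nothing but the standard library. The version below is a minimal Python 3 illustration, not the script's own code (the script itself uses the Python 2 modules urllib2 and cookielib). The names build_session and build_request, the example.com URL, and the short header set are assumptions made for this sketch, and nothing is sent over the network.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Hypothetical header set for illustration; the script defines its own HEADERS.
HEADERS = {'User-Agent': 'Mozilla/5.0', 'Accept-Language': 'zh-cn'}


def build_session(cookie_file=None):
    """Return (opener, jar). The opener stores and resends cookies like a browser."""
    jar = http.cookiejar.LWPCookieJar(cookie_file)
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = list(HEADERS.items())
    return opener, jar


def build_request(url, data=None, method='POST'):
    """Put form data in the query string for GET, or in the body for POST."""
    encoded = urllib.parse.urlencode(data) if data else None
    if method == 'GET':
        if encoded:
            url = '{}?{}'.format(url, encoded)
        return urllib.request.Request(url, method='GET')
    body = encoded.encode('ascii') if encoded else None
    return urllib.request.Request(url, data=body, method='POST')


opener, jar = build_session()
req = build_request('https://example.com/login', {'user': 'demo'}, method='GET')
print(req.full_url)      # https://example.com/login?user=demo
print(req.get_method())  # GET
```

Calling opener.open(req) would send the request and keep any cookies the server sets in jar, which is what lets later page requests count as logged in.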
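For usage, read the code or run the command with -h. The script is written for Python 2 (it relies on urllib2 and raw_input) and needs BeautifulSoup4, which can be downloaded from the official BeautifulSoup page: https://www.crummy.com/software/BeautifulSoup/bs4/download/
First, here is how to run it: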
python taobao.py -u USERNAME -p PASSWORD -s START-DATE -e END-DATE --verbose
All arguments are optional. For example:
python taobao.py -u jinnlynn
summarizes every order of the user jinnlynn.
python taobao.py -s 2014-12-12 -e 2014-12-12
summarizes the orders placed on 2014-12-12 (the script prompts for the user name when it runs).
python taobao.py --verbose
prints the summary followed by the details of each order.
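The -s and -e options select orders by calendar day, which is why the same date can be used for both to mean "that whole day". The following Python 3 sketch shows that behaviour in isolation; parse_day and in_range are hypothetical helper names introduced here, and the script itself performs the same comparison inline with toordinal().

```python
import argparse
from datetime import datetime


def parse_day(text):
    """Convert 'YYYY-MM-DD' to a datetime, or report a usage error."""
    try:
        return datetime.strptime(text, '%Y-%m-%d')
    except ValueError as exc:
        raise argparse.ArgumentTypeError(str(exc))


def in_range(when, start=None, end=None):
    """Compare by calendar day, so an end date covers that entire day."""
    day = when.toordinal()
    if start and day < start.toordinal():
        return False
    if end and day > end.toordinal():
        return False
    return True


parser = argparse.ArgumentParser()
parser.add_argument('-s', '--start', type=parse_day)
parser.add_argument('-e', '--end', type=parse_day)
args = parser.parse_args(['-s', '2014-12-12', '-e', '2014-12-12'])

print(in_range(datetime(2014, 12, 12, 23, 59), args.start, args.end))  # True
print(in_range(datetime(2014, 12, 13, 0, 1), args.start, args.end))    # False
```

Comparing ordinals rather than full datetimes avoids the common mistake of treating an end date as midnight at the start of that day.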
With that out of the way, here is the code:
from __future__ import unicode_literals, print_function, absolute_import, division

import urllib
import urllib2
import urlparse
import cookielib
import re
import sys
import os
import json
import subprocess
import argparse
import platform
from getpass import getpass
from datetime import datetime
from pprint import pprint

try:
    from bs4 import BeautifulSoup
except ImportError:
    sys.exit('BeautifulSoup4 missing.')

__version__ = '1.0.0'
__author__ = 'JinnLynn'
__copyright__ = 'Copyright (c) 2014 JinnLynn'
__license__ = 'The MIT License'

HEADERS = {
    'X-Requested-With': 'XMLHttpRequest',
    'Accept-Language': 'zh-cn',
    'Accept-Encoding': 'gzip, deflate',
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'Cache-Control': 'no-cache',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.38 Safari/537.36',
    'Connection': 'Keep-Alive'
}

DEFAULT_POST_DATA = {
    'TPL_username': '',  # user name
    'TPL_password': '',  # password
    'TPL_checkcode': '',
    'need_check_code': 'false',
    'callback': '0',  # any value makes the server answer with JSON
}

# Order states that do not count as purchases
INVALID_ORDER_STATES = [
    'CREATE_CLOSED_OF_TAOBAO',  # cancelled
    'TRADE_CLOSED',  # order closed
]

LOGIN_URL = 'https://login.taobao.com/member/login.jhtml'

RAW_IMPUT_ENCODING = 'gbk' if platform.system() == 'Windows' else 'utf-8'


def _request(url, data, method='POST'):
    if data:
        data = urllib.urlencode(data)
    if method == 'GET':
        # GET carries the form data in the query string, not the body
        if data:
            url = '{}?{}'.format(url, data)
        data = None
    # print(url)
    # print(data)
    req = urllib2.Request(url, data, HEADERS)
    return urllib2.urlopen(req)


def stdout_cr(msg=''):
    # Overwrite the current terminal line (used for the progress display)
    sys.stdout.write('\r{:10}'.format(' '))
    sys.stdout.write('\r{}'.format(msg))
    sys.stdout.flush()


def get(url, data=None):
    return _request(url, data, method='GET')


def post(url, data=None):
    return _request(url, data, method='POST')


def login_post(data):
    # Copy the defaults so the module-level dict is never modified
    login_data = dict(DEFAULT_POST_DATA)
    login_data.update(data)
    res = post(LOGIN_URL, login_data)
    return json.load(res, encoding='gbk')


def login(usr, pwd):
    data = {
        'TPL_username': usr.encode('utf-8' if platform.system() == 'Windows' else 'GB18030'),
        'TPL_password': pwd
    }

    # 1. Try to log in, asking for a CAPTCHA whenever the server requires one
    ret = login_post(data)
    while not ret.get('state', False):
        code = ret.get('data', {}).get('code', 0)
        if code == 3425 or code == 1000:
            print('INFO: {}'.format(ret.get('message')))
            check_code = checkcode(ret.get('data', {}).get('ccurl'))
            data.update({'TPL_checkcode': check_code, 'need_check_code': 'true'})
            ret = login_post(data)
        else:
            sys.exit('ERROR. code: {}, message:{}'.format(code, ret.get('message', '')))

    token = ret.get('data', {}).get('token')
    print('LOGIN SUCCESS. token: {}'.format(token))

    # 2. Redirect
    # 2.1 Fetch the st value
    res = get('https://passport.alipay.com/mini_apply_st.js', {
        'site': '0',
        'token': token,
        'callback': 'stCallback4'})
    content = res.read()
    st = re.search(r'"st":"(\S*)"( |})', content).group(1)

    # 2.2 Follow the redirect
    get('http://login.taobao.com/member/vst.htm',
        {'st': st, 'TPL_uesrname': usr.encode('GB18030')})


def checkcode(url):
    filename, _ = urllib.urlretrieve(url)
    if not filename.endswith('.jpg'):
        old_fn = filename
        filename = '{}.jpg'.format(filename)
        os.rename(old_fn, filename)
    if platform.system() == 'Darwin':
        # macOS: open the image directly in Preview
        subprocess.call(['open', filename])
    elif platform.system() == 'Windows':
        # Windows: open the file with its default program
        subprocess.call(filename, shell=True)
    else:
        # Other systems: print the file name
        print('Open this file to read the CAPTCHA: {}'.format(filename))
    return raw_input('Enter the CAPTCHA: '.encode(RAW_IMPUT_ENCODING))


def parse_bought_list(start_date=None, end_date=None):
    url = 'http://buyer.trade.taobao.com/trade/itemlist/list_bought_items.htm'
    # Freight insurance, value-added services, staged payment (deposit, balance)
    extra_service = ['freight-info', 'service-info', 'stage-item']

    stdout_cr('working... {:.0%}'.format(0))
    # 1. Parse the first page
    res = urllib2.urlopen(url)
    soup = BeautifulSoup(res.read().decode('gbk'))
    # 2. Work out the number of pages
    page_jump = soup.find('span', id='J_JumpTo')
    jump_url = page_jump.attrs['data-url']
    url_parts = urlparse.urlparse(jump_url)
    query_data = dict(urlparse.parse_qsl(url_parts.query))
    total_pages = int(query_data['tPage'])

    # Parse the orders
    orders = []
    cur_page = 1
    out_date = False
    errors = []
    while True:
        bought_items = soup.find_all('tbody', attrs={'data-orderid': True})
        # pprint(len(bought_items))
        count = 0
        for item in bought_items:
            count += 1
            # pprint('{}.{}'.format(cur_page, count))
            # Reset per order so a failed parse never reuses the previous order's items
            info = {}
            baobei = []
            try:
                # Position of the order on the site: page.index
                info['pos'] = '{}.{}'.format(cur_page, count)
                info['orderid'] = item.attrs['data-orderid']
                info['status'] = item.attrs['data-status']

                # Shop
                node = item.select('tr.order-hd a.shopname')
                if not node:
                    # No shop: probably a complimentary lottery order, skip it
                    # print('ignore')
                    continue
                info['shop_name'] = node[0].attrs['title'].strip()
                info['shop_url'] = node[0].attrs['href']

                # Date
                node = item.select('tr.order-hd span.dealtime')[0]
                info['date'] = datetime.strptime(node.attrs['title'], '%Y-%m-%d %H:%M')

                if end_date and info['date'].toordinal() > end_date.toordinal():
                    continue

                if start_date and info['date'].toordinal() < start_date.toordinal():
                    # Orders are listed newest first, so everything after this is too old
                    out_date = True
                    break

                # Items
                node = item.find_all('tr', class_='order-bd')
                # pprint(len(node))
                for n in node:
                    try:
                        bb = {}
                        if any(ex in n.attrs['class'] for ex in extra_service):
                            # Extra services
                            # print('extra service')
                            name_node = n.find('td', class_='baobei')
                            # Item name and URL
                            bb['name'] = name_node.text.strip()
                            bb['url'] = ''
                            bb['spec'] = ''
                            # Item snapshot
                            bb['snapshot'] = ''
                            # Item price
                            bb['price'] = 0.0
                            # Item quantity
                            bb['quantity'] = 1
                            bb['is_goods'] = False
                            try:
                                bb['url'] = name_node.find('a').attrs['href']
                                bb['price'] = float(n.find('td', class_='price').text)
                            except Exception:
                                pass
                        else:
                            name_node = n.select('p.baobei-name a')
                            # Item name and URL
                            bb['name'] = name_node[0].text.strip()
                            bb['url'] = name_node[0].attrs['href']
                            # Item snapshot
                            bb['snapshot'] = ''
                            if len(name_node) > 1:
                                bb['snapshot'] = name_node[1].attrs['href']
                            # Item specification
                            bb['spec'] = n.select('.spec')[0].text.strip()
                            # Item price
                            bb['price'] = float(n.find('td', class_='price').attrs['title'])
                            # Item quantity
                            bb['quantity'] = int(n.find('td', class_='quantity').attrs['title'])
                            bb['is_goods'] = True
                        baobei.append(bb)
                        # Try to read the amount actually paid.
                        # Its td may span several tr rows.
                        amount_node = n.select('td.amount em.real-price')
                        if amount_node:
                            info['amount'] = float(amount_node[0].text)
                    except Exception as e:
                        errors.append({
                            'type': 'baobei',
                            'id': '{}.{}'.format(cur_page, count),
                            'node': '{}'.format(n),
                            'error': '{}'.format(e)
                        })
            except Exception as e:
                errors.append({
                    'type': 'order',
                    'id': '{}.{}'.format(cur_page, count),
                    'node': '{}'.format(item),
                    'error': '{}'.format(e)
                })

            info['baobei'] = baobei
            orders.append(info)

        stdout_cr('working... {:.0%}'.format(cur_page / total_pages))

        # Next page
        cur_page += 1
        if cur_page > total_pages or out_date:
            break
        query_data.update({'pageNum': cur_page})
        page_url = '{}?{}'.format(url, urllib.urlencode(query_data))
        res = urllib2.urlopen(page_url)
        soup = BeautifulSoup(res.read().decode('gbk'))

    stdout_cr()
    if errors:
        print('INFO. Errors occurred; the totals may be inaccurate.')
        # pprint(errors)
    return orders


def output(orders, start_date, end_date):
    amount = 0.0
    org_amount = 0
    baobei_count = 0
    order_count = 0
    invaild_order_count = 0
    for order in orders:
        if order['status'] in INVALID_ORDER_STATES:
            invaild_order_count += 1
            continue
        amount += order['amount']
        order_count += 1
        for baobei in order.get('baobei', []):
            if not baobei['is_goods']:
                continue
            org_amount += baobei['price'] * baobei['quantity']
            baobei_count += baobei['quantity']

    print('{:<21} {}'.format('Total spent:', amount))
    print('{:<21} {}/{}'.format('Orders/items:', order_count, baobei_count))
    if invaild_order_count:
        print('{:<21} {} (returned, cancelled, etc.; not counted above)'.format(
            'Invalid orders:', invaild_order_count))
    print('{:<21} {}'.format('Original item total:', org_amount))
    print('{:<21} {:.2f}'.format(
        'Average item price:', 0 if baobei_count == 0 else org_amount / baobei_count))
    print('{:<21} {} ({:.2%})'.format(
        'Saved (?):', org_amount - amount,
        0 if org_amount == 0 else (org_amount - amount) / org_amount))
    # Without a start date, the period begins at the oldest order (if there is one)
    to_date = end_date if end_date else datetime.now()
    if start_date:
        from_date = start_date
    else:
        from_date = orders[-1]['date'] if orders else to_date
    print('{:<21} {:%Y-%m-%d} - {:%Y-%m-%d}'.format('Period:', from_date, to_date))
    if not start_date and orders:
        print('{:<21} {:%Y-%m-%d %H:%M}'.format('Spending since:', orders[-1]['date']))


def ouput_orders(orders):
    print('All orders:')
    if not orders:
        print(' --')
        return
    for order in orders:
        print(' {:-^20}'.format('-'))
        print(' * Order: {orderid} Paid: {amount} Shop: {shop_name} Time: {date:%Y-%m-%d %H:%M}'.format(**order))
        for bb in order['baobei']:
            if not bb['is_goods']:
                continue
            print('   - {name}'.format(**bb))
            if bb['spec']:
                print('     {spec}'.format(**bb))
            print('     {price} X {quantity}'.format(**bb))


def main():
    parser = argparse.ArgumentParser(
        prog='python {}'.format(__file__)
    )
    parser.add_argument('-u', '--username', help='Taobao user name')
    parser.add_argument('-p', '--password', help='Taobao password')
    parser.add_argument('-s', '--start', help='start date, optional, e.g. 2014-11-11')
    parser.add_argument('-e', '--end', help='end date, optional, e.g. 2014-11-11')
    parser.add_argument('--verbose', action='store_true', default=False,
                        help='print order details')
    parser.add_argument('-v', '--version', action='version',
                        version='v{}'.format(__version__), help='version number')
    args = parser.parse_args()

    usr = args.username
    if not usr:
        usr = raw_input('Taobao user name: '.encode(RAW_IMPUT_ENCODING))
    usr = usr.decode('utf-8')  # handle Chinese input
    pwd = args.password
    if not pwd:
        if platform.system() == 'Windows':
            # A Chinese prompt does not display correctly on Windows
            pwd = getpass()
        else:
            pwd = getpass('Taobao password: '.encode('utf-8'))
    pwd = pwd.decode('utf-8')

    verbose = args.verbose

    start_date = None
    if args.start:
        try:
            start_date = datetime.strptime(args.start, '%Y-%m-%d')
        except Exception as e:
            sys.exit('ERROR. {}'.format(e))

    end_date = None
    if args.end:
        try:
            end_date = datetime.strptime(args.end, '%Y-%m-%d')
        except Exception as e:
            sys.exit('ERROR. {}'.format(e))

    if start_date and end_date and start_date > end_date:
        sys.exit('ERROR, the end date must be on or after the start date')

    # Keep cookies in a per-user file so they survive between runs
    cj_file = './{}.tmp'.format(usr)
    cj = cookielib.LWPCookieJar()
    try:
        cj.load(cj_file)
    except Exception:
        pass
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj), urllib2.HTTPHandler)
    urllib2.install_opener(opener)

    login(usr, pwd)

    try:
        cj.save(cj_file)
    except Exception:
        pass

    orders = parse_bought_list(start_date, end_date)
    output(orders, start_date, end_date)

    # Print order details
    if verbose:
        ouput_orders(orders)


if __name__ == '__main__':
    main()
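To see how the totals are derived without logging in, the aggregation step can be run on hand-made data. The sketch below is Python 3 and mirrors the logic of output() under some assumptions: summarize is a hypothetical name, the sample orders are invented, and 'TRADE_FINISHED' is only a placeholder status that is not in the invalid list.

```python
INVALID_ORDER_STATES = ['CREATE_CLOSED_OF_TAOBAO', 'TRADE_CLOSED']


def summarize(orders):
    """Total the amount paid and the list price of real goods in valid orders."""
    paid = 0.0
    list_total = 0.0
    item_count = 0
    order_count = 0
    invalid = 0
    for order in orders:
        if order['status'] in INVALID_ORDER_STATES:
            invalid += 1
            continue
        paid += order['amount']
        order_count += 1
        for item in order['baobei']:
            if not item['is_goods']:
                continue  # extra services are not counted as goods
            list_total += item['price'] * item['quantity']
            item_count += item['quantity']
    return {
        'paid': paid,
        'list_total': list_total,
        'saved': list_total - paid,
        'orders': order_count,
        'items': item_count,
        'invalid': invalid,
    }


# Invented sample data; the status 'TRADE_FINISHED' is a placeholder.
sample = [
    {'status': 'TRADE_FINISHED', 'amount': 90.0, 'baobei': [
        {'is_goods': True, 'price': 50.0, 'quantity': 2},
        {'is_goods': False, 'price': 1.0, 'quantity': 1},
    ]},
    {'status': 'TRADE_CLOSED', 'amount': 30.0, 'baobei': [
        {'is_goods': True, 'price': 30.0, 'quantity': 1},
    ]},
]

result = summarize(sample)
print(result['paid'], result['list_total'], result['saved'])  # 90.0 100.0 10.0
```

The "saved" figure is simply list price minus amount paid, which is why the script labels it with a question mark: shipping fees and extra services can push the paid amount either way.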