.NET: Scraping All Videos and Their Information from a Youku Playlist (VB.NET Code)
Updated: 2010-02-07 11:50:59
I want to build a video-on-demand site, so I have started looking into video scraping.
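The method extracts the Youku playlist ID, then loops over the playlist one video index at a time, downloads each page's HTML, and pulls the title tag and the VID out of it. Nothing technically sophisticated. =..=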
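The first two steps can be sketched on their own, without any network access. This is a minimal sketch: the regular expression and the URL layout are taken from the listing in this article, while the module name and the helper names ExtractPlaylistId and BuildPageUrl are made up for illustration.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module PlaylistUrlSketch
    ' Hypothetical helper: returns the numeric playlist ID found in a
    ' playlist URL, or Nothing when the URL does not match the pattern.
    Function ExtractPlaylistId(ByVal listUrl As String) As String
        Dim m As Match = Regex.Match(listUrl, "playlist_show/id_(\d+).*\.html")
        If m.Success Then Return m.Groups(1).Value
        Return Nothing
    End Function

    ' Hypothetical helper: builds the URL of one video page in the playlist.
    ' f = playlist ID, o = sort order, p = index of the video.
    Function BuildPageUrl(ByVal playlistId As String, ByVal index As Integer, ByVal order As String) As String
        Return String.Format("http://v.youku.com/v_playlist/f{0}o{1}p{2}.html", playlistId, order, index)
    End Function

    Sub Main()
        Dim id As String = ExtractPlaylistId("http://www.youku.com/playlist_show/id_2350764.html")
        Console.WriteLine(id)                       ' prints 2350764
        Console.WriteLine(BuildPageUrl(id, 0, "0")) ' prints http://v.youku.com/v_playlist/f2350764o0p0.html
    End Sub
End Module
```

The scraper then increases the index by one per request and stops when a page no longer yields a new VID.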
The scraping uses the HttpWebRequest and HttpWebResponse classes from .NET; the page source is parsed with regular expressions.
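To show what the regular expressions pick up, here is a small sketch that runs the two ID patterns from the listing against an in-memory string. The page fragment and both ID values are invented for the example, and ExtractFirstGroup is a hypothetical helper name; a real Youku page of the time may have looked different.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module VideoIdSketch
    ' Hypothetical helper: returns capture group 1 of the first match,
    ' or Nothing when the pattern does not match at all.
    Function ExtractFirstGroup(ByVal html As String, ByVal pattern As String) As String
        Dim m As Match = Regex.Match(html, pattern)
        If m.Success Then Return m.Groups(1).Value
        Return Nothing
    End Function

    Sub Main()
        ' Invented page fragment; only the variable names come from the listing.
        Dim html As String = "<script>var videoId = ""12345""; var videoId2 = ""XMTIzNDU2==""; </script>"

        ' Numeric ID: digits only.
        Dim vid1 As String = ExtractFirstGroup(html, "var\s+videoId\s*=\s*""(\d+)""\s*;")
        ' Second ID: word characters plus "=".
        Dim vid2 As String = ExtractFirstGroup(html, "var\s+videoId2\s*=\s*""((\w|=)+)""\s*;")

        Console.WriteLine(vid1)
        Console.WriteLine(vid2)
    End Sub
End Module
```

The first pattern does not match the videoId2 assignment, because after the literal videoId it expects optional whitespace and then "=", not the digit 2.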
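The code is not very efficient: handling one page takes between 0.5 and 2 seconds, so it is not suitable for large-scale scraping. Perhaps converting it to JavaScript would make it a bit faster.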
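The 0.5 to 2 second figure covers the download and the regex matching together. One rough way to see how much of it is regex work is to time the matching alone on a string that is already in memory. The sketch below assumes an invented page body (padding plus one assignment) and a hypothetical helper named TimeRegexParsing; it measures nothing about the network.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.Text.RegularExpressions

Module TimingSketch
    ' Hypothetical helper: average time in milliseconds for one videoId
    ' match against an in-memory string.
    Function TimeRegexParsing(ByVal html As String, ByVal repetitions As Integer) As Double
        Dim sw As Stopwatch = Stopwatch.StartNew()
        For i As Integer = 1 To repetitions
            Regex.Match(html, "var\s+videoId\s*=\s*""(\d+)""\s*;")
        Next
        sw.Stop()
        Return sw.Elapsed.TotalMilliseconds / repetitions
    End Function

    Sub Main()
        ' Invented stand-in for a downloaded page: 50,000 spaces, then the assignment.
        Dim padding As New String(" "c, 50000)
        Dim html As String = padding & "var videoId = ""12345"";"
        Console.WriteLine("{0:F4} ms per match", TimeRegexParsing(html, 100))
    End Sub
End Module
```

If the per-match time comes out far below the 0.5 second range, the delay is mostly in the download, and porting the matching logic to another language would probably change little.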
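That is as far as I have taken it for now; I am posting the code as-is to share it.
The code is VB.NET. Create a new form named frmMain, add one TextBox, one ListBox, and two Buttons, then paste in the following code: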
Imports System.Net
Imports System.IO
Imports System.Text
Imports System.Text.RegularExpressions

Public Class frmMain

    ' One scraped video: its index in the playlist, its title, and both IDs.
    Structure VList
        Dim id As Integer
        Dim title As String
        Dim vid1 As String
        Dim vid2 As String

        ' Overrides (not Overloads), so that code calling Object.ToString() gets this format.
        Public Overrides Function ToString() As String
            Return String.Format("{0}:<{1}> [{2}]", id, title, vid1)
        End Function
    End Structure

    Dim myList As New List(Of VList)
    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        ' Declare the variables once, outside the loop
        Dim wr1 As HttpWebRequest
        Dim wr2 As HttpWebResponse
        Dim ret As String
        Dim reg As Match
        Dim g As Group
        Dim preVid As String = "" ' previous VID
        Dim nowid As Integer = 0 ' index of the current video in the playlist
        Dim listUrl As String = TextBox1.Text ' playlist URL, e.g. http://www.youku.com/playlist_show/id_2350764.html
        Dim tarUrl As String = "http://v.youku.com/v_playlist/f{0}" ' {0} = playlist ID

        reg = Regex.Match(listUrl, "playlist_show/id_(\d+).*\.html")
        If Not reg.Success Then
            MsgBox("Failed to extract the playlist ID!")
            Exit Sub
        End If
        g = reg.Groups(1)
        tarUrl = String.Format(tarUrl, g.Value) & "o{1}p{0}.html" ' {0} = video index, {1} = sort order

        ' Download the playlist page and read the playlist name from its <title>.
        ' The Chinese literals in the patterns must match the page text exactly.
        wr1 = CType(WebRequest.Create(listUrl), HttpWebRequest)
        wr2 = CType(wr1.GetResponse(), HttpWebResponse)
        ret = New StreamReader(wr2.GetResponseStream(), Encoding.GetEncoding(wr2.CharacterSet)).ReadToEnd()
        wr2.Close() ' release this response before the loop reuses wr2
        reg = Regex.Match(ret, "<title>(.+) - 专辑 - 优酷视频</title>")
        If Not reg.Success Then
            MsgBox("Failed to extract the playlist name!")
        Else
            g = reg.Groups(1)
            MsgBox("Playlist name: """ & g.Value & """")
        End If

        Do
            ' Read the page text from the web stream
            wr1 = CType(WebRequest.Create(String.Format(tarUrl, nowid, "0")), HttpWebRequest) ' look up videos in reverse order
            wr2 = CType(wr1.GetResponse(), HttpWebResponse)
            ret = New StreamReader(wr2.GetResponseStream(), Encoding.GetEncoding(wr2.CharacterSet)).ReadToEnd()
            'TextBox2.Text = ret

            ' Create a temporary entry for this video
            Dim nlist As New VList
            nlist.id = nowid ' index in the playlist

            ' Get videoId
            reg = Regex.Match(ret, "var\s+videoId\s*=\s*""(\d+)""\s*;")
            If Not reg.Success Then Exit Do
            g = reg.Groups(1)
            ' If the VID equals the previous VID, stop
            If g.Value = preVid Then Exit Do
            nlist.vid1 = g.Value

            ' Get videoId2 (earlier pattern, without "=": "var\s+videoId2\s*=\s*""(\w+)""\s*;")
            reg = Regex.Match(ret, "var\s+videoId2\s*=\s*""((\w|=)+)""\s*;")
            If Not reg.Success Then Exit Do
            g = reg.Groups(1)
            nlist.vid2 = g.Value

            ' Get the title
            reg = Regex.Match(ret, "<title>(.+) - (.+) - 视频 - 优酷视频 - 在线观看 - </title>")
            If Not reg.Success Then
                nlist.title = "{title not found}"
            Else
                g = reg.Groups(2)
                nlist.title = g.Value
            End If

            ' Wrap up this iteration
            myList.Add(nlist) ' add to the overall list
            preVid = nlist.vid1 ' remember the last VID
            wr2.Close()
            Me.Text = nowid & " : done"
            nowid += 1
        Loop
        wr2.Close() ' closes the response left open when the loop exits early
        MsgBox(nowid & " videos scraped; all done!")
        Button2_Click(sender, e)
    End Sub

    ' Shows the collected entries in the ListBox, then empties the list.
    Private Sub Button2_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button2.Click
        ListBox1.Items.Clear()
        For Each ls As VList In myList
            ListBox1.Items.Add(String.Format("{0}:<{1}> [{2}]", ls.id, ls.title, ls.vid1))
        Next
        myList.Clear()
    End Sub
End Class
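The download step in the listing passes the response's CharacterSet straight to Encoding.GetEncoding, which throws when the server sends no charset or one the runtime does not know. A more defensive variant might look like the sketch below. The helper names ResolveEncoding and FetchPage are hypothetical, and falling back to UTF-8 is an assumption, not something the original code does.

```vbnet
Imports System
Imports System.IO
Imports System.Net
Imports System.Text

Module FetchSketch
    ' Hypothetical helper: maps a charset name from the HTTP response to an
    ' Encoding. Falls back to UTF-8 (an assumption) when the name is missing
    ' or not recognised by the runtime.
    Function ResolveEncoding(ByVal charSet As String) As Encoding
        If String.IsNullOrEmpty(charSet) Then Return Encoding.UTF8
        Try
            Return Encoding.GetEncoding(charSet)
        Catch ex As ArgumentException
            Return Encoding.UTF8
        End Try
    End Function

    ' Hypothetical helper: downloads one page as text. The Using blocks
    ' dispose the response and the reader even when an exception is thrown.
    Function FetchPage(ByVal url As String) As String
        Dim request As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)
        Using response As HttpWebResponse = CType(request.GetResponse(), HttpWebResponse)
            Using reader As New StreamReader(response.GetResponseStream(), ResolveEncoding(response.CharacterSet))
                Return reader.ReadToEnd()
            End Using
        End Using
    End Function

    Sub Main()
        ' Only the encoding fallback is exercised here; FetchPage needs a reachable URL.
        Console.WriteLine(ResolveEncoding("").WebName)
        Console.WriteLine(ResolveEncoding("utf-8").WebName)
        Console.WriteLine(ResolveEncoding("no-such-charset").WebName)
    End Sub
End Module
```

With a helper like FetchPage, the button handler would no longer need its own wr2.Close() calls, because each response is disposed as soon as its text has been read.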
Original work by 夜聞香
Blog: http://clso.cnblogs.com
Homepage: http://cleclso.cn
QQ: 315514678 E-mail: clso#qq.com
Technical discussion is welcome!