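ASP regex for collecting all image addresses on a web page into an array
Updated: 2008-03-03 19:30:06
Common ASP function: getIMG() collects every image address on a web page and saves them in an array.
It still has bugs. The latest test page is at: http://www.reallydo.com/getimg.asp
The regex analysis page is at: http://jorkin.reallydo.com/article.asp?id=380
If you find a bug, please leave a comment below. Thanks.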
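1.31 fixes:
A space after src= prevented a correct match. Fixed.
An error occurred when src was empty (src=''). Fixed.
Known bug: when an image path contains several consecutive spaces, only one is kept. Not fixed.
2.18 fixes:
The bug where only one of several consecutive spaces in an image path was kept. Fixed.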
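The three changelog cases can be checked against the final matching pattern on its own. The sketch below is a Python port for illustration, not the author's code: `extract` and `collapse_after_equals` are hypothetical names, and `collapse_after_equals` assumes that ReplaceAll repeats the replacement until no "= " remains. The non-empty class `[^"']+?` is what makes `src=''` simply yield nothing, and nothing in this sketch collapses runs of spaces, so a path with several consecutive spaces comes back unchanged.

```python
import re

# Final getIMG matching pattern, ported to Python (VBScript's "" becomes ").
QUOTED_SRC = re.compile(r"""<img.*?\ssrc=(["'])([^"']+?)\1.*?>""", re.IGNORECASE)


def collapse_after_equals(s):
    # Assumed behaviour of ReplaceAll(s, "= ", "="): repeat until no "= " is left.
    while "= " in s:
        s = s.replace("= ", "=")
    return s


def extract(tag):
    # Hypothetical helper: capture group 2 holds the address between the quotes.
    return [m.group(2) for m in QUOTED_SRC.finditer(collapse_after_equals(tag))]


print(extract('<img src=   "a.jpg">'))      # space after src=      -> ['a.jpg']
print(extract("<img src=''>"))              # empty src             -> []
print(extract('<img src="my   pic.jpg">'))  # spaces inside the path -> ['my   pic.jpg']
```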
<%
'Purpose: collect every image address (img src) into an array.
'Source: http://jorkin.reallydo.com/article.asp?id=448
'Requires the ReplaceAll function: http://jorkin.reallydo.com/article.asp?id=406
Function getIMG(sString)
Dim sReallyDo, regEx, iReallyDo
Dim oMatches, cMatch
'//Define an empty array (UBound = -1)
iReallyDo = -1
ReDim aReallyDo(iReallyDo)
If IsNull(sString) Then
getIMG = aReallyDo  '//Return the empty array so callers can always use UBound()
Exit Function
End If
'//Normalize the HTML code
'//Put each <img on its own line to make the regex replacements easier
sReallyDo = sString
On Error Resume Next
sReallyDo = Replace(sReallyDo, vbCr, " ")
sReallyDo = Replace(sReallyDo, vbLf, " ")
sReallyDo = Replace(sReallyDo, vbTab, " ")
sReallyDo = Replace(sReallyDo, "<img ", vbCrLf & "<img ", 1, -1, 1)
sReallyDo = Replace(sReallyDo, "/>", " />", 1, -1, 1)
sReallyDo = ReplaceAll(sReallyDo, "= ", "=", True)
sReallyDo = ReplaceAll(sReallyDo, "> ", ">", True)
sReallyDo = Replace(sReallyDo, "><", ">" & vbCrLf & "<")
sReallyDo = Trim(sReallyDo)
On Error GoTo 0
Set regEx = New RegExp
regEx.IgnoreCase = True
regEx.Global = True
'//Remove onclick, onload and other on* event-handler attributes (quoted values)
regEx.Pattern = "\son\w+\s*=\s*([""'])(.*?)\1"
sReallyDo = regEx.Replace(sReallyDo, "")
'//Add quotes around SRC values that have none
regEx.Pattern = "<img.*?\ssrc=([^\""\'\s][^\""\'\s>]*).*?>"
sReallyDo = regEx.Replace(sReallyDo, "<img src=""$1"" />")
'//Match the quoted SRC address of each image
regEx.Pattern = "<img.*?\ssrc=([\""\'])([^\""\']+?)\1.*?>"
Set oMatches = regEx.Execute(sReallyDo)
'//Store each image address in the array
For Each cMatch in oMatches
iReallyDo = iReallyDo + 1
ReDim Preserve aReallyDo(iReallyDo)
aReallyDo(iReallyDo) = cMatch.SubMatches(1)  '//SubMatches(1) is the second capture group: the address
Next
getIMG = aReallyDo
End Function
%>
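To try the same steps outside IIS, the pipeline can be ported to Python. This is a sketch under assumptions, not the author's code: `get_img` and `replace_all` are hypothetical names, `replace_all` assumes the external ReplaceAll helper repeats the replacement until nothing is left to replace, and the event-handler step matches an `on` prefix followed by word characters (`\son\w+`). A character class such as `[on]` matches only a single "o" or "n", so with it the text ` of me" src="a.jpg"` inside `alt="photo of me" src="a.jpg"` would be deleted and the image lost.

```python
import re

# Patterns ported from getIMG; VBScript's doubled quotes ("") become plain quotes.
ON_ATTR = re.compile(r"""\son\w+\s*=\s*(["'])(.*?)\1""", re.IGNORECASE)
UNQUOTED_SRC = re.compile(r"""<img.*?\ssrc=([^"'\s][^"'\s>]*).*?>""", re.IGNORECASE)
QUOTED_SRC = re.compile(r"""<img.*?\ssrc=(["'])([^"']+?)\1.*?>""", re.IGNORECASE)


def replace_all(s, find, repl):
    # Hypothetical stand-in for ReplaceAll: assumed to keep replacing until
    # `find` no longer occurs (so "=   " becomes "=").
    while find in s:
        s = s.replace(find, repl)
    return s


def get_img(html):
    """Return every quoted, non-empty <img src> value, in document order."""
    if html is None:
        return []
    # Flatten CR/LF/TAB to spaces, then start each <img tag on its own line.
    s = html.replace("\r", " ").replace("\n", " ").replace("\t", " ")
    s = re.sub(r"<img ", "\r\n<img ", s, flags=re.IGNORECASE)
    s = s.replace("/>", " />")
    s = replace_all(s, "= ", "=")
    s = replace_all(s, "> ", ">")
    s = s.replace("><", ">\r\n<")
    s = s.strip(" ")  # VBScript Trim() removes spaces only
    # Drop quoted on* event-handler attributes (onclick, onload, ...).
    s = ON_ATTR.sub("", s)
    # Wrap unquoted src values in double quotes.
    s = UNQUOTED_SRC.sub(r'<img src="\1" />', s)
    # Collect capture group 2 (the address) of every quoted src.
    return [m.group(2) for m in QUOTED_SRC.finditer(s)]


if __name__ == "__main__":
    page = ('<p><IMG alt="photo of me" src="a.jpg" onclick="x()"></p>'
            '<img src= b.png/><img src=\'\'><img src="my  pic.gif">')
    print(get_img(page))  # ['a.jpg', 'b.png', 'my  pic.gif']
```

Putting each `<img` on its own line matters because `.` does not match a newline, so the lazy `.*?` in each pattern cannot run on into the next tag.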