Python聚類算法之DBSACN實例分析
本文實例講述了Python聚類算法之DBSACN。分享給大家供大家參考,具體如下:
DBSCAN:是一種簡單的,基于密度的聚類算法。本次實現(xiàn)中,DBSCAN使用了基于中心的方法。在基于中心的方法中,每個數(shù)據(jù)點的密度通過對以該點為中心以邊長為2*EPs的網(wǎng)格(鄰域)內(nèi)的其他數(shù)據(jù)點的個數(shù)來度量。根據(jù)數(shù)據(jù)點的密度分為三類點:
核心點:該點在鄰域內(nèi)的密度超過給定的閥值MinPs。
邊界點:該點不是核心點,但是其鄰域內(nèi)包含至少一個核心點。
噪音點:不是核心點,也不是邊界點。
有了以上對數(shù)據(jù)點的劃分,聚合可以這樣進行:各個核心點與其鄰域內(nèi)的所有核心點放在同一個簇中,把邊界點跟其鄰域內(nèi)的某個核心點放在同一個簇中。
# scoding=utf-8 import pylab as pl from collections import defaultdict,Counter points = [[int(eachpoint.split("#")[0]), int(eachpoint.split("#")[1])] for eachpoint in open("points","r")] # 計算每個數(shù)據(jù)點相鄰的數(shù)據(jù)點,鄰域定義為以該點為中心以邊長為2*EPs的網(wǎng)格 Eps = 10 surroundPoints = defaultdict(list) for idx1,point1 in enumerate(points): for idx2,point2 in enumerate(points): if (idx1 < idx2): if(abs(point1[0]-point2[0])<=Eps and abs(point1[1]-point2[1])<=Eps): surroundPoints[idx1].append(idx2) surroundPoints[idx2].append(idx1) # 定義鄰域內(nèi)相鄰的數(shù)據(jù)點的個數(shù)大于4的為核心點 MinPts = 5 corePointIdx = [pointIdx for pointIdx,surPointIdxs in surroundPoints.iteritems() if len(surPointIdxs)>=MinPts] # 鄰域內(nèi)包含某個核心點的非核心點,定義為邊界點 borderPointIdx = [] for pointIdx,surPointIdxs in surroundPoints.iteritems(): if (pointIdx not in corePointIdx): for onesurPointIdx in surPointIdxs: if onesurPointIdx in corePointIdx: borderPointIdx.append(pointIdx) break # 噪音點既不是邊界點也不是核心點 noisePointIdx = [pointIdx for pointIdx in range(len(points)) if pointIdx not in corePointIdx and pointIdx not in borderPointIdx] corePoint = [points[pointIdx] for pointIdx in corePointIdx] borderPoint = [points[pointIdx] for pointIdx in borderPointIdx] noisePoint = [points[pointIdx] for pointIdx in noisePointIdx] # pl.plot([eachpoint[0] for eachpoint in corePoint], [eachpoint[1] for eachpoint in corePoint], 'or') # pl.plot([eachpoint[0] for eachpoint in borderPoint], [eachpoint[1] for eachpoint in borderPoint], 'oy') # pl.plot([eachpoint[0] for eachpoint in noisePoint], [eachpoint[1] for eachpoint in noisePoint], 'ok') groups = [idx for idx in range(len(points))] # 各個核心點與其鄰域內(nèi)的所有核心點放在同一個簇中 for pointidx,surroundIdxs in surroundPoints.iteritems(): for oneSurroundIdx in surroundIdxs: if (pointidx in corePointIdx and oneSurroundIdx in corePointIdx and pointidx < oneSurroundIdx): for idx in range(len(groups)): if groups[idx] == groups[oneSurroundIdx]: groups[idx] = groups[pointidx] # 邊界點跟其鄰域內(nèi)的某個核心點放在同一個簇中 for pointidx,surroundIdxs in surroundPoints.iteritems(): for oneSurroundIdx in surroundIdxs: if (pointidx in borderPointIdx and oneSurroundIdx in corePointIdx): groups[pointidx] = groups[oneSurroundIdx] break # 取簇規(guī)模最大的5個簇 wantGroupNum = 3 finalGroup = Counter(groups).most_common(3) finalGroup = [onecount[0] for onecount in finalGroup] group1 = [points[idx] for idx in xrange(len(points)) if groups[idx]==finalGroup[0]] group2 = [points[idx] for idx in xrange(len(points)) if groups[idx]==finalGroup[1]] group3 = [points[idx] for idx in xrange(len(points)) if groups[idx]==finalGroup[2]] pl.plot([eachpoint[0] for eachpoint in group1], [eachpoint[1] for eachpoint in group1], 'or') pl.plot([eachpoint[0] for eachpoint in group2], [eachpoint[1] for eachpoint in group2], 'oy') pl.plot([eachpoint[0] for eachpoint in group3], [eachpoint[1] for eachpoint in group3], 'og') # 打印噪音點,黑色 pl.plot([eachpoint[0] for eachpoint in noisePoint], [eachpoint[1] for eachpoint in noisePoint], 'ok') pl.show()
運行效果截圖如下:
希望本文所述對大家Python程序設計有所幫助。
相關文章
Windows10下Tensorflow2.0 安裝及環(huán)境配置教程(圖文)
這篇文章主要介紹了Windows10下Tensorflow2.0 安裝及環(huán)境配置教程(圖文),文中通過示例代碼介紹的非常詳細,對大家的學習或者工作具有一定的參考學習價值,需要的朋友們下面隨著小編來一起學習學習吧2019-11-11Python實現(xiàn)批量上傳本地maven庫到nexus
這篇文章主要為大家詳細介紹了如何使用Python實現(xiàn)批量上傳本地maven庫到nexus,文中的示例代碼講解詳細,具有一定的借鑒價值,需要的小伙伴可以參考下2024-01-01Django獲取該數(shù)據(jù)的上一條和下一條方法
今天小編就為大家分享一篇Django獲取該數(shù)據(jù)的上一條和下一條方法,具有很好的參考價值,希望對大家有所幫助。一起跟隨小編過來看看吧2019-08-08Python 使用PIL numpy 實現(xiàn)拼接圖片的示例
今天小編就為大家分享一篇Python 使用PIL numpy 實現(xiàn)拼接圖片的示例,具有很好的參考價值,希望對大家有所幫助。一起跟隨小編過來看看吧2018-05-05