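Python Cluster Plots in Practice: From Beginner to Proficient
Python Data Visualization: Cluster Plot Techniques in Detail
A cluster plot is a visualization that shows how data points fall into groups, and it is commonly used in cluster analysis for unsupervised learning. Python offers several libraries for drawing cluster plots, including scikit-learn, seaborn, and matplotlib. The sections below cover the implementation and the main technical points.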
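Data Preparation and Preprocessing
Cluster analysis usually requires standardized or normalized data, because distance-based algorithms are sensitive to feature scale. Preprocess the data with scipy or scikit-learn; the example here uses scikit-learn's StandardScaler: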
from sklearn.preprocessing import StandardScaler
import pandas as pd
data = pd.read_csv('your_data.csv')  # placeholder path; every column must be numeric
scaler = StandardScaler()
scaled_data = scaler.fit_transform(data)
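To make the effect of standardization concrete, here is a minimal sketch on a hypothetical toy array (the values are invented for illustration and are not from any dataset in this article). After scaling, each column should have a mean of about 0 and a standard deviation of about 1, so features measured in large units no longer dominate the distance calculation.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: two features on very different scales.
raw = np.array([[1.0, 1000.0],
                [2.0, 2000.0],
                [3.0, 3000.0],
                [4.0, 4000.0]])

scaled = StandardScaler().fit_transform(raw)

# Per-column statistics after scaling.
col_means = scaled.mean(axis=0)
col_stds = scaled.std(axis=0)
print('means:', col_means)
print('stds: ', col_stds)
```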
Choosing a Clustering Algorithm
Common clustering algorithms include K-Means, hierarchical clustering, and DBSCAN. Taking K-Means as an example:
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)  # explicit n_init keeps behavior stable across scikit-learn versions
clusters = kmeans.fit_predict(scaled_data)
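The same fit_predict pattern carries over to the other algorithms. As a rough sketch, DBSCAN can be swapped in as follows. The data here is a synthetic stand-in for scaled_data, and the eps and min_samples values are illustrative assumptions chosen for this toy data, not recommended defaults.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Synthetic stand-in for scaled_data: two tight, well-separated blobs.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=0.0, scale=0.3, size=(50, 2))
blob_b = rng.normal(loc=10.0, scale=0.3, size=(50, 2))
points = np.vstack([blob_a, blob_b])

# eps and min_samples are illustrative values picked for this toy data.
dbscan = DBSCAN(eps=1.0, min_samples=5)
labels = dbscan.fit_predict(points)

# DBSCAN marks noise points with the label -1, so exclude it when counting.
n_clusters = len(set(labels) - {-1})
print(f'Clusters found: {n_clusters}')
```

Unlike K-Means, DBSCAN does not take the number of clusters as input; it infers it from point density, which is why eps usually needs tuning on real data.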
Drawing the Cluster Plot
Use matplotlib and seaborn to plot the clustering result. Two-dimensional data can be shown directly as a scatter plot:
import matplotlib.pyplot as plt
import seaborn as sns
plt.figure(figsize=(10, 6))
sns.scatterplot(x=scaled_data[:, 0], y=scaled_data[:, 1], hue=clusters, palette='viridis')
plt.title('K-Means Clustering Results')
plt.show()
Hierarchical Clustering Plot (Dendrogram)
Hierarchical clustering can be shown as a dendrogram, using the dendrogram function from scipy:
from scipy.cluster.hierarchy import dendrogram, linkage
import numpy as np
linked = linkage(scaled_data, method='ward')
plt.figure(figsize=(10, 6))
dendrogram(linked, orientation='top')
plt.title('Hierarchical Clustering Dendrogram')
plt.show()
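A dendrogram shows the merge hierarchy but does not assign cluster labels by itself. One possible way to obtain flat labels from the same linkage matrix is scipy's fcluster, sketched here on synthetic two-blob data (the data and the choice of two clusters are assumptions made for illustration).

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Synthetic stand-in for scaled_data: two well-separated groups of 20 points.
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(20, 2)),
    rng.normal(loc=10.0, scale=0.3, size=(20, 2)),
])

linked = linkage(points, method='ward')

# Cut the tree so that at most 2 flat clusters remain.
flat_labels = fcluster(linked, t=2, criterion='maxclust')
print(np.unique(flat_labels))
```

The resulting labels can be passed as the hue argument of a scatter plot in the same way as the K-Means labels.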
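Visualizing High-Dimensional Data
For high-dimensional data, reduce the dimensionality first (for example with PCA or t-SNE) and then plot: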
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
pca_result = pca.fit_transform(scaled_data)
sns.scatterplot(x=pca_result[:, 0], y=pca_result[:, 1], hue=clusters, palette='Set2')
plt.title('PCA + K-Means Clustering')
plt.show()
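Before trusting a 2D projection, it can help to check how much of the original variance the two components retain. The sketch below does this with the explained_variance_ratio_ attribute of PCA. It uses the iris dataset purely as a convenient stand-in for scaled_data.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Iris is used only as a stand-in for the article's scaled_data.
features = StandardScaler().fit_transform(load_iris().data)

pca = PCA(n_components=2)
projected = pca.fit_transform(features)

# Fraction of total variance captured by the two retained components.
retained = pca.explained_variance_ratio_.sum()
print(f'Variance retained by 2 components: {retained:.2%}')
```

A low retained fraction suggests that clusters which overlap in the plot may still be separated in the original feature space.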
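Cluster Evaluation and Tuning
Use the silhouette score or the elbow method to evaluate clustering quality. The silhouette score ranges from -1 to 1, and higher values indicate better-separated clusters: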
from sklearn.metrics import silhouette_score
score = silhouette_score(scaled_data, clusters)
print(f'Silhouette Score: {score:.2f}')
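The elbow method has no snippet of its own here, so the following is a minimal sketch of one common way to apply it: fit K-Means for a range of k values and record the inertia (within-cluster sum of squared distances) of each fit. The iris dataset and the range 1 to 6 are assumptions chosen for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# Iris is used only as example data; substitute scaled_data in practice.
features = load_iris().data

# Record the inertia for each candidate number of clusters.
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=42)
    model.fit(features)
    inertias[k] = model.inertia_

for k, inertia in inertias.items():
    print(f'k={k}: inertia={inertia:.1f}')
```

Plotting inertia against k and looking for the point where the curve flattens (the "elbow") gives a candidate value for n_clusters.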
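Case Study: The Iris Dataset
The iris dataset illustrates the complete workflow. The snippet reuses the KMeans, PCA, seaborn, and matplotlib imports from the earlier sections: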
from sklearn.datasets import load_iris
iris = load_iris()
data = iris.data
target = iris.target
# K-Means clustering (fixed random_state so the result is reproducible)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
clusters = kmeans.fit_predict(data)
# Visualization
pca = PCA(n_components=2)
pca_result = pca.fit_transform(data)
sns.scatterplot(x=pca_result[:, 0], y=pca_result[:, 1], hue=clusters, palette='deep')
plt.title('Iris Dataset Clustering')
plt.show()
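Because the iris dataset ships with true species labels (the target array loaded above), the clustering can also be compared against them. The sketch below uses the adjusted Rand index from scikit-learn as one possible agreement measure. This comparison is an addition for illustration and is only possible when ground-truth labels exist.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score

iris = load_iris()
clusters = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(iris.data)

# The adjusted Rand index ignores label permutations: 1.0 means identical
# groupings, values near 0 mean agreement no better than chance.
ari = adjusted_rand_score(iris.target, clusters)
print(f'Adjusted Rand Index: {ari:.2f}')
```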
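Notes
- Clustering results can vary with the choice of initial centroids; run the algorithm several times and keep the best solution.
- High-dimensional data should be combined with dimensionality reduction to avoid the "curse of dimensionality".
- Dendrograms suit small datasets (fewer than 1000 samples); beyond that the plot becomes hard to read.
These methods can be combined flexibly to meet clustering analysis and visualization needs in different scenarios.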
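The first note can be sketched as a simple loop: fit K-Means with several different seeds and keep the run with the lowest inertia. The iris data and the choice of ten seeds are assumptions for illustration. In scikit-learn the n_init parameter of KMeans performs the same kind of repeated initialization internally.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# Iris is used only as example data.
features = load_iris().data

# One single-initialization run per seed.
runs = []
for seed in range(10):
    model = KMeans(n_clusters=3, n_init=1, random_state=seed).fit(features)
    runs.append(model)

# Keep the run with the lowest within-cluster sum of squares.
best = min(runs, key=lambda m: m.inertia_)
print(f'Best inertia over {len(runs)} runs: {best.inertia_:.2f}')
```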