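Cluster Analysis in Python: From Data to Visualization
Cluster Plots for Data Visualization in Python
A cluster plot is a visualization commonly used in data analysis and machine learning to show how data points fall into groups. In Python, scikit-learn provides the clustering algorithms, while matplotlib and seaborn handle the plotting.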
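Data preparation and preprocessing
Cluster analysis usually requires standardized or normalized data, so that differences in feature scale do not distort the result. Distance-based algorithms such as K-Means are otherwise dominated by whichever feature has the largest numeric range. Use StandardScaler or MinMaxScaler from sklearn.preprocessing to preprocess the data.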
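import pandas as pd
from sklearn.preprocessing import StandardScaler
# Load the dataset; this assumes every column in data.csv is numeric.
data = pd.read_csv('data.csv')
# Rescale each feature to zero mean and unit variance.
scaler = StandardScaler()
scaled_data = scaler.fit_transform(data)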
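The example above uses StandardScaler. As a rough sketch of how the two scalers differ, the snippet below applies both to a small made-up array. The array X and its values are hypothetical and are not taken from data.csv.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical toy data: two features on very different scales.
X = np.array([
    [1.0, 1000.0],
    [2.0, 2000.0],
    [3.0, 3000.0],
    [4.0, 4000.0],
])

# StandardScaler: each column ends up with mean 0 and unit variance.
X_std = StandardScaler().fit_transform(X)

# MinMaxScaler: each column is mapped linearly onto the range [0, 1].
X_mm = MinMaxScaler().fit_transform(X)

print("StandardScaler mean/std:", X_std.mean(axis=0), X_std.std(axis=0))
print("MinMaxScaler min/max:   ", X_mm.min(axis=0), X_mm.max(axis=0))
```

MinMaxScaler is sensitive to outliers because a single extreme value sets the range, so StandardScaler is often the safer default when the data may contain outliers.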
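Choosing and training a clustering algorithm
Common clustering algorithms include K-Means, hierarchical clustering, and DBSCAN. Taking K-Means as an example, train the model with sklearn.cluster and obtain a cluster label for each sample.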
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=3, random_state=42)
clusters = kmeans.fit_predict(scaled_data)
data['Cluster'] = clusters  # add the cluster labels to the original data
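The other two algorithms mentioned above follow the same fit_predict pattern. The following is a minimal sketch on synthetic data generated with make_blobs, which stands in for scaled_data. The eps and min_samples values are illustrative guesses rather than tuned settings.

```python
from sklearn.cluster import DBSCAN, AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for scaled_data: three compact blobs.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.5, random_state=42)
X_scaled = StandardScaler().fit_transform(X)

# Agglomerative (hierarchical) clustering needs the number of clusters up front.
agg = AgglomerativeClustering(n_clusters=3, linkage="ward")
agg_labels = agg.fit_predict(X_scaled)

# DBSCAN infers the number of clusters from point density.
# Samples it considers noise receive the label -1.
db = DBSCAN(eps=0.3, min_samples=5)
db_labels = db.fit_predict(X_scaled)

n_db_clusters = len(set(db_labels.tolist()) - {-1})
print("Agglomerative clusters:", len(set(agg_labels.tolist())))
print("DBSCAN clusters (excluding noise):", n_db_clusters)
```

Unlike K-Means, DBSCAN does not take a cluster count, so its result depends heavily on eps, which has to be chosen relative to the scale of the (already standardized) features.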
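Visualizing the clustering result
1. Scatter plot of the cluster distribution
Draw a scatter plot with matplotlib or seaborn, using a different color for each cluster.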
import matplotlib.pyplot as plt
import seaborn as sns
# 'Feature1' and 'Feature2' are placeholders; substitute two column names from your data.
sns.scatterplot(data=data, x='Feature1', y='Feature2', hue='Cluster', palette='viridis')
plt.title('K-Means Clustering')
plt.show()
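2. Hierarchical clustering dendrogram
Hierarchical clustering can be shown as a dendrogram, a tree diagram that displays the order in which data points and clusters are merged.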
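from scipy.cluster.hierarchy import dendrogram, linkage
import numpy as np
# Build the linkage matrix; Ward's method merges the pair of clusters
# that gives the smallest increase in within-cluster variance.
linked = linkage(scaled_data, method='ward')
plt.figure(figsize=(10, 6))
# Label each leaf with its row index in the data.
dendrogram(linked, orientation='top', labels=np.arange(len(data)))
plt.title('Hierarchical Clustering Dendrogram')
plt.show()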
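A dendrogram shows the full merge hierarchy but does not by itself assign samples to clusters. One possible way to obtain flat labels is to cut the tree with fcluster from scipy.cluster.hierarchy. The sketch below uses a small synthetic array in place of scaled_data, and the choice of two clusters is an assumption made for the illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Synthetic stand-in for scaled_data: two tight, well-separated groups.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(10, 2)),
    rng.normal(loc=5.0, scale=0.1, size=(10, 2)),
])

linked = linkage(X, method="ward")

# Cut the tree so that at most two flat clusters remain.
# Note that fcluster labels start at 1, not 0.
flat_labels = fcluster(linked, t=2, criterion="maxclust")
print(flat_labels)
```

The resulting labels can be stored in a DataFrame column and plotted in the same way as the K-Means labels.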
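3. Dimensionality reduction for high-dimensional data
For high-dimensional data, reduce the features to two dimensions with PCA or t-SNE, then plot the result.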
from sklearn.decomposition import PCA
# Project the scaled features onto the first two principal components.
pca = PCA(n_components=2)
pca_result = pca.fit_transform(scaled_data)
data['PCA1'] = pca_result[:, 0]
data['PCA2'] = pca_result[:, 1]
# Color each point by the cluster label assigned earlier.
sns.scatterplot(data=data, x='PCA1', y='PCA2', hue='Cluster', palette='Set1')
plt.title('PCA Reduced Clusters')
plt.show()
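t-SNE can be used in much the same way as PCA. The following is a minimal sketch on synthetic 10-dimensional data from make_blobs, standing in for scaled_data. The perplexity value shown is scikit-learn's default and may need adjusting for other datasets.

```python
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for scaled_data: 150 points in 10 dimensions.
X, _ = make_blobs(n_samples=150, n_features=10, centers=3, random_state=42)
X_scaled = StandardScaler().fit_transform(X)

# perplexity must be smaller than the number of samples.
tsne = TSNE(n_components=2, perplexity=30, random_state=42)
tsne_result = tsne.fit_transform(X_scaled)

print(tsne_result.shape)  # (150, 2)
```

The two columns of tsne_result can be added to the DataFrame and passed to sns.scatterplot just like the PCA columns. Unlike PCA, t-SNE preserves local neighborhoods rather than global distances, so the spacing between clusters in the plot should not be over-interpreted.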
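Evaluating the clustering
Assess clustering quality with the silhouette score or the Calinski-Harabasz index. The silhouette score ranges from -1 to 1, and higher values indicate better-separated clusters.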
from sklearn.metrics import silhouette_score
score = silhouette_score(scaled_data, clusters)
print(f'Silhouette Score: {score:.2f}')
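The Calinski-Harabasz index is computed with a call of the same shape. The sketch below evaluates both metrics on synthetic data from make_blobs, which stands in for scaled_data and clusters.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score, silhouette_score

# Synthetic stand-in for scaled_data.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.5, random_state=42)
labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

# Silhouette: bounded in [-1, 1]; higher is better.
sil = silhouette_score(X, labels)

# Calinski-Harabasz: ratio of between-cluster to within-cluster dispersion;
# unbounded above, higher is better.
ch = calinski_harabasz_score(X, labels)

print(f"Silhouette Score: {sil:.2f}")
print(f"Calinski-Harabasz Index: {ch:.1f}")
```

Because the Calinski-Harabasz index has no fixed upper bound, it is mainly useful for comparing different clusterings of the same dataset rather than as an absolute measure.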
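Optimization and parameter tuning
Use the elbow method to choose the number of clusters K: plot inertia (the within-cluster sum of squared distances) against K and pick the point where the curve stops dropping sharply.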
inertia = []
for k in range(1, 10):
    kmeans = KMeans(n_clusters=k, random_state=42)
    kmeans.fit(scaled_data)
    inertia.append(kmeans.inertia_)  # within-cluster sum of squared distances
plt.plot(range(1, 10), inertia, marker='o')
plt.xlabel('Number of Clusters')
plt.ylabel('Inertia')
plt.title('Elbow Method')
plt.show()
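The elbow is sometimes hard to read off the curve. A possible complement, sketched below on synthetic data standing in for scaled_data, is to compute the silhouette score for each candidate K and take the highest. The range of K values is an arbitrary choice for the illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for scaled_data.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.5, random_state=42)

# The silhouette score is undefined for a single cluster, so start at k=2.
scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# Pick the K with the highest silhouette score.
best_k = max(scores, key=scores.get)
print("Silhouette by K:", {k: round(s, 3) for k, s in scores.items()})
print("Best K:", best_k)
```

This gives a single number to compare across K values, but it is still a heuristic; it is worth checking the chosen K against the elbow plot and against what makes sense for the data.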
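Summary
Python offers a complete toolchain for cluster analysis and visualization, covering data preprocessing, algorithm selection, plotting, and evaluation. Tuning parameters sensibly and choosing a suitable visualization makes the underlying structure of the data clear.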