Homogeneity Score Using sklearn in Python

Original: https://www.geeksforgeeks.org/homogeneity_score-using-sklearn-in-python/

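A clustering is perfectly homogeneous when every cluster contains only data points that belong to a single class. Homogeneity (homogeneity_score) measures how close a clustering algorithm comes to this ideal.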

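This metric is independent of the absolute values of the labels. Permuting the cluster label values does not change the score in any way.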
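The permutation property can be checked directly. The sketch below uses small made-up label lists (they are illustrative assumptions, not data from this article) and renames the cluster ids without changing which samples are grouped together.

```python
from sklearn.metrics import homogeneity_score

# Illustrative labels (assumed for this sketch)
labels_true = [0, 0, 1, 1, 2, 2]
labels_pred = [0, 0, 1, 1, 1, 2]

# Same grouping, but with cluster ids renamed: 0 -> 2, 1 -> 0, 2 -> 1
renamed_pred = [2, 2, 0, 0, 0, 1]

score_original = homogeneity_score(labels_true, labels_pred)
score_renamed = homogeneity_score(labels_true, renamed_pred)

# Both scores should agree up to floating-point rounding
print(score_original, score_renamed)
```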

Syntax: sklearn.metrics.homogeneity_score(labels_true, labels_pred)

The metric is not symmetric: swapping labels_true with labels_pred returns the completeness score instead.
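The asymmetry can be illustrated with a short comparison against sklearn.metrics.completeness_score. The label lists below are assumptions chosen for the sketch; any labeling where the two arguments are not interchangeable would do.

```python
from sklearn.metrics import completeness_score, homogeneity_score

# Illustrative labels (assumed): every sample ends up in its own cluster
labels_true = [0, 0, 1, 1]
labels_pred = [0, 1, 2, 3]

h_forward = homogeneity_score(labels_true, labels_pred)
h_swapped = homogeneity_score(labels_pred, labels_true)
c_forward = completeness_score(labels_true, labels_pred)

# Swapping the arguments changes the result, and the swapped
# homogeneity should match the completeness of the original order
print(h_forward, h_swapped, c_forward)
```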

Parameters:

  • labels_true: <int array, shape = [n_samples]>: the ground-truth class labels used as the reference.
  • labels_pred: <array-like of shape (n_samples,)>: the cluster labels to evaluate.

Returns:

homogeneity: <float>: a score between 0.0 and 1.0, where 1.0 stands for a perfectly homogeneous labeling.
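To see where a value between 0.0 and 1.0 comes from, the score can be recomputed by hand from the usual entropy-based definition, h = 1 - H(C|K) / H(C), where C denotes the true classes and K the predicted clusters. The function manual_homogeneity below is a hypothetical helper written for this sketch and is not part of scikit-learn; the label lists are also assumed.

```python
import numpy as np
from sklearn.metrics import homogeneity_score


def manual_homogeneity(labels_true, labels_pred):
    """Hypothetical helper: h = 1 - H(C|K) / H(C), using natural logs."""
    labels_true = np.asarray(labels_true)
    labels_pred = np.asarray(labels_pred)
    n = labels_true.size

    # H(C): entropy of the class distribution
    _, class_counts = np.unique(labels_true, return_counts=True)
    p_class = class_counts / n
    h_c = -np.sum(p_class * np.log(p_class))
    if h_c == 0.0:
        return 1.0  # a single class is trivially homogeneous

    # H(C|K): class entropy inside each cluster, weighted by cluster size
    h_c_given_k = 0.0
    for k in np.unique(labels_pred):
        in_cluster = labels_true[labels_pred == k]
        _, counts = np.unique(in_cluster, return_counts=True)
        p = counts / in_cluster.size
        h_c_given_k += (in_cluster.size / n) * -np.sum(p * np.log(p))

    return 1.0 - h_c_given_k / h_c


# Illustrative labels (assumed for this sketch)
labels_true = [0, 0, 1, 1, 2, 2]
labels_pred = [0, 0, 1, 1, 1, 2]

manual = manual_homogeneity(labels_true, labels_pred)
reference = homogeneity_score(labels_true, labels_pred)
print(manual, reference)
```

The two printed values should agree up to floating-point rounding. A cluster that mixes classes raises H(C|K) and pulls the score toward 0.0, while clusters that each hold a single class drive H(C|K) to zero and the score to 1.0.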

Example 1:

Python 3

import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import homogeneity_score

# Change to the directory that contains the data file (shell command)
# cd C:\Users\Dev\Desktop\Credit Card Fraud

# Loading the data
df = pd.read_csv('creditcard.csv')

# Separating the dependent and independent variables
y = df['Class']
X = df.drop('Class', axis=1)

# Building the clustering model
kmeans = KMeans(n_clusters=2)

# Training the clustering model
kmeans.fit(X)

# Storing the predicted Clustering labels
labels = kmeans.predict(X)

# Evaluating the performance against the true class labels
print(homogeneity_score(y, labels))

Output:

0.00496764949717645

Example 2: A perfectly homogeneous labeling:

Python 3

from sklearn.metrics.cluster import homogeneity_score

# Evaluate the score
hscore = homogeneity_score([0, 1, 0, 1], [1, 0, 1, 0])

print(hscore)

Output:

1.0

Example 3: Non-perfect labelings that further split classes into more clusters can still be perfectly homogeneous:

Python 3

from sklearn.metrics.cluster import homogeneity_score

# Evaluate the score
hscore = homogeneity_score([0, 0, 1, 1], [0, 1, 2, 3])

print(hscore)

Output:

0.9999999999999999

Example 4: Clusters that contain samples from different classes do not make for a homogeneous labeling:

Python 3

from sklearn.metrics.cluster import homogeneity_score

# Evaluate the score
hscore = homogeneity_score([0, 0, 1, 1], [0, 1, 0, 1])

print(hscore)

Output:

0.0