Metrics

accuracy

Accuracy metric class

class mindnlp.engine.metrics.accuracy.Accuracy(name='Accuracy')

Bases: Metric

Calculates accuracy. The formula is:

\[\text{ACC} =\frac{\text{TP} + \text{TN}} {\text{TP} + \text{TN} + \text{FP} + \text{FN}}\]

where `ACC` is the accuracy, `TP` is the number of true positives, `TN` is the number of true negatives, `FP` is the number of false positives, and `FN` is the number of false negatives.
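As a rough illustration of the formula, the sketch below counts correct arg-max predictions in plain Python. The helper name `accuracy_sketch` is hypothetical and is not part of MindNLP; it assumes integer class labels and one score list per sample.

```python
def accuracy_sketch(preds, labels):
    """Fraction of samples whose arg-max class equals the integer label."""
    if len(labels) == 0:
        raise RuntimeError("accuracy is undefined for zero samples")
    correct = 0
    for scores, label in zip(preds, labels):
        # The index of the highest score is taken as the predicted class.
        predicted = max(range(len(scores)), key=scores.__getitem__)
        correct += int(predicted == label)
    # In the binary case, `correct` equals TP + TN.
    return correct / len(labels)


# Same data as the documented example, written as plain Python lists.
preds = [[0.2, 0.5], [0.3, 0.1], [0.9, 0.6]]
labels = [1, 0, 1]
acc = accuracy_sketch(preds, labels)
print(acc)
```

Two of the three predictions match their labels, so the sketch yields two thirds, which agrees with the documented output.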

Parameters: name (str) – Name of the metric.

Examples

>>> import numpy as np
>>> import mindspore
>>> from mindspore import nn, Tensor
>>> from mindnlp.engine.metrics import Accuracy
>>> preds = Tensor(np.array([[0.2, 0.5], [0.3, 0.1], [0.9, 0.6]]), mindspore.float32)
>>> labels = Tensor(np.array([1, 0, 1]), mindspore.int32)
>>> metric = Accuracy()
>>> metric.update(preds, labels)
>>> acc = metric.eval()
>>> print(acc)
0.6666666666666666

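clear(): Clears the intermediate evaluation results.

eval()

Computes and returns the accuracy.

Returns

acc (float) - The computed accuracy.

Raises

RuntimeError – If the number of samples is 0.

get_metric_name(): Returns the name of the metric.

update(*inputs)

Updates the internal state.

Parameters

inputs – The inputs `preds` and `labels`.
- preds (Union[Tensor, list, numpy.ndarray]): Predicted values. A list of floats in the range `[0, 1]`; in most cases (not strictly required) the shape is `(N, C)`, where `N` is the number of samples and `C` is the number of classes.
- labels (Union[Tensor, list, numpy.ndarray]): Ground-truth values. Must be in one-hot format with shape `(N, C)`, or convertible to one-hot format with shape `(N,)`.

Raises

ValueError – If the number of `inputs` is not 2.
ValueError – If the number of classes in the current predictions does not match that of the previous predictions.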

bleu

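BleuScore metric class

class mindnlp.engine.metrics.bleu.BleuScore(n_size=4, weights=None, name='BleuScore')

Bases: Metric

Calculates the BLEU score. BLEU (bilingual evaluation understudy) is a metric for evaluating the quality of machine-translated text. It uses a modified form of precision to compare a candidate translation against multiple reference translations. The formula is:

\[ \begin{aligned} BP & = \begin{cases} 1, & \text{if } c > r \\ e^{1-r/c}, & \text{if } c \leq r \end{cases} \\ BLEU & = BP \cdot \exp\left(\sum_{n=1}^{N} w_{n} \log p_{n}\right) \end{aligned} \]

where `c` is the length of the candidate sentence, `r` is the length of the reference sentence, `BP` is the brevity penalty, `p_n` is the modified n-gram precision, and `w_n` is its weight.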
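The two parts of the formula can be sketched in a few lines of plain Python. The function names `brevity_penalty` and `combine_precisions` are hypothetical helpers for illustration only; computing the modified n-gram precisions themselves is left out, and every precision is assumed to be strictly positive so that the logarithm is defined.

```python
import math


def brevity_penalty(c, r):
    """BP = 1 if c > r, otherwise exp(1 - r / c)."""
    if c <= 0:
        raise ValueError("candidate length must be positive")
    return 1.0 if c > r else math.exp(1.0 - r / c)


def combine_precisions(bp, precisions, weights):
    """BLEU = BP * exp(sum_n w_n * log p_n); assumes every p_n > 0."""
    log_sum = sum(w * math.log(p) for w, p in zip(weights, precisions))
    return bp * math.exp(log_sum)


# A candidate longer than the reference is not penalised.
bp_long = brevity_penalty(7, 6)
# A candidate half as long as the reference is scaled by exp(-1).
bp_short = brevity_penalty(5, 10)
# With uniform weights the combination is a geometric mean of the precisions.
score = combine_precisions(1.0, [0.5, 0.5], [0.5, 0.5])
print(bp_long, bp_short, score)
```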

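Parameters

n_size (int) – The maximum n-gram order, in the range 1 to 4. Default: 4.
weights (Union[list, None]) – Weights for the precision of each n-gram order. Default: None.
name (str) – Name of the metric.

Raises

ValueError – If `n_size` is not in the range 1 to 4.
ValueError – If the length of `weights` is not equal to `n_size`.

Examples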

>>> from mindnlp.engine.metrics import BleuScore
>>> cand = [["The", "cat", "The", "cat", "on", "the", "mat"]]
>>> ref_list = [[["The", "cat", "is", "on", "the", "mat"],
                ["There", "is", "a", "cat", "on", "the", "mat"]]]
>>> metric = BleuScore()
>>> metric.update(cand, ref_list)
>>> bleu_score = metric.eval()
>>> print(bleu_score)
0.46713797772820015

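clear(): Clears the intermediate evaluation results.

eval()

Computes and returns the BLEU score.

Returns

bleu_score (float) - The computed BLEU score.

get_metric_name(): Returns the name of the metric.

update(*inputs)

Updates the internal state.

Parameters

inputs – The inputs `cand` and `ref_list`.
- cand (list): A list of tokenized candidate sentences.
- ref_list (list): A list of tokenized reference sentences.

Raises

ValueError – If the number of inputs is not 2.
ValueError – If `cand` and `ref_list` do not have the same length.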

confusion_matrix

ConfusionMatrix metric class

class mindnlp.engine.metrics.confusion_matrix.ConfusionMatrix(class_num=2, name='ConfusionMatrix')

Bases: Metric

Calculates the confusion matrix. The confusion matrix is widely used to evaluate the performance of classification models, both binary and multi-class.
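A confusion matrix can be built by counting (label, prediction) pairs. The sketch below is a plain-Python illustration, not the library implementation; the helper name `confusion_matrix_sketch` is hypothetical, and the orientation (rows are true labels, columns are predicted classes) is an assumption that this documentation does not state.

```python
def confusion_matrix_sketch(preds, labels, class_num=2):
    """Count samples per (true label, predicted class) pair.

    Assumption: rows index the true label, columns the predicted class.
    """
    matrix = [[0] * class_num for _ in range(class_num)]
    for pred, label in zip(preds, labels):
        matrix[label][pred] += 1
    return matrix


# Same class indices as the documented example.
conf_mat = confusion_matrix_sketch([1, 0, 1, 0], [1, 0, 0, 1])
print(conf_mat)

# An asymmetric case makes the assumed orientation visible.
skewed = confusion_matrix_sketch([1, 1, 0], [0, 1, 0])
print(skewed)
```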

Parameters

class_num (int) – Number of classes in the dataset. Default: 2.
name (str) – Name of the metric.

Examples

>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindnlp.engine.metrics import ConfusionMatrix
>>> preds = Tensor(np.array([1, 0, 1, 0]))
>>> labels = Tensor(np.array([1, 0, 0, 1]))
>>> metric = ConfusionMatrix()
>>> metric.update(preds, labels)
>>> conf_mat = metric.eval()
>>> print(conf_mat)
[[1. 1.]
 [1. 1.]]

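clear(): Clears the intermediate evaluation results.

eval()

Computes and returns the confusion matrix.

Returns

conf_mat (np.ndarray) - The computed confusion matrix.

get_metric_name(): Returns the name of the metric.

update(*inputs)

Updates the internal state.

Parameters

inputs – The inputs `preds` and `labels`.
- preds (Union[Tensor, list, np.ndarray]): Predicted values. A list of floats with shape `(N, C)` or `(N,)`.
- labels (Union[Tensor, list, np.ndarray]): Ground-truth values with shape `(N,)`.

Raises

ValueError – If the number of inputs is not 2.
ValueError – If `preds` and `labels` do not have valid dimensions.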

distinct

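Distinct metric class

class mindnlp.engine.metrics.distinct.Distinct(n_size=2, name='Distinct')

Bases: Metric

Calculates Distinct-N. Distinct-N is a metric that measures the diversity of a sentence. It focuses on the number of distinct n-grams in a sentence: the more distinct n-grams there are, the more diverse the text. The formula is:

\[\text{Distinct-N} = \frac{\text{number of distinct n-grams}}{\text{total number of n-grams}}\]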
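A minimal sketch of the idea, assuming Distinct-N is the ratio of distinct n-grams to all n-grams in the token list (this ratio reproduces the value printed in the documented example). The helper name `distinct_n_sketch` is hypothetical.

```python
def distinct_n_sketch(tokens, n_size=2):
    """Ratio of distinct n-grams to the total number of n-grams."""
    ngrams = [tuple(tokens[i:i + n_size])
              for i in range(len(tokens) - n_size + 1)]
    if not ngrams:
        # Sentence shorter than n_size: no n-grams to count.
        return 0.0
    return len(set(ngrams)) / len(ngrams)


cand_list = ["The", "cat", "The", "cat", "on", "the", "mat"]
distinct_score = distinct_n_sketch(cand_list)
print(distinct_score)
```

The seven tokens form six bigrams, of which ("The", "cat") occurs twice, so five are distinct and the ratio is 5/6.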

Parameters

n_size (int) – The n-gram size. Default: 2.
name (str) – Name of the metric.

Examples

>>> from mindnlp.engine.metrics import Distinct
>>> cand_list = ["The", "cat", "The", "cat", "on", "the", "mat"]
>>> metric = Distinct()
>>> metric.update(cand_list)
>>> distinct_score = metric.eval()
>>> print(distinct_score)
0.8333333333333334

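clear(): Clears the intermediate evaluation results.

eval()

Computes and returns Distinct-N.

Returns

distinct_score (float) - The computed Distinct-N score.

get_metric_name(): Returns the name of the metric.

update(*inputs)

Updates the internal state.

Parameters: inputs – The input `cand_list`.
- cand_list (list): The tokenized candidate sentence (a list of tokens).
Raises: ValueError – If the number of inputs is not 1.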

em_score

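EmScore metric class

class mindnlp.engine.metrics.em_score.EmScore(name='EmScore')

Bases: Metric

Calculates the exact match (EM) score. This metric measures the percentage of predictions that exactly match any one of the ground-truth answers.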
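The core comparison can be sketched as follows. This is a simplified illustration with a hypothetical helper name, `exact_match_sketch`; it compares raw strings and applies no text normalization (such as lowercasing or punctuation removal), which an actual implementation may perform.

```python
def exact_match_sketch(pred, references):
    """Return 1.0 if the prediction equals any reference string, else 0.0."""
    return 1.0 if any(pred == ref for ref in references) else 0.0


# Same strings as the documented example: no reference matches exactly.
em_miss = exact_match_sketch(
    "this is the best span",
    ["this is a good span", "something irrelevant"],
)
# Hypothetical second case in which one reference matches.
em_hit = exact_match_sketch("a good span", ["a good span", "another span"])
print(em_miss, em_hit)
```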

Parameters: name (str) – Name of the metric.

Examples

>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindnlp.engine.metrics import EmScore
>>> preds = "this is the best span"
>>> examples = ["this is a good span", "something irrelevant"]
>>> metric = EmScore()
>>> metric.update(preds, examples)
>>> em_score = metric.eval()
>>> print(em_score)
0.0

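clear(): Clears the intermediate evaluation results.

eval()

Computes and returns the EM score.

Returns: exact_match (float) - The computed EM score.

get_metric_name(): Returns the name of the metric.

update(*inputs)

Updates the internal state.

Parameters

inputs – The inputs `preds` and `examples`.
- preds (Union[str, list]): Predicted values.
- examples (list): Ground-truth values.

Raises

ValueError – If the number of inputs is not 2.
RuntimeError – If `preds` and `examples` have different lengths.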

f1

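F1Score metric class

class mindnlp.engine.metrics.f1.F1Score(name='F1Score')

Bases: Metric

Calculates the F1 score. The F-beta score is the weighted harmonic mean of precision and recall; the F1 score is the special case of F-beta with beta equal to 1. The formula is:

\[F_1=\frac{2\cdot TP}{2\cdot TP + FN + FP}\]

where `TP` is the number of true positives, `FN` is the number of false negatives, and `FP` is the number of false positives.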
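The sketch below applies the formula once per class, treating each class in turn as the positive class. It is a plain-Python illustration under that assumption, with hypothetical helper names; it is not the library code.

```python
def argmax(scores):
    """Index of the largest score."""
    return max(range(len(scores)), key=scores.__getitem__)


def f1_per_class_sketch(preds, labels):
    """Per-class F1 = 2*TP / (2*TP + FN + FP), using arg-max predictions."""
    num_classes = len(preds[0])
    tp = [0] * num_classes
    fp = [0] * num_classes
    fn = [0] * num_classes
    for scores, label in zip(preds, labels):
        predicted = argmax(scores)
        if predicted == label:
            tp[label] += 1
        else:
            fp[predicted] += 1  # wrongly predicted as this class
            fn[label] += 1      # true class that was missed
    result = []
    for c in range(num_classes):
        denom = 2 * tp[c] + fn[c] + fp[c]
        result.append(2 * tp[c] / denom if denom else 0.0)
    return result


preds = [[0.2, 0.5], [0.3, 0.1], [0.9, 0.6]]
labels = [1, 0, 1]
f1_s = f1_per_class_sketch(preds, labels)
print(f1_s)
```

For this data each class has one true positive and one error, so both entries equal 2/3, in line with the documented output.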

Parameters: name (str) – Name of the metric.

Examples

>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindnlp.engine.metrics import F1Score
>>> preds = Tensor(np.array([[0.2, 0.5], [0.3, 0.1], [0.9, 0.6]]))
>>> labels = Tensor(np.array([1, 0, 1]))
>>> metric = F1Score()
>>> metric.update(preds, labels)
>>> f1_s = metric.eval()
>>> print(f1_s)
[0.6666666666666666 0.6666666666666666]

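clear(): Clears the intermediate evaluation results.

eval()

Computes and returns the F1 score.

Returns

f1_s (numpy.ndarray) - The computed F1 score.

Raises

RuntimeError – If the number of samples is 0.

get_metric_name(): Returns the name of the metric.

update(*inputs)

Updates the internal state.

Parameters

inputs – The inputs `preds` and `labels`.
- preds (Union[Tensor, list, np.ndarray]): Predicted values. A list of floats in the range `[0, 1]`; in most cases (not strictly required) the shape is `(N, C)`, where `N` is the number of samples and `C` is the number of classes.
- labels (Union[Tensor, list, np.ndarray]): Ground-truth values. Must be in one-hot format with shape `(N, C)`, or convertible to one-hot format with shape `(N,)`.

Raises

ValueError – If the number of inputs is not 2.
ValueError – If the number of classes in the current predictions does not match that of the previous predictions.
ValueError – If `preds` and `labels` do not have the same number of classes.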

matthews

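MatthewsCorrelation metric class

class mindnlp.engine.metrics.matthews.MatthewsCorrelation(name='MatthewsCorrelation')

Bases: Metric

Calculates the Matthews correlation coefficient (MCC). MCC is in essence a correlation coefficient between the observed and predicted binary classifications; it returns a value between -1 and +1. A coefficient of +1 represents a perfect prediction, 0 a prediction no better than random, and -1 total disagreement between prediction and observation. The formula is:

\[MCC=\frac{TP \times TN-FP \times FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}\]

where `TP` is the number of true positives, `TN` is the number of true negatives, `FN` is the number of false negatives, and `FP` is the number of false positives.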
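To make the formula concrete, the sketch below derives the four counts from arg-max predictions and plugs them in. The helper name `mcc_sketch` is hypothetical, class 1 is assumed to be the positive class, and returning 0.0 for a zero denominator is a choice made for this illustration only.

```python
import math


def mcc_sketch(preds, labels):
    """MCC for binary labels, with class 1 assumed to be the positive class."""
    tp = tn = fp = fn = 0
    for scores, label in zip(preds, labels):
        predicted = max(range(len(scores)), key=scores.__getitem__)
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        elif predicted == 1 and label == 0:
            fp += 1
        else:
            fn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Illustration-only convention: report 0.0 when the denominator vanishes.
    return (tp * tn - fp * fn) / denom if denom else 0.0


preds = [[0.8, 0.2], [-0.5, 0.5], [0.1, 0.4], [0.6, 0.3], [0.6, 0.3]]
labels = [0, 1, 0, 1, 0]
m_c_c = mcc_sketch(preds, labels)
print(m_c_c)
```

Here TP = 1, TN = 2, FP = 1 and FN = 1, so the result is (2 - 1) / sqrt(36) = 1/6, which agrees with the documented output.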

Parameters: name (str) – Name of the metric.

Examples

>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindnlp.engine.metrics import MatthewsCorrelation
>>> preds = [[0.8, 0.2], [-0.5, 0.5], [0.1, 0.4], [0.6, 0.3], [0.6, 0.3]]
>>> labels = [0, 1, 0, 1, 0]
>>> metric = MatthewsCorrelation()
>>> metric.update(preds, labels)
>>> m_c_c = metric.eval()
>>> print(m_c_c)
0.16666666666666666

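clear(): Clears the intermediate evaluation results.

eval()

Computes and returns the MCC.

Returns

m_c_c (float) - The computed MCC.

get_metric_name(): Returns the name of the metric.

update(*inputs)

Updates the internal state.

Parameters: inputs – The inputs `preds` and `labels`.
- preds (Union[Tensor, list, numpy.ndarray]): Predicted values. A list of floats in the range `[0, 1]`; in most cases (not strictly required) the shape is `(N, C)`, where `N` is the number of samples and `C` is the number of classes.
- labels (Union[Tensor, list, numpy.ndarray]): Ground-truth values. Must be in one-hot format with shape `(N, C)`, or convertible to one-hot format with shape `(N,)`.
Raises: ValueError – If the number of inputs is not 2.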

pearson

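PearsonCorrelation metric class

class mindnlp.engine.metrics.pearson.PearsonCorrelation(name='PearsonCorrelation')

Bases: Metric

Calculates the Pearson correlation coefficient (PCC). PCC is a measure of linear correlation between two sets of data. It is the ratio of the covariance of two variables to the product of their standard deviations; it is therefore essentially a normalized measure of covariance, and its value always lies between -1 and 1.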
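The definition above (covariance divided by the product of the standard deviations) can be sketched in plain Python. The helper name `pearson_sketch` is hypothetical, and the inputs are assumed to be flat lists of floats rather than `(N, 1)` tensors.

```python
import math


def pearson_sketch(xs, ys):
    """Covariance of xs and ys divided by the product of their spreads."""
    if len(xs) != len(ys):
        raise RuntimeError("inputs must have the same length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    # The 1/n factors of covariance and standard deviations cancel out.
    return cov / (spread_x * spread_y)


# Same values as the documented example, flattened to 1-D lists.
p_c_c = pearson_sketch([0.1, 1.0, 2.4, 0.9], [0.0, 1.0, 2.9, 1.0])
print(p_c_c)
```

The result is approximately 0.9985; small differences from the documented value in the last digits can come from float32 versus float64 arithmetic.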

Parameters: name (str) – Name of the metric.

Examples

>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindnlp.engine.metrics import PearsonCorrelation
>>> preds = Tensor(np.array([[0.1], [1.0], [2.4], [0.9]]), mindspore.float32)
>>> labels = Tensor(np.array([[0.0], [1.0], [2.9], [1.0]]), mindspore.float32)
>>> metric = PearsonCorrelation()
>>> metric.update(preds, labels)
>>> p_c_c = metric.eval()
>>> print(p_c_c)
0.9985229081857804

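clear(): Clears the intermediate evaluation results.

eval()

Computes and returns the PCC.

Returns

p_c_c (float) - The computed PCC.

get_metric_name(): Returns the name of the metric.

update(*inputs)

Updates the internal state.

Parameters

inputs – The inputs `preds` and `labels`.
- preds (Union[Tensor, list, np.ndarray]): Predicted values. A list of floats with shape `(N, 1)`.
- labels (Union[Tensor, list, np.ndarray]): Ground-truth values. A list of floats with shape `(N, 1)`.

Raises

ValueError – If the number of inputs is not 2.
RuntimeError – If `preds` and `labels` have different lengths.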

perplexity

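Perplexity metric class

class mindnlp.engine.metrics.perplexity.Perplexity(ignore_label=None, name='Perplexity')

Bases: Metric

Calculates perplexity. Perplexity measures how well a probability model predicts a sample. A low perplexity indicates that the model is good at predicting the sample. The formula is:

\[PP(W)=P(w_{1}w_{2}...w_{N})^{-\frac{1}{N}}=\sqrt[N]{\frac{1}{P(w_{1}w_{2}...w_{N})}}\]

where `w` denotes a word in the corpus and `N` is the number of words.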
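Taking logarithms turns the formula into the exponential of the mean negative log-probability of the correct labels, which is how the sketch below computes it. The helper name `perplexity_sketch` is hypothetical, and each row of `preds` is assumed to hold the probabilities assigned to every class for one sample.

```python
import math


def perplexity_sketch(preds, labels, ignore_label=None):
    """exp of the mean negative log-probability assigned to the true label."""
    total_nll = 0.0
    count = 0
    for probs, label in zip(preds, labels):
        if label == ignore_label:
            continue  # skip samples carrying the ignored label
        total_nll -= math.log(probs[label])
        count += 1
    if count == 0:
        raise RuntimeError("perplexity is undefined for zero samples")
    return math.exp(total_nll / count)


preds = [[0.2, 0.5], [0.3, 0.1], [0.9, 0.6]]
labels = [1, 0, 1]
ppl = perplexity_sketch(preds, labels)
print(ppl)
```

The probabilities picked out by the labels are 0.5, 0.3 and 0.6, so the result is (0.5 * 0.3 * 0.6) raised to the power -1/3, about 2.2314, in line with the documented output.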

Parameters

ignore_label (Union[int, None]) – Index of an invalid label to ignore when counting. If set to `None`, there is no invalid label. Default: None.
name (str) – Name of the metric.

Examples

>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindnlp.engine.metrics import Perplexity
>>> preds = Tensor(np.array([[0.2, 0.5], [0.3, 0.1], [0.9, 0.6]]))
>>> labels = Tensor(np.array([1, 0, 1]))
>>> metric = Perplexity()
>>> metric.update(preds, labels)
>>> ppl = metric.eval()
>>> print(ppl)
2.231443166940565

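clear(): Clears the intermediate evaluation results.

eval()

Computes and returns the perplexity.

Returns

ppl (float) - The computed perplexity.

Raises

RuntimeError – If the number of samples is 0.

get_metric_name(): Returns the name of the metric.

update(*inputs)

Updates the internal state.

Parameters

inputs – The inputs `preds` and `labels`.
- preds (Union[Tensor, list, np.ndarray]): Predicted values. A list of floats in the range `[0, 1]`; in most cases (not strictly required) the shape is `(N, C)`, where `N` is the number of samples and `C` is the number of classes.
- labels (Union[Tensor, list, np.ndarray]): Ground-truth values. Must be in one-hot format with shape `(N, C)`, or convertible to one-hot format with shape `(N,)`.

Raises

ValueError – If the number of `inputs` is not 2.
RuntimeError – If `preds` and `labels` have different lengths.
RuntimeError – If `pred` and `label` have different shapes.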

precision

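Precision metric class

class mindnlp.engine.metrics.precision.Precision(name='Precision')

Bases: Metric

Calculates precision. Precision (also called positive predictive value) is the proportion of actual positive samples among the samples predicted as positive. It is only used to evaluate the precision score of binary classification tasks. The formula is:

\[\text{Precision} =\frac{\text{TP}} {\text{TP} + \text{FP}}\]

where `TP` is the number of true positives and `FP` is the number of false positives.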
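As an illustration of the formula, the sketch below computes precision once per class from arg-max predictions, treating each class in turn as the positive class. The helper name `precision_per_class_sketch` is hypothetical, and returning 0.0 for a class that is never predicted is a choice made for this sketch.

```python
def precision_per_class_sketch(preds, labels):
    """Per-class precision = TP / (TP + FP), using arg-max predictions."""
    num_classes = len(preds[0])
    tp = [0] * num_classes
    fp = [0] * num_classes
    for scores, label in zip(preds, labels):
        predicted = max(range(len(scores)), key=scores.__getitem__)
        if predicted == label:
            tp[predicted] += 1
        else:
            fp[predicted] += 1  # predicted this class, but the label differs
    return [tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
            for c in range(num_classes)]


preds = [[0.2, 0.5], [0.3, 0.1], [0.9, 0.6]]
labels = [1, 0, 1]
prec = precision_per_class_sketch(preds, labels)
print(prec)
```

Class 0 is predicted twice and is correct once, while class 1 is predicted once and is correct, giving 0.5 and 1.0 as in the documented output.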

Parameters: name (str) – Name of the metric.

Examples

>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindnlp.engine.metrics import Precision
>>> preds = Tensor(np.array([[0.2, 0.5], [0.3, 0.1], [0.9, 0.6]]), mindspore.float32)
>>> labels = Tensor(np.array([1, 0, 1]), mindspore.int32)
>>> metric = Precision()
>>> metric.update(preds, labels)
>>> prec = metric.eval()
>>> print(prec)
[0.5 1. ]

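clear(): Clears the intermediate evaluation results.

eval()

Computes and returns the precision.

Returns

prec (numpy.ndarray) - The computed precision.

get_metric_name(): Returns the name of the metric.

update(*inputs)

Updates the internal state. A prediction is considered correct if the index of its maximum value matches the label.

Parameters

inputs – The inputs `preds` and `labels`.
- preds (Union[Tensor, list, numpy.ndarray]): Predicted values. A list of floats in the range `[0, 1]`; in most cases (not strictly required) the shape is `(N, C)`, where `N` is the number of samples and `C` is the number of classes.
- labels (Union[Tensor, list, numpy.ndarray]): Ground-truth values. Must be in one-hot format with shape `(N, C)`, or convertible to one-hot format with shape `(N,)`.

Raises

ValueError – If the number of inputs is not 2.
ValueError – If `preds` and `labels` do not have the same number of classes.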

recall

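Recall metric class

class mindnlp.engine.metrics.recall.Recall(name='Recall')

Bases: Metric

Calculates recall. Recall is also known as the true positive rate or sensitivity. The formula is:

\[\text{Recall} =\frac{\text{TP}} {\text{TP} + \text{FN}}\]

where `TP` is the number of true positives and `FN` is the number of false negatives.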
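The formula can be illustrated per class in plain Python, treating each class in turn as the positive class. The helper name `recall_per_class_sketch` is hypothetical, and returning 0.0 for a class with no samples is a choice made for this sketch.

```python
def recall_per_class_sketch(preds, labels):
    """Per-class recall = TP / (TP + FN), using arg-max predictions."""
    num_classes = len(preds[0])
    tp = [0] * num_classes
    fn = [0] * num_classes
    for scores, label in zip(preds, labels):
        predicted = max(range(len(scores)), key=scores.__getitem__)
        if predicted == label:
            tp[label] += 1
        else:
            fn[label] += 1  # a sample of this class that was missed
    return [tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
            for c in range(num_classes)]


preds = [[0.2, 0.5], [0.3, 0.1], [0.9, 0.6]]
labels = [1, 0, 1]
rec = recall_per_class_sketch(preds, labels)
print(rec)
```

The single class-0 sample is found, while only one of the two class-1 samples is, giving 1.0 and 0.5 as in the documented output.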

Parameters: name (str) – Name of the metric.

Examples

>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindnlp.engine.metrics import Recall
>>> preds = Tensor(np.array([[0.2, 0.5], [0.3, 0.1], [0.9, 0.6]]), mindspore.float32)
>>> labels = Tensor(np.array([1, 0, 1]), mindspore.int32)
>>> metric = Recall()
>>> metric.update(preds, labels)
>>> rec = metric.eval()
>>> print(rec)
[1. 0.5]

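clear(): Clears the intermediate evaluation results.

eval()

Computes and returns the recall.

Returns

rec (numpy.ndarray) - The computed recall.

get_metric_name(): Returns the name of the metric.

update(*inputs)

Updates the internal state.

Parameters

inputs – The inputs `preds` and `labels`.
- preds (Union[Tensor, list, np.ndarray]): Predicted values. A list of floats in the range `[0, 1]`; in most cases (not strictly required) the shape is `(N, C)`, where `N` is the number of samples and `C` is the number of classes.
- labels (Union[Tensor, list, np.ndarray]): Ground-truth values. Must be in one-hot format with shape `(N, C)`, or convertible to one-hot format with shape `(N,)`.

Raises

ValueError – If the number of inputs is not 2.
ValueError – If `preds` and `labels` do not have the same number of classes.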

rouge

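RougeN and RougeL metric classes

class mindnlp.engine.metrics.rouge.RougeL(beta=1.2, name='RougeL')

Bases: Metric

Calculates the ROUGE-L score. ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is a set of metrics for evaluating automatic summarization and machine translation models. ROUGE-L is computed from the longest common subsequence (LCS). The formula is:

\[ \begin{aligned} R_{lcs} &= \frac{LCS(X, Y)}{m} \\ P_{lcs} &= \frac{LCS(X, Y)}{n} \\ F_{lcs} &= \frac{\left(1+\beta^{2}\right) R_{lcs} P_{lcs}}{R_{lcs}+\beta^{2} P_{lcs}} \end{aligned} \]

where `X` is the reference sentence and `Y` is the candidate sentence, `m` and `n` are the lengths of `X` and `Y` respectively, and `LCS(X, Y)` is the length of their longest common subsequence.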
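The sketch below works through the formula with a standard dynamic-programming LCS. It is an illustration with hypothetical helper names, not the library code. It assumes recall divides the LCS length by the reference length, precision divides it by the candidate length, and that with several references the best recall and the best precision are taken separately; these assumptions reproduce the value printed in the documented example.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_sketch(cand, refs, beta=1.2):
    """F-measure of LCS-based recall and precision (see assumptions above)."""
    if not cand or not refs or any(len(ref) == 0 for ref in refs):
        return 0.0
    precision = max(lcs_length(cand, ref) / len(cand) for ref in refs)
    recall = max(lcs_length(cand, ref) / len(ref) for ref in refs)
    if precision == 0 or recall == 0:
        return 0.0
    return ((1 + beta ** 2) * recall * precision
            / (recall + beta ** 2 * precision))


cand_list = ["The", "cat", "The", "cat", "on", "the", "mat"]
ref_list = [["The", "cat", "is", "on", "the", "mat"],
            ["There", "is", "a", "cat", "on", "the", "mat"]]
rougel_score = rouge_l_sketch(cand_list, ref_list)
print(rougel_score)
```

The LCS lengths are 5 and 4, so precision is 5/7 and recall is 5/6, and with beta = 1.2 the F-measure is about 0.78005.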

Parameters

beta (float) – A hyperparameter that determines the weight of recall. Default: 1.2.
name (str) – Name of the metric.

Examples

>>> from mindnlp.engine.metrics import RougeL
>>> cand_list = ["The","cat","The","cat","on","the","mat"]
>>> ref_list = [["The","cat","is","on","the","mat"],
                ["There","is","a","cat","on","the","mat"]]
>>> metric = RougeL()
>>> metric.update(cand_list, ref_list)
>>> rougel_score = metric.eval()
>>> print(rougel_score)
0.7800511508951408

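clear(): Clears the intermediate evaluation results.

eval()

Computes and returns the ROUGE-L score.

Returns

rougel_score (float) - The computed ROUGE-L score.

get_metric_name(): Returns the name of the metric.

update(*inputs)

Updates the internal state.

Parameters: inputs – The inputs `cand_list` and `ref_list`.
- cand_list (list): The tokenized candidate sentence (a list of tokens).
- ref_list (list): A list of tokenized reference sentences.
Raises: ValueError – If the number of inputs is not 2.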

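class mindnlp.engine.metrics.rouge.RougeN(n_size=1, name='RougeN')

Bases: Metric

Calculates the ROUGE-N score. ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is a set of metrics for evaluating automatic summarization and machine translation models. ROUGE-N refers to the n-gram overlap between the candidate sentence and the reference summaries.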
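A minimal sketch of n-gram overlap as a recall: the number of reference n-grams that also occur in the candidate, divided by the total number of reference n-grams. The helper names are hypothetical, and clipping each match by the candidate count is an assumption of this illustration.

```python
from collections import Counter


def ngram_counts(tokens, n_size):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n_size])
                   for i in range(len(tokens) - n_size + 1))


def rouge_n_sketch(cand, refs, n_size=1):
    """Matched reference n-grams divided by total reference n-grams."""
    cand_counts = ngram_counts(cand, n_size)
    overlap = 0
    total = 0
    for ref in refs:
        ref_counts = ngram_counts(ref, n_size)
        total += sum(ref_counts.values())
        # Clip each match by how often the n-gram occurs in the candidate.
        overlap += sum(min(count, cand_counts[gram])
                       for gram, count in ref_counts.items())
    if total == 0:
        raise RuntimeError("reference contains no n-grams")
    return overlap / total


cand_list = ["the", "cat", "was", "found", "under", "the", "bed"]
ref_list = [["the", "cat", "was", "under", "the", "bed"]]
rougen_score = rouge_n_sketch(cand_list, ref_list, 2)
print(rougen_score)
```

Four of the five reference bigrams appear in the candidate, so the score is 0.8, in line with the documented output.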

Parameters

n_size (int) – The n-gram size. Default: 1.
name (str) – Name of the metric.

Examples

>>> from mindnlp.engine.metrics import RougeN
>>> cand_list = ["the", "cat", "was", "found", "under", "the", "bed"]
>>> ref_list = [["the", "cat", "was", "under", "the", "bed"]]
>>> metric = RougeN(2)
>>> metric.update(cand_list, ref_list)
>>> rougen_score = metric.eval()
>>> print(rougen_score)
0.8

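clear(): Clears the intermediate evaluation results.

eval()

Computes and returns the ROUGE-N score.

Returns

rougen_score (float) - The computed ROUGE-N score.

Raises

RuntimeError – If the length of the reference sentence is 0.

get_metric_name(): Returns the name of the metric.

update(*inputs)

Updates the internal state.

Parameters: inputs – The inputs `cand_list` and `ref_list`.
- cand_list (list): The tokenized candidate sentence (a list of tokens).
- ref_list (list): A list of tokenized reference sentences.
Raises: ValueError – If the number of inputs is not 2.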

spearman

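SpearmanCorrelation metric class

class mindnlp.engine.metrics.spearman.SpearmanCorrelation(name='SpearmanCorrelation')

Bases: Metric

Calculates Spearman's rank correlation coefficient (SRCC). It is a nonparametric measure of rank correlation (the statistical dependence between the rankings of two variables). It assesses how well the relationship between two variables can be described by a monotonic function. If there are no repeated data values, a perfect Spearman correlation of +1 or -1 occurs when each variable is a perfect monotonic function of the other.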
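For data without repeated values, the coefficient can be computed from rank differences with the classic formula 1 - 6 * sum(d^2) / (n * (n^2 - 1)). The sketch below uses that formula with hypothetical helper names and hypothetical tie-free data; how ties are ranked is not specified in this documentation, so the sketch does not attempt to handle them.

```python
def ranks(values):
    """1-based rank of each value (assumes no repeated values)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, index in enumerate(order, 1):
        result[index] = rank
    return result


def spearman_sketch(xs, ys):
    """1 - 6 * sum(d^2) / (n * (n^2 - 1)) over rank differences d."""
    n = len(xs)
    if n != len(ys):
        raise RuntimeError("inputs must have the same length")
    if n < 2:
        raise ValueError("at least two samples are required")
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical tie-free data: both lists have the same ordering.
same_order = spearman_sketch([0.1, 1.0, 2.4, 0.9], [0.0, 1.1, 2.9, 1.0])
# Exactly reversed ordering.
reversed_order = spearman_sketch([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
print(same_order, reversed_order)
```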

Parameters: name (str) – Name of the metric.

Examples

>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindnlp.engine.metrics import SpearmanCorrelation
>>> preds = Tensor(np.array([[0.1], [1.0], [2.4], [0.9]]), mindspore.float32)
>>> labels = Tensor(np.array([[0.0], [1.0], [2.9], [1.0]]), mindspore.float32)
>>> metric = SpearmanCorrelation()
>>> metric.update(preds, labels)
>>> s_r_c_c = metric.eval()
>>> print(s_r_c_c)
1.0

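clear(): Clears the intermediate evaluation results.

eval()

Computes and returns the SRCC.

Returns

s_r_c_c (float) - The computed SRCC.

get_metric_name(): Returns the name of the metric.

update(*inputs)

Updates the internal state.

Parameters

inputs – The inputs `preds` and `labels`.
- preds (Union[Tensor, list, np.ndarray]): Predicted values. A list of floats with shape `(N, 1)`.
- labels (Union[Tensor, list, np.ndarray]): Ground-truth values. A list of floats with shape `(N, 1)`.

Raises

ValueError – If the number of inputs is not 2.
RuntimeError – If `preds` and `labels` have different lengths.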

Callbacks.