Attentions

Attention mechanism modules.

class mindnlp.modules.attentions.AdditiveAttention(hidden_dims, dropout=0.9)[source]

Base class: Cell

Additive attention was proposed in the paper "Neural Machine Translation by Jointly Learning to Align and Translate".

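\[\text{Additive attention: } \mathrm{Attention}(Q, K, V) = W_v^{\mathrm{T}} \tanh(W_q Q + W_k K)\]
Parameters
  • hidden_dims (int) – dimension of the hidden-state vectors

  • dropout (float) – keep probability, a value between 0 and 1; for example, 0.9 means a dropout rate of 10% and a keep rate of 90%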
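
The keep-probability convention described for dropout can be illustrated with a short NumPy sketch of inverted dropout. This is only an illustration under stated assumptions, not MindNLP's implementation; the helper name dropout_keep is hypothetical.

```python
import numpy as np

def dropout_keep(x, keep_prob, rng):
    """Hypothetical helper: keep each element with probability keep_prob."""
    keep = rng.random(x.shape) < keep_prob
    # Rescale the surviving elements so the expected value is unchanged.
    return np.where(keep, x / keep_prob, 0.0)

rng = np.random.default_rng(0)
x = np.ones((1000, 100))
y = dropout_keep(x, keep_prob=0.9, rng=rng)
kept_fraction = float((y != 0).mean())  # close to 0.9, i.e. about 10% dropped
```

With keep_prob=0.9, roughly 90% of the entries survive and are scaled by 1/0.9.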

Example

>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindnlp.modules.attentions import AdditiveAttention
>>> model = AdditiveAttention(hidden_dims=512, dropout=0.9)
>>> q = Tensor(np.ones((2, 32, 512)), mindspore.float32)
>>> k = Tensor(np.ones((2, 20, 512)), mindspore.float32)
>>> v = Tensor(np.ones((2, 20, 512)), mindspore.float32)
>>> mask_shape = (2, 32, 20)
>>> mask = Tensor(np.ones(mask_shape), mindspore.bool_)
>>> output, attn = model(q, k, v, mask)
>>> print(output.shape, attn.shape)
(2, 32, 512) (2, 32, 20)
construct(query, key, value, mask: Optional[Tensor] = None)[source]

Forward pass of the additive attention network.

Parameters
  • query (mindspore.Tensor) – query tensor, typically of shape [batch_size, query_size, hidden_size]

  • key (mindspore.Tensor) – key tensor, typically of shape [batch_size, key_size, hidden_size]

  • value (mindspore.Tensor) – value tensor, typically of shape [batch_size, seq_len, value_hidden_size]

  • mask (Optional[mindspore.Tensor[bool]]) – mask tensor, typically of shape [batch_size, query_size, key_size]

Returns

A tuple (output, attn):
  • output – attention output, of shape [batch_size, query_size, value_hidden_size]

  • attn – attention scores, of shape [batch_size, query_size, key_size]
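
The additive attention formula above can be sketched in NumPy as follows. This is a minimal sketch, not the MindNLP implementation: the weight names w_q, w_k, w_v, the small hidden size, and the omission of the mask and dropout are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=axis, keepdims=True)

def additive_attention(q, k, v, w_q, w_k, w_v):
    # Project queries and keys, then broadcast over every (query, key) pair.
    q_proj = (q @ w_q)[:, :, None, :]          # [B, Lq, 1, H]
    k_proj = (k @ w_k)[:, None, :, :]          # [B, 1, Lk, H]
    scores = np.tanh(q_proj + k_proj) @ w_v    # [B, Lq, Lk]
    attn = softmax(scores, axis=-1)            # normalize over the keys
    return attn @ v, attn

rng = np.random.default_rng(0)
B, Lq, Lk, H = 2, 32, 20, 16                   # illustrative sizes (assumed)
q = rng.standard_normal((B, Lq, H))
k = rng.standard_normal((B, Lk, H))
v = rng.standard_normal((B, Lk, H))
w_q = rng.standard_normal((H, H))
w_k = rng.standard_normal((H, H))
w_v = rng.standard_normal(H)
output, attn = additive_attention(q, k, v, w_q, w_k, w_v)
print(output.shape, attn.shape)  # (2, 32, 16) (2, 32, 20)
```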

class mindnlp.modules.attentions.BinaryAttention[source]

Base class: Cell

Given two sequences of vectors, x_i and y_j, the BinaryAttention module computes the attention results according to the following formulas:

\[\begin{split} e_{ij} &= x_i^{\mathrm{T}} y_j \\ \hat{x}_i &= \sum_{j=1}^{\ell_y} \frac{\exp(e_{ij})}{\sum_{k=1}^{\ell_y} \exp(e_{ik})} \, y_j \\ \hat{y}_j &= \sum_{i=1}^{\ell_x} \frac{\exp(e_{ij})}{\sum_{k=1}^{\ell_x} \exp(e_{kj})} \, x_i \end{split}\]

Example

>>> import mindspore
>>> import mindspore.numpy as np
>>> from mindspore import ops
>>> from mindspore import Tensor
>>> from mindnlp.modules.attentions import BinaryAttention
>>> model = BinaryAttention()
>>> standard_normal = ops.StandardNormal(seed=0)
>>> x = standard_normal((2, 30, 512))
>>> y = standard_normal((2, 20, 512))
>>> # masks have shape [batch_size, seq_len]
>>> x_mask = Tensor(np.zeros(x.shape[:-1]), mindspore.float32)
>>> y_mask = Tensor(np.zeros(y.shape[:-1]), mindspore.float32)
>>> output_x, output_y = model(x, x_mask, y, y_mask)
>>> print(output_x.shape, output_y.shape)
(2, 30, 512) (2, 20, 512)
construct(x_batch, x_mask, y_batch, y_mask)[source]

Computes the attention results.

Parameters
  • x_batch (mindspore.Tensor) – reference shape [batch_size, x_seq_len, hidden_size]

  • x_mask (mindspore.Tensor) – reference shape [batch_size, x_seq_len]

  • y_batch (mindspore.Tensor) – reference shape [batch_size, y_seq_len, hidden_size]

  • y_mask (mindspore.Tensor) – reference shape [batch_size, y_seq_len]

Returns

A tuple (attended_x, attended_y):
  • attended_x – the attention result of x attending over y

  • attended_y – the attention result of y attending over x
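
The bidirectional formulas above can be sketched in NumPy as follows. This is a minimal sketch rather than the module's actual code: the function name binary_attention is hypothetical, and the masks are omitted for brevity.

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=axis, keepdims=True)

def binary_attention(x, y):
    # e[b, i, j] is the dot product between x_i and y_j.
    e = x @ y.transpose(0, 2, 1)                         # [B, Lx, Ly]
    # x_hat: for each x_i, normalize over j and average the y_j.
    x_hat = softmax(e, axis=2) @ y                       # [B, Lx, H]
    # y_hat: for each y_j, normalize over i and average the x_i.
    y_hat = softmax(e, axis=1).transpose(0, 2, 1) @ x    # [B, Ly, H]
    return x_hat, y_hat

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 30, 8))
y = rng.standard_normal((2, 20, 8))
x_hat, y_hat = binary_attention(x, y)
print(x_hat.shape, y_hat.shape)  # (2, 30, 8) (2, 20, 8)
```

Each output keeps the sequence length of its own input while being built from the vectors of the other sequence.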

class mindnlp.modules.attentions.CosineAttention(dropout=0.9)[source]

Base class: Cell

Cosine attention was proposed in the paper "Neural Turing Machines".

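\[\begin{split} \mathrm{Sim}(Q, K) &= \frac{Q K^{\mathrm{T}}}{|Q| \, |K|} \\ \mathrm{Attention}(Q, K, V) &= \mathrm{softmax}(\mathrm{Sim}(Q, K)) \, V \end{split}\]
Parameters

dropout (float) – keep probability, a value between 0 and 1; for example, 0.9 means a dropout rate of 10% and a keep rate of 90%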

Example

>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindnlp.modules.attentions import CosineAttention
>>> model = CosineAttention(dropout=0.9)
>>> q = Tensor(np.ones((2, 32, 512)), mindspore.float32)
>>> k = Tensor(np.ones((2, 20, 512)), mindspore.float32)
>>> v = Tensor(np.ones((2, 20, 512)), mindspore.float32)
>>> mask_shape = (2, 32, 20)
>>> mask = Tensor(np.ones(mask_shape), mindspore.bool_)
>>> output, attn = model(q, k, v, mask)
>>> print(output.shape, attn.shape)
(2, 32, 512) (2, 32, 20)
construct(query, key, value, mask: Optional[Tensor] = None)[source]

Forward pass of the cosine attention network.

Parameters
  • query (mindspore.Tensor) – query tensor, typically of shape [batch_size, query_size, hidden_size]

  • key (mindspore.Tensor) – key tensor, typically of shape [batch_size, key_size, hidden_size]

  • value (mindspore.Tensor) – value tensor, typically of shape [batch_size, seq_len, value_hidden_size]

  • mask (Optional[mindspore.Tensor[bool]]) – mask tensor, typically of shape [batch_size, query_size, key_size]

Returns

A tuple (output, attn):
  • output – attention output, of shape [batch_size, query_size, value_hidden_size]

  • attn – attention scores, of shape [batch_size, query_size, key_size]
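
The cosine similarity formula above can be sketched in NumPy as follows. This is a minimal sketch, not the module's code: the eps term that guards against division by zero is an assumption, and the mask and dropout are left out.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=axis, keepdims=True)

def cosine_attention(q, k, v, eps=1e-8):
    # Normalize each query and key vector to unit length.
    q_unit = q / (np.linalg.norm(q, axis=-1, keepdims=True) + eps)
    k_unit = k / (np.linalg.norm(k, axis=-1, keepdims=True) + eps)
    sim = q_unit @ k_unit.transpose(0, 2, 1)   # cosine similarity, [B, Lq, Lk]
    attn = softmax(sim, axis=-1)
    return attn @ v, attn, sim

rng = np.random.default_rng(0)
q = rng.standard_normal((2, 32, 16))
k = rng.standard_normal((2, 20, 16))
v = rng.standard_normal((2, 20, 16))
output, attn, sim = cosine_attention(q, k, v)
print(output.shape, attn.shape)  # (2, 32, 16) (2, 32, 20)
```

Because the similarities lie in [-1, 1], the resulting softmax is comparatively flat unless the scores are rescaled.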

class mindnlp.modules.attentions.LinearAttention(query_dim, key_dim, hidden_dim, dropout=0.9)[source]

Base class: Cell

Linear attention computes its scores by concatenating the query and key vectors.

Parameters
  • query_dim (int) – sequence length of the query, usually query.shape[-2]

  • key_dim (int) – sequence length of the key, usually key.shape[-2]

  • hidden_dim (int) – dimension of the hidden vectors

  • dropout (float) – keep probability, a value between 0 and 1, e.g. 0.9

Example

>>> import mindspore
>>> import mindspore.numpy as np
>>> from mindspore import ops
>>> from mindspore import Tensor
>>> from mindnlp.modules.attentions import LinearAttention
>>> standard_normal = ops.StandardNormal(seed=0)
>>> query = standard_normal((2, 32, 512))
>>> key = standard_normal((2, 20, 512))
>>> value = standard_normal((2, 20, 500))
>>> net = LinearAttention(query_dim=32, key_dim=20, hidden_dim=512)
>>> mask_shape = (2, 32, 20)
>>> mask = Tensor(np.ones(mask_shape), mindspore.bool_)
>>> output, attn = net(query, key, value, mask)
>>> print(output.shape, attn.shape)
(2, 32, 500) (2, 32, 20)
construct(query, key, value, mask: Optional[Tensor] = None)[source]

Forward pass of the linear attention network.

Parameters
  • query (mindspore.Tensor) – query tensor, typically of shape [batch_size, query_size, hidden_size]

  • key (mindspore.Tensor) – key tensor, typically of shape [batch_size, key_size, hidden_size]

  • value (mindspore.Tensor) – value tensor, typically of shape [batch_size, seq_len, value_hidden_size]

  • mask (Optional[mindspore.Tensor[bool]]) – mask tensor, typically of shape [batch_size, query_size, key_size]

Returns

A tuple (output, attn):
  • output – linear attention output, of shape [batch_size, query_size, value_hidden_size]

  • attn – linear attention scores, of shape [batch_size, query_size, key_size]
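
One common reading of "scoring by concatenation" is a single linear layer applied to the concatenated pair [q_i; k_j]. The NumPy sketch below shows that reading and the equivalent split form that avoids building the concatenation. It is an assumption made for illustration; the weight w is hypothetical, and the actual LinearAttention implementation may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
B, Lq, Lk, H = 2, 4, 3, 5                 # illustrative sizes (assumed)
q = rng.standard_normal((B, Lq, H))
k = rng.standard_normal((B, Lk, H))
w = rng.standard_normal(2 * H)            # hypothetical weight for [q_i; k_j]

# Explicit concatenation for every (query, key) pair.
q_tiled = np.broadcast_to(q[:, :, None, :], (B, Lq, Lk, H))
k_tiled = np.broadcast_to(k[:, None, :, :], (B, Lq, Lk, H))
scores_concat = np.concatenate([q_tiled, k_tiled], axis=-1) @ w   # [B, Lq, Lk]

# Equivalent form: w . [q; k] = w[:H] . q + w[H:] . k
scores_split = (q @ w[:H])[:, :, None] + (k @ w[H:])[:, None, :]  # [B, Lq, Lk]
```

The split form gives the same scores while using memory proportional to Lq + Lk rather than Lq * Lk for the projections.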

class mindnlp.modules.attentions.LocationAwareAttention(hidden_dim, smoothing=False)[source]

Base class: Cell

Location-aware attention was proposed in the paper "Attention-Based Models for Speech Recognition".

Parameters
  • hidden_dim (int) – dimension of the hidden state

  • smoothing (bool) – whether to use the smoothing described in the paper "Attention-Based Models for Speech Recognition"

Example

>>> import mindspore
>>> import mindspore.numpy as np
>>> from mindspore import ops, Tensor
>>> from mindnlp.modules.attentions import LocationAwareAttention
>>> batch_size, seq_len, hidden_dim = 2, 40, 20
>>> standard_normal = ops.StandardNormal(seed=0)
>>> query = standard_normal((batch_size, 1, hidden_dim))
>>> value = standard_normal((batch_size, seq_len, hidden_dim))
>>> last_attn = standard_normal((batch_size, seq_len))
>>> net = LocationAwareAttention(hidden_dim=hidden_dim, smoothing=False)
>>> mask_shape = (batch_size, seq_len)
>>> mask = Tensor(np.ones(mask_shape), mindspore.bool_)
>>> net.set_mask(mask)
>>> cont, attn = net(query, value, last_attn)
>>> print(cont.shape, attn.shape)
(2, 1, 20) (2, 40)
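construct(query, value, last_attn=None)[source]

Forward pass of the location-aware attention network.

Parameters
  • query (mindspore.Tensor) – query tensor, i.e. the decoder hidden states, with reference shape [batch_size, 1, decoder_dim]

  • value (mindspore.Tensor) – value tensor, i.e. the encoder hidden states (also used as the keys), with reference shape [batch_size, seq_len, encoder_dim]

  • last_attn (mindspore.Tensor) – attention scores from the previous step, with reference shape [batch_size, seq_len]

Returns

A tuple (context, attn):
  • context – context vector, with reference shape [batch_size, 1, decoder_dim]

  • attn – attention scores for the current step, with reference shape [batch_size, seq_len]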

set_mask(mask)[source]

Sets the mask tensor.

The mask must be of type mindspore.Tensor[bool].
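
The idea behind location-aware attention, including the optional smoothing, can be sketched in NumPy as follows. This is a simplified sketch based on the cited paper, not the MindNLP code: the parameter dictionary p, the filter count, and the kernel size are assumptions, and the mask and bias terms are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=axis, keepdims=True)

def location_aware_attention(query, value, last_attn, p, smoothing=False):
    # Location features: convolve the previous attention weights along time.
    loc = np.array([[np.convolve(a, f, mode="same") for f in p["filters"]]
                    for a in last_attn])                   # [B, n_filters, T]
    loc = loc.transpose(0, 2, 1) @ p["w_loc"]              # [B, T, H]
    energy = np.tanh(query @ p["w_q"] + value @ p["w_v"] + loc) @ p["w_score"]  # [B, T]
    if smoothing:
        # Smoothing: normalize sigmoids instead of exponentials.
        sig = 1.0 / (1.0 + np.exp(-energy))
        attn = sig / sig.sum(axis=-1, keepdims=True)
    else:
        attn = softmax(energy, axis=-1)
    context = attn[:, None, :] @ value                     # [B, 1, H]
    return context, attn

rng = np.random.default_rng(0)
B, T, H, n_filters, kernel = 2, 40, 20, 4, 3               # assumed sizes
p = {
    "filters": rng.standard_normal((n_filters, kernel)),
    "w_loc": rng.standard_normal((n_filters, H)),
    "w_q": rng.standard_normal((H, H)),
    "w_v": rng.standard_normal((H, H)),
    "w_score": rng.standard_normal(H),
}
query = rng.standard_normal((B, 1, H))
value = rng.standard_normal((B, T, H))
last_attn = softmax(rng.standard_normal((B, T)))
context, attn = location_aware_attention(query, value, last_attn, p)
print(context.shape, attn.shape)  # (2, 1, 20) (2, 40)
```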

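class mindnlp.modules.attentions.MutiHeadAttention(heads=8, d_model=512, dropout=0.9, bias=False, attention_mode='dot')[source]

Base class: Cell

Multi-head attention was proposed in the paper "Attention Is All You Need". When the number of heads is 1, multi-head attention reduces to ordinary self-attention.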

Parameters
  • heads (int) – number of attention heads. Default: 8.

  • d_model (int) – hidden dimension of the query, key, and value vectors. Default: 512.

  • dropout (float) – keep probability, a value between 0 and 1, e.g. 0.9

  • bias (bool) – whether to use a bias vector. Default: False.

  • attention_mode (str) – attention mode. Default: 'dot'.

Example

>>> import mindspore
>>> import mindspore.numpy as np
>>> from mindspore import ops
>>> from mindspore import Tensor
>>> from mindnlp.modules.attentions import MutiHeadAttention
>>> standard_normal = ops.StandardNormal(seed=0)
>>> # query is [batch_size, seq_len_q, hidden_size]
>>> q = standard_normal((2, 32, 512))
>>> # key is [batch_size, seq_len_k, hidden_size]
>>> k = standard_normal((2, 20, 512))
>>> # value is [batch_size, seq_len_k, hidden_size]
>>> v = standard_normal((2, 20, 512))
>>> # per head: query (2, 32, 512) -> (2, 8, 32, 64), key (2, 20, 512) -> (2, 8, 20, 64)
>>> # query * key.transpose(-1, -2): (2, 8, 32, 64) * (2, 8, 64, 20) -> (2, 8, 32, 20)
>>> # the mask matches the last two score dimensions:
>>> # [batch_size, seq_len_q, seq_len_k]
>>> mask_shape = (2, 32, 20)
>>> mask = Tensor(np.ones(mask_shape), mindspore.bool_)
>>> # use additive attention
>>> net = MutiHeadAttention(heads=8, attention_mode="add")
>>> x, attn = net(q, k, v, mask)
>>> print(x.shape, attn.shape)
(2, 32, 512) (2, 8, 32, 20)
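construct(query, key, value, mask: Optional[Tensor] = None)[source]

Computes the multi-head attention output and its attention scores.

Parameters
  • query (mindspore.Tensor) – query tensor, with reference shape [batch_size, seq_len, d_model]

  • key (mindspore.Tensor) – key tensor, with reference shape [batch_size, seq_len, d_model]

  • value (mindspore.Tensor) – value tensor, with reference shape [batch_size, seq_len, d_model]

  • mask (Optional[mindspore.Tensor[bool]]) – mask tensor, with reference shape [batch_size, seq_len, seq_len]

Returns

A tuple (output, attn):
  • output – the multi-head attention output

  • attn – the multi-head attention scores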
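
The head-splitting shapes mentioned in the example comments can be traced with a short NumPy sketch. This is an illustration only: it uses dot-product scoring, omits the learned input and output projections, the mask, and dropout, and the helper names split_heads and multi_head_dot_attention are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=axis, keepdims=True)

def split_heads(x, heads):
    # [B, L, d_model] -> [B, heads, L, d_model // heads]
    b, seq_len, d_model = x.shape
    return x.reshape(b, seq_len, heads, d_model // heads).transpose(0, 2, 1, 3)

def multi_head_dot_attention(q, k, v, heads):
    qh, kh, vh = (split_heads(t, heads) for t in (q, k, v))
    scores = qh @ kh.transpose(0, 1, 3, 2) / np.sqrt(qh.shape[-1])  # [B, heads, Lq, Lk]
    attn = softmax(scores, axis=-1)
    # Merge the heads back: [B, heads, Lq, d_head] -> [B, Lq, d_model]
    merged = (attn @ vh).transpose(0, 2, 1, 3).reshape(q.shape[0], q.shape[1], -1)
    return merged, attn

rng = np.random.default_rng(0)
q = rng.standard_normal((2, 32, 512))
k = rng.standard_normal((2, 20, 512))
v = rng.standard_normal((2, 20, 512))
x, attn = multi_head_dot_attention(q, k, v, heads=8)
print(x.shape, attn.shape)  # (2, 32, 512) (2, 8, 32, 20)
```

With heads=1 the same function reduces to ordinary single-head scaled dot-product attention.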

class mindnlp.modules.attentions.ScaledDotAttention(dropout=0.9)[source]

Base class: Cell

Scaled dot-product attention was proposed in the paper "Attention Is All You Need".

\[Attention(Q,K,V)=softmax(\frac{QK^T}{\sqrt{d_k}})V\]
Parameters

dropout (float) – keep probability, a value between 0 and 1; for example, 0.9 means a dropout rate of 10% and a keep rate of 90%

Example

>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindnlp.modules.attentions import ScaledDotAttention
>>> model = ScaledDotAttention(dropout=0.9)
>>> q = Tensor(np.ones((2, 32, 512)), mindspore.float32)
>>> k = Tensor(np.ones((2, 20, 512)), mindspore.float32)
>>> v = Tensor(np.ones((2, 20, 400)), mindspore.float32)
>>> output, att = model(q, k, v)
>>> print(output.shape)
(2, 32, 400)
>>> print(att.shape)
(2, 32, 20)
construct(query, key, value, mask: Optional[Tensor] = None)[source]

Forward pass of the scaled dot-product attention network.

Parameters
  • query (mindspore.Tensor) – query tensor, typically of shape [batch_size, query_size, hidden_size]

  • key (mindspore.Tensor) – key tensor, typically of shape [batch_size, key_size, hidden_size]

  • value (mindspore.Tensor) – value tensor, typically of shape [batch_size, seq_len, value_hidden_size]

  • mask (Optional[mindspore.Tensor[bool]]) – mask tensor, typically of shape [batch_size, query_size, key_size]

Returns

A tuple (output, attn):
  • output – attention output, of shape [batch_size, query_size, value_hidden_size]

  • attn – attention scores, of shape [batch_size, query_size, key_size]
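
The scaled dot-product formula and the documented shapes can be checked with a short NumPy sketch. This is a minimal sketch, not the module's code: dropout is omitted, and treating True in the mask as "may attend" is an assumption, since the mask convention is not stated here.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_attention(q, k, v, mask=None):
    d_k = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_k)   # [B, Lq, Lk]
    if mask is not None:
        # Assumption: True means the position may be attended to.
        scores = np.where(mask, scores, -1e9)
    attn = softmax(scores, axis=-1)
    return attn @ v, attn

q = np.ones((2, 32, 512))
k = np.ones((2, 20, 512))
v = np.ones((2, 20, 400))
output, attn = scaled_dot_attention(q, k, v)
print(output.shape, attn.shape)  # (2, 32, 400) (2, 32, 20)
```

The last dimension of the output follows the value tensor (400 here), while the attention scores have one entry per query and key pair.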

class mindnlp.modules.attentions.SelfAttention(d_model=512, dropout_rate=0.1, bias=False, attention_mode='dot')[source]

Base class: Cell

Self-attention was proposed in the paper "Attention Is All You Need".

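Parameters
  • d_model (int) – hidden dimension of the query, key, and value vectors. Default: 512.

  • dropout_rate (float) – dropout rate, a value between 0 and 1; the default of 0.1 corresponds to a keep rate of 0.9

  • bias (bool) – whether to use a bias vector. Default: False.

  • attention_mode (str) – attention mode. Default: 'dot'.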

Example

>>> import mindspore
>>> import mindspore.numpy as np
>>> from mindspore import ops
>>> from mindspore import Tensor
>>> from mindnlp.modules.attentions import SelfAttention
>>> standard_normal = ops.StandardNormal(seed=0)
>>> query = standard_normal((2, 32, 512))
>>> key = standard_normal((2, 20, 512))
>>> value = standard_normal((2, 20, 512))
>>> mask_shape = (2, 32, 20)
>>> mask = Tensor(np.ones(mask_shape), mindspore.bool_)
>>> net = SelfAttention()
>>> output, attn = net(query, key, value, mask)
>>> print(output.shape, attn.shape)
(2, 32, 512) (2, 32, 20)
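construct(query, key, value, mask: Optional[Tensor] = None)[source]

Computes the self-attention output and its attention scores.

Parameters
  • query (mindspore.Tensor) – query tensor, with reference shape [batch_size, seq_len, d_model]

  • key (mindspore.Tensor) – key tensor, with reference shape [batch_size, seq_len, d_model]

  • value (mindspore.Tensor) – value tensor, with reference shape [batch_size, seq_len, d_model]

  • mask (Optional[mindspore.Tensor[bool]]) – mask tensor, with reference shape [batch_size, seq_len, seq_len]

Returns

A tuple (output, attn):
  • output – the self-attention output

  • attn – the self-attention scores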