Attentions
Attention mechanism modules
- class mindnlp.modules.attentions.AdditiveAttention(hidden_dims, dropout=0.9)
Bases: Cell
Additive attention, proposed in the paper "Neural Machine Translation by Jointly Learning to Align and Translate".
\[\mathrm{Attention}(Q, K, V) = W_v^{\mathrm{T}} \tanh(W_q Q + W_k K)\]
- Parameters
hidden_dims (int) – Dimension of the hidden state vectors.
dropout (float) – Keep probability, a value between 0 and 1. For example, 0.9 means a dropout rate of 10% and a keep probability of 90%.
Examples
>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindnlp.modules.attentions import AdditiveAttention
>>> model = AdditiveAttention(hidden_dims=512, dropout=0.9)
>>> q = Tensor(np.ones((2, 32, 512)), mindspore.float32)
>>> k = Tensor(np.ones((2, 20, 512)), mindspore.float32)
>>> v = Tensor(np.ones((2, 20, 512)), mindspore.float32)
>>> mask_shape = (2, 32, 20)
>>> mask = Tensor(np.ones(mask_shape), mindspore.bool_)
>>> output, attn = model(q, k, v, mask)
>>> print(output.shape, attn.shape)
(2, 32, 512) (2, 32, 20)
- construct(query, key, value, mask: Optional[Tensor] = None)
Forward pass of the additive attention network.
- Parameters
query (mindspore.Tensor) – Query tensor, typically of shape [batch_size, query_size, hidden_size].
key (mindspore.Tensor) – Key tensor, typically of shape [batch_size, key_size, hidden_size].
value (mindspore.Tensor) – Value tensor, typically of shape [batch_size, seq_len, value_hidden_size].
mask (Optional[mindspore.Tensor[bool]]) – Mask tensor, typically of shape [batch_size, query_size, key_size].
- Returns
The attention output output and the attention scores attn.
output – Attention output, with reference shape [batch_size, query_size, value_hidden_size].
attn – Attention scores, with reference shape [batch_size, query_size, key_size].
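To make the additive scoring formula concrete, here is a minimal NumPy sketch. It is not the mindnlp implementation: the weight names `W_q`, `W_k`, `w_v`, their random initialisation, and the final softmax plus weighted sum over the values (which the formula leaves implicit) are assumptions made for illustration. Dropout and masking are left out.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def additive_attention(q, k, v, W_q, W_k, w_v):
    # Project queries and keys into a shared hidden space.
    q_proj = q @ W_q                      # [batch, query_size, hidden]
    k_proj = k @ W_k                      # [batch, key_size, hidden]
    # Broadcast so that every query is paired with every key.
    features = np.tanh(q_proj[:, :, None, :] + k_proj[:, None, :, :])
    scores = features @ w_v               # [batch, query_size, key_size]
    attn = softmax(scores, axis=-1)       # normalise over the keys
    output = attn @ v                     # [batch, query_size, value_hidden]
    return output, attn

rng = np.random.default_rng(0)
hidden = 8  # hypothetical small size, chosen only to keep the example fast
q = rng.standard_normal((2, 5, hidden))
k = rng.standard_normal((2, 3, hidden))
v = rng.standard_normal((2, 3, 4))
W_q = rng.standard_normal((hidden, hidden))
W_k = rng.standard_normal((hidden, hidden))
w_v = rng.standard_normal(hidden)

output, attn = additive_attention(q, k, v, W_q, W_k, w_v)
print(output.shape, attn.shape)  # (2, 5, 4) (2, 5, 3)
```

The broadcasted sum builds a [batch, query_size, key_size, hidden] tensor, so memory grows with the product of the two sequence lengths.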
- class mindnlp.modules.attentions.BinaryAttention
Bases: Cell
Given the vectors of two sequences, such as x_i and y_j, the BinaryAttention module computes the attention results with the following formulas:
\[\begin{split}\begin{array}{l} e_{ij} = {x}^{\mathrm{T}}_{i}{y}_{j} \\ {\hat{x}}_{i} = \sum_{j=1}^{\mathcal{l}_{y}}{\frac{\mathrm{exp}(e_{ij})}{\sum_{k=1}^{\mathcal{l}_{y}}{\mathrm{exp}(e_{ik})}}}{y}_{j} \\ {\hat{y}}_{j} = \sum_{i=1}^{\mathcal{l}_{x}}{\frac{\mathrm{exp}(e_{ij})}{\sum_{k=1}^{\mathcal{l}_{x}}{\mathrm{exp}(e_{kj})}}}{x}_{i} \end{array}\end{split}\]
Examples
>>> import mindspore
>>> import mindspore.numpy as np
>>> from mindspore import ops
>>> from mindspore import Tensor
>>> from mindnlp.modules.attentions import BinaryAttention
>>> model = BinaryAttention()
>>> standard_normal = ops.StandardNormal(seed=0)
>>> x = standard_normal((2, 30, 512))
>>> y = standard_normal((2, 20, 512))
>>> # masks have shape [batch_size, seq_len], i.e. the input shape without the hidden dimension
>>> x_mask = Tensor(np.zeros(x.shape[:-1]), mindspore.float32)
>>> y_mask = Tensor(np.zeros(y.shape[:-1]), mindspore.float32)
>>> output_x, output_y = model(x, x_mask, y, y_mask)
>>> print(output_x.shape, output_y.shape)
(2, 30, 512) (2, 20, 512)
- construct(x_batch, x_mask, y_batch, y_mask)
Computes the attention results.
- Parameters
x_batch (mindspore.Tensor) – Reference shape [batch_size, x_seq_len, hidden_size].
x_mask (mindspore.Tensor) – Reference shape [batch_size, x_seq_len].
y_batch (mindspore.Tensor) – Reference shape [batch_size, y_seq_len, hidden_size].
y_mask (mindspore.Tensor) – Reference shape [batch_size, y_seq_len].
- Returns
attended_x – Attention result of the x vectors attending over the y vectors.
attended_y – Attention result of the y vectors attending over the x vectors.
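The two attended outputs come from the same score matrix e, normalised along different axes. The following NumPy sketch illustrates that idea only. It is not the mindnlp implementation, the function name `bi_attention` is hypothetical, and masking is omitted because the mask convention is not specified here.

```python
import numpy as np

def softmax(x, axis):
    # Subtract the maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def bi_attention(x, y):
    # e[b, i, j] is the dot product of x_i and y_j.
    e = x @ y.transpose(0, 2, 1)                            # [batch, l_x, l_y]
    # For each x_i, normalise over j and average the y vectors.
    attended_x = softmax(e, axis=2) @ y                     # [batch, l_x, hidden]
    # For each y_j, normalise over i and average the x vectors.
    attended_y = softmax(e, axis=1).transpose(0, 2, 1) @ x  # [batch, l_y, hidden]
    return attended_x, attended_y

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 6, 4))
y = rng.standard_normal((2, 5, 4))
attended_x, attended_y = bi_attention(x, y)
print(attended_x.shape, attended_y.shape)  # (2, 6, 4) (2, 5, 4)
```

Each output keeps the sequence length of its own input, matching the shapes in the example above where x and y have different lengths.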
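- class mindnlp.modules.attentions.CosineAttention(dropout=0.9)
Bases: Cell
Cosine attention, proposed in the paper "Neural Turing Machines".
\[\begin{split}\begin{array}{l} \mathrm{Sim}(Q, K) = \frac{Q K^{\mathrm{T}}}{|Q| \, |K|} \\ \mathrm{Attention}(Q, K, V) = \mathrm{softmax}(\mathrm{Sim}(Q, K)) \, V \end{array}\end{split}\]
- Parameters
dropout (float) – Keep probability, a value between 0 and 1. For example, 0.9 means a dropout rate of 10% and a keep probability of 90%.
Examples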
>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindnlp.modules.attentions import CosineAttention
>>> model = CosineAttention(dropout=0.9)
>>> q = Tensor(np.ones((2, 32, 512)), mindspore.float32)
>>> k = Tensor(np.ones((2, 20, 512)), mindspore.float32)
>>> v = Tensor(np.ones((2, 20, 512)), mindspore.float32)
>>> mask_shape = (2, 32, 20)
>>> mask = Tensor(np.ones(mask_shape), mindspore.bool_)
>>> output, attn = model(q, k, v, mask)
>>> print(output.shape, attn.shape)
(2, 32, 512) (2, 32, 20)
- construct(query, key, value, mask: Optional[Tensor] = None)
Forward pass of the cosine attention network.
- Parameters
query (mindspore.Tensor) – Query tensor, typically of shape [batch_size, query_size, hidden_size].
key (mindspore.Tensor) – Key tensor, typically of shape [batch_size, key_size, hidden_size].
value (mindspore.Tensor) – Value tensor, typically of shape [batch_size, seq_len, value_hidden_size].
mask (Optional[mindspore.Tensor[bool]]) – Mask tensor, typically of shape [batch_size, query_size, key_size].
- Returns
The attention output output and the attention scores attn.
output – Attention output, with reference shape [batch_size, query_size, value_hidden_size].
attn – Attention scores, with reference shape [batch_size, query_size, key_size].
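As a rough illustration of cosine attention, the NumPy sketch below divides the query-key dot products by the product of the vector norms and then applies a softmax over the keys. This reading of the formula, the small `eps` guard against division by zero, and the function name are assumptions. It is not the mindnlp implementation, and dropout and masking are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cosine_attention(q, k, v, eps=1e-8):
    # L2 norm of every query and key vector.
    q_norm = np.linalg.norm(q, axis=-1, keepdims=True)    # [batch, query_size, 1]
    k_norm = np.linalg.norm(k, axis=-1, keepdims=True)    # [batch, key_size, 1]
    # Cosine similarity between every query and every key.
    dots = q @ k.transpose(0, 2, 1)                       # [batch, query_size, key_size]
    sim = dots / (q_norm * k_norm.transpose(0, 2, 1) + eps)
    attn = softmax(sim, axis=-1)
    return attn @ v, attn

rng = np.random.default_rng(0)
q = rng.standard_normal((2, 4, 8))
k = rng.standard_normal((2, 3, 8))
v = rng.standard_normal((2, 3, 5))
output, attn = cosine_attention(q, k, v)
print(output.shape, attn.shape)  # (2, 4, 5) (2, 4, 3)
```

Because the similarities are bounded to [-1, 1], the resulting softmax is fairly flat unless the scores are rescaled, which is worth keeping in mind when comparing with dot-product attention.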
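- class mindnlp.modules.attentions.LinearAttention(query_dim, key_dim, hidden_dim, dropout=0.9)
Bases: Cell
Linear attention computes the attention scores from the concatenation of the query and key vectors.
- Parameters
query_dim (int) – Sequence length of the query, typically query.shape[-2].
key_dim (int) – Sequence length of the key, typically key.shape[-2].
hidden_dim (int) – Dimension of the hidden vector.
dropout (float) – Keep probability, a value between 0 and 1, for example 0.9.
Examples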
>>> import mindspore
>>> import mindspore.numpy as np
>>> from mindspore import ops
>>> from mindspore import Tensor
>>> from mindnlp.modules.attentions import LinearAttention
>>> standard_normal = ops.StandardNormal(seed=0)
>>> query = standard_normal((2, 32, 512))
>>> key = standard_normal((2, 20, 512))
>>> value = standard_normal((2, 20, 500))
>>> net = LinearAttention(query_dim=32, key_dim=20, hidden_dim=512)
>>> mask_shape = (2, 32, 20)
>>> mask = Tensor(np.ones(mask_shape), mindspore.bool_)
>>> output, attn = net(query, key, value, mask)
>>> print(output.shape, attn.shape)
(2, 32, 500) (2, 32, 20)
- construct(query, key, value, mask: Optional[Tensor] = None)
Forward pass of the linear attention network.
- Parameters
query (mindspore.Tensor) – Query tensor, typically of shape [batch_size, query_size, hidden_size].
key (mindspore.Tensor) – Key tensor, typically of shape [batch_size, key_size, hidden_size].
value (mindspore.Tensor) – Value tensor, typically of shape [batch_size, seq_len, value_hidden_size].
mask (Optional[mindspore.Tensor[bool]]) – Mask tensor, typically of shape [batch_size, query_size, key_size].
- Returns
The linear attention output output and the attention scores attn.
output – Linear attention output, with reference shape [batch_size, query_size, value_hidden_size].
attn – Linear attention scores, with reference shape [batch_size, query_size, key_size].
- class mindnlp.modules.attentions.LocationAwareAttention(hidden_dim, smoothing=False)
Bases: Cell
Location-aware attention, proposed in the paper "Attention-Based Models for Speech Recognition".
- Parameters
hidden_dim (int) – Dimension of the hidden state.
smoothing (bool) – Whether to apply the smoothing described in the paper "Attention-Based Models for Speech Recognition".
Examples
>>> import mindspore
>>> import mindspore.numpy as np
>>> from mindspore import ops, Tensor
>>> from mindnlp.modules.attentions import LocationAwareAttention
>>> batch_size, seq_len, hidden_dim = 2, 40, 20
>>> standard_normal = ops.StandardNormal(seed=0)
>>> query = standard_normal((batch_size, 1, hidden_dim))
>>> value = standard_normal((batch_size, seq_len, hidden_dim))
>>> last_attn = standard_normal((batch_size, seq_len))
>>> net = LocationAwareAttention(hidden_dim=hidden_dim, smoothing=False)
>>> mask_shape = (batch_size, seq_len)
>>> mask = Tensor(np.ones(mask_shape), mindspore.bool_)
>>> net.set_mask(mask)
>>> cont, attn = net(query, value, last_attn)
>>> print(cont.shape, attn.shape)
(2, 1, 20) (2, 40)
- construct(query, value, last_attn=None)
Forward pass of the location-aware attention network.
- Parameters
query (mindspore.Tensor) – Query tensor, i.e. the decoder hidden states, with reference shape [batch_size, 1, decoder_dim].
value (mindspore.Tensor) – Value tensor, i.e. the encoder hidden states, with reference shape [batch_size, seq_len, encoder_dim].
last_attn (mindspore.Tensor) – Attention scores from the previous step, with reference shape [batch_size, seq_len].
- Returns
context – Context vector, with reference shape [batch_size, 1, decoder_dim].
attn – Attention scores at the current step, with reference shape [batch_size, seq_len].
- class mindnlp.modules.attentions.MutiHeadAttention(heads=8, d_model=512, dropout=0.9, bias=False, attention_mode='dot')
Bases: Cell
Multi-head attention, proposed in the paper "Attention Is All You Need". When the number of heads is 1, multi-head attention reduces to ordinary self-attention.
- Parameters
heads (int) – Number of attention heads. Default: 8.
d_model (int) – Hidden dimension of the query, key and value vectors. Default: 512.
dropout (float) – Keep probability, a value between 0 and 1, for example 0.9. Default: 0.9.
bias (bool) – Whether to use a bias vector. Default: False.
attention_mode (str) – Attention mode. Default: "dot".
Examples
>>> import mindspore
>>> import mindspore.numpy as np
>>> from mindspore import ops
>>> from mindspore import Tensor
>>> from mindnlp.modules.attentions import MutiHeadAttention
>>> standard_normal = ops.StandardNormal(seed=0)
>>> # query is [batch_size, seq_len_q, hidden_size]
>>> q = standard_normal((2, 32, 512))
>>> # key is [batch_size, seq_len_k, hidden_size]
>>> k = standard_normal((2, 20, 512))
>>> # value is [batch_size, seq_len_k, hidden_size]
>>> v = standard_normal((2, 20, 512))
>>> # query shape is (2, 32, 512) -> (2, 8, 32, 64) and key shape is (2, 20, 512) -> (2, 8, 20, 64)
>>> # query * key.transpose(-1, -2): (2, 8, 32, 64) * (2, 8, 64, 20) -> (2, 8, 32, 20)
>>> # the last two score dimensions match the mask shape
>>> # [batch_size, seq_len_q, seq_len_k]
>>> mask_shape = (2, 32, 20)
>>> mask = Tensor(np.ones(mask_shape), mindspore.bool_)
>>> # use additive attention
>>> net = MutiHeadAttention(heads=8, attention_mode="add")
>>> x, attn = net(q, k, v, mask)
>>> print(x.shape, attn.shape)
(2, 32, 512) (2, 8, 32, 20)
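- construct(query, key, value, mask: Optional[Tensor] = None)
Computes the multi-head attention output and its attention scores.
- Parameters
query (mindspore.Tensor) – Query tensor, with reference shape [batch_size, seq_len, d_model].
key (mindspore.Tensor) – Key tensor, with reference shape [batch_size, seq_len, d_model].
value (mindspore.Tensor) – Value tensor, with reference shape [batch_size, seq_len, d_model].
mask (Optional[mindspore.Tensor[bool]]) – Mask tensor, with reference shape [batch_size, seq_len, seq_len].
- Returns
output – Multi-head attention output.
attn – Attention scores of the multi-head attention.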
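The shape bookkeeping behind multi-head attention (split d_model into heads, attend per head, merge back) can be sketched in NumPy as follows. This is an illustration under several assumptions, not the mindnlp implementation: it uses scaled dot-product scoring only, skips the learned input and output projections and dropout, and treats `True` in the mask as "may attend". The helper names `split_heads` and `multi_head_dot_attention` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def split_heads(x, heads):
    # [batch, seq_len, d_model] -> [batch, heads, seq_len, d_model // heads]
    b, n, d = x.shape
    return x.reshape(b, n, heads, d // heads).transpose(0, 2, 1, 3)

def multi_head_dot_attention(q, k, v, heads, mask=None):
    d_head = q.shape[-1] // heads
    qh, kh, vh = (split_heads(t, heads) for t in (q, k, v))
    # Per-head scores: [batch, heads, seq_len_q, seq_len_k]
    scores = qh @ kh.transpose(0, 1, 3, 2) / np.sqrt(d_head)
    if mask is not None:
        # Add a heads axis so one mask is shared by all heads (assumed convention).
        scores = np.where(mask[:, None, :, :], scores, -1e9)
    attn = softmax(scores, axis=-1)
    context = attn @ vh                                   # [batch, heads, seq_len_q, d_head]
    b, _, q_len, _ = context.shape
    # Merge the heads back into d_model.
    output = context.transpose(0, 2, 1, 3).reshape(b, q_len, heads * d_head)
    return output, attn

rng = np.random.default_rng(0)
q = rng.standard_normal((2, 32, 512))
k = rng.standard_normal((2, 20, 512))
v = rng.standard_normal((2, 20, 512))
mask = np.ones((2, 32, 20), dtype=bool)
output, attn = multi_head_dot_attention(q, k, v, heads=8, mask=mask)
print(output.shape, attn.shape)  # (2, 32, 512) (2, 8, 32, 20)
```

With 8 heads and d_model = 512, each head works on 64-dimensional slices, which is why the attention scores carry an extra heads dimension while the output returns to the original width.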
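- class mindnlp.modules.attentions.ScaledDotAttention(dropout=0.9)
Bases: Cell
Scaled dot-product attention, proposed in the paper "Attention Is All You Need".
\[\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{\mathrm{T}}}{\sqrt{d_k}}\right)V\]
- Parameters
dropout (float) – Keep probability, a value between 0 and 1. For example, 0.9 means a dropout rate of 10% and a keep probability of 90%.
Examples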
>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindnlp.modules.attentions import ScaledDotAttention
>>> model = ScaledDotAttention(dropout=0.9)
>>> q = Tensor(np.ones((2, 32, 512)), mindspore.float32)
>>> k = Tensor(np.ones((2, 20, 512)), mindspore.float32)
>>> v = Tensor(np.ones((2, 20, 400)), mindspore.float32)
>>> output, att = model(q, k, v)
>>> print(output.shape)
(2, 32, 400)
>>> print(att.shape)
(2, 32, 20)
- construct(query, key, value, mask: Optional[Tensor] = None)
Forward pass of the scaled dot-product attention network.
- Parameters
query (mindspore.Tensor) – Query tensor, typically of shape [batch_size, query_size, hidden_size].
key (mindspore.Tensor) – Key tensor, typically of shape [batch_size, key_size, hidden_size].
value (mindspore.Tensor) – Value tensor, typically of shape [batch_size, seq_len, value_hidden_size].
mask (Optional[mindspore.Tensor[bool]]) – Mask tensor, typically of shape [batch_size, query_size, key_size].
- Returns
The attention output output and the attention scores attn.
output – Attention output, with reference shape [batch_size, query_size, value_hidden_size].
attn – Attention scores, with reference shape [batch_size, query_size, key_size].
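For reference, the scaled dot-product formula can be written in a few lines of NumPy. This sketch is not the mindnlp implementation. It omits dropout, and the mask convention (`True` means the query may attend to that key, masked positions receive a large negative score) is an assumption.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_attention(q, k, v, mask=None):
    d_k = q.shape[-1]
    # Scale the dot products by sqrt(d_k) before the softmax.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_k)   # [batch, query_size, key_size]
    if mask is not None:
        scores = np.where(mask, scores, -1e9)          # assumed: False = masked out
    attn = softmax(scores, axis=-1)
    return attn @ v, attn                              # [batch, query_size, value_hidden]

rng = np.random.default_rng(0)
q = rng.standard_normal((2, 4, 8))
k = rng.standard_normal((2, 3, 8))
v = rng.standard_normal((2, 3, 5))
mask = np.ones((2, 4, 3), dtype=bool)
mask[:, :, -1] = False                                 # hide the last key from every query
output, attn = scaled_dot_attention(q, k, v, mask)
print(output.shape, attn.shape)  # (2, 4, 5) (2, 4, 3)
```

The last dimension of the output follows the value tensor, not the query or key, and the masked key receives an attention weight of effectively zero.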
- class mindnlp.modules.attentions.SelfAttention(d_model=512, dropout_rate=0.1, bias=False, attention_mode='dot')
Bases: Cell
Self-attention, proposed in the paper "Attention Is All You Need".
- Parameters
d_model (int) – Hidden dimension of the query, key and value vectors. Default: 512.
dropout_rate (float) – Dropout rate, a value between 0 and 1. Default: 0.1.
bias (bool) – Whether to use a bias vector. Default: False.
attention_mode (str) – Attention mode. Default: "dot".
Examples
>>> import mindspore
>>> import mindspore.numpy as np
>>> from mindspore import ops
>>> from mindspore import Tensor
>>> from mindnlp.modules.attentions import SelfAttention
>>> standard_normal = ops.StandardNormal(seed=0)
>>> query = standard_normal((2, 32, 512))
>>> key = standard_normal((2, 20, 512))
>>> value = standard_normal((2, 20, 512))
>>> mask_shape = (2, 32, 20)
>>> mask = Tensor(np.ones(mask_shape), mindspore.bool_)
>>> net = SelfAttention()
>>> output, attn = net(query, key, value, mask)
>>> print(output.shape, attn.shape)
(2, 32, 512) (2, 32, 20)
- construct(query, key, value, mask: Optional[Tensor] = None)
Computes the self-attention output and its attention scores.
- Parameters
query (mindspore.Tensor) – Query tensor.
key (mindspore.Tensor) – Key tensor.
value (mindspore.Tensor) – Value tensor, with reference shape [batch_size, seq_len, d_model].
mask (Optional[mindspore.Tensor[bool]]) – Mask tensor, with reference shape [batch_size, seq_len, seq_len].
- Returns
output – Self-attention output.
attn – Attention scores of the self-attention.