问答任务

本节介绍机器阅读理解（MRC）中的问答（QA）任务，也称为答案抽取：给定一段文本和一个问题，要求机器根据这个问题根据问题从文字中找出一段连续的片段作为答案。下面是使用SQuAD数据集和BiDAF双向注意流模型进行问答任务训练的示例：

备注

本教程推荐使用GPU进行实验

SQuAD数据集

SQuAD数据集是斯坦福大学2016年推出的阅读理解数据集。给定一篇文章，准备相应的问题，需要算法给出问题的答案。该数据集中的所有文章均来自维基百科。

以下是该数据集中的一条训练集数据：

列名	数据
id	5733be284776f41900661182
context	Architecturally, the school has a Catholic character. Atop the Main Building’s gold dome is a golden statue of the Virgin Mary. Immediately in front of the Main Building and facing it, is a copper statue of Christ with arms upraised with the legend “Venite Ad Me Omnes”. Next to the Main Building is the Basilica of the Sacred Heart. Immediately behind the basilica is the Grotto, a Marian place of prayer and reflection. It is a replica of the grotto at Lourdes, France where the Virgin Mary reputedly appeared to Saint Bernadette Soubirous in 1858. At the end of the main drive (and in a direct line that connects through 3 statues and the Gold Dome), is a simple, modern stone statue of Mary.
question	To whom did the Virgin Mary allegedly appear in 1858 in Lourdes France?
answer	Saint Bernadette Soubirous
answer_start	515

这是训练集中的原始数据。answer_start代表context中答案的起始字符所在的位置。数据预处理后，将添加s_idx和e_idx两列作为标签列，表示答案的第一个单词和最后一个单词所在的位置。下面以使用SQuAD数据集和BiDAF模型训练问答任务为例进行演示。当给定context和question时，将预测出s_idx和e_idx。

训练步骤

加载数据集:

MindNLP 提供API来加载和处理各种常见的数据集，如SQuAD、IMDB、Multi30K、AG_News等。

调用来自 dataset 模块的 load() 函数加载SQuAD数据集。该函数会返回SQuAD数据集的训练集和开发集。

加载数据集的代码如下所示：

from mindnlp.dataset import load
squad_train, squad_dev = load('squad1')

处理数据集：

首先通过调用来自 Glove 的 from_pretrained() 函数获取embedding和单词级词汇表。由于没有现成的字符级的词汇表，你可以自己定义一个：

from mindnlp.modules import Glove

word_embeddings, word_vocab = Glove.from_pretrained('6B', 100, special_tokens=["<unk>", "<pad>"])
char_dic = {"<unk>": 0, "<pad>": 1, "e": 2, "t": 3, "a": 4, "i": 5, "n": 6,\
                    "o": 7, "s": 8, "r": 9, "h": 10, "l": 11, "d": 12, "c": 13, "u": 14,\
                    "m": 15, "f": 16, "p": 17, "g": 18, "w": 19, "y": 20, "b": 21, ",": 22,\
                    "v": 23, ".": 24, "k": 25, "1": 26, "0": 27, "x": 28, "2": 29, "\"": 30, \
                    "-": 31, "j": 32, "9": 33, "'": 34, ")": 35, "(": 36, "?": 37, "z": 38,\
                    "5": 39, "8": 40, "q": 41, "3": 42, "4": 43, "7": 44, "6": 45, ";": 46,\
                    ":": 47, "\u2013": 48, "%": 49, "/": 50, "]": 51, "[": 52}
char_vocab = text.Vocab.from_dict(char_dic)

初始化分词器：

from mindnlp.dataset.transforms import BasicTokenizer

tokenizer = BasicTokenizer(True)

接下来，调用 process() 函数对训练集进行处理：

from mindnlp.dataset import process
squad_train = process('squad1', squad_train, char_vocab, word_vocab, tokenizer=tokenizer,\
                   max_context_len=768, max_question_len=64, max_char_len=48,\
                   batch_size=64, drop_remainder=False )

定义模型

使用MindNLP定义双向注意力流（BiDAF）模型的代码如下所示：

import mindspore.nn as nn
from mindspore import Tensor
from mindspore import Parameter
from mindspore.common.initializer import Uniform, HeUniform, initializer

from mindnlp.abc import Seq2vecModel
from mindnlp.modules.embeddings import Word2vec

class Encoder(nn.Cell):
    """
    Encoder for BiDAF model
    """
    def __init__(self, char_vocab_size, char_vocab, char_dim, char_channel_size, char_channel_width, word_vocab,
                  word_embeddings, hidden_size, dropout):
        super().__init__()
        self.char_vocab = char_vocab
        self.char_dim = char_dim
        self.char_channel_width = char_channel_width
        self.char_channel_size = char_channel_size
        self.word_vocab = word_vocab
        self.hidden_size = hidden_size
        self.dropout = nn.Dropout(1 - dropout)
        self.init_embed = initializer(Uniform(0.001), [char_vocab_size, char_dim])
        self.embed = Parameter(self.init_embed, name='embed')

        # 1. Character Embedding Layer
        self.char_emb = Word2vec(char_vocab, init_embed=self.embed, dropout=0.0)
        self.char_conv = nn.SequentialCell(
            nn.Conv2d(1, char_channel_size, (char_dim, char_channel_width), pad_mode="pad",
                      weight_init=HeUniform(math.sqrt(5)), bias_init=Uniform(1 / math.sqrt(1))),
            nn.ReLU()
            )

        # 2. Word Embedding Layer
        self.word_emb = words_embeddings

        # highway network
        self.highway_linear0 = nn.Dense(hidden_size * 2, hidden_size * 2,
                                        weight_init=HeUniform(math.sqrt(5)),
                                        bias_init=Uniform(1 / math.sqrt(hidden_size * 2)),
                                        activation=nn.ReLU())
        self.highway_linear1 = nn.Dense(hidden_size * 2, hidden_size * 2,
                                        weight_init=HeUniform(math.sqrt(5)),
                                        bias_init=Uniform(1 / math.sqrt(hidden_size * 2)),
                                        activation=nn.ReLU())
        self.highway_gate0 = nn.Dense(hidden_size * 2, hidden_size * 2,
                                      weight_init=HeUniform(math.sqrt(5)),
                                      bias_init=Uniform(1 / math.sqrt(hidden_size * 2)),
                                      activation=nn.Sigmoid())
        self.highway_gate1 = nn.Dense(hidden_size * 2, hidden_size * 2,
                                      weight_init=HeUniform(math.sqrt(5)),
                                      bias_init=Uniform(1 / math.sqrt(hidden_size * 2)),
                                      activation=nn.Sigmoid())

        # 3. Contextual Embedding Layer
        self.context_LSTM = nn.LSTM(input_size=hidden_size * 2, hidden_size=hidden_size,
                                    bidirectional=True, batch_first=True, dropout=dropout)

    def construct(self, c_char, q_char, c_word, q_word, c_lens, q_lens):
        # 1. Character Embedding Layer
        c_char = self.char_emb_layer(c_char)
        q_char = self.char_emb_layer(q_char)

        # 2. Word Embedding Layer
        c_word = self.word_emb(c_word)
        q_word = self.word_emb(q_word)

        # Highway network
        c = self.highway_network(c_char, c_word)
        q = self.highway_network(q_char, q_word)

        # 3. Contextual Embedding Layer
        c, _ = self.context_LSTM(c, seq_length=c_lens)
        q, _ = self.context_LSTM(q, seq_length=q_lens)

        return c, q

    def char_emb_layer(self, x):
        """
        param x: (batch, seq_len, word_len)
        return: (batch, seq_len, char_channel_size)
        """
        batch_size = x.shape[0]
        # x: [batch, seq_len, word_len, char_dim]
        x = self.dropout(self.char_emb(x))
        # x: [batch, seq_len, char_dim, word_len]
        x = ops.transpose(x, (0, 1, 3, 2))
        # x: [batch * seq_len, 1, char_dim, word_len]
        x = x.view(-1, self.char_dim, x.shape[3]).expand_dims(1)
        # x: [batch * seq_len, char_channel_size, 1, conv_len] -> [batch * seq_len, char_channel_size, conv_len]
        x = self.char_conv(x).squeeze(2)
        # x: [batch * seq_len, char_channel_size]
        x = ops.max(x, axis=2)[1]
        # x: [batch, seq_len, char_channel_size]
        x = x.view(batch_size, -1, self.char_channel_size)

        return x

    def highway_network(self, x1, x2):
        """
        param x1: (batch, seq_len, char_channel_size)
        param x2: (batch, seq_len, word_dim)
        return: (batch, seq_len, hidden_size * 2)
        """
        # [batch, seq_len, char_channel_size + word_dim]
        x = ops.concat((x1, x2), axis=-1)
        h = self.highway_linear0(x)
        g = self.highway_gate0(x)
        x = g * h + (1 - g) * x
        h = self.highway_linear1(x)
        g = self.highway_gate1(x)
        x = g * h + (1 - g) * x

        # [batch, seq_len, hidden_size * 2]
        return x


class Head(nn.Cell):
    """
    Head for BiDAF model
    """
    def __init__(self, hidden_size, dropout):
        super().__init__()
        # 4. Attention Flow Layer
        self.att_weight_c = nn.Dense(hidden_size * 2, 1,
                                     weight_init=HeUniform(math.sqrt(5)),
                                     bias_init=Uniform(1 / math.sqrt(hidden_size * 2)))
        self.att_weight_q = nn.Dense(hidden_size * 2, 1,
                                     weight_init=HeUniform(math.sqrt(5)),
                                     bias_init=Uniform(1 / math.sqrt(hidden_size * 2)))
        self.att_weight_cq = nn.Dense(hidden_size * 2, 1,
                                      weight_init=HeUniform(math.sqrt(5)),
                                      bias_init=Uniform(1 / math.sqrt(hidden_size * 2)))
        self.softmax = nn.Softmax(axis=-1)
        self.batch_matmul = ops.BatchMatMul()

        # 5. Modeling Layer
        self.modeling_LSTM1 = nn.LSTM(input_size=hidden_size * 8, hidden_size=hidden_size,
                                      bidirectional=True, batch_first=True, dropout=dropout)
        self.modeling_LSTM2 = nn.LSTM(input_size=hidden_size * 2, hidden_size=hidden_size,
                                      bidirectional=True, batch_first=True, dropout=dropout)

        # 6. Output Layer
        self.p1_weight_g = nn.Dense(hidden_size * 8, 1,
                                    weight_init=HeUniform(math.sqrt(5)),
                                    bias_init=Uniform(1 / math.sqrt(hidden_size * 8)))
        self.p1_weight_m = nn.Dense(hidden_size * 2, 1,
                                    weight_init=HeUniform(math.sqrt(5)),
                                    bias_init=Uniform(1 / math.sqrt(hidden_size * 2)))
        self.p2_weight_g = nn.Dense(hidden_size * 8, 1,
                                    weight_init=HeUniform(math.sqrt(5)),
                                    bias_init=Uniform(1 / math.sqrt(hidden_size * 8)))
        self.p2_weight_m = nn.Dense(hidden_size * 2, 1,
                                    weight_init=HeUniform(math.sqrt(5)),
                                    bias_init=Uniform(1 / math.sqrt(hidden_size * 2)))

        self.output_LSTM = nn.LSTM(input_size=hidden_size * 2, hidden_size=hidden_size,
                                   bidirectional=True, batch_first=True, dropout=dropout)

    def construct(self, c, q, c_lens):
        # 4. Attention Flow Layer
        g = self.att_flow_layer(c, q)  #c, q are generated from Contextual Embedding Layer in Encoder

        # 5. Modeling Layer
        m, _ = self.modeling_LSTM2(self.modeling_LSTM1(g, seq_length=c_lens)[0], seq_length=c_lens)

        # 6. Output Layer
        p1, p2 = self.output_layer(g, m, c_lens)

        # [batch, c_len], [batch, c_len]
        return p1, p2

    def att_flow_layer(self, c, q):
        """
        param c: (batch, c_len, hidden_size * 2)
        param q: (batch, q_len, hidden_size * 2)
        return: (batch, c_len, q_len)
        """
        c_len = c.shape[1]
        q_len = q.shape[1]

        cq = []
        for i in range(q_len):
            # qi: [batch, 1, hidden_size * 2]
            qi = q.gather(mindspore.Tensor(i), axis=1).expand_dims(1)
            # ci: [batch, c_len, 1] -> [batch, c_len]
            ci = self.att_weight_cq(c * qi).squeeze(2)
            cq.append(ci)
        # cq: [batch, c_len, q_len]
        cq = ops.stack(cq, -1)

        # s: [batch, c_len, q_len]
        s = self.att_weight_c(c).broadcast_to((-1, -1, q_len)) + \
            self.att_weight_q(q).transpose((0, 2, 1)).broadcast_to((-1, c_len, -1)) + cq

        # a: [batch, c_len, q_len]
        a = self.softmax(s)
        # c2q_att: [batch, c_len, hidden_size * 2]
        c2q_att = self.batch_matmul(a, q)
        # b: [batch, 1, c_len]
        b = self.softmax(ops.max(s, axis=2)[1]).expand_dims(1)
        # q2c_att: [batch, hidden_size * 2]
        q2c_att = self.batch_matmul(b, c).squeeze(1)
        # q2c_att: [batch, c_len, hidden_size * 2]
        q2c_att = q2c_att.expand_dims(1).broadcast_to((-1, c_len, -1))

        # x: [batch, c_len, hidden_size * 8]
        x = ops.concat([c, c2q_att, c * c2q_att, c * q2c_att], axis=-1)
        return x

    def output_layer(self, g, m, l):
        """
        param g: (batch, c_len, hidden_size * 8)
        param m: (batch, c_len ,hidden_size * 2)
        return: p1: (batch, c_len), p2: (batch, c_len)
        """
        # p1: [batch, c_len]
        p1 = (self.p1_weight_g(g) + self.p1_weight_m(m)).squeeze(2)
        # m2: [batch, c_len, hidden_size * 2]
        m2, _ = self.output_LSTM(m, seq_length=l)
        # p2: [batch, c_len]
        p2 = (self.p2_weight_g(g) + self.p2_weight_m(m2)).squeeze(2)

        return p1, p2

class BiDAF(Seq2vecModel):
    def __init__(self, encoder, head):
        super().__init__(encoder, head)
        self.encoder = encoder
        self.head = head

    def construct(self, c_char, q_char, c_word, q_word, c_lens, q_lens):
        c, q = self.encoder(c_char, q_char, c_word, q_word, c_lens, q_lens)
        p1, p2 = self.head(c, q, c_lens)
        return p1, p2

实例化模型

定义超参数：

char_vocab_size = len(char_vocab.vocab())
char_dim = 8
char_channel_width = 5
char_channel_size = 100
hidden_size = 100
dropout = 0.2
lr = 0.5
epoch = 6

实例化模型：

encoder = Encoder(char_vocab_size, char_vocab, char_dim, char_channel_size, char_channel_width, word_vocab,
                  word_embeddings, hidden_size, dropout)
head = Head(hidden_size, dropout)
net = BiDAF(encoder, head)

定义损失函数和优化器

训练模型时需要损失函数，我们使用MindSpore提供的 CrossEntropyLoss 来定义一个损失函数类，并实例化：

class Loss(nn.Cell):
    def __init__(self):
        super().__init__()

    def construct(self, logit1, logit2, s_idx, e_idx):
        loss_fn = nn.CrossEntropyLoss()
        loss = loss_fn(logit1, s_idx) + loss_fn(logit2, e_idx)
        return loss

loss = Loss()

定义优化器

optimizer = nn.Adadelta(net.trainable_params(), learning_rate=lr)

定义模型

定义好网络、损失函数、优化器后，我们使用 Trainer 来训练模型

from mindnlp.engine.trainer import Trainer

trainer = Trainer(network=net, train_dataset=squad_train, epochs=epoch, loss_fn=loss, optimizer=optimizer)
trainer.run(tgt_columns=["s_idx", "e_idx"], jit=True)