Models

deepke.name_entity_re.few_shot.models.model module

class deepke.name_entity_re.few_shot.models.model.PromptBartEncoder(encoder)[source]

Bases: torch.nn.modules.module.Module

forward(src_tokens, attention_mask=None, past_key_values=None)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class deepke.name_entity_re.few_shot.models.model.PromptBartDecoder(decoder, pad_token_id, label_ids, use_prompt=False, prompt_len=10, learn_weights=False)[source]

Bases: torch.nn.modules.module.Module

forward(tgt_tokens, prompt_state)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

decode(tokens, state)[source]
training: bool
class deepke.name_entity_re.few_shot.models.model.PromptBartModel(tokenizer, label_ids, args)[source]

Bases: torch.nn.modules.module.Module

forward(src_tokens, tgt_tokens, src_seq_len, first)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

generator(src_tokens, src_seq_len, first)[source]
get_prompt(batch_size)[source]
training: bool
class deepke.name_entity_re.few_shot.models.model.PromptBartState(encoder_output, encoder_mask, past_key_values, src_tokens, first, src_embed_outputs, preseqlen)[source]

Bases: object

reorder_state(indices: torch.LongTensor)[source]
num_samples()[source]
class deepke.name_entity_re.few_shot.models.model.PromptGeneratorModel(prompt_model, max_length=20, max_len_a=0.0, num_beams=1, do_sample=False, bos_token_id=None, eos_token_id=None, repetition_penalty=1, length_penalty=1.0, pad_token_id=0, restricter=None)[source]

Bases: torch.nn.modules.module.Module

forward(src_tokens, tgt_tokens, src_seq_len=None, tgt_seq_len=None, first=None)[source]
Parameters
  • src_tokens (torch.LongTensor) – bsz x max_len

  • tgt_tokens (torch.LongTensor) – bsz x max_len’

  • src_seq_len (torch.LongTensor) – bsz

  • tgt_seq_len (torch.LongTensor) – bsz

Returns

predict(src_tokens, src_seq_len=None, first=None)[source]
Parameters
  • src_tokens (torch.LongTensor) – bsz x max_len

  • src_seq_len (torch.LongTensor) – bsz

Returns

training: bool
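
A minimal usage sketch of this generator class (illustrative only): it assumes a tokenizer, label_ids and an args namespace have already been prepared elsewhere in the few-shot pipeline, and that src_tokens / src_seq_len are tensors with the shapes described under predict() above.

>>> from deepke.name_entity_re.few_shot.models.model import PromptBartModel, PromptGeneratorModel

>>> # `tokenizer`, `label_ids`, `args`, `src_tokens` and `src_seq_len` are assumed to exist
>>> prompt_model = PromptBartModel(tokenizer, label_ids, args)
>>> generator = PromptGeneratorModel(prompt_model, max_length=20, num_beams=1)

>>> preds = generator.predict(src_tokens, src_seq_len=src_seq_len)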
deepke.name_entity_re.few_shot.models.model.greedy_generate(decoder, tokens=None, state=None, max_length=20, max_len_a=0.0, num_beams=1, bos_token_id=None, eos_token_id=None, pad_token_id=0, repetition_penalty=1, length_penalty=1.0, restricter=None)[source]
class deepke.name_entity_re.few_shot.models.model.BeamHypotheses(num_beams, max_length, length_penalty, early_stopping)[source]

Bases: object

add(hyp, sum_logprobs)[source]

Add a new hypothesis to the list.

is_done(best_sum_logprobs)[source]

If there are enough hypotheses and none of the hypotheses being generated can become better than the worst one in the heap, then we are done with this sentence.
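
A small sketch of how this container is typically used during beam search; the token-id tensors below are placeholders, not real model output.

>>> import torch
>>> hyps = BeamHypotheses(num_beams=2, max_length=20, length_penalty=1.0, early_stopping=False)
>>> hyps.add(torch.tensor([0, 4, 17, 2]), sum_logprobs=-3.2)
>>> hyps.add(torch.tensor([0, 4, 9, 12, 2]), sum_logprobs=-4.1)
>>> # asks whether a pending beam with this best score could still beat the worst stored hypothesis
>>> hyps.is_done(best_sum_logprobs=-5.0)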

deepke.name_entity_re.few_shot.models.modeling_bart module

PyTorch BART model, ported from the fairseq repo.

deepke.name_entity_re.few_shot.models.modeling_bart.invert_mask(attention_mask)[source]

Turns 1 -> 0, 0 -> 1, False -> True, True -> False
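
For example, with a 2-D attention mask where 1 marks real tokens and 0 marks padding:

>>> import torch
>>> attention_mask = torch.tensor([[1, 1, 1, 0, 0]])
>>> invert_mask(attention_mask)   # real tokens become 0/False, padding positions become 1/True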

class deepke.name_entity_re.few_shot.models.modeling_bart.PretrainedBartModel(config: transformers.configuration_utils.PretrainedConfig, *inputs, **kwargs)[source]

Bases: transformers.modeling_utils.PreTrainedModel

config_class

alias of transformers.configuration_bart.BartConfig

base_model_prefix = 'model'
property dummy_inputs

Dummy inputs to do a forward pass in the network.

Type

Dict[str, torch.Tensor]

training: bool
deepke.name_entity_re.few_shot.models.modeling_bart.shift_tokens_right(input_ids, pad_token_id)[source]

Shift input ids one token to the right, and wrap the last non-pad token (usually <eos>) around to the first position.
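
For example, with pad_token_id=1 and </s> = 2 (BART's defaults), the final non-pad token is wrapped around to position 0 (token ids are illustrative):

>>> import torch
>>> input_ids = torch.tensor([[0, 31414, 232, 2, 1]])   # <s> ... </s> <pad>
>>> shift_tokens_right(input_ids, pad_token_id=1)
>>> # -> tensor([[2, 0, 31414, 232, 2]])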

deepke.name_entity_re.few_shot.models.modeling_bart.make_padding_mask(input_ids, padding_idx=1)[source]

True for pad tokens
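
For example (padding_idx defaults to 1, BART's <pad> id):

>>> import torch
>>> input_ids = torch.tensor([[0, 31414, 232, 2, 1, 1]])
>>> make_padding_mask(input_ids)   # True at the two trailing <pad> positions, False elsewhere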

class deepke.name_entity_re.few_shot.models.modeling_bart.EncoderLayer(config: transformers.configuration_bart.BartConfig)[source]

Bases: torch.nn.modules.module.Module

forward(idx, x, encoder_padding_mask, layer_state, output_attentions=False)[source]
Parameters
  • x (Tensor) – input to the layer of shape (seq_len, batch, embed_dim)

  • encoder_padding_mask (ByteTensor) – binary ByteTensor of shape (batch, src_len) where padding elements are indicated by 1. For t_tgt, t_src is excluded (or masked out); a value of 0 means it is included in attention.

Returns

encoded output of shape (seq_len, batch, embed_dim)

training: bool
class deepke.name_entity_re.few_shot.models.modeling_bart.BartEncoder(config: transformers.configuration_bart.BartConfig, embed_tokens)[source]

Bases: torch.nn.modules.module.Module

Transformer encoder consisting of config.encoder_layers self-attention layers. Each layer is an EncoderLayer.

Parameters

config – BartConfig

forward(input_ids, attention_mask=None, past_key_values=None, output_attentions=False, output_hidden_states=False, return_dict=False)[source]
Parameters
  • input_ids (LongTensor) – tokens in the source language of shape (batch, src_len)

  • attention_mask (torch.LongTensor) – mask indicating which indices are padding tokens.

Returns

  • x (Tensor): the last encoder layer’s output of shape (src_len, batch, embed_dim)

  • encoder_states (tuple(torch.FloatTensor)): all intermediate hidden states of shape (src_len, batch, embed_dim). Only populated if output_hidden_states is True.

  • all_attentions (tuple(torch.FloatTensor)): Attention weights for each layer.

During training, it might not be of length n_layers because of layer dropout.

Return type

BaseModelOutput or a tuple comprising the elements above

forward_with_encoder_past(input_ids, attention_mask=None, past_key_values=None, output_attentions=False, output_hidden_states=False, return_dict=False)[source]
Parameters
  • input_ids (LongTensor) – tokens in the source language of shape (batch, src_len)

  • attention_mask (torch.LongTensor) – mask indicating which indices are padding tokens.

Returns

  • x (Tensor): the last encoder layer’s output of shape (src_len, batch, embed_dim)

  • encoder_states (tuple(torch.FloatTensor)): all intermediate hidden states of shape (src_len, batch, embed_dim). Only populated if output_hidden_states is True.

  • all_attentions (tuple(torch.FloatTensor)): Attention weights for each layer.

During training, it might not be of length n_layers because of layer dropout.

Return type

BaseModelOutput or a tuple comprising the elements above

training: bool
class deepke.name_entity_re.few_shot.models.modeling_bart.DecoderLayer(config: transformers.configuration_bart.BartConfig)[source]

Bases: torch.nn.modules.module.Module

forward(idx, x, encoder_hidden_states, encoder_attn_mask=None, layer_state=None, causal_mask=None, decoder_padding_mask=None, output_attentions=False)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class deepke.name_entity_re.few_shot.models.modeling_bart.BartDecoder(config: transformers.configuration_bart.BartConfig, embed_tokens: torch.nn.modules.sparse.Embedding)[source]

Bases: torch.nn.modules.module.Module

Transformer decoder consisting of config.decoder_layers layers. Each layer is a DecoderLayer.

Parameters
  • config – BartConfig

  • embed_tokens (torch.nn.Embedding) – output embedding

forward(input_ids, encoder_hidden_states, encoder_padding_mask, decoder_padding_mask, decoder_causal_mask, past_key_values=None, use_cache=False, use_prompt=False, output_attentions=False, output_hidden_states=False, return_dict=False, **unused)[source]

Includes several features from “Jointly Learning to Align and Translate with Transformer Models” (Garg et al., EMNLP 2019).

Parameters
  • input_ids (LongTensor) – previous decoder outputs of shape (batch, tgt_len), for teacher forcing

  • encoder_hidden_states – output from the encoder, used for encoder-side attention

  • encoder_padding_mask – for ignoring pad tokens

  • past_key_values (dict or None) – dictionary used for storing state during generation

Returns

  • the decoder’s features of shape (batch, tgt_len, embed_dim)

  • the cache

  • hidden states

  • attentions

Return type

BaseModelOutputWithPast or tuple

training: bool
class deepke.name_entity_re.few_shot.models.modeling_bart.Attention(embed_dim, num_heads, dropout=0.0, bias=True, encoder_decoder_attention=False, cache_key=None, preseqlen=-1, use_prompt=True)[source]

Bases: torch.nn.modules.module.Module

Multi-headed attention from ‘Attention Is All You Need’ paper

forward(idx, query, key: Optional[torch.Tensor], key_padding_mask: Optional[torch.Tensor] = None, layer_state: Optional[Dict[str, Optional[torch.Tensor]]] = None, attn_mask: Optional[torch.Tensor] = None, output_attentions=False) → Tuple[torch.Tensor, Optional[torch.Tensor]][source]

Input shape: Time(SeqLen) x Batch x Channel
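
The (seq_len, batch, embed_dim) layout is the same convention as PyTorch's built-in multi-head attention; a generic shape sketch using torch.nn.MultiheadAttention (not this class) looks like:

>>> import torch
>>> attn = torch.nn.MultiheadAttention(embed_dim=768, num_heads=12)   # expects (seq_len, batch, embed_dim) input
>>> x = torch.randn(10, 2, 768)                                       # (seq_len, batch, embed_dim)
>>> out, weights = attn(x, x, x)
>>> out.shape
torch.Size([10, 2, 768])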

training: bool
class deepke.name_entity_re.few_shot.models.modeling_bart.BartClassificationHead(input_dim, inner_dim, num_classes, pooler_dropout)[source]

Bases: torch.nn.modules.module.Module

Head for sentence-level classification tasks.

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
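
A minimal shape sketch, assuming the head is fed an already-pooled hidden state of size input_dim (in BART-style models this is typically the final <eos> hidden state):

>>> import torch
>>> head = BartClassificationHead(input_dim=768, inner_dim=768, num_classes=3, pooler_dropout=0.1)
>>> pooled = torch.randn(8, 768)      # (batch, input_dim)
>>> logits = head(pooled)             # (batch, num_classes)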

training: bool
class deepke.name_entity_re.few_shot.models.modeling_bart.LearnedPositionalEmbedding(num_embeddings: int, embedding_dim: int, padding_idx: int, offset)[source]

Bases: torch.nn.modules.sparse.Embedding

This module learns positional embeddings up to a fixed maximum size. Padding ids are ignored by either offsetting based on padding_idx or by setting padding_idx to None and ensuring that the appropriate position ids are passed to the forward function.

forward(input_ids, use_cache=False)[source]

Input is expected to be of size [bsz x seqlen].
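
A brief sketch with illustrative sizes (only the [bsz x seqlen] shape of input_ids is needed to derive the positions):

>>> import torch
>>> pos_emb = LearnedPositionalEmbedding(num_embeddings=1024, embedding_dim=768, padding_idx=1, offset=2)
>>> input_ids = torch.ones(2, 7, dtype=torch.long)
>>> pos = pos_emb(input_ids)   # one learned embedding per position 0..6, shifted by the offset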

num_embeddings: int
embedding_dim: int
padding_idx: Optional[int]
max_norm: Optional[float]
norm_type: float
scale_grad_by_freq: bool
weight: torch.Tensor
sparse: bool
deepke.name_entity_re.few_shot.models.modeling_bart.LayerNorm(normalized_shape, eps=1e-05, elementwise_affine=True)[source]
deepke.name_entity_re.few_shot.models.modeling_bart.fill_with_neg_inf(t)[source]

FP16-compatible function that fills the given tensor with -inf.

class deepke.name_entity_re.few_shot.models.modeling_bart.BartModel(config: transformers.configuration_bart.BartConfig)[source]

Bases: deepke.name_entity_re.few_shot.models.modeling_bart.PretrainedBartModel

The bare BART Model outputting raw hidden-states without any specific head on top.

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.)

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.

Parameters

config (BartConfig) – Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

forward(input_ids, attention_mask=None, decoder_input_ids=None, encoder_outputs: Optional[Tuple] = None, decoder_attention_mask=None, past_key_values=None, use_cache=None, use_prompt=None, output_attentions=None, output_hidden_states=None, return_dict=None, **kwargs)[source]

The BartModel forward method, overrides the __call__() special method.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Parameters
  • input_ids (torch.LongTensor of shape (batch_size, sequence_length)) –

    Indices of input sequence tokens in the vocabulary. Padding will be ignored by default should you provide it.

    Indices can be obtained using BartTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.

    What are input IDs?

  • attention_mask (torch.Tensor of shape (batch_size, sequence_length), optional) –

    Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:

    • 1 for tokens that are not masked,

    • 0 for tokens that are masked.

    What are attention masks?

  • decoder_input_ids (torch.LongTensor of shape (batch_size, target_sequence_length), optional) – Provide for translation and summarization training. By default, the model will create this tensor by shifting the input_ids to the right, following the paper.

  • decoder_attention_mask (torch.BoolTensor of shape (batch_size, tgt_seq_len), optional) –

    Default behavior: generate a tensor that ignores pad tokens in decoder_input_ids. Causal mask will also be used by default.

    If you want to change padding behavior, you should read modeling_bart._prepare_decoder_inputs() and modify to your needs. See diagram 1 in the paper for more information on the default strategy.

  • encoder_outputs (tuple(torch.FloatTensor), optional) – Tuple consisting of (last_hidden_state, optional: hidden_states, optional: attentions). last_hidden_state of shape (batch_size, sequence_length, hidden_size) is a sequence of hidden-states at the output of the last layer of the encoder, used in the cross-attention of the decoder.

  • past_key_values (tuple(tuple(torch.FloatTensor)) of length config.n_layers with each tuple having 4 tensors of shape (batch_size, num_heads, sequence_length - 1, embed_size_per_head)) –

    Contains precomputed key and value hidden-states of the attention blocks. Can be used to speed up decoding.

    If past_key_values are used, the user can optionally input only the last decoder_input_ids (those that don’t have their past key value states given to this model) of shape (batch_size, 1) instead of all decoder_input_ids of shape (batch_size, sequence_length).

  • use_cache (bool, optional) – If set to True, past_key_values key value states are returned and can be used to speed up decoding (see past_key_values).

  • output_attentions (bool, optional) – Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.

  • output_hidden_states (bool, optional) – Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.

  • return_dict (bool, optional) – Whether or not to return a ModelOutput instead of a plain tuple.

Returns

A Seq2SeqModelOutput (if return_dict=True is passed or when config.return_dict=True) or a tuple of torch.FloatTensor comprising various elements depending on the configuration (BartConfig) and inputs.

  • last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size)) – Sequence of hidden-states at the output of the last layer of the decoder of the model.

    If past_key_values is used only the last hidden-state of the sequences of shape (batch_size, 1, hidden_size) is output.

  • past_key_values (List[torch.FloatTensor], optional, returned when use_cache=True is passed or when config.use_cache=True) – List of torch.FloatTensor of length config.n_layers, with each tensor of shape (2, batch_size, num_heads, sequence_length, embed_size_per_head).

    Contains pre-computed hidden-states (key and values in the attention blocks) of the decoder that can be used (see past_key_values input) to speed up sequential decoding.

  • decoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

    Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.

  • decoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

    Attentions weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.

  • encoder_last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) – Sequence of hidden-states at the output of the last layer of the encoder of the model.

  • encoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

    Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.

  • encoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

    Attentions weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.

Return type

Seq2SeqModelOutput or tuple(torch.FloatTensor)

Example:

>>> from transformers import BartTokenizer, BartModel
>>> import torch

>>> tokenizer = BartTokenizer.from_pretrained('facebook/bart-large')
>>> model = BartModel.from_pretrained('facebook/bart-large', return_dict=True)

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> outputs = model(**inputs)

>>> last_hidden_states = outputs.last_hidden_state
get_input_embeddings()[source]

Returns the model’s input embeddings.

Returns

A torch module mapping vocabulary to hidden states.

Return type

nn.Module

set_input_embeddings(value)[source]

Set model’s input embeddings.

Parameters

value (nn.Module) – A module mapping vocabulary to hidden states.
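
For example, continuing from the BartModel example above:

>>> emb = model.get_input_embeddings()    # an nn.Embedding mapping vocabulary ids to d_model-sized vectors
>>> model.set_input_embeddings(emb)       # e.g. after replacing or re-initialising the matrix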

get_output_embeddings()[source]

Returns the model’s output embeddings.

Returns

A torch module mapping hidden states to vocabulary.

Return type

nn.Module

training: bool
class deepke.name_entity_re.few_shot.models.modeling_bart.BartForConditionalGeneration(config: transformers.configuration_bart.BartConfig)[source]

Bases: deepke.name_entity_re.few_shot.models.modeling_bart.PretrainedBartModel

The BART Model with a language modeling head. Can be used for summarization.

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.)

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.

Parameters

config (BartConfig) – Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

base_model_prefix = 'model'
authorized_missing_keys = ['final_logits_bias', 'encoder\\.version', 'decoder\\.version']
resize_token_embeddings(new_num_tokens: int) → torch.nn.modules.sparse.Embedding[source]

Resizes input token embeddings matrix of the model if new_num_tokens != config.vocab_size.

Takes care of tying the embedding weights afterwards if the model class has a tie_weights() method.

Parameters

new_num_tokens (int, optional) – The number of new tokens in the embedding matrix. Increasing the size will add newly initialized vectors at the end. Reducing the size will remove vectors from the end. If not provided or None, just returns a pointer to the input tokens torch.nn.Embedding module of the model without doing anything.

Returns

Pointer to the input tokens Embeddings Module of the model.

Return type

torch.nn.Embedding
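
A typical use, assuming extra special tokens (for example prompt or label markers) have been added to a BartTokenizer called tokenizer, and that model is an instance of this class:

>>> tokenizer.add_tokens(['<person>', '<location>'])    # illustrative extra tokens
>>> model.resize_token_embeddings(len(tokenizer))       # grow the embedding matrix to match the new vocabulary size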

forward(input_ids, attention_mask=None, encoder_outputs=None, decoder_input_ids=None, decoder_attention_mask=None, past_key_values=None, labels=None, use_cache=None, use_prompt=None, output_attentions=None, output_hidden_states=None, return_dict=None, **unused)[source]

The BartForConditionalGeneration forward method, overrides the __call__() special method.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Parameters
  • input_ids (torch.LongTensor of shape (batch_size, sequence_length)) –

    Indices of input sequence tokens in the vocabulary. Padding will be ignored by default should you provide it.

    Indices can be obtained using BartTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.

    What are input IDs?

  • attention_mask (torch.Tensor of shape (batch_size, sequence_length), optional) –

    Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:

    • 1 for tokens that are not masked,

    • 0 for tokens that are masked.

    What are attention masks?

  • decoder_input_ids (torch.LongTensor of shape (batch_size, target_sequence_length), optional) – Provide for translation and summarization training. By default, the model will create this tensor by shifting the input_ids to the right, following the paper.

  • decoder_attention_mask (torch.BoolTensor of shape (batch_size, tgt_seq_len), optional) –

    Default behavior: generate a tensor that ignores pad tokens in decoder_input_ids. Causal mask will also be used by default.

    If you want to change padding behavior, you should read modeling_bart._prepare_decoder_inputs() and modify to your needs. See diagram 1 in the paper for more information on the default strategy.

  • encoder_outputs (tuple(torch.FloatTensor), optional) – Tuple consisting of (last_hidden_state, optional: hidden_states, optional: attentions). last_hidden_state of shape (batch_size, sequence_length, hidden_size) is a sequence of hidden-states at the output of the last layer of the encoder, used in the cross-attention of the decoder.

  • past_key_values (tuple(tuple(torch.FloatTensor)) of length config.n_layers with each tuple having 4 tensors of shape (batch_size, num_heads, sequence_length - 1, embed_size_per_head)) –

    Contains precomputed key and value hidden-states of the attention blocks. Can be used to speed up decoding.

    If past_key_values are used, the user can optionally input only the last decoder_input_ids (those that don’t have their past key value states given to this model) of shape (batch_size, 1) instead of all decoder_input_ids of shape (batch_size, sequence_length).

  • use_cache (bool, optional) – If set to True, past_key_values key value states are returned and can be used to speed up decoding (see past_key_values).

  • output_attentions (bool, optional) – Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.

  • output_hidden_states (bool, optional) – Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.

  • return_dict (bool, optional) – Whether or not to return a ModelOutput instead of a plain tuple.

  • labels (torch.LongTensor of shape (batch_size, sequence_length), optional) – Labels for computing the masked language modeling loss. Indices should either be in [0, ..., config.vocab_size] or -100 (see input_ids docstring). Tokens with indices set to -100 are ignored (masked), the loss is only computed for the tokens with labels in [0, ..., config.vocab_size].

Returns

A Seq2SeqLMOutput (if return_dict=True is passed or when config.return_dict=True) or a tuple of torch.FloatTensor comprising various elements depending on the configuration (BartConfig) and inputs.

  • loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) – Language modeling loss.

  • logits (torch.FloatTensor of shape (batch_size, sequence_length, config.vocab_size)) – Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).

  • past_key_values (List[torch.FloatTensor], optional, returned when use_cache=True is passed or when config.use_cache=True) – List of torch.FloatTensor of length config.n_layers, with each tensor of shape (2, batch_size, num_heads, sequence_length, embed_size_per_head).

    Contains pre-computed hidden-states (key and values in the attention blocks) of the decoder that can be used (see past_key_values input) to speed up sequential decoding.

  • decoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

    Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.

  • decoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

    Attentions weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.

  • encoder_last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) – Sequence of hidden-states at the output of the last layer of the encoder of the model.

  • encoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

    Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.

  • encoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

    Attentions weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.

Conditional generation example:

>>> # Mask filling only works for bart-large
>>> from transformers import BartTokenizer, BartForConditionalGeneration
>>> tokenizer = BartTokenizer.from_pretrained('facebook/bart-large')
>>> TXT = "My friends are <mask> but they eat too many carbs."

>>> model = BartForConditionalGeneration.from_pretrained('facebook/bart-large')
>>> input_ids = tokenizer([TXT], return_tensors='pt')['input_ids']
>>> logits = model(input_ids).logits

>>> masked_index = (input_ids[0] == tokenizer.mask_token_id).nonzero().item()
>>> probs = logits[0, masked_index].softmax(dim=0)
>>> values, predictions = probs.topk(5)

>>> tokenizer.decode(predictions).split()
>>> # ['good', 'great', 'all', 'really', 'very']

Return type

Seq2SeqLMOutput or tuple(torch.FloatTensor)

Summarization example:

>>> from transformers import BartTokenizer, BartForConditionalGeneration, BartConfig

>>> # see ``examples/summarization/bart/run_eval.py`` for a longer example
>>> model = BartForConditionalGeneration.from_pretrained('facebook/bart-large-cnn')
>>> tokenizer = BartTokenizer.from_pretrained('facebook/bart-large-cnn')

>>> ARTICLE_TO_SUMMARIZE = "My friends are cool but they eat too many carbs."
>>> inputs = tokenizer([ARTICLE_TO_SUMMARIZE], max_length=1024, return_tensors='pt')

>>> # Generate Summary
>>> summary_ids = model.generate(inputs['input_ids'], num_beams=4, max_length=5, early_stopping=True)
>>> print([tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=False) for g in summary_ids])
prepare_inputs_for_generation(decoder_input_ids, past, attention_mask, use_cache, encoder_outputs, **kwargs)[source]

Implement in subclasses of PreTrainedModel for custom behavior to prepare inputs in the generate method.

adjust_logits_during_generation(logits, cur_len, max_length)[source]

Implement in subclasses of PreTrainedModel for custom behavior to adjust the logits in the generate method.
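
A hedged sketch of overriding this hook in a subclass, for example to restrict the final generation step to the EOS token (illustrative only, not necessarily this module's built-in behaviour):

>>> import torch
>>> class MyBart(BartForConditionalGeneration):
...     def adjust_logits_during_generation(self, logits, cur_len, max_length):
...         # hypothetical tweak: on the last step, keep only the EOS logit finite
...         if cur_len == max_length - 1 and self.config.eos_token_id is not None:
...             mask = torch.full_like(logits, float('-inf'))
...             mask[:, self.config.eos_token_id] = 0
...             logits = logits + mask
...         return logits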

get_encoder()[source]
get_output_embeddings()[source]

Returns the model’s output embeddings.

Returns

A torch module mapping hidden states to vocabulary.

Return type

nn.Module

training: bool
class deepke.name_entity_re.few_shot.models.modeling_bart.BartForSequenceClassification(config: transformers.configuration_bart.BartConfig, **kwargs)[source]

Bases: deepke.name_entity_re.few_shot.models.modeling_bart.PretrainedBartModel

Bart model with a sequence classification head on top (a linear layer on top of the pooled output), e.g. for GLUE tasks.

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.)

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.

Parameters

config (BartConfig) – Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

forward(input_ids, attention_mask=None, decoder_input_ids=None, decoder_attention_mask=None, encoder_outputs=None, labels=None, use_cache=None, output_attentions=None, output_hidden_states=None, return_dict=None)[source]

The BartForSequenceClassification forward method, overrides the __call__() special method.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Parameters
  • input_ids (torch.LongTensor of shape (batch_size, sequence_length)) –

    Indices of input sequence tokens in the vocabulary. Padding will be ignored by default should you provide it.

    Indices can be obtained using BartTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.

    What are input IDs?

  • attention_mask (torch.Tensor of shape (batch_size, sequence_length), optional) –

    Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:

    • 1 for tokens that are not masked,

    • 0 for tokens that are masked.

    What are attention masks?

  • decoder_input_ids (torch.LongTensor of shape (batch_size, target_sequence_length), optional) – Provide for translation and summarization training. By default, the model will create this tensor by shifting the input_ids to the right, following the paper.

  • decoder_attention_mask (torch.BoolTensor of shape (batch_size, tgt_seq_len), optional) –

    Default behavior: generate a tensor that ignores pad tokens in decoder_input_ids. Causal mask will also be used by default.

    If you want to change padding behavior, you should read modeling_bart._prepare_decoder_inputs() and modify to your needs. See diagram 1 in the paper for more information on the default strategy.

  • encoder_outputs (tuple(torch.FloatTensor), optional) – Tuple consisting of (last_hidden_state, optional: hidden_states, optional: attentions). last_hidden_state of shape (batch_size, sequence_length, hidden_size) is a sequence of hidden-states at the output of the last layer of the encoder, used in the cross-attention of the decoder.

  • past_key_values (tuple(tuple(torch.FloatTensor)) of length config.n_layers with each tuple having 4 tensors of shape (batch_size, num_heads, sequence_length - 1, embed_size_per_head)) –

    Contains precomputed key and value hidden-states of the attention blocks. Can be used to speed up decoding.

    If past_key_values are used, the user can optionally input only the last decoder_input_ids (those that don’t have their past key value states given to this model) of shape (batch_size, 1) instead of all decoder_input_ids of shape (batch_size, sequence_length).

  • use_cache (bool, optional) – If set to True, past_key_values key value states are returned and can be used to speed up decoding (see past_key_values).

  • output_attentions (bool, optional) – Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.

  • output_hidden_states (bool, optional) – Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.

  • return_dict (bool, optional) – Whether or not to return a ModelOutput instead of a plain tuple.

  • labels (torch.LongTensor of shape (batch_size,), optional) – Labels for computing the sequence classification/regression loss. Indices should be in [0, ..., config.num_labels - 1]. If config.num_labels > 1 a classification loss is computed (Cross-Entropy).

Returns

A Seq2SeqSequenceClassifierOutput (if return_dict=True is passed or when config.return_dict=True) or a tuple of torch.FloatTensor comprising various elements depending on the configuration (BartConfig) and inputs.

  • loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) – Classification (or regression if config.num_labels==1) loss.

  • logits (torch.FloatTensor of shape (batch_size, config.num_labels)) – Classification (or regression if config.num_labels==1) scores (before SoftMax).

  • past_key_values (List[torch.FloatTensor], optional, returned when use_cache=True is passed or when config.use_cache=True) – List of torch.FloatTensor of length config.n_layers, with each tensor of shape (2, batch_size, num_heads, sequence_length, embed_size_per_head).

    Contains pre-computed hidden-states (key and values in the attention blocks) of the decoder that can be used (see past_key_values input) to speed up sequential decoding.

  • decoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

    Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.

  • decoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

    Attentions weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.

  • encoder_last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) – Sequence of hidden-states at the output of the last layer of the encoder of the model.

  • encoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

    Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.

  • encoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

    Attentions weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.

Return type

Seq2SeqSequenceClassifierOutput or tuple(torch.FloatTensor)

Example:

>>> from transformers import BartTokenizer, BartForSequenceClassification
>>> import torch

>>> tokenizer = BartTokenizer.from_pretrained('facebook/bart-large')
>>> model = BartForSequenceClassification.from_pretrained('facebook/bart-large', return_dict=True)

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> labels = torch.tensor([1]).unsqueeze(0)  # Batch size 1
>>> outputs = model(**inputs, labels=labels)
>>> loss = outputs.loss
>>> logits = outputs.logits
training: bool
class deepke.name_entity_re.few_shot.models.modeling_bart.BartForQuestionAnswering(config)[source]

Bases: deepke.name_entity_re.few_shot.models.modeling_bart.PretrainedBartModel

BART Model with a span classification head on top for extractive question-answering tasks like SQuAD (a linear layer on top of the hidden-states output to compute span start logits and span end logits).

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.)

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.

Parameters

config (BartConfig) – Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

forward(input_ids, attention_mask=None, decoder_input_ids=None, decoder_attention_mask=None, encoder_outputs=None, start_positions=None, end_positions=None, use_cache=None, output_attentions=None, output_hidden_states=None, return_dict=None)[source]

The BartForQuestionAnswering forward method, overrides the __call__() special method.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Parameters
  • input_ids (torch.LongTensor of shape (batch_size, sequence_length)) –

    Indices of input sequence tokens in the vocabulary. Padding will be ignored by default should you provide it.

    Indices can be obtained using BartTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.

    What are input IDs?

  • attention_mask (torch.Tensor of shape (batch_size, sequence_length), optional) –

    Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:

    • 1 for tokens that are not masked,

    • 0 for tokens that are masked.

    What are attention masks?

  • decoder_input_ids (torch.LongTensor of shape (batch_size, target_sequence_length), optional) – Provide for translation and summarization training. By default, the model will create this tensor by shifting the input_ids to the right, following the paper.

  • decoder_attention_mask (torch.BoolTensor of shape (batch_size, tgt_seq_len), optional) –

    Default behavior: generate a tensor that ignores pad tokens in decoder_input_ids. Causal mask will also be used by default.

    If you want to change padding behavior, you should read modeling_bart._prepare_decoder_inputs() and modify to your needs. See diagram 1 in the paper for more information on the default strategy.

  • encoder_outputs (tuple(torch.FloatTensor), optional) – Tuple consisting of (last_hidden_state, optional: hidden_states, optional: attentions). last_hidden_state of shape (batch_size, sequence_length, hidden_size) is a sequence of hidden-states at the output of the last layer of the encoder, used in the cross-attention of the decoder.

  • past_key_values (tuple(tuple(torch.FloatTensor)) of length config.n_layers with each tuple having 4 tensors of shape (batch_size, num_heads, sequence_length - 1, embed_size_per_head)) –

    Contains precomputed key and value hidden-states of the attention blocks. Can be used to speed up decoding.

    If past_key_values are used, the user can optionally input only the last decoder_input_ids (those that don’t have their past key value states given to this model) of shape (batch_size, 1) instead of all decoder_input_ids of shape (batch_size, sequence_length).

  • use_cache (bool, optional) – If set to True, past_key_values key value states are returned and can be used to speed up decoding (see past_key_values).

  • output_attentions (bool, optional) – Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.

  • output_hidden_states (bool, optional) – Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.

  • return_dict (bool, optional) – Whether or not to return a ModelOutput instead of a plain tuple.

  • start_positions (torch.LongTensor of shape (batch_size,), optional) – Labels for position (index) of the start of the labelled span for computing the token classification loss. Positions are clamped to the length of the sequence (sequence_length). Positions outside of the sequence are not taken into account for computing the loss.

  • end_positions (torch.LongTensor of shape (batch_size,), optional) – Labels for position (index) of the end of the labelled span for computing the token classification loss. Positions are clamped to the length of the sequence (sequence_length). Positions outside of the sequence are not taken into account for computing the loss.

Returns

A Seq2SeqQuestionAnsweringModelOutput (if return_dict=True is passed or when config.return_dict=True) or a tuple of torch.FloatTensor comprising various elements depending on the configuration (BartConfig) and inputs.

  • loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) – Total span-extraction loss: the sum of the cross-entropy losses for the start and end positions.

  • start_logits (torch.FloatTensor of shape (batch_size, sequence_length)) – Span-start scores (before SoftMax).

  • end_logits (torch.FloatTensor of shape (batch_size, sequence_length)) – Span-end scores (before SoftMax).

  • past_key_values (List[torch.FloatTensor], optional, returned when use_cache=True is passed or when config.use_cache=True) – List of torch.FloatTensor of length config.n_layers, with each tensor of shape (2, batch_size, num_heads, sequence_length, embed_size_per_head).

    Contains pre-computed hidden-states (key and values in the attention blocks) of the decoder that can be used (see past_key_values input) to speed up sequential decoding.

  • decoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

    Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.

  • decoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

    Attentions weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.

  • encoder_last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) – Sequence of hidden-states at the output of the last layer of the encoder of the model.

  • encoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

    Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.

  • encoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

    Attentions weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.

Return type

Seq2SeqQuestionAnsweringModelOutput or tuple(torch.FloatTensor)

Example:

>>> from transformers import BartTokenizer, BartForQuestionAnswering
>>> import torch

>>> tokenizer = BartTokenizer.from_pretrained('facebook/bart-large')
>>> model = BartForQuestionAnswering.from_pretrained('facebook/bart-large', return_dict=True)

>>> question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"
>>> inputs = tokenizer(question, text, return_tensors='pt')
>>> start_positions = torch.tensor([1])
>>> end_positions = torch.tensor([3])

>>> outputs = model(**inputs, start_positions=start_positions, end_positions=end_positions)
>>> loss = outputs.loss
>>> start_scores = outputs.start_logits
>>> end_scores = outputs.end_logits
training: bool
class deepke.name_entity_re.few_shot.models.modeling_bart.SinusoidalPositionalEmbedding(num_positions, embedding_dim, padding_idx=None)[source]

Bases: torch.nn.modules.sparse.Embedding

This module produces sinusoidal positional embeddings of any length.

weight: torch.Tensor
num_embeddings: int
embedding_dim: int
padding_idx: Optional[int]
max_norm: Optional[float]
norm_type: float
scale_grad_by_freq: bool
sparse: bool
forward(input_ids, use_cache=False)[source]

Input is expected to be of size [bsz x seqlen].
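
A brief sketch with illustrative sizes (embedding_dim should be even so the sine/cosine pairs line up):

>>> import torch
>>> pos_emb = SinusoidalPositionalEmbedding(num_positions=1024, embedding_dim=768)
>>> input_ids = torch.ones(2, 7, dtype=torch.long)   # only the [bsz x seqlen] size matters
>>> pos = pos_emb(input_ids)                         # sinusoidal positional embeddings for positions 0..6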