Models¶
deepke.name_entity_re.few_shot.models.model module¶
- class deepke.name_entity_re.few_shot.models.model.PromptBartEncoder(encoder)[source]¶
Bases:
torch.nn.modules.module.Module
- forward(src_tokens, attention_mask=None, past_key_values=None)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class deepke.name_entity_re.few_shot.models.model.PromptBartDecoder(decoder, pad_token_id, label_ids, use_prompt=False, prompt_len=10, learn_weights=False)[source]¶
Bases:
torch.nn.modules.module.Module
- forward(tgt_tokens, prompt_state)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class deepke.name_entity_re.few_shot.models.model.PromptBartModel(tokenizer, label_ids, args)[source]¶
Bases:
torch.nn.modules.module.Module
- forward(src_tokens, tgt_tokens, src_seq_len, first)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class deepke.name_entity_re.few_shot.models.model.PromptBartState(encoder_output, encoder_mask, past_key_values, src_tokens, first, src_embed_outputs, preseqlen)[source]¶
Bases:
object
- class deepke.name_entity_re.few_shot.models.model.PromptGeneratorModel(prompt_model, max_length=20, max_len_a=0.0, num_beams=1, do_sample=False, bos_token_id=None, eos_token_id=None, repetition_penalty=1, length_penalty=1.0, pad_token_id=0, restricter=None)[source]¶
Bases:
torch.nn.modules.module.Module
- forward(src_tokens, tgt_tokens, src_seq_len=None, tgt_seq_len=None, first=None)[source]¶
- Parameters
src_tokens (torch.LongTensor) – bsz x max_len
tgt_tokens (torch.LongTensor) – bsz x max_len’
src_seq_len (torch.LongTensor) – bsz
tgt_seq_len (torch.LongTensor) – bsz
- Returns
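A minimal sketch of the shape conventions listed under Parameters, assuming a PromptGeneratorModel instance named generator has already been built around a trained PromptBartModel; the batch size, lengths, and vocabulary size below are placeholders:
>>> import torch
>>> bsz, max_len, tgt_max_len, vocab_size = 4, 32, 16, 50265  # placeholder sizes
>>> src_tokens = torch.randint(0, vocab_size, (bsz, max_len))        # bsz x max_len
>>> tgt_tokens = torch.randint(0, vocab_size, (bsz, tgt_max_len))    # bsz x max_len'
>>> src_seq_len = torch.full((bsz,), max_len, dtype=torch.long)      # bsz
>>> tgt_seq_len = torch.full((bsz,), tgt_max_len, dtype=torch.long)  # bsz
>>> # assumed call, mirroring the documented forward signature:
>>> # outputs = generator(src_tokens, tgt_tokens, src_seq_len, tgt_seq_len)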
- deepke.name_entity_re.few_shot.models.model.greedy_generate(decoder, tokens=None, state=None, max_length=20, max_len_a=0.0, num_beams=1, bos_token_id=None, eos_token_id=None, pad_token_id=0, repetition_penalty=1, length_penalty=1.0, restricter=None)[source]¶
deepke.name_entity_re.few_shot.models.modeling_bart module¶
PyTorch BART model, ported from the fairseq repo.
- deepke.name_entity_re.few_shot.models.modeling_bart.invert_mask(attention_mask)[source]¶
Turns 1 -> 0, 0 -> 1, False -> True, True -> False.
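A small illustration of this behaviour, assuming the module can be imported as documented; the toy mask below follows the usual convention of 1 for real tokens and 0 for padding:
>>> import torch
>>> from deepke.name_entity_re.few_shot.models.modeling_bart import invert_mask
>>> attention_mask = torch.tensor([[1, 1, 1, 0, 0]])  # 1 = token, 0 = padding
>>> inverted = invert_mask(attention_mask)
>>> # per the docstring, the result now flags the padding positions instead of the tokens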
- class deepke.name_entity_re.few_shot.models.modeling_bart.PretrainedBartModel(config: transformers.configuration_utils.PretrainedConfig, *inputs, **kwargs)[source]¶
Bases:
transformers.modeling_utils.PreTrainedModel
- config_class¶
alias of transformers.configuration_bart.BartConfig
- base_model_prefix = 'model'¶
- property dummy_inputs¶
Dummy inputs to do a forward pass in the network.
- Type
Dict[str, torch.Tensor]
- deepke.name_entity_re.few_shot.models.modeling_bart.shift_tokens_right(input_ids, pad_token_id)[source]¶
Shift input ids one token to the right, and wrap the last non-pad token (usually <eos>) around to the first position.
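An illustrative toy call, assuming the module can be imported as documented; ids 0, 2 and 1 stand in for <bos>, <eos> and <pad>:
>>> import torch
>>> from deepke.name_entity_re.few_shot.models.modeling_bart import shift_tokens_right
>>> input_ids = torch.tensor([[0, 31414, 232, 2, 1]])  # <bos> ... <eos> <pad>
>>> decoder_input_ids = shift_tokens_right(input_ids, pad_token_id=1)
>>> # expected per the docstring: tokens move one slot to the right and the last
>>> # non-pad token (the <eos> id 2) wraps to position 0, e.g. [[2, 0, 31414, 232, 2]]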
- deepke.name_entity_re.few_shot.models.modeling_bart.make_padding_mask(input_ids, padding_idx=1)[source]¶
True for pad tokens
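A matching toy example for the padding mask, again assuming the documented import path; BART's default pad id is 1, which is also the function's padding_idx default:
>>> import torch
>>> from deepke.name_entity_re.few_shot.models.modeling_bart import make_padding_mask
>>> input_ids = torch.tensor([[0, 31414, 232, 2, 1, 1]])
>>> padding_mask = make_padding_mask(input_ids, padding_idx=1)
>>> # True exactly at the two trailing pad positions of this toy row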
- class deepke.name_entity_re.few_shot.models.modeling_bart.EncoderLayer(config: transformers.configuration_bart.BartConfig)[source]¶
Bases:
torch.nn.modules.module.Module
- forward(idx, x, encoder_padding_mask, layer_state, output_attentions=False)[source]¶
- Parameters
x (Tensor) – input to the layer of shape (seq_len, batch, embed_dim)
encoder_padding_mask (ByteTensor) – binary ByteTensor of shape (batch, src_len) where padding elements are indicated by 1 and are excluded from attention; positions marked 0 are included in attention
- Returns
encoded output of shape (seq_len, batch, embed_dim)
- class deepke.name_entity_re.few_shot.models.modeling_bart.BartEncoder(config: transformers.configuration_bart.BartConfig, embed_tokens)[source]¶
Bases:
torch.nn.modules.module.Module
Transformer encoder consisting of config.encoder_layers self-attention layers. Each layer is an EncoderLayer.
- Parameters
config – BartConfig
- forward(input_ids, attention_mask=None, past_key_values=None, output_attentions=False, output_hidden_states=False, return_dict=False)[source]¶
- Parameters
input_ids (LongTensor) – tokens in the source language of shape (batch, src_len)
attention_mask (torch.LongTensor) – mask indicating which indices are padding tokens.
- Returns
x (Tensor): the last encoder layer’s output of shape (src_len, batch, embed_dim)
encoder_states (tuple(torch.FloatTensor)): all intermediate hidden states of shape (src_len, batch, embed_dim). Only populated if output_hidden_states is True.
all_attentions (tuple(torch.FloatTensor)): attention weights for each layer. During training this may not have length n_layers because of layer dropout.
- Return type
BaseModelOutput or a tuple comprised of the fields above
- forward_with_encoder_past(input_ids, attention_mask=None, past_key_values=None, output_attentions=False, output_hidden_states=False, return_dict=False)[source]¶
- Parameters
input_ids (LongTensor) – tokens in the source language of shape (batch, src_len)
attention_mask (torch.LongTensor) – mask indicating which indices are padding tokens.
- Returns
x (Tensor): the last encoder layer’s output of shape (src_len, batch, embed_dim)
encoder_states (tuple(torch.FloatTensor)): all intermediate hidden states of shape (src_len, batch, embed_dim). Only populated if output_hidden_states is True.
all_attentions (tuple(torch.FloatTensor)): attention weights for each layer. During training this may not have length n_layers because of layer dropout.
- Return type
BaseModelOutput or a tuple comprised of the fields above
- class deepke.name_entity_re.few_shot.models.modeling_bart.DecoderLayer(config: transformers.configuration_bart.BartConfig)[source]¶
Bases:
torch.nn.modules.module.Module
- forward(idx, x, encoder_hidden_states, encoder_attn_mask=None, layer_state=None, causal_mask=None, decoder_padding_mask=None, output_attentions=False)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class deepke.name_entity_re.few_shot.models.modeling_bart.BartDecoder(config: transformers.configuration_bart.BartConfig, embed_tokens: torch.nn.modules.sparse.Embedding)[source]¶
Bases:
torch.nn.modules.module.Module
Transformer decoder consisting of config.decoder_layers layers. Each layer is a DecoderLayer.
- Parameters
config – BartConfig
embed_tokens (torch.nn.Embedding) – output embedding
- forward(input_ids, encoder_hidden_states, encoder_padding_mask, decoder_padding_mask, decoder_causal_mask, past_key_values=None, use_cache=False, use_prompt=False, output_attentions=False, output_hidden_states=False, return_dict=False, **unused)[source]¶
Includes several features from “Jointly Learning to Align and Translate with Transformer Models” (Garg et al., EMNLP 2019).
- Parameters
input_ids (LongTensor) – previous decoder outputs of shape (batch, tgt_len), for teacher forcing
encoder_hidden_states – output from the encoder, used for encoder-side attention
encoder_padding_mask – for ignoring pad tokens
past_key_values (dict or None) – dictionary used for storing state during generation
- Returns
the decoder’s features of shape (batch, tgt_len, embed_dim)
the cache
hidden states
attentions
- Return type
BaseModelOutputWithPast or tuple
- class deepke.name_entity_re.few_shot.models.modeling_bart.Attention(embed_dim, num_heads, dropout=0.0, bias=True, encoder_decoder_attention=False, cache_key=None, preseqlen=-1, use_prompt=True)[source]¶
Bases:
torch.nn.modules.module.Module
Multi-headed attention from the ‘Attention Is All You Need’ paper.
- forward(idx, query, key: Optional[torch.Tensor], key_padding_mask: Optional[torch.Tensor] = None, layer_state: Optional[Dict[str, Optional[torch.Tensor]]] = None, attn_mask: Optional[torch.Tensor] = None, output_attentions=False) → Tuple[torch.Tensor, Optional[torch.Tensor]][source]¶
Input shape: Time(SeqLen) x Batch x Channel
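A short, self-contained sketch of this layout; the sizes are placeholders, and the point is only that tensors coming out of an embedding layer as (batch, seq_len, embed_dim) need a transpose before they match the Time x Batch x Channel convention expected here:
>>> import torch
>>> batch, seq_len, embed_dim = 4, 32, 768  # placeholder sizes
>>> hidden = torch.randn(batch, seq_len, embed_dim)  # (batch, seq_len, embed_dim)
>>> query = hidden.transpose(0, 1)                   # (seq_len, batch, embed_dim)
>>> query.shape
torch.Size([32, 4, 768])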
- class deepke.name_entity_re.few_shot.models.modeling_bart.BartClassificationHead(input_dim, inner_dim, num_classes, pooler_dropout)[source]¶
Bases:
torch.nn.modules.module.Module
Head for sentence-level classification tasks.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class deepke.name_entity_re.few_shot.models.modeling_bart.LearnedPositionalEmbedding(num_embeddings: int, embedding_dim: int, padding_idx: int, offset)[source]¶
Bases:
torch.nn.modules.sparse.Embedding
This module learns positional embeddings up to a fixed maximum size. Padding ids are ignored either by offsetting based on padding_idx or by setting padding_idx to None and ensuring that the appropriate position ids are passed to the forward function.
- weight: torch.Tensor¶
- deepke.name_entity_re.few_shot.models.modeling_bart.LayerNorm(normalized_shape, eps=1e-05, elementwise_affine=True)[source]¶
- deepke.name_entity_re.few_shot.models.modeling_bart.fill_with_neg_inf(t)[source]¶
FP16-compatible function that fills a tensor with -inf.
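A sketch of a typical use, under the assumption that the helper is importable as documented: combined with torch.triu it produces the upper-triangular -inf mask used for causal (autoregressive) decoding.
>>> import torch
>>> from deepke.name_entity_re.few_shot.models.modeling_bart import fill_with_neg_inf
>>> tgt_len = 4
>>> causal_mask = torch.triu(fill_with_neg_inf(torch.zeros(tgt_len, tgt_len)), 1)
>>> # entry (i, j) is -inf for j > i, so each position can only attend to itself and the past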
- class deepke.name_entity_re.few_shot.models.modeling_bart.BartModel(config: transformers.configuration_bart.BartConfig)[source]¶
Bases:
deepke.name_entity_re.few_shot.models.modeling_bart.PretrainedBartModel
The bare BART Model outputting raw hidden-states without any specific head on top.
This model inherits from
PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.). This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.
- Parameters
config (
BartConfig) – Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
- forward(input_ids, attention_mask=None, decoder_input_ids=None, encoder_outputs: Optional[Tuple] = None, decoder_attention_mask=None, past_key_values=None, use_cache=None, use_prompt=None, output_attentions=None, output_hidden_states=None, return_dict=None, **kwargs)[source]¶
The BartModel forward method overrides the __call__() special method.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.
- Parameters
input_ids (torch.LongTensor of shape (batch_size, sequence_length)) – Indices of input sequence tokens in the vocabulary. Padding will be ignored by default should you provide it. Indices can be obtained using BartTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.
attention_mask (torch.Tensor of shape (batch_size, sequence_length), optional) – Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: 1 for tokens that are not masked, 0 for tokens that are masked.
decoder_input_ids (torch.LongTensor of shape (batch_size, target_sequence_length), optional) – Provide for translation and summarization training. By default, the model will create this tensor by shifting the input_ids to the right, following the paper.
decoder_attention_mask (torch.BoolTensor of shape (batch_size, tgt_seq_len), optional) – Default behavior: generate a tensor that ignores pad tokens in decoder_input_ids. A causal mask will also be used by default. If you want to change padding behavior, you should read modeling_bart._prepare_decoder_inputs() and modify it to your needs. See diagram 1 in the paper for more information on the default strategy.
encoder_outputs (tuple(tuple(torch.FloatTensor)), optional) – Tuple consisting of (last_hidden_state, optional: hidden_states, optional: attentions). last_hidden_state, of shape (batch_size, sequence_length, hidden_size), is a sequence of hidden-states at the output of the last layer of the encoder and is used in the cross-attention of the decoder.
past_key_values (tuple(tuple(torch.FloatTensor)) of length config.n_layers, with each tuple having 4 tensors of shape (batch_size, num_heads, sequence_length - 1, embed_size_per_head)) – Contains precomputed key and value hidden-states of the attention blocks. Can be used to speed up decoding. If past_key_values are used, the user can optionally input only the last decoder_input_ids (those that don't have their past key value states given to this model) of shape (batch_size, 1) instead of all decoder_input_ids of shape (batch_size, sequence_length).
use_cache (bool, optional) – If set to True, past_key_values key value states are returned and can be used to speed up decoding (see past_key_values).
output_attentions (bool, optional) – Whether or not to return the attention tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional) – Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional) – Whether or not to return a ModelOutput instead of a plain tuple.
- Returns
A Seq2SeqModelOutput (if return_dict=True is passed or when config.return_dict=True) or a tuple of torch.FloatTensor comprising various elements depending on the configuration (BartConfig) and inputs.
last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size)) – Sequence of hidden-states at the output of the last layer of the decoder of the model. If past_key_values is used, only the last hidden-state of the sequences, of shape (batch_size, 1, hidden_size), is output.
past_key_values (List[torch.FloatTensor], optional, returned when use_cache=True is passed or when config.use_cache=True) – List of torch.FloatTensor of length config.n_layers, with each tensor of shape (2, batch_size, num_heads, sequence_length, embed_size_per_head). Contains pre-computed hidden-states (key and values in the attention blocks) of the decoder that can be used (see the past_key_values input) to speed up sequential decoding.
decoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.
decoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
encoder_last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) – Sequence of hidden-states at the output of the last layer of the encoder of the model.
encoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.
encoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
- Return type
Seq2SeqModelOutput or tuple(torch.FloatTensor)
Example:
>>> from transformers import BartTokenizer, BartModel
>>> import torch
>>> tokenizer = BartTokenizer.from_pretrained('facebook/bart-large')
>>> model = BartModel.from_pretrained('facebook/bart-large', return_dict=True)
>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> outputs = model(**inputs)
>>> last_hidden_states = outputs.last_hidden_state
- get_input_embeddings()[source]¶
Returns the model’s input embeddings.
- Returns
A torch module mapping vocabulary to hidden states.
- Return type
nn.Module
- set_input_embeddings(value)[source]¶
Set model’s input embeddings.
- Parameters
value (
nn.Module) – A module mapping vocabulary to hidden states.
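A brief illustration of the two accessors above, reusing the facebook/bart-large checkpoint from the earlier example; the replacement embedding module here is purely illustrative:
>>> from transformers import BartModel
>>> import torch.nn as nn
>>> model = BartModel.from_pretrained('facebook/bart-large')
>>> embeddings = model.get_input_embeddings()  # an nn.Embedding mapping vocabulary ids to hidden states
>>> new_embeddings = nn.Embedding(embeddings.num_embeddings, embeddings.embedding_dim)
>>> model.set_input_embeddings(new_embeddings)  # swap in the (randomly initialised) module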
- class deepke.name_entity_re.few_shot.models.modeling_bart.BartForConditionalGeneration(config: transformers.configuration_bart.BartConfig)[source]¶
Bases:
deepke.name_entity_re.few_shot.models.modeling_bart.PretrainedBartModel
The BART Model with a language modeling head. Can be used for summarization.
This model inherits from
PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.). This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.
- Parameters
config (
BartConfig) – Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
- base_model_prefix = 'model'¶
- authorized_missing_keys = ['final_logits_bias', 'encoder\\.version', 'decoder\\.version']¶
- resize_token_embeddings(new_num_tokens: int) → torch.nn.modules.sparse.Embedding[source]¶
Resizes the input token embeddings matrix of the model if new_num_tokens != config.vocab_size. Takes care of tying weights embeddings afterwards if the model class has a tie_weights() method.
- Parameters
new_num_tokens (int, optional) – The number of new tokens in the embedding matrix. Increasing the size will add newly initialized vectors at the end. Reducing the size will remove vectors from the end. If not provided or None, just returns a pointer to the input tokens torch.nn.Embedding module of the model without doing anything.
- Returns
Pointer to the input tokens Embeddings Module of the model.
- Return type
torch.nn.Embedding
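A hedged example of resizing after the tokenizer's vocabulary has grown, for instance after adding task-specific marker tokens; the added tokens below are placeholders:
>>> from transformers import BartTokenizer, BartForConditionalGeneration
>>> tokenizer = BartTokenizer.from_pretrained('facebook/bart-large')
>>> model = BartForConditionalGeneration.from_pretrained('facebook/bart-large')
>>> num_added = tokenizer.add_tokens(['<<person>>', '<<location>>'])  # illustrative extra tokens
>>> embeddings = model.resize_token_embeddings(len(tokenizer))  # returns the resized nn.Embedding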
- forward(input_ids, attention_mask=None, encoder_outputs=None, decoder_input_ids=None, decoder_attention_mask=None, past_key_values=None, labels=None, use_cache=None, use_prompt=None, output_attentions=None, output_hidden_states=None, return_dict=None, **unused)[source]¶
The BartForConditionalGeneration forward method overrides the __call__() special method.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.
- Parameters
input_ids (torch.LongTensor of shape (batch_size, sequence_length)) – Indices of input sequence tokens in the vocabulary. Padding will be ignored by default should you provide it. Indices can be obtained using BartTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.
attention_mask (torch.Tensor of shape (batch_size, sequence_length), optional) – Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: 1 for tokens that are not masked, 0 for tokens that are masked.
decoder_input_ids (torch.LongTensor of shape (batch_size, target_sequence_length), optional) – Provide for translation and summarization training. By default, the model will create this tensor by shifting the input_ids to the right, following the paper.
decoder_attention_mask (torch.BoolTensor of shape (batch_size, tgt_seq_len), optional) – Default behavior: generate a tensor that ignores pad tokens in decoder_input_ids. A causal mask will also be used by default. If you want to change padding behavior, you should read modeling_bart._prepare_decoder_inputs() and modify it to your needs. See diagram 1 in the paper for more information on the default strategy.
encoder_outputs (tuple(tuple(torch.FloatTensor)), optional) – Tuple consisting of (last_hidden_state, optional: hidden_states, optional: attentions). last_hidden_state, of shape (batch_size, sequence_length, hidden_size), is a sequence of hidden-states at the output of the last layer of the encoder and is used in the cross-attention of the decoder.
past_key_values (tuple(tuple(torch.FloatTensor)) of length config.n_layers, with each tuple having 4 tensors of shape (batch_size, num_heads, sequence_length - 1, embed_size_per_head)) – Contains precomputed key and value hidden-states of the attention blocks. Can be used to speed up decoding. If past_key_values are used, the user can optionally input only the last decoder_input_ids (those that don't have their past key value states given to this model) of shape (batch_size, 1) instead of all decoder_input_ids of shape (batch_size, sequence_length).
use_cache (bool, optional) – If set to True, past_key_values key value states are returned and can be used to speed up decoding (see past_key_values).
output_attentions (bool, optional) – Whether or not to return the attention tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional) – Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional) – Whether or not to return a ModelOutput instead of a plain tuple.
labels (torch.LongTensor of shape (batch_size, sequence_length), optional) – Labels for computing the masked language modeling loss. Indices should either be in [0, ..., config.vocab_size] or -100 (see the input_ids docstring). Tokens with indices set to -100 are ignored (masked); the loss is only computed for the tokens with labels in [0, ..., config.vocab_size].
- Returns
A Seq2SeqLMOutput (if return_dict=True is passed or when config.return_dict=True) or a tuple of torch.FloatTensor comprising various elements depending on the configuration (BartConfig) and inputs.
loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) – Language modeling loss.
logits (torch.FloatTensor of shape (batch_size, sequence_length, config.vocab_size)) – Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
past_key_values (List[torch.FloatTensor], optional, returned when use_cache=True is passed or when config.use_cache=True) – List of torch.FloatTensor of length config.n_layers, with each tensor of shape (2, batch_size, num_heads, sequence_length, embed_size_per_head). Contains pre-computed hidden-states (key and values in the attention blocks) of the decoder that can be used (see the past_key_values input) to speed up sequential decoding.
decoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.
decoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
encoder_last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) – Sequence of hidden-states at the output of the last layer of the encoder of the model.
encoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.
encoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
Conditional generation example:
>>> # Mask filling only works for bart-large
>>> from transformers import BartTokenizer, BartForConditionalGeneration
>>> tokenizer = BartTokenizer.from_pretrained('facebook/bart-large')
>>> TXT = "My friends are <mask> but they eat too many carbs."
>>> model = BartForConditionalGeneration.from_pretrained('facebook/bart-large')
>>> input_ids = tokenizer([TXT], return_tensors='pt')['input_ids']
>>> logits = model(input_ids).logits
>>> masked_index = (input_ids[0] == tokenizer.mask_token_id).nonzero().item()
>>> probs = logits[0, masked_index].softmax(dim=0)
>>> values, predictions = probs.topk(5)
>>> tokenizer.decode(predictions).split()
>>> # ['good', 'great', 'all', 'really', 'very']
- Return type
Seq2SeqLMOutput or tuple(torch.FloatTensor)
Summarization example:
>>> from transformers import BartTokenizer, BartForConditionalGeneration, BartConfig
>>> # see ``examples/summarization/bart/run_eval.py`` for a longer example
>>> model = BartForConditionalGeneration.from_pretrained('facebook/bart-large-cnn')
>>> tokenizer = BartTokenizer.from_pretrained('facebook/bart-large-cnn')
>>> ARTICLE_TO_SUMMARIZE = "My friends are cool but they eat too many carbs."
>>> inputs = tokenizer([ARTICLE_TO_SUMMARIZE], max_length=1024, return_tensors='pt')
>>> # Generate Summary
>>> summary_ids = model.generate(inputs['input_ids'], num_beams=4, max_length=5, early_stopping=True)
>>> print([tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=False) for g in summary_ids])
- prepare_inputs_for_generation(decoder_input_ids, past, attention_mask, use_cache, encoder_outputs, **kwargs)[source]¶
Implement in subclasses of
PreTrainedModel for custom behavior to prepare inputs in the generate method.
- adjust_logits_during_generation(logits, cur_len, max_length)[source]¶
Implement in subclasses of
PreTrainedModel for custom behavior to adjust the logits in the generate method.
- class deepke.name_entity_re.few_shot.models.modeling_bart.BartForSequenceClassification(config: transformers.configuration_bart.BartConfig, **kwargs)[source]¶
Bases:
deepke.name_entity_re.few_shot.models.modeling_bart.PretrainedBartModel
Bart model with a sequence classification head on top (a linear layer on top of the pooled output), e.g. for GLUE tasks.
This model inherits from
PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.). This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.
- Parameters
config (
BartConfig) – Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
- forward(input_ids, attention_mask=None, decoder_input_ids=None, decoder_attention_mask=None, encoder_outputs=None, labels=None, use_cache=None, output_attentions=None, output_hidden_states=None, return_dict=None)[source]¶
The BartForSequenceClassification forward method overrides the __call__() special method.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.
- Parameters
input_ids (torch.LongTensor of shape (batch_size, sequence_length)) – Indices of input sequence tokens in the vocabulary. Padding will be ignored by default should you provide it. Indices can be obtained using BartTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.
attention_mask (torch.Tensor of shape (batch_size, sequence_length), optional) – Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: 1 for tokens that are not masked, 0 for tokens that are masked.
decoder_input_ids (torch.LongTensor of shape (batch_size, target_sequence_length), optional) – Provide for translation and summarization training. By default, the model will create this tensor by shifting the input_ids to the right, following the paper.
decoder_attention_mask (torch.BoolTensor of shape (batch_size, tgt_seq_len), optional) – Default behavior: generate a tensor that ignores pad tokens in decoder_input_ids. A causal mask will also be used by default. If you want to change padding behavior, you should read modeling_bart._prepare_decoder_inputs() and modify it to your needs. See diagram 1 in the paper for more information on the default strategy.
encoder_outputs (tuple(tuple(torch.FloatTensor)), optional) – Tuple consisting of (last_hidden_state, optional: hidden_states, optional: attentions). last_hidden_state, of shape (batch_size, sequence_length, hidden_size), is a sequence of hidden-states at the output of the last layer of the encoder and is used in the cross-attention of the decoder.
past_key_values (tuple(tuple(torch.FloatTensor)) of length config.n_layers, with each tuple having 4 tensors of shape (batch_size, num_heads, sequence_length - 1, embed_size_per_head)) – Contains precomputed key and value hidden-states of the attention blocks. Can be used to speed up decoding. If past_key_values are used, the user can optionally input only the last decoder_input_ids (those that don't have their past key value states given to this model) of shape (batch_size, 1) instead of all decoder_input_ids of shape (batch_size, sequence_length).
use_cache (bool, optional) – If set to True, past_key_values key value states are returned and can be used to speed up decoding (see past_key_values).
output_attentions (bool, optional) – Whether or not to return the attention tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional) – Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional) – Whether or not to return a ModelOutput instead of a plain tuple.
labels (torch.LongTensor of shape (batch_size,), optional) – Labels for computing the sequence classification/regression loss. Indices should be in [0, ..., config.num_labels - 1]. If config.num_labels > 1, a classification loss is computed (Cross-Entropy).
- Returns
A Seq2SeqSequenceClassifierOutput (if return_dict=True is passed or when config.return_dict=True) or a tuple of torch.FloatTensor comprising various elements depending on the configuration (BartConfig) and inputs.
loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) – Classification (or regression if config.num_labels==1) loss.
logits (torch.FloatTensor of shape (batch_size, config.num_labels)) – Classification (or regression if config.num_labels==1) scores (before SoftMax).
past_key_values (List[torch.FloatTensor], optional, returned when use_cache=True is passed or when config.use_cache=True) – List of torch.FloatTensor of length config.n_layers, with each tensor of shape (2, batch_size, num_heads, sequence_length, embed_size_per_head). Contains pre-computed hidden-states (key and values in the attention blocks) of the decoder that can be used (see the past_key_values input) to speed up sequential decoding.
decoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.
decoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
encoder_last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) – Sequence of hidden-states at the output of the last layer of the encoder of the model.
encoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.
encoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
- Return type
Seq2SeqSequenceClassifierOutput or tuple(torch.FloatTensor)
Example:
>>> from transformers import BartTokenizer, BartForSequenceClassification
>>> import torch
>>> tokenizer = BartTokenizer.from_pretrained('facebook/bart-large')
>>> model = BartForSequenceClassification.from_pretrained('facebook/bart-large', return_dict=True)
>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> labels = torch.tensor([1]).unsqueeze(0)  # Batch size 1
>>> outputs = model(**inputs, labels=labels)
>>> loss = outputs.loss
>>> logits = outputs.logits
- class deepke.name_entity_re.few_shot.models.modeling_bart.BartForQuestionAnswering(config)[source]¶
Bases:
deepke.name_entity_re.few_shot.models.modeling_bart.PretrainedBartModel
BART Model with a span classification head on top for extractive question-answering tasks like SQuAD (a linear layer on top of the hidden-states output to compute span start logits and span end logits).
This model inherits from
PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.). This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.
- Parameters
config (
BartConfig) – Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
- forward(input_ids, attention_mask=None, decoder_input_ids=None, decoder_attention_mask=None, encoder_outputs=None, start_positions=None, end_positions=None, use_cache=None, output_attentions=None, output_hidden_states=None, return_dict=None)[source]¶
The BartForQuestionAnswering forward method overrides the __call__() special method.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.
- Parameters
input_ids (torch.LongTensor of shape (batch_size, sequence_length)) – Indices of input sequence tokens in the vocabulary. Padding will be ignored by default should you provide it. Indices can be obtained using BartTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.
attention_mask (torch.Tensor of shape (batch_size, sequence_length), optional) – Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: 1 for tokens that are not masked, 0 for tokens that are masked.
decoder_input_ids (torch.LongTensor of shape (batch_size, target_sequence_length), optional) – Provide for translation and summarization training. By default, the model will create this tensor by shifting the input_ids to the right, following the paper.
decoder_attention_mask (torch.BoolTensor of shape (batch_size, tgt_seq_len), optional) – Default behavior: generate a tensor that ignores pad tokens in decoder_input_ids. A causal mask will also be used by default. If you want to change padding behavior, you should read modeling_bart._prepare_decoder_inputs() and modify it to your needs. See diagram 1 in the paper for more information on the default strategy.
encoder_outputs (tuple(tuple(torch.FloatTensor)), optional) – Tuple consisting of (last_hidden_state, optional: hidden_states, optional: attentions). last_hidden_state, of shape (batch_size, sequence_length, hidden_size), is a sequence of hidden-states at the output of the last layer of the encoder and is used in the cross-attention of the decoder.
past_key_values (tuple(tuple(torch.FloatTensor)) of length config.n_layers, with each tuple having 4 tensors of shape (batch_size, num_heads, sequence_length - 1, embed_size_per_head)) – Contains precomputed key and value hidden-states of the attention blocks. Can be used to speed up decoding. If past_key_values are used, the user can optionally input only the last decoder_input_ids (those that don't have their past key value states given to this model) of shape (batch_size, 1) instead of all decoder_input_ids of shape (batch_size, sequence_length).
use_cache (bool, optional) – If set to True, past_key_values key value states are returned and can be used to speed up decoding (see past_key_values).
output_attentions (bool, optional) – Whether or not to return the attention tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional) – Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional) – Whether or not to return a ModelOutput instead of a plain tuple.
start_positions (torch.LongTensor of shape (batch_size,), optional) – Labels for the position (index) of the start of the labelled span, used for computing the token classification loss. Positions are clamped to the length of the sequence (sequence_length). Positions outside of the sequence are not taken into account for computing the loss.
end_positions (torch.LongTensor of shape (batch_size,), optional) – Labels for the position (index) of the end of the labelled span, used for computing the token classification loss. Positions are clamped to the length of the sequence (sequence_length). Positions outside of the sequence are not taken into account for computing the loss.
- Returns
A Seq2SeqQuestionAnsweringModelOutput (if return_dict=True is passed or when config.return_dict=True) or a tuple of torch.FloatTensor comprising various elements depending on the configuration (BartConfig) and inputs.
loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) – Total span extraction loss is the sum of a Cross-Entropy for the start and end positions.
start_logits (torch.FloatTensor of shape (batch_size, sequence_length)) – Span-start scores (before SoftMax).
end_logits (torch.FloatTensor of shape (batch_size, sequence_length)) – Span-end scores (before SoftMax).
past_key_values (List[torch.FloatTensor], optional, returned when use_cache=True is passed or when config.use_cache=True) – List of torch.FloatTensor of length config.n_layers, with each tensor of shape (2, batch_size, num_heads, sequence_length, embed_size_per_head). Contains pre-computed hidden-states (key and values in the attention blocks) of the decoder that can be used (see the past_key_values input) to speed up sequential decoding.
decoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.
decoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
encoder_last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) – Sequence of hidden-states at the output of the last layer of the encoder of the model.
encoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.
encoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
- Return type
Seq2SeqQuestionAnsweringModelOutput or tuple(torch.FloatTensor)
Example:
>>> from transformers import BartTokenizer, BartForQuestionAnswering
>>> import torch
>>> tokenizer = BartTokenizer.from_pretrained('facebook/bart-large')
>>> model = BartForQuestionAnswering.from_pretrained('facebook/bart-large', return_dict=True)
>>> question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"
>>> inputs = tokenizer(question, text, return_tensors='pt')
>>> start_positions = torch.tensor([1])
>>> end_positions = torch.tensor([3])
>>> outputs = model(**inputs, start_positions=start_positions, end_positions=end_positions)
>>> loss = outputs.loss
>>> start_scores = outputs.start_logits
>>> end_scores = outputs.end_logits