Models¶
deepke.name_entity_re.few_shot.models.model module¶
- class deepke.name_entity_re.few_shot.models.model.PromptBartEncoder(encoder)[source]¶
Bases:
torch.nn.modules.module.Module
- forward(src_tokens, attention_mask=None, past_key_values=None)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class deepke.name_entity_re.few_shot.models.model.PromptBartDecoder(decoder, pad_token_id, label_ids, use_prompt=False, prompt_len=10, learn_weights=False)[source]¶
Bases:
torch.nn.modules.module.Module
- forward(tgt_tokens, prompt_state)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class deepke.name_entity_re.few_shot.models.model.PromptBartModel(tokenizer, label_ids, args)[source]¶
Bases:
torch.nn.modules.module.Module
- forward(src_tokens, tgt_tokens, src_seq_len, first)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class deepke.name_entity_re.few_shot.models.model.PromptBartState(encoder_output, encoder_mask, past_key_values, src_tokens, first, src_embed_outputs, preseqlen)[source]¶
Bases:
object
- class deepke.name_entity_re.few_shot.models.model.PromptGeneratorModel(prompt_model, max_length=20, max_len_a=0.0, num_beams=1, do_sample=False, bos_token_id=None, eos_token_id=None, repetition_penalty=1, length_penalty=1.0, pad_token_id=0, restricter=None)[source]¶
Bases:
torch.nn.modules.module.Module
- forward(src_tokens, tgt_tokens, src_seq_len=None, tgt_seq_len=None, first=None)[source]¶
- Parameters
src_tokens (torch.LongTensor) – bsz x max_len
tgt_tokens (torch.LongTensor) – bsz x max_len'
src_seq_len (torch.LongTensor) – bsz
tgt_seq_len (torch.LongTensor) – bsz
- Returns
- deepke.name_entity_re.few_shot.models.model.greedy_generate(decoder, tokens=None, state=None, max_length=20, max_len_a=0.0, num_beams=1, bos_token_id=None, eos_token_id=None, pad_token_id=0, repetition_penalty=1, length_penalty=1.0, restricter=None)[source]¶
deepke.name_entity_re.few_shot.models.modeling_bart module¶
PyTorch BART model, ported from the fairseq repo.
- deepke.name_entity_re.few_shot.models.modeling_bart.invert_mask(attention_mask)[source]¶
Turns 1->0, 0->1, False->True, True->False
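For illustration, the inversion is just a boolean flip of the mask. The snippet below is a sketch of the documented behavior (not necessarily the exact code in this module):

```python
import torch

def invert_mask_sketch(attention_mask: torch.Tensor) -> torch.Tensor:
    """Flip a mask: 1 -> 0, 0 -> 1, True -> False, False -> True."""
    assert attention_mask.dim() == 2      # (batch, seq_len)
    return attention_mask.eq(0)           # works for 0/1 integer masks and bool masks

mask = torch.tensor([[1, 1, 0], [1, 0, 0]])
print(invert_mask_sketch(mask))  # tensor([[False, False,  True], [False,  True,  True]])
```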
- class deepke.name_entity_re.few_shot.models.modeling_bart.PretrainedBartModel(config: transformers.configuration_utils.PretrainedConfig, *inputs, **kwargs)[source]¶
Bases:
transformers.modeling_utils.PreTrainedModel
- config_class¶
alias of
transformers.configuration_bart.BartConfig
- base_model_prefix = 'model'¶
- property dummy_inputs¶
Dummy inputs to do a forward pass in the network.
- Type
Dict[str, torch.Tensor]
- deepke.name_entity_re.few_shot.models.modeling_bart.shift_tokens_right(input_ids, pad_token_id)[source]¶
Shift input ids one token to the right, and wrap the last non-pad token (usually <eos>).
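As a sketch of the documented behavior, adapted from the transformers/fairseq implementation this file is ported from (the DeepKE copy may differ slightly): the last non-pad token of each row, usually <eos>, is wrapped around to position 0 and everything else shifts right by one.

```python
import torch

def shift_tokens_right_sketch(input_ids: torch.Tensor, pad_token_id: int) -> torch.Tensor:
    """Shift input ids one token to the right, wrapping the last non-pad token to position 0."""
    prev_output_tokens = input_ids.clone()
    # index of the last non-pad token in each row (usually <eos>)
    index_of_eos = (input_ids.ne(pad_token_id).sum(dim=1) - 1).unsqueeze(-1)
    prev_output_tokens[:, 0] = input_ids.gather(1, index_of_eos).squeeze(-1)
    prev_output_tokens[:, 1:] = input_ids[:, :-1]
    return prev_output_tokens

ids = torch.tensor([[0, 31414, 232, 2, 1, 1]])   # <s> ... </s> <pad> <pad>
print(shift_tokens_right_sketch(ids, pad_token_id=1))
# tensor([[    2,     0, 31414,   232,     2,     1]])
```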
- deepke.name_entity_re.few_shot.models.modeling_bart.make_padding_mask(input_ids, padding_idx=1)[source]¶
True for pad tokens
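An illustrative sketch of the documented contract (the mask is True exactly at pad positions, and None is returned when there is nothing to mask); the real implementation may differ in detail:

```python
import torch

def make_padding_mask_sketch(input_ids: torch.Tensor, padding_idx: int = 1):
    """True for pad tokens; returns None if the batch contains no padding."""
    padding_mask = input_ids.eq(padding_idx)
    if not padding_mask.any():
        return None
    return padding_mask

print(make_padding_mask_sketch(torch.tensor([[5, 6, 1, 1]])))  # tensor([[False, False,  True,  True]])
print(make_padding_mask_sketch(torch.tensor([[5, 6, 7, 8]])))  # None
```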
- class deepke.name_entity_re.few_shot.models.modeling_bart.EncoderLayer(config: transformers.configuration_bart.BartConfig)[source]¶
Bases:
torch.nn.modules.module.Module
- forward(idx, x, encoder_padding_mask, layer_state, output_attentions=False)[source]¶
- Parameters
x (Tensor) – input to the layer of shape (seq_len, batch, embed_dim)
encoder_padding_mask (ByteTensor) – binary ByteTensor of shape (batch, src_len) where padding elements are indicated by 1; for t_tgt, t_src is excluded (masked out), while =0 means it is included in attention
- Returns
encoded output of shape (seq_len, batch, embed_dim)
- class deepke.name_entity_re.few_shot.models.modeling_bart.BartEncoder(config: transformers.configuration_bart.BartConfig, embed_tokens)[source]¶
Bases:
torch.nn.modules.module.Module
Transformer encoder consisting of config.encoder_layers self-attention layers. Each layer is an EncoderLayer.
- Parameters
config – BartConfig
- forward(input_ids, attention_mask=None, past_key_values=None, output_attentions=False, output_hidden_states=False, return_dict=False)[source]¶
- Parameters
input_ids (LongTensor) – tokens in the source language of shape (batch, src_len)
attention_mask (torch.LongTensor) – indicating which indices are padding tokens.
- Returns
x (Tensor): the last encoder layer’s output of shape (src_len, batch, embed_dim)
encoder_states (tuple(torch.FloatTensor)): all intermediate hidden states of shape (src_len, batch, embed_dim). Only populated if output_hidden_states is True.
all_attentions (tuple(torch.FloatTensor)): Attention weights for each layer.
During training might not be of length n_layers because of layer dropout.
- Return type
BaseModelOutput or a tuple comprised of the fields above
- forward_with_encoder_past(input_ids, attention_mask=None, past_key_values=None, output_attentions=False, output_hidden_states=False, return_dict=False)[source]¶
- Parameters
input_ids (LongTensor) – tokens in the source language of shape (batch, src_len)
attention_mask (torch.LongTensor) – indicating which indices are padding tokens.
- Returns
x (Tensor): the last encoder layer’s output of shape (src_len, batch, embed_dim)
encoder_states (tuple(torch.FloatTensor)): all intermediate hidden states of shape (src_len, batch, embed_dim). Only populated if output_hidden_states is True.
all_attentions (tuple(torch.FloatTensor)): Attention weights for each layer.
During training might not be of length n_layers because of layer dropout.
- Return type
BaseModelOutput or a tuple comprised of the fields above
- class deepke.name_entity_re.few_shot.models.modeling_bart.DecoderLayer(config: transformers.configuration_bart.BartConfig)[source]¶
Bases:
torch.nn.modules.module.Module
- forward(idx, x, encoder_hidden_states, encoder_attn_mask=None, layer_state=None, causal_mask=None, decoder_padding_mask=None, output_attentions=False)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class deepke.name_entity_re.few_shot.models.modeling_bart.BartDecoder(config: transformers.configuration_bart.BartConfig, embed_tokens: torch.nn.modules.sparse.Embedding)[source]¶
Bases:
torch.nn.modules.module.Module
Transformer decoder consisting of config.decoder_layers layers. Each layer is a DecoderLayer.
- Parameters
config – BartConfig
embed_tokens (torch.nn.Embedding) – output embedding
- forward(input_ids, encoder_hidden_states, encoder_padding_mask, decoder_padding_mask, decoder_causal_mask, past_key_values=None, use_cache=False, use_prompt=False, output_attentions=False, output_hidden_states=False, return_dict=False, **unused)[source]¶
Includes several features from “Jointly Learning to Align and Translate with Transformer Models” (Garg et al., EMNLP 2019).
- Parameters
input_ids (LongTensor) – previous decoder outputs of shape (batch, tgt_len), for teacher forcing
encoder_hidden_states – output from the encoder, used for encoder-side attention
encoder_padding_mask – for ignoring pad tokens
past_key_values (dict or None) – dictionary used for storing state during generation
- Returns
the decoder’s features of shape (batch, tgt_len, embed_dim)
the cache
hidden states
attentions
- Return type
BaseModelOutputWithPast or tuple
- class deepke.name_entity_re.few_shot.models.modeling_bart.Attention(embed_dim, num_heads, dropout=0.0, bias=True, encoder_decoder_attention=False, cache_key=None, preseqlen=-1, use_prompt=True)[source]¶
Bases:
torch.nn.modules.module.Module
Multi-headed attention from ‘Attention Is All You Need’ paper
- forward(idx, query, key: Optional[torch.Tensor], key_padding_mask: Optional[torch.Tensor] = None, layer_state: Optional[Dict[str, Optional[torch.Tensor]]] = None, attn_mask: Optional[torch.Tensor] = None, output_attentions=False) → Tuple[torch.Tensor, Optional[torch.Tensor]][source]¶
Input shape: Time(SeqLen) x Batch x Channel
- class deepke.name_entity_re.few_shot.models.modeling_bart.BartClassificationHead(input_dim, inner_dim, num_classes, pooler_dropout)[source]¶
Bases:
torch.nn.modules.module.Module
Head for sentence-level classification tasks.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
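For orientation, a classification head of this kind is typically a small MLP applied to a pooled decoder state: dropout, a dense projection, a tanh non-linearity, dropout again, then a projection to num_classes. The sketch below mirrors the standard BART head in the transformers library this file is ported from; the DeepKE copy may differ in detail.

```python
import torch
import torch.nn as nn

class ClassificationHeadSketch(nn.Module):
    """Head for sentence-level classification tasks (dense -> tanh -> projection)."""

    def __init__(self, input_dim: int, inner_dim: int, num_classes: int, pooler_dropout: float):
        super().__init__()
        self.dense = nn.Linear(input_dim, inner_dim)
        self.dropout = nn.Dropout(p=pooler_dropout)
        self.out_proj = nn.Linear(inner_dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.dropout(x)
        x = torch.tanh(self.dense(x))
        x = self.dropout(x)
        return self.out_proj(x)      # (batch, num_classes) logits

head = ClassificationHeadSketch(input_dim=1024, inner_dim=1024, num_classes=3, pooler_dropout=0.1)
pooled = torch.randn(2, 1024)        # e.g. the final <eos> hidden state per sequence
print(head(pooled).shape)            # torch.Size([2, 3])
```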
- class deepke.name_entity_re.few_shot.models.modeling_bart.LearnedPositionalEmbedding(num_embeddings: int, embedding_dim: int, padding_idx: int, offset)[source]¶
Bases:
torch.nn.modules.sparse.Embedding
This module learns positional embeddings up to a fixed maximum size. Padding ids are ignored by either offsetting based on padding_idx or by setting padding_idx to None and ensuring that the appropriate position ids are passed to the forward function.
- weight: torch.Tensor¶
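Conceptually, the module adds a constant offset to the position ids before the embedding lookup, so the rows reserved for padding are never used. A hedged sketch of that lookup follows; the actual forward signature in this file may differ (for example, it may special-case cached generation).

```python
import torch
import torch.nn as nn

class LearnedPositionalEmbeddingSketch(nn.Embedding):
    """Learned positional embeddings with an offset that skips the padding rows."""

    def __init__(self, num_embeddings: int, embedding_dim: int, padding_idx: int, offset: int):
        super().__init__(num_embeddings + offset, embedding_dim, padding_idx=padding_idx)
        self.offset = offset

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        seq_len = input_ids.shape[1]
        positions = torch.arange(seq_len, dtype=torch.long, device=self.weight.device)
        return super().forward(positions + self.offset)

pos_emb = LearnedPositionalEmbeddingSketch(num_embeddings=1024, embedding_dim=16, padding_idx=1, offset=2)
print(pos_emb(torch.zeros(2, 5, dtype=torch.long)).shape)  # torch.Size([5, 16]), broadcast over the batch
```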
- deepke.name_entity_re.few_shot.models.modeling_bart.LayerNorm(normalized_shape, eps=1e-05, elementwise_affine=True)[source]¶
- deepke.name_entity_re.few_shot.models.modeling_bart.fill_with_neg_inf(t)[source]¶
FP16-compatible function that fills the given tensor with -inf.
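The usual trick, shown below as a sketch, is to fill in float32 and cast back so the result stays representable under FP16; this helper is typically used to build the causal mask for the decoder.

```python
import torch

def fill_with_neg_inf_sketch(t: torch.Tensor) -> torch.Tensor:
    """FP16-compatible: fill via float32, then cast back to the original dtype."""
    return t.float().fill_(float("-inf")).type_as(t)

# Typical use: a strictly upper-triangular causal mask added to attention scores.
tgt_len = 4
causal_mask = torch.triu(fill_with_neg_inf_sketch(torch.zeros(tgt_len, tgt_len)), diagonal=1)
print(causal_mask)
```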
- class deepke.name_entity_re.few_shot.models.modeling_bart.BartModel(config: transformers.configuration_bart.BartConfig)[source]¶
Bases:
deepke.name_entity_re.few_shot.models.modeling_bart.PretrainedBartModel
The bare BART Model outputting raw hidden-states without any specific head on top.
This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.). This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.
- Parameters
config (BartConfig) – Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
- forward(input_ids, attention_mask=None, decoder_input_ids=None, encoder_outputs: Optional[Tuple] = None, decoder_attention_mask=None, past_key_values=None, use_cache=None, use_prompt=None, output_attentions=None, output_hidden_states=None, return_dict=None, **kwargs)[source]¶
The BartModel forward method overrides the __call__() special method.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
- Parameters
input_ids (torch.LongTensor of shape (batch_size, sequence_length)) – Indices of input sequence tokens in the vocabulary. Padding will be ignored by default should you provide it. Indices can be obtained using BartTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.
attention_mask (torch.Tensor of shape (batch_size, sequence_length), optional) – Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: 1 for tokens that are not masked, 0 for tokens that are masked.
decoder_input_ids (torch.LongTensor of shape (batch_size, target_sequence_length), optional) – Provide for translation and summarization training. By default, the model will create this tensor by shifting the input_ids to the right, following the paper.
decoder_attention_mask (torch.BoolTensor of shape (batch_size, tgt_seq_len), optional) – Default behavior: generate a tensor that ignores pad tokens in decoder_input_ids. Causal mask will also be used by default. If you want to change padding behavior, you should read modeling_bart._prepare_decoder_inputs() and modify to your needs. See diagram 1 in the paper for more information on the default strategy.
encoder_outputs (tuple(tuple(torch.FloatTensor)), optional) – Tuple consists of (last_hidden_state, optional: hidden_states, optional: attentions). last_hidden_state, of shape (batch_size, sequence_length, hidden_size), optional, is a sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention of the decoder.
past_key_values (tuple(tuple(torch.FloatTensor)) of length config.n_layers with each tuple having 4 tensors of shape (batch_size, num_heads, sequence_length - 1, embed_size_per_head)) – Contains precomputed key and value hidden-states of the attention blocks. Can be used to speed up decoding. If past_key_values are used, the user can optionally input only the last decoder_input_ids (those that don't have their past key value states given to this model) of shape (batch_size, 1) instead of all decoder_input_ids of shape (batch_size, sequence_length).
use_cache (bool, optional) – If set to True, past_key_values key value states are returned and can be used to speed up decoding (see past_key_values).
output_attentions (bool, optional) – Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional) – Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional) – Whether or not to return a ModelOutput instead of a plain tuple.
- Returns
A Seq2SeqModelOutput (if return_dict=True is passed or when config.return_dict=True) or a tuple of torch.FloatTensor comprising various elements depending on the configuration (BartConfig) and inputs.
last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size)) – Sequence of hidden-states at the output of the last layer of the decoder of the model. If past_key_values is used, only the last hidden-state of the sequences of shape (batch_size, 1, hidden_size) is output.
past_key_values (List[torch.FloatTensor], optional, returned when use_cache=True is passed or when config.use_cache=True) – List of torch.FloatTensor of length config.n_layers, with each tensor of shape (2, batch_size, num_heads, sequence_length, embed_size_per_head). Contains pre-computed hidden-states (key and values in the attention blocks) of the decoder that can be used (see past_key_values input) to speed up sequential decoding.
decoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.
decoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
encoder_last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) – Sequence of hidden-states at the output of the last layer of the encoder of the model.
encoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.
encoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
- Return type
Seq2SeqModelOutput or tuple(torch.FloatTensor)
Example:
>>> from transformers import BartTokenizer, BartModel
>>> import torch
>>> tokenizer = BartTokenizer.from_pretrained('facebook/bart-large')
>>> model = BartModel.from_pretrained('facebook/bart-large', return_dict=True)
>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> outputs = model(**inputs)
>>> last_hidden_states = outputs.last_hidden_state
- get_input_embeddings()[source]¶
Returns the model’s input embeddings.
- Returns
A torch module mapping vocabulary to hidden states.
- Return type
nn.Module
- set_input_embeddings(value)[source]¶
Set model’s input embeddings.
- Parameters
value (nn.Module) – A module mapping vocabulary to hidden states.
- class deepke.name_entity_re.few_shot.models.modeling_bart.BartForConditionalGeneration(config: transformers.configuration_bart.BartConfig)[source]¶
Bases:
deepke.name_entity_re.few_shot.models.modeling_bart.PretrainedBartModel
The BART Model with a language modeling head. Can be used for summarization.
This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.). This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.
- Parameters
config (BartConfig) – Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
- base_model_prefix = 'model'¶
- authorized_missing_keys = ['final_logits_bias', 'encoder\\.version', 'decoder\\.version']¶
- resize_token_embeddings(new_num_tokens: int) → torch.nn.modules.sparse.Embedding[source]¶
Resizes the input token embeddings matrix of the model if new_num_tokens != config.vocab_size.
Takes care of tying weights embeddings afterwards if the model class has a tie_weights() method.
- Parameters
new_num_tokens (int, optional) – The number of new tokens in the embedding matrix. Increasing the size will add newly initialized vectors at the end. Reducing the size will remove vectors from the end. If not provided or None, just returns a pointer to the input tokens torch.nn.Embedding module of the model without doing anything.
- Returns
Pointer to the input tokens Embeddings Module of the model.
- Return type
torch.nn.Embedding
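In the few-shot NER setting this is the method to call after adding label words or prompt tokens to the tokenizer, so the embedding matrix has rows for the new ids passed as label_ids to PromptBartDecoder. The sketch below is a hedged usage example: the label words are hypothetical and the real DeepKE vocabulary differs; it assumes the class loads with from_pretrained like the upstream transformers BART.

```python
from transformers import BartTokenizer
from deepke.name_entity_re.few_shot.models.modeling_bart import BartForConditionalGeneration

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")

# Hypothetical label words for an NER tag set; the actual DeepKE tokens differ.
num_added = tokenizer.add_tokens(["<<person>>", "<<location>>", "<<organization>>"])
model.resize_token_embeddings(len(tokenizer))   # grow the embedding matrix to cover the new ids

label_ids = tokenizer.convert_tokens_to_ids(["<<person>>", "<<location>>", "<<organization>>"])
print(num_added, label_ids)
```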
- forward(input_ids, attention_mask=None, encoder_outputs=None, decoder_input_ids=None, decoder_attention_mask=None, past_key_values=None, labels=None, use_cache=None, use_prompt=None, output_attentions=None, output_hidden_states=None, return_dict=None, **unused)[source]¶
The BartForConditionalGeneration forward method overrides the __call__() special method.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
- Parameters
input_ids (torch.LongTensor of shape (batch_size, sequence_length)) – Indices of input sequence tokens in the vocabulary. Padding will be ignored by default should you provide it. Indices can be obtained using BartTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.
attention_mask (torch.Tensor of shape (batch_size, sequence_length), optional) – Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: 1 for tokens that are not masked, 0 for tokens that are masked.
decoder_input_ids (torch.LongTensor of shape (batch_size, target_sequence_length), optional) – Provide for translation and summarization training. By default, the model will create this tensor by shifting the input_ids to the right, following the paper.
decoder_attention_mask (torch.BoolTensor of shape (batch_size, tgt_seq_len), optional) – Default behavior: generate a tensor that ignores pad tokens in decoder_input_ids. Causal mask will also be used by default. If you want to change padding behavior, you should read modeling_bart._prepare_decoder_inputs() and modify to your needs. See diagram 1 in the paper for more information on the default strategy.
encoder_outputs (tuple(tuple(torch.FloatTensor)), optional) – Tuple consists of (last_hidden_state, optional: hidden_states, optional: attentions). last_hidden_state, of shape (batch_size, sequence_length, hidden_size), optional, is a sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention of the decoder.
past_key_values (tuple(tuple(torch.FloatTensor)) of length config.n_layers with each tuple having 4 tensors of shape (batch_size, num_heads, sequence_length - 1, embed_size_per_head)) – Contains precomputed key and value hidden-states of the attention blocks. Can be used to speed up decoding. If past_key_values are used, the user can optionally input only the last decoder_input_ids (those that don't have their past key value states given to this model) of shape (batch_size, 1) instead of all decoder_input_ids of shape (batch_size, sequence_length).
use_cache (bool, optional) – If set to True, past_key_values key value states are returned and can be used to speed up decoding (see past_key_values).
output_attentions (bool, optional) – Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional) – Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional) – Whether or not to return a ModelOutput instead of a plain tuple.
labels (torch.LongTensor of shape (batch_size, sequence_length), optional) – Labels for computing the masked language modeling loss. Indices should either be in [0, ..., config.vocab_size] or -100 (see input_ids docstring). Tokens with indices set to -100 are ignored (masked); the loss is only computed for the tokens with labels in [0, ..., config.vocab_size].
- Returns
A Seq2SeqLMOutput (if return_dict=True is passed or when config.return_dict=True) or a tuple of torch.FloatTensor comprising various elements depending on the configuration (BartConfig) and inputs.
loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) – Language modeling loss.
logits (torch.FloatTensor of shape (batch_size, sequence_length, config.vocab_size)) – Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
past_key_values (List[torch.FloatTensor], optional, returned when use_cache=True is passed or when config.use_cache=True) – List of torch.FloatTensor of length config.n_layers, with each tensor of shape (2, batch_size, num_heads, sequence_length, embed_size_per_head). Contains pre-computed hidden-states (key and values in the attention blocks) of the decoder that can be used (see past_key_values input) to speed up sequential decoding.
decoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.
decoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
encoder_last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) – Sequence of hidden-states at the output of the last layer of the encoder of the model.
encoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.
encoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
Conditional generation example:
>>> # Mask filling only works for bart-large
>>> from transformers import BartTokenizer, BartForConditionalGeneration
>>> tokenizer = BartTokenizer.from_pretrained('facebook/bart-large')
>>> TXT = "My friends are <mask> but they eat too many carbs."
>>> model = BartForConditionalGeneration.from_pretrained('facebook/bart-large')
>>> input_ids = tokenizer([TXT], return_tensors='pt')['input_ids']
>>> logits = model(input_ids).logits
>>> masked_index = (input_ids[0] == tokenizer.mask_token_id).nonzero().item()
>>> probs = logits[0, masked_index].softmax(dim=0)
>>> values, predictions = probs.topk(5)
>>> tokenizer.decode(predictions).split()
>>> # ['good', 'great', 'all', 'really', 'very']
- Return type
Seq2SeqLMOutput or tuple(torch.FloatTensor)
Summarization example:
>>> from transformers import BartTokenizer, BartForConditionalGeneration, BartConfig
>>> # see ``examples/summarization/bart/run_eval.py`` for a longer example
>>> model = BartForConditionalGeneration.from_pretrained('facebook/bart-large-cnn')
>>> tokenizer = BartTokenizer.from_pretrained('facebook/bart-large-cnn')
>>> ARTICLE_TO_SUMMARIZE = "My friends are cool but they eat too many carbs."
>>> inputs = tokenizer([ARTICLE_TO_SUMMARIZE], max_length=1024, return_tensors='pt')
>>> # Generate Summary
>>> summary_ids = model.generate(inputs['input_ids'], num_beams=4, max_length=5, early_stopping=True)
>>> print([tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=False) for g in summary_ids])
- prepare_inputs_for_generation(decoder_input_ids, past, attention_mask, use_cache, encoder_outputs, **kwargs)[source]¶
Implement in subclasses of PreTrainedModel for custom behavior to prepare inputs in the generate method.
- adjust_logits_during_generation(logits, cur_len, max_length)[source]¶
Implement in subclasses of PreTrainedModel for custom behavior to adjust the logits in the generate method.
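These two hooks are how generate() is customized without rewriting the decoding loop: prepare_inputs_for_generation repackages the cache and encoder outputs at each step, while adjust_logits_during_generation can, for example, force the first generated token to be BOS or force EOS at the final step. The sketch below shows the kind of logic such an override typically contains; it is illustrative and not the exact code in this file, and the bos/eos ids are placeholder assumptions.

```python
import torch

def force_token(logits: torch.Tensor, token_id: int) -> torch.Tensor:
    """Make `token_id` the only non -inf option at this decoding step."""
    forced = torch.full_like(logits, float("-inf"))
    forced[:, token_id] = logits[:, token_id]
    return forced

def adjust_logits_sketch(logits, cur_len, max_length, bos_token_id=0, eos_token_id=2):
    if cur_len == 1:                    # right after the decoder start token: force BOS
        return force_token(logits, bos_token_id)
    if cur_len == max_length - 1:       # last step: force EOS so every beam terminates
        return force_token(logits, eos_token_id)
    return logits

logits = torch.randn(2, 50265)          # (batch, vocab)
print(adjust_logits_sketch(logits, cur_len=1, max_length=20).argmax(dim=-1))  # tensor([0, 0])
```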
- class deepke.name_entity_re.few_shot.models.modeling_bart.BartForSequenceClassification(config: transformers.configuration_bart.BartConfig, **kwargs)[source]¶
Bases:
deepke.name_entity_re.few_shot.models.modeling_bart.PretrainedBartModel
Bart model with a sequence classification head on top (a linear layer on top of the pooled output), e.g. for GLUE tasks.
This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.). This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.
- Parameters
config (BartConfig) – Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
- forward(input_ids, attention_mask=None, decoder_input_ids=None, decoder_attention_mask=None, encoder_outputs=None, labels=None, use_cache=None, output_attentions=None, output_hidden_states=None, return_dict=None)[source]¶
The BartForSequenceClassification forward method overrides the __call__() special method.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
- Parameters
input_ids (torch.LongTensor of shape (batch_size, sequence_length)) – Indices of input sequence tokens in the vocabulary. Padding will be ignored by default should you provide it. Indices can be obtained using BartTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.
attention_mask (torch.Tensor of shape (batch_size, sequence_length), optional) – Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: 1 for tokens that are not masked, 0 for tokens that are masked.
decoder_input_ids (torch.LongTensor of shape (batch_size, target_sequence_length), optional) – Provide for translation and summarization training. By default, the model will create this tensor by shifting the input_ids to the right, following the paper.
decoder_attention_mask (torch.BoolTensor of shape (batch_size, tgt_seq_len), optional) – Default behavior: generate a tensor that ignores pad tokens in decoder_input_ids. Causal mask will also be used by default. If you want to change padding behavior, you should read modeling_bart._prepare_decoder_inputs() and modify to your needs. See diagram 1 in the paper for more information on the default strategy.
encoder_outputs (tuple(tuple(torch.FloatTensor)), optional) – Tuple consists of (last_hidden_state, optional: hidden_states, optional: attentions). last_hidden_state, of shape (batch_size, sequence_length, hidden_size), optional, is a sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention of the decoder.
past_key_values (tuple(tuple(torch.FloatTensor)) of length config.n_layers with each tuple having 4 tensors of shape (batch_size, num_heads, sequence_length - 1, embed_size_per_head)) – Contains precomputed key and value hidden-states of the attention blocks. Can be used to speed up decoding. If past_key_values are used, the user can optionally input only the last decoder_input_ids (those that don't have their past key value states given to this model) of shape (batch_size, 1) instead of all decoder_input_ids of shape (batch_size, sequence_length).
use_cache (bool, optional) – If set to True, past_key_values key value states are returned and can be used to speed up decoding (see past_key_values).
output_attentions (bool, optional) – Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional) – Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional) – Whether or not to return a ModelOutput instead of a plain tuple.
labels (torch.LongTensor of shape (batch_size,), optional) – Labels for computing the sequence classification/regression loss. Indices should be in [0, ..., config.num_labels - 1]. If config.num_labels > 1 a classification loss is computed (Cross-Entropy).
- Returns
A Seq2SeqSequenceClassifierOutput (if return_dict=True is passed or when config.return_dict=True) or a tuple of torch.FloatTensor comprising various elements depending on the configuration (BartConfig) and inputs.
loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) – Classification (or regression if config.num_labels==1) loss.
logits (torch.FloatTensor of shape (batch_size, config.num_labels)) – Classification (or regression if config.num_labels==1) scores (before SoftMax).
past_key_values (List[torch.FloatTensor], optional, returned when use_cache=True is passed or when config.use_cache=True) – List of torch.FloatTensor of length config.n_layers, with each tensor of shape (2, batch_size, num_heads, sequence_length, embed_size_per_head). Contains pre-computed hidden-states (key and values in the attention blocks) of the decoder that can be used (see past_key_values input) to speed up sequential decoding.
decoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.
decoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
encoder_last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) – Sequence of hidden-states at the output of the last layer of the encoder of the model.
encoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.
encoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
- Return type
Seq2SeqSequenceClassifierOutput or tuple(torch.FloatTensor)
Example:
>>> from transformers import BartTokenizer, BartForSequenceClassification
>>> import torch
>>> tokenizer = BartTokenizer.from_pretrained('facebook/bart-large')
>>> model = BartForSequenceClassification.from_pretrained('facebook/bart-large', return_dict=True)
>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> labels = torch.tensor([1]).unsqueeze(0)  # Batch size 1
>>> outputs = model(**inputs, labels=labels)
>>> loss = outputs.loss
>>> logits = outputs.logits
- class deepke.name_entity_re.few_shot.models.modeling_bart.BartForQuestionAnswering(config)[source]¶
Bases:
deepke.name_entity_re.few_shot.models.modeling_bart.PretrainedBartModel
BART Model with a span classification head on top for extractive question-answering tasks like SQuAD (a linear layer on top of the hidden-states output to compute span start logits and span end logits).
This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.). This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.
- Parameters
config (BartConfig) – Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
- forward(input_ids, attention_mask=None, decoder_input_ids=None, decoder_attention_mask=None, encoder_outputs=None, start_positions=None, end_positions=None, use_cache=None, output_attentions=None, output_hidden_states=None, return_dict=None)[source]¶
The BartForQuestionAnswering forward method overrides the __call__() special method.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
- Parameters
input_ids (torch.LongTensor of shape (batch_size, sequence_length)) – Indices of input sequence tokens in the vocabulary. Padding will be ignored by default should you provide it. Indices can be obtained using BartTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.
attention_mask (torch.Tensor of shape (batch_size, sequence_length), optional) – Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: 1 for tokens that are not masked, 0 for tokens that are masked.
decoder_input_ids (torch.LongTensor of shape (batch_size, target_sequence_length), optional) – Provide for translation and summarization training. By default, the model will create this tensor by shifting the input_ids to the right, following the paper.
decoder_attention_mask (torch.BoolTensor of shape (batch_size, tgt_seq_len), optional) – Default behavior: generate a tensor that ignores pad tokens in decoder_input_ids. Causal mask will also be used by default. If you want to change padding behavior, you should read modeling_bart._prepare_decoder_inputs() and modify to your needs. See diagram 1 in the paper for more information on the default strategy.
encoder_outputs (tuple(tuple(torch.FloatTensor)), optional) – Tuple consists of (last_hidden_state, optional: hidden_states, optional: attentions). last_hidden_state, of shape (batch_size, sequence_length, hidden_size), optional, is a sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention of the decoder.
past_key_values (tuple(tuple(torch.FloatTensor)) of length config.n_layers with each tuple having 4 tensors of shape (batch_size, num_heads, sequence_length - 1, embed_size_per_head)) – Contains precomputed key and value hidden-states of the attention blocks. Can be used to speed up decoding. If past_key_values are used, the user can optionally input only the last decoder_input_ids (those that don't have their past key value states given to this model) of shape (batch_size, 1) instead of all decoder_input_ids of shape (batch_size, sequence_length).
use_cache (bool, optional) – If set to True, past_key_values key value states are returned and can be used to speed up decoding (see past_key_values).
output_attentions (bool, optional) – Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional) – Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional) – Whether or not to return a ModelOutput instead of a plain tuple.
start_positions (torch.LongTensor of shape (batch_size,), optional) – Labels for position (index) of the start of the labelled span for computing the token classification loss. Positions are clamped to the length of the sequence (sequence_length). Positions outside of the sequence are not taken into account for computing the loss.
end_positions (torch.LongTensor of shape (batch_size,), optional) – Labels for position (index) of the end of the labelled span for computing the token classification loss. Positions are clamped to the length of the sequence (sequence_length). Positions outside of the sequence are not taken into account for computing the loss.
- Returns
A Seq2SeqQuestionAnsweringModelOutput (if return_dict=True is passed or when config.return_dict=True) or a tuple of torch.FloatTensor comprising various elements depending on the configuration (BartConfig) and inputs.
loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) – Total span extraction loss is the sum of a Cross-Entropy for the start and end positions.
start_logits (torch.FloatTensor of shape (batch_size, sequence_length)) – Span-start scores (before SoftMax).
end_logits (torch.FloatTensor of shape (batch_size, sequence_length)) – Span-end scores (before SoftMax).
past_key_values (List[torch.FloatTensor], optional, returned when use_cache=True is passed or when config.use_cache=True) – List of torch.FloatTensor of length config.n_layers, with each tensor of shape (2, batch_size, num_heads, sequence_length, embed_size_per_head). Contains pre-computed hidden-states (key and values in the attention blocks) of the decoder that can be used (see past_key_values input) to speed up sequential decoding.
decoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.
decoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
encoder_last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) – Sequence of hidden-states at the output of the last layer of the encoder of the model.
encoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.
encoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
- Return type
Seq2SeqQuestionAnsweringModelOutput or tuple(torch.FloatTensor)
Example:
>>> from transformers import BartTokenizer, BartForQuestionAnswering
>>> import torch
>>> tokenizer = BartTokenizer.from_pretrained('facebook/bart-large')
>>> model = BartForQuestionAnswering.from_pretrained('facebook/bart-large', return_dict=True)
>>> question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"
>>> inputs = tokenizer(question, text, return_tensors='pt')
>>> start_positions = torch.tensor([1])
>>> end_positions = torch.tensor([3])
>>> outputs = model(**inputs, start_positions=start_positions, end_positions=end_positions)
>>> loss = outputs.loss
>>> start_scores = outputs.start_logits
>>> end_scores = outputs.end_logits