CLIP¶
deepke.name_entity_re.multimodal.models.clip.configuration_clip module¶
CLIP model configuration
- class deepke.name_entity_re.multimodal.models.clip.configuration_clip.CLIPTextConfig(vocab_size=49408, hidden_size=512, intermediate_size=2048, num_hidden_layers=12, num_attention_heads=8, max_position_embeddings=77, hidden_act='quick_gelu', layer_norm_eps=1e-05, dropout=0.0, attention_dropout=0.0, initializer_range=0.02, initializer_factor=1.0, pad_token_id=1, bos_token_id=0, eos_token_id=2, **kwargs)[source]¶
Bases:
transformers.configuration_utils.PretrainedConfig
This is the configuration class to store the configuration of a [CLIPModel]. It is used to instantiate a CLIP text model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of the CLIP [openai/clip-vit-base-patch32](https://huggingface.co/openai/clip-vit-base-patch32) architecture.
Configuration objects inherit from [PretrainedConfig] and can be used to control the model outputs. Read the documentation from [PretrainedConfig] for more information.
- Parameters
vocab_size (int, optional, defaults to 49408) – Vocabulary size of the CLIP text model. Defines the number of different tokens that can be represented by the input_ids passed when calling [CLIPModel].
hidden_size (int, optional, defaults to 512) – Dimensionality of the encoder layers and the pooler layer.
intermediate_size (int, optional, defaults to 2048) – Dimensionality of the “intermediate” (i.e., feed-forward) layer in the Transformer encoder.
num_hidden_layers (int, optional, defaults to 12) – Number of hidden layers in the Transformer encoder.
num_attention_heads (int, optional, defaults to 8) – Number of attention heads for each attention layer in the Transformer encoder.
max_position_embeddings (int, optional, defaults to 77) – The maximum sequence length that this model might ever be used with. Typically set this to something large just in case (e.g., 512 or 1024 or 2048).
hidden_act (str or function, optional, defaults to “quick_gelu”) – The non-linear activation function (function or string) in the encoder and pooler. If string, “gelu”, “relu”, “selu”, “gelu_new” and “quick_gelu” are supported.
layer_norm_eps (float, optional, defaults to 1e-5) – The epsilon used by the layer normalization layers.
attention_dropout (float, optional, defaults to 0.0) – The dropout ratio for the attention probabilities.
dropout (float, optional, defaults to 0.0) – The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
initializer_range (float, optional, defaults to 0.02) – The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
initializer_factor (float, optional, defaults to 1) – A factor for initializing all weight matrices (should be kept to 1, used internally for initialization testing).
Example:
```python
>>> from transformers import CLIPTextModel, CLIPTextConfig

>>> # Initializing a CLIPTextConfig with openai/clip-vit-base-patch32 style configuration
>>> configuration = CLIPTextConfig()

>>> # Initializing a CLIPTextModel from the openai/clip-vit-base-patch32 style configuration
>>> model = CLIPTextModel(configuration)

>>> # Accessing the model configuration
>>> configuration = model.config
```
- class deepke.name_entity_re.multimodal.models.clip.configuration_clip.CLIPVisionConfig(hidden_size=768, intermediate_size=3072, num_hidden_layers=12, num_attention_heads=12, image_size=224, patch_size=32, hidden_act='quick_gelu', layer_norm_eps=1e-05, dropout=0.0, attention_dropout=0.0, initializer_range=0.02, initializer_factor=1.0, **kwargs)[source]¶
Bases:
transformers.configuration_utils.PretrainedConfig
This is the configuration class to store the configuration of a [CLIPModel]. It is used to instantiate a CLIP vision model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of the CLIP [openai/clip-vit-base-patch32](https://huggingface.co/openai/clip-vit-base-patch32) architecture.
Configuration objects inherit from [PretrainedConfig] and can be used to control the model outputs. Read the documentation from [PretrainedConfig] for more information.
- Parameters
hidden_size (int, optional, defaults to 768) – Dimensionality of the encoder layers and the pooler layer.
intermediate_size (int, optional, defaults to 3072) – Dimensionality of the “intermediate” (i.e., feed-forward) layer in the Transformer encoder.
num_hidden_layers (int, optional, defaults to 12) – Number of hidden layers in the Transformer encoder.
num_attention_heads (int, optional, defaults to 12) – Number of attention heads for each attention layer in the Transformer encoder.
image_size (int, optional, defaults to 224) – The size (resolution) of each image.
patch_size (int, optional, defaults to 32) – The size (resolution) of each patch.
hidden_act (str or function, optional, defaults to “quick_gelu”) – The non-linear activation function (function or string) in the encoder and pooler. If string, “gelu”, “relu”, “selu”, “gelu_new” and “quick_gelu” are supported.
layer_norm_eps (float, optional, defaults to 1e-5) – The epsilon used by the layer normalization layers.
dropout (float, optional, defaults to 0.0) – The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_dropout (float, optional, defaults to 0.0) – The dropout ratio for the attention probabilities.
initializer_range (float, optional, defaults to 0.02) – The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
initializer_factor (float, optional, defaults to 1) – A factor for initializing all weight matrices (should be kept to 1, used internally for initialization testing).
Example:
```python
>>> from transformers import CLIPVisionModel, CLIPVisionConfig

>>> # Initializing a CLIPVisionConfig with openai/clip-vit-base-patch32 style configuration
>>> configuration = CLIPVisionConfig()

>>> # Initializing a CLIPVisionModel from the openai/clip-vit-base-patch32 style configuration
>>> model = CLIPVisionModel(configuration)

>>> # Accessing the model configuration
>>> configuration = model.config
```
- class deepke.name_entity_re.multimodal.models.clip.configuration_clip.CLIPConfig(text_config_dict=None, vision_config_dict=None, projection_dim=512, logit_scale_init_value=2.6592, **kwargs)[source]¶
Bases:
transformers.configuration_utils.PretrainedConfig
[CLIPConfig] is the configuration class to store the configuration of a [CLIPModel]. It is used to instantiate a CLIP model according to the specified arguments, defining the text model and vision model configs.
Configuration objects inherit from [PretrainedConfig] and can be used to control the model outputs. Read the documentation from [PretrainedConfig] for more information.
- Parameters
text_config_dict (dict, optional) – Dictionary of configuration options used to initialize [CLIPTextConfig].
vision_config_dict (dict, optional) – Dictionary of configuration options used to initialize [CLIPVisionConfig].
projection_dim (int, optional, defaults to 512) – Dimensionality of text and vision projection layers.
logit_scale_init_value (float, optional, defaults to 2.6592) – The initial value of the logit_scale parameter. Default is used as per the original CLIP implementation.
kwargs (optional) – Dictionary of keyword arguments.
- is_composition = True¶
- classmethod from_text_vision_configs(text_config: deepke.name_entity_re.multimodal.models.clip.configuration_clip.CLIPTextConfig, vision_config: deepke.name_entity_re.multimodal.models.clip.configuration_clip.CLIPVisionConfig, **kwargs)[source]¶
Instantiate a [CLIPConfig] (or a derived class) from a CLIP text model configuration and a CLIP vision model configuration.
- Returns
An instance of a configuration object
- Return type
[CLIPConfig]
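The snippet below is a minimal sketch of composing a combined configuration from the two sub-configurations defined in this module; it assumes extra keyword arguments such as projection_dim are forwarded by from_text_vision_configs to the CLIPConfig constructor.

```python
from deepke.name_entity_re.multimodal.models.clip.configuration_clip import (
    CLIPConfig,
    CLIPTextConfig,
    CLIPVisionConfig,
)

# Build the two sub-configurations with their default hyperparameters.
text_config = CLIPTextConfig()
vision_config = CLIPVisionConfig()

# Combine them into a single CLIPConfig for a full CLIP model.
clip_config = CLIPConfig.from_text_vision_configs(text_config, vision_config, projection_dim=512)
print(clip_config.projection_dim)  # 512
```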
deepke.name_entity_re.multimodal.models.clip.feature_extraction_clip module¶
Feature extractor class for CLIP.
- class deepke.name_entity_re.multimodal.models.clip.feature_extraction_clip.CLIPFeatureExtractor(do_resize=True, size=224, resample=Resampling.BICUBIC, do_center_crop=True, crop_size=224, do_normalize=True, image_mean=None, image_std=None, **kwargs)[source]¶
Bases:
deepke.name_entity_re.multimodal.models.clip.feature_extraction_utils.FeatureExtractionMixin, deepke.name_entity_re.multimodal.models.clip.image_utils.ImageFeatureExtractionMixin
Constructs a CLIP feature extractor.
This feature extractor inherits from [FeatureExtractionMixin] which contains most of the main methods. Users should refer to this superclass for more information regarding those methods.
- Parameters
do_resize (bool, optional, defaults to True) – Whether to resize the input to a certain size.
size (int, optional, defaults to 224) – Resize the input to the given size. Only has an effect if do_resize is set to True.
resample (int, optional, defaults to PIL.Image.BICUBIC) – An optional resampling filter. This can be one of PIL.Image.NEAREST, PIL.Image.BOX, PIL.Image.BILINEAR, PIL.Image.HAMMING, PIL.Image.BICUBIC or PIL.Image.LANCZOS. Only has an effect if do_resize is set to True.
do_center_crop (bool, optional, defaults to True) – Whether to crop the input at the center. If the input size is smaller than crop_size along any edge, the image is padded with 0’s and then center cropped.
crop_size (int, optional, defaults to 224) – Desired output size when applying center-cropping. Only has an effect if do_center_crop is set to True.
do_normalize (bool, optional, defaults to True) – Whether or not to normalize the input with image_mean and image_std.
image_mean (List[float], defaults to [0.485, 0.456, 0.406]) – The sequence of means for each channel, to be used when normalizing images.
image_std (List[float], defaults to [0.229, 0.224, 0.225]) – The sequence of standard deviations for each channel, to be used when normalizing images.
- model_input_names = ['pixel_values']¶
- center_crop(image, size)[source]¶
Crops image to the given size using a center crop. Note that if the image is too small to be cropped to the size given, it will be padded (so the returned result has the requested size).
- Parameters
image (PIL.Image.Image or np.ndarray or torch.Tensor) – The image to crop.
size (int or Tuple[int, int]) – The size to which the image will be cropped.
- resize(image, size, resample=Resampling.BICUBIC)[source]¶
Resizes image. Note that this will trigger a conversion of image to a PIL Image.
- Parameters
image (PIL.Image.Image or np.ndarray or torch.Tensor) – The image to resize.
size (int or Tuple[int, int]) – The size to use for resizing the image. If an int is provided, the shorter side of the image will be resized to match it.
resample (int, optional, defaults to PIL.Image.BILINEAR) – The filter to use for resampling.
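As a quick orientation, here is a hedged end-to-end usage sketch of the feature extractor; it assumes the __call__ interface mirrors the upstream transformers CLIPFeatureExtractor (an images argument plus return_tensors), and the random image is only a stand-in for real input.

```python
import numpy as np
from PIL import Image
from deepke.name_entity_re.multimodal.models.clip.feature_extraction_clip import CLIPFeatureExtractor

# A dummy RGB image; any PIL.Image, NumPy array or torch tensor works as input.
image = Image.fromarray(np.random.randint(0, 255, (300, 400, 3), dtype=np.uint8))

feature_extractor = CLIPFeatureExtractor()  # resize to 224, center-crop to 224, normalize
inputs = feature_extractor(images=image, return_tensors="pt")
print(inputs["pixel_values"].shape)  # expected: torch.Size([1, 3, 224, 224])
```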
deepke.name_entity_re.multimodal.models.clip.feature_extraction_utils module¶
Feature extraction saving/loading class for common feature extractors.
- class deepke.name_entity_re.multimodal.models.clip.feature_extraction_utils.ExplicitEnum(value)[source]¶
Bases:
enum.Enum
Enum with more explicit error message for missing values.
- class deepke.name_entity_re.multimodal.models.clip.feature_extraction_utils.TensorType(value)[source]¶
Bases:
deepke.name_entity_re.multimodal.models.clip.feature_extraction_utils.ExplicitEnum
Possible values for the return_tensors argument in [PreTrainedTokenizerBase.__call__]. Useful for tab-completion in an IDE.
- PYTORCH = 'pt'¶
- TENSORFLOW = 'tf'¶
- NUMPY = 'np'¶
- JAX = 'jax'¶
- class deepke.name_entity_re.multimodal.models.clip.feature_extraction_utils.BatchFeature(data: Optional[Dict[str, Any]] = None, tensor_type: Union[None, str, deepke.name_entity_re.multimodal.models.clip.feature_extraction_utils.TensorType] = None)[source]¶
Bases:
collections.UserDict
Holds the output of the pad() and feature-extractor-specific __call__ methods. This class is derived from a Python dictionary and can be used as a dictionary.
- Parameters
data (dict) – Dictionary of lists/arrays/tensors returned by the __call__/pad methods ('input_values', 'attention_mask', etc.).
tensor_type (Union[None, str, TensorType], optional) – You can give a tensor_type here to convert the lists of integers into PyTorch/TensorFlow/NumPy tensors at initialization.
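A short sketch of how BatchFeature is typically used, assuming its conversion behavior matches the upstream transformers class of the same name:

```python
import numpy as np
from deepke.name_entity_re.multimodal.models.clip.feature_extraction_utils import BatchFeature, TensorType

# Wrap raw arrays in a BatchFeature; it behaves like a plain dict.
features = BatchFeature(data={"pixel_values": [np.zeros((3, 224, 224), dtype=np.float32)]})
print(list(features.keys()))  # ['pixel_values']

# Passing tensor_type converts the stored values to framework tensors at initialization.
features_pt = BatchFeature(data=dict(features), tensor_type=TensorType.PYTORCH)
print(type(features_pt["pixel_values"]))  # <class 'torch.Tensor'>
```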
- class deepke.name_entity_re.multimodal.models.clip.feature_extraction_utils.FeatureExtractionMixin(**kwargs)[source]¶
Bases:
object
This is a feature extraction mixin used to provide saving/loading functionality for sequential and image feature extractors.
- classmethod from_pretrained(pretrained_model_name_or_path: Union[str, os.PathLike], **kwargs) SequenceFeatureExtractor [source]¶
Instantiate a type of FeatureExtractionMixin from a feature extractor, e.g. a derived class of SequenceFeatureExtractor.
- Parameters
pretrained_model_name_or_path (str or os.PathLike) – This can be either:
a string, the model id of a pretrained feature_extractor hosted inside a model repo on huggingface.co. Valid model ids can be located at the root-level, like bert-base-uncased, or namespaced under a user or organization name, like dbmdz/bert-base-german-cased.
a path to a directory containing a feature extractor file saved using the save_pretrained() method, e.g., ./my_model_directory/.
a path or url to a saved feature extractor JSON file, e.g., ./my_model_directory/preprocessor_config.json.
cache_dir (str or os.PathLike, optional) – Path to a directory in which a downloaded pretrained model feature extractor should be cached if the standard cache should not be used.
force_download (bool, optional, defaults to False) – Whether or not to force (re-)downloading the feature extractor files and override the cached versions if they exist.
resume_download (bool, optional, defaults to False) – Whether or not to delete an incompletely received file. Attempts to resume the download if such a file exists.
proxies (Dict[str, str], optional) – A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
use_auth_token (str or bool, optional) – The token to use as HTTP bearer authorization for remote files. If True, will use the token generated when running transformers-cli login (stored in ~/.huggingface).
revision (str, optional, defaults to "main") – The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
return_unused_kwargs (bool, optional, defaults to False) – If False, this function returns just the final feature extractor object. If True, it returns a Tuple(feature_extractor, unused_kwargs) where unused_kwargs is a dictionary consisting of the key/value pairs whose keys are not feature extractor attributes: i.e., the part of kwargs which has not been used to update feature_extractor and is otherwise ignored.
kwargs (Dict[str, Any], optional) – The values in kwargs of any keys which are feature extractor attributes will be used to override the loaded values. Behavior concerning key/value pairs whose keys are not feature extractor attributes is controlled by the return_unused_kwargs keyword parameter.
Note
Passing use_auth_token=True is required when you want to use a private model.
- Returns
A feature extractor of type FeatureExtractionMixin.
Examples:
# We can't instantiate directly the base class `FeatureExtractionMixin` nor `SequenceFeatureExtractor` so let's show the examples on a
# derived class: `Wav2Vec2FeatureExtractor`
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained('facebook/wav2vec2-base-960h')    # Download feature_extraction_config from huggingface.co and cache.
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained('./test/saved_model/')    # E.g. feature_extractor (or model) was saved using `save_pretrained('./test/saved_model/')`
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained('./test/saved_model/preprocessor_config.json')
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained('facebook/wav2vec2-base-960h', return_attention_mask=False, foo=False)
assert feature_extractor.return_attention_mask is False
feature_extractor, unused_kwargs = Wav2Vec2FeatureExtractor.from_pretrained('facebook/wav2vec2-base-960h', return_attention_mask=False, foo=False, return_unused_kwargs=True)
assert feature_extractor.return_attention_mask is False
assert unused_kwargs == {'foo': False}
- save_pretrained(save_directory: Union[str, os.PathLike])[source]¶
Save a feature_extractor object to the directory save_directory, so that it can be re-loaded using the from_pretrained() class method.
- Parameters
save_directory (str or os.PathLike) – Directory where the feature extractor JSON file will be saved (will be created if it does not exist).
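A hedged round-trip sketch with the CLIP feature extractor from this package (the directory path is arbitrary and created on demand):

```python
from deepke.name_entity_re.multimodal.models.clip.feature_extraction_clip import CLIPFeatureExtractor

extractor = CLIPFeatureExtractor(size=224, do_center_crop=True)

# Writes preprocessor_config.json into the target directory (created if missing).
extractor.save_pretrained("./clip_feature_extractor/")

# Later, rebuild an identical feature extractor from that directory.
reloaded = CLIPFeatureExtractor.from_pretrained("./clip_feature_extractor/")
assert reloaded.size == 224
```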
- classmethod get_feature_extractor_dict(pretrained_model_name_or_path: Union[str, os.PathLike], **kwargs) Tuple[Dict[str, Any], Dict[str, Any]] [source]¶
From a pretrained_model_name_or_path, resolve to a dictionary of parameters to be used for instantiating a feature extractor of type FeatureExtractionMixin using from_dict.
- Parameters
pretrained_model_name_or_path (str or os.PathLike) – The identifier of the pre-trained checkpoint from which we want the dictionary of parameters.
- Returns
The dictionary(ies) that will be used to instantiate the feature extractor object.
- Return type
Tuple[Dict, Dict]
- classmethod from_dict(feature_extractor_dict: Dict[str, Any], **kwargs) SequenceFeatureExtractor [source]¶
Instantiates a type of FeatureExtractionMixin from a Python dictionary of parameters.
- Parameters
feature_extractor_dict (Dict[str, Any]) – Dictionary that will be used to instantiate the feature extractor object. Such a dictionary can be retrieved from a pretrained checkpoint by leveraging the to_dict() method.
kwargs (Dict[str, Any]) – Additional parameters from which to initialize the feature extractor object.
- Returns
The feature extractor object instantiated from those parameters.
- Return type
FeatureExtractionMixin
- to_dict() Dict[str, Any] [source]¶
Serializes this instance to a Python dictionary.
- Returns
Dictionary of all the attributes that make up this feature extractor instance.
- Return type
Dict[str, Any]
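A small illustrative round trip between an extractor instance and its dictionary form (attribute names per the CLIPFeatureExtractor documented above):

```python
from deepke.name_entity_re.multimodal.models.clip.feature_extraction_clip import CLIPFeatureExtractor

extractor = CLIPFeatureExtractor(size=224)

# Serialize the configuration to a plain Python dict ...
config_dict = extractor.to_dict()
print(config_dict["size"])  # 224

# ... and rebuild an equivalent extractor from it.
rebuilt = CLIPFeatureExtractor.from_dict(config_dict)
assert rebuilt.size == extractor.size
```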
- classmethod from_json_file(json_file: Union[str, os.PathLike]) SequenceFeatureExtractor [source]¶
Instantiates a feature extractor of type FeatureExtractionMixin from the path to a JSON file of parameters.
- Parameters
json_file (str or os.PathLike) – Path to the JSON file containing the parameters.
- Returns
The feature_extractor object instantiated from that JSON file.
- Return type
A feature extractor of type FeatureExtractionMixin
- to_json_string() str [source]¶
Serializes this instance to a JSON string.
- Returns
String containing all the attributes that make up this feature_extractor instance in JSON format.
- Return type
str
- to_json_file(json_file_path: Union[str, os.PathLike])[source]¶
Save this instance to a JSON file.
- Parameters
json_file_path (str or os.PathLike) – Path to the JSON file in which this feature_extractor instance's parameters will be saved.
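The JSON helpers follow the same pattern; a brief sketch (the file name is arbitrary):

```python
from deepke.name_entity_re.multimodal.models.clip.feature_extraction_clip import CLIPFeatureExtractor

extractor = CLIPFeatureExtractor(crop_size=224)
extractor.to_json_file("preprocessor_config.json")                 # serialize to disk
restored = CLIPFeatureExtractor.from_json_file("preprocessor_config.json")
assert restored.crop_size == 224
print(extractor.to_json_string()[:80])                             # peek at the JSON payload
```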
deepke.name_entity_re.multimodal.models.clip.file_utils module¶
Utilities for working with the local dataset cache. Parts of this file are adapted from the AllenNLP library at https://github.com/allenai/allennlp.
- class deepke.name_entity_re.multimodal.models.clip.file_utils.EmptyTqdm(*args, **kwargs)[source]¶
Bases:
object
Dummy tqdm which doesn’t do anything.
- deepke.name_entity_re.multimodal.models.clip.file_utils.is_torch_onnx_dict_inputs_support_available()[source]¶
- deepke.name_entity_re.multimodal.models.clip.file_utils.is_pytorch_quantization_available()[source]¶
- deepke.name_entity_re.multimodal.models.clip.file_utils.is_tensorflow_probability_available()[source]¶
- class deepke.name_entity_re.multimodal.models.clip.file_utils.DummyObject[source]¶
Bases:
type
Metaclass for the dummy objects. Any class inheriting from it will return the ImportError generated by requires_backend each time a user tries to access any method of that class.
- deepke.name_entity_re.multimodal.models.clip.file_utils.add_start_docstrings_to_model_forward(*docstr)[source]¶
- deepke.name_entity_re.multimodal.models.clip.file_utils.add_code_sample_docstrings(*docstr, processor_class=None, checkpoint=None, output_type=None, config_class=None, mask='[MASK]', model_cls=None, modality=None, expected_output='', expected_loss='')[source]¶
- deepke.name_entity_re.multimodal.models.clip.file_utils.replace_return_docstrings(output_type=None, config_class=None)[source]¶
- deepke.name_entity_re.multimodal.models.clip.file_utils.hf_bucket_url(model_id: str, filename: str, subfolder: Optional[str] = None, revision: Optional[str] = None, mirror=None) str [source]¶
Resolve a model identifier, a file name, and an optional revision id, to a huggingface.co-hosted url, redirecting to Cloudfront (a Content Delivery Network, or CDN) for large files. Cloudfront is replicated over the globe so downloads are way faster for the end user (and it also lowers our bandwidth costs). Cloudfront aggressively caches files by default (default TTL is 24 hours), however this is not an issue here because we migrated to a git-based versioning system on huggingface.co, so we now store the files on S3/Cloudfront in a content-addressable way (i.e., the file name is its hash). Using content-addressable filenames means the cache can't ever be stale. In terms of client-side caching from this library, we base our caching on the objects' ETag. An object's ETag is: its sha1 if stored in git, or its sha256 if stored in git-lfs. Files cached locally from transformers before v3.5.0 are not shared with those new files, because the cached file's name contains a hash of the url (which changed).
- deepke.name_entity_re.multimodal.models.clip.file_utils.url_to_filename(url: str, etag: Optional[str] = None) str [source]¶
Convert url into a hashed filename in a repeatable way. If etag is specified, append its hash to the url's, delimited by a period. If the url ends with .h5 (Keras HDF5 weights), '.h5' is appended to the name so that TF 2.0 can identify it as an HDF5 file (see https://github.com/tensorflow/tensorflow/blob/00fad90125b18b80fe054de1055770cfb8fe4ba3/tensorflow/python/keras/engine/network.py#L1380)
- deepke.name_entity_re.multimodal.models.clip.file_utils.filename_to_url(filename, cache_dir=None)[source]¶
Return the url and etag (which may be None) stored for filename. Raise EnvironmentError if filename or its stored metadata do not exist.
- deepke.name_entity_re.multimodal.models.clip.file_utils.get_cached_models(cache_dir: Optional[Union[str, pathlib.Path]] = None) List[Tuple] [source]¶
Returns a list of tuples representing model binaries that are cached locally. Each tuple has shape (model_url, etag, size_MB). Filenames in cache_dir are used to get the metadata for each model; only urls ending with .bin are added.
- Parameters
cache_dir (Union[str, Path], optional) – The cache directory to search for models within. Will default to the transformers cache if unset.
- Returns
List of tuples each with shape (model_url, etag, size_MB)
- Return type
List[Tuple]
- deepke.name_entity_re.multimodal.models.clip.file_utils.cached_path(url_or_filename, cache_dir=None, force_download=False, proxies=None, resume_download=False, user_agent: Optional[Union[Dict, str]] = None, extract_compressed_file=False, force_extract=False, use_auth_token: Optional[Union[bool, str]] = None, local_files_only=False) Optional[str] [source]¶
Given something that might be a URL (or might be a local path), determine which. If it's a URL, download the file and cache it, and return the path to the cached file. If it's already a local path, make sure the file exists and then return the path.
- Parameters
cache_dir – specify a cache directory to save the file to (overwrite the default cache dir).
force_download – if True, re-download the file even if it's already cached in the cache dir.
resume_download – if True, resume the download if an incompletely received file is found.
user_agent – Optional string or dict that will be appended to the user-agent on remote requests.
use_auth_token – Optional string or boolean to use as Bearer token for remote files. If True, will get the token from ~/.huggingface.
extract_compressed_file – if True and the path points to a zip or tar file, extract the compressed file in a folder along the archive.
force_extract – if True, when extract_compressed_file is True and the archive was already extracted, re-extract the archive and override the folder where it was extracted.
- Returns
Local path (string) of file or if networking is off, last version of file cached on disk.
- Raises
In case of a non-recoverable file (non-existent or inaccessible url and no cache on disk).
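A hedged sketch of how hf_bucket_url and cached_path are typically combined (requires network access; the repository and file names are only examples):

```python
from deepke.name_entity_re.multimodal.models.clip.file_utils import hf_bucket_url, cached_path

# Build the huggingface.co URL for a file inside a model repo ...
url = hf_bucket_url("openai/clip-vit-base-patch32", filename="config.json")

# ... then download it (or reuse the cached copy) and get the local path.
local_path = cached_path(url)
print(local_path)
```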
- deepke.name_entity_re.multimodal.models.clip.file_utils.http_user_agent(user_agent: Optional[Union[Dict, str]] = None) str [source]¶
Formats a user-agent string with basic info about a request.
- exception deepke.name_entity_re.multimodal.models.clip.file_utils.RepositoryNotFoundError(*args, **kwargs)[source]¶
Bases:
requests.exceptions.HTTPError
Raised when trying to access a hf.co URL with an invalid repository name, or with a private repo name the user does not have access to.
- exception deepke.name_entity_re.multimodal.models.clip.file_utils.EntryNotFoundError(*args, **kwargs)[source]¶
Bases:
requests.exceptions.HTTPError
Raised when trying to access a hf.co URL with a valid repository and revision but an invalid filename.
- exception deepke.name_entity_re.multimodal.models.clip.file_utils.RevisionNotFoundError(*args, **kwargs)[source]¶
Bases:
requests.exceptions.HTTPError
Raised when trying to access a hf.co URL with a valid repository but an invalid revision.
- deepke.name_entity_re.multimodal.models.clip.file_utils.http_get(url: str, temp_file: BinaryIO, proxies=None, resume_size=0, headers: Optional[Dict[str, str]] = None)[source]¶
Download remote file. Do not gobble up errors.
- deepke.name_entity_re.multimodal.models.clip.file_utils.get_from_cache(url: str, cache_dir=None, force_download=False, proxies=None, etag_timeout=10, resume_download=False, user_agent: Optional[Union[Dict, str]] = None, use_auth_token: Optional[Union[bool, str]] = None, local_files_only=False) Optional[str] [source]¶
Given a URL, look for the corresponding file in the local cache. If it's not there, download it. Then return the path to the cached file.
- Returns
Local path (string) of file or, if networking is off, last version of file cached on disk.
- Raises
In case of a non-recoverable file (non-existent or inaccessible url and no cache on disk).
- deepke.name_entity_re.multimodal.models.clip.file_utils.get_file_from_repo(path_or_repo: Union[str, os.PathLike], filename: str, cache_dir: Optional[Union[str, os.PathLike]] = None, force_download: bool = False, resume_download: bool = False, proxies: Optional[Dict[str, str]] = None, use_auth_token: Optional[Union[bool, str]] = None, revision: Optional[str] = None, local_files_only: bool = False)[source]¶
Tries to locate a file in a local folder and repo, and downloads and caches it if necessary.
- Parameters
path_or_repo (str or os.PathLike) – This can be either:
a string, the model id of a model repo on huggingface.co.
a path to a directory potentially containing the file.
filename (str) – The name of the file to locate in path_or_repo.
cache_dir (str or os.PathLike, optional) – Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
force_download (bool, optional, defaults to False) – Whether or not to force to (re-)download the configuration files and override the cached versions if they exist.
resume_download (bool, optional, defaults to False) – Whether or not to delete incompletely received file. Attempts to resume the download if such a file exists.
proxies (Dict[str, str], optional) – A dictionary of proxy servers to use by protocol or endpoint, e.g., {‘http’: ‘foo.bar:3128’, ‘http://hostname’: ‘foo.bar:4012’}. The proxies are used on each request.
use_auth_token (str or bool, optional) – The token to use as HTTP bearer authorization for remote files. If True, will use the token generated when running transformers-cli login (stored in ~/.huggingface).
revision (str, optional, defaults to “main”) – The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
local_files_only (bool, optional, defaults to False) – If True, will only try to load the tokenizer configuration from local files.
Note
Passing use_auth_token=True is required when you want to use a private model.
- Returns
The resolved file (in the cache folder if downloaded from a repo), or None if the file does not exist.
- Return type
Optional[str]
Examples:
```python
# Download a tokenizer configuration from huggingface.co and cache.
tokenizer_config = get_file_from_repo("bert-base-uncased", "tokenizer_config.json")
# This model does not have a tokenizer config so the result will be None.
tokenizer_config = get_file_from_repo("xlm-roberta-base", "tokenizer_config.json")
```
- deepke.name_entity_re.multimodal.models.clip.file_utils.has_file(path_or_repo: Union[str, os.PathLike], filename: str, revision: Optional[str] = None, mirror: Optional[str] = None, proxies: Optional[Dict[str, str]] = None, use_auth_token: Optional[Union[bool, str]] = None)[source]¶
Checks if a repo contains a given file without downloading it. Works for remote repos and local folders.
Note
This function will raise an error if the repository path_or_repo is not valid or if revision does not exist for this repo, but will return False for regular connection errors.
- deepke.name_entity_re.multimodal.models.clip.file_utils.get_list_of_files(path_or_repo: Union[str, os.PathLike], revision: Optional[str] = None, use_auth_token: Optional[Union[bool, str]] = None, local_files_only: bool = False) List[str] [source]¶
Gets the list of files inside path_or_repo.
- Parameters
path_or_repo (str or os.PathLike) – Can be either the id of a repo on huggingface.co or a path to a directory.
revision – The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
use_auth_token (str or bool, optional) – The token to use as HTTP bearer authorization for remote files. If True, will use the token generated when running transformers-cli login (stored in ~/.huggingface).
local_files_only (bool, optional, defaults to False) – Whether or not to only rely on local files and not to attempt to download any files.
Note
This API is not optimized, so calling it a lot may result in connection errors.
- Returns
The list of files available in path_or_repo.
- Return type
List[str]
- class deepke.name_entity_re.multimodal.models.clip.file_utils.cached_property(fget=None, fset=None, fdel=None, doc=None)[source]¶
Bases:
property
Descriptor that mimics @property but caches output in a member variable. Adapted from tensorflow_datasets. Built into functools from Python 3.8.
- deepke.name_entity_re.multimodal.models.clip.file_utils.is_tensor(x)[source]¶
Tests if x is a torch.Tensor, tf.Tensor, jaxlib.xla_extension.DeviceArray or np.ndarray.
- deepke.name_entity_re.multimodal.models.clip.file_utils.to_py_obj(obj)[source]¶
Convert a TensorFlow tensor, PyTorch tensor, Numpy array or python list to a python list.
- deepke.name_entity_re.multimodal.models.clip.file_utils.to_numpy(obj)[source]¶
Convert a TensorFlow tensor, PyTorch tensor, Numpy array or python list to a Numpy array.
- class deepke.name_entity_re.multimodal.models.clip.file_utils.ModelOutput[source]¶
Bases:
collections.OrderedDict
Base class for all model outputs as dataclass. Has a __getitem__ that allows indexing by integer or slice (like a tuple) or strings (like a dictionary) that will ignore the None attributes. Otherwise behaves like a regular python dictionary.
Note
You can't unpack a ModelOutput directly. Use the [~file_utils.ModelOutput.to_tuple] method to convert it to a tuple first.
- setdefault(*args, **kwargs)[source]¶
Insert key with a value of default if key is not in the dictionary.
Return the value for key if key is in the dictionary, else default.
- pop(k[, d]) → v[source]¶
Remove the specified key and return the corresponding value. If key is not found, d is returned if given, otherwise KeyError is raised.
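To illustrate the dual dict/tuple access, here is a sketch with a hypothetical ModelOutput subclass (not part of this module); real subclasses such as CLIPOutput below behave the same way:

```python
from dataclasses import dataclass
from typing import Optional

import torch

from deepke.name_entity_re.multimodal.models.clip.file_utils import ModelOutput


@dataclass
class ToyOutput(ModelOutput):  # hypothetical subclass, for illustration only
    logits: Optional[torch.Tensor] = None
    loss: Optional[torch.Tensor] = None


out = ToyOutput(logits=torch.ones(2, 3))  # loss stays None
print(out["logits"].shape)   # dict-style access: torch.Size([2, 3])
print(out[0].shape)          # index-style access skips None attributes
print(out.to_tuple())        # (tensor,) – None fields are dropped
```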
- class deepke.name_entity_re.multimodal.models.clip.file_utils.ExplicitEnum(value)[source]¶
Bases:
enum.Enum
Enum with more explicit error message for missing values.
- class deepke.name_entity_re.multimodal.models.clip.file_utils.PaddingStrategy(value)[source]¶
Bases:
deepke.name_entity_re.multimodal.models.clip.file_utils.ExplicitEnum
Possible values for the padding argument in [PreTrainedTokenizerBase.__call__]. Useful for tab-completion in an IDE.
- LONGEST = 'longest'¶
- MAX_LENGTH = 'max_length'¶
- DO_NOT_PAD = 'do_not_pad'¶
- class deepke.name_entity_re.multimodal.models.clip.file_utils.TensorType(value)[source]¶
Bases:
deepke.name_entity_re.multimodal.models.clip.file_utils.ExplicitEnum
Possible values for the return_tensors argument in [PreTrainedTokenizerBase.__call__]. Useful for tab-completion in an IDE.
- PYTORCH = 'pt'¶
- TENSORFLOW = 'tf'¶
- NUMPY = 'np'¶
- JAX = 'jax'¶
- deepke.name_entity_re.multimodal.models.clip.file_utils.copy_func(f)[source]¶
Returns a copy of a function f.
- deepke.name_entity_re.multimodal.models.clip.file_utils.is_local_clone(repo_path, repo_url)[source]¶
Checks if the folder in repo_path is a local clone of repo_url.
- class deepke.name_entity_re.multimodal.models.clip.file_utils.PushToHubMixin[source]¶
Bases:
object
A Mixin containing the functionality to push a model or tokenizer to the hub.
- push_to_hub(repo_path_or_name: Optional[str] = None, repo_url: Optional[str] = None, use_temp_dir: bool = False, commit_message: Optional[str] = None, organization: Optional[str] = None, private: Optional[bool] = None, use_auth_token: Optional[Union[bool, str]] = None, **model_card_kwargs) str [source]¶
Upload the {object_files} to the 🤗 Model Hub while synchronizing a local clone of the repo in repo_path_or_name.
- Parameters
repo_path_or_name (str, optional) – Can either be a repository name for your {object} in the Hub or a path to a local folder (in which case the repository will have the name of that local folder). If not specified, will default to the name given by repo_url and a local directory with that name will be created.
repo_url (str, optional) – Specify this in case you want to push to an existing repository in the hub. If unspecified, a new repository will be created in your namespace (unless you specify an organization) with repo_name.
use_temp_dir (bool, optional, defaults to False) – Whether or not to clone the distant repo in a temporary directory or in repo_path_or_name inside the current working directory. This will slow things down if you are making changes in an existing repo since you will need to clone the repo before every push.
commit_message (str, optional) – Message to commit while pushing. Will default to “add {object}”.
organization (str, optional) – Organization in which you want to push your {object} (you must be a member of this organization).
private (bool, optional) – Whether or not the repository created should be private (requires a paying subscription).
use_auth_token (bool or str, optional) – The token to use as HTTP bearer authorization for remote files. If True, will use the token generated when running transformers-cli login (stored in ~/.huggingface). Will default to True if repo_url is not specified.
- Returns
The url of the commit of your {object} in the given repository.
- Return type
str
Examples:
```python
from transformers import {object_class}

{object} = {object_class}.from_pretrained("bert-base-cased")

# Push the {object} to your namespace with the name "my-finetuned-bert" and have a local clone in the
# *my-finetuned-bert* folder.
{object}.push_to_hub("my-finetuned-bert")

# Push the {object} to your namespace with the name "my-finetuned-bert" with no local clone.
{object}.push_to_hub("my-finetuned-bert", use_temp_dir=True)

# Push the {object} to an organization with the name "my-finetuned-bert" and have a local clone in the
# *my-finetuned-bert* folder.
{object}.push_to_hub("my-finetuned-bert", organization="huggingface")

# Make a change to an existing repo that has been cloned locally in *my-finetuned-bert*.
{object}.push_to_hub("my-finetuned-bert", repo_url="https://huggingface.co/sgugger/my-finetuned-bert")
```
deepke.name_entity_re.multimodal.models.clip.image_utils module¶
- class deepke.name_entity_re.multimodal.models.clip.image_utils.ImageFeatureExtractionMixin[source]¶
Bases:
object
Mixin that contains utilities for preparing image features.
- to_pil_image(image, rescale=None)[source]¶
Converts image to a PIL Image. Optionally rescales it and puts the channel dimension back as the last axis if needed.
- Parameters
image (PIL.Image.Image or numpy.ndarray or torch.Tensor) – The image to convert to the PIL Image format.
rescale (bool, optional) – Whether or not to apply the scaling factor (to make pixel values integers between 0 and 255). Will default to True if the image type is a floating type, False otherwise.
- to_numpy_array(image, rescale=None, channel_first=True)[source]¶
Converts image to a numpy array. Optionally rescales it and puts the channel dimension as the first dimension.
- Parameters
image (PIL.Image.Image or np.ndarray or torch.Tensor) – The image to convert to a NumPy array.
rescale (bool, optional) – Whether or not to apply the scaling factor (to make pixel values floats between 0. and 1.). Will default to True if the image is a PIL Image or an array/tensor of integers, False otherwise.
channel_first (bool, optional, defaults to True) – Whether or not to permute the dimensions of the image to put the channel dimension first.
- normalize(image, mean, std)[source]¶
Normalizes image with mean and std. Note that this will trigger a conversion of image to a NumPy array if it's a PIL Image.
- Parameters
image (PIL.Image.Image or np.ndarray or torch.Tensor) – The image to normalize.
mean (List[float] or np.ndarray or torch.Tensor) – The mean (per channel) to use for normalization.
std (List[float] or np.ndarray or torch.Tensor) – The standard deviation (per channel) to use for normalization.
- resize(image, size, resample=Resampling.BILINEAR)[source]¶
Resizes image. Note that this will trigger a conversion of image to a PIL Image.
- center_crop(image, size)[source]¶
Crops image to the given size using a center crop. Note that if the image is too small to be cropped to the size given, it will be padded (so the returned result has the requested size).
- Parameters
image (PIL.Image.Image or np.ndarray or torch.Tensor) – The image to crop.
size (int or Tuple[int, int]) – The size to which the image will be cropped.
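Putting the mixin utilities together, a minimal preprocessing sketch (assuming the helpers mirror the upstream transformers ImageFeatureExtractionMixin; the random image and the 0.5 mean/std values are only placeholders):

```python
import numpy as np
from PIL import Image
from deepke.name_entity_re.multimodal.models.clip.image_utils import ImageFeatureExtractionMixin

mixin = ImageFeatureExtractionMixin()
image = Image.fromarray(np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8))

resized = mixin.resize(image, size=(256, 256))      # resize to 256 x 256
cropped = mixin.center_crop(resized, size=224)      # 224 x 224 center crop
array = mixin.to_numpy_array(cropped)               # channel-first float array scaled to [0, 1]
normalized = mixin.normalize(array, mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
print(normalized.shape)                             # (3, 224, 224)
```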
deepke.name_entity_re.multimodal.models.clip.modeling_clip module¶
PyTorch CLIP model.
- deepke.name_entity_re.multimodal.models.clip.modeling_clip.contrastive_loss(logits: torch.Tensor) torch.Tensor [source]¶
- deepke.name_entity_re.multimodal.models.clip.modeling_clip.clip_loss(similarity: torch.Tensor) torch.Tensor [source]¶
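contrastive_loss and clip_loss implement the standard symmetric InfoNCE objective used by CLIP. The sketch below shows the usual formulation (not necessarily line-for-line identical to this module's implementation):

```python
import torch
import torch.nn.functional as F


def contrastive_loss_sketch(logits: torch.Tensor) -> torch.Tensor:
    # Cross-entropy against the diagonal: the i-th text matches the i-th image.
    return F.cross_entropy(logits, torch.arange(len(logits), device=logits.device))


def clip_loss_sketch(similarity: torch.Tensor) -> torch.Tensor:
    # Average the text->image and image->text directions of the similarity matrix.
    caption_loss = contrastive_loss_sketch(similarity)
    image_loss = contrastive_loss_sketch(similarity.t())
    return (caption_loss + image_loss) / 2.0


# Example: a batch of 4 matched image/text pairs with random similarity logits.
logits_per_text = torch.randn(4, 4)
print(clip_loss_sketch(logits_per_text))
```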
- class deepke.name_entity_re.multimodal.models.clip.modeling_clip.CLIPBaseModelOutput[source]¶
Bases:
transformers.file_utils.ModelOutput
- attentions: Optional[Tuple[torch.FloatTensor]] = None¶
- qks: Optional[Tuple[torch.FloatTensor]] = None¶
- class deepke.name_entity_re.multimodal.models.clip.modeling_clip.CLIPBaseModelOutputWithPooling[source]¶
Bases:
transformers.file_utils.ModelOutput
- pooler_output: torch.FloatTensor = None¶
- attentions: Optional[Tuple[torch.FloatTensor]] = None¶
- qks: Optional[Tuple[torch.FloatTensor]] = None¶
- class deepke.name_entity_re.multimodal.models.clip.modeling_clip.CLIPOutput[source]¶
Bases:
transformers.file_utils.ModelOutput
- Parameters
loss (torch.FloatTensor of shape (1,), optional, returned when return_loss is True) – Contrastive loss for image-text similarity.
logits_per_image (torch.FloatTensor of shape (image_batch_size, text_batch_size)) – The scaled dot product scores between image_embeds and text_embeds. This represents the image-text similarity scores.
logits_per_text (torch.FloatTensor of shape (text_batch_size, image_batch_size)) – The scaled dot product scores between text_embeds and image_embeds. This represents the text-image similarity scores.
text_embeds (torch.FloatTensor of shape (batch_size, output_dim)) – The text embeddings obtained by applying the projection layer to the pooled output of CLIPTextModel.
image_embeds (torch.FloatTensor of shape (batch_size, output_dim)) – The image embeddings obtained by applying the projection layer to the pooled output of CLIPVisionModel.
text_model_output (BaseModelOutputWithPooling) – The output of the CLIPTextModel.
vision_model_output (BaseModelOutputWithPooling) – The output of the CLIPVisionModel.
- loss: Optional[torch.FloatTensor] = None¶
- logits_per_image: torch.FloatTensor = None¶
- logits_per_text: torch.FloatTensor = None¶
- text_embeds: torch.FloatTensor = None¶
- image_embeds: torch.FloatTensor = None¶
- text_model_output: transformers.modeling_outputs.BaseModelOutputWithPooling = None¶
- vision_model_output: transformers.modeling_outputs.BaseModelOutputWithPooling = None¶
- class deepke.name_entity_re.multimodal.models.clip.modeling_clip.CLIPVisionEmbeddings(config: deepke.name_entity_re.multimodal.models.clip.configuration_clip.CLIPVisionConfig)[source]¶
Bases:
torch.nn.modules.module.Module
- forward(pixel_values, aux_embeddings=None, rcnn_embeddings=None)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class deepke.name_entity_re.multimodal.models.clip.modeling_clip.CLIPTextEmbeddings(config: deepke.name_entity_re.multimodal.models.clip.configuration_clip.CLIPTextConfig)[source]¶
Bases:
torch.nn.modules.module.Module
- forward(input_ids=None, position_ids=None, inputs_embeds=None)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class deepke.name_entity_re.multimodal.models.clip.modeling_clip.CLIPAttention(config)[source]¶
Bases:
torch.nn.modules.module.Module
Multi-headed attention from ‘Attention Is All You Need’ paper
- forward(hidden_states: torch.Tensor, attention_mask: Optional[torch.Tensor] = None, causal_attention_mask: Optional[torch.Tensor] = None, output_attentions: bool = False, output_qks: bool = False) Tuple[torch.Tensor, Optional[torch.Tensor], Optional[Tuple[torch.Tensor]]] [source]¶
Input shape: Batch x Time x Channel
- class deepke.name_entity_re.multimodal.models.clip.modeling_clip.CLIPMLP(config)[source]¶
Bases:
torch.nn.modules.module.Module
- forward(hidden_states)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class deepke.name_entity_re.multimodal.models.clip.modeling_clip.CLIPEncoderLayer(config: deepke.name_entity_re.multimodal.models.clip.configuration_clip.CLIPConfig)[source]¶
Bases:
torch.nn.modules.module.Module
- forward(hidden_states: torch.Tensor, attention_mask: torch.Tensor, causal_attention_mask: torch.Tensor, output_attentions: bool = False, output_qks: bool = False)[source]¶
- Parameters
hidden_states (torch.FloatTensor) – input to the layer of shape (seq_len, batch, embed_dim)
attention_mask (torch.FloatTensor) – attention mask of size (batch, 1, tgt_len, src_len) where padding elements are indicated by very large negative values.
layer_head_mask (torch.FloatTensor) – mask for attention heads in a given layer of size (config.encoder_attention_heads,).
output_attentions (bool, optional) – Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.
- class deepke.name_entity_re.multimodal.models.clip.modeling_clip.CLIPPreTrainedModel(config: transformers.configuration_utils.PretrainedConfig, *inputs, **kwargs)[source]¶
Bases:
transformers.modeling_utils.PreTrainedModel
An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained models.
- config_class¶
alias of
deepke.name_entity_re.multimodal.models.clip.configuration_clip.CLIPConfig
- base_model_prefix = 'clip'¶
- supports_gradient_checkpointing = True¶
- class deepke.name_entity_re.multimodal.models.clip.modeling_clip.CLIPEncoder(config: deepke.name_entity_re.multimodal.models.clip.configuration_clip.CLIPConfig)[source]¶
Bases:
torch.nn.modules.module.Module
Transformer encoder consisting of config.num_hidden_layers self-attention layers. Each layer is a CLIPEncoderLayer.
- Parameters
config – CLIPConfig
embed_tokens (nn.Embedding) – output embedding
- forward(inputs_embeds, attention_mask=None, causal_attention_mask=None, output_attentions=None, output_hidden_states=None, return_dict=None, output_qks=False)[source]¶
- Parameters
inputs_embeds (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) – Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model's internal embedding lookup matrix.
attention_mask (torch.Tensor of shape (batch_size, sequence_length), optional) – Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: 1 for tokens that are not masked, 0 for tokens that are masked.
causal_attention_mask (torch.Tensor of shape (batch_size, sequence_length), optional) – Causal mask for the text model. Mask values selected in [0, 1]: 1 for tokens that are not masked, 0 for tokens that are masked.
output_attentions (bool, optional) – Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional) – Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional) – Whether or not to return a ModelOutput instead of a plain tuple.
- class deepke.name_entity_re.multimodal.models.clip.modeling_clip.CLIPTextTransformer(config: deepke.name_entity_re.multimodal.models.clip.configuration_clip.CLIPTextConfig)[source]¶
Bases:
torch.nn.modules.module.Module
- forward(input_ids=None, attention_mask=None, position_ids=None, output_attentions=None, output_hidden_states=None, return_dict=None)[source]¶
- Returns
A BaseModelOutputWithPooling (if return_dict=True is passed or when config.return_dict=True) or a tuple of torch.FloatTensor comprising various elements depending on the configuration and inputs.
last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size)) – Sequence of hidden-states at the output of the last layer of the model.
pooler_output (torch.FloatTensor of shape (batch_size, hidden_size)) – Last layer hidden-state of the first token of the sequence (classification token) further processed by a Linear layer and a Tanh activation function. The Linear layer weights are trained from the next sentence prediction (classification) objective during pretraining.
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
- Return type
BaseModelOutputWithPooling or tuple(torch.FloatTensor)
- class deepke.name_entity_re.multimodal.models.clip.modeling_clip.CLIPTextModel(config: deepke.name_entity_re.multimodal.models.clip.configuration_clip.CLIPTextConfig)[source]¶
Bases:
deepke.name_entity_re.multimodal.models.clip.modeling_clip.CLIPPreTrainedModel
- config_class¶
alias of
deepke.name_entity_re.multimodal.models.clip.configuration_clip.CLIPTextConfig
- get_input_embeddings() torch.nn.modules.module.Module [source]¶
Returns the model’s input embeddings.
- Returns
A torch module mapping vocabulary to hidden states.
- Return type
nn.Module
- set_input_embeddings(value)[source]¶
Set model’s input embeddings.
- Parameters
value (nn.Module) – A module mapping vocabulary to hidden states.
- forward(input_ids=None, attention_mask=None, position_ids=None, output_attentions=None, output_hidden_states=None, return_dict=None, output_qks=False)[source]¶
- Returns
A BaseModelOutputWithPooling (if return_dict=True is passed or when config.return_dict=True) or a tuple of torch.FloatTensor comprising various elements depending on the configuration and inputs.
last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size)) – Sequence of hidden-states at the output of the last layer of the model.
pooler_output (torch.FloatTensor of shape (batch_size, hidden_size)) – Last layer hidden-state of the first token of the sequence (classification token) further processed by a Linear layer and a Tanh activation function. The Linear layer weights are trained from the next sentence prediction (classification) objective during pretraining.
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Examples:
>>> from transformers import CLIPTokenizer, CLIPTextModel

>>> model = CLIPTextModel.from_pretrained("openai/clip-vit-base-patch32")
>>> tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

>>> inputs = tokenizer(["a photo of a cat", "a photo of a dog"], padding=True, return_tensors="pt")

>>> outputs = model(**inputs)
>>> last_hidden_state = outputs.last_hidden_state
>>> pooled_output = outputs.pooler_output  # pooled (EOS token) states
- Return type
BaseModelOutputWithPooling or tuple(torch.FloatTensor)
- class deepke.name_entity_re.multimodal.models.clip.modeling_clip.CLIPVisionTransformer(config: deepke.name_entity_re.multimodal.models.clip.configuration_clip.CLIPVisionConfig)[source]¶
Bases:
torch.nn.modules.module.Module
- forward(pixel_values=None, output_attentions=None, output_hidden_states=None, return_dict=None, aux_embeddings=None, rcnn_embeddings=None, output_qks=False)[source]¶
- Returns
A BaseModelOutputWithPooling (if return_dict=True is passed or when config.return_dict=True) or a tuple of torch.FloatTensor comprising various elements depending on the configuration and inputs.
last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size)) – Sequence of hidden-states at the output of the last layer of the model.
pooler_output (torch.FloatTensor of shape (batch_size, hidden_size)) – Last layer hidden-state of the first token of the sequence (classification token) further processed by a Linear layer and a Tanh activation function. The Linear layer weights are trained from the next sentence prediction (classification) objective during pretraining.
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
- Return type
BaseModelOutputWithPooling or tuple(torch.FloatTensor)
- class deepke.name_entity_re.multimodal.models.clip.modeling_clip.CLIPVisionModel(config: deepke.name_entity_re.multimodal.models.clip.configuration_clip.CLIPVisionConfig)[source]¶
Bases:
deepke.name_entity_re.multimodal.models.clip.modeling_clip.CLIPPreTrainedModel
- config_class¶
alias of
deepke.name_entity_re.multimodal.models.clip.configuration_clip.CLIPVisionConfig
- get_input_embeddings() torch.nn.modules.module.Module [source]¶
Returns the model’s input embeddings.
- Returns
A torch module mapping vocabulary to hidden states.
- Return type
nn.Module
- forward(pixel_values=None, output_attentions=None, output_hidden_states=None, return_dict=None)[source]¶
- Returns
A
BaseModelOutputWithPooling
(ifreturn_dict=True
is passed or whenconfig.return_dict=True
) or a tuple oftorch.FloatTensor
comprising various elements depending on the configuration (~transformers.
) and inputs.last_hidden_state (
torch.FloatTensor
of shape(batch_size, sequence_length, hidden_size)
) – Sequence of hidden-states at the output of the last layer of the model.pooler_output (
torch.FloatTensor
of shape(batch_size, hidden_size)
) – Last layer hidden-state of the first token of the sequence (classification token) further processed by a Linear layer and a Tanh activation function. The Linear layer weights are trained from the next sentence prediction (classification) objective during pretraining.hidden_states (
tuple(torch.FloatTensor)
, optional, returned whenoutput_hidden_states=True
is passed or whenconfig.output_hidden_states=True
) – Tuple oftorch.FloatTensor
(one for the output of the embeddings + one for the output of each layer) of shape(batch_size, sequence_length, hidden_size)
.Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (
tuple(torch.FloatTensor)
, optional, returned whenoutput_attentions=True
is passed or whenconfig.output_attentions=True
) – Tuple oftorch.FloatTensor
(one for each layer) of shape(batch_size, num_heads, sequence_length, sequence_length)
.Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Examples:
>>> from PIL import Image >>> import requests >>> from transformers import CLIPProcessor, CLIPVisionModel >>> model = CLIPVisionModel.from_pretrained("openai/clip-vit-base-patch32") >>> processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32") >>> url = "http://images.cocodataset.org/val2017/000000039769.jpg" >>> image = Image.open(requests.get(url, stream=True).raw) >>> inputs = processor(images=image, return_tensors="pt") >>> outputs = model(**inputs) >>> last_hidden_state = outputs.last_hidden_state >>> pooled_output = outputs.pooled_output # pooled CLS states
- Return type
BaseModelOutputWithPooling
or tuple(torch.FloatTensor)
- class deepke.name_entity_re.multimodal.models.clip.modeling_clip.CLIPModel(config: deepke.name_entity_re.multimodal.models.clip.configuration_clip.CLIPConfig)[source]¶
Bases:
deepke.name_entity_re.multimodal.models.clip.modeling_clip.CLIPPreTrainedModel
This model is a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage and behavior.
- Parameters
config (
CLIPConfig
) – Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
- config_class¶
alias of
deepke.name_entity_re.multimodal.models.clip.configuration_clip.CLIPConfig
- get_text_features(input_ids=None, attention_mask=None, position_ids=None, output_attentions=None, output_hidden_states=None, return_dict=None)[source]¶
- Returns
The text embeddings obtained by applying the projection layer to the pooled output of CLIPTextModel.
- Return type
text_features (torch.FloatTensor of shape (batch_size, output_dim))
Examples:
>>> from transformers import CLIPTokenizer, CLIPModel
>>> model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
>>> tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
>>> inputs = tokenizer(["a photo of a cat", "a photo of a dog"], padding=True, return_tensors="pt")
>>> text_features = model.get_text_features(**inputs)
- get_image_features(pixel_values=None, output_attentions=None, output_hidden_states=None, return_dict=None)[source]¶
- Returns
The image embeddings obtained by applying the projection layer to the pooled output of CLIPVisionModel.
- Return type
image_features (torch.FloatTensor of shape (batch_size, output_dim))
Examples:
>>> from PIL import Image
>>> import requests
>>> from transformers import CLIPProcessor, CLIPModel
>>> model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
>>> processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
>>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
>>> image = Image.open(requests.get(url, stream=True).raw)
>>> inputs = processor(images=image, return_tensors="pt")
>>> image_features = model.get_image_features(**inputs)
- forward(input_ids=None, pixel_values=None, attention_mask=None, position_ids=None, return_loss=None, output_attentions=None, output_hidden_states=None, return_dict=None)[source]¶
- Returns
A CLIPOutput (if return_dict=True is passed or when config.return_dict=True) or a tuple of torch.FloatTensor comprising various elements depending on the configuration ([CLIPConfig]) and inputs.
loss (torch.FloatTensor of shape (1,), optional, returned when return_loss is True) – Contrastive loss for image-text similarity.
logits_per_image (torch.FloatTensor of shape (image_batch_size, text_batch_size)) – The scaled dot product scores between image_embeds and text_embeds. This represents the image-text similarity scores.
logits_per_text (torch.FloatTensor of shape (text_batch_size, image_batch_size)) – The scaled dot product scores between text_embeds and image_embeds. This represents the text-image similarity scores.
text_embeds (torch.FloatTensor of shape (batch_size, output_dim)) – The text embeddings obtained by applying the projection layer to the pooled output of CLIPTextModel.
image_embeds (torch.FloatTensor of shape (batch_size, output_dim)) – The image embeddings obtained by applying the projection layer to the pooled output of CLIPVisionModel.
text_model_output (BaseModelOutputWithPooling) – The output of the CLIPTextModel.
vision_model_output (BaseModelOutputWithPooling) – The output of the CLIPVisionModel.
Examples:
>>> from PIL import Image
>>> import requests
>>> from transformers import CLIPProcessor, CLIPModel
>>> model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
>>> processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
>>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
>>> image = Image.open(requests.get(url, stream=True).raw)
>>> inputs = processor(text=["a photo of a cat", "a photo of a dog"], images=image, return_tensors="pt", padding=True)
>>> outputs = model(**inputs)
>>> logits_per_image = outputs.logits_per_image  # this is the image-text similarity score
>>> probs = logits_per_image.softmax(dim=1)  # we can take the softmax to get the label probabilities
- Return type
CLIPOutput
or tuple(torch.FloatTensor)
deepke.name_entity_re.multimodal.models.clip.processing_clip module¶
Image/Text processor class for CLIP
- class deepke.name_entity_re.multimodal.models.clip.processing_clip.CLIPProcessor(feature_extractor, tokenizer)[source]¶
Bases:
object
Constructs a CLIP processor which wraps a CLIP feature extractor and a CLIP tokenizer into a single processor.
[CLIPProcessor] offers all the functionalities of [CLIPFeatureExtractor] and [CLIPTokenizer]. See [~CLIPProcessor.__call__] and [~CLIPProcessor.decode] for more information; a usage sketch follows the parameter list below.
- Parameters
feature_extractor ([CLIPFeatureExtractor]) – The feature extractor is a required input.
tokenizer ([CLIPTokenizer]) – The tokenizer is a required input.
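A minimal usage sketch (assuming the openai/clip-vit-base-patch32 checkpoint and network access for the example image); a single call prepares both modalities:
>>> from PIL import Image
>>> import requests
>>> from transformers import CLIPProcessor
>>> processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
>>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
>>> image = Image.open(requests.get(url, stream=True).raw)
>>> inputs = processor(text=["a photo of a cat"], images=image, return_tensors="pt", padding=True)
>>> sorted(inputs.keys())  # tokenized text plus preprocessed pixel values
['attention_mask', 'input_ids', 'pixel_values']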
- save_pretrained(save_directory)[source]¶
Save a CLIP feature extractor object and CLIP tokenizer object to the directory save_directory, so that it can be re-loaded using the [~CLIPProcessor.from_pretrained] class method.
Note: this class method simply calls [~PreTrainedFeatureExtractor.save_pretrained] and [~tokenization_utils_base.PreTrainedTokenizer.save_pretrained]. Please refer to the docstrings of these methods for more information.
- Parameters
save_directory (str or os.PathLike) – Directory where the feature extractor JSON file and the tokenizer files will be saved (directory will be created if it does not exist).
- classmethod from_pretrained(pretrained_model_name_or_path, **kwargs)[source]¶
Instantiate a [CLIPProcessor] from a pretrained CLIP processor.
Note: this class method simply calls CLIPFeatureExtractor’s [~PreTrainedFeatureExtractor.from_pretrained] and CLIPTokenizer’s [~tokenization_utils_base.PreTrainedTokenizer.from_pretrained]. Please refer to the docstrings of these methods for more information.
- Parameters
pretrained_model_name_or_path (str or os.PathLike) –
This can be either:
a string, the model id of a pretrained feature_extractor hosted inside a model repo on huggingface.co. Valid model ids can be located at the root-level, like clip-vit-base-patch32, or namespaced under a user or organization name, like openai/clip-vit-base-patch32.
a path to a directory containing a feature extractor file saved using the [~PreTrainedFeatureExtractor.save_pretrained] method, e.g., ./my_model_directory/.
a path or url to a saved feature extractor JSON file, e.g., ./my_model_directory/preprocessor_config.json.
**kwargs – Additional keyword arguments passed along to both [PreTrainedFeatureExtractor] and [PreTrainedTokenizer].
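For instance, a save/re-load round trip might look like this sketch (the local directory name is purely illustrative):
>>> from transformers import CLIPProcessor
>>> processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
>>> processor.save_pretrained("./my_clip_processor")  # hypothetical local directory
>>> reloaded = CLIPProcessor.from_pretrained("./my_clip_processor")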
deepke.name_entity_re.multimodal.models.clip.tokenization_clip module¶
Tokenization classes for CLIP.
- deepke.name_entity_re.multimodal.models.clip.tokenization_clip.bytes_to_unicode()[source]¶
Returns a list of utf-8 bytes and a mapping to unicode strings. We specifically avoid mapping to whitespace/control characters that the bpe code barfs on.
The reversible bpe codes work on unicode strings. This means you need a large number of unicode characters in your vocab if you want to avoid UNKs. When you’re at something like a 10B token dataset you end up needing around 5K for decent coverage. This is a significant percentage of your normal, say, 32K bpe vocab. To avoid that, we want lookup tables between utf-8 bytes and unicode strings.
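A minimal sketch of how the mapping is typically used (assuming the standard byte-level BPE implementation, where the return value behaves like a dict from byte values to printable unicode characters):
>>> from deepke.name_entity_re.multimodal.models.clip.tokenization_clip import bytes_to_unicode
>>> byte_encoder = bytes_to_unicode()                       # byte value -> printable unicode character
>>> byte_decoder = {v: k for k, v in byte_encoder.items()}  # inverse mapping used when decoding
>>> mapped = "".join(byte_encoder[b] for b in "café".encode("utf-8"))
>>> bytes(byte_decoder[c] for c in mapped).decode("utf-8")
'café'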
- deepke.name_entity_re.multimodal.models.clip.tokenization_clip.get_pairs(word)[source]¶
Return the set of symbol pairs in a word.
Word is represented as tuple of symbols (symbols being variable-length strings).
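A minimal illustration (assuming the standard byte-pair-encoding helper behaviour, where the result is a set of adjacent symbol pairs):
>>> from deepke.name_entity_re.multimodal.models.clip.tokenization_clip import get_pairs
>>> sorted(get_pairs(("l", "o", "w", "e", "r")))
[('e', 'r'), ('l', 'o'), ('o', 'w'), ('w', 'e')]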
- class deepke.name_entity_re.multimodal.models.clip.tokenization_clip.CLIPTokenizer(vocab_file, merges_file, errors='replace', unk_token='<|endoftext|>', bos_token='<|startoftext|>', eos_token='<|endoftext|>', pad_token='<|endoftext|>', add_prefix_space=False, do_lower_case=True, **kwargs)[source]¶
Bases:
transformers.tokenization_utils.PreTrainedTokenizer
Construct a CLIP tokenizer. Based on byte-level Byte-Pair-Encoding.
This tokenizer has been trained to treat spaces like parts of the tokens (a bit like sentencepiece), so a word will be encoded differently depending on whether or not it is at the beginning of the sentence (i.e., preceded by a space).
You can get around that behavior by passing add_prefix_space=True when instantiating this tokenizer or when you call it on some text, but since the model was not pretrained this way, it might yield a decrease in performance.
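For example (a sketch assuming the openai/clip-vit-base-patch32 checkpoint; exact token ids depend on the pretrained vocabulary):
>>> from transformers import CLIPTokenizer
>>> tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
>>> ids_default = tokenizer("Hello world")["input_ids"]
>>> # passing add_prefix_space=True at instantiation treats the leading word as if preceded by a space
>>> tokenizer_prefix = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32", add_prefix_space=True)
>>> ids_prefixed = tokenizer_prefix("Hello world")["input_ids"]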
<Tip>
When used with is_split_into_words=True, this tokenizer will add a space before each word (even the first one).
</Tip>
This tokenizer inherits from [PreTrainedTokenizer] which contains most of the main methods. Users should refer to this superclass for more information regarding those methods.
- Parameters
vocab_file (str) – Path to the vocabulary file.
merges_file (str) – Path to the merges file.
errors (str, optional, defaults to “replace”) – Paradigm to follow when decoding bytes to UTF-8. See [bytes.decode](https://docs.python.org/3/library/stdtypes.html#bytes.decode) for more information.
unk_token (str, optional, defaults to <|endoftext|>) – The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this token instead.
bos_token (str, optional, defaults to <|startoftext|>) – The beginning of sequence token.
eos_token (str, optional, defaults to <|endoftext|>) – The end of sequence token.
add_prefix_space (bool, optional, defaults to False) – Whether or not to add an initial space to the input. This allows treating the leading word just like any other word (the CLIP tokenizer detects the beginning of words by the preceding space).
- pretrained_vocab_files_map: Dict[str, Dict[str, str]] = {'merges_file': {'openai/clip-vit-base-patch32': 'https://huggingface.co/openai/clip-vit-base-patch32/resolve/main/merges.txt'}, 'vocab_file': {'openai/clip-vit-base-patch32': 'https://huggingface.co/openai/clip-vit-base-patch32/resolve/main/vocab.json'}}¶
- property pad_token_id: Optional[int]¶
Id of the padding token in the vocabulary. Returns None if the token has not been set.
- Type
Optional[int]
- get_vocab()[source]¶
Returns the vocabulary as a dictionary of token to index.
tokenizer.get_vocab()[token] is equivalent to tokenizer.convert_tokens_to_ids(token) when token is in the vocab.
- Returns
The vocabulary.
- Return type
Dict[str, int]
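For example (a sketch assuming the openai/clip-vit-base-patch32 vocabulary), the equivalence noted above can be checked directly:
>>> from transformers import CLIPTokenizer
>>> tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
>>> vocab = tokenizer.get_vocab()
>>> vocab["<|endoftext|>"] == tokenizer.convert_tokens_to_ids("<|endoftext|>")
True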
- build_inputs_with_special_tokens(token_ids_0: List[int], token_ids_1: Optional[List[int]] = None) List[int] [source]¶
Build model inputs from a sequence or a pair of sequences for sequence classification tasks by concatenating and adding special tokens. A CLIP sequence has the following format:
single sequence: <|startoftext|> X <|endoftext|>
Pairs of sequences are not the expected use case, but they will be handled without a separator.
- Parameters
token_ids_0 (List[int]) – List of IDs to which the special tokens will be added.
token_ids_1 (List[int], optional) – Optional second list of IDs for sequence pairs.
- Returns
List of [input IDs](../glossary#input-ids) with the appropriate special tokens.
- Return type
List[int]
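For instance (a sketch assuming the openai/clip-vit-base-patch32 vocabulary), the returned list starts with the bos token id and ends with the eos token id:
>>> from transformers import CLIPTokenizer
>>> tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
>>> token_ids = tokenizer.convert_tokens_to_ids(tokenizer.tokenize("a photo of a cat"))
>>> ids = tokenizer.build_inputs_with_special_tokens(token_ids)
>>> ids[0] == tokenizer.bos_token_id and ids[-1] == tokenizer.eos_token_id
True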
- get_special_tokens_mask(token_ids_0: List[int], token_ids_1: Optional[List[int]] = None, already_has_special_tokens: bool = False) List[int] [source]¶
Retrieve sequence ids from a token list that has no special tokens added. This method is called when adding special tokens using the tokenizer prepare_for_model method.
- Parameters
token_ids_0 (List[int]) – List of IDs.
token_ids_1 (List[int], optional) – Optional second list of IDs for sequence pairs.
already_has_special_tokens (bool, optional, defaults to False) – Whether or not the token list is already formatted with special tokens for the model.
- Returns
A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.
- Return type
List[int]
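As a quick illustration (a sketch assuming the openai/clip-vit-base-patch32 vocabulary), the default call with already_has_special_tokens=False marks the positions where <|startoftext|> and <|endoftext|> would be added:
>>> from transformers import CLIPTokenizer
>>> tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
>>> token_ids = tokenizer.convert_tokens_to_ids(tokenizer.tokenize("a photo of a cat"))
>>> mask = tokenizer.get_special_tokens_mask(token_ids)
>>> mask[0], mask[-1]  # the bos/eos positions are flagged with 1
(1, 1)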
- convert_tokens_to_string(tokens)[source]¶
Converts a sequence of tokens (strings) into a single string.
- save_vocabulary(save_directory: str, filename_prefix: Optional[str] = None) Tuple[str] [source]¶
Save only the vocabulary of the tokenizer (vocabulary + added tokens).
This method won’t save the configuration and special token mappings of the tokenizer. Use
_save_pretrained()
to save the whole state of the tokenizer.
- prepare_for_tokenization(text, is_split_into_words=False, **kwargs)[source]¶
Performs any necessary transformations before tokenization.
This method should pop the arguments from kwargs and return the remaining kwargs as well. We test the kwargs at the end of the encoding process to be sure all the arguments have been used.