This PR adds the ability to estimate per-deployment and per-allocation memory usage of NLP transformer models. It uses torch.profiler and logs the peak memory usage during inference.
This information is then used in Elasticsearch to provision models with sufficient memory (elastic/elasticsearch#98874).
We were attempting to detect SentenceTransformers models by looking at
the model prefix. However, SentenceTransformers models can also be
loaded from other orgs in the model hub, as well as from local disk, and
the prefix check failed in those two cases. To simplify the loading
logic and the choice of wrapper, we’ve removed support for loading a
plain Transformer for text_embedding tasks. We now only support DPR
embedding models and SentenceTransformer embedding models. If you try to
load a plain Transformer model, it will be loaded by SentenceTransformers,
and the SentenceTransformer library will automatically add a mean
pooling layer. Since we no longer automatically support models that are
neither DPR nor SentenceTransformers, we should include example code
somewhere showing how to load a custom model without DPR or
SentenceTransformers.
See: https://github.com/UKPLab/sentence-transformers/blob/v2.2.2/sentence_transformers/SentenceTransformer.py#L801
Resolves #531
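To make the pooling step above concrete, here is a minimal pure-Python sketch of attention-masked mean pooling over token embeddings. It only illustrates the arithmetic and is not the SentenceTransformers implementation; `mean_pool` is a hypothetical name.

```python
from typing import List


def mean_pool(
    token_embeddings: List[List[float]], attention_mask: List[int]
) -> List[float]:
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("one mask entry is required per token")
    kept = [vec for vec, keep in zip(token_embeddings, attention_mask) if keep]
    if not kept:
        raise ValueError("attention mask selects no tokens")
    dim = len(kept[0])
    # Component-wise mean over the unmasked tokens.
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Three tokens with 2-dimensional embeddings; the last token is padding.
sentence_embedding = mean_pool([[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]], [1, 1, 0])
```

The padding token is excluded from both the sum and the count, so it does not affect the resulting sentence embedding.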
I updated the tree serialization format for the new scikit-learn versions. I also raised the minimum required scikit-learn version to 1.3 to ensure compatibility.
Fixes #555
For the migration from scripts to console_scripts in setup.py,
the current long if __name__ == "__main__": section is a
blocker because console_scripts requires a function to be
specified as the entry point.
Move the logic into a main() function.
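A minimal sketch of the resulting shape, assuming an argparse-based script. The names `example_script` and `example_package.cli` are hypothetical and stand in for the real script and module:

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # argv=None makes argparse fall back to sys.argv[1:].
    parser = argparse.ArgumentParser(prog="example_script")
    parser.add_argument("--hub-model-id", required=True)
    return parser.parse_args(argv)


def main(argv: Optional[List[str]] = None) -> int:
    """Everything that used to live under `if __name__ == "__main__":`."""
    args = parse_args(argv)
    print(f"Would import {args.hub_model_id}")
    return 0  # used as the process exit code by the generated wrapper


# In setup.py the function can then be referenced as an entry point
# (hypothetical module path):
#
#   entry_points={
#       "console_scripts": ["example_script=example_package.cli:main"],
#   }
```

Accepting an optional `argv` keeps `main()` callable from tests without touching `sys.argv`.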
The eland_import_hub_model script supports uploading a local file, in which
case the --hub-model-id argument is a file path. If the --es-model-id option is
not used, the model ID is generated from the hub model ID, and when that
is a file path, the path must be converted to a valid Elasticsearch model ID.
Closes #503
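As an illustration of the path-to-ID conversion described above, here is a hedged sketch. It is not the code from this change: `model_id_from_path` is a hypothetical helper, and the allowed character set (lowercase alphanumerics, hyphens, underscores) and the 64-character limit are assumptions to verify against the Elasticsearch documentation.

```python
import re


def model_id_from_path(path: str, max_length: int = 64) -> str:
    """Turn a file path into a string usable as a model ID (assumed rules)."""
    lowered = path.lower()
    # Replace each run of disallowed characters (separators, spaces, ...)
    # with a double underscore.
    cleaned = re.sub(r"[^a-z0-9_-]+", "__", lowered)
    # Keep the tail of long paths so the model name itself survives.
    tail = cleaned[-max_length:]
    # Assumed rule: the ID must start and end with an alphanumeric character.
    model_id = tail.strip("_-")
    if not model_id:
        raise ValueError(f"cannot derive a model ID from {path!r}")
    return model_id
```

For example, `model_id_from_path("/home/User/models/My Model")` yields `home__user__models__my__model`.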
Note: I also had to pin the Sphinx version to 5.3.0 because, starting from 6.0, Sphinx suffers from a TypeError bug that causes a CI failure.
Adds text_similarity task support. This is a cross-encoder transformer task where both sequences are given to the transformer at once.
According to 🤗 (or at least as far as the cross-encoder models are concerned), this is a sequence classification task with just one classification "label". But really, it isn't labeled at all and is more akin to a regression model.
related: elastic/elasticsearch#88439
Elasticsearch uses v1.11 of PyTorch. Models created with the latest PyTorch
release (v1.12) are not compatible with v1.11. This pins the PyTorch version
to 1.11 to prevent the incompatibility. The version of the Elasticsearch Python
client is now required to be >= the Eland version.
All users of Eland for importing NLP models should upgrade.
For many model types, we don't need to require that the task type be specified. We can infer it from the model configuration and architecture.
This commit makes the `task-type` parameter optional for the model upload script and adds logic for auto-detecting the task type based on the 🤗 model.
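One way such auto-detection could work is to match the architecture names listed in the 🤗 model config against known suffixes. The sketch below is an assumption about the approach, not the logic added in this commit; `infer_task_type` and the mapping are hypothetical.

```python
from typing import Dict, List

# Hypothetical mapping from architecture-name suffixes to task type names.
SUFFIX_TO_TASK: Dict[str, str] = {
    "ForMaskedLM": "fill_mask",
    "ForTokenClassification": "ner",
    "ForSequenceClassification": "text_classification",
    "ForQuestionAnswering": "question_answering",
}


def infer_task_type(architectures: List[str]) -> str:
    """Return the first task type whose suffix matches an architecture name."""
    for architecture in architectures:
        for suffix, task in SUFFIX_TO_TASK.items():
            if architecture.endswith(suffix):
                return task
    # Ambiguous or unknown architectures still need an explicit task type.
    raise ValueError(
        f"cannot infer a task type from {architectures}; pass it explicitly"
    )
```

Raising on an unknown architecture keeps the explicit parameter as the fallback for models whose task cannot be inferred.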
This switches our sklearn.DecisionTreeClassifier serialization logic to account for multi-valued leaves in the tree.
The key difference between our inference and DecisionTreeClassifier is that we run a softmax over the leaf values, whereas sklearn simply normalizes them.
This means that the "probabilities" we return will differ from those returned by sklearn.
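The difference is easy to see on a single leaf. The following sketch compares the two transforms on the same per-class leaf values; the example values are made up, and exactly which values are stored in the serialized leaf is an assumption here.

```python
import math
from typing import List


def normalize(values: List[float]) -> List[float]:
    """Divide each value by the sum, as sklearn does for leaf values."""
    total = sum(values)
    return [v / total for v in values]


def softmax(values: List[float]) -> List[float]:
    """Exponentiate, then normalize; shifted by the max for stability."""
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


leaf_values = [3.0, 1.0]  # made-up per-class values for one leaf
normalized = normalize(leaf_values)  # [0.75, 0.25]
softmaxed = softmax(leaf_values)     # roughly [0.88, 0.12]
```

Both transforms preserve the ranking of the classes, so the predicted class is the same; only the reported scores differ.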
This improves the user-facing functions and classes for uploading PyTorch NLP models to Elasticsearch.
Previously it was difficult to wrap your own module for uploading to Elasticsearch.
This commit splits some classes out, adds new ones, and adds tests showing how to wrap some simple modules.