This PR adds the ability to estimate the per-deployment and per-allocation memory usage of NLP transformer models. It uses torch.profiler and logs the peak memory usage during inference.
This information is then used in Elasticsearch to provision models with sufficient memory (elastic/elasticsearch#98874).
Co-authored-by: David Olaru <dolaru@elastic.co>
* Reduce Docker image size from 4.8GB to 2.2GB
* Use torch+cpu variant if target platform is linux/amd64
Avoids downloading large & unnecessary NVIDIA deps defined in the package on PyPI
* Build linux/arm64 image using buildx and QEMU
* Recommend using pre-built Docker image
* Update README.md
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
---------
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
We were attempting to load SentenceTransformers by looking at the model
prefix; however, SentenceTransformers can also be loaded from other
orgs in the model hub, as well as from local disk. This prefix checking
failed in those two cases. To simplify the loading logic and the choice
of wrapper, we’ve removed support for loading a plain Transformer for
text_embedding tasks. We now only support DPR embedding models and
SentenceTransformer embedding models. If you try to load a plain
Transformer model, it will be loaded by SentenceTransformers and a mean
pooling layer will automatically be added by the SentenceTransformer
library. Since we no longer automatically support models that are
neither DPR nor SentenceTransformers, we should add example code
somewhere showing how to load a custom model without DPR or
SentenceTransformers.
See: https://github.com/UKPLab/sentence-transformers/blob/v2.2.2/sentence_transformers/SentenceTransformer.py#L801
Resolves #531
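
For reviewers unfamiliar with the pooling step mentioned above, here is a minimal pure-Python sketch of what mean pooling computes: the average of the token vectors, skipping padded positions. It is only an illustration, not the SentenceTransformers implementation; the function name `mean_pool` and the list-based inputs are assumptions made for this example.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose mask entry is non-zero (hypothetical helper)."""
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("token_embeddings and attention_mask must have the same length")

    # Keep only real tokens; padded positions have a mask value of 0.
    kept = [vec for vec, mask in zip(token_embeddings, attention_mask) if mask]
    if not kept:
        raise ValueError("attention_mask selects no tokens")

    dim = len(kept[0])
    # Component-wise mean over the kept token vectors.
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Two real tokens followed by one padded position that must be ignored.
sentence_vector = mean_pool([[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]], [1, 1, 0])
```

The padded third vector does not influence the result, which is why a plain Transformer wrapped this way still yields a single fixed-size sentence vector.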
I updated the tree serialization format for newer scikit-learn versions. I also raised the minimum required scikit-learn version to 1.3 to ensure compatibility.
Fixes #555
To migrate from scripts to console_scripts in setup.py,
the current long if __name__ == "__main__": section is a
blocker, because console_scripts requires a function to be
specified as the entry point.
Move the logic into a main() function.
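
A minimal sketch of the resulting shape, assuming an argparse-based script. The module path `example_pkg.cli`, the command name `example-cli`, and the `--name` option are hypothetical and only illustrate the pattern.

```python
import argparse
from typing import List, Optional

# Hypothetical setup.py entry that would point at main() below:
#   entry_points={"console_scripts": ["example-cli = example_pkg.cli:main"]}


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse command-line arguments; argv=None makes argparse read sys.argv[1:]."""
    parser = argparse.ArgumentParser(description="Hypothetical example CLI")
    parser.add_argument("--name", default="world", help="who to greet")
    return parser.parse_args(argv)


def main(argv: Optional[List[str]] = None) -> int:
    """Entry point callable by console_scripts; returns the process exit code."""
    args = parse_args(argv)
    print(f"Hello, {args.name}")
    return 0
```

Accepting an optional `argv` keeps the function callable with no arguments, which is how a console_scripts wrapper invokes it, while still allowing tests to pass arguments directly. A short `if __name__ == "__main__":` guard that calls `sys.exit(main())` can remain for direct execution.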