This PR adds the ability to estimate per-deployment and per-allocation memory usage of NLP transformer models. It uses torch.profiler and logs the peak memory usage during inference.
This information is then used in Elasticsearch to provision models with sufficient memory (elastic/elasticsearch#98874).
Closes #503
Note: I also had to pin the Sphinx version to 5.3.0 because, starting from 6.0, Sphinx hits a TypeError bug that causes a CI failure.
* Fix bugs with field mapping:
1. If no permission to call _mapping, return readable error
2. If index is wildcard, fix issues with user warnings
* Fix lint issues
* Remove trailing backslashes in docs
* Remove pandas/matplotlib deprecation warning
This warning is due to a conflict between pandas
and matplotlib that may be resolved in a later
version. For now, ignore the warning so CI works.
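
A minimal sketch of how a single known warning can be ignored with Python's standard `warnings` module while other warnings still surface. The message pattern and the helper names `ignore_known_warning` and `count_warnings` are hypothetical; the actual warning text and the place where the filter is configured in this change are not stated here.

```python
import warnings

# Hypothetical pattern: the real warning text is not given in this description.
HYPOTHETICAL_PATTERN = r".*example pandas/matplotlib deprecation.*"


def ignore_known_warning() -> None:
    """Install a filter that ignores only the one known deprecation warning."""
    warnings.filterwarnings(
        "ignore",
        message=HYPOTHETICAL_PATTERN,
        category=DeprecationWarning,
    )


def count_warnings(messages):
    """Emit each message as a DeprecationWarning and count those not filtered out."""
    with warnings.catch_warnings(record=True) as caught:
        # Show every warning by default, then layer the targeted ignore on top.
        warnings.simplefilter("always")
        ignore_known_warning()
        for msg in messages:
            warnings.warn(msg, DeprecationWarning)
    return len(caught)
```

Matching on the message and category, rather than silencing all `DeprecationWarning`s, keeps unrelated deprecations visible in CI.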