I updated the tree serialization format for the new scikit-learn versions. I also raised the minimum required scikit-learn version to 1.3 to ensure compatibility.
Fixes #555
To migrate from scripts to console_scripts in setup.py,
the current long if __name__ == "__main__": section is a
blocker because console_scripts requires a function to be
specified as the entry point.
Move the logic into a main() function.
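A minimal sketch of the resulting layout, assuming an argparse-based script. The argument handling and the `some_package.some_module` path in the comment are hypothetical placeholders, not the actual Eland code:

```python
import argparse
import sys


def parse_args(argv=None):
    # Hypothetical, trimmed-down argument parser for illustration only.
    parser = argparse.ArgumentParser(description="Example import script")
    parser.add_argument("--hub-model-id", required=True)
    return parser.parse_args(argv)


def main(argv=None):
    # Everything that used to live under `if __name__ == "__main__":`
    # moves here so that console_scripts has a function to call.
    args = parse_args(argv)
    print(f"Importing {args.hub_model_id}")
    return 0


if __name__ == "__main__":
    # Running the file directly still works.
    sys.exit(main())

# In setup.py the function could then be registered like this
# (the module path is a placeholder):
#
#   entry_points={
#       "console_scripts": [
#           "eland_import_hub_model=some_package.some_module:main",
#       ],
#   }
```

The generated console script calls `main()` with no arguments, so `argv=None` lets argparse fall back to `sys.argv`, while tests can pass an explicit list.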
The eland_import_hub_model script supports uploading a local file where
the --hub-model-id argument is a file path. If the --es-model-id option is
not used, the model ID is generated from the hub model ID, and when that
is a file path the path must be converted to a valid Elasticsearch model ID.
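One way such a conversion could look is sketched below. The helper name `model_id_from_path` is hypothetical, and the rules it applies (lowercase alphanumerics, hyphens, underscores and periods only, alphanumeric first and last character, at most 64 characters) are assumptions about what Elasticsearch accepts, not a copy of the Eland implementation:

```python
import re
from pathlib import PurePosixPath


def model_id_from_path(path: str) -> str:
    # Hypothetical helper: use the last path component as the base name.
    name = PurePosixPath(path.rstrip("/")).name.lower()
    # Replace anything outside the assumed allowed character set.
    name = re.sub(r"[^a-z0-9_.-]", "_", name)
    # Assumed length limit; truncate before trimming the ends.
    name = name[:64]
    # Assumed rule: the ID must start and end with an alphanumeric character.
    return re.sub(r"^[^a-z0-9]+|[^a-z0-9]+$", "", name)
```

For example, `model_id_from_path("/home/me/My Model/")` yields `my_model`.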
PyTorch models traced in version 1.13 of PyTorch cannot be evaluated in
version 1.9 or earlier. With this upgrade Eland becomes incompatible with
pre-8.7 Elasticsearch and will refuse to upload a model to the cluster.
In this scenario either upgrade Elasticsearch or use an earlier version of Eland.
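The refusal can be pictured as a simple version gate. This is only a sketch: the function names and the error message are made up for illustration, and the version string is assumed to be something like the `version.number` value reported by the cluster:

```python
MIN_ES_VERSION = (8, 7)  # first Elasticsearch version assumed compatible


def parse_version(version: str) -> tuple:
    # "8.7.0" or "8.7.0-SNAPSHOT" -> (8, 7); only major and minor matter here.
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def check_cluster_compatible(es_version: str) -> None:
    # Hypothetical check: raise instead of uploading to an older cluster.
    if parse_version(es_version) < MIN_ES_VERSION:
        raise RuntimeError(
            f"Elasticsearch {es_version} cannot evaluate this model; "
            "upgrade Elasticsearch or use an earlier version of Eland"
        )
```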
Shap is incompatible with NumPy 1.24 due to a deprecated usage becoming
an error. There is no fix in Shap yet so an earlier version of NumPy must
be used.
Pandas 2.0 was recently released. We will continue to use the latest 1.5 release
to avoid any incompatibilities.
Users who follow the Eland README as a guide to importing
models can easily end up seeing inexplicably poor performance
due to unknowingly running the model with one allocation and
one thread per allocation.
This change spells out the effect of `--start` and links to
alternatives that allow better use of available hardware.
Co-authored-by: David Kyle <david.kyle@elastic.co>
Closes #503
Note: I also had to pin the Sphinx version to 5.3.0 because, starting from 6.0, Sphinx suffers from a TypeError bug that causes a CI failure.
Adds text_similarity task support. This is a cross-encoder transformer task where both sequences are given to the transformer at once.
According to 🤗 (or at least as far as the cross-encoder models are concerned) this is a sequence classification task with just one classification "label". But really, it isn't labeled at all and is more akin to a regression model.
related: elastic/elasticsearch#88439
Elasticsearch uses v1.11 of PyTorch. Models created with the latest PyTorch
release (v1.12) are not compatible with v1.11. This pins the PyTorch version
to 1.11 to prevent the incompatibility. The version of the Elasticsearch Python
client is now required to be >= Eland.
All users of Eland for importing NLP models should upgrade.
For many model types, we don't need to require the task requested. We can infer the task type based on the model configuration and architecture.
This commit makes the `task-type` parameter optional for the model upload script and adds logic for auto-detecting the task type based on the 🤗 model.
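As a rough illustration of the idea, the sketch below maps the class names listed under `architectures` in a 🤗 model config to a task type. The function name and the suffix table are illustrative assumptions and do not reproduce the detection logic added by this commit:

```python
# Illustrative mapping from architecture-name suffixes to task types.
ARCHITECTURE_SUFFIX_TO_TASK = {
    "ForMaskedLM": "fill_mask",
    "ForTokenClassification": "ner",
    "ForSequenceClassification": "text_classification",
    "ForQuestionAnswering": "question_answering",
}


def infer_task_type(architectures):
    # `architectures` is the list of class names from the model config,
    # e.g. ["BertForMaskedLM"]. Returns None when nothing matches, in
    # which case the caller would still have to pass the task type.
    for architecture in architectures or []:
        for suffix, task in ARCHITECTURE_SUFFIX_TO_TASK.items():
            if architecture.endswith(suffix):
                return task
    return None
```

Returning `None` rather than guessing keeps the explicit parameter as the fallback for model types that cannot be identified this way.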