* normalized expected scores if min is < 0
* only normalize scores for ES after 8.19+ / 9.1+
* add 9.1.0 snapshot to build matrix
* get min score from booster trees
* removing typing on function definition
* properly flatten our tree leaf scores
* simplify getting min score
* debugging messages
* get all the matches in better way
* Fix model score normalization.
* lint
* lint again
* lint; correct return for bounds map/list
* revert to Aurelien's fix
* re-lint :/
---------
Co-authored-by: Aurelien FOUCRET <aurelien.foucret@elastic.co>
* Fix test setup to match pandas 2.0 demands
* Use the now deprecated _append method
(Better solution might exist)
* Deal with numeric_only being removed in metrics test
* Skip mad metric for other pandas versions
* Account for differences between pandas versions in describe methods
* Run black
* Check Pandas version first
* Mirror behaviour of installed Pandas version when running value_counts
* Allow passing arguments to the individual asserters
* Fix for method _construct_axes_from_arguments no longer existing
* Skip mad metric if it does not exist
* Account for pandas 2.0 timestamp default behaviour
* Deal with empty vs other inferred data types
* Account for default datetime precision change
* Run Black
* Solution for differences in inferred_type only
* Fix csv and json issues
* Skip two doctests
* Passing a set as indexer is no longer allowed
* Don't validate output where it differs between Pandas versions in the environment
* Update test matrix and packaging metadata
* Update version of Python in the docs
* Update Python version in demo notebook
* Match noxfile
* Symmetry
* Fix trailing comma in JSON
* Revert some changes in setup.py to fix building the documentation
* Revert "Revert some changes in setup.py to fix building the documentation"
This reverts commit ea9879753129d8d8390b3cbbce57155a8b4fb346.
* Use PANDAS_VERSION from eland.common
* Still skip the doctest, but make the output pandas 2 instead of 1
* Still skip doctest, but switch to pandas 2 output
* Prepare for pandas 3
* Reference the right column
* Ignore output in tests but switch to pandas 2 output
* Add line comment about NBVAL_IGNORE_OUTPUT
* Restore missing line and add stderr cell
* Use non-private method instead
* Fix indentation and parameter issues
* If index is not specified, and pandas 1 is present, set it to True
From pandas 2 and upwards, index is set to None by default
* Run black
* Newer version of black might have different opinions?
* Add line comment
* Remove unused import
* Add reason for ignore statement
* Add reason for skip
---------
Co-authored-by: Quentin Pradet <quentin.pradet@elastic.co>
* Revert "[ML] Export ML model as sklearn Pipeline (#509)"
This reverts commit 0576114a1d886eafabca3191743a9bea9dc20b1a.
* Keep useful changes
* formatting
* Remove obsolete test matrix configuration and update version references in documentation and Noxfile
* formatting
---------
Co-authored-by: Quentin Pradet <quentin.pradet@elastic.co>
Upgrades the PyTorch, transformers, and sentence-transformers requirements.
Elasticsearch upgraded to PyTorch 2.3.1 in 8.16 and 8.15.2. For compatibility
reasons, Eland will refuse to upload to an Elasticsearch cluster that is using
an earlier version of PyTorch.
* Ensure the feature logger uses NaN for non-matching query feature extractors (consistent with ES).
* Default score is None instead of 0.
* LTR model import API improvements.
* Fix feature logger tests.
* Fix export in eland.ml.ltr
* Apply suggestions from code review
Co-authored-by: Adam Demjen <demjened@gmail.com>
* Fix supported models for LTR
---------
Co-authored-by: Adam Demjen <demjened@gmail.com>
* Add XGBRanker and transformer
* Map XGBoostRegressorTransformer to XGBRanker
* Add unit tests
* Remove unused import
* Revert addition of type
* Update function comment
* Distinguish objective based on model class
* Support for supplying inference_config
* Fix linting errors
* Add unit test
* Add LTR type, throw exception on predict, refine test
* Add search step to LTR test
* Fix linter errors
* Update rescoring assertion in test + type defs
* Fix linting error
* Remove failing assertion
Fixes an error uploading the sentence-transformers/all-distilroberta-v1 model,
which failed with "missing 2 required positional arguments: 'token_type_ids'
and 'position_ids'". The cause was that the tokenizer type was not recognised
due to a typo.
This PR adds the ability to estimate per-deployment and per-allocation memory usage of NLP transformer models. It uses torch.profiler and logs the peak memory usage during inference.
This information is then used in Elasticsearch to provision models with sufficient memory (elastic/elasticsearch#98874).
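A minimal sketch of the measurement idea, assuming a traced model file and a placeholder input; the way Eland actually builds the model, the inputs, and the peak figure may differ.

```python
import torch
from torch.profiler import ProfilerActivity, profile

# Hypothetical traced model and example input; the real code builds these from
# the Hugging Face model being imported.
model = torch.jit.load("traced_model.pt")
example_input = torch.randint(0, 1000, (1, 128))

with profile(activities=[ProfilerActivity.CPU], profile_memory=True) as prof:
    with torch.no_grad():
        model(example_input)

# Sum the per-operator CPU memory reported by the profiler as a rough proxy
# for the memory needed by one inference call.
total_bytes = sum(evt.self_cpu_memory_usage for evt in prof.key_averages())
print(f"Approximate inference memory: {total_bytes} bytes")
```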
I updated the tree serialization format for the new scikit-learn versions. I also updated the minimum scikit-learn requirement to 1.3 to ensure compatibility.
Fixes #555
The eland_import_hub_model script supports uploading a local file where
the --hub-model-id argument is a file path. If the --es-model-id option is
not used, the model ID is generated from the hub model ID, and when that
is a file path, the path must be converted to a valid Elasticsearch model ID.
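A rough sketch of the kind of conversion involved; the function name, character rules, and replacement character are illustrative assumptions, not Eland's actual implementation.

```python
import re
from pathlib import Path


def model_id_from_path(hub_model_path: str) -> str:
    """Illustrative only: derive an Elasticsearch-friendly model ID from a
    local path, e.g. '/models/all-MiniLM-L6-v2' -> 'all-minilm-l6-v2'."""
    name = Path(hub_model_path).name.lower()
    # Replace characters that a model ID is unlikely to accept (assumed rule).
    return re.sub(r"[^a-z0-9_\-]", "_", name)
```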
Models traced with PyTorch 1.13 cannot be evaluated in PyTorch 1.9 or earlier.
With this upgrade, Eland becomes incompatible with pre-8.7 Elasticsearch and
will refuse to upload a model to the cluster. In this scenario, either upgrade
Elasticsearch or use an earlier version of Eland.
Closes #503
Note: I also had to pin the Sphinx version to 5.3.0 since, starting from 6.0, Sphinx suffers from a TypeError bug that causes a CI failure.
For many model types, we don't need to require the task type to be specified; we can infer it from the model configuration and architecture.
This commit makes the `task-type` parameter optional for the model upload script and adds logic for auto-detecting the task type based on the 🤗 model.
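A simplified sketch of the idea using the Hugging Face AutoConfig; the mapping below is illustrative and much smaller than what the script actually supports.

```python
from transformers import AutoConfig

# Illustrative mapping from architecture suffixes to Elasticsearch task names.
ARCHITECTURE_TO_TASK = {
    "ForSequenceClassification": "text_classification",
    "ForTokenClassification": "ner",
    "ForQuestionAnswering": "question_answering",
    "ForMaskedLM": "fill_mask",
}


def infer_task_type(model_id: str) -> str:
    config = AutoConfig.from_pretrained(model_id)
    for architecture in config.architectures or []:
        for suffix, task in ARCHITECTURE_TO_TASK.items():
            if architecture.endswith(suffix):
                return task
    raise ValueError(f"Cannot infer task type for {model_id}; pass --task-type")
```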
This switches our sklearn.DecisionTreeClassifier serialization logic to account for multi-valued leaves in the tree.
The key difference between our inference and DecisionTreeClassifier is that we run a softmax over the leaf values, whereas sklearn simply normalizes them.
This means that the "probabilities" we return will differ from sklearn's.
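A small numeric illustration of the difference (not Eland code), using hypothetical class counts for a single leaf:

```python
import numpy as np

# Hypothetical class counts stored in one DecisionTreeClassifier leaf.
leaf_values = np.array([3.0, 1.0])

# scikit-learn: normalize the counts into probabilities.
sklearn_probs = leaf_values / leaf_values.sum()  # [0.75, 0.25]

# Elasticsearch inference: softmax over the same leaf values.
exps = np.exp(leaf_values - leaf_values.max())
es_probs = exps / exps.sum()  # approx. [0.88, 0.12]

print(sklearn_probs, es_probs)
```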
This improves the user-facing functions and classes for uploading PyTorch NLP models to Elasticsearch.
Previously it was difficult to wrap your own module for uploading to Elasticsearch.
This commit splits some classes out, adds new ones, and adds tests showing how to wrap some simple modules.
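As a rough illustration of what wrapping a simple module means in plain PyTorch terms (the Eland-specific wrapper classes are not shown here), a custom nn.Module can be traced to TorchScript, the format Elasticsearch expects for imported models:

```python
import torch
from torch import nn


class MyTextScorer(nn.Module):
    """A toy module standing in for a user's own model."""

    def __init__(self, vocab_size: int = 1000, dim: int = 16):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, dim)
        self.linear = nn.Linear(dim, 1)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        return self.linear(self.embedding(input_ids).mean(dim=1))


# Trace with an example input and save the TorchScript artifact.
example = torch.randint(0, 1000, (1, 8))
traced = torch.jit.trace(MyTextScorer().eval(), example)
traced.save("my_text_scorer.pt")
```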
This adds some more definite types for our NLP tasks and tokenization configurations.
This is the first step in allowing users to more easily import their own transformer models via something other than Hugging Face.