355 Commits

Author SHA1 Message Date
Quentin Pradet
44ead02b05
Fix lint (#798) 2025-06-05 15:52:19 +04:00
Quentin Pradet
9e8f164677
Release 9.0.1 2025-04-30 17:25:32 +04:00
Quentin Pradet
3c3ffd7403
Forbid Elasticsearch 8 client or server (#780) 2025-04-30 16:25:33 +04:00
Mark J. Hoy
51a2b9cc19
Add 9.1.0 Snapshot to Build and Fix test_ml_model Tests to Normalized Expected Scores if Min Score is Less Than Zero (#777)
* normalized expected scores if min is < 0

* only normalize scores for ES after 8.19+ / 9.1+

* add 9.1.0 snapshot to build matrix

* get min score from booster trees

* removing typing on function definition

* properly flatten our tree leaf scores

* simplify getting min score

* debugging messages

* get all the matches in better way

* Fix model score normalization.

* lint

* lint again

* lint; correct return for bounds map/list

* revert to Aurelian's fix

* re-lint :/

---------

Co-authored-by: Aurelien FOUCRET <aurelien.foucret@elastic.co>
2025-04-23 15:53:32 +00:00
David Kyle
a9c36927f6
Fix tokeniser for DeBERTa models (#769) 2025-04-23 09:10:02 +01:00
Quentin Pradet
87380ef716
Release 9.0.0
Co-authored-by: Miguel Grinberg <miguel.grinberg@gmail.com>
2025-04-16 15:21:04 +04:00
Quentin Pradet
9ca76d7888
Revert "Release 8.18.0" (#774)
This reverts commit ced3cdfe32bd04e3d127b18f66f9b143b2956564.
2025-04-16 14:53:51 +04:00
Quentin Pradet
ced3cdfe32
Release 8.18.0 2025-04-15 20:52:30 +04:00
David Kyle
ee4d701aa4
Upgrade transformers to 4.47 (#752)
The upgrade fixes a crash tracing the baai/bge-m3 model
2025-02-12 17:30:45 +00:00
Bart Broere
75c57b0775
Support Pandas 2 (#742)
* Fix test setup to match pandas 2.0 demands

* Use the now deprecated _append method

(Better solution might exist)

* Deal with numeric_only being removed in metrics test

* Skip mad metric for other pandas versions

* Account for differences between pandas versions in describe methods

* Run black

* Check Pandas version first

* Mirror behaviour of installed Pandas version when running value_counts

* Allow passing arguments to the individual asserters

* Fix for method _construct_axes_from_arguments no longer existing

* Skip mad metric if it does not exist

* Account for pandas 2.0 timestamp default behaviour

* Deal with empty vs other inferred data types

* Account for default datetime precision change

* Run Black

* Solution for differences in inferred_type only

* Fix csv and json issues

* Skip two doctests

* Passing a set as indexer is no longer allowed

* Don't validate output where it differs between Pandas versions in the environment

* Update test matrix and packaging metadata

* Update version of Python in the docs

* Update Python version in demo notebook

* Match noxfile

* Symmetry

* Fix trailing comma in JSON

* Revert some changes in setup.py to fix building the documentation

* Revert "Revert some changes in setup.py to fix building the documentation"

This reverts commit ea9879753129d8d8390b3cbbce57155a8b4fb346.

* Use PANDAS_VERSION from eland.common

* Still skip the doctest, but make the output pandas 2 instead of 1

* Still skip doctest, but switch to pandas 2 output

* Prepare for pandas 3

* Reference the right column

* Ignore output in tests but switch to pandas 2 output

* Add line comment about NBVAL_IGNORE_OUTPUT

* Restore missing line and add stderr cell

* Use non-private method instead

* Fix indentation and parameter issues

* If index is not specified, and pandas 1 is present, set it to True

From pandas 2 and upwards, index is set to None by default

* Run black

* Newer version of black might have different opinions?

* Add line comment

* Remove unused import

* Add reason for ignore statement

* Add reason for skip

---------

Co-authored-by: Quentin Pradet <quentin.pradet@elastic.co>
2025-02-04 17:43:43 +04:00
Valeriy Khakhutskyy
77589b26b8
Remove ML model export as sklearn Pipeline and clean up code (#744)
* Revert "[ML] Export ML model as sklearn Pipeline (#509)"

This reverts commit 0576114a1d886eafabca3191743a9bea9dc20b1a.

* Keep useful changes

* formatting

* Remove obsolete test matrix configuration and update version references in documentation and Noxfile

* formatting

---------

Co-authored-by: Quentin Pradet <quentin.pradet@elastic.co>
2025-02-04 11:36:50 +04:00
Bart Broere
9b5badb941
Drop Python 3.8 support and introduce Python 3.12 CI/CD (#743) 2025-01-22 21:55:57 +04:00
Quentin Pradet
7774a506ae
Release 8.17.0 2025-01-07 10:58:59 +04:00
Dai Sugimori
82492fe771
Expansion support (#740) 2024-11-23 00:20:58 +09:00
Quentin Pradet
04102f2a4e
Release 8.16.0 2024-11-14 09:07:39 +04:00
Valeriy Khakhutskyy
9aec8fc751
Add deprecation warning for ESGradientBoostingModel subclasses (#738)
Introduce a warning indicating that exporting data frame analytics models as ESGradientBoostingModel subclasses is deprecated and will be removed in version 9.0.0.

The implementation of ESGradientBoostingModel relies on importing undocumented private classes that were changed in 1.4 to https://github.com/scikit-learn/scikit-learn/pull/26278. This dependency makes the code difficult to maintain, while the functionality is not widely used by users. Therefore, we will deprecate this functionality in 8.16 and remove it completely in 9.0.0. 

---------

Co-authored-by: Quentin Pradet <quentin.pradet@elastic.co>
2024-11-11 14:26:11 +01:00
Quentin Pradet
79d9a6ae29
Release 8.15.4 2024-10-18 10:52:52 +04:00
Quentin Pradet
2916b51fa7
Release 8.15.3 2024-10-09 16:16:52 +04:00
Max Hniebergall
06b65e211e
Add support for DeBERTa-V2 tokenizer (#717) 2024-10-03 14:04:19 -04:00
Quentin Pradet
a45c7bc357
Release 8.15.2 2024-10-02 13:54:03 +04:00
Quentin Pradet
a83ce20fcc
Release 8.15.1 2024-10-01 15:31:24 +04:00
David Kyle
5253501704
Upgrade PyTorch to version 2.3.1 (#718)
Upgrades the PyTorch, transformers and sentence transformer requirements.
Elasticsearch has upgraded to PyTorch to 2.3.1 in 8.16 and 8.15.2. For 
compatibility reasons Eland will refuse to upload to an Elasticsearch cluster 
that has is using an earlier version of PyTorch.
2024-09-30 10:22:02 +01:00
Miguel Grinberg
0ce3db26e8
Release 8.15.0 (#715)
* Release 8.15.0

* update release notes
2024-08-13 09:47:48 +01:00
David Kyle
5a76f826df
Add note about using text_similarity for rerank to the CLI (#716) 2024-08-12 14:40:12 +01:00
David Kyle
fd8886da6a
Default truncation to second for text similarity the task type(#713)
In reranking the first input (the query) is generally shorter. In this case
it makes more sense to truncate the second input (the document text)
2024-08-05 11:47:15 +01:00
Aurélien FOUCRET
bee6d0e1f7
Remove input fields from exported LTR models (#708) 2024-07-05 14:31:22 +02:00
Bart Broere
f18aa35e8e
Deal with the possibility of lists (#707) 2024-06-28 22:25:47 +04:00
Quentin Pradet
0ddc21b895
Release 8.14.0 2024-06-10 15:56:43 +04:00
Bart Broere
1014ecdb39
Fix non _source fields missing from the result hits (#693) 2024-06-10 11:09:52 +04:00
David Kyle
632074c0f0
Make eland_import_hub_model script compatible with serverless (#698)
Checks for build_flavor == serverless rather than a version
2024-06-07 14:46:12 +01:00
Bart Broere
35a96ab3f0
Fix missing method str.removeprefix in Python 3.8 (#695) 2024-05-24 10:25:04 +04:00
Ashok Kumar
5b728c29c1
Replace check for Elasticsearch to str/list in ensure_es_client (#690) 2024-05-04 09:01:31 +04:00
Quentin Pradet
e76b32eee2
Release 8.13.1 2024-05-03 09:20:45 +04:00
Quentin Pradet
fd38e26df1
Support HTTP proxies in eland_import_hub_model (#688)
* Document TLS/SSL options for import script

* Mention --help option

* Add HTTP proxy support

* Mention HTTP_PROXY too

---------

Co-authored-by: David Kyle <david.kyle@elastic.co>
2024-05-02 21:03:44 +04:00
Quentin Pradet
1921792df8
Release 8.13.0 2024-03-27 18:18:21 +04:00
David Kyle
ae0bba34c6
Upgrade torch to 2.1.2 (#671)
Compatible with Elasticsearch 8.13 where the same upgrade has been made
2024-03-26 10:06:50 +00:00
David Kyle
5d34dc3cc4
Add override option to specify the model's max input size(#674)
If the max input size cannot be found in the configuration the user
can specify it as a parameter to the eland_import_hub_model script
2024-03-20 10:02:43 +00:00
Bart Broere
9b335315bb
Mirror pandas' to_csv lineterminator instead of line_terminator (#595)
* Mirror pandas' to_csv lineterminator instead of line_terminator

(even though it looks a little weird perhaps)

* Remove squeeze argument

* Revert "Merge branch 'remove-squeeze-argument' into patch-2"

This reverts commit 8b9ab5647e244d78ec3471b80ee7c42e019cf347.

* Don't remove the parameter yet since people might use it

* Add pending deprecation warning

---------

Co-authored-by: David Kyle <david.kyle@elastic.co>
2024-02-23 14:23:58 +04:00
Bart Broere
33cf029efe
Implement eland.DataFrame.to_json (#661)
Co-authored-by: Quentin Pradet <quentin.pradet@elastic.co>
2024-02-15 11:32:54 +04:00
Aurélien FOUCRET
9d492b03aa
Release 8.12.1
Co-authored-by: Quentin Pradet <quentin.pradet@elastic.co>
2024-02-01 10:50:18 +04:00
Quentin Pradet
02190e74e7
Switch to 2024 black style (#657) 2024-01-31 14:47:19 +04:00
Aurélien FOUCRET
2a6a4b1f06
Fix missing value support for XGBRanker. (#654)
* Fix missing value support for XGBRanker.

* lint

* Sort expected scores

* lint
2024-01-23 18:42:24 +01:00
Quentin Pradet
1190364abb
Release 8.12.0 2024-01-19 12:42:45 +04:00
David Kyle
64216d44fb
Add prefix_string config option to the import model hub script (#642) 2024-01-19 12:06:57 +04:00
Aurélien FOUCRET
5169cc926a
Improve LTR (#651)
* Ensure the feature logger is using NaN for non matching query feature extractors (consistent with ES).

* Default score is None instead of 0.

* LTR model import API improvements.

* Fix feature logger tests.

* Fix export in eland.ml.ltr

* Apply suggestions from code review

Co-authored-by: Adam Demjen <demjened@gmail.com>

* Fix supported models for LTR

---------

Co-authored-by: Adam Demjen <demjened@gmail.com>
2024-01-17 13:01:47 +04:00
Aurélien FOUCRET
d3ed669a5e
LTR feature logger (#648) 2024-01-12 13:52:04 +01:00
Adam Demjen
926f0b9b5c
Add XGBRanker and transformer (#649)
* Add XGBRanker and transformer

* Map XGBoostRegressorTransformer to XGBRanker

* Add unit tests

* Remove unused import

* Revert addition of type

* Update function comment

* Distinguish objective based on model class
2024-01-11 15:48:13 -05:00
Adam Demjen
840871f9d9
Accept LTR inference config when creating model (#645)
* Support for supplying inference_config

* Fix linting errors

* Add unit test

* Add LTR type, throw exception on predict, refine test

* Add search step to LTR test

* Fix linter errors

* Update rescoring assertion in test + type defs

* Fix linting error

* Remove failing assertion
2024-01-08 09:19:03 -05:00
David Kyle
6ef418f465
Release 8.11.1 2023-11-22 11:55:53 +01:00
David Kyle
081250cdec
Fix failed import of ST RoBERTa models (#637)
Fixes an error uploading the sentence-transformers/all-distilroberta-v1 model
which failed with "missing 2 required positional arguments: 'token_type_ids' 
and 'position_ids'". The cause was that the tokenizer type was not recognised 
due to a typo
2023-11-21 12:53:43 +00:00