eland

mirror of https://github.com/elastic/eland.git synced 2025-07-11 00:02:14 +08:00

Author	SHA1	Message	Date
David Kyle	5253501704	Upgrade PyTorch to version 2.3.1 (#718) Upgrades the PyTorch, transformers and sentence transformer requirements. Elasticsearch has upgraded PyTorch to 2.3.1 in 8.16 and 8.15.2. For compatibility reasons Eland will refuse to upload to an Elasticsearch cluster that is using an earlier version of PyTorch.	2024-09-30 10:22:02 +01:00
David Kyle	fd8886da6a	Default truncation to `second` for the text similarity task type (#713) In reranking, the first input (the query) is generally shorter. In this case it makes more sense to truncate the second input (the document text).	2024-08-05 11:47:15 +01:00
Aurélien FOUCRET	bee6d0e1f7	Remove input fields from exported LTR models (#708)	2024-07-05 14:31:22 +02:00
Bart Broere	1014ecdb39	Fix non _source fields missing from the result hits (#693)	2024-06-10 11:09:52 +04:00
Aurélien FOUCRET	9cea2385e6	Work around LTR model cache in tests (#685)	2024-04-08 14:00:36 +04:00
David Kyle	ae0bba34c6	Upgrade torch to 2.1.2 (#671) Compatible with Elasticsearch 8.13 where the same upgrade has been made	2024-03-26 10:06:50 +00:00
David Kyle	8e8c49ddbf	Mute the Learning to Rank tests (#676)	2024-03-21 10:13:31 +00:00
David Kyle	5d34dc3cc4	Add override option to specify the model's max input size (#674) If the max input size cannot be found in the configuration, the user can specify it as a parameter to the eland_import_hub_model script.	2024-03-20 10:02:43 +00:00
Bart Broere	33cf029efe	Implement eland.DataFrame.to_json (#661) Co-authored-by: Quentin Pradet <quentin.pradet@elastic.co>	2024-02-15 11:32:54 +04:00
Quentin Pradet	02190e74e7	Switch to 2024 black style (#657)	2024-01-31 14:47:19 +04:00
Aurélien FOUCRET	2a6a4b1f06	Fix missing value support for XGBRanker. (#654) * Fix missing value support for XGBRanker. * lint * Sort expected scores * lint	2024-01-23 18:42:24 +01:00
David Kyle	64216d44fb	Add prefix_string config option to the import model hub script (#642)	2024-01-19 12:06:57 +04:00
Aurélien FOUCRET	5169cc926a	Improve LTR (#651) * Ensure the feature logger is using NaN for non matching query feature extractors (consistent with ES). * Default score is None instead of 0. * LTR model import API improvements. * Fix feature logger tests. * Fix export in eland.ml.ltr * Apply suggestions from code review Co-authored-by: Adam Demjen <demjened@gmail.com> * Fix supported models for LTR --------- Co-authored-by: Adam Demjen <demjened@gmail.com>	2024-01-17 13:01:47 +04:00
Aurélien FOUCRET	d3ed669a5e	LTR feature logger (#648)	2024-01-12 13:52:04 +01:00
Adam Demjen	926f0b9b5c	Add XGBRanker and transformer (#649) * Add XGBRanker and transformer * Map XGBoostRegressorTransformer to XGBRanker * Add unit tests * Remove unused import * Revert addition of type * Update function comment * Distinguish objective based on model class	2024-01-11 15:48:13 -05:00
Adam Demjen	840871f9d9	Accept LTR inference config when creating model (#645) * Support for supplying inference_config * Fix linting errors * Add unit test * Add LTR type, throw exception on predict, refine test * Add search step to LTR test * Fix linter errors * Update rescoring assertion in test + type defs * Fix linting error * Remove failing assertion	2024-01-08 09:19:03 -05:00
Aurélien FOUCRET	05c5859b8a	Adding a new movie dataset to the tests. (#646)	2024-01-04 16:14:56 +01:00
David Kyle	081250cdec	Fix failed import of ST RoBERTa models (#637) Fixes an error uploading the sentence-transformers/all-distilroberta-v1 model which failed with "missing 2 required positional arguments: 'token_type_ids' and 'position_ids'". The cause was that the tokenizer type was not recognised due to a typo	2023-11-21 12:53:43 +00:00
David Kyle	b689759278	Skip model config tests (#635) For #633	2023-11-21 11:07:55 +00:00
Valeriy Khakhutskyy	6cecb454e3	[ML] Better memory estimation for NLP models (#568) This PR adds the ability to estimate per-deployment and per-allocation memory usage of NLP transformer models. It uses torch.profiler and logs the peak memory usage during inference. This information is then used in Elasticsearch to provision models with sufficient memory (elastic/elasticsearch#98874).	2023-11-06 12:18:20 +01:00
Bart Broere	5e5f36bdf8	Deal with the mad aggregation being removed in Pandas 2 (#602)	2023-11-06 06:12:16 +01:00
David Kyle	5b3a83e7f2	[NLP] Support E5 small multi-lingual (#625) Although E5 small is a BERT based model it takes 2 parameters to forward not 4. Use the tokenizer type to decide the number of parameters	2023-10-31 17:49:43 +00:00
David Kyle	ab6e44f430	[NLP] Tests for NLP model configurations (#623) Add tests for generated Elasticsearch model configurations	2023-10-19 12:39:57 +01:00
Bart Broere	36b941e336	Use _append instead of append since it's still available after 2.0 of pandas (#603)	2023-10-11 15:41:05 +01:00
Quentin Pradet	c6ce4b2c46	Fix direct usage of TransformerModel (#619)	2023-10-11 11:56:14 +02:00
Bart Broere	3908f43905	Remove deprecated check_less_precise (#596)	2023-09-26 07:34:52 +02:00
Enrico Zimuel	bb59a4f8d6	Fixed conf test with isinstance	2023-08-22 13:23:23 +02:00
Valeriy Khakhutskyy	f38de0ed05	Fix failing unit tests (#558) I updated the tree serialization format for the new scikit learn versions. I also updated the minimum requirement of scikit learn to 1.3 to ensure compatibility. Fixes #555	2023-07-10 15:15:58 +02:00
David Kyle	32ab988eb6	Tolerate different model output formats when measuring embedding size (#535) Only add the embedding_size config option if the target Elasticsearch cluster version supports it	2023-05-25 12:25:31 -05:00
David Kyle	1e6f48f8f4	Generate valid NLP model id from file path (#541) The eland_import_hub_model script supports uploading a local file where the --hub-model-id argument is a file path. If the --es-model-id option is not used, the model ID is generated from the hub model ID; when that is a file path, the path must be converted to a valid Elasticsearch model ID.	2023-05-22 15:37:36 +01:00
David Kyle	36bbbe0bdb	Upgrade torch to 1.13.1 and check the cluster version before uploading an NLP model. (#522) PyTorch models traced in version 1.13 of PyTorch cannot be evaluated in version 1.9 or earlier. With this upgrade Eland becomes incompatible with pre-8.7 Elasticsearch and will refuse to upload a model to such a cluster. In this scenario either upgrade Elasticsearch or use an earlier version of Eland.	2023-05-19 16:29:38 +01:00
Seth Michael Larson	f7ea3bd476	Add a compatibility layer for Elasticsearch server 8.5.0 field_caps API	2023-05-02 15:40:20 -05:00
David Kyle	50d301f7cb	Set embedding_size config parameter for Text Embedding models (#532)	2023-04-25 11:41:14 +01:00
David Kyle	7f4687c791	[ML] Text expansion model config support (#520)	2023-03-08 15:40:14 +00:00
Benjamin Trent	d5578637cb	Choose text_embedding from auto when task type is unknown but it's a sentence-transformers model (#516) Closes https://github.com/elastic/eland/issues/514	2023-02-09 12:50:30 -05:00
Valeriy Khakhutskyy	0576114a1d	[ML] Export ML model as sklearn Pipeline (#509) Closes #503 Note: I also had to fix the Sphinx version to 5.3.0 since, starting from 6.0, Sphinx suffers from a TypeError bug, which causes a CI failure.	2023-02-01 16:17:06 +01:00
Valeriy Khakhutskyy	2ea96322b3	Update to latest ES versions and fix unit tests (#512) Update the test matrix to the latest Elasticsearch versions and fix the broken unit tests on the CI.	2023-01-31 20:55:29 +01:00
Benjamin Trent	8892f4fd64	[ML] adds new auto task type that attempts to automatically determine NLP task type from model config (#475) For many model types, we don't need to require the task requested. We can infer the task type based on the model configuration and architecture. This commit makes the `task-type` parameter optional for the model upload script and adds logic for auto-detecting the task type based on the 🤗 model.	2022-06-23 08:32:23 -04:00
Benjamin Trent	fa30246937	[ML] fixes decision tree classifier upload to account for probabilities (#465) This switches our sklearn.DecisionTreeClassifier serialization logic to account for multi-valued leaves in the tree. The key difference between our inference and DecisionTreeClassifier is that we run a softmax over the leaf where sklearn simply normalizes the results. This means that the "probabilities" we return will differ from sklearn's.	2022-05-17 08:11:20 -04:00
Benjamin Trent	650e02d16e	[ML] improve general pytorch model import and add tests (#463) This improves the user consumed functions and classes for PyTorch NLP model upload to Elasticsearch. Previously it was difficult to wrap your own module for uploading to Elasticsearch. This commit splits some classes out, adds new ones, and adds tests showing how to wrap some simple modules.	2022-05-05 10:50:53 -04:00
Benjamin Trent	afe08f8107	[ML] Improve NLP model import by using nicely defined types (#459) This adds some more definite types for our NLP tasks and tokenization configurations. This is the first step in allowing users to more easily import their own transformer models via something other than hugging face.	2022-05-03 15:19:03 -04:00
P. Sai Vinay	76a52b7947	Add support for eland.Series.unique()	2022-03-31 08:33:15 -05:00
Ashton Sidhu	e3bff8a623	Add option to disable schema enforcement for `pandas_to_eland`	2022-01-14 07:35:58 -06:00
Benjamin Trent	72856e2c3f	[ML] Add support for MPNet PyTorch models	2022-01-10 11:21:30 -06:00
Ashton Sidhu	64daa07a65	Using the 'date' field for datetime64+timezone columns	2022-01-04 22:03:49 -06:00
Florian Winkler	3db93cd789	Allow using datetime types in filters	2022-01-04 14:46:18 -06:00
Seth Michael Larson	ffe7c792dc	Update Notebook examples for 8.0	2021-12-15 16:01:32 -06:00
Seth Michael Larson	cd0897f5d7	Add a warning when connecting to incompatible Elasticsearch versions	2021-12-15 14:08:20 -06:00
Seth Michael Larson	109387184a	Support the v8.0 Elasticsearch client	2021-12-09 15:01:26 -06:00
Benjamin Trent	a3b0907c5b	[ML] Add inference results tests for PyTorch transformer models	2021-11-10 06:50:10 -06:00
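
Commit fa30246937 above notes that Elasticsearch inference runs a softmax over a decision tree leaf, whereas sklearn simply normalizes the leaf values. The following is a minimal sketch of why the two give different "probabilities". The leaf values and both helper functions are hypothetical illustrations, not eland or Elasticsearch code, and the log does not say exactly which values are stored in the leaf.

```python
import math


def normalize(values):
    """sklearn-style: divide each leaf value by the sum of the leaf."""
    total = sum(values)
    return [v / total for v in values]


def softmax(values):
    """Softmax over the same leaf values, as the commit message describes."""
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical two-class leaf (assumed values, for illustration only).
leaf = [3.0, 1.0]

normalized = normalize(leaf)
softmaxed = softmax(leaf)

print("normalized:", normalized)
print("softmax:   ", softmaxed)
```

Both outputs sum to 1 and rank the classes in the same order, so the predicted class is unchanged here; only the reported probability values differ.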

67 Commits