eland

mirror of https://github.com/elastic/eland.git synced 2025-07-11 00:02:14 +08:00

Author	SHA1	Message	Date
Enrico Zimuel	932092c0e5	Fixed test for mean using ES 8.9.0	2023-08-24 10:46:14 +02:00
Enrico Zimuel	08b7fac32b	Updated test to ES 8.9-SNAPSHOT	2023-08-23 13:53:15 +02:00
Enrico Zimuel	bb59a4f8d6	Fixed conf test with isinstance	2023-08-22 13:23:23 +02:00
Josh Devins	f26fb8a430	Simplify embedding model support and loading (#569 ) We were attempting to load SentenceTransformers by looking at the model prefix, however SentenceTransformers can also be loaded from other orgs in the model hub, as well as from local disk. This prefix checking failed in those two cases. To simplify the loading logic and deciding which wrapper to use, we’ve removed support for text_embedding tasks to load a plain Transformer. We now only support DPR embedding models and SentenceTransformer embedding models. If you try to load a plain Transformer model, it will be loaded by SentenceTransformers and a mean pooling layer will automatically be added by the SentenceTransformer library. Since we no longer automatically support non-DPR and non-SentenceTransformers, we should include somewhere example code for how to load a custom model without DPR or SentenceTransformers. See: https://github.com/UKPLab/sentence-transformers/blob/v2.2.2/sentence_transformers/SentenceTransformer.py#L801 Resolves #531	2023-07-31 18:18:46 +02:00
Fernando Briano	7ad1f430e4	[CI] Adds buildkite pull requests configuration (#570 )	2023-07-26 13:43:40 +01:00
Youhei Sakurai	4cf92fd9b7	Make eland_import_hub_model easier to find on Windows. (#559 )	2023-07-20 09:24:35 +01:00
Fernando Briano	664180d93d	[CI] Removes Jenkins .ci folder (#561 ) Continuing the migration to Buildkite.	2023-07-18 13:32:30 +01:00
Fernando Briano	2134c71ab4	Add Buildkite configuration (#515 ) * [CI] Adds Buildkite configuration * Removes GitHub Actions * Moves lint and docs tasks to Buildkite	2023-07-17 14:08:41 +01:00
Youhei Sakurai	b5bcba713d	Apply black to comply with the code style (#557 ) Relates https://github.com/elastic/eland/pull/552 Issue: with black 23.3.0 (compiled) on CPython 3.11.0, `python -m black --check --target-version=py38 bin\eland_import_hub_model` reported "would reformat bin\eland_import_hub_model" (1 file would be reformatted). Solution: `python -m black --target-version=py38 bin\eland_import_hub_model` reformatted the file (1 file reformatted).	2023-07-13 09:55:00 +02:00
Valeriy Khakhutskyy	77781b90ff	[ML] Update trained model inference endpoint (#556 ) Infer trained model deployment API has been deprecated, so I changed the code to use the new one.	2023-07-11 10:55:11 +02:00
Valeriy Khakhutskyy	f38de0ed05	Fix failing unit tests (#558 ) I updated the tree serialization format for the new scikit learn versions. I also updated the minimum requirement of scikit learn to 1.3 to ensure compatibility. Fixes #555	2023-07-10 15:15:58 +02:00
Youhei Sakurai	5ac8a053f0	Fix No module named 'torch' (#553 ) Do not import torch unless necessary	2023-07-07 09:11:11 +01:00
Youhei Sakurai	55967a7324	Minimize if main section (#554 ) For migration from scripts to console_scripts in setup.py, the current long if __name__ == "__main__": section is a blocker because console_scripts requires a function to be specified as the entry point. Move the logic into a main() function.	2023-07-05 10:49:16 +01:00
Dai Sugimori	bf3b092ed4	Add BertJapaneseTokenizer support with bert_ja tokenization configuration (#534 ) See elasticsearch#95546	2023-06-23 08:14:27 +01:00
Seth Michael Larson	5fd1221815	Fix autosummary directive by removing hack autosummaries	2023-06-15 10:50:19 -05:00
Seth Michael Larson	17c1c2e9c7	Switch to the 'Furo' Sphinx theme	2023-06-15 09:51:14 -05:00
Benjamin Trent	8b327f60b8	[ML] add ability to upload xlm-roberta tokenized models (#518 ) This allows XLMRoberta models to be uploaded to Elasticsearch. blocked by: elastic/elasticsearch#94089	2023-06-14 07:59:28 -04:00
David Kyle	68a22a8001	Default the optional es_version parameter (#545 )	2023-06-07 12:34:53 +01:00
Seth Michael Larson	afc7e41d6e	Update Dockerfile base image to use newer version	2023-06-02 14:20:01 -05:00
David Kyle	32ab988eb6	Tolerate different model output formats when measuring embedding size (#535 ) Only add the embedding_size config option if the target Elasticsearch cluster version supports it	2023-05-25 12:25:31 -05:00
David Kyle	7ca8376f68	Add Elasticsearch 8.8 snapshot to test matrix (#543 ) And increase the test ES node heap size to prevent circuit breaker exceptions due to better memory accounting in elastic/elasticsearch#89437.	2023-05-24 11:59:41 +01:00
István Zoltán Szabó	e0c08e42a0	[DOCS] Adds instructions on model install in air-gapped env (#542 ) Co-authored-by: David Kyle <david.kyle@elastic.co>	2023-05-24 12:53:04 +02:00
David Kyle	1e6f48f8f4	Generate valid NLP model id from file path (#541 ) The eland_import_hub_model script supports uploading a local file where the --hub-model-id argument is a file path. If the --es-model-id option is not used the model Id is generated from the hub model id and when that is a file path the path must be converted to a valid elasticsearch model id.	2023-05-22 15:37:36 +01:00
David Kyle	7820a31256	Limit NumPy to a range of versions and note why (#540 )	2023-05-22 10:47:06 +01:00
David Kyle	36bbbe0bdb	Upgrade torch to 1.13.1 and check the cluster version before uploading an NLP model. (#522 ) PyTorch models traced in version 1.13 of PyTorch cannot be evaluated in version 1.9 or earlier. With this upgrade Eland becomes incompatible with pre-8.7 Elasticsearch and will refuse to upload a model to the cluster. In this scenario either upgrade Elasticsearch or use an earlier version of Eland.	2023-05-19 16:29:38 +01:00
David Kyle	b507bb6d6c	Restrict NumPy and Pandas versions (#539 ) Shap is incompatible with NumPy 1.24 due to a deprecated usage becoming an error. There is no fix in Shap yet, so an earlier version of NumPy must be used. Pandas 2.0 was recently released; we will continue to use the latest 1.5 release to avoid any incompatibilities.	2023-05-19 16:04:33 +01:00
Seth Michael Larson	f7ea3bd476	Add a compatibility layer for Elasticsearch server 8.5.0 field_caps API	2023-05-02 15:40:20 -05:00
Seth Michael Larson	ca0cbe94ea	Fix readthedocs with Python 3.8	2023-05-02 12:21:57 -05:00
David Kyle	50d301f7cb	Set embedding_size config parameter for Text Embedding models (#532 )	2023-04-25 11:41:14 +01:00
David Kyle	940f2a9bad	[NLP] Add support for the pass_through task #526	2023-04-06 15:43:00 +01:00
David Kyle	8e0d897171	[NLP] Prevent TypeError with None check (#525 )	2023-04-03 14:56:19 +01:00
David Roberts	cebee6406f	Include pitfall of `--start` in the README (#506 ) Users who follow the Eland README as a guide to importing models can easily end up seeing inexplicably poor performance due to unknowingly running the model with one allocation and one thread per allocation. This change spells out the effect of `--start` and links to alternatives that allow better use of available hardware. Co-authored-by: David Kyle <david.kyle@elastic.co>	2023-03-30 20:28:48 +01:00
Seth Michael Larson	44e04b4905	Release v8.7.0 v8.7.0	2023-03-30 14:00:02 -05:00
David Kyle	7f4687c791	[ML] Text expansion model config support (#520 )	2023-03-08 15:40:14 +00:00
Benjamin Trent	d5578637cb	Choose text_embedding from auto when task type is unknown but it's a sentence-transformers model (#516 ) closes https://github.com/elastic/eland/issues/514	2023-02-09 12:50:30 -05:00
Valeriy Khakhutskyy	0576114a1d	[ML] Export ML model as sklearn Pipeline (#509 ) Closes #503 Note: I also had to fix the Sphinx version to 5.3.0 since, starting from 6.0, Sphinx suffers from a TypeError bug, which causes a CI failure.	2023-02-01 16:17:06 +01:00
Valeriy Khakhutskyy	2ea96322b3	Update to latest ES versions and fix unit tests (#512 ) Update the test matrix to the latest Elasticsearch versions and fix the broken unit tests on the CI.	2023-01-31 20:55:29 +01:00
David Kyle	c55516f376	Fixes for two type hinting issues	2023-01-04 09:53:09 -06:00
David Kyle	211cc2c83f	Handle OSError for missing LightGBM dependency Co-authored-by: Seth Michael Larson <seth.larson@elastic.co>	2022-11-02 11:32:27 -05:00
Benjamin Trent	82e34dbddb	Minor formatting fix for ML docs	2022-10-20 09:47:55 -05:00
Benjamin Trent	a8c8726634	[ML] add text_similarity task support (#486 ) Adds text_similarity task support. This is a cross-encoder transformer task where both sequences are given to the transformer at once. According to 🤗 (or at least how the cross-encoder models are concerned) this is a sequence classification task with just one classification "label". But really, it isn't labeled at all and is more akin to a regression model. related: elastic/elasticsearch#88439	2022-08-01 09:04:34 -04:00
Benjamin Trent	11ea68a443	Add docker steps for eland model upload (#489 )	2022-07-21 15:27:19 -04:00
István Zoltán Szabó	fbb01e5698	[DOCS] Adds important note about PyTorch version compatibility. (#487 )	2022-07-13 12:41:35 +02:00
Seth Michael Larson	c97e69410d	Release v8.3.0 v8.3.0	2022-07-11 13:14:13 -05:00
David Kyle	0eb36faa5b	Restrict PyTorch version not to be more advanced than that used in Elasticsearch (#479 ) Elasticsearch uses v1.11 of PyTorch. Models created with the latest PyTorch release (v1.12) are not compatible with v1.11. This pins the PyTorch version to 1.11 to prevent the incompatibility. The version of the Elasticsearch Python client is now required to be >= Eland. All users of Eland for importing NLP models should upgrade.	2022-07-07 14:56:42 +01:00
Benjamin Trent	947d4d22a9	Update python example (#477 )	2022-06-28 13:01:49 -04:00
David Kyle	23706e05b8	Add more exclusions to the dockerignore file	2022-06-28 10:34:02 -05:00
Benjamin Trent	8892f4fd64	[ML] adds new auto task type that attempts to automatically determine NLP task type from model config (#475 ) For many model types, we don't need to require the task requested. We can infer the task type based on the model configuration and architecture. This commit makes the `task-type` parameter optional for the model upload script and adds logic for auto-detecting the task type based on the 🤗 model.	2022-06-23 08:32:23 -04:00
David Kyle	8448b3ba4e	Bump minimum PyTorch version to 1.11	2022-06-21 07:43:43 -05:00
David Kyle	081c8efaa0	Freeze the traced PyTorch model	2022-06-21 07:43:18 -05:00
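Two entries above (#541 and #522) describe logic without showing it: deriving a valid Elasticsearch model ID when --hub-model-id is a local file path, and refusing to upload a model traced with PyTorch 1.13 to a cluster older than 8.7. The following is a minimal sketch of both, assuming the model ID rules from the Elasticsearch trained models docs (lowercase alphanumerics, hyphens and underscores, alphanumeric at both ends, at most 64 characters). The function names `model_id_from_path`, `parse_version` and `check_cluster_version` are hypothetical; this is not Eland's actual implementation.

```python
import re
from typing import Tuple

# Assumption: Elasticsearch model IDs allow only [a-z0-9_-], must start and
# end with an alphanumeric character, and are limited to 64 characters.
MAX_MODEL_ID_LEN = 64

# From the commit log: models traced with PyTorch 1.13 need Elasticsearch 8.7+.
MIN_ES_VERSION_FOR_TORCH_1_13 = (8, 7, 0)


def model_id_from_path(hub_model_id: str) -> str:
    """Hypothetical helper: derive an Elasticsearch model ID from a hub ID or path."""
    # Replace path separators and whitespace with "__", so "org/model"
    # becomes "org__model".
    model_id = re.sub(r"[\s\\/]+", "__", hub_model_id).lower()
    # Drop any remaining characters Elasticsearch would reject (".", ":", ...).
    model_id = re.sub(r"[^a-z0-9_\-]", "", model_id)
    # Keep the tail: for long file paths that is where the model name lives.
    model_id = model_id[-MAX_MODEL_ID_LEN:]
    # The ID must start and end with an alphanumeric character.
    return model_id.strip("_-")


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse '8.7.0' or '8.9.0-SNAPSHOT' into a comparable tuple like (8, 9, 0)."""
    core = version.split("-", 1)[0]
    return tuple(int(part) for part in core.split(".")[:3])


def check_cluster_version(es_version: str) -> None:
    """Hypothetical guard: refuse uploads to clusters that cannot run the model."""
    if parse_version(es_version) < MIN_ES_VERSION_FOR_TORCH_1_13:
        raise RuntimeError(
            f"Elasticsearch {es_version} cannot evaluate models traced with "
            "PyTorch 1.13; upgrade Elasticsearch or use an earlier Eland."
        )
```

For example, `model_id_from_path("./models/my-model")` yields `models__my-model`. Keeping the last 64 characters rather than the first is a design choice for long paths, where the leading directories carry little information.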

565 Commits