eland

mirror of https://github.com/elastic/eland.git synced 2025-07-11 00:02:14 +08:00

Author	SHA1	Message	Date
Dai Sugimori	bf3b092ed4	Add BertJapaneseTokenizer support with bert_ja tokenization configuration (#534) See elasticsearch#95546	2023-06-23 08:14:27 +01:00
Seth Michael Larson	5fd1221815	Fix autosummary directive by removing hack autosummaries	2023-06-15 10:50:19 -05:00
Seth Michael Larson	17c1c2e9c7	Switch to the 'Furo' Sphinx theme	2023-06-15 09:51:14 -05:00
Benjamin Trent	8b327f60b8	[ML] add ability to upload xlm-roberta tokenized models (#518) This allows XLMRoberta models to be uploaded to Elasticsearch. blocked by: elastic/elasticsearch#94089	2023-06-14 07:59:28 -04:00
David Kyle	68a22a8001	Default the optional es_version parameter (#545)	2023-06-07 12:34:53 +01:00
Seth Michael Larson	afc7e41d6e	Update Dockerfile base image to use newer version	2023-06-02 14:20:01 -05:00
David Kyle	32ab988eb6	Tolerate different model output formats when measuring embedding size (#535) Only add the embedding_size config option if the target Elasticsearch cluster version supports it	2023-05-25 12:25:31 -05:00
David Kyle	7ca8376f68	Add Elasticsearch 8.8 snapshot to test matrix (#543) And increase the test ES node heap size to prevent circuit breaker exceptions due to better memory accounting in elastic/elasticsearch#89437.	2023-05-24 11:59:41 +01:00
István Zoltán Szabó	e0c08e42a0	[DOCS] Adds instructions on model install in air-gapped env (#542) Co-authored-by: David Kyle <david.kyle@elastic.co>	2023-05-24 12:53:04 +02:00
David Kyle	1e6f48f8f4	Generate valid NLP model id from file path (#541) The eland_import_hub_model script supports uploading a local file where the --hub-model-id argument is a file path. If the --es-model-id option is not used, the model ID is generated from the hub model ID, and when that is a file path, the path must be converted to a valid Elasticsearch model ID.	2023-05-22 15:37:36 +01:00
David Kyle	7820a31256	Limit NumPy to a range of versions and note why (#540)	2023-05-22 10:47:06 +01:00
David Kyle	36bbbe0bdb	Upgrade torch to 1.13.1 and check the cluster version before uploading an NLP model. (#522) PyTorch models traced in version 1.13 of PyTorch cannot be evaluated in version 1.9 or earlier. With this upgrade Eland becomes incompatible with pre-8.7 Elasticsearch and will refuse to upload a model to the cluster. In this scenario either upgrade Elasticsearch or use an earlier version of Eland.	2023-05-19 16:29:38 +01:00
David Kyle	b507bb6d6c	Restrict NumPy and Pandas versions (#539) Shap is incompatible with NumPy 1.24 due to a deprecated usage becoming an error. There is no fix in Shap yet so an earlier version of NumPy must be used. Pandas 2.0 was recently released; we will continue to use the latest 1.5 release to avoid any incompatibilities.	2023-05-19 16:04:33 +01:00
Seth Michael Larson	f7ea3bd476	Add a compatibility layer for Elasticsearch server 8.5.0 field_caps API	2023-05-02 15:40:20 -05:00
Seth Michael Larson	ca0cbe94ea	Fix readthedocs with Python 3.8	2023-05-02 12:21:57 -05:00
David Kyle	50d301f7cb	Set embedding_size config parameter for Text Embedding models (#532)	2023-04-25 11:41:14 +01:00
David Kyle	940f2a9bad	[NLP] Add support for the pass_through task #526	2023-04-06 15:43:00 +01:00
David Kyle	8e0d897171	[NLP] Prevent TypeError with None check (#525)	2023-04-03 14:56:19 +01:00
David Roberts	cebee6406f	Include pitfall of `--start` in the README (#506) Users who follow the Eland README as a guide to importing models can easily end up seeing inexplicably poor performance due to unknowingly running the model with one allocation and one thread per allocation. This change spells out the effect of `--start` and links to alternatives that allow better use of available hardware. Co-authored-by: David Kyle <david.kyle@elastic.co>	2023-03-30 20:28:48 +01:00
Seth Michael Larson	44e04b4905	Release v8.7.0 v8.7.0	2023-03-30 14:00:02 -05:00
David Kyle	7f4687c791	[ML] Text expansion model config support (#520)	2023-03-08 15:40:14 +00:00
Benjamin Trent	d5578637cb	Choose text_embedding from auto when task type is unknown but it's a sentence-transformers model (#516) closes https://github.com/elastic/eland/issues/514	2023-02-09 12:50:30 -05:00
Valeriy Khakhutskyy	0576114a1d	[ML] Export ML model as sklearn Pipeline (#509) Closes #503 Note: I also had to fix the Sphinx version to 5.3.0 since, starting from 6.0, Sphinx suffers from a TypeError bug, which causes a CI failure.	2023-02-01 16:17:06 +01:00
Valeriy Khakhutskyy	2ea96322b3	Update to latest ES versions and fix unit tests (#512) Update the test matrix to the latest Elasticsearch versions and fix the broken unit tests on the CI.	2023-01-31 20:55:29 +01:00
David Kyle	c55516f376	Fixes for two type hinting issues	2023-01-04 09:53:09 -06:00
David Kyle	211cc2c83f	Handle OSError for missing LightGBM dependency Co-authored-by: Seth Michael Larson <seth.larson@elastic.co>	2022-11-02 11:32:27 -05:00
Benjamin Trent	82e34dbddb	Minor formatting fix for ML docs	2022-10-20 09:47:55 -05:00
Benjamin Trent	a8c8726634	[ML] add text_similarity task support (#486) Adds text_similarity task support. This is a cross-encoder transformer task where both sequences are given to the transformer at once. According to 🤗 (or at least as far as the cross-encoder models are concerned) this is a sequence classification task with just one classification "label". But really, it isn't labeled at all and is more akin to a regression model. related: elastic/elasticsearch#88439	2022-08-01 09:04:34 -04:00
Benjamin Trent	11ea68a443	Add docker steps for eland model upload (#489)	2022-07-21 15:27:19 -04:00
István Zoltán Szabó	fbb01e5698	[DOCS] Adds important note about PyTorch version compatibility. (#487)	2022-07-13 12:41:35 +02:00
Seth Michael Larson	c97e69410d	Release v8.3.0 v8.3.0	2022-07-11 13:14:13 -05:00
David Kyle	0eb36faa5b	Restrict PyTorch version not to be more advanced than that used in Elasticsearch (#479) Elasticsearch uses v1.11 of PyTorch. Models created with the latest PyTorch release (v1.12) are not compatible with v1.11. This pins the PyTorch version to 1.11 to prevent the incompatibility. The version of the Elasticsearch Python client is now required to be >= Eland. All users of Eland for importing NLP models should upgrade.	2022-07-07 14:56:42 +01:00
Benjamin Trent	947d4d22a9	Update python example (#477)	2022-06-28 13:01:49 -04:00
David Kyle	23706e05b8	Add more exclusions to the dockerignore file	2022-06-28 10:34:02 -05:00
Benjamin Trent	8892f4fd64	[ML] adds new auto task type that attempts to automatically determine NLP task type from model config (#475) For many model types, we don't need to require that the task be specified. We can infer the task type based on the model configuration and architecture. This commit makes the `task-type` parameter optional for the model upload script and adds logic for auto-detecting the task type based on the 🤗 model.	2022-06-23 08:32:23 -04:00
David Kyle	8448b3ba4e	Bump minimum PyTorch version to 1.11	2022-06-21 07:43:43 -05:00
David Kyle	081c8efaa0	Freeze the traced PyTorch model	2022-06-21 07:43:18 -05:00
Benjamin Trent	ec041ffdfd	[ML] ensure quantization is applied (#472)	2022-06-15 09:23:24 -04:00
Lisa Cawley	07af00c741	[DOCS] Include missing attributes (#468) Co-authored-by: Seth Michael Larson <seth.larson@elastic.co>	2022-05-31 15:50:11 -07:00
Seth Michael Larson	bbe7a70cb9	Also pin traitlets	2022-05-31 14:28:36 -07:00
Seth Michael Larson	14821a8b09	Remove 'numpydoc' to stop reformatting	2022-05-31 14:28:36 -07:00
Seth Michael Larson	673065ee42	Stop explicitly pulling master	2022-05-31 14:28:36 -07:00
Lisa Cawley	845c055d7c	[DOCS] Adds question_answering task type for eland_import_hub_model	2022-05-31 14:37:51 -05:00
Nigel Small	a4838f4d22	Ignore type checking for `agg_value`	2022-05-31 09:23:15 -05:00
Lisa Cawley	09dd56c399	Add authentication methods for import model script (#466)	2022-05-18 07:44:37 -07:00
Benjamin Trent	fa30246937	[ML] fixes decision tree classifier upload to account for probabilities (#465) This switches our sklearn.DecisionTreeClassifier serialization logic to account for multi-valued leaves in the tree. The key difference between our inference and DecisionTreeClassifier is that we run a softmax over the leaf where sklearn simply normalizes the results. This means that the "probabilities" we return will differ from sklearn's.	2022-05-17 08:11:20 -04:00
Seth Michael Larson	5bbb8e484a	Release 8.2.0 v8.2.0	2022-05-11 06:38:21 -05:00
Benjamin Trent	650e02d16e	[ML] improve general pytorch model import and add tests (#463) This improves the user-facing functions and classes for PyTorch NLP model upload to Elasticsearch. Previously it was difficult to wrap your own module for uploading to Elasticsearch. This commit splits some classes out, adds new ones, and adds tests showing how to wrap some simple modules.	2022-05-05 10:50:53 -04:00
Benjamin Trent	70fadc9986	[ML] add support for question_answering NLP tasks (#457) Adds support for `question_answering` NLP models within the pytorch model uploader. Related: https://github.com/elastic/elasticsearch/pull/85958	2022-05-04 13:15:33 -04:00
Benjamin Trent	afe08f8107	[ML] Improve NLP model import by using nicely defined types (#459) This adds some more definite types for our NLP tasks and tokenization configurations. This is the first step in allowing users to more easily import their own transformer models via something other than Hugging Face.	2022-05-03 15:19:03 -04:00

602 Commits