eland

mirror of https://github.com/elastic/eland.git synced 2025-07-11 00:02:14 +08:00

Author	SHA1	Message	Date
Seth Michael Larson	c97e69410d	Release v8.3.0 v8.3.0	2022-07-11 13:14:13 -05:00
David Kyle	0eb36faa5b	Restrict PyTorch version not to be more advanced than that used in Elasticsearch (#479) Elasticsearch uses v1.11 of PyTorch. Models created with the latest PyTorch release (v1.12) are not compatible with v1.11. This pins the PyTorch version to 1.11 to prevent the incompatibility. The version of the Elasticsearch Python client is now required to be >= Eland. All users of Eland for importing NLP models should upgrade.	2022-07-07 14:56:42 +01:00
Benjamin Trent	947d4d22a9	Update python example (#477)	2022-06-28 13:01:49 -04:00
David Kyle	23706e05b8	Add more exclusions to the dockerignore file	2022-06-28 10:34:02 -05:00
Benjamin Trent	8892f4fd64	[ML] adds new auto task type that attempts to automatically determine NLP task type from model config (#475) For many model types, we don't need to require the task requested. We can infer the task type based on the model configuration and architecture. This commit makes the `task-type` parameter optional for the model upload script and adds logic for auto-detecting the task type based on the 🤗 model.	2022-06-23 08:32:23 -04:00
David Kyle	8448b3ba4e	Bump minimum PyTorch version to 1.11	2022-06-21 07:43:43 -05:00
David Kyle	081c8efaa0	Freeze the traced PyTorch model	2022-06-21 07:43:18 -05:00
Benjamin Trent	ec041ffdfd	[ML] ensure quantization is applied (#472)	2022-06-15 09:23:24 -04:00
Lisa Cawley	07af00c741	[DOCS] Include missing attributes (#468) Co-authored-by: Seth Michael Larson <seth.larson@elastic.co>	2022-05-31 15:50:11 -07:00
Seth Michael Larson	bbe7a70cb9	Also pin traitlets	2022-05-31 14:28:36 -07:00
Seth Michael Larson	14821a8b09	Remove 'numpydoc' to stop reformatting	2022-05-31 14:28:36 -07:00
Seth Michael Larson	673065ee42	Stop explicitly pulling master	2022-05-31 14:28:36 -07:00
Lisa Cawley	845c055d7c	[DOCS] Adds question_answering task type for eland_import_hub_model	2022-05-31 14:37:51 -05:00
Nigel Small	a4838f4d22	Ignore type checking for `agg_value`	2022-05-31 09:23:15 -05:00
Lisa Cawley	09dd56c399	Add authentication methods for import model script (#466)	2022-05-18 07:44:37 -07:00
Benjamin Trent	fa30246937	[ML] fixes decision tree classifier upload to account for probabilities (#465) This switches our sklearn.DecisionTreeClassifier serialization logic to account for multi-valued leaves in the tree. The key difference between our inference and DecisionTreeClassifier is that we run a softmax over the leaf, whereas sklearn simply normalizes the results. This means that the "probabilities" we return will differ from sklearn's.	2022-05-17 08:11:20 -04:00
Seth Michael Larson	5bbb8e484a	Release 8.2.0 v8.2.0	2022-05-11 06:38:21 -05:00
Benjamin Trent	650e02d16e	[ML] improve general pytorch model import and add tests (#463) This improves the user-facing functions and classes for PyTorch NLP model upload to Elasticsearch. Previously it was difficult to wrap your own module for uploading to Elasticsearch. This commit splits some classes out, adds new ones, and adds tests showing how to wrap some simple modules.	2022-05-05 10:50:53 -04:00
Benjamin Trent	70fadc9986	[ML] add support for question_answering NLP tasks (#457) Adds support for `question_answering` NLP models within the pytorch model uploader. Related: https://github.com/elastic/elasticsearch/pull/85958	2022-05-04 13:15:33 -04:00
Benjamin Trent	afe08f8107	[ML] Improve NLP model import by using nicely defined types (#459) This adds some more definite types for our NLP tasks and tokenization configurations. This is the first step in allowing users to more easily import their own transformer models via something other than Hugging Face.	2022-05-03 15:19:03 -04:00
David Olaru	3255f55d71	Fix `--es-api-key` argument help text	2022-04-27 15:48:22 -05:00
David Olaru	492bb9683a	Add support for Cloud ID to hub model import script The Cloud ID simplifies sending data to a cluster on Elastic Cloud. With this change, the user will have the option to specify a Cloud ID using the `--cloud-id` argument as an alternative to an Elasticsearch URL (`--url` argument). `--cloud-id` and `--url` are mutually exclusive arguments.	2022-04-27 15:48:22 -05:00
David Olaru	fe3422100c	Hub model import script improvements (#461) Changes: Better logging: Switched from `print` statements to `logging` for a cleaner and more informative output; timestamps and log level are shown. The logging is now a bit more verbose, but it will help users to better understand what the script is doing. Add support for ES authentication using username/password or API key: Instead of being limited to passing credentials in the URL, there are now 2 additional methods: username/password using `--es-username` and `--es-password`, and API key using `--es-api-key`. Credentials can also be specified as environment variables with `ES_USERNAME`/`ES_PASSWORD` or `ES_API_KEY`. Graceful handling of missing PyTorch requirements: In order to use the `eland_import_hub_model` script, PyTorch extras are required to be installed. If the user does not have the required packages installed, a helpful message is logged with a hint to install `eland[pytorch]` with `pip`. Graceful handling of already existing trained model: If a trained model with the same ID as the one we're trying to import already exists, and `--clear-previous` was not specified, we now log a clearer message about why the script can't proceed along with a hint to use the `--clear-previous` flag. Prior to this change, we were letting the API exception seep through and the user was faced with a stack trace. `tqdm` added to main dependencies: If the user doesn't have `eland[pytorch]` extras installed, the first module to be reported as missing is `tqdm`. Since this module is [used in eland codebase](`8294224e34/eland/ml/pytorch/_pytorch_model.py (L24)`) directly, it makes sense to me to have it as part of the main set of requirements. Nit: Set tqdm unit to `parts` in `_pytorch_model.put_model`: The default unit is `it`, but `parts` better describes what the progress bar is tracking, namely uploading trained model definition parts.	2022-04-27 15:13:58 +01:00
David Olaru	b5ea1cf228	Align dependencies between requirement files and setup.py (#460)	2022-04-27 07:14:49 -05:00
Benjamin Trent	8294224e34	[ML] Fix XGBoost model import for xgboost>=1.6	2022-04-20 09:20:50 -05:00
Seth Michael Larson	cb839a9ac9	Release 8.1.0 v8.1.0	2022-03-31 17:12:26 -05:00
P. Sai Vinay	76a52b7947	Add support for eland.Series.unique()	2022-03-31 08:33:15 -05:00
Benjamin Trent	15a3007288	[ML] add roberta bart transformer upload support (#443) Related to: https://github.com/elastic/elasticsearch/pull/84777 This allows BART and RoBERTa models to be uploaded to Elasticsearch for our currently defined NLP tasks.	2022-03-14 12:26:12 -04:00
David Kyle	5678525b15	Fix mypy type errors for elasticsearch-python v8.0.0	2022-03-08 17:50:39 -06:00
David Kyle	5c5e5af54d	Add --ca-certs and --insecure option for configuring TLS	2022-03-08 15:44:13 -06:00
Seth Michael Larson	abd05df50b	Release 8.0.0 v8.0.0	2022-02-10 14:29:54 -06:00
Ashton Sidhu	e3bff8a623	Add option to disable schema enforcement for `pandas_to_eland`	2022-01-14 07:35:58 -06:00
István Zoltán Szabó	9206941659	[DOCS] Adds NLP with PyTorch section to ML-related page in Eland docs	2022-01-11 09:08:00 -06:00
Benjamin Trent	72856e2c3f	[ML] Add support for MPNet PyTorch models	2022-01-10 11:21:30 -06:00
Ashton Sidhu	64daa07a65	Using the 'date' field for datetime64+timezone columns	2022-01-04 22:03:49 -06:00
Florian Winkler	3db93cd789	Allow using datetime types in filters	2022-01-04 14:46:18 -06:00
Seth Michael Larson	c14bc24032	Release 8.0.0-beta1 v8.0.0b1	2021-12-16 07:42:38 -06:00
Seth Michael Larson	ffe7c792dc	Update Notebook examples for 8.0	2021-12-15 16:01:32 -06:00
Seth Michael Larson	cd0897f5d7	Add a warning when connecting to incompatible Elasticsearch versions	2021-12-15 14:08:20 -06:00
Seth Michael Larson	109387184a	Support the v8.0 Elasticsearch client	2021-12-09 15:01:26 -06:00
Josh Devins	1ffbe002c4	Upgrade PyTorch dependencies to latest In preparation for an 8.0 release, this updates PyTorch NLP dependencies to more recent and latest minor versions. Amongst other things, this introduces a fix from transformers that is helpful for text embedding tasks with certain DPR models. See: https://github.com/huggingface/transformers/issues/13670 Co-authored-by: Seth Michael Larson <seth.larson@elastic.co>	2021-12-06 09:05:54 -06:00
Seth Michael Larson	e6bb917d83	Add quotes to versions in test-matrix.yml	2021-12-03 09:37:37 -06:00
Seth Michael Larson	4e489de424	Bump version to 8.0.0	2021-12-02 08:41:11 -06:00
Seth Michael Larson	f98ebd4c29	Update Jenkins jobs for 8.x and 7.x	2021-12-01 14:01:48 -06:00
Josh Devins	5bc1a824a7	Add PyTorch modules to noxfile We added the `pytorch` module which is type checked but was not in the noxfile as such. This change also addresses type errors that arose after adding type checking.	2021-11-29 08:03:25 -08:00
Josh Devins	7209f61773	Adds max_length padding to transformer tracing (#411) The padding parameter needs to be set on the tokenization call and not in the constructor. Furthermore, the True value will only pad to the largest input in a batch; however, we don't trace with batches, so this value had no effect. The proper place to pass this parameter is in the tokenization call itself and the proper value to use is "max_length", which will pad the input to the maximum input size specified by the model. Although we measure no functional or performance impact of this setting, it has been suggested that this is a best practice. See: https://huggingface.co/transformers/serialization.html#dummy-inputs-and-standard-lengths	2021-11-11 13:18:55 +01:00
Benjamin Trent	a3b0907c5b	[ML] Add inference results tests for PyTorch transformer models	2021-11-10 06:50:10 -06:00
Seth Michael Larson	66e3e4eaad	Set 'script.max_compilations_rate: use-context'	2021-11-02 10:09:25 -04:00
Josh Devins	1e5b475bee	Adds NLP with PyTorch basic example to README The Machine Learning section now has two sub-sections — one for traditional regression/classification and the other for NLP with PyTorch. The examples show two ways to upload models from the Hugging Face model hub.	2021-11-02 08:00:33 -05:00
Josh Devins	df51f8af07	Document how to install transitive binary dependencies, add repo Dockerfile Co-authored-by: Seth Michael Larson <seth.larson@elastic.co>	2021-10-28 12:05:39 -05:00
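
Commit fa30246937 in the list above notes that Eland's inference runs a softmax over a decision tree leaf, whereas sklearn simply normalizes the leaf values. The sketch below illustrates only that arithmetic difference. It is not Eland's serialization code: the function names and the `leaf` values are hypothetical and chosen for illustration.

```python
import math


def normalize(values):
    """Divide each value by the total (the sklearn-style behaviour described in the commit)."""
    total = sum(values)
    return [v / total for v in values]


def softmax(values):
    """Exponentiate, then normalize; subtracting the max keeps exp() numerically stable."""
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical multi-valued leaf for a 3-class problem (assumed values).
leaf = [0.2, 0.3, 0.5]

normalized = normalize(leaf)
softmaxed = softmax(leaf)

print("normalized:", normalized)
print("softmax:   ", softmaxed)
```

Both results sum to 1 and rank the classes in the same order, but the softmax output is flatter than the normalized one, which is why the returned "probabilities" differ even when the predicted class does not.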

572 Commits