572 Commits

Author SHA1 Message Date
Seth Michael Larson
c97e69410d
Release v8.3.0 v8.3.0 2022-07-11 13:14:13 -05:00
David Kyle
0eb36faa5b
Restrict PyTorch version not to be more advanced than that used in Elasticsearch (#479)
Elasticsearch uses v1.11 of PyTorch. Models created with the latest PyTorch
release (v1.12) are not compatible with v1.11. This pins the PyTorch version
to 1.11 to prevent the incompatibility. The Elasticsearch Python client is now
required to be at least the same version as Eland.

All users of Eland for importing NLP models should upgrade.
2022-07-07 14:56:42 +01:00
Benjamin Trent
947d4d22a9
Update Python example (#477) 2022-06-28 13:01:49 -04:00
David Kyle
23706e05b8
Add more exclusions to the dockerignore file 2022-06-28 10:34:02 -05:00
Benjamin Trent
8892f4fd64
[ML] adds new auto task type that attempts to automatically determine NLP task type from model config (#475)
For many model types, we don't need to require the user to specify the task. We can infer the task type from the model configuration and architecture.

This commit makes the `task-type` parameter optional for the model upload script and adds logic for auto-detecting the task type based on the 🤗 model.
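A minimal sketch of how such auto-detection could work, assuming the task is inferred only from the class names in a Hugging Face config's `architectures` list (for example `BertForTokenClassification`). The suffix-to-task mapping and the `infer_task_type` helper are hypothetical illustrations, not Eland's actual implementation. When no single task matches, the sketch returns `None` so the caller can fall back to requiring an explicit task type.

```python
from typing import Dict, List, Optional, Set

# Hypothetical mapping from Hugging Face architecture-name suffixes to
# task type names; Eland's real detection logic may differ.
ARCHITECTURE_SUFFIX_TO_TASK: Dict[str, str] = {
    "ForMaskedLM": "fill_mask",
    "ForTokenClassification": "ner",
    "ForSequenceClassification": "text_classification",
    "ForQuestionAnswering": "question_answering",
}


def infer_task_type(architectures: Optional[List[str]]) -> Optional[str]:
    """Return a task type if exactly one can be inferred, else None."""
    if not architectures:
        return None
    candidates: Set[str] = set()
    for architecture in architectures:
        for suffix, task_type in ARCHITECTURE_SUFFIX_TO_TASK.items():
            if architecture.endswith(suffix):
                candidates.add(task_type)
    # Unknown or ambiguous architectures: let the caller ask for an explicit task type.
    if len(candidates) != 1:
        return None
    return candidates.pop()


print(infer_task_type(["BertForTokenClassification"]))  # ner
print(infer_task_type(["BertModel"]))  # None
```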
2022-06-23 08:32:23 -04:00
David Kyle
8448b3ba4e
Bump minimum PyTorch version to 1.11 2022-06-21 07:43:43 -05:00
David Kyle
081c8efaa0
Freeze the traced PyTorch model 2022-06-21 07:43:18 -05:00
Benjamin Trent
ec041ffdfd
[ML] ensure quantization is applied (#472) 2022-06-15 09:23:24 -04:00
Lisa Cawley
07af00c741
[DOCS] Include missing attributes (#468)
Co-authored-by: Seth Michael Larson <seth.larson@elastic.co>
2022-05-31 15:50:11 -07:00
Seth Michael Larson
bbe7a70cb9
Also pin traitlets 2022-05-31 14:28:36 -07:00
Seth Michael Larson
14821a8b09
Remove 'numpydoc' to stop reformatting 2022-05-31 14:28:36 -07:00
Seth Michael Larson
673065ee42
Stop explicitly pulling master 2022-05-31 14:28:36 -07:00
Lisa Cawley
845c055d7c
[DOCS] Adds question_answering task type for eland_import_hub_model 2022-05-31 14:37:51 -05:00
Nigel Small
a4838f4d22
Ignore type checking for agg_value 2022-05-31 09:23:15 -05:00
Lisa Cawley
09dd56c399
Add authentication methods for import model script (#466) 2022-05-18 07:44:37 -07:00
Benjamin Trent
fa30246937
[ML] fixes decision tree classifier upload to account for probabilities (#465)
This switches our sklearn.DecisionTreeClassifier serialization logic to account for multi-valued leaves in the tree.

The key difference between our inference and DecisionTreeClassifier is that we run a softmax over the leaf values, whereas sklearn simply normalizes them.

This means that the "probabilities" we return will differ from sklearn's.
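To illustrate why the numbers differ, here is a minimal sketch comparing simple normalization with a softmax over the same leaf. The leaf values are made up for illustration and the helper names are hypothetical; this is not the serialization code itself.

```python
import math
from typing import List


def normalize(leaf: List[float]) -> List[float]:
    """Divide each value by the total (sklearn-style probabilities)."""
    total = sum(leaf)
    return [value / total for value in leaf]


def softmax(leaf: List[float]) -> List[float]:
    """Exponentiate and normalize; subtracting the max keeps exp() stable."""
    largest = max(leaf)
    exps = [math.exp(value - largest) for value in leaf]
    total = sum(exps)
    return [e / total for e in exps]


leaf = [3.0, 1.0]  # illustrative per-class values stored in one leaf
print(normalize(leaf))  # [0.75, 0.25]
print([round(p, 3) for p in softmax(leaf)])  # [0.881, 0.119]
```

Both transformations preserve the ordering of non-negative leaf values, so the most probable class stays the same; only the reported probabilities change.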
2022-05-17 08:11:20 -04:00
Seth Michael Larson
5bbb8e484a
Release 8.2.0 v8.2.0 2022-05-11 06:38:21 -05:00
Benjamin Trent
650e02d16e
[ML] improve general pytorch model import and add tests (#463)
This improves the user-facing functions and classes for PyTorch NLP model upload to Elasticsearch.

Previously it was difficult to wrap your own module for uploading to Elasticsearch.

This commit splits some classes out, adds new ones, and adds tests showing how to wrap some simple modules.
2022-05-05 10:50:53 -04:00
Benjamin Trent
70fadc9986
[ML] add support for question_answering NLP tasks (#457)
Adds support for `question_answering` NLP models within the PyTorch model uploader.

Related: https://github.com/elastic/elasticsearch/pull/85958
2022-05-04 13:15:33 -04:00
Benjamin Trent
afe08f8107
[ML] Improve NLP model import by using nicely defined types (#459)
This adds some more definite types for our NLP tasks and tokenization configurations.

This is the first step in allowing users to more easily import their own transformer models via something other than Hugging Face.
2022-05-03 15:19:03 -04:00
David Olaru
3255f55d71
Fix --es-api-key argument help text 2022-04-27 15:48:22 -05:00
David Olaru
492bb9683a
Add support for Cloud ID to hub model import script
The Cloud ID simplifies sending data to a cluster on Elastic Cloud.

With this change, the user will have the option to specify a Cloud ID using the `--cloud-id` argument as an alternative to an Elasticsearch URL (`--url` argument).

`--cloud-id` and `--url` are mutually exclusive arguments.
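A mutually exclusive pair of arguments like this can be expressed with the standard library's `argparse`. This is a minimal sketch, not the script's actual parser: the `build_parser` helper is hypothetical, and making one of the two options required is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="eland_import_hub_model")
    # Assumption: exactly one way of locating the cluster must be supplied.
    location = parser.add_mutually_exclusive_group(required=True)
    location.add_argument("--url", help="Elasticsearch URL")
    location.add_argument("--cloud-id", help="Elastic Cloud ID")
    return parser


args = build_parser().parse_args(["--cloud-id", "my-deployment:abc123"])
print(args.cloud_id, args.url)  # my-deployment:abc123 None
```

Passing both `--url` and `--cloud-id` makes `argparse` print a usage error and exit with status 2.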
2022-04-27 15:48:22 -05:00
David Olaru
fe3422100c
Hub model import script improvements (#461)
## Changes 
### Better logging
Switched from `print` statements to `logging` for a cleaner and more informative output: timestamps and log levels are shown. The logging is now a bit more verbose, but it will help users better understand what the script is doing.

### Add support for ES authentication using username/password or API key
Instead of being limited to passing credentials in the URL, there are now 2 additional methods:
- username/password using `--es-username` and `--es-password`
- API key using `--es-api-key`

Credentials can also be specified as environment variables with `ES_USERNAME`/`ES_PASSWORD` or `ES_API_KEY`.
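A minimal sketch of how the command-line flags and the `ES_USERNAME`/`ES_PASSWORD`/`ES_API_KEY` environment variables might be combined into authentication options for the Python client. The `resolve_auth` helper is hypothetical, and its precedence rules (flags over environment variables, API key over username/password) are assumptions; `basic_auth` and `api_key` are keyword arguments accepted by the 8.x `Elasticsearch` client.

```python
import os
from typing import Any, Dict, Mapping, Optional


def resolve_auth(
    username: Optional[str] = None,
    password: Optional[str] = None,
    api_key: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> Dict[str, Any]:
    """Build client auth kwargs; explicit flags win over environment variables (assumed)."""
    username = username or env.get("ES_USERNAME")
    password = password or env.get("ES_PASSWORD")
    api_key = api_key or env.get("ES_API_KEY")
    # Assumption: an API key takes precedence over username/password.
    if api_key:
        return {"api_key": api_key}
    if username and password:
        return {"basic_auth": (username, password)}
    # No separate credentials: rely on anything embedded in the URL.
    return {}


print(resolve_auth(env={"ES_USERNAME": "elastic", "ES_PASSWORD": "changeme"}))
# {'basic_auth': ('elastic', 'changeme')}
```

The resulting dictionary could then be unpacked into the client constructor, e.g. `Elasticsearch(url, **resolve_auth(...))`.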

### Graceful handling of missing PyTorch requirements
In order to use the `eland_import_hub_model` script, PyTorch extras are required to be installed. If the user does not have the required packages installed, a helpful message is logged with a hint to install `eland[pytorch]` with `pip`.
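One way to implement such a check with only the standard library is to look up each required module with `importlib.util.find_spec` before importing anything. This is a sketch under assumptions: the module list, the `missing_modules`/`check_pytorch_extras` helpers, and the exact hint text are illustrative, not copied from the script.

```python
import importlib.util
import logging
from typing import Iterable, List

logger = logging.getLogger("eland_import_hub_model")

# Assumed subset of the modules provided by the eland[pytorch] extra.
PYTORCH_MODULES = ("torch", "transformers")


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_pytorch_extras(names: Iterable[str] = PYTORCH_MODULES) -> bool:
    """Log an install hint and return False if any required module is missing."""
    missing = missing_modules(names)
    if missing:
        logger.error(
            "Missing required modules: %s. Try: python -m pip install 'eland[pytorch]'",
            ", ".join(missing),
        )
        return False
    return True
```

A script entry point could call `check_pytorch_extras()` first and exit early when it returns `False`, instead of surfacing an `ImportError` stack trace.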

### Graceful handling of already existing trained model
If a trained model with the same ID as the one we're trying to import already exists, and `--clear-previous` was not specified, we now log a clearer message about why the script can't proceed along with a hint to use the `--clear-previous` flag. 

Prior to this change, we were letting the API exception seep through and the user was faced with a stack trace.

### `tqdm` added to main dependencies
If the user doesn't have `eland[pytorch]` extras installed, the first module to be reported as missing is `tqdm`. Since this module is [used in the eland codebase](8294224e34/eland/ml/pytorch/_pytorch_model.py (L24)) directly, it makes sense to me to have it as part of the main set of requirements.

### Nit: Set tqdm unit to `parts` in `_pytorch_model.put_model`
The default unit is `it`, but `parts` better describes what the progress bar is tracking - uploading trained model definition parts.
2022-04-27 15:13:58 +01:00
David Olaru
b5ea1cf228
Align dependencies between requirement files and setup.py (#460) 2022-04-27 07:14:49 -05:00
Benjamin Trent
8294224e34
[ML] Fix XGBoost model import for xgboost>=1.6 2022-04-20 09:20:50 -05:00
Seth Michael Larson
cb839a9ac9
Release 8.1.0 v8.1.0 2022-03-31 17:12:26 -05:00
P. Sai Vinay
76a52b7947
Add support for eland.Series.unique() 2022-03-31 08:33:15 -05:00
Benjamin Trent
15a3007288
[ML] add RoBERTa and BART transformer upload support (#443)
Related to: https://github.com/elastic/elasticsearch/pull/84777

This allows BART and RoBERTa models to be uploaded to Elasticsearch for our currently defined NLP tasks.
2022-03-14 12:26:12 -04:00
David Kyle
5678525b15
Fix mypy type errors for elasticsearch-python v8.0.0 2022-03-08 17:50:39 -06:00
David Kyle
5c5e5af54d
Add --ca-certs and --insecure option for configuring TLS 2022-03-08 15:44:13 -06:00
Seth Michael Larson
abd05df50b
Release 8.0.0 v8.0.0 2022-02-10 14:29:54 -06:00
Ashton Sidhu
e3bff8a623
Add option to disable schema enforcement for pandas_to_eland 2022-01-14 07:35:58 -06:00
István Zoltán Szabó
9206941659
[DOCS] Adds NLP with PyTorch section to ML-related page in Eland docs 2022-01-11 09:08:00 -06:00
Benjamin Trent
72856e2c3f
[ML] Add support for MPNet PyTorch models 2022-01-10 11:21:30 -06:00
Ashton Sidhu
64daa07a65
Using the 'date' field for datetime64+timezone columns 2022-01-04 22:03:49 -06:00
Florian Winkler
3db93cd789
Allow using datetime types in filters 2022-01-04 14:46:18 -06:00
Seth Michael Larson
c14bc24032
Release 8.0.0-beta1 v8.0.0b1 2021-12-16 07:42:38 -06:00
Seth Michael Larson
ffe7c792dc
Update Notebook examples for 8.0 2021-12-15 16:01:32 -06:00
Seth Michael Larson
cd0897f5d7
Add a warning when connecting to incompatible Elasticsearch versions 2021-12-15 14:08:20 -06:00
Seth Michael Larson
109387184a
Support the v8.0 Elasticsearch client 2021-12-09 15:01:26 -06:00
Josh Devins
1ffbe002c4
Upgrade PyTorch dependencies to latest
In preparation for an 8.0 release, this updates PyTorch NLP dependencies
to more recent and latest minor versions. Amongst other things, this
introduces a fix from transformers that is helpful for text embedding
tasks with certain DPR models.

See: https://github.com/huggingface/transformers/issues/13670

Co-authored-by: Seth Michael Larson <seth.larson@elastic.co>
2021-12-06 09:05:54 -06:00
Seth Michael Larson
e6bb917d83
Add quotes to versions in test-matrix.yml 2021-12-03 09:37:37 -06:00
Seth Michael Larson
4e489de424
Bump version to 8.0.0 2021-12-02 08:41:11 -06:00
Seth Michael Larson
f98ebd4c29
Update Jenkins jobs for 8.x and 7.x 2021-12-01 14:01:48 -06:00
Josh Devins
5bc1a824a7
Add PyTorch modules to noxfile
We added the `pytorch` module, which is meant to be type checked, but it
was not listed as such in the noxfile. This change also fixes the type
errors that surfaced once type checking was enabled for it.
2021-11-29 08:03:25 -08:00
Josh Devins
7209f61773
Adds max_length padding to transformer tracing (#411)
The padding parameter needs to be set on the tokenization call and not
in the constructor. Furthermore, the value True only pads to the largest
input in a batch; since we don't trace with batches, this value had no
effect. The proper place to pass this parameter is the tokenization call
itself, and the proper value to use is "max_length", which pads the input
to the maximum input size specified by the model. Although we measured no
functional or performance impact from this setting, it has been suggested
that this is a best practice.

See: https://huggingface.co/transformers/serialization.html#dummy-inputs-and-standard-lengths
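The difference between the two strategies can be sketched in plain Python. In Hugging Face `transformers`, `padding=True` is equivalent to `padding="longest"`, while `padding="max_length"` pads to the `max_length` argument (or to the model's maximum input length when none is given). The `pad_batch` helper and the token ids below are hypothetical, and truncation is deliberately left out.

```python
from typing import List, Optional

PAD_ID = 0  # hypothetical padding token id


def pad_batch(
    batch: List[List[int]], strategy: str, max_length: Optional[int] = None
) -> List[List[int]]:
    """Right-pad token id sequences using a 'longest' or 'max_length' strategy."""
    if strategy == "longest":
        target = max(len(seq) for seq in batch)
    elif strategy == "max_length":
        if max_length is None:
            raise ValueError("max_length is required for the 'max_length' strategy")
        target = max_length
    else:
        raise ValueError(f"unknown padding strategy: {strategy!r}")
    # Sequences already at or above the target length are returned unchanged.
    return [seq + [PAD_ID] * max(0, target - len(seq)) for seq in batch]


single = [[101, 7592, 102]]  # one tokenized input, as when tracing without batches
print(pad_batch(single, "longest"))  # [[101, 7592, 102]]
print(pad_batch(single, "max_length", max_length=6))  # [[101, 7592, 102, 0, 0, 0]]
```

With a single input, the "longest" strategy is a no-op, which is why `True` had no effect during tracing.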
2021-11-11 13:18:55 +01:00
Benjamin Trent
a3b0907c5b
[ML] Add inference results tests for PyTorch transformer models 2021-11-10 06:50:10 -06:00
Seth Michael Larson
66e3e4eaad
Set 'script.max_compilations_rate: use-context' 2021-11-02 10:09:25 -04:00
Josh Devins
1e5b475bee
Adds NLP with PyTorch basic example to README
The Machine Learning section now has two sub-sections — one for
traditional regression/classification and the other for NLP with
PyTorch. The examples show two ways to upload models from the Hugging
Face model hub.
2021-11-02 08:00:33 -05:00
Josh Devins
df51f8af07
Document how to install transitive binary dependencies, add repo Dockerfile
Co-authored-by: Seth Michael Larson <seth.larson@elastic.co>
2021-10-28 12:05:39 -05:00