582 Commits

Author SHA1 Message Date
Quentin Pradet
566bb9e990
Allow importing private HuggingFace models (#608) 2023-09-25 15:10:58 +02:00
Quentin Pradet
5ec760635b
Recommend installing Eland in a virtual environment (#606) 2023-09-22 13:14:05 +02:00
Jonathan Buttner
a8b76c390f
Setting chunk size to 1mb (#605) 2023-09-20 11:40:11 -04:00
Bart Broere
12200039f5
Fix iteritems deprecation (#593) 2023-09-19 12:00:32 +02:00
David Kyle
301cda8d69
Error measuring embedding size for some DPR models (#573)
Fixes an error unpacking a tuple that contains a single element.
2023-09-19 10:44:15 +01:00
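The failure mode named in the commit above can be sketched as follows. This is a minimal illustration and not the Eland code: `first_output` is a hypothetical helper, and the tuples stand in for whatever a model returns.

```python
from typing import Any, Sequence


def first_output(model_output: Sequence[Any]) -> Any:
    """Return the first element of a model output tuple of any length.

    Hypothetical helper. Writing `embedding, _ = model_output` raises
    ValueError when the tuple contains a single element, so index into
    the tuple instead of unpacking a fixed number of names.
    """
    if len(model_output) == 0:
        raise ValueError("model returned no outputs")
    return model_output[0]
```

Indexing works the same way for `(embedding,)` and `(embedding, pooled)`, which is why it avoids the unpacking error.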
Bart Broere
5c5ef63a69
Use the workaround if we can't determine the server's version (#581) 2023-09-15 15:29:36 +04:00
Quentin Pradet
eb69496627
Add dummy pipeline to prepare publishing a Docker image (#590) 2023-09-06 07:12:06 +02:00
Quentin Pradet
64ffbcec0f
Revert "Update Docker image to Debian 12 Bookworm (#586)" (#588) 2023-09-05 12:36:42 +04:00
Quentin Pradet
4d2c6e2f4d
Fix Buildkite builds on pull requests (#589) 2023-09-05 12:20:24 +04:00
Quentin Pradet
ea4c2d1251
Fix downloads badge URL (#587) 2023-09-05 11:57:36 +04:00
Quentin Pradet
c7a58e3783
Fix README so that copy/pastes work without warnings (#584) 2023-09-05 11:56:25 +04:00
Quentin Pradet
0be509730a
Update Docker image to Debian 12 Bookworm (#586) 2023-09-04 19:28:38 +04:00
David Kyle
95864a9ace
Update README.md with note about installing extras for NLP (#582) 2023-08-31 10:34:36 +01:00
Enrico Zimuel
f14bbaf4b0
Added build and twine to requirements-dev 2023-08-24 16:02:12 +02:00
Enrico Zimuel
ac8c7c341e
Readded author info v8.9.0 2023-08-24 11:18:17 +02:00
Enrico Zimuel
2304fdc593
Updated docs 2023-08-24 11:12:30 +02:00
Enrico Zimuel
ebdebdf16f
Prep for 8.9.0 release 2023-08-24 11:11:48 +02:00
Enrico Zimuel
932092c0e5
Fixed test for mean using ES 8.9.0 2023-08-24 10:46:14 +02:00
Enrico Zimuel
08b7fac32b
Updated test to ES 8.9-SNAPSHOT 2023-08-23 13:53:15 +02:00
Enrico Zimuel
bb59a4f8d6
Fixed conf test with isinstance 2023-08-22 13:23:23 +02:00
Josh Devins
f26fb8a430
Simplify embedding model support and loading (#569)
We were attempting to load SentenceTransformers by looking at the model
prefix. However, SentenceTransformers can also be loaded from other
orgs in the model hub, as well as from local disk, and the prefix check
failed in both of those cases. To simplify the loading logic and the
choice of wrapper, we’ve removed support for loading a plain Transformer
for text_embedding tasks. We now only support DPR embedding models and
SentenceTransformer embedding models. If you try to load a plain
Transformer model, it will be loaded by SentenceTransformers, and the
SentenceTransformer library will automatically add a mean pooling
layer. Since we no longer automatically support models that are neither
DPR nor SentenceTransformers, we should add example code somewhere
showing how to load a custom model without DPR or SentenceTransformers.

See: https://github.com/UKPLab/sentence-transformers/blob/v2.2.2/sentence_transformers/SentenceTransformer.py#L801

Resolves #531
2023-07-31 18:18:46 +02:00
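For readers unfamiliar with the mean pooling layer mentioned in the commit above, here is a minimal pure-Python sketch of what mean pooling computes. It is not the SentenceTransformers implementation; the function name `mean_pool` and the list-based inputs are assumptions made for illustration.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose mask value is 1, ignoring padding."""
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("one mask value is required per token")
    # Keep only the vectors of real (non-padding) tokens.
    kept = [vec for vec, mask in zip(token_embeddings, attention_mask) if mask]
    if not kept:
        raise ValueError("attention mask selects no tokens")
    dim = len(kept[0])
    # Average each dimension across the kept tokens.
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]
```

The result is one fixed-size vector per input text, regardless of how many tokens the text contains.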
Fernando Briano
7ad1f430e4
[CI] Adds buildkite pull requests configuration (#570) 2023-07-26 13:43:40 +01:00
Youhei Sakurai
4cf92fd9b7
Make eland_import_hub_model easier to find on Windows. (#559) 2023-07-20 09:24:35 +01:00
Fernando Briano
664180d93d
[CI] Removes Jenkins .ci folder (#561)
Continuing the migration to Buildkite.
2023-07-18 13:32:30 +01:00
Fernando Briano
2134c71ab4
Add Buildkite configuration (#515)
* [CI] Adds Buildkite configuration
* Removes GitHub Actions
* Moves lint and docs tasks to Buildkite
2023-07-17 14:08:41 +01:00
Youhei Sakurai
b5bcba713d
Apply black to comply with the code style (#557)
Relates https://github.com/elastic/eland/pull/552

**Issue**:

```console
C:\Users\YouheiSakurai\git\myeland>python -m black --version
python -m black, 23.3.0 (compiled: yes)
Python (CPython) 3.11.0

C:\Users\YouheiSakurai\git\myeland>python -m black --check --target-version=py38 bin\eland_import_hub_model
would reformat bin\eland_import_hub_model

Oh no! 💥 💔 💥
1 file would be reformatted.
```

**Solution**:
```
C:\Users\YouheiSakurai\git\myeland>python -m black --target-version=py38 bin\eland_import_hub_model
reformatted bin\eland_import_hub_model

All done!  🍰 
1 file reformatted.
```
2023-07-13 09:55:00 +02:00
Valeriy Khakhutskyy
77781b90ff
[ML] Update trained model inference endpoint (#556)
The infer trained model deployment API has been deprecated, so I changed the code to use the new inference endpoint.
2023-07-11 10:55:11 +02:00
Valeriy Khakhutskyy
f38de0ed05
Fix failing unit tests (#558)
I updated the tree serialization format for the new scikit-learn versions. I also raised the minimum required scikit-learn version to 1.3 to ensure compatibility.

Fixes #555
2023-07-10 15:15:58 +02:00
Youhei Sakurai
5ac8a053f0
Fix No module named 'torch' (#553)
Do not import torch unless necessary
2023-07-07 09:11:11 +01:00
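One common way to avoid importing an optional dependency until it is needed is to defer the import to the point of use. The sketch below illustrates that general pattern and is not the change made in the commit above; the helper name `require` and the wording of the error message are hypothetical.

```python
import importlib
from types import ModuleType


def require(module_name: str, install_hint: str) -> ModuleType:
    """Import an optional dependency only at the point where it is used.

    Hypothetical helper: callers that never reach this function never
    trigger the import, so a missing package is not an error for them.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is required for this feature. {install_hint}"
        ) from exc
```

A function that needs PyTorch could then call `require("torch", "Install the NLP extras first.")` in its body instead of importing `torch` at module level.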
Youhei Sakurai
55967a7324
Minimize if main section (#554)
For the migration from scripts to console_scripts in setup.py,
the current long if __name__ == "__main__": section is a
blocker because console_scripts requires a function to be
specified as the entry point.
Move the logic into a main() function.
2023-07-05 10:49:16 +01:00
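The refactoring described above can be sketched as follows. The argument `--name`, the module path `my_package.cli`, and the command name `my_tool` are hypothetical and do not come from the Eland script.

```python
import argparse
from typing import List, Optional


def main(argv: Optional[List[str]] = None) -> int:
    """Entry point that works both as a script and as a console_scripts target."""
    parser = argparse.ArgumentParser(description="Hypothetical example script")
    parser.add_argument("--name", default="world")
    # argv=None makes argparse fall back to sys.argv[1:].
    args = parser.parse_args(argv)
    print(f"hello {args.name}")
    return 0


# The script's old entry section then shrinks to two lines:
#     if __name__ == "__main__":
#         sys.exit(main())
#
# and setup.py can point at the same function:
#     entry_points={"console_scripts": ["my_tool=my_package.cli:main"]}
```

Accepting `argv` as a parameter also makes the function easy to call from tests without touching `sys.argv`.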
Dai Sugimori
bf3b092ed4
Add BertJapaneseTokenizer support with bert_ja tokenization configuration (#534)
See elasticsearch#95546
2023-06-23 08:14:27 +01:00
Seth Michael Larson
5fd1221815
Fix autosummary directive by removing hack autosummaries 2023-06-15 10:50:19 -05:00
Seth Michael Larson
17c1c2e9c7
Switch to the 'Furo' Sphinx theme 2023-06-15 09:51:14 -05:00
Benjamin Trent
8b327f60b8
[ML] add ability to upload xlm-roberta tokenized models (#518)
This allows XLMRoberta models to be uploaded to Elasticsearch.

blocked by: elastic/elasticsearch#94089
2023-06-14 07:59:28 -04:00
David Kyle
68a22a8001
Default the optional es_version parameter (#545) 2023-06-07 12:34:53 +01:00
Seth Michael Larson
afc7e41d6e
Update Dockerfile base image to use newer version 2023-06-02 14:20:01 -05:00
David Kyle
32ab988eb6
Tolerate different model output formats when measuring embedding size (#535)
Only add the embedding_size config option if the target Elasticsearch 
cluster version supports it
2023-05-25 12:25:31 -05:00
David Kyle
7ca8376f68
Add Elasticsearch 8.8 snapshot to test matrix (#543)
And increase the test ES node heap size to prevent circuit 
breaker exceptions due to better memory accounting in
elastic/elasticsearch#89437.
2023-05-24 11:59:41 +01:00
István Zoltán Szabó
e0c08e42a0
[DOCS] Adds instructions on model install in air-gapped env (#542)
Co-authored-by: David Kyle <david.kyle@elastic.co>
2023-05-24 12:53:04 +02:00
David Kyle
1e6f48f8f4
Generate valid NLP model id from file path (#541)
The eland_import_hub_model script supports uploading a local file, in which
case the --hub-model-id argument is a file path. If the --es-model-id option is
not used, the model ID is generated from the hub model ID, and when that
is a file path, the path must be converted to a valid Elasticsearch model ID.
2023-05-22 15:37:36 +01:00
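A conversion of this kind could look like the sketch below. It is not the Eland implementation. It assumes that a valid ID contains only lowercase alphanumerics, hyphens, underscores, and periods, starts and ends with an alphanumeric, and fits in 64 characters; the function name `model_id_from_path` and the `__` separator convention are likewise assumptions.

```python
import re


def model_id_from_path(path: str, max_length: int = 64) -> str:
    """Derive an ID from a file path under the assumed naming rules."""
    # Treat Windows and POSIX separators alike, then lowercase.
    normalized = path.replace("\\", "/").lower()
    # Assumed convention: "__" stands in for each path separator.
    candidate = normalized.replace("/", "__")
    # Replace every character outside the assumed allowed set.
    candidate = re.sub(r"[^a-z0-9_.-]", "_", candidate)
    # Keep the tail, which usually carries the model name.
    candidate = candidate[-max_length:]
    # Trim leading and trailing characters that are not alphanumeric.
    candidate = re.sub(r"^[^a-z0-9]+|[^a-z0-9]+$", "", candidate)
    if not candidate:
        raise ValueError(f"cannot derive a model ID from {path!r}")
    return candidate
```

Truncating before trimming ensures that the result still starts with an alphanumeric character after a long path is cut.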
David Kyle
7820a31256
Limit NumPy to a range of versions and note why (#540) 2023-05-22 10:47:06 +01:00
David Kyle
36bbbe0bdb
Upgrade torch to 1.13.1 and check the cluster version before uploading an NLP model. (#522)
PyTorch models traced in version 1.13 of PyTorch cannot be evaluated in
version 1.9 or earlier. With this upgrade, Eland becomes incompatible with
Elasticsearch versions earlier than 8.7 and will refuse to upload a model to
such a cluster. In this scenario, either upgrade Elasticsearch or use an
earlier version of Eland.
2023-05-19 16:29:38 +01:00
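The version gate described above can be sketched as a simple tuple comparison. This is an illustration under stated assumptions, not the Eland code: the function names are hypothetical, and the version string is assumed to have the form `major.minor.patch` with an optional suffix such as `-SNAPSHOT`.

```python
from typing import Tuple

# From the commit message: clusters older than 8.7 are refused.
MIN_ES_VERSION = (8, 7, 0)


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse 'major.minor.patch', ignoring a suffix such as '-SNAPSHOT'."""
    core = version.split("-", 1)[0]
    major, minor, patch = (int(part) for part in core.split("."))
    return major, minor, patch


def ensure_cluster_can_load_model(es_version: str) -> None:
    """Raise before uploading if the cluster is older than the minimum."""
    if parse_version(es_version) < MIN_ES_VERSION:
        minimum = ".".join(str(part) for part in MIN_ES_VERSION)
        raise RuntimeError(
            f"Elasticsearch {es_version} is older than {minimum}; "
            "upgrade Elasticsearch or use an earlier version of Eland"
        )
```

Comparing integer tuples avoids the pitfall of string comparison, where "8.10.0" would sort before "8.7.0".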
David Kyle
b507bb6d6c
Restrict NumPy and Pandas versions (#539)
Shap is incompatible with NumPy 1.24 because a deprecated usage became
an error. There is no fix in Shap yet, so an earlier version of NumPy must
be used.
Pandas 2.0 was recently released; we will continue to use the latest 1.5 release
to avoid any incompatibilities.
2023-05-19 16:04:33 +01:00
Seth Michael Larson
f7ea3bd476
Add a compatibility layer for Elasticsearch server 8.5.0 field_caps API 2023-05-02 15:40:20 -05:00
Seth Michael Larson
ca0cbe94ea
Fix readthedocs with Python 3.8 2023-05-02 12:21:57 -05:00
David Kyle
50d301f7cb
Set embedding_size config parameter for Text Embedding models (#532) 2023-04-25 11:41:14 +01:00
David Kyle
940f2a9bad
[NLP] Add support for the pass_through task #526 2023-04-06 15:43:00 +01:00
David Kyle
8e0d897171
[NLP] Prevent TypeError with None check (#525) 2023-04-03 14:56:19 +01:00
David Roberts
cebee6406f
Include pitfall of --start in the README (#506)
Users who follow the Eland README as a guide to importing
models can easily end up seeing inexplicably poor performance
due to unknowingly running the model with one allocation and
one thread per allocation.

This change spells out the effect of `--start` and links to
alternatives that allow better use of available hardware.

Co-authored-by: David Kyle <david.kyle@elastic.co>
2023-03-30 20:28:48 +01:00
Seth Michael Larson
44e04b4905
Release v8.7.0 v8.7.0 2023-03-30 14:00:02 -05:00