[[machine-learning]]
== Machine Learning
[discrete]
[[ml-trained-models]]
=== Trained models
Eland lets you serialize trained models from the scikit-learn, XGBoost,
and LightGBM libraries and import them into {es}, where they can be used
as inference models.

[source,python]
------------------------
>>> from xgboost import XGBClassifier
>>> from eland.ml import MLModel
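
# NOTE: `training_data` is assumed to be a (features, labels) tuple,
# for example the output of sklearn.datasets.make_classification(n_features=5)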
# Train and exercise an XGBoost ML model locally
>>> xgb_model = XGBClassifier(booster="gbtree")
>>> xgb_model.fit(training_data[0], training_data[1])
>>> xgb_model.predict(training_data[0])
[0 1 1 0 1 0 0 0 1 0]
# Import the model into Elasticsearch
>>> es_model = MLModel.import_model(
        es_client="http://localhost:9200",
        model_id="xgb-classifier",
        model=xgb_model,
        feature_names=["f0", "f1", "f2", "f3", "f4"],
    )
# Exercise the ML model in Elasticsearch with the training data
>>> es_model.predict(training_data[0])
[0 1 1 0 1 0 0 0 1 0]
------------------------
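
The same flow applies to the other supported libraries. As a minimal
sketch, here is a scikit-learn variant; the model ID and feature names
are illustrative, not required values:

[source,python]
------------------------
>>> from sklearn.datasets import make_classification
>>> from sklearn.tree import DecisionTreeClassifier
>>> from eland.ml import MLModel

# Build a small synthetic training set with five features
>>> X, y = make_classification(n_features=5)

# Train a scikit-learn decision tree locally
>>> sk_model = DecisionTreeClassifier()
>>> sk_model.fit(X, y)

# Serialize the model and import it into Elasticsearch
>>> es_model = MLModel.import_model(
        es_client="http://localhost:9200",
        model_id="sklearn-decision-tree",
        model=sk_model,
        feature_names=["f0", "f1", "f2", "f3", "f4"],
    )
------------------------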
[discrete]
[[ml-nlp-pytorch]]
=== Natural language processing (NLP) with PyTorch
IMPORTANT: You need to install the appropriate version of PyTorch to import an
NLP model. Run `python -m pip install 'eland[pytorch]'` to install that version.

For NLP tasks, Eland enables you to import PyTorch models into {es}. Use the
`eland_import_hub_model` script to download and install supported
https://huggingface.co/transformers[transformer models] from the
https://huggingface.co/models[Hugging Face model hub]. For example:

[source,bash]
------------------------
$ eland_import_hub_model <authentication> \ <1>
  --url http://localhost:9200/ \ <2>
  --hub-model-id elastic/distilbert-base-cased-finetuned-conll03-english \ <3>
  --task-type ner \ <4>
  --start
------------------------
<1> Use an authentication method to access your cluster. Refer to <<ml-nlp-pytorch-auth>>.
<2> The cluster URL. Alternatively, use `--cloud-id`.
<3> Specify the identifier for the model in the Hugging Face model hub.
<4> Specify the type of NLP task. Supported values are `fill_mask`, `ner`,
`question_answering`, `text_classification`, `text_embedding`, and `zero_shot_classification`.
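
After the model is imported and started, you can run a quick smoke test
from Python. This is a minimal sketch, assuming a local unsecured
cluster, the 8.x `elasticsearch` Python client, and the default model ID
that the script derives from the hub model ID (path delimiters become
double underscores):

[source,python]
------------------------
from elasticsearch import Elasticsearch

# Connect to the cluster the model was imported into
es = Elasticsearch("http://localhost:9200")

# Run NER inference; `text_field` is the default input field
# for imported NLP models
response = es.ml.infer_trained_model(
    model_id="elastic__distilbert-base-cased-finetuned-conll03-english",
    docs=[{"text_field": "Elastic is headquartered in Mountain View."}],
)
print(response["inference_results"][0])
------------------------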
[discrete]
[[ml-nlp-pytorch-docker]]
==== Import model with Docker

IMPORTANT: To use the Docker container, you need to clone the Eland
repository: https://github.com/elastic/eland

If you want to use Eland without installing it, you can use the Docker
image. You can run the container interactively:

[source,bash]
------------------------
$ docker run -it --rm --network host docker.elastic.co/eland/eland
------------------------

You can also run the installed scripts without an interactive shell. For example:

[source,bash]
------------------------
docker run -it --rm docker.elastic.co/eland/eland \
  eland_import_hub_model \
  --url $ELASTICSEARCH_URL \
  --hub-model-id elastic/distilbert-base-uncased-finetuned-conll03-english \
  --start
------------------------

Replace `$ELASTICSEARCH_URL` with the URL of your {es} cluster. For
authentication, include a username and password with administrator
privileges in the URL in the following format:
`https://username:password@host:port`.
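
For example, a run against a local cluster with the credentials embedded
in the URL might look like this (the username, password, and host are
placeholders):

[source,bash]
------------------------
docker run -it --rm docker.elastic.co/eland/eland \
  eland_import_hub_model \
  --url https://elastic:changeme@localhost:9200 \
  --hub-model-id elastic/distilbert-base-uncased-finetuned-conll03-english \
  --start
------------------------
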
[discrete]
[[ml-nlp-pytorch-air-gapped]]
==== Install models in an air-gapped environment
You can install models in a restricted or closed network by pointing the
`eland_import_hub_model` script to local files.

For an offline install of a Hugging Face model, the model first needs to
be cloned locally. Git and https://git-lfs.com/[Git Large File Storage]
must be installed on your system.
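
Before cloning, you can verify that both tools are available. A minimal
check, assuming a typical Linux or macOS shell:

[source,bash]
------------------------
git --version    # confirm Git is available
git lfs install  # errors if Git LFS is missing; otherwise sets up its hooks
------------------------
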
1. Select a model you want to use from Hugging Face. Refer to the
{ml-docs}/ml-nlp-model-ref.html[compatible third party model] list for more
information on the supported architectures.
2. Clone the selected model from Hugging Face by using the model URL. For
example:
+
--
[source,bash]
----
git clone https://huggingface.co/dslim/bert-base-NER
----
This command results in a local copy of the model in the directory
`bert-base-NER`.
--
3. Use the `eland_import_hub_model` script with the `--hub-model-id` set to the
directory of the cloned model to install it:
+
--
[source,bash]
----
eland_import_hub_model \
  --url 'XXXX' \
  --hub-model-id /PATH/TO/MODEL \
  --task-type ner \
  --es-username elastic --es-password XXX \
  --es-model-id bert-base-ner
----
If you use the Docker image to run `eland_import_hub_model`, you must
bind-mount the model directory so that the container can read the files:

[source,bash]
----
docker run --mount type=bind,source=/PATH/TO/MODELS,destination=/models,readonly -it --rm docker.elastic.co/eland/eland \
  eland_import_hub_model \
  --url 'XXXX' \
  --hub-model-id /models/bert-base-NER \
  --task-type ner \
  --es-username elastic --es-password XXX \
  --es-model-id bert-base-ner
----
Once it's uploaded to {es}, the model will have the ID specified by
`--es-model-id`. If `--es-model-id` is not set, the model ID is derived
from `--hub-model-id`; spaces and path delimiters are converted to
double underscores (`__`). For example, the hub model ID
`elastic/distilbert-base-cased-finetuned-conll03-english` becomes the
{es} model ID `elastic__distilbert-base-cased-finetuned-conll03-english`.
--
[discrete]
[[ml-nlp-pytorch-auth]]
==== Authentication methods
The following authentication options are available when using the import script:

* Elasticsearch username and password authentication (specified with the `-u` and `-p` options):
+
--
[source,bash]
--------------------------------------------------
eland_import_hub_model -u <username> -p <password> --cloud-id <cloud-id> ...
--------------------------------------------------
These `-u` and `-p` options also work when you use `--url`.
--
* Elasticsearch username and password authentication (embedded in the URL):
+
--
[source,bash]
--------------------------------------------------
eland_import_hub_model --url https://<user>:<password>@<hostname>:<port> ...
--------------------------------------------------
--
* Elasticsearch API key authentication:
+
--
[source,bash]
--------------------------------------------------
eland_import_hub_model --es-api-key <api-key> --url https://<hostname>:<port> ...
--------------------------------------------------
--
* Hugging Face Hub access token (for private models):
+
--
[source,bash]
--------------------------------------------------
eland_import_hub_model --hub-access-token <access-token> ...
--------------------------------------------------
--
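
If you need to create an API key for the `--es-api-key` option, one way
is through the {es} Python client. This is a hedged sketch: it assumes
the 8.x `elasticsearch` client, placeholder credentials, and that the
option accepts the base64-encoded `id:api_key` form of the key:

[source,python]
------------------------
from elasticsearch import Elasticsearch

# Connect with an account that is allowed to create API keys
# (URL and credentials are placeholders)
es = Elasticsearch("https://localhost:9200", basic_auth=("elastic", "changeme"))

# Create a key; the response includes a base64-encoded `id:api_key` pair
resp = es.security.create_api_key(name="eland-import")
print(resp["encoded"])
------------------------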