From e0c08e42a05ccc0c8802158a5f1714028b6d83d0 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Istv=C3=A1n=20Zolt=C3=A1n=20Szab=C3=B3?=
Date: Wed, 24 May 2023 12:53:04 +0200
Subject: [PATCH] [DOCS] Adds instructions on model install in air-gapped env
 (#542)

Co-authored-by: David Kyle
---
 docs/guide/machine-learning.asciidoc | 98 +++++++++++++++++++---------
 1 file changed, 67 insertions(+), 31 deletions(-)

diff --git a/docs/guide/machine-learning.asciidoc b/docs/guide/machine-learning.asciidoc
index b402701..34e14ff 100644
--- a/docs/guide/machine-learning.asciidoc
+++ b/docs/guide/machine-learning.asciidoc
@@ -39,12 +39,12 @@ model in {es}.
 === Natural language processing (NLP) with PyTorch
 
-IMPORTANT: You need to use PyTorch `1.11.0` or earlier to import an NLP model.
-Run `pip install torch==1.11` to install the aproppriate version of PyTorch.
+IMPORTANT: You need to use PyTorch `1.13` or earlier to import an NLP model.
+Run `pip install torch==1.13` to install the appropriate version of PyTorch.
 
-For NLP tasks, Eland enables you to import PyTorch trained BERT models into {es}.
-Models can be either plain PyTorch models, or supported
-https://huggingface.co/transformers[transformers] models from the
+For NLP tasks, Eland enables you to import PyTorch models into {es}. Use the
+`eland_import_hub_model` script to download and install supported
+https://huggingface.co/transformers[transformer models] from the
 https://huggingface.co/models[Hugging Face model hub]. For example:
 
 [source,bash]
@@ -61,32 +61,6 @@ $ eland_import_hub_model \ <1>
 <4> Specify the type of NLP task. Supported values are `fill_mask`, `ner`,
 `question_answering`, `text_classification`, `text_embedding`, and
 `zero_shot_classification`.
-[source,python]
-------------------------
->>> import elasticsearch
->>> from pathlib import Path
->>> from eland.ml.pytorch import PyTorchModel
->>> from eland.ml.pytorch.transformers import TransformerModel
-
-# Load a Hugging Face transformers model directly from the model hub
->>> tm = TransformerModel("elastic/distilbert-base-cased-finetuned-conll03-english", "ner")
-Downloading: 100%|██████████| 257/257 [00:00<00:00, 108kB/s]
-Downloading: 100%|██████████| 954/954 [00:00<00:00, 372kB/s]
-Downloading: 100%|██████████| 208k/208k [00:00<00:00, 668kB/s]
-Downloading: 100%|██████████| 112/112 [00:00<00:00, 43.9kB/s]
-Downloading: 100%|██████████| 249M/249M [00:23<00:00, 11.2MB/s]
-
-# Export the model in a TorchScript representation which Elasticsearch uses
->>> tmp_path = "models"
->>> Path(tmp_path).mkdir(parents=True, exist_ok=True)
->>> model_path, config, vocab_path = tm.save(tmp_path)
-
-# Import model into Elasticsearch
->>> es = elasticsearch.Elasticsearch("http://elastic:mlqa_admin@localhost:9200", timeout=300) # 5 minute timeout
->>> ptm = PyTorchModel(es, tm.elasticsearch_model_id())
->>> ptm.import_model(model_path=model_path, config_path=None, vocab_path=vocab_path, config=config)
-100%|██████████| 63/63 [00:12<00:00, 5.02it/s]
------------------------
 
 [discrete]
 [[ml-nlp-pytorch-docker]]
@@ -118,6 +92,68 @@ docker run -it --rm elastic/eland \
 Replace the `$ELASTICSEARCH_URL` with the URL for your Elasticsearch cluster.
 For authentication purposes, include an administrator username and password in
 the URL in the following format: `https://username:password@host:port`.
 
+[discrete]
+[[ml-nlp-pytorch-air-gapped]]
+==== Install models in an air-gapped environment
+
+You can install models in a restricted or closed network by pointing the
+`eland_import_hub_model` script to local files.
+
+For an offline install of a Hugging Face model, the model first needs to be
+cloned locally. Git and https://git-lfs.com/[Git Large File Storage] are
+required to be installed on your system.
+
+1. Select a model you want to use from Hugging Face. Refer to the
+{ml-docs}/ml-nlp-model-ref.html[compatible third party model] list for more
+information on the supported architectures.
+
+2. Clone the selected model from Hugging Face by using the model URL. For
+example:
++
+--
+[source,bash]
+----
+git clone https://huggingface.co/dslim/bert-base-NER
+----
+This command results in a local copy
+of the model in the directory `bert-base-NER`.
+--
+
+3. Use the `eland_import_hub_model` script with the `--hub-model-id` set to the
+directory of the cloned model to install it:
++
+--
+[source,bash]
+----
+eland_import_hub_model \
+  --url 'XXXX' \
+  --hub-model-id /PATH/TO/MODEL \
+  --task-type ner \
+  --es-username elastic --es-password XXX \
+  --es-model-id bert-base-ner
+----
+
+If you use the Docker image to run `eland_import_hub_model`, you must bind mount
+the model directory so that the container can read the files:
+
+[source,bash]
+----
+docker run --mount type=bind,source=/PATH/TO/MODELS,destination=/models,readonly -it --rm elastic/eland \
+  eland_import_hub_model \
+    --url 'XXXX' \
+    --hub-model-id /models/bert-base-NER \
+    --task-type ner \
+    --es-username elastic --es-password XXX \
+    --es-model-id bert-base-ner
+----
+Once it's uploaded to {es}, the model will have the ID specified by
+`--es-model-id`. If it is not set, the model ID is derived from
+`--hub-model-id`; spaces and path delimiters are converted to double
+underscores `__`.
+
+--
+
+
 [discrete]
 [[ml-nlp-pytorch-auth]]
 ==== Authentication methods
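
Editor's note on the docs added above: the rule that a model ID is derived from `--hub-model-id` when `--es-model-id` is omitted can be sketched in a few lines of Python. This is an illustration of the rule as stated in the docs (spaces and path delimiters become double underscores), not Eland's actual implementation; the function name `derive_es_model_id` is hypothetical, and Eland may apply additional normalization not described here.

```python
def derive_es_model_id(hub_model_id: str) -> str:
    # Hypothetical sketch of the documented rule: spaces and path
    # delimiters in the hub model ID (or local path) are converted
    # to double underscores to form the Elasticsearch model ID.
    for ch in (" ", "/", "\\"):
        hub_model_id = hub_model_id.replace(ch, "__")
    return hub_model_id


# A hub model ID like "dslim/bert-base-NER" would become
# "dslim__bert-base-NER" under this rule.
print(derive_es_model_id("dslim/bert-base-NER"))
```

Passing `--es-model-id` explicitly, as the examples in the patch do, sidesteps this derivation entirely and is the simpler option when importing from a local path, since the path's leading delimiter would otherwise end up in the derived ID.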