[[machine-learning]]
== Machine Learning

[discrete]
[[ml-trained-models]]
=== Trained models

Eland allows transforming *some*
https://eland.readthedocs.io/en/latest/reference/api/eland.ml.MLModel.import_model.html#parameters[trained models]
from the scikit-learn, XGBoost, and LightGBM libraries so they can be
serialized and used as inference models in {es}.

[source,python]
------------------------
>>> from xgboost import XGBClassifier
>>> from eland.ml import MLModel

# Train and exercise an XGBoost ML model locally
>>> xgb_model = XGBClassifier(booster="gbtree")
>>> xgb_model.fit(training_data[0], training_data[1])

>>> xgb_model.predict(training_data[0])
[0 1 1 0 1 0 0 0 1 0]

# Import the model into Elasticsearch
>>> es_model = MLModel.import_model(
    es_client="http://localhost:9200",
    model_id="xgb-classifier",
    model=xgb_model,
    feature_names=["f0", "f1", "f2", "f3", "f4"],
)

# Exercise the ML model in Elasticsearch with the training data
>>> es_model.predict(training_data[0])
[0 1 1 0 1 0 0 0 1 0]
------------------------

[discrete]
[[ml-nlp-pytorch]]
=== Natural language processing (NLP) with PyTorch

IMPORTANT: You need to install the appropriate version of PyTorch to import an
NLP model. Run `python -m pip install 'eland[pytorch]'` to install that
version.

For NLP tasks, Eland enables you to import PyTorch models into {es}. Use the
`eland_import_hub_model` script to download and install supported
https://huggingface.co/transformers[transformer models] from the
https://huggingface.co/models[Hugging Face model hub]. For example:

[source,bash]
------------------------
$ eland_import_hub_model \ <1>
      --url http://localhost:9200/ \ <2>
      --hub-model-id elastic/distilbert-base-cased-finetuned-conll03-english \ <3>
      --task-type ner \ <4>
      --start
------------------------
<1> Use an authentication method to access your cluster. Refer to
<<ml-nlp-pytorch-auth>>.
<2> The cluster URL. Alternatively, use `--cloud-id`.
<3> Specify the identifier for the model in the Hugging Face model hub.
<4> Specify the type of NLP task. Supported values are `fill_mask`, `ner`,
`question_answering`, `text_classification`, `text_embedding`, `text_expansion`,
`text_similarity` and `zero_shot_classification`.

For more information about the available options, run `eland_import_hub_model`
with the `--help` option.

[source,bash]
------------------------
$ eland_import_hub_model --help
------------------------
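After the import completes and the deployment has started, you can exercise the
model from Python as well. The following is a minimal sketch, not output of the
import script: it assumes the official `elasticsearch` Python client is
installed, that the model above was imported with `--start`, that the model ID
follows the default derivation from `--hub-model-id` (path delimiters become
double underscores), and that the `text_field` input name and response shape
apply to your model and stack version.

[source,python]
----
# A minimal sketch (assumptions noted above): run inference against the
# imported NER model using the elasticsearch-py client.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

response = es.ml.infer_trained_model(
    # Assumed model ID, derived from the --hub-model-id used above
    model_id="elastic__distilbert-base-cased-finetuned-conll03-english",
    docs=[{"text_field": "Sasha lives in Berlin and works for Elastic."}],
)

# Each inference result for an NER model typically includes the
# recognized entities; the exact shape can vary by stack version.
for result in response["inference_results"]:
    print(result.get("entities", []))
----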
[discrete]
[[ml-nlp-pytorch-docker]]
==== Import model with Docker

IMPORTANT: To use the Docker container, you need to clone the Eland repository:
https://github.com/elastic/eland

If you want to use Eland without installing it, you can use the Docker image.

You can use the container interactively:

[source,bash]
----
$ docker run -it --rm --network host docker.elastic.co/eland/eland
----

Running installed scripts is also possible without an interactive shell, for
example:

[source,bash]
----
docker run -it --rm docker.elastic.co/eland/eland \
    eland_import_hub_model \
      --url $ELASTICSEARCH_URL \
      --hub-model-id elastic/distilbert-base-uncased-finetuned-conll03-english \
      --start
----

Replace `$ELASTICSEARCH_URL` with the URL for your Elasticsearch cluster. For
authentication purposes, include an administrator username and password in the
URL in the following format: `https://username:password@host:port`.

[discrete]
[[ml-nlp-pytorch-air-gapped]]
==== Install models in an air-gapped environment

You can install models in a restricted or closed network by pointing the
`eland_import_hub_model` script to local files.

For an offline install of a Hugging Face model, the model first needs to be
cloned locally; Git and https://git-lfs.com/[Git Large File Storage] must be
installed on your system.

1. Select a model you want to use from Hugging Face. Refer to the
{ml-docs}/ml-nlp-model-ref.html[compatible third party model] list for more
information on the supported architectures.

2. Clone the selected model from Hugging Face by using the model URL. For
example:
+
--
[source,bash]
----
git clone https://huggingface.co/dslim/bert-base-NER
----
This command results in a local copy of the model in the directory
`bert-base-NER`.
--

3. Use the `eland_import_hub_model` script with the `--hub-model-id` set to the
directory of the cloned model to install it:
+
--
[source,bash]
----
eland_import_hub_model \
      --url 'XXXX' \
      --hub-model-id /PATH/TO/MODEL \
      --task-type ner \
      --es-username elastic --es-password XXX \
      --es-model-id bert-base-ner
----

If you use the Docker image to run `eland_import_hub_model`, you must
bind-mount the model directory so the container can read the files:

[source,bash]
----
docker run --mount type=bind,source=/PATH/TO/MODEL,destination=/model,readonly -it --rm docker.elastic.co/eland/eland \
    eland_import_hub_model \
      --url 'XXXX' \
      --hub-model-id /model \
      --task-type ner \
      --es-username elastic --es-password XXX \
      --es-model-id bert-base-ner
----

Once it's uploaded to {es}, the model will have the ID specified by
`--es-model-id`. If it is not set, the model ID is derived from
`--hub-model-id`; spaces and path delimiters are converted to double
underscores `__`.
--

[discrete]
[[ml-nlp-pytorch-proxy]]
==== Connect to Elasticsearch through a proxy

Behind the scenes, Eland uses the `requests` Python library, which
https://requests.readthedocs.io/en/latest/user/advanced/#proxies[allows configuring proxies through an environment variable].
For example, to use an HTTP proxy to connect to an HTTPS Elasticsearch cluster,
you need to set the `HTTPS_PROXY` environment variable when invoking Eland:

[source,bash]
--------------------------------------------------
HTTPS_PROXY=http://proxy-host:proxy-port eland_import_hub_model ...
--------------------------------------------------

If you disabled security on your Elasticsearch cluster, you should use
`HTTP_PROXY` instead.

[discrete]
[[ml-nlp-pytorch-auth]]
==== Authentication methods

The following authentication options are available when using the import
script:

* Elasticsearch username and password authentication (specified with the `-u`
and `-p` options):
+
--
[source,bash]
--------------------------------------------------
eland_import_hub_model -u <username> -p <password> --cloud-id <cloud id> ...
--------------------------------------------------
These `-u` and `-p` options also work when you use `--url`.
--

* Elasticsearch username and password authentication (embedded in the URL):
+
--
[source,bash]
--------------------------------------------------
eland_import_hub_model --url https://<user>:<password>@<hostname>:<port> ...
--------------------------------------------------
--

* Elasticsearch API key authentication:
+
--
[source,bash]
--------------------------------------------------
eland_import_hub_model --es-api-key <api key> --url https://<hostname>:<port> ...
--------------------------------------------------
--

* Hugging Face Hub access token (for private models):
+
--
[source,bash]
--------------------------------------------------
eland_import_hub_model --hub-access-token <access token> ...
--------------------------------------------------
--
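If you prefer API key authentication, you need an encoded key to pass to
`--es-api-key`. The following is a minimal sketch of creating one with the
`elasticsearch` Python client; the client call and the assumption that the
returned `encoded` value is accepted by the script are illustrations, not part
of this guide.

[source,python]
----
# A minimal sketch: create an API key for the import script.
# Assumption: the `encoded` value returned by the security API can be
# passed directly to --es-api-key.
from elasticsearch import Elasticsearch

es = Elasticsearch(
    "https://localhost:9200",
    basic_auth=("elastic", "password"),  # an account allowed to create API keys
)

resp = es.security.create_api_key(name="eland-import")
print(resp["encoded"])  # pass this value to --es-api-key
----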
[discrete]
[[ml-nlp-pytorch-tls]]
==== TLS/SSL

The following TLS/SSL options for Elasticsearch are available when using the
import script:

* Specify an alternate CA bundle to verify the cluster certificate:
+
--
[source,bash]
--------------------------------------------------
eland_import_hub_model --ca-certs CA_CERTS ...
--------------------------------------------------
--

* Disable TLS/SSL verification altogether (strongly discouraged):
+
--
[source,bash]
--------------------------------------------------
eland_import_hub_model --insecure ...
--------------------------------------------------
--
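The same kind of verification applies when you connect to the cluster directly
from Python, for example to run inference after an import. The following is a
minimal sketch using the `elasticsearch` Python client; the `ca_certs` and
`verify_certs` parameters belong to that client, not to
`eland_import_hub_model`, and the paths and credentials are placeholders.

[source,python]
----
# A minimal sketch: connect with an alternate CA bundle
# (mirrors what --ca-certs does for the import script).
from elasticsearch import Elasticsearch

es = Elasticsearch(
    "https://localhost:9200",
    ca_certs="/path/to/ca.crt",          # alternate CA bundle
    basic_auth=("elastic", "password"),  # placeholder credentials
)

# Disabling verification (mirrors --insecure) is strongly discouraged:
# es = Elasticsearch("https://localhost:9200", verify_certs=False)

print(es.info()["version"]["number"])
----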