docs-builder: add pull-requests: write permission to docs-build workflow (#800 )

Fix lint (#798 )
Update README.md (#796 )
2025-07-11 00:02:14 +08:00 · 2025-06-23 15:39:36 +04:00 · 2025-06-05 15:52:19 +04:00 · 2025-05-16 15:56:20 +01:00 · 2025-04-30 17:25:32 +04:00 · 2025-04-30 16:25:33 +04:00
20 changed files with 472 additions and 326 deletions
--- a/.buildkite/pipeline.yml
+++ b/.buildkite/pipeline.yml
@ -45,5 +45,6 @@ steps:
          - '3.10'
          - '3.9'
        stack:          
-          - '9.0.0-SNAPSHOT'
+          - '9.0.0'
+          - '9.1.0-SNAPSHOT'
    command: ./.buildkite/run-tests
--- a/.github/workflows/docs-build.yml
+++ b/.github/workflows/docs-build.yml
@ -16,4 +16,4 @@ jobs:
      deployments: write
      id-token: write
      contents: read
-      pull-requests: read
+      pull-requests: write
--- a/CHANGELOG.rst
+++ b/CHANGELOG.rst
@ -2,6 +2,14 @@
 Changelog
 =========

+9.0.1 (2025-04-30)
+------------------
+
+* Forbid Elasticsearch 8 client or server (`#780 <https://github.com/elastic/eland/pull/780>`_)
+* Fix DeBERTa tokenization (`#769 <https://github.com/elastic/eland/pull/769>`_)
+* Upgrade PyTorch to 2.5.1 (`#785 <https://github.com/elastic/eland/pull/785>`_)
+* Upgrade LightGBM to 4.6.0 (`#782 <https://github.com/elastic/eland/pull/782>`_)
+
 9.0.0 (2025-04-15)
 ------------------

--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@ -78,9 +78,15 @@ Once your changes and tests are ready to submit for review:
    # Run Auto-format, lint, mypy type checker for your changes
    $ nox -s format

-    # Run the test suite
-    $ pytest --doctest-modules eland/ tests/
-    $ pytest --nbval tests/notebook/
+    # Launch Elasticsearch with a trial licence and ML enabled
+    $ docker run --name elasticsearch -p 9200:9200 -e "discovery.type=single-node" -e "xpack.security.enabled=false" -e "xpack.license.self_generated.type=trial" docker.elastic.co/elasticsearch/elasticsearch:9.0.0
+
+    # See all test suites
+    $ nox -l
+    # Run a specific test suite
+    $ nox -rs "test-3.12(pandas_version='2.2.3')"
+    # Run a specific test
+    $ nox -rs "test-3.12(pandas_version='2.2.3')" -- -k test_learning_to_rank

    ```

--- a/2
+++ b/2
@ -18,7 +18,7 @@ RUN --mount=type=cache,target=/root/.cache/pip \
    if [ "$TARGETPLATFORM" = "linux/amd64" ]; then \
      python3 -m pip install \
        --no-cache-dir --disable-pip-version-check --extra-index-url https://download.pytorch.org/whl/cpu  \
-        torch==2.3.1+cpu .[all]; \
+        torch==2.5.1+cpu .[all]; \
    else \
      python3 -m pip install \
        --no-cache-dir --disable-pip-version-check \
--- a/Dockerfile.wolfi
+++ b/Dockerfile.wolfi
@ -13,7 +13,7 @@ RUN --mount=type=cache,target=/root/.cache/pip \
    if [ "$TARGETPLATFORM" = "linux/amd64" ]; then \
      python3 -m pip install \
        --no-cache-dir --disable-pip-version-check --extra-index-url https://download.pytorch.org/whl/cpu  \
-        torch==2.3.1+cpu .[all]; \
+        torch==2.5.1+cpu .[all]; \
    else \
      python3 -m pip install \
        --no-cache-dir --disable-pip-version-check \
--- a/README.md
+++ b/README.md
@ -53,7 +53,8 @@ $ conda install -c conda-forge eland

 ### Compatibility

- Supports Python 3.9, 3.10, 3.11, 3.12 and Pandas 1.5
+- Supports Python 3.9, 3.10, 3.11 and 3.12.
+- Supports Pandas 1.5 and 2.
 - Supports Elasticsearch 8+ clusters, recommended 8.16 or later for all features to work.
  If you are using the NLP with PyTorch feature make sure your Eland minor version matches the minor 
  version of your Elasticsearch cluster. For all other features it is sufficient for the major versions
--- a/eland/_version.py
+++ b/eland/_version.py
@ -18,7 +18,7 @@
 __title__ = "eland"
 __description__ = "Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch"
 __url__ = "https://github.com/elastic/eland"
-__version__ = "9.0.0"
+__version__ = "9.0.1"
 __author__ = "Steve Dodson"
 __author_email__ = "steve.dodson@elastic.co"
 __maintainer__ = "Elastic Client Library Maintainers"
--- a/eland/cli/eland_import_hub_model.py
+++ b/eland/cli/eland_import_hub_model.py
@ -236,14 +236,9 @@ def check_cluster_version(es_client, logger):
            f"Elasticsearch version {major_version} does not support NLP models. Please upgrade Elasticsearch to the latest version"
        )
        exit(1)
-
-    # PyTorch was upgraded to version 2.3.1 in 8.15.2
-    # and is incompatible with earlier versions
-    if sem_ver < (8, 15, 2):
-        import torch
-
+    elif major_version < 9:
        logger.error(
-            f"Eland uses PyTorch version {torch.__version__} which is incompatible with Elasticsearch versions prior to 8.15.2. Please upgrade Elasticsearch to at least version 8.15.2"
+            "Eland 9.x does not support Elasticsearch 8.x. Please upgrade Elasticsearch first."
        )
        exit(1)

--- a/eland/index.py
+++ b/eland/index.py
@ -50,10 +50,7 @@ class Index:
        # index_field.setter
        self._is_source_field = False

-        # The type:ignore is due to mypy not being smart enough
-        # to recognize the property.setter has a different type
-        # than the property.getter.
-        self.es_index_field = es_index_field  # type: ignore
+        self.es_index_field = es_index_field

    @property
    def sort_field(self) -> str:
--- a/eland/ml/_model_serializer.py
+++ b/eland/ml/_model_serializer.py
@ -19,7 +19,7 @@ import base64
 import gzip
 import json
 from abc import ABC
-from typing import Any, Dict, List, Optional, Sequence
+from typing import Any, Dict, List, Optional, Sequence, Tuple


 def add_if_exists(d: Dict[str, Any], k: str, v: Any) -> None:
@ -58,6 +58,9 @@ class ModelSerializer(ABC):
            "ascii"
        )

+    def bounds(self) -> Tuple[float, float]:
+        raise NotImplementedError
+

 class TreeNode:
    def __init__(
@ -129,6 +132,14 @@ class Tree(ModelSerializer):
        add_if_exists(d, "tree_structure", [t.to_dict() for t in self._tree_structure])
        return {"tree": d}

+    def bounds(self) -> Tuple[float, float]:
+        leaf_values = [
+            tree_node._leaf_value[0]
+            for tree_node in self._tree_structure
+            if tree_node._leaf_value is not None
+        ]
+        return min(leaf_values), max(leaf_values)
+

 class Ensemble(ModelSerializer):
    def __init__(
@ -158,3 +169,9 @@ class Ensemble(ModelSerializer):
        add_if_exists(d, "classification_weights", self._classification_weights)
        add_if_exists(d, "aggregate_output", self._output_aggregator)
        return {"ensemble": d}
+
+    def bounds(self) -> Tuple[float, float]:
+        min_bound, max_bound = tuple(
+            map(sum, zip(*[model.bounds() for model in self._trained_models]))
+        )
+        return min_bound, max_bound
--- a/eland/ml/pytorch/_pytorch_model.py
+++ b/eland/ml/pytorch/_pytorch_model.py
@ -126,6 +126,7 @@ class PyTorchModel:
    def infer(
        self,
        docs: List[Mapping[str, str]],
+        inference_config: Optional[Mapping[str, Any]] = None,
        timeout: str = DEFAULT_TIMEOUT,
    ) -> Any:
        if docs is None:
@ -133,6 +134,8 @@ class PyTorchModel:

        __body: Dict[str, Any] = {}
        __body["docs"] = docs
+        if inference_config is not None:
+            __body["inference_config"] = inference_config

        __path = f"/_ml/trained_models/{_quote(self.model_id)}/_infer"
        __query: Dict[str, Any] = {}
--- a/eland/ml/pytorch/nlp_ml_model.py
+++ b/eland/ml/pytorch/nlp_ml_model.py
@ -86,7 +86,7 @@ class NlpXLMRobertaTokenizationConfig(NlpTokenizationConfig):
        )


-class DebertaV2Config(NlpTokenizationConfig):
+class NlpDebertaV2TokenizationConfig(NlpTokenizationConfig):
    def __init__(
        self,
        *,
--- a/eland/ml/pytorch/transformers.py
+++ b/eland/ml/pytorch/transformers.py
@ -25,17 +25,13 @@ import os.path
 import random
 import re
 from abc import ABC, abstractmethod
-from typing import Any, Dict, List, Optional, Set, Tuple, Union
+from typing import Dict, List, Optional, Set, Tuple, Union

 import torch  # type: ignore
 import transformers  # type: ignore
-from sentence_transformers import SentenceTransformer  # type: ignore
-from torch import Tensor, nn
+from torch import Tensor
 from torch.profiler import profile  # type: ignore
 from transformers import (
-    AutoConfig,
-    AutoModel,
-    AutoModelForQuestionAnswering,
    BertTokenizer,
    PretrainedConfig,
    PreTrainedModel,
@ -44,11 +40,11 @@ from transformers import (
 )

 from eland.ml.pytorch.nlp_ml_model import (
-    DebertaV2Config,
    FillMaskInferenceOptions,
    NerInferenceOptions,
    NlpBertJapaneseTokenizationConfig,
    NlpBertTokenizationConfig,
+    NlpDebertaV2TokenizationConfig,
    NlpMPNetTokenizationConfig,
    NlpRobertaTokenizationConfig,
    NlpTokenizationConfig,
@ -65,8 +61,13 @@ from eland.ml.pytorch.nlp_ml_model import (
    ZeroShotClassificationInferenceOptions,
 )
 from eland.ml.pytorch.traceable_model import TraceableModel
+from eland.ml.pytorch.wrappers import (
+    _DistilBertWrapper,
+    _DPREncoderWrapper,
+    _QuestionAnsweringWrapperModule,
+    _SentenceTransformerWrapperModule,
+)

-DEFAULT_OUTPUT_KEY = "sentence_embedding"
 SUPPORTED_TASK_TYPES = {
    "fill_mask",
    "ner",
@ -172,284 +173,6 @@ def task_type_from_model_config(model_config: PretrainedConfig) -> Optional[str]
    return potential_task_types.pop()


-class _QuestionAnsweringWrapperModule(nn.Module):  # type: ignore
-    """
-    A wrapper around a question answering model.
-    Our inference engine only takes the first tuple if the inference response
-    is a tuple.
-
-    This wrapper transforms the output to be a stacked tensor if its a tuple.
-
-    Otherwise it passes it through
-    """
-
-    def __init__(self, model: PreTrainedModel):
-        super().__init__()
-        self._hf_model = model
-        self.config = model.config
-
-    @staticmethod
-    def from_pretrained(model_id: str, *, token: Optional[str] = None) -> Optional[Any]:
-        model = AutoModelForQuestionAnswering.from_pretrained(
-            model_id, token=token, torchscript=True
-        )
-        if isinstance(
-            model.config,
-            (
-                transformers.MPNetConfig,
-                transformers.XLMRobertaConfig,
-                transformers.RobertaConfig,
-                transformers.BartConfig,
-            ),
-        ):
-            return _TwoParameterQuestionAnsweringWrapper(model)
-        else:
-            return _QuestionAnsweringWrapper(model)
-
-
-class _QuestionAnsweringWrapper(_QuestionAnsweringWrapperModule):
-    def __init__(self, model: PreTrainedModel):
-        super().__init__(model=model)
-
-    def forward(
-        self,
-        input_ids: Tensor,
-        attention_mask: Tensor,
-        token_type_ids: Tensor,
-        position_ids: Tensor,
-    ) -> Tensor:
-        """Wrap the input and output to conform to the native process interface."""
-
-        inputs = {
-            "input_ids": input_ids,
-            "attention_mask": attention_mask,
-            "token_type_ids": token_type_ids,
-            "position_ids": position_ids,
-        }
-
-        # remove inputs for specific model types
-        if isinstance(self._hf_model.config, transformers.DistilBertConfig):
-            del inputs["token_type_ids"]
-            del inputs["position_ids"]
-        response = self._hf_model(**inputs)
-        if isinstance(response, tuple):
-            return torch.stack(list(response), dim=0)
-        return response
-
-
-class _TwoParameterQuestionAnsweringWrapper(_QuestionAnsweringWrapperModule):
-    def __init__(self, model: PreTrainedModel):
-        super().__init__(model=model)
-
-    def forward(self, input_ids: Tensor, attention_mask: Tensor) -> Tensor:
-        """Wrap the input and output to conform to the native process interface."""
-        inputs = {
-            "input_ids": input_ids,
-            "attention_mask": attention_mask,
-        }
-        response = self._hf_model(**inputs)
-        if isinstance(response, tuple):
-            return torch.stack(list(response), dim=0)
-        return response
-
-
-class _DistilBertWrapper(nn.Module):  # type: ignore
-    """
-    In Elasticsearch the BERT tokenizer is used for DistilBERT models but
-    the BERT tokenizer produces 4 inputs where DistilBERT models expect 2.
-
-    Wrap the model's forward function in a method that accepts the 4
-    arguments passed to a BERT model then discard the token_type_ids
-    and the position_ids to match the wrapped DistilBERT model forward
-    function
-    """
-
-    def __init__(self, model: transformers.PreTrainedModel):
-        super().__init__()
-        self._model = model
-        self.config = model.config
-
-    @staticmethod
-    def try_wrapping(model: PreTrainedModel) -> Optional[Any]:
-        if isinstance(model.config, transformers.DistilBertConfig):
-            return _DistilBertWrapper(model)
-        else:
-            return model
-
-    def forward(
-        self,
-        input_ids: Tensor,
-        attention_mask: Tensor,
-        _token_type_ids: Tensor = None,
-        _position_ids: Tensor = None,
-    ) -> Tensor:
-        """Wrap the input and output to conform to the native process interface."""
-
-        return self._model(input_ids=input_ids, attention_mask=attention_mask)
-
-
-class _SentenceTransformerWrapperModule(nn.Module):  # type: ignore
-    """
-    A wrapper around sentence-transformer models to provide pooling,
-    normalization and other graph layers that are not defined in the base
-    HuggingFace transformer model.
-    """
-
-    def __init__(self, model: PreTrainedModel, output_key: str = DEFAULT_OUTPUT_KEY):
-        super().__init__()
-        self._hf_model = model
-        self._st_model = SentenceTransformer(model.config.name_or_path)
-        self._output_key = output_key
-        self.config = model.config
-
-        self._remove_pooling_layer()
-        self._replace_transformer_layer()
-
-    @staticmethod
-    def from_pretrained(
-        model_id: str,
-        tokenizer: PreTrainedTokenizer,
-        *,
-        token: Optional[str] = None,
-        output_key: str = DEFAULT_OUTPUT_KEY,
-    ) -> Optional[Any]:
-        model = AutoModel.from_pretrained(model_id, token=token, torchscript=True)
-        if isinstance(
-            tokenizer,
-            (
-                transformers.BartTokenizer,
-                transformers.MPNetTokenizer,
-                transformers.RobertaTokenizer,
-                transformers.XLMRobertaTokenizer,
-                transformers.DebertaV2Tokenizer,
-            ),
-        ):
-            return _TwoParameterSentenceTransformerWrapper(model, output_key)
-        else:
-            return _SentenceTransformerWrapper(model, output_key)
-
-    def _remove_pooling_layer(self) -> None:
-        """
-        Removes any last pooling layer which is not used to create embeddings.
-        Leaving this layer in will cause it to return a NoneType which in turn
-        will fail to load in libtorch. Alternatively, we can just use the output
-        of the pooling layer as a dummy but this also affects (if only in a
-        minor way) the performance of inference, so we're better off removing
-        the layer if we can.
-        """
-
-        if hasattr(self._hf_model, "pooler"):
-            self._hf_model.pooler = None
-
-    def _replace_transformer_layer(self) -> None:
-        """
-        Replaces the HuggingFace Transformer layer in the SentenceTransformer
-        modules so we can set it with one that has pooling layer removed and
-        was loaded ready for TorchScript export.
-        """
-
-        self._st_model._modules["0"].auto_model = self._hf_model
-
-
-class _SentenceTransformerWrapper(_SentenceTransformerWrapperModule):
-    def __init__(self, model: PreTrainedModel, output_key: str = DEFAULT_OUTPUT_KEY):
-        super().__init__(model=model, output_key=output_key)
-
-    def forward(
-        self,
-        input_ids: Tensor,
-        attention_mask: Tensor,
-        token_type_ids: Tensor,
-        position_ids: Tensor,
-    ) -> Tensor:
-        """Wrap the input and output to conform to the native process interface."""
-
-        inputs = {
-            "input_ids": input_ids,
-            "attention_mask": attention_mask,
-            "token_type_ids": token_type_ids,
-            "position_ids": position_ids,
-        }
-
-        # remove inputs for specific model types
-        if isinstance(self._hf_model.config, transformers.DistilBertConfig):
-            del inputs["token_type_ids"]
-
-        return self._st_model(inputs)[self._output_key]
-
-
-class _TwoParameterSentenceTransformerWrapper(_SentenceTransformerWrapperModule):
-    def __init__(self, model: PreTrainedModel, output_key: str = DEFAULT_OUTPUT_KEY):
-        super().__init__(model=model, output_key=output_key)
-
-    def forward(self, input_ids: Tensor, attention_mask: Tensor) -> Tensor:
-        """Wrap the input and output to conform to the native process interface."""
-        inputs = {
-            "input_ids": input_ids,
-            "attention_mask": attention_mask,
-        }
-        return self._st_model(inputs)[self._output_key]
-
-
-class _DPREncoderWrapper(nn.Module):  # type: ignore
-    """
-    AutoModel loading does not work for DPRContextEncoders, this only exists as
-    a workaround. This may never be fixed so this is likely permanent.
-    See: https://github.com/huggingface/transformers/issues/13670
-    """
-
-    _SUPPORTED_MODELS = {
-        transformers.DPRContextEncoder,
-        transformers.DPRQuestionEncoder,
-    }
-    _SUPPORTED_MODELS_NAMES = set([x.__name__ for x in _SUPPORTED_MODELS])
-
-    def __init__(
-        self,
-        model: Union[transformers.DPRContextEncoder, transformers.DPRQuestionEncoder],
-    ):
-        super().__init__()
-        self._model = model
-        self.config = model.config
-
-    @staticmethod
-    def from_pretrained(model_id: str, *, token: Optional[str] = None) -> Optional[Any]:
-        config = AutoConfig.from_pretrained(model_id, token=token)
-
-        def is_compatible() -> bool:
-            is_dpr_model = config.model_type == "dpr"
-            has_architectures = (
-                config.architectures is not None and len(config.architectures) == 1
-            )
-            is_supported_architecture = has_architectures and (
-                config.architectures[0] in _DPREncoderWrapper._SUPPORTED_MODELS_NAMES
-            )
-            return is_dpr_model and is_supported_architecture
-
-        if is_compatible():
-            model = getattr(transformers, config.architectures[0]).from_pretrained(
-                model_id, torchscript=True
-            )
-            return _DPREncoderWrapper(model)
-        else:
-            return None
-
-    def forward(
-        self,
-        input_ids: Tensor,
-        attention_mask: Tensor,
-        token_type_ids: Tensor,
-        _position_ids: Tensor,
-    ) -> Tensor:
-        """Wrap the input and output to conform to the native process interface."""
-
-        return self._model(
-            input_ids=input_ids,
-            attention_mask=attention_mask,
-            token_type_ids=token_type_ids,
-        )
-
-
 class _TransformerTraceableModel(TraceableModel):
    """A base class representing a HuggingFace transformer model that can be traced."""

@ -489,12 +212,17 @@ class _TransformerTraceableModel(TraceableModel):
                transformers.MPNetTokenizer,
                transformers.RobertaTokenizer,
                transformers.XLMRobertaTokenizer,
-                transformers.DebertaV2Tokenizer,
            ),
        ):
-            del inputs["token_type_ids"]
            return (inputs["input_ids"], inputs["attention_mask"])

+        if isinstance(self._tokenizer, transformers.DebertaV2Tokenizer):
+            return (
+                inputs["input_ids"],
+                inputs["attention_mask"],
+                inputs["token_type_ids"],
+            )
+
        position_ids = torch.arange(inputs["input_ids"].size(1), dtype=torch.long)
        inputs["position_ids"] = position_ids
        return (
@ -694,7 +422,12 @@ class TransformerModel:
                " ".join(m) for m, _ in sorted(ranks.items(), key=lambda kv: kv[1])
            ]
            vocab_obj["merges"] = merges
-        sp_model = getattr(self._tokenizer, "sp_model", None)
+
+        if isinstance(self._tokenizer, transformers.DebertaV2Tokenizer):
+            sp_model = self._tokenizer._tokenizer.spm
+        else:
+            sp_model = getattr(self._tokenizer, "sp_model", None)
+
        if sp_model:
            id_correction = getattr(self._tokenizer, "fairseq_offset", 0)
            scores = []
@ -733,7 +466,7 @@ class TransformerModel:
                max_sequence_length=_max_sequence_length
            )
        elif isinstance(self._tokenizer, transformers.DebertaV2Tokenizer):
-            return DebertaV2Config(
+            return NlpDebertaV2TokenizationConfig(
                max_sequence_length=_max_sequence_length,
                do_lower_case=getattr(self._tokenizer, "do_lower_case", None),
            )
--- a/eland/ml/pytorch/wrappers.py
+++ b/eland/ml/pytorch/wrappers.py
@ -0,0 +1,317 @@
+#  Licensed to Elasticsearch B.V. under one or more contributor
+#  license agreements. See the NOTICE file distributed with
+#  this work for additional information regarding copyright
+#  ownership. Elasticsearch B.V. licenses this file to you under
+#  the Apache License, Version 2.0 (the "License"); you may
+#  not use this file except in compliance with the License.
+#  You may obtain a copy of the License at
+#
+# 	http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing,
+#  software distributed under the License is distributed on an
+#  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+#  KIND, either express or implied.  See the License for the
+#  specific language governing permissions and limitations
+#  under the License.
+
+"""
+This module contains the wrapper classes for the Hugging Face models.
+Wrapping is necessary to ensure that the forward method of the model
+is called with the same arguments the ml-cpp pytorch_inference process
+uses.
+"""
+
+from typing import Any, Optional, Union
+
+import torch  # type: ignore
+import transformers  # type: ignore
+from sentence_transformers import SentenceTransformer  # type: ignore
+from torch import Tensor, nn
+from transformers import (
+    AutoConfig,
+    AutoModel,
+    AutoModelForQuestionAnswering,
+    PreTrainedModel,
+    PreTrainedTokenizer,
+)
+
+DEFAULT_OUTPUT_KEY = "sentence_embedding"
+
+
+class _QuestionAnsweringWrapperModule(nn.Module):  # type: ignore
+    """
+    A wrapper around a question answering model.
+    Our inference engine only takes the first tuple if the inference response
+    is a tuple.
+
+    This wrapper transforms the output to be a stacked tensor if its a tuple.
+
+    Otherwise it passes it through
+    """
+
+    def __init__(self, model: PreTrainedModel):
+        super().__init__()
+        self._hf_model = model
+        self.config = model.config
+
+    @staticmethod
+    def from_pretrained(model_id: str, *, token: Optional[str] = None) -> Optional[Any]:
+        model = AutoModelForQuestionAnswering.from_pretrained(
+            model_id, token=token, torchscript=True
+        )
+        if isinstance(
+            model.config,
+            (
+                transformers.MPNetConfig,
+                transformers.XLMRobertaConfig,
+                transformers.RobertaConfig,
+                transformers.BartConfig,
+            ),
+        ):
+            return _TwoParameterQuestionAnsweringWrapper(model)
+        else:
+            return _QuestionAnsweringWrapper(model)
+
+
+class _QuestionAnsweringWrapper(_QuestionAnsweringWrapperModule):
+    def __init__(self, model: PreTrainedModel):
+        super().__init__(model=model)
+
+    def forward(
+        self,
+        input_ids: Tensor,
+        attention_mask: Tensor,
+        token_type_ids: Tensor,
+        position_ids: Tensor,
+    ) -> Tensor:
+        """Wrap the input and output to conform to the native process interface."""
+
+        inputs = {
+            "input_ids": input_ids,
+            "attention_mask": attention_mask,
+            "token_type_ids": token_type_ids,
+            "position_ids": position_ids,
+        }
+
+        # remove inputs for specific model types
+        if isinstance(self._hf_model.config, transformers.DistilBertConfig):
+            del inputs["token_type_ids"]
+            del inputs["position_ids"]
+        response = self._hf_model(**inputs)
+        if isinstance(response, tuple):
+            return torch.stack(list(response), dim=0)
+        return response
+
+
+class _TwoParameterQuestionAnsweringWrapper(_QuestionAnsweringWrapperModule):
+    def __init__(self, model: PreTrainedModel):
+        super().__init__(model=model)
+
+    def forward(self, input_ids: Tensor, attention_mask: Tensor) -> Tensor:
+        """Wrap the input and output to conform to the native process interface."""
+        inputs = {
+            "input_ids": input_ids,
+            "attention_mask": attention_mask,
+        }
+        response = self._hf_model(**inputs)
+        if isinstance(response, tuple):
+            return torch.stack(list(response), dim=0)
+        return response
+
+
+class _DistilBertWrapper(nn.Module):  # type: ignore
+    """
+    In Elasticsearch the BERT tokenizer is used for DistilBERT models but
+    the BERT tokenizer produces 4 inputs where DistilBERT models expect 2.
+
+    Wrap the model's forward function in a method that accepts the 4
+    arguments passed to a BERT model then discard the token_type_ids
+    and the position_ids to match the wrapped DistilBERT model forward
+    function
+    """
+
+    def __init__(self, model: transformers.PreTrainedModel):
+        super().__init__()
+        self._model = model
+        self.config = model.config
+
+    @staticmethod
+    def try_wrapping(model: PreTrainedModel) -> Optional[Any]:
+        if isinstance(model.config, transformers.DistilBertConfig):
+            return _DistilBertWrapper(model)
+        else:
+            return model
+
+    def forward(
+        self,
+        input_ids: Tensor,
+        attention_mask: Tensor,
+        _token_type_ids: Tensor = None,
+        _position_ids: Tensor = None,
+    ) -> Tensor:
+        """Wrap the input and output to conform to the native process interface."""
+
+        return self._model(input_ids=input_ids, attention_mask=attention_mask)
+
+
+class _SentenceTransformerWrapperModule(nn.Module):  # type: ignore
+    """
+    A wrapper around sentence-transformer models to provide pooling,
+    normalization and other graph layers that are not defined in the base
+    HuggingFace transformer model.
+    """
+
+    def __init__(self, model: PreTrainedModel, output_key: str = DEFAULT_OUTPUT_KEY):
+        super().__init__()
+        self._hf_model = model
+        self._st_model = SentenceTransformer(model.config.name_or_path)
+        self._output_key = output_key
+        self.config = model.config
+
+        self._remove_pooling_layer()
+        self._replace_transformer_layer()
+
+    @staticmethod
+    def from_pretrained(
+        model_id: str,
+        tokenizer: PreTrainedTokenizer,
+        *,
+        token: Optional[str] = None,
+        output_key: str = DEFAULT_OUTPUT_KEY,
+    ) -> Optional[Any]:
+        model = AutoModel.from_pretrained(model_id, token=token, torchscript=True)
+        if isinstance(
+            tokenizer,
+            (
+                transformers.BartTokenizer,
+                transformers.MPNetTokenizer,
+                transformers.RobertaTokenizer,
+                transformers.XLMRobertaTokenizer,
+                transformers.DebertaV2Tokenizer,
+            ),
+        ):
+            return _TwoParameterSentenceTransformerWrapper(model, output_key)
+        else:
+            return _SentenceTransformerWrapper(model, output_key)
+
+    def _remove_pooling_layer(self) -> None:
+        """
+        Removes any last pooling layer which is not used to create embeddings.
+        Leaving this layer in will cause it to return a NoneType which in turn
+        will fail to load in libtorch. Alternatively, we can just use the output
+        of the pooling layer as a dummy but this also affects (if only in a
+        minor way) the performance of inference, so we're better off removing
+        the layer if we can.
+        """
+
+        if hasattr(self._hf_model, "pooler"):
+            self._hf_model.pooler = None
+
+    def _replace_transformer_layer(self) -> None:
+        """
+        Replaces the HuggingFace Transformer layer in the SentenceTransformer
+        modules so we can set it with one that has pooling layer removed and
+        was loaded ready for TorchScript export.
+        """
+
+        self._st_model._modules["0"].auto_model = self._hf_model
+
+
+class _SentenceTransformerWrapper(_SentenceTransformerWrapperModule):
+    def __init__(self, model: PreTrainedModel, output_key: str = DEFAULT_OUTPUT_KEY):
+        super().__init__(model=model, output_key=output_key)
+
+    def forward(
+        self,
+        input_ids: Tensor,
+        attention_mask: Tensor,
+        token_type_ids: Tensor,
+        position_ids: Tensor,
+    ) -> Tensor:
+        """Wrap the input and output to conform to the native process interface."""
+
+        inputs = {
+            "input_ids": input_ids,
+            "attention_mask": attention_mask,
+            "token_type_ids": token_type_ids,
+            "position_ids": position_ids,
+        }
+
+        # remove inputs for specific model types
+        if isinstance(self._hf_model.config, transformers.DistilBertConfig):
+            del inputs["token_type_ids"]
+
+        return self._st_model(inputs)[self._output_key]
+
+
+class _TwoParameterSentenceTransformerWrapper(_SentenceTransformerWrapperModule):
+    def __init__(self, model: PreTrainedModel, output_key: str = DEFAULT_OUTPUT_KEY):
+        super().__init__(model=model, output_key=output_key)
+
+    def forward(self, input_ids: Tensor, attention_mask: Tensor) -> Tensor:
+        """Wrap the input and output to conform to the native process interface."""
+        inputs = {
+            "input_ids": input_ids,
+            "attention_mask": attention_mask,
+        }
+        return self._st_model(inputs)[self._output_key]
+
+
+class _DPREncoderWrapper(nn.Module):  # type: ignore
+    """
+    AutoModel loading does not work for DPRContextEncoders, this only exists as
+    a workaround. This may never be fixed so this is likely permanent.
+    See: https://github.com/huggingface/transformers/issues/13670
+    """
+
+    _SUPPORTED_MODELS = {
+        transformers.DPRContextEncoder,
+        transformers.DPRQuestionEncoder,
+    }
+    _SUPPORTED_MODELS_NAMES = set([x.__name__ for x in _SUPPORTED_MODELS])
+
+    def __init__(
+        self,
+        model: Union[transformers.DPRContextEncoder, transformers.DPRQuestionEncoder],
+    ):
+        super().__init__()
+        self._model = model
+        self.config = model.config
+
+    @staticmethod
+    def from_pretrained(model_id: str, *, token: Optional[str] = None) -> Optional[Any]:
+        config = AutoConfig.from_pretrained(model_id, token=token)
+
+        def is_compatible() -> bool:
+            is_dpr_model = config.model_type == "dpr"
+            has_architectures = (
+                config.architectures is not None and len(config.architectures) == 1
+            )
+            is_supported_architecture = has_architectures and (
+                config.architectures[0] in _DPREncoderWrapper._SUPPORTED_MODELS_NAMES
+            )
+            return is_dpr_model and is_supported_architecture
+
+        if is_compatible():
+            model = getattr(transformers, config.architectures[0]).from_pretrained(
+                model_id, torchscript=True
+            )
+            return _DPREncoderWrapper(model)
+        else:
+            return None
+
+    def forward(
+        self,
+        input_ids: Tensor,
+        attention_mask: Tensor,
+        token_type_ids: Tensor,
+        _position_ids: Tensor,
+    ) -> Tensor:
+        """Wrap the input and output to conform to the native process interface."""
+
+        return self._model(
+            input_ids=input_ids,
+            attention_mask=attention_mask,
+            token_type_ids=token_type_ids,
+        )
--- a/noxfile.py
+++ b/noxfile.py
@ -116,9 +116,6 @@ def test(session, pandas_version: str):
        "--nbval",
    )

-    # PyTorch 2.3.1 doesn't support Python 3.12
-    if session.python == "3.12":
-        pytest_args += ("--ignore=eland/ml/pytorch",)
    session.run(
        *pytest_args,
        *(session.posargs or ("eland/", "tests/")),
--- a/setup.py
+++ b/setup.py
@ -57,10 +57,10 @@ with open(path.join(here, "README.md"), "r", "utf-8") as f:
 extras = {
    "xgboost": ["xgboost>=0.90,<2"],
    "scikit-learn": ["scikit-learn>=1.3,<1.6"],
-    "lightgbm": ["lightgbm>=2,<4"],
+    "lightgbm": ["lightgbm>=3,<5"],
    "pytorch": [
        "requests<3",
-        "torch==2.3.1",
+        "torch==2.5.1",
        "tqdm",
        "sentence-transformers>=2.1.0,<=2.7.0",
        # sentencepiece is a required dependency for the slow tokenizers
@ -86,7 +86,7 @@ setup(
    keywords="elastic eland pandas python",
    packages=find_packages(include=["eland", "eland.*"]),
    install_requires=[
-        "elasticsearch>=8.3,<9",
+        "elasticsearch>=9,<10",
        "pandas>=1.5,<3",
        "matplotlib>=3.6",
        "numpy>=1.2.0,<2",
--- a/tests/ml/pytorch/test_pytorch_model_config_pytest.py
+++ b/tests/ml/pytorch/test_pytorch_model_config_pytest.py
@ -39,6 +39,7 @@ try:
    from eland.ml.pytorch import (
        FillMaskInferenceOptions,
        NlpBertTokenizationConfig,
+        NlpDebertaV2TokenizationConfig,
        NlpMPNetTokenizationConfig,
        NlpRobertaTokenizationConfig,
        NlpXLMRobertaTokenizationConfig,
@ -57,10 +58,6 @@ except ImportError:
 from tests import ES_VERSION

 pytestmark = [
-    pytest.mark.skipif(
-        ES_VERSION < (8, 15, 1),
-        reason="Eland uses Pytorch 2.3.1, versions of Elasticsearch prior to 8.15.1 are incompatible with PyTorch 2.3.1",
-    ),
    pytest.mark.skipif(
        not HAS_SKLEARN, reason="This test requires 'scikit-learn' package to run"
    ),
@ -149,6 +146,14 @@ if HAS_PYTORCH and HAS_SKLEARN and HAS_TRANSFORMERS:
            1024,
            None,
        ),
+        (
+            "microsoft/deberta-v3-xsmall",
+            "fill_mask",
+            FillMaskInferenceOptions,
+            NlpDebertaV2TokenizationConfig,
+            512,
+            None,
+        ),
    ]
 else:
    MODEL_CONFIGURATIONS = []
--- a/tests/ml/pytorch/test_pytorch_model_upload_pytest.py
+++ b/tests/ml/pytorch/test_pytorch_model_upload_pytest.py
@ -38,10 +38,6 @@ except ImportError:
 from tests import ES_TEST_CLIENT, ES_VERSION

 pytestmark = [
-    pytest.mark.skipif(
-        ES_VERSION < (8, 15, 2),
-        reason="Eland uses Pytorch 2.3.1, versions of Elasticsearch prior to 8.15.2 are incompatible with PyTorch 2.3.1",
-    ),
    pytest.mark.skipif(
        not HAS_SKLEARN, reason="This test requires 'scikit-learn' package to run"
    ),
@ -67,6 +63,8 @@ TEXT_EMBEDDING_MODELS = [
    )
 ]

+TEXT_SIMILARITY_MODELS = ["mixedbread-ai/mxbai-rerank-xsmall-v1"]
+

@pytest.fixture(scope="function", autouse=True)
 def setup_and_tear_down():
@ -135,3 +133,25 @@ class TestPytorchModel:
                    )
                    > 0
                )
+
+    @pytest.mark.skipif(
+        ES_VERSION < (8, 16, 0), reason="requires 8.16.0 for DeBERTa models"
+    )
+    @pytest.mark.parametrize("model_id", TEXT_SIMILARITY_MODELS)
+    def test_text_similarity(self, model_id):
+        with tempfile.TemporaryDirectory() as tmp_dir:
+            ptm = download_model_and_start_deployment(
+                tmp_dir, False, model_id, "text_similarity"
+            )
+            result = ptm.infer(
+                docs=[
+                    {
+                        "text_field": "The Amazon rainforest covers most of the Amazon basin in South America"
+                    },
+                    {"text_field": "Paris is the capital of France"},
+                ],
+                inference_config={"text_similarity": {"text": "France"}},
+            )
+
+            assert result.body["inference_results"][0]["predicted_value"] < 0
+            assert result.body["inference_results"][1]["predicted_value"] > 0
--- a/tests/ml/test_ml_model_pytest.py
+++ b/tests/ml/test_ml_model_pytest.py
@ -22,6 +22,7 @@ import pytest

 from eland.ml import MLModel
 from eland.ml.ltr import FeatureLogger, LTRModelConfig, QueryFeatureExtractor
+from eland.ml.transformers import get_model_transformer
 from tests import (
    ES_IS_SERVERLESS,
    ES_TEST_CLIENT,
@ -219,6 +220,46 @@ class TestMLModel:
        # Clean up
        es_model.delete_model()

+    def _normalize_ltr_score_from_XGBRanker(self, ranker, ltr_model_config, scores):
+        """Normalize the scores of an XGBRanker model as ES implementation of LTR would do.
+
+        Parameters
+        ----------
+        ranker : XGBRanker
+            The XGBRanker model to retrieve the minimum score from.
+
+        ltr_model_config : LTRModelConfig
+            LTR model config.
+
+        Returns
+        -------
+        scores : List[float]
+            Normalized scores for the model.
+        """
+
+        should_rescore = (
+            (ES_VERSION[0] == 8 and ES_VERSION >= (8, 19))
+            or (
+                ES_VERSION[0] == 9
+                and (ES_VERSION[1] >= 1 or (ES_VERSION[1] == 0 and ES_VERSION[2] >= 1))
+            )
+            or ES_IS_SERVERLESS
+        )
+
+        if should_rescore:
+            # In 8.19+, 9.0.1 and 9.1, the scores are normalized if there are negative scores
+            min_model_score, _ = (
+                get_model_transformer(
+                    ranker, feature_names=ltr_model_config.feature_names
+                )
+                .transform()
+                .bounds()
+            )
+            if min_model_score < 0:
+                scores = [score - min_model_score for score in scores]
+
+        return scores
+
    @requires_elasticsearch_version((8, 12))
    @requires_xgboost
    @pytest.mark.parametrize("compress_model_definition", [True, False])
@ -330,6 +371,11 @@ class TestMLModel:
            ],
            reverse=True,
        )
+
+        expected_scores = self._normalize_ltr_score_from_XGBRanker(
+            ranker, ltr_model_config, expected_scores
+        )
+
        np.testing.assert_almost_equal(expected_scores, doc_scores, decimal=2)

        # Verify prediction is not supported for LTR
Author	SHA1	Message	Date
Jan Calanog	cef4710695	docs-builder: add `pull-requests: write` permission to docs-build workflow (#800 )	2025-06-23 15:39:36 +04:00
Quentin Pradet	44ead02b05	Fix lint (#798 )	2025-06-05 15:52:19 +04:00
Miguel Grinberg	cb7c4fb122	Update README.md (#796 ) Update Pandas support to include v2	2025-05-16 15:56:20 +01:00
Quentin Pradet	9e8f164677	Release 9.0.1	2025-04-30 17:25:32 +04:00
Quentin Pradet	3c3ffd7403	Forbid Elasticsearch 8 client or server (#780 )	2025-04-30 16:25:33 +04:00
David Kyle	f5c2dcfc9d	Remove version checks in test (#792 )	2025-04-30 16:24:05 +04:00
David Kyle	878cde6126	Upgrade PyTorch to 2.5.1 (#785 ) PyTorch was upgraded to 2.5.1 in ml-cpp on the 8.18 and 9.0 branches in elastic/ml-cpp#2800	2025-04-30 10:57:45 +01:00
Mark J. Hoy	ec45c395fd	add 9.0.1 for LTR rescoring (#790 )	2025-04-25 08:19:23 -04:00
Quentin Pradet	00dc55b3bd	Update instructions to run ML tests with Elasticsearch (#781 ) * Update instructions to run ML tests with Elasticsearch * Update CONTRIBUTING.md Co-authored-by: David Kyle <david.kyle@elastic.co> --------- Co-authored-by: David Kyle <david.kyle@elastic.co>	2025-04-24 15:42:00 +04:00
Quentin Pradet	8147eb517a	Allow lightgbm 4.6.0 (#782 )	2025-04-24 15:40:39 +04:00
Quentin Pradet	4728d9b648	Run PyTorch tests on 3.12 too (#779 ) PyTorch 2.3.1 does support Python 3.12.	2025-04-24 14:26:50 +04:00
Mark J. Hoy	51a2b9cc19	Add 9.1.0 Snapshot to Build and Fix test_ml_model Tests to Normalized Expected Scores if Min Score is Less Than Zero (#777 ) * normalized expected scores if min is < 0 * only normalize scores for ES after 8.19+ / 9.1+ * add 9.1.0 snapshot to build matrix * get min score from booster trees * removing typing on function definition * properly flatten our tree leaf scores * simplify getting min score * debugging messages * get all the matches in better way * Fix model score normalization. * lint * lint again * lint; correct return for bounds map/list * revert to Aurelian's fix * re-lint :/ --------- Co-authored-by: Aurelien FOUCRET <aurelien.foucret@elastic.co>	2025-04-23 15:53:32 +00:00
David Kyle	a9c36927f6	Fix tokeniser for DeBERTa models (#769 )	2025-04-23 09:10:02 +01:00