Compare commits

199 Commits
v8.2.0 ... main

Author SHA1 Message Date
Jan Calanog
cef4710695
docs-builder: add pull-requests: write permission to docs-build workflow (#800) 2025-06-23 15:39:36 +04:00
Quentin Pradet
44ead02b05
Fix lint (#798) 2025-06-05 15:52:19 +04:00
Miguel Grinberg
cb7c4fb122
Update README.md (#796)
Update Pandas support to include v2
2025-05-16 15:56:20 +01:00
Quentin Pradet
9e8f164677
Release 9.0.1 2025-04-30 17:25:32 +04:00
Quentin Pradet
3c3ffd7403
Forbid Elasticsearch 8 client or server (#780) 2025-04-30 16:25:33 +04:00
David Kyle
f5c2dcfc9d
Remove version checks in test (#792) 2025-04-30 16:24:05 +04:00
David Kyle
878cde6126
Upgrade PyTorch to 2.5.1 (#785)
PyTorch was upgraded to 2.5.1 in ml-cpp on the 8.18 and 9.0 branches in elastic/ml-cpp#2800
2025-04-30 10:57:45 +01:00
Mark J. Hoy
ec45c395fd
add 9.0.1 for LTR rescoring (#790) 2025-04-25 08:19:23 -04:00
Quentin Pradet
00dc55b3bd
Update instructions to run ML tests with Elasticsearch (#781)
* Update instructions to run ML tests with Elasticsearch

* Update CONTRIBUTING.md

Co-authored-by: David Kyle <david.kyle@elastic.co>

---------

Co-authored-by: David Kyle <david.kyle@elastic.co>
2025-04-24 15:42:00 +04:00
Quentin Pradet
8147eb517a
Allow lightgbm 4.6.0 (#782) 2025-04-24 15:40:39 +04:00
Quentin Pradet
4728d9b648
Run PyTorch tests on 3.12 too (#779)
PyTorch 2.3.1 does support Python 3.12.
2025-04-24 14:26:50 +04:00
Mark J. Hoy
51a2b9cc19
Add 9.1.0 Snapshot to Build and Fix test_ml_model Tests to Normalize Expected Scores if Min Score is Less Than Zero (#777)
* normalized expected scores if min is < 0

* only normalize scores for ES after 8.19+ / 9.1+

* add 9.1.0 snapshot to build matrix

* get min score from booster trees

* removing typing on function definition

* properly flatten our tree leaf scores

* simplify getting min score

* debugging messages

* get all the matches in better way

* Fix model score normalization.

* lint

* lint again

* lint; correct return for bounds map/list

* revert to Aurelien's fix

* re-lint :/

---------

Co-authored-by: Aurelien FOUCRET <aurelien.foucret@elastic.co>
2025-04-23 15:53:32 +00:00
David Kyle
a9c36927f6
Fix tokeniser for DeBERTa models (#769) 2025-04-23 09:10:02 +01:00
Quentin Pradet
87380ef716
Release 9.0.0
Co-authored-by: Miguel Grinberg <miguel.grinberg@gmail.com>
2025-04-16 15:21:04 +04:00
Quentin Pradet
9ca76d7888
Revert "Release 8.18.0" (#774)
This reverts commit ced3cdfe32bd04e3d127b18f66f9b143b2956564.
2025-04-16 14:53:51 +04:00
Quentin Pradet
ced3cdfe32
Release 8.18.0 2025-04-15 20:52:30 +04:00
kosabogi
87379c53de
[DOCS] Clean up CLI examples in ML docs (#766)
* [DOCS] Clean up CLI examples in ML docs

* Fixes spaces

* Rebuild for testing copy-paste
2025-04-07 10:06:37 +02:00
Paulo
1ddae81769
Update the documentation to reflect the partial support of eland/scikit-learn (#768) 2025-04-03 15:56:23 +02:00
Colleen McGinnis
9302bef7db
remove unused substitutions (#763) 2025-03-21 09:24:09 -05:00
Colleen McGinnis
ca64672fd7
[docs] Migrate docs from AsciiDoc to Markdown (#762)
Co-authored-by: István Zoltán Szabó <szabosteve@gmail.com>
2025-02-26 17:48:16 +01:00
Colleen McGinnis
6692251d9e
add the new ci checks (#761) 2025-02-26 16:40:43 +01:00
David Kyle
ee4d701aa4
Upgrade transformers to 4.47 (#752)
The upgrade fixes a crash tracing the baai/bge-m3 model
2025-02-12 17:30:45 +00:00
Quentin Pradet
acdeeeded2
Allow nox 2025.02.09 (#754) 2025-02-12 16:33:59 +04:00
Quentin Pradet
8350f06ea8
Fix pipeline labels (#751) 2025-02-12 15:07:51 +04:00
Quentin Pradet
e846fb7697
Add backport action (#750) 2025-02-12 15:07:43 +04:00
Quentin Pradet
c4ac64e3a0
Allow scikit-learn 1.5 to address CVE-2024-5206 (#729) 2025-02-12 14:34:13 +04:00
Jan Calanog
214c4645e9
github-action: Add AsciiDoc freeze warning (#748)
* github-action: Add AsciiDoc freeze warning

* Update .github/workflows/comment-on-asciidoc-changes.yml
2025-02-12 07:45:07 +04:00
Quentin Pradet
871e52b37a
Pin nox to avoid session.env issue (#753) 2025-02-11 18:36:57 +04:00
Quentin Pradet
aa5196edee
Switch to black's 2025 code style (#749) 2025-02-11 14:57:16 +04:00
Bart Broere
75c57b0775
Support Pandas 2 (#742)
* Fix test setup to match pandas 2.0 demands

* Use the now deprecated _append method

(Better solution might exist)

* Deal with numeric_only being removed in metrics test

* Skip mad metric for other pandas versions

* Account for differences between pandas versions in describe methods

* Run black

* Check Pandas version first

* Mirror behaviour of installed Pandas version when running value_counts

* Allow passing arguments to the individual asserters

* Fix for method _construct_axes_from_arguments no longer existing

* Skip mad metric if it does not exist

* Account for pandas 2.0 timestamp default behaviour

* Deal with empty vs other inferred data types

* Account for default datetime precision change

* Run Black

* Solution for differences in inferred_type only

* Fix csv and json issues

* Skip two doctests

* Passing a set as indexer is no longer allowed

* Don't validate output where it differs between Pandas versions in the environment

* Update test matrix and packaging metadata

* Update version of Python in the docs

* Update Python version in demo notebook

* Match noxfile

* Symmetry

* Fix trailing comma in JSON

* Revert some changes in setup.py to fix building the documentation

* Revert "Revert some changes in setup.py to fix building the documentation"

This reverts commit ea9879753129d8d8390b3cbbce57155a8b4fb346.

* Use PANDAS_VERSION from eland.common

* Still skip the doctest, but make the output pandas 2 instead of 1

* Still skip doctest, but switch to pandas 2 output

* Prepare for pandas 3

* Reference the right column

* Ignore output in tests but switch to pandas 2 output

* Add line comment about NBVAL_IGNORE_OUTPUT

* Restore missing line and add stderr cell

* Use non-private method instead

* Fix indentation and parameter issues

* If index is not specified, and pandas 1 is present, set it to True

From pandas 2 and upwards, index is set to None by default

* Run black

* Newer version of black might have different opinions?

* Add line comment

* Remove unused import

* Add reason for ignore statement

* Add reason for skip

---------

Co-authored-by: Quentin Pradet <quentin.pradet@elastic.co>
2025-02-04 17:43:43 +04:00
Valeriy Khakhutskyy
77589b26b8
Remove ML model export as sklearn Pipeline and clean up code (#744)
* Revert "[ML] Export ML model as sklearn Pipeline (#509)"

This reverts commit 0576114a1d886eafabca3191743a9bea9dc20b1a.

* Keep useful changes

* formatting

* Remove obsolete test matrix configuration and update version references in documentation and Noxfile

* formatting

---------

Co-authored-by: Quentin Pradet <quentin.pradet@elastic.co>
2025-02-04 11:36:50 +04:00
Bart Broere
9b5badb941
Drop Python 3.8 support and introduce Python 3.12 CI/CD (#743) 2025-01-22 21:55:57 +04:00
Quentin Pradet
f99adce23f
Build documentation using Docker again (#746) 2025-01-14 18:16:39 +04:00
Quentin Pradet
7774a506ae
Release 8.17.0 2025-01-07 10:58:59 +04:00
Dai Sugimori
82492fe771
Expansion support (#740) 2024-11-23 00:20:58 +09:00
Quentin Pradet
04102f2a4e
Release 8.16.0 2024-11-14 09:07:39 +04:00
Valeriy Khakhutskyy
9aec8fc751
Add deprecation warning for ESGradientBoostingModel subclasses (#738)
Introduce a warning indicating that exporting data frame analytics models as ESGradientBoostingModel subclasses is deprecated and will be removed in version 9.0.0.

The implementation of ESGradientBoostingModel relies on importing undocumented private classes that were changed in scikit-learn 1.4 (see https://github.com/scikit-learn/scikit-learn/pull/26278). This dependency makes the code difficult to maintain, and the functionality is not widely used. Therefore, we will deprecate this functionality in 8.16 and remove it completely in 9.0.0.

---------

Co-authored-by: Quentin Pradet <quentin.pradet@elastic.co>
2024-11-11 14:26:11 +01:00
Quentin Pradet
79d9a6ae29
Release 8.15.4 2024-10-18 10:52:52 +04:00
Quentin Pradet
939f4d672c
Revert "Add feedback request to README" (#735) 2024-10-18 08:06:42 +04:00
Quentin Pradet
1312e96220
Revert "Allow reading Elasticsearch certs in Wolfi image" (#734)
This reverts commit 5dabe9c0996e62d8bf4b493dcea7d4bc161dead4.
2024-10-11 16:52:41 +04:00
Quentin Pradet
2916b51fa7
Release 8.15.3 2024-10-09 16:16:52 +04:00
Quentin Pradet
5dabe9c099
Allow reading Elasticsearch certs in Wolfi image (#732)
The config/certs directory of Elasticsearch is not readable by other
users and groups. This works in the public image, which uses the root
user, but the Wolfi image does not. Using the same user id fixes the
problem.
2024-10-09 15:37:05 +04:00
Max Hniebergall
06b65e211e
Add support for DeBERTa-V2 tokenizer (#717) 2024-10-03 14:04:19 -04:00
Quentin Pradet
a45c7bc357
Release 8.15.2 2024-10-02 13:54:03 +04:00
Quentin Pradet
d1e533ffb9
Fix Docker image build on Linux (#728)
* Fix Docker image build on Linux

* Build Docker images in CI

* Fix bash syntax

* Only load, not push

* Parallelize docker build

It's currently the slowest step.

* Only build Linux images
2024-10-02 10:33:35 +04:00
Quentin Pradet
a83ce20fcc
Release 8.15.1 2024-10-01 15:31:24 +04:00
David Kyle
03af8a6319
Fix path in docker model upload example (#726) 2024-10-01 08:53:28 +01:00
David Kyle
5253501704
Upgrade PyTorch to version 2.3.1 (#718)
Upgrades the PyTorch, transformers and sentence transformer requirements.
Elasticsearch has upgraded PyTorch to 2.3.1 in 8.16 and 8.15.2. For
compatibility reasons, Eland will refuse to upload to an Elasticsearch cluster
that is using an earlier version of PyTorch.
2024-09-30 10:22:02 +01:00
David Kyle
ec66b5f320
Add ES 8.16 and 8.15.2 to test matrix (#725) 2024-09-27 13:37:31 +01:00
Quentin Pradet
64d05e4c68
Restore public Dockerfile (#722) 2024-09-25 12:49:46 +04:00
Quentin Pradet
f79180be42
Migrate to Wolfi base Docker image (#720) 2024-09-03 18:02:08 +04:00
Miguel Grinberg
0ce3db26e8
Release 8.15.0 (#715)
* Release 8.15.0

* update release notes
2024-08-13 09:47:48 +01:00
David Kyle
5a76f826df
Add note about using text_similarity for rerank to the CLI (#716) 2024-08-12 14:40:12 +01:00
David Kyle
fd8886da6a
Default truncation to second for the text similarity task type (#713)
In reranking the first input (the query) is generally shorter. In this case
it makes more sense to truncate the second input (the document text)
2024-08-05 11:47:15 +01:00
Aurélien FOUCRET
bee6d0e1f7
Remove input fields from exported LTR models (#708) 2024-07-05 14:31:22 +02:00
Bart Broere
f18aa35e8e
Deal with the possibility of lists (#707) 2024-06-28 22:25:47 +04:00
Quentin Pradet
56a46d0f85
Rename Buildkite team from clients-team to devtools-team (#702) 2024-06-12 11:39:25 +04:00
Quentin Pradet
c497683064
Quote remaining eland[pytorch] for ZSH users (#701) 2024-06-10 16:50:03 +00:00
Quentin Pradet
0ddc21b895
Release 8.14.0 2024-06-10 15:56:43 +04:00
István Zoltán Szabó
5a3e7d78b3
[DOCS] Completes the list of available NLP task types. (#699) 2024-06-10 12:30:07 +02:00
Bart Broere
1014ecdb39
Fix non _source fields missing from the result hits (#693) 2024-06-10 11:09:52 +04:00
David Kyle
632074c0f0
Make eland_import_hub_model script compatible with serverless (#698)
Checks for build_flavor == serverless rather than a version
2024-06-07 14:46:12 +01:00
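The serverless check named in #698 can be sketched as follows. This is an illustration under assumptions, not the script's actual code: the helper name `is_serverless` is hypothetical, and `info` stands for the dictionary returned by the cluster's root info endpoint, which reports `version.build_flavor`.

```python
def is_serverless(info: dict) -> bool:
    """Hypothetical helper: True if the cluster reports the serverless flavor.

    `info` is assumed to be the parsed response of the root info endpoint.
    """
    # Compare the build flavor instead of parsing a version number.
    return info.get("version", {}).get("build_flavor") == "serverless"
```

Checking the flavor avoids relying on a version string for the decision.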
Bart Broere
35a96ab3f0
Fix missing method str.removeprefix in Python 3.8 (#695) 2024-05-24 10:25:04 +04:00
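`str.removeprefix` only exists from Python 3.9 onwards, which is why it fails on 3.8. A minimal sketch of a version-independent equivalent is shown below; the function name `remove_prefix` is hypothetical and this is not necessarily how #695 fixed the problem.

```python
def remove_prefix(text: str, prefix: str) -> str:
    """Behave like str.removeprefix (Python 3.9+) on older Python 3 versions."""
    # Only strip when the prefix is non-empty and actually present.
    if prefix and text.startswith(prefix):
        return text[len(prefix):]
    return text
```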
Quentin Pradet
116416b3e8
Stop duplicating requirements (#691) 2024-05-14 15:59:39 +04:00
Ashok Kumar
5b728c29c1
Replace check for Elasticsearch to str/list in ensure_es_client (#690) 2024-05-04 09:01:31 +04:00
Quentin Pradet
e76b32eee2
Release 8.13.1 2024-05-03 09:20:45 +04:00
Quentin Pradet
fd38e26df1
Support HTTP proxies in eland_import_hub_model (#688)
* Document TLS/SSL options for import script

* Mention --help option

* Add HTTP proxy support

* Mention HTTP_PROXY too

---------

Co-authored-by: David Kyle <david.kyle@elastic.co>
2024-05-02 21:03:44 +04:00
Quentin Pradet
f7f6e0aba9
Document TLS/SSL options for import script (#667) 2024-05-02 18:06:40 +04:00
Aurélien FOUCRET
9cea2385e6
Work around LTR model cache in tests (#685) 2024-04-08 14:00:36 +04:00
Quentin Pradet
1921792df8
Release 8.13.0 2024-03-27 18:18:21 +04:00
David Kyle
c16e36c051
Add Python 3.11 to support matrix (#681) 2024-03-27 10:34:35 +00:00
David Kyle
ae0bba34c6
Upgrade torch to 2.1.2 (#671)
Compatible with Elasticsearch 8.13 where the same upgrade has been made
2024-03-26 10:06:50 +00:00
Iulia Feroli
aaec995b1b
Update overview.asciidoc to replace tuple reference to API Key (#678) 2024-03-21 15:31:19 +04:00
Iulia Feroli
de83f3f905
Improve PyTorch installation instructions (#677) 2024-03-21 14:21:32 +04:00
David Kyle
8e8c49ddbf
Mute the Learning to Rank tests (#676) 2024-03-21 10:13:31 +00:00
David Kyle
5d34dc3cc4
Add override option to specify the model's max input size (#674)
If the max input size cannot be found in the configuration the user
can specify it as a parameter to the eland_import_hub_model script
2024-03-20 10:02:43 +00:00
Bart Broere
9b335315bb
Mirror pandas' to_csv lineterminator instead of line_terminator (#595)
* Mirror pandas' to_csv lineterminator instead of line_terminator

(even though it looks a little weird perhaps)

* Remove squeeze argument

* Revert "Merge branch 'remove-squeeze-argument' into patch-2"

This reverts commit 8b9ab5647e244d78ec3471b80ee7c42e019cf347.

* Don't remove the parameter yet since people might use it

* Add pending deprecation warning

---------

Co-authored-by: David Kyle <david.kyle@elastic.co>
2024-02-23 14:23:58 +04:00
Quentin Pradet
28eda95ba9
Add feedback request to README (#665) 2024-02-15 15:23:45 +04:00
Quentin Pradet
f4b30753ad
Fix CI badge in README (#664) 2024-02-15 15:14:16 +04:00
Bart Broere
33cf029efe
Implement eland.DataFrame.to_json (#661)
Co-authored-by: Quentin Pradet <quentin.pradet@elastic.co>
2024-02-15 11:32:54 +04:00
Aurélien FOUCRET
9d492b03aa
Release 8.12.1
Co-authored-by: Quentin Pradet <quentin.pradet@elastic.co>
2024-02-01 10:50:18 +04:00
Quentin Pradet
fd2ceab846
Run Buildkite docs jobs in pull requests from forks (#652) 2024-01-31 20:55:19 +04:00
Quentin Pradet
02190e74e7
Switch to 2024 black style (#657) 2024-01-31 14:47:19 +04:00
Aurélien FOUCRET
2a6a4b1f06
Fix missing value support for XGBRanker. (#654)
* Fix missing value support for XGBRanker.

* lint

* Sort expected scores

* lint
2024-01-23 18:42:24 +01:00
Quentin Pradet
1190364abb
Release 8.12.0 2024-01-19 12:42:45 +04:00
David Kyle
64216d44fb
Add prefix_string config option to the import model hub script (#642) 2024-01-19 12:06:57 +04:00
Liam Thompson
0a6e3db157
[DOCS] Make online retail notebook runnable in Colab (#641)
* Make online retail notebook runnable in Colab

* Fix broken query
2024-01-18 15:55:20 +04:00
Aurélien FOUCRET
5169cc926a
Improve LTR (#651)
* Ensure the feature logger is using NaN for non matching query feature extractors (consistent with ES).

* Default score is None instead of 0.

* LTR model import API improvements.

* Fix feature logger tests.

* Fix export in eland.ml.ltr

* Apply suggestions from code review

Co-authored-by: Adam Demjen <demjened@gmail.com>

* Fix supported models for LTR

---------

Co-authored-by: Adam Demjen <demjened@gmail.com>
2024-01-17 13:01:47 +04:00
Aurélien FOUCRET
d2291889f8
Fix typo (#650) 2024-01-12 09:34:09 -05:00
Aurélien FOUCRET
d3ed669a5e
LTR feature logger (#648) 2024-01-12 13:52:04 +01:00
Adam Demjen
926f0b9b5c
Add XGBRanker and transformer (#649)
* Add XGBRanker and transformer

* Map XGBoostRegressorTransformer to XGBRanker

* Add unit tests

* Remove unused import

* Revert addition of type

* Update function comment

* Distinguish objective based on model class
2024-01-11 15:48:13 -05:00
Adam Demjen
840871f9d9
Accept LTR inference config when creating model (#645)
* Support for supplying inference_config

* Fix linting errors

* Add unit test

* Add LTR type, throw exception on predict, refine test

* Add search step to LTR test

* Fix linter errors

* Update rescoring assertion in test + type defs

* Fix linting error

* Remove failing assertion
2024-01-08 09:19:03 -05:00
Aurélien FOUCRET
05c5859b8a
Adding a new movie dataset to the tests. (#646) 2024-01-04 16:14:56 +01:00
Aurélien FOUCRET
0f91224daf
Add 8.12 to CI and remove 8.10 (#647) 2024-01-04 10:06:19 -05:00
Bart Broere
927acc86ad
Small cosmetic fix to the docs (#640) 2023-11-30 08:34:59 +01:00
David Kyle
6ef418f465
Release 8.11.1 2023-11-22 11:55:53 +01:00
David Kyle
081250cdec
Fix failed import of ST RoBERTa models (#637)
Fixes an error uploading the sentence-transformers/all-distilroberta-v1 model
which failed with "missing 2 required positional arguments: 'token_type_ids' 
and 'position_ids'". The cause was that the tokenizer type was not recognised 
due to a typo
2023-11-21 12:53:43 +00:00
Quentin Pradet
af26897313
Bump numpy and shap (#636) 2023-11-21 13:17:53 +01:00
David Kyle
add61a69ec
Update CI machine types to N2 (#634)
Use `n2-standard-2` for lint and doc builds
Use `n2-standard-4` for tests
2023-11-21 11:33:04 +00:00
David Kyle
b689759278
Skip model config tests (#635)
For #633
2023-11-21 11:07:55 +00:00
Liam Thompson
87d18bd850
Fix colab link (#632)
Co-authored-by: Quentin Pradet <quentin.pradet@elastic.co>
2023-11-16 10:24:06 +00:00
Quentin Pradet
dfc522eb31
Allow es-doc members to trigger CI (#631) 2023-11-13 11:55:39 +01:00
Liam Thompson
508de981ff
Make demo notebook runnable in Colab (#630)
* Make demo notebook runnable in Colab

* Index using IDs starting from 0

* Trivial change to trigger CI
2023-11-10 08:44:19 +01:00
Quentin Pradet
41db37246f
Release 8.11.0 2023-11-08 11:51:14 +01:00
Valeriy Khakhutskyy
6cecb454e3
[ML] Better memory estimation for NLP models (#568)
This PR adds the ability to estimate per-deployment and per-allocation memory usage of NLP transformer models. It uses torch.profiler and logs the peak memory usage during inference.

This information is then used in Elasticsearch to provision models with sufficient memory (elastic/elasticsearch#98874).
2023-11-06 12:18:20 +01:00
Bart Broere
28e6d92430
Stream writes in to_csv()
Co-authored-by: P. Sai Vinay <pvinay1998@gmail.com>
2023-11-06 11:39:31 +01:00
Quentin Pradet
adf0535608
Fix docs build
Some dependencies like numpy are pinned to versions that do not support
Python 3.12. Python 3.10 is the latest version supported by Eland.
2023-11-06 13:25:30 +04:00
Bart Broere
5e5f36bdf8
Deal with the mad aggregation being removed in Pandas 2 (#602) 2023-11-06 06:12:16 +01:00
David Kyle
5b3a83e7f2
[NLP] Support E5 small multi-lingual (#625)
Although E5 small is a BERT based model it takes 2 parameters to forward
not 4. Use the tokenizer type to decide the number of parameters
2023-10-31 17:49:43 +00:00
David Kyle
ab6e44f430
[NLP] Tests for NLP model configurations (#623)
Add tests for generated Elasticsearch model configurations
2023-10-19 12:39:57 +01:00
Quentin Pradet
0c0a8ab19f
Bump tested stack versions (#621) 2023-10-11 19:48:47 +02:00
Bart Broere
36b941e336
Use _append instead of append since it's still available after 2.0 of pandas (#603) 2023-10-11 15:41:05 +01:00
Quentin Pradet
6a4fd511cc
Release 8.10.1 (#620) 2023-10-11 12:56:24 +02:00
Quentin Pradet
c6ce4b2c46
Fix direct usage of TransformerModel (#619) 2023-10-11 11:56:14 +02:00
Bart Broere
48e290a927
Prepare for deprecation of is_datetime_or_timedelta_dtype in Pandas 2.0 (#592) 2023-10-10 19:37:13 +01:00
Quentin Pradet
bb0c111a68
Release Eland 8.10.0 2023-10-09 11:49:12 +02:00
Quentin Pradet
9273636026
Reduce Docker image size and support arm64 (#615)
Co-authored-by: David Olaru <dolaru@elastic.co>

* Reduce Docker image size from 4.8GB to 2.2GB

* Use torch+cpu variant if target platform is linux/amd64

Avoids downloading large & unnecessary NVIDIA deps defined in the package on PyPI

* Build linux/arm64 image using buildx and QEMU
2023-10-05 18:43:52 +04:00
Quentin Pradet
b8a7b60c03
Stop mentioning Python 3.7 and Pandas 1.13 are supported (#612) 2023-10-04 10:56:51 +02:00
Quentin Pradet
3be610b6fc
Recommend using pre-built Docker image (#614)
* Recommend using pre-built Docker image

* Update README.md

Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>

---------

Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
2023-10-03 19:40:24 +02:00
Quentin Pradet
352e31ed14
Add Buildkite pipeline to push Docker image (#613)
* Add Buildkite pipeline to push Docker image

* Fix lint

* Fix Read the Docs build

* Replace distutils with packaging
2023-10-03 14:39:54 +02:00
Quentin Pradet
9d7c042bdb
Bump transformers to fix private model support (#611) 2023-09-26 14:54:23 +02:00
Enrico Zimuel
235c490e0c
Updated bullseye docker image (#610) 2023-09-26 09:53:24 +02:00
Bart Broere
3908f43905
Remove deprecated check_less_precise (#596) 2023-09-26 07:34:52 +02:00
Quentin Pradet
566bb9e990
Allow importing private HuggingFace models (#608) 2023-09-25 15:10:58 +02:00
Quentin Pradet
5ec760635b
Recommend installing Eland in a virtual environment (#606) 2023-09-22 13:14:05 +02:00
Jonathan Buttner
a8b76c390f
Setting chunk size to 1mb (#605) 2023-09-20 11:40:11 -04:00
Bart Broere
12200039f5
Fix iteritems deprecation (#593) 2023-09-19 12:00:32 +02:00
David Kyle
301cda8d69
Error measuring embedding size for some DPR models (#573)
Fixes an error unpacking a tuple that contains a single element.
2023-09-19 10:44:15 +01:00
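The class of error named in #573 (unpacking a tuple that holds a single element) can be illustrated as below. This is a sketch under assumptions: the helper `first_output` is hypothetical and does not reproduce Eland's embedding-size code.

```python
def first_output(outputs):
    """Hypothetical helper: return the first element of a model output.

    Works for a 1-tuple, a longer tuple, or a bare (non-tuple) value.
    """
    if isinstance(outputs, tuple):
        return outputs[0]
    return outputs


# Unpacking two names from a 1-tuple raises ValueError.
try:
    first, second = ("pooled",)
except ValueError as err:
    print(f"unpacking failed: {err}")
```

Indexing the first element, rather than unpacking a fixed number of names, tolerates outputs of different lengths.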
Bart Broere
5c5ef63a69
Use the workaround if we can't determine the server's version (#581) 2023-09-15 15:29:36 +04:00
Quentin Pradet
eb69496627
Add dummy pipeline to prepare publishing a Docker image (#590) 2023-09-06 07:12:06 +02:00
Quentin Pradet
64ffbcec0f
Revert "Update Docker image to Debian 12 Bookworm (#586)" (#588) 2023-09-05 12:36:42 +04:00
Quentin Pradet
4d2c6e2f4d
Fix Buildkite builds on pull requests (#589) 2023-09-05 12:20:24 +04:00
Quentin Pradet
ea4c2d1251
Fix downloads badge URL (#587) 2023-09-05 11:57:36 +04:00
Quentin Pradet
c7a58e3783
Fix README so that copy/pastes work without warnings (#584) 2023-09-05 11:56:25 +04:00
Quentin Pradet
0be509730a
Update Docker image to Debian 12 Bookworm (#586) 2023-09-04 19:28:38 +04:00
David Kyle
95864a9ace
Update README.md with note about installing extras for NLP (#582) 2023-08-31 10:34:36 +01:00
Enrico Zimuel
f14bbaf4b0
Added build and twine to requirements-dev 2023-08-24 16:02:12 +02:00
Enrico Zimuel
ac8c7c341e
Readded author info 2023-08-24 11:18:17 +02:00
Enrico Zimuel
2304fdc593
Updated docs 2023-08-24 11:12:30 +02:00
Enrico Zimuel
ebdebdf16f
Prep for 8.9.0 release 2023-08-24 11:11:48 +02:00
Enrico Zimuel
932092c0e5
Fixed test for mean using ES 8.9.0 2023-08-24 10:46:14 +02:00
Enrico Zimuel
08b7fac32b
Updated test to ES 8.9-SNAPSHOT 2023-08-23 13:53:15 +02:00
Enrico Zimuel
bb59a4f8d6
Fixed conf test with isinstance 2023-08-22 13:23:23 +02:00
Josh Devins
f26fb8a430
Simplify embedding model support and loading (#569)
We were attempting to load SentenceTransformers by looking at the model
prefix, however SentenceTransformers can also be loaded from other
orgs in the model hub, as well as from local disk. This prefix checking
failed in those two cases. To simplify the loading logic and deciding
which wrapper to use, we’ve removed support for text_embedding tasks to
load a plain Transformer. We now only support DPR embedding models and
SentenceTransformer embedding models. If you try to load a plain
Transformer model, it will be loaded by SentenceTransformers and a mean
pooling layer will automatically be added by the SentenceTransformer
library. Since we no longer automatically support non-DPR and
non-SentenceTransformers models, we should include example code somewhere
showing how to load a custom model without DPR or SentenceTransformers.

See: https://github.com/UKPLab/sentence-transformers/blob/v2.2.2/sentence_transformers/SentenceTransformer.py#L801

Resolves #531
2023-07-31 18:18:46 +02:00
Fernando Briano
7ad1f430e4
[CI] Adds buildkite pull requests configuration (#570) 2023-07-26 13:43:40 +01:00
Youhei Sakurai
4cf92fd9b7
Make eland_import_hub_model easier to find on Windows. (#559) 2023-07-20 09:24:35 +01:00
Fernando Briano
664180d93d
[CI] Removes Jenkins .ci folder (#561)
Continuing the migration to Buildkite.
2023-07-18 13:32:30 +01:00
Fernando Briano
2134c71ab4
Add Buildkite configuration (#515)
* [CI] Adds Buildkite configuration
* Removes GitHub Actions
* Moves lint and docs tasks to Buildkite
2023-07-17 14:08:41 +01:00
Youhei Sakurai
b5bcba713d
Apply black to comply with the code style (#557)
Relates https://github.com/elastic/eland/pull/552

**Issue**:

```console
C:\Users\YouheiSakurai\git\myeland>python -m black --version
python -m black, 23.3.0 (compiled: yes)
Python (CPython) 3.11.0

C:\Users\YouheiSakurai\git\myeland>python -m black --check --target-version=py38 bin\eland_import_hub_model
would reformat bin\eland_import_hub_model

Oh no! 💥 💔 💥
1 file would be reformatted.
```

**Solution**:
```
C:\Users\YouheiSakurai\git\myeland>python -m black --target-version=py38 bin\eland_import_hub_model
reformatted bin\eland_import_hub_model

All done!  🍰 
1 file reformatted.
```
2023-07-13 09:55:00 +02:00
Valeriy Khakhutskyy
77781b90ff
[ML] Update trained model inference endpoint (#556)
Infer trained model deployment API has been deprecated, so I changed the code to use the new one.
2023-07-11 10:55:11 +02:00
Valeriy Khakhutskyy
f38de0ed05
Fix failing unit tests (#558)
I updated the tree serialization format for the new scikit learn versions. I also updated the minimum requirement of scikit learn to 1.3 to ensure compatibility.

Fixes #555
2023-07-10 15:15:58 +02:00
Youhei Sakurai
5ac8a053f0
Fix No module named 'torch' (#553)
Do not import torch unless necessary
2023-07-07 09:11:11 +01:00
Youhei Sakurai
55967a7324
Minimize if main section (#554)
For migration from scripts to console_scripts in setup.py,
the current long if __name__ == "__main__": section is a
blocker because console_scripts requires a function as
its entry point.
Move the logic into a main() function.
2023-07-05 10:49:16 +01:00
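The refactoring described in #554 follows a common pattern, sketched below under assumptions: the argument handling is a placeholder, and the names `example-tool` and `example_pkg.cli` in the comment are hypothetical.

```python
import sys


def main(argv=None):
    """Entry point that a console_scripts entry can reference by name."""
    # Fall back to the process arguments when none are passed explicitly.
    args = sys.argv[1:] if argv is None else list(argv)
    name = args[0] if args else "world"
    print(f"hello {name}")
    return 0


# A console_scripts entry would then point at the function, for example
# (hypothetical names): "example-tool = example_pkg.cli:main"
if __name__ == "__main__":
    main()
```

Keeping the `if __name__ == "__main__":` section to a single call means the same logic runs whether the file is executed directly or through the installed entry point.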
Dai Sugimori
bf3b092ed4
Add BertJapaneseTokenizer support with bert_ja tokenization configuration (#534)
See elasticsearch#95546
2023-06-23 08:14:27 +01:00
Seth Michael Larson
5fd1221815
Fix autosummary directive by removing hack autosummaries 2023-06-15 10:50:19 -05:00
Seth Michael Larson
17c1c2e9c7
Switch to the 'Furo' Sphinx theme 2023-06-15 09:51:14 -05:00
Benjamin Trent
8b327f60b8
[ML] add ability to upload xlm-roberta tokenized models (#518)
This allows XLMRoberta models to be uploaded to Elasticsearch.

blocked by: elastic/elasticsearch#94089
2023-06-14 07:59:28 -04:00
David Kyle
68a22a8001
Default the optional es_version parameter (#545) 2023-06-07 12:34:53 +01:00
Seth Michael Larson
afc7e41d6e
Update Dockerfile base image to use newer version 2023-06-02 14:20:01 -05:00
David Kyle
32ab988eb6
Tolerate different model output formats when measuring embedding size (#535)
Only add the embedding_size config option if the target Elasticsearch 
cluster version supports it
2023-05-25 12:25:31 -05:00
David Kyle
7ca8376f68
Add Elasticsearch 8.8 snapshot to test matrix (#543)
And increase the test ES node heap size to prevent circuit 
breaker exceptions due to better memory accounting in
elastic/elasticsearch#89437.
2023-05-24 11:59:41 +01:00
István Zoltán Szabó
e0c08e42a0
[DOCS] Adds instructions on model install in air-gapped env (#542)
Co-authored-by: David Kyle <david.kyle@elastic.co>
2023-05-24 12:53:04 +02:00
David Kyle
1e6f48f8f4
Generate valid NLP model id from file path (#541)
The eland_import_hub_model script supports uploading a local file where
the --hub-model-id argument is a file path. If the --es-model-id option is
not used, the model ID is generated from the hub model ID; when that
is a file path, the path must be converted to a valid Elasticsearch model ID.
2023-05-22 15:37:36 +01:00
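A minimal sketch of the path-to-ID conversion described in #541, not Eland's actual implementation. The function name `model_id_from_path` and the replacement rules are assumptions; the sketch relies only on the constraint that an Elasticsearch trained model ID consists of lowercase alphanumerics, hyphens, and underscores, and starts and ends with an alphanumeric character.

```python
import re


def model_id_from_path(path: str) -> str:
    """Hypothetical helper: derive a valid model ID from a local file path."""
    candidate = path.lower()
    # Replace every run of disallowed characters (slashes, dots, spaces, ...)
    # with a double underscore.
    candidate = re.sub(r"[^a-z0-9_-]+", "__", candidate)
    # The ID must start and end with an alphanumeric character.
    return re.sub(r"^[^a-z0-9]+|[^a-z0-9]+$", "", candidate)
```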
David Kyle
7820a31256
Limit NumPy to a range of versions and note why (#540) 2023-05-22 10:47:06 +01:00
David Kyle
36bbbe0bdb
Upgrade torch to 1.13.1 and check the cluster version before uploading an NLP model. (#522)
PyTorch models traced in version 1.13 of PyTorch cannot be evaluated in
version 1.9 or earlier. With this upgrade Eland becomes incompatible with
pre-8.7 Elasticsearch and will refuse to upload a model to the cluster.
In this scenario either upgrade Elasticsearch or use an earlier version of Eland.
2023-05-19 16:29:38 +01:00
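The cluster version check described in #522 can be sketched as below. This is an illustration under assumptions: the function names are hypothetical, the 8.7 minimum is taken from the commit message, and the real check in Eland may differ.

```python
def major_minor(version: str) -> tuple:
    """Parse '8.7.0' or '8.7.0-SNAPSHOT' into (8, 7)."""
    core = version.split("-", 1)[0]
    major, minor = core.split(".")[:2]
    return int(major), int(minor)


def ensure_cluster_supported(cluster_version: str, minimum=(8, 7)) -> None:
    """Hypothetical guard: raise if the cluster is older than `minimum`."""
    if major_minor(cluster_version) < minimum:
        raise ValueError(
            f"Elasticsearch {cluster_version} is older than "
            f"{minimum[0]}.{minimum[1]}; refusing to upload the model"
        )
```

Comparing `(major, minor)` tuples avoids the pitfalls of comparing version strings lexically, where "8.10" would sort before "8.7".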
David Kyle
b507bb6d6c
Restrict NumPy and Pandas versions (#539)
Shap is incompatible with NumPy 1.24 due to a deprecated usage becoming
an error. There is no fix in Shap yet so an earlier version of NumPy must
be used.
Pandas 2.0 was recently released; we will continue to use the latest 1.5 release
to avoid any incompatibilities.
2023-05-19 16:04:33 +01:00
Seth Michael Larson
f7ea3bd476
Add a compatibility layer for Elasticsearch server 8.5.0 field_caps API 2023-05-02 15:40:20 -05:00
Seth Michael Larson
ca0cbe94ea
Fix readthedocs with Python 3.8 2023-05-02 12:21:57 -05:00
David Kyle
50d301f7cb
Set embedding_size config parameter for Text Embedding models (#532) 2023-04-25 11:41:14 +01:00
David Kyle
940f2a9bad
[NLP] Add support for the pass_through task #526 2023-04-06 15:43:00 +01:00
David Kyle
8e0d897171
[NLP] Prevent TypeError with None check (#525) 2023-04-03 14:56:19 +01:00
David Roberts
cebee6406f
Include pitfall of --start in the README (#506)
Users who follow the Eland README as a guide to importing
models can easily end up seeing inexplicably poor performance
due to unknowingly running the model with one allocation and
one thread per allocation.

This change spells out the effect of `--start` and links to
alternatives that allow better use of available hardware.

Co-authored-by: David Kyle <david.kyle@elastic.co>
2023-03-30 20:28:48 +01:00
Seth Michael Larson
44e04b4905
Release v8.7.0 2023-03-30 14:00:02 -05:00
David Kyle
7f4687c791
[ML] Text expansion model config support (#520) 2023-03-08 15:40:14 +00:00
Benjamin Trent
d5578637cb
Choose text_embedding from auto when task type is unknown but it's a sentence-transformers model (#516)
closes https://github.com/elastic/eland/issues/514
2023-02-09 12:50:30 -05:00
Valeriy Khakhutskyy
0576114a1d
[ML] Export ML model as sklearn Pipeline (#509)
Closes #503

Note: I also had to fix the Sphinx version to 5.3.0 since, starting from 6.0, Sphinx suffers from a TypeError bug, which causes a CI failure.
2023-02-01 16:17:06 +01:00
Valeriy Khakhutskyy
2ea96322b3
Update to latest ES versions and fix unit tests (#512)
Update the test matrix to the latest Elasticsearch versions and fix the broken unit tests on the CI.
2023-01-31 20:55:29 +01:00
David Kyle
c55516f376
Fixes for two type hinting issues 2023-01-04 09:53:09 -06:00
David Kyle
211cc2c83f
Handle OSError for missing LightGBM dependency
Co-authored-by: Seth Michael Larson <seth.larson@elastic.co>
2022-11-02 11:32:27 -05:00
Benjamin Trent
82e34dbddb
Minor formatting fix for ML docs 2022-10-20 09:47:55 -05:00
Benjamin Trent
a8c8726634
[ML] add text_similarity task support (#486)
Adds text_similarity task support. This is a cross-encoder transformer task where both sequences are given to the transformer at once.

According to 🤗 (or at least how the cross-encoder models are concerned) this is a sequence classification task with just one classification "label". But really, it isn't labeled at all and is more akin to a regression model.

related: elastic/elasticsearch#88439
2022-08-01 09:04:34 -04:00
Benjamin Trent
11ea68a443
Add docker steps for eland model upload (#489) 2022-07-21 15:27:19 -04:00
István Zoltán Szabó
fbb01e5698
[DOCS] Adds important note about PyTorch version compatibility. (#487) 2022-07-13 12:41:35 +02:00
Seth Michael Larson
c97e69410d
Release v8.3.0 2022-07-11 13:14:13 -05:00
David Kyle
0eb36faa5b
Restrict PyTorch version not to be more advanced than that used in Elasticsearch (#479)
Elasticsearch uses v1.11 of PyTorch. Models created with the latest PyTorch 
release (v1.12) are not compatible with v1.11. This pins the PyTorch version
to 1.11 to prevent the incompatibility. The version of the Elasticsearch Python
client is now required to be >= Eland.

All users of Eland for importing NLP models should upgrade.
2022-07-07 14:56:42 +01:00
Benjamin Trent
947d4d22a9
Update python example (#477) 2022-06-28 13:01:49 -04:00
David Kyle
23706e05b8
Add more exclusions to the dockerignore file 2022-06-28 10:34:02 -05:00
Benjamin Trent
8892f4fd64
[ML] adds new auto task type that attempts to automatically determine NLP task type from model config (#475)
For many model types, we don't need to require the task requested. We can infer the task type based on the model configuration and architecture. 

This commit makes the `task-type` parameter optional for the model upload script and adds logic for auto-detecting the task type based on the 🤗 model.
2022-06-23 08:32:23 -04:00
David Kyle
8448b3ba4e
Bump minimum PyTorch version to 1.11 2022-06-21 07:43:43 -05:00
David Kyle
081c8efaa0
Freeze the traced PyTorch model 2022-06-21 07:43:18 -05:00
Benjamin Trent
ec041ffdfd
[ML] ensure quantization is applied (#472) 2022-06-15 09:23:24 -04:00
Lisa Cawley
07af00c741
[DOCS] Include missing attributes (#468)
Co-authored-by: Seth Michael Larson <seth.larson@elastic.co>
2022-05-31 15:50:11 -07:00
Seth Michael Larson
bbe7a70cb9 Also pin traitlets 2022-05-31 14:28:36 -07:00
Seth Michael Larson
14821a8b09 Remove 'numpydoc' to stop reformatting 2022-05-31 14:28:36 -07:00
Seth Michael Larson
673065ee42 Stop explicitly pulling master 2022-05-31 14:28:36 -07:00
Lisa Cawley
845c055d7c
[DOCS] Adds question_answering task type for eland_import_hub_model 2022-05-31 14:37:51 -05:00
Nigel Small
a4838f4d22
Ignore type checking for agg_value 2022-05-31 09:23:15 -05:00
Lisa Cawley
09dd56c399
Add authentication methods for import model script (#466) 2022-05-18 07:44:37 -07:00
Benjamin Trent
fa30246937
[ML] fixes decision tree classifier upload to account for probabilities (#465)
This switches our sklearn.DecisionTreeClassifier serialization logic to account for multi-valued leaves in the tree.

The key difference between our inference and DecisionTreeClassifier is that we run a softmax over the leaf, whereas sklearn simply normalizes the results.

This means that the "probabilities" we return will differ from sklearn's.
2022-05-17 08:11:20 -04:00
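
The softmax-versus-normalization difference described in #465 can be illustrated with a minimal sketch in plain Python. The leaf values are hypothetical, and the helper names `normalize` and `softmax` are illustrative only; neither is taken from eland, scikit-learn, or Elasticsearch.

```python
import math


def normalize(leaf_values):
    """Divide each value by the total (the behaviour the commit attributes to sklearn)."""
    total = sum(leaf_values)
    return [v / total for v in leaf_values]


def softmax(leaf_values):
    """Exponentiate, then normalize; subtracting the max keeps exp() numerically stable."""
    peak = max(leaf_values)
    exps = [math.exp(v - peak) for v in leaf_values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical per-class values stored in a single tree leaf.
leaf = [3.0, 1.0]

print("normalized:", normalize(leaf))
print("softmax:   ", softmax(leaf))
```

Both results sum to 1 and rank the classes in the same order, but the individual numbers differ, which is why the commit warns that the returned "probabilities" will not match sklearn's.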
248 changed files with 14592 additions and 7476 deletions

View File

@ -1,6 +1,8 @@
ARG PYTHON_VERSION=3.9
FROM python:${PYTHON_VERSION}
ENV FORCE_COLOR=1
WORKDIR /code/eland
RUN python -m pip install nox

View File

@ -0,0 +1,11 @@
#!/usr/bin/env bash
set -eo pipefail
export LC_ALL=en_US.UTF-8
echo "--- Building the Wolfi image"
# Building the linux/arm64 image takes about one hour on Buildkite, which is too slow
docker build --file Dockerfile.wolfi .
echo "--- Building the public image"
docker build .

View File

@ -0,0 +1,8 @@
#!/usr/bin/env bash
docker build --file .buildkite/Dockerfile --tag elastic/eland --build-arg PYTHON_VERSION=${PYTHON_VERSION} .
docker run \
--name doc_build \
--rm \
elastic/eland \
bash -c "apt-get update && apt-get install --yes pandoc && nox -s docs"

7
.buildkite/lint-code.sh Executable file
View File

@ -0,0 +1,7 @@
#!/usr/bin/env bash
docker build --file .buildkite/Dockerfile --tag elastic/eland --build-arg PYTHON_VERSION=${PYTHON_VERSION} .
docker run \
--name linter \
--rm \
elastic/eland \
nox -s lint

50
.buildkite/pipeline.yml Normal file
View File

@ -0,0 +1,50 @@
steps:
- label: ":terminal: Lint code"
env:
PYTHON_VERSION: 3
agents:
provider: "gcp"
machineType: "n2-standard-2"
commands:
- ./.buildkite/lint-code.sh
- label: ":books: Build documentation"
env:
PYTHON_VERSION: 3.9-bookworm
agents:
provider: "gcp"
machineType: "n2-standard-2"
commands:
- ./.buildkite/build-documentation.sh
- label: ":docker: Build Wolfi image"
env:
PYTHON_VERSION: 3.11-bookworm
agents:
provider: "gcp"
machineType: "n2-standard-2"
commands:
- ./.buildkite/build-docker-images.sh
- label: ":python: {{ matrix.python }} :elasticsearch: {{ matrix.stack }} :pandas: {{ matrix.pandas }}"
agents:
provider: "gcp"
machineType: "n2-standard-4"
env:
PYTHON_VERSION: "{{ matrix.python }}"
PANDAS_VERSION: "{{ matrix.pandas }}"
TEST_SUITE: "xpack"
ELASTICSEARCH_VERSION: "{{ matrix.stack }}"
matrix:
setup:
# Python and pandas versions need to be added to the nox configuration too
# (in the decorators of the test method in noxfile.py)
pandas:
- '1.5.0'
- '2.2.3'
python:
- '3.12'
- '3.11'
- '3.10'
- '3.9'
stack:
- '9.0.0'
- '9.1.0-SNAPSHOT'
command: ./.buildkite/run-tests
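
As a rough illustration of how large this test matrix is, the sketch below enumerates every combination of the three dimensions listed in the pipeline. It assumes the matrix expands to the full Cartesian product with no skipped combinations; the variable names are illustrative and not part of the pipeline.

```python
from itertools import product

# Values copied from the matrix setup in the pipeline above.
pandas_versions = ["1.5.0", "2.2.3"]
python_versions = ["3.12", "3.11", "3.10", "3.9"]
stack_versions = ["9.0.0", "9.1.0-SNAPSHOT"]

# One job per (python, pandas, stack) combination.
jobs = [
    {"python": python_v, "pandas": pandas_v, "stack": stack_v}
    for python_v, pandas_v, stack_v in product(
        python_versions, pandas_versions, stack_versions
    )
]

print(len(jobs), "jobs")
for job in jobs[:3]:
    print(job)
```

Under that assumption, adding one more Python or pandas version multiplies the job count, which is worth keeping in mind alongside the comment about also updating `noxfile.py`.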

View File

@ -0,0 +1,28 @@
{
"jobs": [
{
"enabled": true,
"pipeline_slug": "eland",
"allow_org_users": true,
"allowed_repo_permissions": ["admin", "write"],
"build_on_commit": true,
"build_on_comment": true,
"trigger_comment_regex": "^(?:(?:buildkite\\W+)?(?:build|test)\\W+(?:this|it))",
"always_trigger_comment_regex": "^(?:(?:buildkite\\W+)?(?:build|test)\\W+(?:this|it))",
"skip_ci_labels": ["skip-ci"],
"skip_ci_on_only_changed": ["\\.md$"]
},
{
"enabled": true,
"pipeline_slug": "docs-build-pr",
"allow_org_users": true,
"allowed_repo_permissions": ["admin", "write"],
"build_on_commit": true,
"build_on_comment": true,
"trigger_comment_regex": "^(?:(?:buildkite\\W+)?(?:build|test)\\W+(?:this|it))",
"always_trigger_comment_regex": "^(?:(?:buildkite\\W+)?(?:build|test)\\W+(?:this|it))",
"skip_ci_labels": ["skip-ci"],
"skip_ci_on_only_changed": ["\\.md$"]
}
]
}
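
To see which pull request comments the trigger pattern would accept, here is a small sketch that applies the two regular expressions from this file with Python's `re` module. How the CI bot actually evaluates them (regex engine, flags, and whether every changed path must match) is an assumption here, and the function names are hypothetical.

```python
import re

# The JSON strings above, with the JSON-level backslash escaping removed.
TRIGGER = re.compile(r"^(?:(?:buildkite\W+)?(?:build|test)\W+(?:this|it))")
SKIP_ONLY_CHANGED = re.compile(r"\.md$")


def triggers_build(comment):
    """Return True if the comment starts with a phrase like 'buildkite test this'."""
    return TRIGGER.search(comment) is not None


def only_markdown_changed(paths):
    """Assumed semantics: CI is skipped when every changed path matches the pattern."""
    return bool(paths) and all(SKIP_ONLY_CHANGED.search(p) for p in paths)


for comment in ["buildkite test this", "test it", "please test this", "testit"]:
    print(repr(comment), triggers_build(comment))

print(only_markdown_changed(["README.md", "CONTRIBUTING.md"]))
print(only_markdown_changed(["README.md", "eland/ml/ml_model.py"]))
```

The leading `^` means the phrase has to open the comment, and `\W+` requires at least one non-word character between the words.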

View File

@ -0,0 +1,28 @@
steps:
- input: "Build parameters"
fields:
- text: "Release version"
key: "RELEASE_VERSION"
default: ""
format: "\\d{1,}.\\d{1,}.\\d{1,}"
hint: "The version to release e.g. '8.10.0' (without the v prefix)."
- select: "Environment"
key: "ENVIRONMENT"
options:
- label: "Staging"
value: "staging"
- label: "Production"
value: "production"
- wait
- label: "Release Docker Artifacts for Eland"
command: |
set -eo pipefail
export RELEASE_VERSION=$(buildkite-agent meta-data get RELEASE_VERSION)
export ENVIRONMENT=$(buildkite-agent meta-data get ENVIRONMENT)
export BUILDKIT_PROGRESS=plain
bash .buildkite/release-docker/run.sh
# Run on GCP to use `docker`
agents:
provider: gcp
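
The `format` pattern for `RELEASE_VERSION` leaves its dots unescaped, so each `.` matches any character. The sketch below checks a few inputs with Python's `re.fullmatch`, assuming the pattern is applied to the whole input; `STRICTER` is a hypothetical alternative and not part of the pipeline.

```python
import re

# Pattern from the pipeline, with the YAML-level backslash escaping removed.
CONFIGURED = r"\d{1,}.\d{1,}.\d{1,}"
# Hypothetical stricter variant that escapes the dots.
STRICTER = r"\d+\.\d+\.\d+"


def accepts(pattern, value):
    """Return True if the whole value matches the pattern."""
    return re.fullmatch(pattern, value) is not None


for value in ["8.10.0", "v8.10.0", "8x10y0"]:
    print(value, accepts(CONFIGURED, value), accepts(STRICTER, value))
```

Both patterns reject the `v` prefix that the hint warns about; only the configured one lets a string such as `8x10y0` through.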

View File

@ -0,0 +1,37 @@
#!/usr/bin/env bash
set -eo pipefail
export LC_ALL=en_US.UTF-8
echo "Publishing Eland $RELEASE_VERSION Docker image to $ENVIRONMENT"
set +x
# login to docker registry
docker_registry=$(vault read -field registry "secret/ci/elastic-eland/container-library/eland-$ENVIRONMENT")
docker_username=$(vault read -field username "secret/ci/elastic-eland/container-library/eland-$ENVIRONMENT")
docker_password=$(vault read -field password "secret/ci/elastic-eland/container-library/eland-$ENVIRONMENT")
echo "$docker_password" | docker login "$docker_registry" --username "$docker_username" --password-stdin
unset docker_username docker_password
set -x
tmp_dir=$(mktemp --directory)
pushd "$tmp_dir"
git clone https://github.com/elastic/eland
pushd eland
git checkout "v${RELEASE_VERSION}"
git --no-pager show
# Create builder that supports QEMU emulation (needed for linux/arm64)
docker buildx rm --force eland-multiarch-builder || true
docker buildx create --name eland-multiarch-builder --bootstrap --use
docker buildx build --push \
--file Dockerfile.wolfi \
--tag "$docker_registry/eland/eland:$RELEASE_VERSION" \
--tag "$docker_registry/eland/eland:latest" \
--platform linux/amd64,linux/arm64 \
"$PWD"
popd
popd
rm -rf "$tmp_dir"

View File

@ -16,7 +16,12 @@ fi
set -euxo pipefail
SCRIPT_PATH=$(dirname $(realpath -s $0))
# realpath on macOS uses different flags than on Linux
if [[ "$OSTYPE" == "darwin"* ]]; then
SCRIPT_PATH=$(dirname $(realpath $0))
else
SCRIPT_PATH=$(dirname $(realpath -s $0))
fi
moniker=$(echo "$ELASTICSEARCH_VERSION" | tr -C "[:alnum:]" '-')
suffix=rest-test
@ -37,6 +42,11 @@ NETWORK_NAME=${NETWORK_NAME-"$network_default"}
set +x
# Set vm.max_map_count kernel setting to 262144 if we're in CI
if [[ "$BUILDKITE" == "true" ]]; then
sudo sysctl -w vm.max_map_count=262144
fi
function cleanup_volume {
if [[ "$(docker volume ls -q -f name=$1)" ]]; then
echo -e "\033[34;1mINFO:\033[0m Removing volume $1\033[0m"
@ -44,7 +54,7 @@ function cleanup_volume {
fi
}
function container_running {
if [[ "$(docker ps -q -f name=$1)" ]]; then
if [[ "$(docker ps -q -f name=$1)" ]]; then
return 0;
else return 1;
fi
@ -106,6 +116,12 @@ environment=($(cat <<-END
--env node.attr.testattr=test
--env path.repo=/tmp
--env repositories.url.allowed_urls=http://snapshot.test*
--env ELASTIC_PASSWORD=$ELASTIC_PASSWORD
--env xpack.license.self_generated.type=trial
--env xpack.security.enabled=false
--env xpack.security.http.ssl.enabled=false
--env xpack.security.transport.ssl.enabled=false
--env xpack.ml.max_machine_memory_percent=90
END
))
@ -114,29 +130,14 @@ volumes=($(cat <<-END
END
))
if [[ "$ELASTICSEARCH_VERSION" != *oss* ]]; then
environment+=($(cat <<-END
--env ELASTIC_PASSWORD=$ELASTIC_PASSWORD
--env xpack.license.self_generated.type=trial
--env xpack.security.enabled=false
--env xpack.security.http.ssl.enabled=false
--env xpack.security.transport.ssl.enabled=false
--env xpack.ml.max_machine_memory_percent=90
END
))
fi
url="http://$NODE_NAME"
if [[ "$ELASTICSEARCH_VERSION" != *oss* ]]; then
url="http://elastic:$ELASTIC_PASSWORD@$NODE_NAME"
fi
url="http://elastic:$ELASTIC_PASSWORD@$NODE_NAME"
# Pull the container, retry on failures up to 5 times with
# short delays between each attempt. Fixes most transient network errors.
docker_pull_attempts=0
until [ "$docker_pull_attempts" -ge 5 ]
do
docker pull docker.elastic.co/elasticsearch/"$ELASTICSEARCH_VERSION" && break
docker pull docker.elastic.co/elasticsearch/$ELASTICSEARCH_VERSION && break
docker_pull_attempts=$((docker_pull_attempts+1))
sleep 10
done
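
The retry loop above (up to five attempts, ten seconds apart) follows a general pattern that can be sketched in Python as follows. The `retry` helper and its parameters are hypothetical and only mirror the shell logic; the `sleep` argument is injectable so the sketch can be exercised without waiting.

```python
import time


def retry(action, attempts=5, delay_seconds=10, sleep=time.sleep):
    """Call action() until it returns True or the attempts are used up.

    Like the shell loop, this sleeps after every failed attempt.
    """
    for _ in range(attempts):
        if action():
            return True
        sleep(delay_seconds)
    return False


# Demonstration with a stand-in action that succeeds on the third call.
calls = []


def flaky():
    calls.append(1)
    return len(calls) >= 3


print(retry(flaky, sleep=lambda seconds: None), len(calls))
```

In real use the action could wrap a `docker pull` invocation and report whether it exited successfully. Unlike this helper, the shell loop does not report whether the pull ever succeeded.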
@ -146,7 +147,7 @@ set -x
docker run \
--name "$NODE_NAME" \
--network "$NETWORK_NAME" \
--env ES_JAVA_OPTS=-"Xms1g -Xmx1g" \
--env ES_JAVA_OPTS=-"Xms2g -Xmx2g" \
"${environment[@]}" \
"${volumes[@]}" \
--publish "$HTTP_PORT":9200 \

View File

@ -12,7 +12,7 @@
# When run in CI the test-matrix is used to define additional variables
# TEST_SUITE -- either `oss` or `xpack`, defaults to `oss` in `run-tests`
# TEST_SUITE -- `xpack`
#
PYTHON_VERSION=${PYTHON_VERSION-3.8}
@ -25,7 +25,7 @@ echo -e "\033[34;1mINFO:\033[0m PANDAS_VERSION ${PANDAS_VERSION}\033[0m"
echo -e "\033[1m>>>>> Build [elastic/eland container] >>>>>>>>>>>>>>>>>>>>>>>>>>>>>\033[0m"
docker build --file .ci/Dockerfile --tag elastic/eland --build-arg PYTHON_VERSION=${PYTHON_VERSION} .
docker build --file .buildkite/Dockerfile --tag elastic/eland --build-arg PYTHON_VERSION=${PYTHON_VERSION} .
echo -e "\033[1m>>>>> Run [elastic/eland container] >>>>>>>>>>>>>>>>>>>>>>>>>>>>>\033[0m"

View File

@ -9,11 +9,9 @@ if [[ -z $ELASTICSEARCH_VERSION ]]; then
fi
set -euxo pipefail
TEST_SUITE=${TEST_SUITE-xpack}
NODE_NAME=localhost
PANDAS_VERSION=${PANDAS_VERSION-1.3.0}
PANDAS_VERSION=${PANDAS_VERSION-1.5.0}
elasticsearch_image=elasticsearch
elasticsearch_url=http://elastic:changeme@${NODE_NAME}:9200
@ -29,7 +27,7 @@ function cleanup {
NODE_NAME=${NODE_NAME} \
NETWORK_NAME=elasticsearch \
CLEANUP=true \
bash ./.ci/run-elasticsearch.sh
bash ./.buildkite/run-elasticsearch.sh
# Report status and exit
if [[ "$status" == "0" ]]; then
echo -e "\n\033[32;1mSUCCESS run-tests\033[0m"
@ -41,15 +39,15 @@ function cleanup {
}
trap cleanup EXIT
echo -e "\033[1m>>>>> Start [$ELASTICSEARCH_VERSION container] >>>>>>>>>>>>>>>>>>>>>>>>>>>>>\033[0m"
echo "--- :elasticsearch: Starting Elasticsearch"
ELASTICSEARCH_VERSION=${elasticsearch_image}:${ELASTICSEARCH_VERSION} \
NODE_NAME=${NODE_NAME} \
NETWORK_NAME=host \
DETACH=true \
bash .ci/run-elasticsearch.sh
bash .buildkite/run-elasticsearch.sh
echo -e "\033[1m>>>>> Repository specific tests >>>>>>>>>>>>>>>>>>>>>>>>>>>>>\033[0m"
echo "+++ :python: Run tests"
ELASTICSEARCH_CONTAINER=${elasticsearch_image}:${ELASTICSEARCH_VERSION} \
NETWORK_NAME=host \
@ -57,5 +55,4 @@ ELASTICSEARCH_CONTAINER=${elasticsearch_image}:${ELASTICSEARCH_VERSION} \
ELASTICSEARCH_URL=${elasticsearch_url} \
TEST_SUITE=${TEST_SUITE} \
PANDAS_VERSION=${PANDAS_VERSION} \
bash .ci/run-repository.sh
bash .buildkite/run-repository.sh

View File

@ -1,82 +0,0 @@
---
##### GLOBAL METADATA
- meta:
cluster: clients-ci
##### JOB DEFAULTS
- job:
project-type: matrix
logrotate:
daysToKeep: 30
numToKeep: 100
parameters:
- string:
name: branch_specifier
default: refs/heads/main
description: the Git branch specifier to build (&lt;branchName&gt;, &lt;tagName&gt;,
&lt;commitId&gt;, etc.)
properties:
- github:
url: https://github.com/elastic/eland
- inject:
properties-content: HOME=$JENKINS_HOME
concurrent: true
node: flyweight
scm:
- git:
name: origin
credentials-id: f6c7695a-671e-4f4f-a331-acdce44ff9ba
reference-repo: /var/lib/jenkins/.git-references/eland.git
branches:
- ${branch_specifier}
url: git@github.com:elastic/eland.git
basedir: ''
wipe-workspace: 'True'
triggers:
- github
axes:
- axis:
type: slave
name: label
values:
- linux
- axis:
type: yaml
filename: .ci/test-matrix.yml
name: ELASTICSEARCH_VERSION
- axis:
type: yaml
filename: .ci/test-matrix.yml
name: PYTHON_VERSION
- axis:
type: yaml
filename: .ci/test-matrix.yml
name: PANDAS_VERSION
- axis:
type: yaml
filename: .ci/test-matrix.yml
name: TEST_SUITE
yaml-strategy:
exclude-key: exclude
filename: .ci/test-matrix.yml
wrappers:
- ansicolor
- timeout:
type: absolute
timeout: 120
fail: true
- timestamps
- workspace-cleanup
builders:
- shell: |-
#!/usr/local/bin/runbld
.ci/run-tests
publishers:
- email:
recipients: build-lang-clients@elastic.co
- junit:
results: "build/output/*-junit.xml"
allow-empty-results: true

View File

@ -1,14 +0,0 @@
---
- job:
name: elastic+eland+7.x
display-name: 'elastic / eland # 7.x'
description: Eland is a data science client with a Pandas-like interface
junit_results: "*-junit.xml"
parameters:
- string:
name: branch_specifier
default: refs/heads/7.x
description: The Git branch specifier to build
triggers:
- github
- timed: '@daily'

View File

@ -1,14 +0,0 @@
---
- job:
name: elastic+eland+main
display-name: 'elastic / eland # main'
description: Eland is a data science client with a Pandas-like interface
junit_results: "*-junit.xml"
parameters:
- string:
name: branch_specifier
default: refs/heads/main
description: The Git branch specifier to build
triggers:
- github
- timed: '@daily'

View File

@ -1,19 +0,0 @@
---
- job:
name: elastic+eland+pull-request
display-name: 'elastic / eland # pull-request'
description: Testing of eland pull requests.
scm:
- git:
branches:
- ${ghprbActualCommit}
refspec: +refs/pull/*:refs/remotes/origin/pr/*
triggers:
- github-pull-request:
org-list:
- elastic
allow-whitelist-orgs-as-admins: true
github-hooks: true
status-context: clients-ci
cancel-builds-on-update: true
publishers: []

View File

@ -1,20 +0,0 @@
---
ELASTICSEARCH_VERSION:
- '8.1.0-SNAPSHOT'
- '8.0.0-SNAPSHOT'
PANDAS_VERSION:
- '1.2.0'
- '1.3.0'
PYTHON_VERSION:
- '3.10'
- '3.9'
- '3.8'
- '3.7'
TEST_SUITE:
- xpack
exclude: ~

View File

@ -1,4 +1,62 @@
docs/*
# docs and example
example/*
# Git
.git
# Nox
.nox
# Compiled python modules.
*.pyc
__pycache__/
# Setuptools distribution folder.
dist/
# Build folder
build/
# pytest results
tests/dataframe/results/*csv
result_images/
# Python egg metadata, regenerated from source files by setuptools.
/*.egg-info
eland.egg-info/
# PyCharm files
.idea/
# vscode files
.vscode/
# pytest files
.pytest_cache/
# Ignore MacOSX files
.DS_Store
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
.python-version
# Environments
.env
.venv
.nox
env/
venv/
ENV/
env.bak/
venv.bak/
.mypy_cache
# Coverage
.coverage

26
.github/workflows/backport.yml vendored Normal file
View File

@ -0,0 +1,26 @@
name: Backport
on:
pull_request_target:
types:
- closed
- labeled
jobs:
backport:
name: Backport
runs-on: ubuntu-latest
# Only react to merged PRs for security reasons.
# See https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#pull_request_target.
if: >
github.event.pull_request.merged
&& (
github.event.action == 'closed'
|| (
github.event.action == 'labeled'
&& contains(github.event.label.name, 'backport')
)
)
steps:
- uses: tibdex/backport@9565281eda0731b1d20c4025c43339fb0a23812e # v2.0.4
with:
github_token: ${{ secrets.GITHUB_TOKEN }}

View File

@ -1,38 +0,0 @@
name: CI
on: [push, pull_request]
defaults:
run:
shell: bash
jobs:
lint:
runs-on: ubuntu-latest
steps:
- name: Checkout Repository
uses: actions/checkout@v2
- name: Set up Python 3
uses: actions/setup-python@v2
with:
python-version: 3
- name: Install dependencies
run: python3 -m pip install nox
- name: Lint the code
run: nox -s lint
docs:
runs-on: ubuntu-latest
steps:
- name: Checkout Repository
uses: actions/checkout@v2
- name: Set up Python 3
uses: actions/setup-python@v2
with:
python-version: 3
- name: Install dependencies
run: |
sudo apt-get install --yes pandoc
python3 -m pip install nox
- name: Build documentation
run: nox -s docs

19
.github/workflows/docs-build.yml vendored Normal file
View File

@ -0,0 +1,19 @@
name: docs-build
on:
push:
branches:
- main
pull_request_target: ~
merge_group: ~
jobs:
docs-preview:
uses: elastic/docs-builder/.github/workflows/preview-build.yml@main
with:
path-pattern: docs/**
permissions:
deployments: write
id-token: write
contents: read
pull-requests: write

14
.github/workflows/docs-cleanup.yml vendored Normal file
View File

@ -0,0 +1,14 @@
name: docs-cleanup
on:
pull_request_target:
types:
- closed
jobs:
docs-preview:
uses: elastic/docs-builder/.github/workflows/preview-cleanup.yml@main
permissions:
contents: none
id-token: write
deployments: write

14
.readthedocs.yml Normal file
View File

@ -0,0 +1,14 @@
version: 2
build:
os: ubuntu-22.04
tools:
python: "3.11"
python:
install:
- path: .
- requirements: docs/requirements-docs.txt
sphinx:
configuration: docs/sphinx/conf.py

View File

@ -2,6 +2,331 @@
Changelog
=========
9.0.1 (2025-04-30)
------------------
* Forbid Elasticsearch 8 client or server (`#780 <https://github.com/elastic/eland/pull/780>`_)
* Fix DeBERTa tokenization (`#769 <https://github.com/elastic/eland/pull/769>`_)
* Upgrade PyTorch to 2.5.1 (`#785 <https://github.com/elastic/eland/pull/785>`_)
* Upgrade LightGBM to 4.6.0 (`#782 <https://github.com/elastic/eland/pull/782>`_)
9.0.0 (2025-04-15)
------------------
* Drop Python 3.8, Support Python 3.12 (`#743 <https://github.com/elastic/eland/pull/743>`_)
* Support Pandas 2 (`#742 <https://github.com/elastic/eland/pull/742>`_)
* Upgrade transformers to 4.47 (`#752 <https://github.com/elastic/eland/pull/752>`_)
* Remove ML model export as sklearn Pipeline (`#744 <https://github.com/elastic/eland/pull/744>`_)
* Allow scikit-learn 1.5 (`#729 <https://github.com/elastic/eland/pull/729>`_)
* Migrate docs from AsciiDoc to Markdown (`#762 <https://github.com/elastic/eland/pull/762>`_)
8.17.0 (2025-01-07)
-------------------
* Support sparse embedding models such as SPLADE-v3-DistilBERT (`#740 <https://github.com/elastic/eland/pull/740>`_)
8.16.0 (2024-11-13)
-------------------
* Add deprecation warning for ESGradientBoostingModel subclasses (`#738 <https://github.com/elastic/eland/pull/738>`_)
8.15.4 (2024-10-17)
-------------------
* Revert "Allow reading Elasticsearch certs in Wolfi image" (`#734 <https://github.com/elastic/eland/pull/734>`_)
8.15.3 (2024-10-09)
-------------------
* Added support for DeBERTa-V2 tokenizer (`#717 <https://github.com/elastic/eland/pull/717>`_)
* Fixed ``--ca-cert`` with a shared Elasticsearch Docker volume (`#732 <https://github.com/elastic/eland/pull/732>`_)
8.15.2 (2024-10-02)
-------------------
* Fixed Docker image build (`#728 <https://github.com/elastic/eland/pull/728>`_)
8.15.1 (2024-10-01)
-------------------
* Upgraded PyTorch to version 2.3.1, which is compatible with Elasticsearch 8.15.2 or above (`#718 <https://github.com/elastic/eland/pull/718>`_)
* Migrated to distroless Wolfi base Docker image (`#720 <https://github.com/elastic/eland/pull/720>`_)
8.15.0 (2024-08-12)
-------------------
* Added a default truncation of ``second`` for text similarity (`#713 <https://github.com/elastic/eland/pull/713>`_)
* Added note about using text_similarity for rerank in the CLI (`#716 <https://github.com/elastic/eland/pull/716>`_)
* Added support for lists in result hits (`#707 <https://github.com/elastic/eland/pull/707>`_)
* Removed input fields from exported LTR models (`#708 <https://github.com/elastic/eland/pull/708>`_)
8.14.0 (2024-06-10)
-------------------
Added
^^^^^
* Added Elasticsearch Serverless support in DataFrames (`#690`_, contributed by `@AshokChoudhary11`_) and eland_import_hub_model (`#698`_)
Fixed
^^^^^
* Fixed Python 3.8 support (`#695`_, contributed by `@bartbroere`_)
* Fixed non _source fields missing from the results hits (`#693`_, contributed by `@bartbroere`_)
.. _@AshokChoudhary11: https://github.com/AshokChoudhary11
.. _#690: https://github.com/elastic/eland/pull/690
.. _#693: https://github.com/elastic/eland/pull/693
.. _#695: https://github.com/elastic/eland/pull/695
.. _#698: https://github.com/elastic/eland/pull/698
8.13.1 (2024-05-03)
-------------------
Added
^^^^^
* Added support for HTTP proxies in eland_import_hub_model (`#688`_)
.. _#688: https://github.com/elastic/eland/pull/688
8.13.0 (2024-03-27)
-------------------
Added
^^^^^
* Added support for Python 3.11 (`#681`_)
* Added ``eland.DataFrame.to_json`` function (`#661`_, contributed by `@bartbroere`_)
* Added override option to specify the model's max input size (`#674`_)
Changed
^^^^^^^
* Upgraded torch to 2.1.2 (`#671`_)
* Mirrored pandas' ``lineterminator`` instead of ``line_terminator`` in ``to_csv`` (`#595`_, contributed by `@bartbroere`_)
.. _#595: https://github.com/elastic/eland/pull/595
.. _#661: https://github.com/elastic/eland/pull/661
.. _#671: https://github.com/elastic/eland/pull/671
.. _#674: https://github.com/elastic/eland/pull/674
.. _#681: https://github.com/elastic/eland/pull/681
8.12.1 (2024-01-30)
-------------------
Fixed
^^^^^
* Fix missing value support for XGBRanker (`#654`_)
.. _#654: https://github.com/elastic/eland/pull/654
8.12.0 (2024-01-18)
-------------------
Added
^^^^^
* Supported XGBRanker model (`#649`_)
* Accepted LTR (Learning to rank) model config when importing model (`#645`_, `#651`_)
* Added LTR feature logger (`#648`_)
* Added ``prefix_string`` config option to the import model hub script (`#642`_)
* Made online retail analysis notebook runnable in Colab (`#641`_)
* Added new movie dataset to the tests (`#646`_)
.. _#641: https://github.com/elastic/eland/pull/641
.. _#642: https://github.com/elastic/eland/pull/642
.. _#645: https://github.com/elastic/eland/pull/645
.. _#646: https://github.com/elastic/eland/pull/646
.. _#648: https://github.com/elastic/eland/pull/648
.. _#649: https://github.com/elastic/eland/pull/649
.. _#651: https://github.com/elastic/eland/pull/651
8.11.1 (2023-11-22)
-------------------
Added
^^^^^
* Make demo notebook runnable in Colab (`#630`_)
Changed
^^^^^^^
* Bump Shap version to 0.43 (`#636`_)
Fixed
^^^^^
* Fix failed import of Sentence Transformer RoBERTa models (`#637`_)
.. _#630: https://github.com/elastic/eland/pull/630
.. _#636: https://github.com/elastic/eland/pull/636
.. _#637: https://github.com/elastic/eland/pull/637
8.11.0 (2023-11-08)
-------------------
Added
^^^^^
* Support E5 small multilingual model (`#625`_)
Changed
^^^^^^^
* Stream writes in ``ed.DataFrame.to_csv()`` (`#579`_)
* Improve memory estimation for NLP models (`#568`_)
Fixed
^^^^^
* Fixed deprecations in preparation of Pandas 2.0 support (`#602`_, `#603`_, contributed by `@bartbroere`_)
.. _#568: https://github.com/elastic/eland/pull/568
.. _#579: https://github.com/elastic/eland/pull/579
.. _#602: https://github.com/elastic/eland/pull/602
.. _#603: https://github.com/elastic/eland/pull/603
.. _#625: https://github.com/elastic/eland/pull/625
8.10.1 (2023-10-11)
-------------------
Fixed
^^^^^
* Fixed direct usage of TransformerModel (`#619`_)
.. _#619: https://github.com/elastic/eland/pull/619
8.10.0 (2023-10-09)
-------------------
Added
^^^^^
* Published pre-built Docker images to docker.elastic.co/eland/eland (`#613`_)
* Allowed importing private HuggingFace models (`#608`_)
* Added Apple Silicon (arm64) support to Docker image (`#615`_)
* Allowed importing some DPR models like ance-dpr-context-multi (`#573`_)
* Allowed using the Pandas API without monitoring/main permissions (`#581`_)
Changed
^^^^^^^
* Updated Docker image to Debian 12 Bookworm (`#613`_)
* Reduced Docker image size by not installing unused PyTorch GPU support on amd64 (`#615`_)
* Reduced model chunk size to 1MB (`#605`_)
Fixed
^^^^^
* Fixed deprecations in preparation of Pandas 2.0 support (`#593`_, `#596`_, contributed by `@bartbroere`_)
.. _@bartbroere: https://github.com/bartbroere
.. _#613: https://github.com/elastic/eland/pull/613
.. _#608: https://github.com/elastic/eland/pull/608
.. _#615: https://github.com/elastic/eland/pull/615
.. _#573: https://github.com/elastic/eland/pull/573
.. _#581: https://github.com/elastic/eland/pull/581
.. _#605: https://github.com/elastic/eland/pull/605
.. _#593: https://github.com/elastic/eland/pull/593
.. _#596: https://github.com/elastic/eland/pull/596
8.9.0 (2023-08-24)
------------------
Added
^^^^^
* Simplify embedding model support and loading (`#569`_)
* Make eland_import_hub_model easier to find on Windows (`#559`_)
* Update trained model inference endpoint (`#556`_)
* Add BertJapaneseTokenizer support with bert_ja tokenization configuration (`#534`_)
* Add ability to upload xlm-roberta tokenized models (`#518`_)
* Tolerate different model output formats when measuring embedding size (`#535`_)
* Generate valid NLP model id from file path (`#541`_)
* Upgrade torch to 1.13.1 and check the cluster version before uploading a NLP model (`#522`_)
* Set embedding_size config parameter for Text Embedding models (`#532`_)
* Add support for the pass_through task (`#526`_)
Fixed
^^^^^
* Fixed black to comply with the code style (`#557`_)
* Fixed No module named 'torch' (`#553`_)
* Fix autosummary directive by removing hack autosummaries (`#548`_)
* Prevent TypeError with None check (`#525`_)
.. _#518: https://github.com/elastic/eland/pull/518
.. _#522: https://github.com/elastic/eland/pull/522
.. _#525: https://github.com/elastic/eland/pull/525
.. _#526: https://github.com/elastic/eland/pull/526
.. _#532: https://github.com/elastic/eland/pull/532
.. _#534: https://github.com/elastic/eland/pull/534
.. _#535: https://github.com/elastic/eland/pull/535
.. _#541: https://github.com/elastic/eland/pull/541
.. _#548: https://github.com/elastic/eland/pull/548
.. _#553: https://github.com/elastic/eland/pull/553
.. _#556: https://github.com/elastic/eland/pull/556
.. _#557: https://github.com/elastic/eland/pull/557
.. _#559: https://github.com/elastic/eland/pull/559
.. _#569: https://github.com/elastic/eland/pull/569
8.7.0 (2023-03-30)
------------------
Added
^^^^^
* Added a new NLP model task type "text_similarity" (`#486`_)
* Added a new NLP model task type "text_expansion" (`#520`_)
* Added support for exporting an Elastic ML model as a scikit-learn pipeline via ``MLModel.export_model()`` (`#509`_)
Fixed
^^^^^
* Fixed an issue that occurred when LightGBM was installed but libomp wasn't installed on the system. (`#499`_)
.. _#486: https://github.com/elastic/eland/pull/486
.. _#499: https://github.com/elastic/eland/pull/499
.. _#509: https://github.com/elastic/eland/pull/509
.. _#520: https://github.com/elastic/eland/pull/520
8.3.0 (2022-07-11)
------------------
Added
^^^^^
* Added a new NLP model task type "auto" which infers the task type based on model configuration and architecture (`#475`_)
Changed
^^^^^^^
* Changed required version of 'torch' package to ``>=1.11.0,<1.12`` to match required PyTorch version for Elasticsearch 8.3 (was ``>=1.9.0,<2``) (`#479`_)
* Changed the default value of the ``--task-type`` parameter for the ``eland_import_hub_model`` CLI to be "auto" (`#475`_)
Fixed
^^^^^
* Fixed decision tree classifier serialization to account for probabilities (`#465`_)
* Fixed PyTorch model quantization (`#472`_)
.. _#465: https://github.com/elastic/eland/pull/465
.. _#472: https://github.com/elastic/eland/pull/472
.. _#475: https://github.com/elastic/eland/pull/475
.. _#479: https://github.com/elastic/eland/pull/479
8.2.0 (2022-05-09)
------------------

View File

@ -78,9 +78,15 @@ Once your changes and tests are ready to submit for review:
# Run Auto-format, lint, mypy type checker for your changes
$ nox -s format
# Run the test suite
$ pytest --doctest-modules eland/ tests/
$ pytest --nbval tests/notebook/
# Launch Elasticsearch with a trial licence and ML enabled
$ docker run --name elasticsearch -p 9200:9200 -e "discovery.type=single-node" -e "xpack.security.enabled=false" -e "xpack.license.self_generated.type=trial" docker.elastic.co/elasticsearch/elasticsearch:9.0.0
# See all test suites
$ nox -l
# Run a specific test suite
$ nox -rs "test-3.12(pandas_version='2.2.3')"
# Run a specific test
$ nox -rs "test-3.12(pandas_version='2.2.3')" -- -k test_learning_to_rank
```
@ -169,7 +175,7 @@ currently using a minimum version of PyCharm 2019.2.4.
* Setup Elasticsearch instance with docker
``` bash
> ELASTICSEARCH_VERSION=elasticsearch:7.x-SNAPSHOT .ci/run-elasticsearch.sh
> ELASTICSEARCH_VERSION=elasticsearch:8.17.0 BUILDKITE=false .buildkite/run-elasticsearch.sh
```
* Now check `http://localhost:9200`
@ -191,7 +197,7 @@ currently using a minimum version of PyCharm 2019.2.4.
``` bash
> import eland as ed
> ed_df = ed.DataFrame('localhost', 'flights')
> ed_df = ed.DataFrame('http://localhost:9200', 'flights')
```
* To run the automatic formatter and check for lint issues run
@ -203,7 +209,7 @@ currently using a minimum version of PyCharm 2019.2.4.
* To test specific versions of Python run
``` bash
> nox -s test-3.8
> nox -s test-3.12
```
### Documentation

View File

@ -1,14 +1,28 @@
FROM debian:11.1
# syntax=docker/dockerfile:1
FROM python:3.10-slim
RUN apt-get update && \
apt-get install -y build-essential pkg-config cmake \
python3-dev python3-pip python3-venv \
libzip-dev libjpeg-dev && \
apt-get clean
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
--mount=type=cache,target=/var/lib/apt,sharing=locked \
apt-get update && apt-get install -y \
build-essential \
pkg-config \
cmake \
libzip-dev \
libjpeg-dev
ADD . /eland
WORKDIR /eland
RUN python3 -m pip install --no-cache-dir --disable-pip-version-check .[all]
ARG TARGETPLATFORM
RUN --mount=type=cache,target=/root/.cache/pip \
if [ "$TARGETPLATFORM" = "linux/amd64" ]; then \
python3 -m pip install \
--no-cache-dir --disable-pip-version-check --extra-index-url https://download.pytorch.org/whl/cpu \
torch==2.5.1+cpu .[all]; \
else \
python3 -m pip install \
--no-cache-dir --disable-pip-version-check \
.[all]; \
fi
CMD ["/bin/sh"]

42
Dockerfile.wolfi Normal file
View File

@ -0,0 +1,42 @@
# syntax=docker/dockerfile:1
FROM docker.elastic.co/wolfi/python:3.10-dev AS builder
WORKDIR /eland
ENV VIRTUAL_ENV=/eland/venv
RUN python3 -m venv $VIRTUAL_ENV
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
ADD . /eland
ARG TARGETPLATFORM
RUN --mount=type=cache,target=/root/.cache/pip \
if [ "$TARGETPLATFORM" = "linux/amd64" ]; then \
python3 -m pip install \
--no-cache-dir --disable-pip-version-check --extra-index-url https://download.pytorch.org/whl/cpu \
torch==2.5.1+cpu .[all]; \
else \
python3 -m pip install \
--no-cache-dir --disable-pip-version-check \
.[all]; \
fi
FROM docker.elastic.co/wolfi/python:3.10
WORKDIR /eland
ENV VIRTUAL_ENV=/eland/venv
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
COPY --from=builder /eland /eland
# The eland_import_hub_model script is intended to be executed by a shell,
# which will see its shebang line and then execute it with the Python
# interpreter of the virtual environment. We want to keep this behavior even
# with Wolfi so that users can use the image as before. To do that, we use two
# tricks:
#
# * copy /bin/sh (that is, busybox's ash) from the builder image
# * revert to Docker's default entrypoint, which is the only way to pass
# parameters to `eland_import_hub_model` without needing quotes.
#
COPY --from=builder /bin/sh /bin/sh
ENTRYPOINT []


@ -50,3 +50,6 @@ Permission is hereby granted, free of charge, to any person obtaining a copy of
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
--
This product contains an adapted version of the "us-national-parks" dataset, https://data.world/kevinnayar/us-national-parks, by Kevin Nayar, https://data.world/kevinnayar, is licensed under CC BY, https://creativecommons.org/licenses/by/4.0/legalcode


@ -9,11 +9,10 @@
<a href="https://pypi.org/project/eland"><img src="https://img.shields.io/pypi/v/eland.svg" alt="PyPI Version"></a>
<a href="https://anaconda.org/conda-forge/eland"><img src="https://img.shields.io/conda/vn/conda-forge/eland"
alt="Conda Version"></a>
<a href="https://pepy.tech/project/eland"><img src="https://pepy.tech/badge/eland" alt="Downloads"></a>
<a href="https://pepy.tech/project/eland"><img src="https://static.pepy.tech/badge/eland" alt="Downloads"></a>
<a href="https://pypi.org/project/eland"><img src="https://img.shields.io/pypi/status/eland.svg"
alt="Package Status"></a>
<a href="https://clients-ci.elastic.co/job/elastic+eland+main"><img
src="https://clients-ci.elastic.co/buildStatus/icon?job=elastic%2Beland%2Bmain" alt="Build Status"></a>
<a href="https://buildkite.com/elastic/eland"><img src="https://badge.buildkite.com/d92340e800bc06a7c7c02a71b8d42fcb958bd18c25f99fe2d9.svg" alt="Build Status"></a>
<a href="https://github.com/elastic/eland/blob/main/LICENSE.txt"><img src="https://img.shields.io/pypi/l/eland.svg"
alt="License"></a>
<a href="https://eland.readthedocs.io"><img
@ -41,6 +40,11 @@ Eland can be installed from [PyPI](https://pypi.org/project/eland) with Pip:
$ python -m pip install eland
```
If you use Eland to upload NLP models to Elasticsearch, install the PyTorch extras:
```bash
$ python -m pip install 'eland[pytorch]'
```
Eland can also be installed from [Conda Forge](https://anaconda.org/conda-forge/eland) with Conda:
```bash
@ -49,9 +53,15 @@ $ conda install -c conda-forge eland
### Compatibility
- Supports Python 3.7+ and Pandas 1.3
- Supports Elasticsearch clusters that are 7.11+, recommended 7.14 or later for all features to work.
Make sure your Eland major version matches the major version of your Elasticsearch cluster.
- Supports Python 3.9, 3.10, 3.11 and 3.12.
- Supports Pandas 1.5 and 2.
- Supports Elasticsearch 8+ clusters, recommended 8.16 or later for all features to work.
If you are using the NLP with PyTorch feature, make sure your Eland minor version matches the minor
version of your Elasticsearch cluster. For all other features it is sufficient for the major versions
to match.
- You need to install the appropriate version of PyTorch to import an NLP model. Run `python -m pip
install 'eland[pytorch]'` to install that version.
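The version rule above (major versions must match, and minor versions must also match for the NLP with PyTorch feature) can be sketched as a small check. This is an illustrative sketch only, not part of Eland: the helper names `parse_version` and `versions_compatible` and the `uses_nlp` flag are hypothetical.

```python
def parse_version(version: str) -> tuple[int, int]:
    """Return (major, minor) from a version string such as '8.16.1'."""
    major, minor, *_ = version.split(".")
    return int(major), int(minor)


def versions_compatible(eland_version: str, es_version: str, uses_nlp: bool = False) -> bool:
    """Hypothetical helper applying the compatibility rule from the README."""
    eland_major, eland_minor = parse_version(eland_version)
    es_major, es_minor = parse_version(es_version)
    if eland_major != es_major:
        # Major versions must always match.
        return False
    if uses_nlp:
        # NLP with PyTorch additionally requires matching minor versions.
        return eland_minor == es_minor
    return True
```

For example, `versions_compatible("8.16.0", "8.17.1")` is acceptable for DataFrame use, while the same pair with `uses_nlp=True` is not.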
### Prerequisites
@ -69,29 +79,23 @@ specifying different package names.
### Docker
Users wishing to use Eland without installing it, in order to just run the available scripts, can build the Docker
container:
If you want to use Eland without installing it just to run the available scripts, use the Docker
image.
It can be used interactively:
```bash
$ docker build -t elastic/eland .
```
The container can now be used interactively:
```bash
$ docker run -it --rm --network host elastic/eland
$ docker run -it --rm --network host docker.elastic.co/eland/eland
```
Running installed scripts is also possible without an interactive shell, e.g.:
```bash
$ docker run -it --rm --network host \
elastic/eland \
docker.elastic.co/eland/eland \
eland_import_hub_model \
--url http://host.docker.internal:9200/ \
--hub-model-id elastic/distilbert-base-cased-finetuned-conll03-english \
--task-type ner \
--start
--task-type ner
```
### Connecting to Elasticsearch
@ -105,15 +109,15 @@ or a string containing the host to connect to:
```python
import eland as ed
# Connecting to an Elasticsearch instance running on 'localhost:9200'
df = ed.DataFrame("localhost:9200", es_index_pattern="flights")
# Connecting to an Elasticsearch instance running on 'http://localhost:9200'
df = ed.DataFrame("http://localhost:9200", es_index_pattern="flights")
# Connecting to an Elastic Cloud instance
from elasticsearch import Elasticsearch
es = Elasticsearch(
cloud_id="cluster-name:...",
http_auth=("elastic", "<password>")
basic_auth=("elastic", "<password>")
)
df = ed.DataFrame(es, es_index_pattern="flights")
```
@ -134,7 +138,7 @@ without overloading your machine.
>>> import eland as ed
>>> # Connect to 'flights' index via localhost Elasticsearch node
>>> df = ed.DataFrame('localhost:9200', 'flights')
>>> df = ed.DataFrame('http://localhost:9200', 'flights')
# eland.DataFrame instance has the same API as pandas.DataFrame
# except all data is in Elasticsearch. See .info() memory usage.
@ -196,10 +200,12 @@ libraries to be serialized and used as an inference model in Elasticsearch.
➤ [Read more about Machine Learning in Elasticsearch](https://www.elastic.co/guide/en/machine-learning/current/ml-getting-started.html)
```python
>>> from sklearn import datasets
>>> from xgboost import XGBClassifier
>>> from eland.ml import MLModel
# Train and exercise an XGBoost ML model locally
>>> training_data = datasets.make_classification(n_features=5)
>>> xgb_model = XGBClassifier(booster="gbtree")
>>> xgb_model.fit(training_data[0], training_data[1])
@ -208,7 +214,7 @@ libraries to be serialized and used as an inference model in Elasticsearch.
# Import the model into Elasticsearch
>>> es_model = MLModel.import_model(
es_client="localhost:9200",
es_client="http://localhost:9200",
model_id="xgb-classifier",
model=xgb_model,
feature_names=["f0", "f1", "f2", "f3", "f4"],
@ -233,14 +239,29 @@ $ eland_import_hub_model \
--start
```
The example above will automatically start a model deployment. This is a
good shortcut for initial experimentation, but for anything that needs
good throughput you should omit the `--start` argument from the Eland
command line and instead start the model using the ML UI in Kibana.
The `--start` argument will deploy the model with one allocation and one
thread per allocation, which will not offer good performance. When starting
the model deployment using the ML UI in Kibana or the Elasticsearch
[API](https://www.elastic.co/guide/en/elasticsearch/reference/current/start-trained-model-deployment.html)
you will be able to set the threading options to make the best use of your
hardware.
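As a rough illustration of setting the threading options through the API instead of `--start`, the sketch below uses the Elasticsearch Python client's `ml.start_trained_model_deployment` call. The helper names `deployment_options` and `start_deployment` are hypothetical, the allocation and thread counts are placeholders to tune for your hardware, and the power-of-two check reflects the constraint documented for that API at the time of writing (verify it against your Elasticsearch version).

```python
def deployment_options(number_of_allocations: int, threads_per_allocation: int) -> dict:
    """Hypothetical helper: validate and collect the threading options."""
    if number_of_allocations < 1:
        raise ValueError("number_of_allocations must be at least 1")
    # Assumption: threads_per_allocation must be a power of 2, at most 32.
    is_power_of_two = threads_per_allocation >= 1 and (
        threads_per_allocation & (threads_per_allocation - 1) == 0
    )
    if not is_power_of_two or threads_per_allocation > 32:
        raise ValueError("threads_per_allocation must be a power of 2 and at most 32")
    return {
        "number_of_allocations": number_of_allocations,
        "threads_per_allocation": threads_per_allocation,
    }


def start_deployment(es_url: str, model_id: str, **options):
    """Start a deployment for an already imported model."""
    # Imported lazily so deployment_options() can be used without the client installed.
    from elasticsearch import Elasticsearch

    es = Elasticsearch(es_url, request_timeout=300)
    return es.ml.start_trained_model_deployment(model_id=model_id, **options)


# Example call (requires a running cluster and an imported model):
# start_deployment(
#     "http://localhost:9200",
#     "elastic__distilbert-base-cased-finetuned-conll03-english",
#     **deployment_options(number_of_allocations=2, threads_per_allocation=4),
# )
```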
```python
>>> import elasticsearch
>>> from pathlib import Path
>>> from eland.common import es_version
>>> from eland.ml.pytorch import PyTorchModel
>>> from eland.ml.pytorch.transformers import TransformerModel
>>> es = elasticsearch.Elasticsearch("http://elastic:mlqa_admin@localhost:9200")
>>> es_cluster_version = es_version(es)
# Load a Hugging Face transformers model directly from the model hub
>>> tm = TransformerModel("elastic/distilbert-base-cased-finetuned-conll03-english", "ner")
>>> tm = TransformerModel(model_id="elastic/distilbert-base-cased-finetuned-conll03-english", task_type="ner", es_version=es_cluster_version)
Downloading: 100%|██████████| 257/257 [00:00<00:00, 108kB/s]
Downloading: 100%|██████████| 954/954 [00:00<00:00, 372kB/s]
Downloading: 100%|██████████| 208k/208k [00:00<00:00, 668kB/s]
@ -253,7 +274,6 @@ Downloading: 100%|██████████| 249M/249M [00:23<00:00, 11.2MB
>>> model_path, config, vocab_path = tm.save(tmp_path)
# Import model into Elasticsearch
>>> es = elasticsearch.Elasticsearch("http://elastic:mlqa_admin@localhost:9200", timeout=300) # 5 minute timeout
>>> ptm = PyTorchModel(es, tm.elasticsearch_model_id())
>>> ptm.import_model(model_path=model_path, config_path=None, vocab_path=vocab_path, config=config)
100%|██████████| 63/63 [00:12<00:00, 5.02it/s]


@ -1,224 +0,0 @@
#!/usr/bin/env python
# Licensed to Elasticsearch B.V. under one or more contributor
# license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright
# ownership. Elasticsearch B.V. licenses this file to you under
# the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""
Copies a model from the Hugging Face model hub into an Elasticsearch cluster.
This will create local cached copies that will be traced (necessary) before
uploading to Elasticsearch. This will also check that the task type is supported
as well as the model and tokenizer types. All necessary configuration is
uploaded along with the model.
"""
import argparse
import logging
import os
import sys
import tempfile
import textwrap
from elastic_transport.client_utils import DEFAULT
from elasticsearch import AuthenticationException, Elasticsearch
MODEL_HUB_URL = "https://huggingface.co"
def get_arg_parser():
    parser = argparse.ArgumentParser()
    location_args = parser.add_mutually_exclusive_group(required=True)
    location_args.add_argument(
        "--url",
        default=os.environ.get("ES_URL"),
        help="An Elasticsearch connection URL, e.g. http://localhost:9200",
    )
    location_args.add_argument(
        "--cloud-id",
        default=os.environ.get("CLOUD_ID"),
        help="Cloud ID as found in the 'Manage Deployment' page of an Elastic Cloud deployment",
    )
    parser.add_argument(
        "--hub-model-id",
        required=True,
        help="The model ID in the Hugging Face model hub, "
             "e.g. dbmdz/bert-large-cased-finetuned-conll03-english",
    )
    parser.add_argument(
        "--es-model-id",
        required=False,
        default=None,
        help="The model ID to use in Elasticsearch, "
             "e.g. bert-large-cased-finetuned-conll03-english. "
             "When left unspecified, this will be auto-created from the `hub-model-id`",
    )
    parser.add_argument(
        "-u", "--es-username",
        required=False,
        default=os.environ.get("ES_USERNAME"),
        help="Username for Elasticsearch"
    )
    parser.add_argument(
        "-p", "--es-password",
        required=False,
        default=os.environ.get("ES_PASSWORD"),
        help="Password for the Elasticsearch user specified with -u/--es-username"
    )
    parser.add_argument(
        "--es-api-key",
        required=False,
        default=os.environ.get("ES_API_KEY"),
        help="API key for Elasticsearch"
    )
    parser.add_argument(
        "--task-type",
        required=True,
        choices=SUPPORTED_TASK_TYPES,
        help="The task type for the model usage.",
    )
    parser.add_argument(
        "--quantize",
        action="store_true",
        default=False,
        help="Quantize the model before uploading. Default: False",
    )
    parser.add_argument(
        "--start",
        action="store_true",
        default=False,
        help="Start the model deployment after uploading. Default: False",
    )
    parser.add_argument(
        "--clear-previous",
        action="store_true",
        default=False,
        help="Should the model previously stored with `es-model-id` be deleted"
    )
    parser.add_argument(
        "--insecure",
        action="store_false",
        default=True,
        help="Do not verify SSL certificates"
    )
    parser.add_argument(
        "--ca-certs",
        required=False,
        default=DEFAULT,
        help="Path to CA bundle"
    )
    return parser
def get_es_client(cli_args):
    try:
        es_args = {
            'request_timeout': 300,
            'verify_certs': cli_args.insecure,
            'ca_certs': cli_args.ca_certs
        }
        # Deployment location
        if cli_args.url:
            es_args['hosts'] = cli_args.url
        if cli_args.cloud_id:
            es_args['cloud_id'] = cli_args.cloud_id
        # Authentication
        if cli_args.es_api_key:
            es_args['api_key'] = cli_args.es_api_key
        elif cli_args.es_username:
            if not cli_args.es_password:
                logging.error(f"Password for user {cli_args.es_username} was not specified.")
                exit(1)
            es_args['basic_auth'] = (cli_args.es_username, cli_args.es_password)
        es_client = Elasticsearch(**es_args)
        es_info = es_client.info()
        logger.info(f"Connected to cluster named '{es_info['cluster_name']}' (version: {es_info['version']['number']})")
        return es_client
    except AuthenticationException as e:
        logger.error(e)
        exit(1)
if __name__ == "__main__":
    # Configure logging
    logging.basicConfig(format='%(asctime)s %(levelname)s : %(message)s')
    logger = logging.getLogger(__name__)
    logger.setLevel(logging.INFO)
    try:
        from eland.ml.pytorch import PyTorchModel
        from eland.ml.pytorch.transformers import SUPPORTED_TASK_TYPES, TransformerModel
    except ModuleNotFoundError as e:
        logger.error(textwrap.dedent(f"""\
            \033[31mFailed to run because module '{e.name}' is not available.\033[0m
            This script requires PyTorch extras to run. You can install these by running:
                \033[1m{sys.executable} -m pip install 'eland[pytorch]'
            \033[0m"""))
        exit(1)
    # Parse arguments
    args = get_arg_parser().parse_args()
    # Connect to ES
    logger.info("Establishing connection to Elasticsearch")
    es = get_es_client(args)
    # Trace and save model, then upload it from temp file
    with tempfile.TemporaryDirectory() as tmp_dir:
        logger.info(f"Loading HuggingFace transformer tokenizer and model '{args.hub_model_id}'")
        tm = TransformerModel(args.hub_model_id, args.task_type, args.quantize)
        model_path, config, vocab_path = tm.save(tmp_dir)
        ptm = PyTorchModel(es, args.es_model_id if args.es_model_id else tm.elasticsearch_model_id())
        model_exists = es.options(ignore_status=404).ml.get_trained_models(model_id=ptm.model_id).meta.status == 200
        if model_exists:
            if args.clear_previous:
                logger.info(f"Stopping deployment for model with id '{ptm.model_id}'")
                ptm.stop()
                logger.info(f"Deleting model with id '{ptm.model_id}'")
                ptm.delete()
            else:
                logger.error(f"Trained model with id '{ptm.model_id}' already exists")
                logger.info("Run the script with the '--clear-previous' flag if you want to overwrite the existing model.")
                exit(1)
        logger.info(f"Creating model with id '{ptm.model_id}'")
        ptm.put_config(config=config)
        logger.info(f"Uploading model definition")
        ptm.put_model(model_path)
        logger.info(f"Uploading model vocabulary")
        ptm.put_vocab(vocab_path)
    # Start the deployed model
    if args.start:
        logger.info(f"Starting model deployment")
        ptm.start()
    logger.info(f"Model successfully imported with id '{ptm.model_id}'")

catalog-info.yaml Normal file

@ -0,0 +1,94 @@
# Declare a Backstage Component that represents the Eland application.
---
# yaml-language-server: $schema=https://json.schemastore.org/catalog-info.json
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: eland
  description: Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
  annotations:
    backstage.io/source-location: url:https://github.com/elastic/eland/
    github.com/project-slug: elastic/eland
    github.com/team-slug: elastic/ml-core
    buildkite.com/project-slug: elastic/eland
  tags:
    - elasticsearch
    - python
    - machine-learning
    - big-data
    - etl
  links:
    - title: Eland docs
      url: https://eland.readthedocs.io/
spec:
  type: application
  owner: group:ml-core
  lifecycle: production
  dependsOn:
    - resource:eland-pipeline
    - resource:eland-releaser-docker-pipeline
# yaml-language-server: $schema=https://gist.githubusercontent.com/elasticmachine/988b80dae436cafea07d9a4a460a011d/raw/e57ee3bed7a6f73077a3f55a38e76e40ec87a7cf/rre.schema.json
---
apiVersion: backstage.io/v1alpha1
kind: Resource
metadata:
  name: eland-pipeline
  description: Run Eland tests
  links:
    - title: Pipeline
      url: https://buildkite.com/elastic/eland
spec:
  type: buildkite-pipeline
  owner: group:ml-core
  system: buildkite
  implementation:
    apiVersion: buildkite.elastic.dev/v1
    kind: Pipeline
    metadata:
      name: Eland
      description: Eland Python
    spec:
      pipeline_file: .buildkite/pipeline.yml
      repository: elastic/eland
      teams:
        ml-core: {}
        devtools-team: {}
        es-docs: {}
        everyone:
          access_level: READ_ONLY
# yaml-language-server: $schema=https://gist.githubusercontent.com/elasticmachine/988b80dae436cafea07d9a4a460a011d/raw/e57ee3bed7a6f73077a3f55a38e76e40ec87a7cf/rre.schema.json
---
apiVersion: backstage.io/v1alpha1
kind: Resource
metadata:
  name: eland-release-docker-pipeline
  description: Release Docker Artifacts for Eland
  links:
    - title: Pipeline
      url: https://buildkite.com/elastic/eland-release-docker
spec:
  type: buildkite-pipeline
  owner: group:ml-core
  system: buildkite
  implementation:
    apiVersion: buildkite.elastic.dev/v1
    kind: Pipeline
    metadata:
      name: Eland - Release Docker
      description: Release Docker Artifacts for Eland
    spec:
      pipeline_file: .buildkite/release-docker/pipeline.yml
      provider_settings:
        trigger_mode: none
      repository: elastic/eland
      teams:
        ml-core: {}
        devtools-team: {}
        everyone:
          access_level: READ_ONLY

docs/docset.yml Normal file

@ -0,0 +1,8 @@
project: 'Eland Python client'
cross_links:
  - docs-content
toc:
  - toc: reference
subs:
  es: "Elasticsearch"
  ml: "machine learning"


@ -1,13 +0,0 @@
= Eland Python Client
:doctype: book
include::{asciidoc-dir}/../../shared/attributes.asciidoc[]
include::overview.asciidoc[]
include::installation.asciidoc[]
include::dataframes.asciidoc[]
include::machine-learning.asciidoc[]


@ -1,16 +0,0 @@
[[installation]]
== Installation
Eland can be installed with https://pip.pypa.io[pip] from https://pypi.org/project/eland[PyPI]:
[source,sh]
-----------------------------
$ python -m pip install eland
-----------------------------
and can also be installed with https://docs.conda.io[Conda] from https://anaconda.org/conda-forge/eland[Conda Forge]:
[source,sh]
------------------------------------
$ conda install -c conda-forge eland
------------------------------------


@ -1,80 +0,0 @@
[[machine-learning]]
== Machine Learning
[discrete]
[[ml-trained-models]]
=== Trained models
Eland allows transforming trained models from scikit-learn, XGBoost,
and LightGBM libraries to be serialized and used as an inference
model in {es}.
[source,python]
------------------------
>>> from xgboost import XGBClassifier
>>> from eland.ml import MLModel
# Train and exercise an XGBoost ML model locally
>>> xgb_model = XGBClassifier(booster="gbtree")
>>> xgb_model.fit(training_data[0], training_data[1])
>>> xgb_model.predict(training_data[0])
[0 1 1 0 1 0 0 0 1 0]
# Import the model into Elasticsearch
>>> es_model = MLModel.import_model(
es_client="http://localhost:9200",
model_id="xgb-classifier",
model=xgb_model,
feature_names=["f0", "f1", "f2", "f3", "f4"],
)
# Exercise the ML model in Elasticsearch with the training data
>>> es_model.predict(training_data[0])
[0 1 1 0 1 0 0 0 1 0]
------------------------
[discrete]
[[ml-nlp-pytorch]]
=== Natural language processing (NLP) with PyTorch
For NLP tasks, Eland enables you to import PyTorch trained BERT models into {es}.
Models can be either plain PyTorch models, or supported
https://huggingface.co/transformers[transformers] models from the
https://huggingface.co/models[Hugging Face model hub].
[source,bash]
------------------------
$ eland_import_hub_model \
--url http://localhost:9200/ \
--hub-model-id elastic/distilbert-base-cased-finetuned-conll03-english \
--task-type ner \
--start
------------------------
[source,python]
------------------------
>>> import elasticsearch
>>> from pathlib import Path
>>> from eland.ml.pytorch import PyTorchModel
>>> from eland.ml.pytorch.transformers import TransformerModel
# Load a Hugging Face transformers model directly from the model hub
>>> tm = TransformerModel("elastic/distilbert-base-cased-finetuned-conll03-english", "ner")
Downloading: 100%|██████████| 257/257 [00:00<00:00, 108kB/s]
Downloading: 100%|██████████| 954/954 [00:00<00:00, 372kB/s]
Downloading: 100%|██████████| 208k/208k [00:00<00:00, 668kB/s]
Downloading: 100%|██████████| 112/112 [00:00<00:00, 43.9kB/s]
Downloading: 100%|██████████| 249M/249M [00:23<00:00, 11.2MB/s]
# Export the model in a TorchScript representation which Elasticsearch uses
>>> tmp_path = "models"
>>> Path(tmp_path).mkdir(parents=True, exist_ok=True)
>>> model_path, config_path, vocab_path = tm.save(tmp_path)
# Import model into Elasticsearch
>>> es = elasticsearch.Elasticsearch("http://elastic:mlqa_admin@localhost:9200", timeout=300) # 5 minute timeout
>>> ptm = PyTorchModel(es, tm.elasticsearch_model_id())
>>> ptm.import_model(model_path, config_path, vocab_path)
100%|██████████| 63/63 [00:12<00:00, 5.02it/s]
------------------------


@ -1,16 +1,16 @@
[[dataframes]]
== Data Frames
---
mapped_pages:
- https://www.elastic.co/guide/en/elasticsearch/client/eland/current/dataframes.html
---
`eland.DataFrame` wraps an Elasticsearch index in a Pandas-like API
and defers all processing and filtering of data to Elasticsearch
instead of your local machine. This means you can process large
amounts of data within Elasticsearch from a Jupyter Notebook
without overloading your machine.
# Data Frames [dataframes]
[source,python]
-------------------------------------
`eland.DataFrame` wraps an Elasticsearch index in a Pandas-like API and defers all processing and filtering of data to Elasticsearch instead of your local machine. This means you can process large amounts of data within Elasticsearch from a Jupyter Notebook without overloading your machine.
```python
>>> import eland as ed
>>> # Connect to 'flights' index via localhost Elasticsearch node
>>>
# Connect to 'flights' index via localhost Elasticsearch node
>>> df = ed.DataFrame('http://localhost:9200', 'flights')
# eland.DataFrame instance has the same API as pandas.DataFrame
@ -29,14 +29,14 @@ without overloading your machine.
<class 'eland.dataframe.DataFrame'>
Index: 13059 entries, 0 to 13058
Data columns (total 27 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 AvgTicketPrice 13059 non-null float64
1 Cancelled 13059 non-null bool
2 Carrier 13059 non-null object
...
24 OriginWeather 13059 non-null object
25 dayOfWeek 13059 non-null int64
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 AvgTicketPrice 13059 non-null float64
1 Cancelled 13059 non-null bool
2 Carrier 13059 non-null object
...
24 OriginWeather 13059 non-null object
25 dayOfWeek 13059 non-null int64
26 timestamp 13059 non-null datetime64[ns]
dtypes: bool(2), datetime64[ns](1), float64(5), int64(2), object(17)
memory usage: 80.0 bytes
@ -59,4 +59,5 @@ Elasticsearch storage usage: 5.043 MB
sum 9.261629e+07 8.204365e+06
min 0.000000e+00 1.000205e+02
std 4.578263e+03 2.663867e+02
-------------------------------------
```


@ -1,33 +1,36 @@
[[overview]]
== Overview
---
mapped_pages:
- https://www.elastic.co/guide/en/elasticsearch/client/eland/current/index.html
- https://www.elastic.co/guide/en/elasticsearch/client/eland/current/overview.html
navigation_title: Eland
---
Eland is a Python client and toolkit for DataFrames and {ml} in {es}.
Full documentation is available on https://eland.readthedocs.io[Read the Docs].
Source code is available on https://github.com/elastic/eland[GitHub].
# Eland Python client [overview]
[discrete]
=== Compatibility
Eland is a Python client and toolkit for DataFrames and {{ml}} in {{es}}. Full documentation is available on [Read the Docs](https://eland.readthedocs.io). Source code is available on [GitHub](https://github.com/elastic/eland).
- Supports Python 3.7+ and Pandas 1.3
- Supports {es} clusters that are 7.11+, recommended 7.14 or later for all features to work.
Make sure your Eland major version matches the major version of your Elasticsearch cluster.
The recommended way to set your requirements in your `setup.py` or
`requirements.txt` is::
## Compatibility [_compatibility]
# Elasticsearch 8.x
eland>=8,<9
* Supports Python 3.9+ and Pandas 1.5
* Supports {{es}} 8+ clusters, recommended 8.16 or later for all features to work. Make sure your Eland major version matches the major version of your Elasticsearch cluster.
# Elasticsearch 7.x
eland>=7,<8
The recommended way to set your requirements in your `setup.py` or `requirements.txt` is:
[discrete]
=== Getting Started
```
# Elasticsearch 8.x
eland>=8,<9
```
```
# Elasticsearch 7.x
eland>=7,<8
```
Create a `DataFrame` object connected to an {es} cluster running on `http://localhost:9200`:
## Getting Started [_getting_started]
[source,python]
------------------------------------
Create a `DataFrame` object connected to an {{es}} cluster running on `http://localhost:9200`:
```python
>>> import eland as ed
>>> df = ed.DataFrame(
... es_client="http://localhost:9200",
@ -48,20 +51,19 @@ Create a `DataFrame` object connected to an {es} cluster running on `http://loca
13058 858.144337 False ... 6 2018-02-11 14:54:34
[13059 rows x 27 columns]
------------------------------------
```
[discrete]
==== Elastic Cloud
### Elastic Cloud [_elastic_cloud]
You can also connect Eland to an Elasticsearch instance in Elastic Cloud:
[source,python]
------------------------------------
```python
>>> import eland as ed
>>> from elasticsearch import Elasticsearch
# First instantiate an 'Elasticsearch' instance connected to Elastic Cloud
>>> es = Elasticsearch(cloud_id="...", api_key=("...", "..."))
>>> es = Elasticsearch(cloud_id="...", api_key="...")
# then wrap the client in an Eland DataFrame:
>>> df = ed.DataFrame(es, es_index_pattern="flights")
@ -73,16 +75,16 @@ You can also connect Eland to an Elasticsearch instance in Elastic Cloud:
3 181.694216 True ... 0 2018-01-01 10:33:28
4 730.041778 False ... 0 2018-01-01 05:13:00
[5 rows x 27 columns]
------------------------------------
```
Eland can be used for complex queries and aggregations:
[source,python]
------------------------------------
```python
>>> df[df.Carrier != "Kibana Airlines"].groupby("Carrier").mean(numeric_only=False)
AvgTicketPrice Cancelled timestamp
Carrier
Carrier
ES-Air 630.235816 0.129814 2018-01-21 20:45:00.200000000
JetBeats 627.457373 0.134698 2018-01-21 14:43:18.112400635
Logstash Airways 624.581974 0.125188 2018-01-21 16:14:50.711798340
------------------------------------
```


@ -0,0 +1,19 @@
---
mapped_pages:
- https://www.elastic.co/guide/en/elasticsearch/client/eland/current/installation.html
---
# Installation [installation]
Eland can be installed with [pip](https://pip.pypa.io) from [PyPI](https://pypi.org/project/eland). We recommend [using a virtual environment](https://packaging.python.org/en/latest/guides/installing-using-pip-and-virtual-environments/) when installing with pip:
```sh
$ python -m pip install eland
```
Alternatively, Eland can be installed with [Conda](https://docs.conda.io) from [Conda Forge](https://anaconda.org/conda-forge/eland):
```sh
$ conda install -c conda-forge eland
```


@ -0,0 +1,199 @@
---
mapped_pages:
- https://www.elastic.co/guide/en/elasticsearch/client/eland/current/machine-learning.html
---
# Machine Learning [machine-learning]
## Trained models [ml-trained-models]
Eland allows transforming *some*
[trained models](https://eland.readthedocs.io/en/latest/reference/api/eland.ml.MLModel.import_model.html#parameters) from scikit-learn, XGBoost,
and LightGBM libraries to be serialized and used as an inference model in {{es}}.
```python
>>> from xgboost import XGBClassifier
>>> from eland.ml import MLModel
# Train and exercise an XGBoost ML model locally
>>> xgb_model = XGBClassifier(booster="gbtree")
>>> xgb_model.fit(training_data[0], training_data[1])
>>> xgb_model.predict(training_data[0])
[0 1 1 0 1 0 0 0 1 0]
# Import the model into Elasticsearch
>>> es_model = MLModel.import_model(
es_client="http://localhost:9200",
model_id="xgb-classifier",
model=xgb_model,
feature_names=["f0", "f1", "f2", "f3", "f4"],
)
# Exercise the ML model in Elasticsearch with the training data
>>> es_model.predict(training_data[0])
[0 1 1 0 1 0 0 0 1 0]
```
## Natural language processing (NLP) with PyTorch [ml-nlp-pytorch]
::::{important}
You need to install the appropriate version of PyTorch to import an NLP model. Run `python -m pip install 'eland[pytorch]'` to install that version.
::::
For NLP tasks, Eland enables you to import PyTorch models into {{es}}. Use the `eland_import_hub_model` script to download and install supported [transformer models](https://huggingface.co/transformers) from the [Hugging Face model hub](https://huggingface.co/models). For example:
```bash
eland_import_hub_model <authentication> \ <1>
--url http://localhost:9200/ \ <2>
--hub-model-id elastic/distilbert-base-cased-finetuned-conll03-english \ <3>
--task-type ner \ <4>
--start
```
1. Use an authentication method to access your cluster. Refer to [Authentication methods](machine-learning.md#ml-nlp-pytorch-auth).
2. The cluster URL. Alternatively, use `--cloud-id`.
3. Specify the identifier for the model in the Hugging Face model hub.
4. Specify the type of NLP task. Supported values are `fill_mask`, `ner`, `question_answering`, `text_classification`, `text_embedding`, `text_expansion`, `text_similarity` and `zero_shot_classification`.
For more information about the available options, run `eland_import_hub_model` with the `--help` option.
```bash
eland_import_hub_model --help
```
### Import model with Docker [ml-nlp-pytorch-docker]
::::{important}
To use the Docker container, you need to clone the Eland repository: [https://github.com/elastic/eland](https://github.com/elastic/eland)
::::
If you want to use Eland without installing it, you can use the Docker image.
You can use the container interactively:
```bash
docker run -it --rm --network host docker.elastic.co/eland/eland
```
Running installed scripts is also possible without an interactive shell, for example:
```bash
docker run -it --rm docker.elastic.co/eland/eland \
eland_import_hub_model \
--url $ELASTICSEARCH_URL \
--hub-model-id elastic/distilbert-base-uncased-finetuned-conll03-english \
--start
```
Replace `$ELASTICSEARCH_URL` with the URL for your Elasticsearch cluster. For authentication purposes, include an administrator username and password in the URL in the following format: `https://username:password@host:port`.
### Install models in an air-gapped environment [ml-nlp-pytorch-air-gapped]
You can install models in a restricted or closed network by pointing the `eland_import_hub_model` script to local files.
For an offline install of a Hugging Face model, first clone the model locally. This requires Git and [Git Large File Storage](https://git-lfs.com/) to be installed on your system.
1. Select a model you want to use from Hugging Face. Refer to the [compatible third party model](docs-content://explore-analyze/machine-learning/nlp/ml-nlp-model-ref.md) list for more information on the supported architectures.
2. Clone the selected model from Hugging Face by using the model URL. For example:
```bash
git clone https://huggingface.co/dslim/bert-base-NER
```
This command results in a local copy of the model in the directory `bert-base-NER`.
3. Use the `eland_import_hub_model` script with the `--hub-model-id` set to the directory of the cloned model to install it:
```bash
eland_import_hub_model \
--url 'XXXX' \
--hub-model-id /PATH/TO/MODEL \
--task-type ner \
--es-username elastic --es-password XXX \
--es-model-id bert-base-ner
```
If you use the Docker image to run `eland_import_hub_model` you must bind mount the model directory, so the container can read the files:
```bash
docker run --mount type=bind,source=/PATH/TO/MODEL,destination=/model,readonly -it --rm docker.elastic.co/eland/eland \
eland_import_hub_model \
--url 'XXXX' \
--hub-model-id /model \
--task-type ner \
--es-username elastic --es-password XXX \
--es-model-id bert-base-ner
```
Once it's uploaded to {{es}}, the model will have the ID specified by `--es-model-id`. If it is not set, the model ID is derived from `--hub-model-id`; spaces and path delimiters are converted to double underscores `__`.
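The ID derivation described above can be sketched as follows. This is a minimal illustration of the stated rule only, with a hypothetical function name `derive_es_model_id`; Eland's actual implementation may apply further normalization (for example lowercasing or length limits) that is not shown here.

```python
import re


def derive_es_model_id(hub_model_id: str) -> str:
    """Hypothetical sketch: replace whitespace and path delimiters with '__'."""
    # Each whitespace character, "/" or "\" becomes a double underscore.
    return re.sub(r"[\s\\/]", "__", hub_model_id)


print(derive_es_model_id("dslim/bert-base-NER"))
```

Passing `--es-model-id` explicitly, as in the commands above, avoids relying on the derived name.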
### Connect to Elasticsearch through a proxy [ml-nlp-pytorch-proxy]
Behind the scenes, Eland uses the `requests` Python library, which [allows configuring proxies through an environment variable](https://requests.readthedocs.io/en/latest/user/advanced/#proxies). For example, to use an HTTP proxy to connect to an HTTPS Elasticsearch cluster, you need to set the `HTTPS_PROXY` environment variable when invoking Eland:
```bash
HTTPS_PROXY=http://proxy-host:proxy-port eland_import_hub_model ...
```
If you disabled security on your Elasticsearch cluster, you should use `HTTP_PROXY` instead.
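When the import script is launched from Python, the choice between the two variables can be made from the cluster URL scheme. The following is a minimal sketch of that rule; the helper name `proxy_env` and the host names are hypothetical:

```python
import os
from urllib.parse import urlsplit


def proxy_env(cluster_url: str, proxy_url: str, base_env=None) -> dict:
    """Hypothetical helper: return an environment mapping with the proxy
    variable that matches the scheme of the cluster URL."""
    env = dict(os.environ if base_env is None else base_env)
    scheme = urlsplit(cluster_url).scheme.lower()
    if scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {scheme!r}")
    # HTTPS cluster -> HTTPS_PROXY; HTTP cluster (security disabled) -> HTTP_PROXY
    env[f"{scheme.upper()}_PROXY"] = proxy_url
    return env


print(proxy_env("https://es.example:9200", "http://proxy-host:3128", base_env={}))
```

The returned mapping can be passed as the `env` argument of `subprocess.run` when invoking `eland_import_hub_model`.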
### Authentication methods [ml-nlp-pytorch-auth]
The following authentication options are available when using the import script:
* Elasticsearch username and password authentication (specified with the `-u` and `-p` options):
```bash
eland_import_hub_model -u <username> -p <password> --cloud-id <cloud-id> ...
```
These `-u` and `-p` options also work when you use `--url`.
* Elasticsearch username and password authentication (embedded in the URL):
```bash
eland_import_hub_model --url https://<user>:<password>@<hostname>:<port> ...
```
* Elasticsearch API key authentication:
```bash
eland_import_hub_model --es-api-key <api-key> --url https://<hostname>:<port> ...
```
* Hugging Face Hub access token (for private models):
```bash
eland_import_hub_model --hub-access-token <access-token> ...
```
### TLS/SSL [ml-nlp-pytorch-tls]
The following TLS/SSL options for Elasticsearch are available when using the import script:
* Specify an alternate CA bundle to verify the cluster certificate:
```bash
eland_import_hub_model --ca-certs CA_CERTS ...
```
* Disable TLS/SSL verification altogether (strongly discouraged):
```bash
eland_import_hub_model --insecure ...
```

docs/reference/toc.yml

@ -0,0 +1,6 @@
project: 'Eland reference'
toc:
- file: index.md
- file: installation.md
- file: dataframes.md
- file: machine-learning.md


@ -1,11 +1,5 @@
elasticsearch>=7.7
pandas>=1.2.0
matplotlib
nbval
numpydoc>=0.9.0
scikit-learn>=0.22.1
xgboost>=1
lightgbm
sphinx==5.3.0
nbsphinx
numpydoc>=0.9.0
git+https://github.com/pandas-dev/pydata-sphinx-theme.git@master
furo


@ -58,9 +58,9 @@ release = version
# ones.
extensions = [
"sphinx.ext.autodoc",
"sphinx.ext.autosummary",
"sphinx.ext.doctest",
"sphinx.ext.extlinks",
"numpydoc",
"matplotlib.sphinxext.plot_directive",
"sphinx.ext.todo",
"nbsphinx",
@ -116,12 +116,7 @@ exclude_patterns = ["**.ipynb_checkpoints"]
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
html_theme = "pydata_sphinx_theme"
html_theme_options = {
"external_links": [],
"github_url": "https://github.com/elastic/eland",
"twitter_url": "https://twitter.com/elastic",
}
html_theme = "furo"
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,


@ -167,7 +167,7 @@ Configuring PyCharm And Running Tests
- Install development requirements. Open terminal in virtual environment and run
.. code-block:: bash
`pip install -r requirements-dev.txt`
pip install -r requirements-dev.txt
- Setup Elasticsearch instance with docker
.. code-block:: bash
@ -200,7 +200,7 @@ Configuring PyCharm And Running Tests
- To test specific versions of Python run
.. code-block:: bash
nox -s test-3.8
nox -s test-3.12
Documentation

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long


@ -1,6 +1,6 @@
eland.DataFrame.agg
eland.DataFrame.agg
===================
.. currentmodule:: eland
.. automethod:: DataFrame.agg
.. automethod:: DataFrame.agg


@ -1,6 +1,6 @@
eland.DataFrame.aggregate
eland.DataFrame.aggregate
=========================
.. currentmodule:: eland
.. automethod:: DataFrame.aggregate
.. automethod:: DataFrame.aggregate


@ -1,6 +1,6 @@
eland.DataFrame.columns
eland.DataFrame.columns
=======================
.. currentmodule:: eland
.. autoattribute:: DataFrame.columns
.. autoproperty:: DataFrame.columns


@ -1,6 +1,6 @@
eland.DataFrame.count
eland.DataFrame.count
=====================
.. currentmodule:: eland
.. automethod:: DataFrame.count
.. automethod:: DataFrame.count


@ -1,6 +1,6 @@
eland.DataFrame.describe
eland.DataFrame.describe
========================
.. currentmodule:: eland
.. automethod:: DataFrame.describe
.. automethod:: DataFrame.describe


@ -1,6 +1,6 @@
eland.DataFrame.drop
eland.DataFrame.drop
====================
.. currentmodule:: eland
.. automethod:: DataFrame.drop
.. automethod:: DataFrame.drop


@ -1,6 +1,6 @@
eland.DataFrame.dtypes
eland.DataFrame.dtypes
======================
.. currentmodule:: eland
.. autoattribute:: DataFrame.dtypes
.. autoproperty:: DataFrame.dtypes


@ -1,6 +1,6 @@
eland.DataFrame.empty
eland.DataFrame.empty
=====================
.. currentmodule:: eland
.. autoattribute:: DataFrame.empty
.. autoproperty:: DataFrame.empty


@ -1,6 +1,6 @@
eland.DataFrame.es_dtypes
=========================
eland.DataFrame.es\_dtypes
==========================
.. currentmodule:: eland
.. autoattribute:: DataFrame.es_dtypes
.. autoproperty:: DataFrame.es_dtypes


@ -1,6 +1,6 @@
eland.DataFrame.es_info
=======================
eland.DataFrame.es\_info
========================
.. currentmodule:: eland
.. automethod:: DataFrame.es_info
.. automethod:: DataFrame.es_info


@ -1,6 +1,6 @@
eland.DataFrame.es_match
========================
eland.DataFrame.es\_match
=========================
.. currentmodule:: eland
.. automethod:: DataFrame.es_match
.. automethod:: DataFrame.es_match


@ -1,6 +1,6 @@
eland.DataFrame.es_query
========================
eland.DataFrame.es\_query
=========================
.. currentmodule:: eland
.. automethod:: DataFrame.es_query
.. automethod:: DataFrame.es_query


@ -1,6 +1,6 @@
eland.DataFrame.filter
eland.DataFrame.filter
======================
.. currentmodule:: eland
.. automethod:: DataFrame.filter
.. automethod:: DataFrame.filter


@ -1,6 +1,6 @@
eland.DataFrame.get
eland.DataFrame.get
===================
.. currentmodule:: eland
.. automethod:: DataFrame.get
.. automethod:: DataFrame.get


@ -1,6 +1,6 @@
eland.DataFrame.groupby
eland.DataFrame.groupby
=======================
.. currentmodule:: eland
.. automethod:: DataFrame.groupby
.. automethod:: DataFrame.groupby


@ -1,6 +1,6 @@
eland.DataFrame.head
eland.DataFrame.head
====================
.. currentmodule:: eland
.. automethod:: DataFrame.head
.. automethod:: DataFrame.head


@ -1,8 +1,6 @@
eland.DataFrame.hist
eland.DataFrame.hist
====================
.. currentmodule:: eland
.. automethod:: DataFrame.hist
.. image:: eland-DataFrame-hist-1.png
.. automethod:: DataFrame.hist


@ -1,6 +1,6 @@
eland.DataFrame.idxmax
========================
eland.DataFrame.idxmax
======================
.. currentmodule:: eland
.. automethod:: DataFrame.idxmax
.. automethod:: DataFrame.idxmax


@ -1,6 +1,6 @@
eland.DataFrame.idxmin
========================
eland.DataFrame.idxmin
======================
.. currentmodule:: eland
.. automethod:: DataFrame.idxmin
.. automethod:: DataFrame.idxmin


@ -1,6 +1,6 @@
eland.DataFrame.index
eland.DataFrame.index
=====================
.. currentmodule:: eland
.. autoattribute:: DataFrame.index
.. autoproperty:: DataFrame.index


@ -1,6 +1,6 @@
eland.DataFrame.info
eland.DataFrame.info
====================
.. currentmodule:: eland
.. automethod:: DataFrame.info
.. automethod:: DataFrame.info


@ -1,6 +1,6 @@
eland.DataFrame.iterrows
eland.DataFrame.iterrows
========================
.. currentmodule:: eland
.. automethod:: DataFrame.iterrows
.. automethod:: DataFrame.iterrows


@ -1,6 +1,6 @@
eland.DataFrame.itertuples
eland.DataFrame.itertuples
==========================
.. currentmodule:: eland
.. automethod:: DataFrame.itertuples
.. automethod:: DataFrame.itertuples


@ -1,6 +1,6 @@
eland.DataFrame.keys
eland.DataFrame.keys
====================
.. currentmodule:: eland
.. automethod:: DataFrame.keys
.. automethod:: DataFrame.keys


@ -1,6 +1,6 @@
eland.DataFrame.mad
eland.DataFrame.mad
===================
.. currentmodule:: eland
.. automethod:: DataFrame.mad
.. automethod:: DataFrame.mad


@ -1,6 +1,6 @@
eland.DataFrame.max
eland.DataFrame.max
===================
.. currentmodule:: eland
.. automethod:: DataFrame.max
.. automethod:: DataFrame.max


@ -1,6 +1,6 @@
eland.DataFrame.mean
eland.DataFrame.mean
====================
.. currentmodule:: eland
.. automethod:: DataFrame.mean
.. automethod:: DataFrame.mean


@ -1,6 +1,6 @@
eland.DataFrame.median
eland.DataFrame.median
======================
.. currentmodule:: eland
.. automethod:: DataFrame.median
.. automethod:: DataFrame.median


@ -1,6 +1,6 @@
eland.DataFrame.min
eland.DataFrame.min
===================
.. currentmodule:: eland
.. automethod:: DataFrame.min
.. automethod:: DataFrame.min


@ -1,6 +1,6 @@
eland.DataFrame.ndim
eland.DataFrame.ndim
====================
.. currentmodule:: eland
.. autoattribute:: DataFrame.ndim
.. autoproperty:: DataFrame.ndim


@ -1,6 +1,6 @@
eland.DataFrame.nunique
eland.DataFrame.nunique
=======================
.. currentmodule:: eland
.. automethod:: DataFrame.nunique
.. automethod:: DataFrame.nunique


@ -1,6 +1,6 @@
eland.DataFrame.query
eland.DataFrame.query
=====================
.. currentmodule:: eland
.. automethod:: DataFrame.query
.. automethod:: DataFrame.query


@ -1,18 +1,76 @@
eland.DataFrame
================
eland.DataFrame
===============
.. currentmodule:: eland
.. autoclass:: DataFrame
.. automethod:: __init__
.. rubric:: Methods
..
HACK -- the point here is that we don't want this to appear in the output, but the autosummary should still generate the pages.
.. autosummary::
:toctree:
DataFrame.abs
DataFrame.add
~DataFrame.__init__
~DataFrame.agg
~DataFrame.aggregate
~DataFrame.count
~DataFrame.describe
~DataFrame.drop
~DataFrame.es_info
~DataFrame.es_match
~DataFrame.es_query
~DataFrame.filter
~DataFrame.get
~DataFrame.groupby
~DataFrame.head
~DataFrame.hist
~DataFrame.idxmax
~DataFrame.idxmin
~DataFrame.info
~DataFrame.iterrows
~DataFrame.itertuples
~DataFrame.keys
~DataFrame.mad
~DataFrame.max
~DataFrame.mean
~DataFrame.median
~DataFrame.min
~DataFrame.mode
~DataFrame.nunique
~DataFrame.quantile
~DataFrame.query
~DataFrame.sample
~DataFrame.select_dtypes
~DataFrame.std
~DataFrame.sum
~DataFrame.tail
~DataFrame.to_csv
~DataFrame.to_html
~DataFrame.to_json
~DataFrame.to_numpy
~DataFrame.to_pandas
~DataFrame.to_string
~DataFrame.var
.. rubric:: Attributes
.. autosummary::
~DataFrame.columns
~DataFrame.dtypes
~DataFrame.empty
~DataFrame.es_dtypes
~DataFrame.index
~DataFrame.ndim
~DataFrame.shape
~DataFrame.size
~DataFrame.values


@ -1,6 +1,6 @@
eland.DataFrame.sample
eland.DataFrame.sample
======================
.. currentmodule:: eland
.. automethod:: DataFrame.sample
.. automethod:: DataFrame.sample


@ -1,6 +1,6 @@
eland.DataFrame.select_dtypes
=============================
eland.DataFrame.select\_dtypes
==============================
.. currentmodule:: eland
.. automethod:: DataFrame.select_dtypes
.. automethod:: DataFrame.select_dtypes


@ -1,6 +1,6 @@
eland.DataFrame.shape
eland.DataFrame.shape
=====================
.. currentmodule:: eland
.. autoattribute:: DataFrame.shape
.. autoproperty:: DataFrame.shape


@ -1,6 +1,6 @@
eland.DataFrame.size
eland.DataFrame.size
====================
.. currentmodule:: eland
.. autoattribute:: DataFrame.size
.. autoproperty:: DataFrame.size


@ -1,6 +1,6 @@
eland.DataFrame.std
eland.DataFrame.std
===================
.. currentmodule:: eland
.. automethod:: DataFrame.std
.. automethod:: DataFrame.std


@ -1,6 +1,6 @@
eland.DataFrame.sum
eland.DataFrame.sum
===================
.. currentmodule:: eland
.. automethod:: DataFrame.sum
.. automethod:: DataFrame.sum


@ -1,6 +1,6 @@
eland.DataFrame.tail
eland.DataFrame.tail
====================
.. currentmodule:: eland
.. automethod:: DataFrame.tail
.. automethod:: DataFrame.tail


@ -1,6 +1,6 @@
eland.DataFrame.to_csv
======================
eland.DataFrame.to\_csv
=======================
.. currentmodule:: eland
.. automethod:: DataFrame.to_csv
.. automethod:: DataFrame.to_csv


@ -1,6 +1,6 @@
eland.DataFrame.to_html
=======================
eland.DataFrame.to\_html
========================
.. currentmodule:: eland
.. automethod:: DataFrame.to_html
.. automethod:: DataFrame.to_html


@ -0,0 +1,6 @@
eland.DataFrame.to\_json
========================
.. currentmodule:: eland
.. automethod:: DataFrame.to_json


@ -1,6 +1,6 @@
eland.DataFrame.to_numpy
========================
eland.DataFrame.to\_numpy
=========================
.. currentmodule:: eland
.. automethod:: DataFrame.to_numpy
.. automethod:: DataFrame.to_numpy


@ -1,6 +1,6 @@
eland.DataFrame.to_pandas
=========================
eland.DataFrame.to\_pandas
==========================
.. currentmodule:: eland
.. automethod:: DataFrame.to_pandas
.. automethod:: DataFrame.to_pandas


@ -1,6 +1,6 @@
eland.DataFrame.to_string
=========================
eland.DataFrame.to\_string
==========================
.. currentmodule:: eland
.. automethod:: DataFrame.to_string
.. automethod:: DataFrame.to_string


@ -1,6 +1,6 @@
eland.DataFrame.values
eland.DataFrame.values
======================
.. currentmodule:: eland
.. autoattribute:: DataFrame.values
.. autoproperty:: DataFrame.values


@ -1,6 +1,6 @@
eland.DataFrame.var
eland.DataFrame.var
===================
.. currentmodule:: eland
.. automethod:: DataFrame.var
.. automethod:: DataFrame.var


@ -1,6 +1,33 @@
eland.Index
eland.Index
===========
.. currentmodule:: eland
.. autoclass:: Index
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~Index.__init__
~Index.es_info
.. rubric:: Attributes
.. autosummary::
~Index.ID_INDEX_FIELD
~Index.ID_SORT_FIELD
~Index.es_index_field
~Index.is_source_field
~Index.sort_field


@ -1,6 +1,6 @@
eland.Series.add
eland.Series.add
================
.. currentmodule:: eland
.. automethod:: Series.add
.. automethod:: Series.add


@ -1,6 +1,6 @@
eland.Series.describe
eland.Series.describe
=====================
.. currentmodule:: eland
.. automethod:: Series.describe
.. automethod:: Series.describe


@ -1,6 +1,6 @@
eland.Series.div
eland.Series.div
================
.. currentmodule:: eland
.. automethod:: Series.div
.. automethod:: Series.div


@ -1,6 +1,6 @@
eland.Series.divide
eland.Series.divide
===================
.. currentmodule:: eland
.. automethod:: Series.divide
.. automethod:: Series.divide


@ -1,6 +1,6 @@
eland.Series.dtype
eland.Series.dtype
==================
.. currentmodule:: eland
.. autoattribute:: Series.dtype
.. autoproperty:: Series.dtype


@ -1,6 +1,6 @@
eland.Series.dtypes
eland.Series.dtypes
===================
.. currentmodule:: eland
.. autoattribute:: Series.dtypes
.. autoproperty:: Series.dtypes


@ -1,6 +1,6 @@
eland.Series.empty
eland.Series.empty
==================
.. currentmodule:: eland
.. autoattribute:: Series.empty
.. autoproperty:: Series.empty

Some files were not shown because too many files have changed in this diff.