Compare commits


403 Commits

Author SHA1 Message Date
Jan Calanog
cef4710695
docs-builder: add pull-requests: write permission to docs-build workflow (#800) 2025-06-23 15:39:36 +04:00
Quentin Pradet
44ead02b05
Fix lint (#798) 2025-06-05 15:52:19 +04:00
Miguel Grinberg
cb7c4fb122
Update README.md (#796)
Update Pandas support to include v2
2025-05-16 15:56:20 +01:00
Quentin Pradet
9e8f164677
Release 9.0.1 2025-04-30 17:25:32 +04:00
Quentin Pradet
3c3ffd7403
Forbid Elasticsearch 8 client or server (#780) 2025-04-30 16:25:33 +04:00
David Kyle
f5c2dcfc9d
Remove version checks in test (#792) 2025-04-30 16:24:05 +04:00
David Kyle
878cde6126
Upgrade PyTorch to 2.5.1 (#785)
PyTorch was upgraded to 2.5.1 in ml-cpp on the 8.18 and 9.0 branches in elastic/ml-cpp#2800
2025-04-30 10:57:45 +01:00
Mark J. Hoy
ec45c395fd
add 9.0.1 for LTR rescoring (#790) 2025-04-25 08:19:23 -04:00
Quentin Pradet
00dc55b3bd
Update instructions to run ML tests with Elasticsearch (#781)
* Update instructions to run ML tests with Elasticsearch

* Update CONTRIBUTING.md

Co-authored-by: David Kyle <david.kyle@elastic.co>

---------

Co-authored-by: David Kyle <david.kyle@elastic.co>
2025-04-24 15:42:00 +04:00
Quentin Pradet
8147eb517a
Allow lightgbm 4.6.0 (#782) 2025-04-24 15:40:39 +04:00
Quentin Pradet
4728d9b648
Run PyTorch tests on 3.12 too (#779)
PyTorch 2.3.1 does support Python 3.12.
2025-04-24 14:26:50 +04:00
Mark J. Hoy
51a2b9cc19
Add 9.1.0 Snapshot to Build and Fix test_ml_model Tests to Normalize Expected Scores if Min Score is Less Than Zero (#777)
* normalized expected scores if min is < 0

* only normalize scores for ES 8.19+ / 9.1+

* add 9.1.0 snapshot to build matrix

* get min score from booster trees

* removing typing on function definition

* properly flatten our tree leaf scores

* simplify getting min score

* debugging messages

* get all the matches in better way

* Fix model score normalization.

* lint

* lint again

* lint; correct return for bounds map/list

* revert to Aurélien's fix

* re-lint :/

---------

Co-authored-by: Aurelien FOUCRET <aurelien.foucret@elastic.co>
2025-04-23 15:53:32 +00:00
David Kyle
a9c36927f6
Fix tokeniser for DeBERTa models (#769) 2025-04-23 09:10:02 +01:00
Quentin Pradet
87380ef716
Release 9.0.0
Co-authored-by: Miguel Grinberg <miguel.grinberg@gmail.com>
2025-04-16 15:21:04 +04:00
Quentin Pradet
9ca76d7888
Revert "Release 8.18.0" (#774)
This reverts commit ced3cdfe32bd04e3d127b18f66f9b143b2956564.
2025-04-16 14:53:51 +04:00
Quentin Pradet
ced3cdfe32
Release 8.18.0 2025-04-15 20:52:30 +04:00
kosabogi
87379c53de
[DOCS] Clean up CLI examples in ML docs (#766)
* [DOCS] Clean up CLI examples in ML docs

* Fixes spaces

* Rebuild for testing copy-paste
2025-04-07 10:06:37 +02:00
Paulo
1ddae81769
Update the documentation to reflect the partial support of eland/scikit-learn (#768) 2025-04-03 15:56:23 +02:00
Colleen McGinnis
9302bef7db
remove unused substitutions (#763) 2025-03-21 09:24:09 -05:00
Colleen McGinnis
ca64672fd7
[docs] Migrate docs from AsciiDoc to Markdown (#762)
Co-authored-by: István Zoltán Szabó <szabosteve@gmail.com>
2025-02-26 17:48:16 +01:00
Colleen McGinnis
6692251d9e
add the new ci checks (#761) 2025-02-26 16:40:43 +01:00
David Kyle
ee4d701aa4
Upgrade transformers to 4.47 (#752)
The upgrade fixes a crash tracing the baai/bge-m3 model
2025-02-12 17:30:45 +00:00
Quentin Pradet
acdeeeded2
Allow nox 2025.02.09 (#754) 2025-02-12 16:33:59 +04:00
Quentin Pradet
8350f06ea8
Fix pipeline labels (#751) 2025-02-12 15:07:51 +04:00
Quentin Pradet
e846fb7697
Add backport action (#750) 2025-02-12 15:07:43 +04:00
Quentin Pradet
c4ac64e3a0
Allow scikit-learn 1.5 to address CVE-2024-5206 (#729) 2025-02-12 14:34:13 +04:00
Jan Calanog
214c4645e9
github-action: Add AsciiDoc freeze warning (#748)
* github-action: Add AsciiDoc freeze warning

* Update .github/workflows/comment-on-asciidoc-changes.yml
2025-02-12 07:45:07 +04:00
Quentin Pradet
871e52b37a
Pin nox to avoid session.env issue (#753) 2025-02-11 18:36:57 +04:00
Quentin Pradet
aa5196edee
Switch to black's 2025 code style (#749) 2025-02-11 14:57:16 +04:00
Bart Broere
75c57b0775
Support Pandas 2 (#742)
* Fix test setup to match pandas 2.0 demands

* Use the now deprecated _append method

(A better solution might exist.)

* Deal with numeric_only being removed in metrics test

* Skip mad metric for other pandas versions

* Account for differences between pandas versions in describe methods

* Run black

* Check Pandas version first

* Mirror behaviour of installed Pandas version when running value_counts

* Allow passing arguments to the individual asserters

* Fix for method _construct_axes_from_arguments no longer existing

* Skip mad metric if it does not exist

* Account for pandas 2.0 timestamp default behaviour

* Deal with empty vs other inferred data types

* Account for default datetime precision change

* Run Black

* Solution for differences in inferred_type only

* Fix csv and json issues

* Skip two doctests

* Passing a set as indexer is no longer allowed

* Don't validate output where it differs between Pandas versions in the environment

* Update test matrix and packaging metadata

* Update version of Python in the docs

* Update Python version in demo notebook

* Match noxfile

* Symmetry

* Fix trailing comma in JSON

* Revert some changes in setup.py to fix building the documentation

* Revert "Revert some changes in setup.py to fix building the documentation"

This reverts commit ea9879753129d8d8390b3cbbce57155a8b4fb346.

* Use PANDAS_VERSION from eland.common

* Still skip the doctest, but make the output pandas 2 instead of 1

* Still skip doctest, but switch to pandas 2 output

* Prepare for pandas 3

* Reference the right column

* Ignore output in tests but switch to pandas 2 output

* Add line comment about NBVAL_IGNORE_OUTPUT

* Restore missing line and add stderr cell

* Use non-private method instead

* Fix indentation and parameter issues

* If index is not specified, and pandas 1 is present, set it to True

From pandas 2 and upwards, index is set to None by default

* Run black

* Newer version of black might have different opinions?

* Add line comment

* Remove unused import

* Add reason for ignore statement

* Add reason for skip

---------

Co-authored-by: Quentin Pradet <quentin.pradet@elastic.co>
2025-02-04 17:43:43 +04:00
Valeriy Khakhutskyy
77589b26b8
Remove ML model export as sklearn Pipeline and clean up code (#744)
* Revert "[ML] Export ML model as sklearn Pipeline (#509)"

This reverts commit 0576114a1d886eafabca3191743a9bea9dc20b1a.

* Keep useful changes

* formatting

* Remove obsolete test matrix configuration and update version references in documentation and Noxfile

* formatting

---------

Co-authored-by: Quentin Pradet <quentin.pradet@elastic.co>
2025-02-04 11:36:50 +04:00
Bart Broere
9b5badb941
Drop Python 3.8 support and introduce Python 3.12 CI/CD (#743) 2025-01-22 21:55:57 +04:00
Quentin Pradet
f99adce23f
Build documentation using Docker again (#746) 2025-01-14 18:16:39 +04:00
Quentin Pradet
7774a506ae
Release 8.17.0 2025-01-07 10:58:59 +04:00
Dai Sugimori
82492fe771
Expansion support (#740) 2024-11-23 00:20:58 +09:00
Quentin Pradet
04102f2a4e
Release 8.16.0 2024-11-14 09:07:39 +04:00
Valeriy Khakhutskyy
9aec8fc751
Add deprecation warning for ESGradientBoostingModel subclasses (#738)
Introduce a warning indicating that exporting data frame analytics models as ESGradientBoostingModel subclasses is deprecated and will be removed in version 9.0.0.

The implementation of ESGradientBoostingModel relies on importing undocumented private classes that were changed in scikit-learn 1.4 by https://github.com/scikit-learn/scikit-learn/pull/26278. This dependency makes the code difficult to maintain, while the functionality is not widely used. Therefore, we will deprecate this functionality in 8.16 and remove it completely in 9.0.0.

---------

Co-authored-by: Quentin Pradet <quentin.pradet@elastic.co>
2024-11-11 14:26:11 +01:00
Quentin Pradet
79d9a6ae29
Release 8.15.4 2024-10-18 10:52:52 +04:00
Quentin Pradet
939f4d672c
Revert "Add feedback request to README" (#735) 2024-10-18 08:06:42 +04:00
Quentin Pradet
1312e96220
Revert "Allow reading Elasticsearch certs in Wolfi image" (#734)
This reverts commit 5dabe9c0996e62d8bf4b493dcea7d4bc161dead4.
2024-10-11 16:52:41 +04:00
Quentin Pradet
2916b51fa7
Release 8.15.3 2024-10-09 16:16:52 +04:00
Quentin Pradet
5dabe9c099
Allow reading Elasticsearch certs in Wolfi image (#732)
The config/certs directory of Elasticsearch is not readable by other
users and groups. This works in the public image, which uses the root
user, but not in the Wolfi image, which does not. Using the same user
ID fixes the problem.
2024-10-09 15:37:05 +04:00
Max Hniebergall
06b65e211e
Add support for DeBERTa-V2 tokenizer (#717) 2024-10-03 14:04:19 -04:00
Quentin Pradet
a45c7bc357
Release 8.15.2 2024-10-02 13:54:03 +04:00
Quentin Pradet
d1e533ffb9
Fix Docker image build on Linux (#728)
* Fix Docker image build on Linux

* Build Docker images in CI

* Fix bash syntax

* Only load, not push

* Parallelize docker build

It's currently the slowest step.

* Only build Linux images
2024-10-02 10:33:35 +04:00
Quentin Pradet
a83ce20fcc
Release 8.15.1 2024-10-01 15:31:24 +04:00
David Kyle
03af8a6319
Fix path in docker model upload example (#726) 2024-10-01 08:53:28 +01:00
David Kyle
5253501704
Upgrade PyTorch to version 2.3.1 (#718)
Upgrades the PyTorch, transformers and sentence transformer requirements.
Elasticsearch has upgraded PyTorch to 2.3.1 in 8.16 and 8.15.2. For
compatibility reasons, Eland will refuse to upload to an Elasticsearch cluster
that is using an earlier version of PyTorch.
2024-09-30 10:22:02 +01:00
David Kyle
ec66b5f320
Add ES 8.16 and 8.15.2 to test matrix (#725) 2024-09-27 13:37:31 +01:00
Quentin Pradet
64d05e4c68
Restore public Dockerfile (#722) 2024-09-25 12:49:46 +04:00
Quentin Pradet
f79180be42
Migrate to Wolfi base Docker image (#720) 2024-09-03 18:02:08 +04:00
Miguel Grinberg
0ce3db26e8
Release 8.15.0 (#715)
* Release 8.15.0

* update release notes
2024-08-13 09:47:48 +01:00
David Kyle
5a76f826df
Add note about using text_similarity for rerank to the CLI (#716) 2024-08-12 14:40:12 +01:00
David Kyle
fd8886da6a
Default truncation to second for the text similarity task type (#713)
In reranking, the first input (the query) is generally shorter. In this case
it makes more sense to truncate the second input (the document text).
2024-08-05 11:47:15 +01:00
Aurélien FOUCRET
bee6d0e1f7
Remove input fields from exported LTR models (#708) 2024-07-05 14:31:22 +02:00
Bart Broere
f18aa35e8e
Deal with the possibility of lists (#707) 2024-06-28 22:25:47 +04:00
Quentin Pradet
56a46d0f85
Rename Buildkite team from clients-team to devtools-team (#702) 2024-06-12 11:39:25 +04:00
Quentin Pradet
c497683064
Quote remaining eland[pytorch] for ZSH users (#701) 2024-06-10 16:50:03 +00:00
Quentin Pradet
0ddc21b895
Release 8.14.0 2024-06-10 15:56:43 +04:00
István Zoltán Szabó
5a3e7d78b3
[DOCS] Completes the list of available NLP task types. (#699) 2024-06-10 12:30:07 +02:00
Bart Broere
1014ecdb39
Fix non _source fields missing from the result hits (#693) 2024-06-10 11:09:52 +04:00
David Kyle
632074c0f0
Make eland_import_hub_model script compatible with serverless (#698)
Checks for build_flavor == serverless rather than a version
2024-06-07 14:46:12 +01:00
Bart Broere
35a96ab3f0
Fix missing method str.removeprefix in Python 3.8 (#695) 2024-05-24 10:25:04 +04:00
Quentin Pradet
116416b3e8
Stop duplicating requirements (#691) 2024-05-14 15:59:39 +04:00
Ashok Kumar
5b728c29c1
Replace check for Elasticsearch to str/list in ensure_es_client (#690) 2024-05-04 09:01:31 +04:00
Quentin Pradet
e76b32eee2
Release 8.13.1 2024-05-03 09:20:45 +04:00
Quentin Pradet
fd38e26df1
Support HTTP proxies in eland_import_hub_model (#688)
* Document TLS/SSL options for import script

* Mention --help option

* Add HTTP proxy support

* Mention HTTP_PROXY too

---------

Co-authored-by: David Kyle <david.kyle@elastic.co>
2024-05-02 21:03:44 +04:00
Quentin Pradet
f7f6e0aba9
Document TLS/SSL options for import script (#667) 2024-05-02 18:06:40 +04:00
Aurélien FOUCRET
9cea2385e6
Work around LTR model cache in tests (#685) 2024-04-08 14:00:36 +04:00
Quentin Pradet
1921792df8
Release 8.13.0 2024-03-27 18:18:21 +04:00
David Kyle
c16e36c051
Add Python 3.11 to support matrix (#681) 2024-03-27 10:34:35 +00:00
David Kyle
ae0bba34c6
Upgrade torch to 2.1.2 (#671)
Compatible with Elasticsearch 8.13 where the same upgrade has been made
2024-03-26 10:06:50 +00:00
Iulia Feroli
aaec995b1b
Update overview.asciidoc to replace tuple reference to API Key (#678) 2024-03-21 15:31:19 +04:00
Iulia Feroli
de83f3f905
Improve PyTorch installation instructions (#677) 2024-03-21 14:21:32 +04:00
David Kyle
8e8c49ddbf
Mute the Learning to Rank tests (#676) 2024-03-21 10:13:31 +00:00
David Kyle
5d34dc3cc4
Add override option to specify the model's max input size (#674)
If the max input size cannot be found in the configuration, the user
can specify it as a parameter to the eland_import_hub_model script.
2024-03-20 10:02:43 +00:00
Bart Broere
9b335315bb
Mirror pandas' to_csv lineterminator instead of line_terminator (#595)
* Mirror pandas' to_csv lineterminator instead of line_terminator

(even though it looks a little weird perhaps)

* Remove squeeze argument

* Revert "Merge branch 'remove-squeeze-argument' into patch-2"

This reverts commit 8b9ab5647e244d78ec3471b80ee7c42e019cf347.

* Don't remove the parameter yet since people might use it

* Add pending deprecation warning

---------

Co-authored-by: David Kyle <david.kyle@elastic.co>
2024-02-23 14:23:58 +04:00
Quentin Pradet
28eda95ba9
Add feedback request to README (#665) 2024-02-15 15:23:45 +04:00
Quentin Pradet
f4b30753ad
Fix CI badge in README (#664) 2024-02-15 15:14:16 +04:00
Bart Broere
33cf029efe
Implement eland.DataFrame.to_json (#661)
Co-authored-by: Quentin Pradet <quentin.pradet@elastic.co>
2024-02-15 11:32:54 +04:00
Aurélien FOUCRET
9d492b03aa
Release 8.12.1
Co-authored-by: Quentin Pradet <quentin.pradet@elastic.co>
2024-02-01 10:50:18 +04:00
Quentin Pradet
fd2ceab846
Run Buildkite docs jobs in pull requests from forks (#652) 2024-01-31 20:55:19 +04:00
Quentin Pradet
02190e74e7
Switch to 2024 black style (#657) 2024-01-31 14:47:19 +04:00
Aurélien FOUCRET
2a6a4b1f06
Fix missing value support for XGBRanker. (#654)
* Fix missing value support for XGBRanker.

* lint

* Sort expected scores

* lint
2024-01-23 18:42:24 +01:00
Quentin Pradet
1190364abb
Release 8.12.0 2024-01-19 12:42:45 +04:00
David Kyle
64216d44fb
Add prefix_string config option to the import model hub script (#642) 2024-01-19 12:06:57 +04:00
Liam Thompson
0a6e3db157
[DOCS] Make online retail notebook runnable in Colab (#641)
* Make online retail notebook runnable in Colab

* Fix broken query
2024-01-18 15:55:20 +04:00
Aurélien FOUCRET
5169cc926a
Improve LTR (#651)
* Ensure the feature logger is using NaN for non matching query feature extractors (consistent with ES).

* Default score is None instead of 0.

* LTR model import API improvements.

* Fix feature logger tests.

* Fix export in eland.ml.ltr

* Apply suggestions from code review

Co-authored-by: Adam Demjen <demjened@gmail.com>

* Fix supported models for LTR

---------

Co-authored-by: Adam Demjen <demjened@gmail.com>
2024-01-17 13:01:47 +04:00
Aurélien FOUCRET
d2291889f8
Fix typo (#650) 2024-01-12 09:34:09 -05:00
Aurélien FOUCRET
d3ed669a5e
LTR feature logger (#648) 2024-01-12 13:52:04 +01:00
Adam Demjen
926f0b9b5c
Add XGBRanker and transformer (#649)
* Add XGBRanker and transformer

* Map XGBoostRegressorTransformer to XGBRanker

* Add unit tests

* Remove unused import

* Revert addition of type

* Update function comment

* Distinguish objective based on model class
2024-01-11 15:48:13 -05:00
Adam Demjen
840871f9d9
Accept LTR inference config when creating model (#645)
* Support for supplying inference_config

* Fix linting errors

* Add unit test

* Add LTR type, throw exception on predict, refine test

* Add search step to LTR test

* Fix linter errors

* Update rescoring assertion in test + type defs

* Fix linting error

* Remove failing assertion
2024-01-08 09:19:03 -05:00
Aurélien FOUCRET
05c5859b8a
Adding a new movie dataset to the tests. (#646) 2024-01-04 16:14:56 +01:00
Aurélien FOUCRET
0f91224daf
Add 8.12 to CI and remove 8.10 (#647) 2024-01-04 10:06:19 -05:00
Bart Broere
927acc86ad
Small cosmetic fix to the docs (#640) 2023-11-30 08:34:59 +01:00
David Kyle
6ef418f465
Release 8.11.1 2023-11-22 11:55:53 +01:00
David Kyle
081250cdec
Fix failed import of ST RoBERTa models (#637)
Fixes an error uploading the sentence-transformers/all-distilroberta-v1 model
which failed with "missing 2 required positional arguments: 'token_type_ids'
and 'position_ids'". The cause was that the tokenizer type was not recognised
due to a typo.
2023-11-21 12:53:43 +00:00
Quentin Pradet
af26897313
Bump numpy and shap (#636) 2023-11-21 13:17:53 +01:00
David Kyle
add61a69ec
Update CI machine types to N2 (#634)
Use `n2-standard-2` for lint and doc builds
Use `n2-standard-4` for tests
2023-11-21 11:33:04 +00:00
David Kyle
b689759278
Skip model config tests (#635)
For #633
2023-11-21 11:07:55 +00:00
Liam Thompson
87d18bd850
Fix colab link (#632)
Co-authored-by: Quentin Pradet <quentin.pradet@elastic.co>
2023-11-16 10:24:06 +00:00
Quentin Pradet
dfc522eb31
Allow es-doc members to trigger CI (#631) 2023-11-13 11:55:39 +01:00
Liam Thompson
508de981ff
Make demo notebook runnable in Colab (#630)
* Make demo notebook runnable in Colab

* Index using IDs starting from 0

* Trivial change to trigger CI
2023-11-10 08:44:19 +01:00
Quentin Pradet
41db37246f
Release 8.11.0 2023-11-08 11:51:14 +01:00
Valeriy Khakhutskyy
6cecb454e3
[ML] Better memory estimation for NLP models (#568)
This PR adds the ability to estimate per-deployment and per-allocation memory usage of NLP transformer models. It uses torch.profiler and logs the peak memory usage during inference.

This information is then used in Elasticsearch to provision models with sufficient memory (elastic/elasticsearch#98874).
2023-11-06 12:18:20 +01:00
Bart Broere
28e6d92430
Stream writes in to_csv()
Co-authored-by: P. Sai Vinay <pvinay1998@gmail.com>
2023-11-06 11:39:31 +01:00
Quentin Pradet
adf0535608
Fix docs build
Some dependencies like numpy are pinned to versions that do not support
Python 3.12. Python 3.10 is the latest version supported by Eland.
2023-11-06 13:25:30 +04:00
Bart Broere
5e5f36bdf8
Deal with the mad aggregation being removed in Pandas 2 (#602) 2023-11-06 06:12:16 +01:00
David Kyle
5b3a83e7f2
[NLP] Support E5 small multi-lingual (#625)
Although E5 small is a BERT-based model, it takes 2 parameters to forward,
not 4. Use the tokenizer type to decide the number of parameters.
2023-10-31 17:49:43 +00:00
David Kyle
ab6e44f430
[NLP] Tests for NLP model configurations (#623)
Add tests for generated Elasticsearch model configurations
2023-10-19 12:39:57 +01:00
Quentin Pradet
0c0a8ab19f
Bump tested stack versions (#621) 2023-10-11 19:48:47 +02:00
Bart Broere
36b941e336
Use _append instead of append since it's still available after pandas 2.0 (#603) 2023-10-11 15:41:05 +01:00
Quentin Pradet
6a4fd511cc
Release 8.10.1 (#620) 2023-10-11 12:56:24 +02:00
Quentin Pradet
c6ce4b2c46
Fix direct usage of TransformerModel (#619) 2023-10-11 11:56:14 +02:00
Bart Broere
48e290a927
Prepare for deprecation of is_datetime_or_timedelta_dtype in Pandas 2.0 (#592) 2023-10-10 19:37:13 +01:00
Quentin Pradet
bb0c111a68
Release Eland 8.10.0 2023-10-09 11:49:12 +02:00
Quentin Pradet
9273636026
Reduce Docker image size and support arm64 (#615)
Co-authored-by: David Olaru <dolaru@elastic.co>

* Reduce Docker image size from 4.8GB to 2.2GB

* Use torch+cpu variant if target platform is linux/amd64

Avoids downloading large & unnecessary NVIDIA deps defined in the package on PyPI

* Build linux/arm64 image using buildx and QEMU
2023-10-05 18:43:52 +04:00
Quentin Pradet
b8a7b60c03
Stop mentioning Python 3.7 and Pandas 1.13 are supported (#612) 2023-10-04 10:56:51 +02:00
Quentin Pradet
3be610b6fc
Recommend using pre-built Docker image (#614)
* Recommend using pre-built Docker image

* Update README.md

Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>

---------

Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
2023-10-03 19:40:24 +02:00
Quentin Pradet
352e31ed14
Add Buildkite pipeline to push Docker image (#613)
* Add Buildkite pipeline to push Docker image

* Fix lint

* Fix Read the Docs build

* Replace distutils with packaging
2023-10-03 14:39:54 +02:00
Quentin Pradet
9d7c042bdb
Bump transformers to fix private model support (#611) 2023-09-26 14:54:23 +02:00
Enrico Zimuel
235c490e0c
Updated bullseye docker image (#610) 2023-09-26 09:53:24 +02:00
Bart Broere
3908f43905
Remove deprecated check_less_precise (#596) 2023-09-26 07:34:52 +02:00
Quentin Pradet
566bb9e990
Allow importing private HuggingFace models (#608) 2023-09-25 15:10:58 +02:00
Quentin Pradet
5ec760635b
Recommend installing Eland in a virtual environment (#606) 2023-09-22 13:14:05 +02:00
Jonathan Buttner
a8b76c390f
Setting chunk size to 1mb (#605) 2023-09-20 11:40:11 -04:00
Bart Broere
12200039f5
Fix iteritems deprecation (#593) 2023-09-19 12:00:32 +02:00
David Kyle
301cda8d69
Error measuring embedding size for some DPR models (#573)
Fixes an error unpacking a tuple that contains a single element.
2023-09-19 10:44:15 +01:00
Bart Broere
5c5ef63a69
Use the workaround if we can't determine the server's version (#581) 2023-09-15 15:29:36 +04:00
Quentin Pradet
eb69496627
Add dummy pipeline to prepare publishing a Docker image (#590) 2023-09-06 07:12:06 +02:00
Quentin Pradet
64ffbcec0f
Revert "Update Docker image to Debian 12 Bookworm (#586)" (#588) 2023-09-05 12:36:42 +04:00
Quentin Pradet
4d2c6e2f4d
Fix Buildkite builds on pull requests (#589) 2023-09-05 12:20:24 +04:00
Quentin Pradet
ea4c2d1251
Fix downloads badge URL (#587) 2023-09-05 11:57:36 +04:00
Quentin Pradet
c7a58e3783
Fix README so that copy/pastes work without warnings (#584) 2023-09-05 11:56:25 +04:00
Quentin Pradet
0be509730a
Update Docker image to Debian 12 Bookworm (#586) 2023-09-04 19:28:38 +04:00
David Kyle
95864a9ace
Update README.md with note about installing extras for NLP (#582) 2023-08-31 10:34:36 +01:00
Enrico Zimuel
f14bbaf4b0
Added build and twine to requirements-dev 2023-08-24 16:02:12 +02:00
Enrico Zimuel
ac8c7c341e
Readded author info 2023-08-24 11:18:17 +02:00
Enrico Zimuel
2304fdc593
Updated docs 2023-08-24 11:12:30 +02:00
Enrico Zimuel
ebdebdf16f
Prep for 8.9.0 release 2023-08-24 11:11:48 +02:00
Enrico Zimuel
932092c0e5
Fixed test for mean using ES 8.9.0 2023-08-24 10:46:14 +02:00
Enrico Zimuel
08b7fac32b
Updated test to ES 8.9-SNAPSHOT 2023-08-23 13:53:15 +02:00
Enrico Zimuel
bb59a4f8d6
Fixed conf test with isinstance 2023-08-22 13:23:23 +02:00
Josh Devins
f26fb8a430
Simplify embedding model support and loading (#569)
We were attempting to load SentenceTransformers by looking at the model
prefix; however, SentenceTransformers can also be loaded from other
orgs in the model hub, as well as from local disk. This prefix checking
failed in those two cases. To simplify the loading logic and deciding
which wrapper to use, we’ve removed support for text_embedding tasks to
load a plain Transformer. We now only support DPR embedding models and
SentenceTransformer embedding models. If you try to load a plain
Transformer model, it will be loaded by SentenceTransformers and a mean
pooling layer will automatically be added by the SentenceTransformer
library. Since we no longer automatically support non-DPR and
non-SentenceTransformers models, we should include example code somewhere
showing how to load a custom model without DPR or SentenceTransformers.

See: https://github.com/UKPLab/sentence-transformers/blob/v2.2.2/sentence_transformers/SentenceTransformer.py#L801

Resolves #531
2023-07-31 18:18:46 +02:00
Fernando Briano
7ad1f430e4
[CI] Adds buildkite pull requests configuration (#570) 2023-07-26 13:43:40 +01:00
Youhei Sakurai
4cf92fd9b7
Make eland_import_hub_model easier to find on Windows. (#559) 2023-07-20 09:24:35 +01:00
Fernando Briano
664180d93d
[CI] Removes Jenkins .ci folder (#561)
Continuing the migration to Buildkite.
2023-07-18 13:32:30 +01:00
Fernando Briano
2134c71ab4
Add Buildkite configuration (#515)
* [CI] Adds Buildkite configuration
* Removes GitHub Actions
* Moves lint and docs tasks to Buildkite
2023-07-17 14:08:41 +01:00
Youhei Sakurai
b5bcba713d
Apply black to comply with the code style (#557)
Relates https://github.com/elastic/eland/pull/552

**Issue**:

```console
C:\Users\YouheiSakurai\git\myeland>python -m black --version
python -m black, 23.3.0 (compiled: yes)
Python (CPython) 3.11.0

C:\Users\YouheiSakurai\git\myeland>python -m black --check --target-version=py38 bin\eland_import_hub_model
would reformat bin\eland_import_hub_model

Oh no! 💥 💔 💥
1 file would be reformatted.
```

**Solution**:
```console
C:\Users\YouheiSakurai\git\myeland>python -m black --target-version=py38 bin\eland_import_hub_model
reformatted bin\eland_import_hub_model

All done! ✨ 🍰 ✨
1 file reformatted.
```
2023-07-13 09:55:00 +02:00
Valeriy Khakhutskyy
77781b90ff
[ML] Update trained model inference endpoint (#556)
The infer trained model deployment API has been deprecated, so I changed the code to use the new one.
2023-07-11 10:55:11 +02:00
Valeriy Khakhutskyy
f38de0ed05
Fix failing unit tests (#558)
I updated the tree serialization format for the new scikit-learn versions. I also updated the minimum requirement of scikit-learn to 1.3 to ensure compatibility.

Fixes #555
2023-07-10 15:15:58 +02:00
Youhei Sakurai
5ac8a053f0
Fix No module named 'torch' (#553)
Do not import torch unless necessary
2023-07-07 09:11:11 +01:00
Youhei Sakurai
55967a7324
Minimize if main section (#554)
For migration from scripts to console_scripts in setup.py,
the current long if __name__ == "__main__": section is a
blocker because console_scripts requires a function to be
specified as the entry point.
Move the logic into a main() function.
2023-07-05 10:49:16 +01:00
Dai Sugimori
bf3b092ed4
Add BertJapaneseTokenizer support with bert_ja tokenization configuration (#534)
See elasticsearch#95546
2023-06-23 08:14:27 +01:00
Seth Michael Larson
5fd1221815
Fix autosummary directive by removing hack autosummaries 2023-06-15 10:50:19 -05:00
Seth Michael Larson
17c1c2e9c7
Switch to the 'Furo' Sphinx theme 2023-06-15 09:51:14 -05:00
Benjamin Trent
8b327f60b8
[ML] add ability to upload xlm-roberta tokenized models (#518)
This allows XLMRoberta models to be uploaded to Elasticsearch.

blocked by: elastic/elasticsearch#94089
2023-06-14 07:59:28 -04:00
David Kyle
68a22a8001
Default the optional es_version parameter (#545) 2023-06-07 12:34:53 +01:00
Seth Michael Larson
afc7e41d6e
Update Dockerfile base image to use newer version 2023-06-02 14:20:01 -05:00
David Kyle
32ab988eb6
Tolerate different model output formats when measuring embedding size (#535)
Only add the embedding_size config option if the target Elasticsearch 
cluster version supports it
2023-05-25 12:25:31 -05:00
David Kyle
7ca8376f68
Add Elasticsearch 8.8 snapshot to test matrix (#543)
And increase the test ES node heap size to prevent circuit 
breaker exceptions due to better memory accounting in
elastic/elasticsearch#89437.
2023-05-24 11:59:41 +01:00
István Zoltán Szabó
e0c08e42a0
[DOCS] Adds instructions on model install in air-gapped env (#542)
Co-authored-by: David Kyle <david.kyle@elastic.co>
2023-05-24 12:53:04 +02:00
David Kyle
1e6f48f8f4
Generate valid NLP model id from file path (#541)
The eland_import_hub_model script supports uploading a local file where
the --hub-model-id argument is a file path. If the --es-model-id option is
not used, the model ID is generated from the hub model ID, and when that
is a file path, the path must be converted to a valid Elasticsearch model ID.
2023-05-22 15:37:36 +01:00
David Kyle
7820a31256
Limit NumPy to a range of versions and note why (#540) 2023-05-22 10:47:06 +01:00
David Kyle
36bbbe0bdb
Upgrade torch to 1.13.1 and check the cluster version before uploading an NLP model. (#522)
PyTorch models traced in version 1.13 of PyTorch cannot be evaluated in
version 1.9 or earlier. With this upgrade, Eland becomes incompatible with
pre-8.7 Elasticsearch and will refuse to upload a model to the cluster.
In this scenario, either upgrade Elasticsearch or use an earlier version of Eland.
2023-05-19 16:29:38 +01:00
David Kyle
b507bb6d6c
Restrict NumPy and Pandas versions (#539)
Shap is incompatible with NumPy 1.24 due to a deprecated usage becoming
an error. There is no fix in Shap yet, so an earlier version of NumPy must
be used.
Pandas 2.0 was recently released; we will continue to use the latest 1.5 release
to avoid any incompatibilities.
2023-05-19 16:04:33 +01:00
Seth Michael Larson
f7ea3bd476
Add a compatibility layer for Elasticsearch server 8.5.0 field_caps API 2023-05-02 15:40:20 -05:00
Seth Michael Larson
ca0cbe94ea
Fix readthedocs with Python 3.8 2023-05-02 12:21:57 -05:00
David Kyle
50d301f7cb
Set embedding_size config parameter for Text Embedding models (#532) 2023-04-25 11:41:14 +01:00
David Kyle
940f2a9bad
[NLP] Add support for the pass_through task #526 2023-04-06 15:43:00 +01:00
David Kyle
8e0d897171
[NLP] Prevent TypeError with None check (#525) 2023-04-03 14:56:19 +01:00
David Roberts
cebee6406f
Include pitfall of --start in the README (#506)
Users who follow the Eland README as a guide to importing
models can easily end up seeing inexplicably poor performance
due to unknowingly running the model with one allocation and
one thread per allocation.

This change spells out the effect of `--start` and links to
alternatives that allow better use of available hardware.

Co-authored-by: David Kyle <david.kyle@elastic.co>
2023-03-30 20:28:48 +01:00
Seth Michael Larson
44e04b4905
Release v8.7.0 2023-03-30 14:00:02 -05:00
David Kyle
7f4687c791
[ML] Text expansion model config support (#520) 2023-03-08 15:40:14 +00:00
Benjamin Trent
d5578637cb
Choose text_embedding from auto when task type is unknown but it's a sentence-transformers model (#516)
closes https://github.com/elastic/eland/issues/514
2023-02-09 12:50:30 -05:00
Valeriy Khakhutskyy
0576114a1d
[ML] Export ML model as sklearn Pipeline (#509)
Closes #503

Note: I also had to fix the Sphinx version to 5.3.0 since, starting from 6.0, Sphinx suffers from a TypeError bug, which causes a CI failure.
2023-02-01 16:17:06 +01:00
Valeriy Khakhutskyy
2ea96322b3
Update to latest ES versions and fix unit tests (#512)
Update the test matrix to the latest Elasticsearch versions and fix the broken unit tests on the CI.
2023-01-31 20:55:29 +01:00
David Kyle
c55516f376
Fixes for two type hinting issues 2023-01-04 09:53:09 -06:00
David Kyle
211cc2c83f
Handle OSError for missing LightGBM dependency
Co-authored-by: Seth Michael Larson <seth.larson@elastic.co>
2022-11-02 11:32:27 -05:00
Benjamin Trent
82e34dbddb
Minor formatting fix for ML docs 2022-10-20 09:47:55 -05:00
Benjamin Trent
a8c8726634
[ML] add text_similarity task support (#486)
Adds text_similarity task support. This is a cross-encoder transformer task where both sequences are given to the transformer at once.

According to 🤗 (or at least as far as the cross-encoder models are concerned), this is a sequence classification task with just one classification "label". But really, it isn't labeled at all and is more akin to a regression model.

related: elastic/elasticsearch#88439
2022-08-01 09:04:34 -04:00
Benjamin Trent
11ea68a443
Add docker steps for eland model upload (#489) 2022-07-21 15:27:19 -04:00
István Zoltán Szabó
fbb01e5698
[DOCS] Adds important note about PyTorch version compatibility. (#487) 2022-07-13 12:41:35 +02:00
Seth Michael Larson
c97e69410d
Release v8.3.0 2022-07-11 13:14:13 -05:00
David Kyle
0eb36faa5b
Restrict PyTorch version not to be more advanced than that used in Elasticsearch (#479)
Elasticsearch uses v1.11 of PyTorch. Models created with the latest PyTorch 
release (v1.12) are not compatible with v1.11. This pins the PyTorch version
to 1.11 to prevent the incompatibility. The version of the Elasticsearch Python
client is now required to be >= the Eland version.

All users of Eland for importing NLP models should upgrade.
2022-07-07 14:56:42 +01:00
Benjamin Trent
947d4d22a9
Update python example (#477) 2022-06-28 13:01:49 -04:00
David Kyle
23706e05b8
Add more exclusions to the dockerignore file 2022-06-28 10:34:02 -05:00
Benjamin Trent
8892f4fd64
[ML] adds new auto task type that attempts to automatically determine NLP task type from model config (#475)
For many model types, we don't need to require the task type to be specified. We can infer the task type based on the model configuration and architecture.

This commit makes the `task-type` parameter optional for the model upload script and adds logic for auto-detecting the task type based on the 🤗 model.
2022-06-23 08:32:23 -04:00
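Auto-detection of this kind can key off the `architectures` list in a Hugging Face `config.json`. The sketch below is hypothetical: the suffix table and the `infer_task_type` helper are illustrative assumptions, not eland's actual mapping or code.

```python
from typing import Any, Dict, List

# Hypothetical suffix-to-task table, for illustration only; eland's real
# detection logic and its task names may differ.
ARCHITECTURE_SUFFIX_TO_TASK: Dict[str, str] = {
    "ForTokenClassification": "ner",
    "ForQuestionAnswering": "question_answering",
    "ForSequenceClassification": "text_classification",
    "ForMaskedLM": "fill_mask",
}


def infer_task_type(config: Dict[str, Any]) -> str:
    """Guess an NLP task type from a Hugging Face config dict."""
    architectures: List[str] = config.get("architectures") or []
    for architecture in architectures:
        for suffix, task in ARCHITECTURE_SUFFIX_TO_TASK.items():
            if architecture.endswith(suffix):
                return task
    # No recognizable model head: the caller must pass the task type explicitly.
    raise ValueError(f"Cannot infer task type from architectures {architectures!r}")


print(infer_task_type({"architectures": ["BertForTokenClassification"]}))  # ner
```

A suffix table stays readable, but some heads are ambiguous (a `ForSequenceClassification` model might be meant for zero-shot use), which is why an explicit task type remains available.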
David Kyle
8448b3ba4e
Bump minimum PyTorch version to 1.11 2022-06-21 07:43:43 -05:00
David Kyle
081c8efaa0
Freeze the traced PyTorch model 2022-06-21 07:43:18 -05:00
Benjamin Trent
ec041ffdfd
[ML] ensure quantization is applied (#472) 2022-06-15 09:23:24 -04:00
Lisa Cawley
07af00c741
[DOCS] Include missing attributes (#468)
Co-authored-by: Seth Michael Larson <seth.larson@elastic.co>
2022-05-31 15:50:11 -07:00
Seth Michael Larson
bbe7a70cb9 Also pin traitlets 2022-05-31 14:28:36 -07:00
Seth Michael Larson
14821a8b09 Remove 'numpydoc' to stop reformatting 2022-05-31 14:28:36 -07:00
Seth Michael Larson
673065ee42 Stop explicitly pulling master 2022-05-31 14:28:36 -07:00
Lisa Cawley
845c055d7c
[DOCS] Adds question_answering task type for eland_import_hub_model 2022-05-31 14:37:51 -05:00
Nigel Small
a4838f4d22
Ignore type checking for agg_value 2022-05-31 09:23:15 -05:00
Lisa Cawley
09dd56c399
Add authentication methods for import model script (#466) 2022-05-18 07:44:37 -07:00
Benjamin Trent
fa30246937
[ML] fixes decision tree classifier upload to account for probabilities (#465)
This switches our sklearn.DecisionTreeClassifier serialization logic to account for multi-valued leaves in the tree.

The key difference between our inference and DecisionTreeClassifier is that we run a softmax over the leaf values, whereas sklearn simply normalizes them.

This means that the "probabilities" we return will differ from sklearn's.
2022-05-17 08:11:20 -04:00
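To see why the two approaches disagree, apply both to the same leaf values. This is a minimal sketch; the leaf values are made up and the helper names are not eland or sklearn functions.

```python
import math
from typing import List


def normalize(values: List[float]) -> List[float]:
    """Divide each value by the sum (the sklearn behaviour described above)."""
    total = sum(values)
    return [v / total for v in values]


def softmax(values: List[float]) -> List[float]:
    """Numerically stable softmax (the Elasticsearch behaviour described above)."""
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


leaf_values = [3.0, 1.0]  # hypothetical per-class values stored at one leaf
print(normalize(leaf_values))  # [0.75, 0.25]
print(softmax(leaf_values))    # first class gets about 0.88
```

Both outputs sum to 1 and preserve the ranking of the classes, so the predicted class is the same; only the reported probabilities differ.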
Seth Michael Larson
5bbb8e484a Release 8.2.0 2022-05-11 06:38:21 -05:00
Benjamin Trent
650e02d16e
[ML] improve general pytorch model import and add tests (#463)
This improves the user consumed functions and classes for PyTorch NLP model upload to Elasticsearch.

Previously it was difficult to wrap your own module for uploading to Elasticsearch.

This commit splits some classes out, adds new ones, and adds tests showing how to wrap some simple modules.
2022-05-05 10:50:53 -04:00
Benjamin Trent
70fadc9986
[ML] add support for question_answering NLP tasks (#457)
Adds support for `question_answering` NLP models within the pytorch model uploader.

Related: https://github.com/elastic/elasticsearch/pull/85958
2022-05-04 13:15:33 -04:00
Benjamin Trent
afe08f8107
[ML] Improve NLP model import by using nicely defined types (#459)
This adds some more definite types for our NLP tasks and tokenization configurations.

This is the first step in allowing users to more easily import their own transformer models via something other than Hugging Face.
2022-05-03 15:19:03 -04:00
David Olaru
3255f55d71 Fix --es-api-key argument help text 2022-04-27 15:48:22 -05:00
David Olaru
492bb9683a Add support for Cloud ID to hub model import script
The Cloud ID simplifies sending data to a cluster on Elastic Cloud.

With this change, the user has the option to specify a Cloud ID using the `--cloud-id` argument as an alternative to an Elasticsearch URL (`--url` argument).

`--cloud-id` and `--url` are mutually exclusive arguments.
2022-04-27 15:48:22 -05:00
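Mutually exclusive flags like these map directly onto `argparse`. The sketch below is hypothetical: it shows only these two options and assumes exactly one of them is required, which may not match the real script's defaults.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical subset of the script's options, for illustration only.
    parser = argparse.ArgumentParser(prog="eland_import_hub_model")
    location = parser.add_mutually_exclusive_group(required=True)
    location.add_argument("--url", help="Elasticsearch URL")
    location.add_argument("--cloud-id", help="Elastic Cloud ID")
    return parser


args = build_parser().parse_args(["--cloud-id", "my-deployment:abc123"])
print(args.cloud_id)  # my-deployment:abc123
```

Passing both flags makes `argparse` print a usage error and exit with `SystemExit`, so the script never has to decide which location wins.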
David Olaru
fe3422100c
Hub model import script improvements (#461)
## Changes 
### Better logging
Switched from `print` statements to `logging` for a cleaner and more informative output - timestamps and log level are shown. The logging is now a bit more verbose, but it will help users to better understand what the script is doing.

### Add support for ES authentication using username/password or api key
Instead of being limited to passing credentials in the URL, there are now 2 additional methods:
- username/password using `--es-username` and `--es-password`
- API key using `--es-api-key`

Credentials can also be specified as environment variables with `ES_USERNAME`/`ES_PASSWORD` or `ES_API_KEY`

### Graceful handling of missing PyTorch requirements
In order to use the `eland_import_hub_model` script, PyTorch extras are required to be installed. If the user does not have the required packages installed, a helpful message is logged with a hint to install `eland[pytorch]` with `pip`.

### Graceful handling of already existing trained model
If a trained model with the same ID as the one we're trying to import already exists, and `--clear-previous` was not specified, we now log a clearer message about why the script can't proceed along with a hint to use the `--clear-previous` flag. 

Prior to this change, we were letting the API exception seep through and the user was faced with a stack trace.

### `tqdm` added to main dependencies
If the user doesn't have `eland[pytorch]` extras installed, the first module to be reported as missing is `tqdm`. Since this module is [used in eland codebase](8294224e34/eland/ml/pytorch/_pytorch_model.py (L24)) directly, it makes sense to me to have it as part of the main set of requirements.

### Nit: Set tqdm unit to `parts` in `_pytorch_model.put_model`
The default unit is `it`, but `parts` better describes what the progress bar is tracking - uploading trained model definition parts.
2022-04-27 15:13:58 +01:00
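Credential resolution with an environment-variable fallback can be sketched as below. The `resolve_auth` helper is hypothetical, and preferring an API key over username/password is an assumption. The returned keys match the `api_key` and `basic_auth` keyword arguments of the 8.x `elasticsearch.Elasticsearch` client.

```python
import os
from typing import Any, Dict, Mapping, Optional


def resolve_auth(
    username: Optional[str] = None,
    password: Optional[str] = None,
    api_key: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> Dict[str, Any]:
    """Build auth kwargs for elasticsearch.Elasticsearch from flags or env vars."""
    # Command-line values win; otherwise fall back to the environment variables.
    username = username or env.get("ES_USERNAME")
    password = password or env.get("ES_PASSWORD")
    api_key = api_key or env.get("ES_API_KEY")
    # Assumption: an API key takes precedence over username/password.
    if api_key:
        return {"api_key": api_key}
    if username and password:
        return {"basic_auth": (username, password)}
    # No credentials: rely on any embedded in the URL, or connect anonymously.
    return {}
```

Usage would look like `Elasticsearch(url, **resolve_auth(args.es_username, args.es_password, args.es_api_key))`, where the `args` attribute names are hypothetical.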
David Olaru
b5ea1cf228
Align dependencies between requirement files and setup.py (#460) 2022-04-27 07:14:49 -05:00
Benjamin Trent
8294224e34
[ML] Fix XGBoost model import for xgboost>=1.6 2022-04-20 09:20:50 -05:00
Seth Michael Larson
cb839a9ac9
Release 8.1.0 2022-03-31 17:12:26 -05:00
P. Sai Vinay
76a52b7947
Add support for eland.Series.unique() 2022-03-31 08:33:15 -05:00
Benjamin Trent
15a3007288
[ML] add roberta bart transformer upload support (#443)
Related to: https://github.com/elastic/elasticsearch/pull/84777

This allows BART and RoBERTa models to be uploaded to Elasticsearch for our currently defined NLP tasks.
2022-03-14 12:26:12 -04:00
David Kyle
5678525b15
Fix mypy type errors for elasticsearch-python v8.0.0 2022-03-08 17:50:39 -06:00
David Kyle
5c5e5af54d
Add --ca-certs and --insecure option for configuring TLS 2022-03-08 15:44:13 -06:00
Seth Michael Larson
abd05df50b
Release 8.0.0 2022-02-10 14:29:54 -06:00
Ashton Sidhu
e3bff8a623
Add option to disable schema enforcement for pandas_to_eland 2022-01-14 07:35:58 -06:00
István Zoltán Szabó
9206941659
[DOCS] Adds NLP with PyTorch section to ML-related page in Eland docs 2022-01-11 09:08:00 -06:00
Benjamin Trent
72856e2c3f
[ML] Add support for MPNet PyTorch models 2022-01-10 11:21:30 -06:00
Ashton Sidhu
64daa07a65
Using the 'date' field for datetime64+timezone columns 2022-01-04 22:03:49 -06:00
Florian Winkler
3db93cd789
Allow using datetime types in filters 2022-01-04 14:46:18 -06:00
Seth Michael Larson
c14bc24032
Release 8.0.0-beta1 2021-12-16 07:42:38 -06:00
Seth Michael Larson
ffe7c792dc
Update Notebook examples for 8.0 2021-12-15 16:01:32 -06:00
Seth Michael Larson
cd0897f5d7
Add a warning when connecting to incompatible Elasticsearch versions 2021-12-15 14:08:20 -06:00
Seth Michael Larson
109387184a
Support the v8.0 Elasticsearch client 2021-12-09 15:01:26 -06:00
Josh Devins
1ffbe002c4
Upgrade PyTorch dependencies to latest
In preparation for an 8.0 release, this updates PyTorch NLP dependencies
to more recent and latest minor versions. Amongst other things, this
introduces a fix from transformers that is helpful for text embedding
tasks with certain DPR models.

See: https://github.com/huggingface/transformers/issues/13670

Co-authored-by: Seth Michael Larson <seth.larson@elastic.co>
2021-12-06 09:05:54 -06:00
Seth Michael Larson
e6bb917d83
Add quotes to versions in test-matrix.yml 2021-12-03 09:37:37 -06:00
Seth Michael Larson
4e489de424
Bump version to 8.0.0 2021-12-02 08:41:11 -06:00
Seth Michael Larson
f98ebd4c29
Update Jenkins jobs for 8.x and 7.x 2021-12-01 14:01:48 -06:00
Josh Devins
5bc1a824a7
Add PyTorch modules to noxfile
We added the `pytorch` module which is type checked but was not in the
noxfile as such. This change also addresses type errors that arose after
adding type checking.
2021-11-29 08:03:25 -08:00
Josh Devins
7209f61773
Adds max_length padding to transformer tracing (#411)
The padding parameter needs to be set on the tokenization call and not
in the constructor. Furthermore, the True value will only pad to the
largest input in a batch, however we don't trace with batches so this
value had no effect. The proper place to pass this parameter is in the
tokenization call itself and the proper value to use is "max_length"
which will pad the input to the maximum input size specified by the
model. Although we measure no functional or performance impact of this
setting, it has been suggested that this is a best practice.

See: https://huggingface.co/transformers/serialization.html#dummy-inputs-and-standard-lengths
2021-11-11 13:18:55 +01:00
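The difference between the two padding values is easiest to see on a batch of one, which is what tracing uses. This pure-Python sketch only mimics the semantics of `padding=True` ("longest") versus `padding="max_length"` in a Hugging Face tokenizer call. The `pad` helper and the token IDs are illustrative, not the transformers implementation.

```python
from typing import List


def pad(batch: List[List[int]], strategy: str, max_length: int, pad_id: int = 0) -> List[List[int]]:
    """Pad token-id sequences with "longest" or "max_length" semantics."""
    if strategy == "longest":
        target = max(len(seq) for seq in batch)
    elif strategy == "max_length":
        target = max_length
    else:
        raise ValueError(f"unknown strategy {strategy!r}")
    # This sketch does not truncate: sequences longer than target are kept as-is.
    return [seq + [pad_id] * (target - len(seq)) for seq in batch]


single = [[101, 7592, 102]]  # one traced example, i.e. a "batch" of size 1
print(pad(single, "longest", max_length=8))     # [[101, 7592, 102]]
print(pad(single, "max_length", max_length=8))  # [[101, 7592, 102, 0, 0, 0, 0, 0]]
```

With a single input, "longest" is a no-op, which is why the original `True` setting had no effect on tracing.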
Benjamin Trent
a3b0907c5b
[ML] Add inference results tests for PyTorch transformer models 2021-11-10 06:50:10 -06:00
Seth Michael Larson
66e3e4eaad
Set 'script.max_compilations_rate: use-context' 2021-11-02 10:09:25 -04:00
Josh Devins
1e5b475bee
Adds NLP with PyTorch basic example to README
The Machine Learning section now has two sub-sections — one for
traditional regression/classification and the other for NLP with
PyTorch. The examples show two ways to upload models from the Hugging
Face model hub.
2021-11-02 08:00:33 -05:00
Josh Devins
df51f8af07
Document how to install transitive binary dependencies, add repo Dockerfile
Co-authored-by: Seth Michael Larson <seth.larson@elastic.co>
2021-10-28 12:05:39 -05:00
Seth Michael Larson
19014f1227
Avoid DeprecationWarnings when using the new Elasticsearch client (7.15+) 2021-10-28 09:24:36 -05:00
Benjamin Trent
79b66eb6b4
Updating node type to larger ubuntu node (#404)
* Updating node type to larger ubuntu node

* adding torch location

* formatting

* formatting
;

* removing torch location specification
2021-10-25 14:48:26 -05:00
Benjamin Trent
d39c1cd784
[ML] Make eland_import_hub_model an installable script 2021-10-19 11:29:58 -05:00
P. Sai Vinay
704c8982bc
Optimize to_pandas() internally to improve performance
Co-authored-by: Seth Michael Larson <seth.larson@elastic.co>
2021-10-13 13:23:04 -05:00
James Rodewig
6088f2e39d
[DOCS] Retitle Eland Python Client docs book 2021-10-12 20:22:17 -05:00
P. Sai Vinay
f9d2defb1b
Add number_samples to sklearn MLModel 2021-10-07 08:14:54 -05:00
Josh Devins
014943d3b8
Add initial implementation of PyTorch ML models 2021-10-06 08:44:40 -05:00
P. Sai Vinay
995f2432b6
Add number_samples to LightGBM MLModel and leaf_count to leaf nodes
* Add number_samples to lightgbm ML Model

* Add leaf_count for leaf nodes
2021-09-30 08:13:44 -05:00
P. Sai Vinay
dabb327b8b
Refactor df.info() for better readability 2021-09-28 15:12:29 -05:00
P. Sai Vinay
bc201e22dd
Improve coverage for eland.dataframe 2021-09-28 15:11:57 -05:00
Seth Michael Larson
b8e192b7d0
Rename Jenkins job to 'main' 2021-09-28 10:07:16 -05:00
P. Sai Vinay
f241ae971a
Add flynt and --cov-report=term-missing 2021-09-21 11:18:01 -05:00
Seth Michael Larson
7aabc88e4a
Rename 'master' branch to 'main' 2021-09-08 11:51:50 -05:00
Jabin Kong
77f9a455e9
Fix docstring formatting 2021-09-07 11:40:19 -05:00
P. Sai Vinay
315f94b201
Add excluded lines for coverage and improve coverage 2021-09-07 11:39:19 -05:00
Seth Michael Larson
a50c3657c4
Release v7.14.1b1 2021-08-30 13:42:55 -05:00
Seth Michael Larson
7a2e845a76
Speedup CI by only installing Nox in Dockerfile 2021-08-20 08:39:02 -05:00
Jabin Kong
1aa193da9e
Add iterrows() and itertuples() to DataFrame
Co-authored-by: Seth Michael Larson <seth.larson@elastic.co>
2021-08-20 08:34:52 -05:00
Seth Michael Larson
e4f88a34a6
Yield list of hits from _search_yield_hits() instead of individual hits 2021-08-17 12:16:10 -05:00
P. Sai Vinay
011bf29816
Simplify ES->pandas logic by removing Collectors 2021-08-16 12:22:02 -05:00
Seth Michael Larson
76d83ea47f
Bump version to 7.14.0b1 2021-08-09 09:21:49 -05:00
Seth Michael Larson
b0c8434c06
Release 7.14.0b1 2021-08-09 09:11:57 -05:00
Seth Michael Larson
15ba8d3e02
Fallback on using scroll searches for Elasticsearch <7.12
Elasticsearch 7.12 made PIT+search_after universally safe by automatically adding a sort tiebreaker field, `_shard_doc`, when using PITs. We now need feature detection to make sure we use the previous scroll method on Elasticsearch <7.12 clusters.
2021-08-08 12:19:41 -05:00
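The version gate and the shape of a point-in-time search request can be sketched as follows. The helper names, the `3m` keep-alive and the page size are assumptions. The `pit`, `sort` and `search_after` body keys follow the Elasticsearch search API.

```python
from typing import Any, Dict, List, Optional, Tuple


def parse_version(number: str) -> Tuple[int, ...]:
    """Turn '7.11.2' or '8.0.0-SNAPSHOT' into a comparable tuple like (7, 11, 2)."""
    core = number.split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def use_point_in_time(es_version: str) -> bool:
    # PIT + search_after is only used on 7.12+, where `_shard_doc` exists.
    return parse_version(es_version) >= (7, 12)


def pit_search_body(
    pit_id: str, size: int, search_after: Optional[List[Any]] = None, keep_alive: str = "3m"
) -> Dict[str, Any]:
    body: Dict[str, Any] = {
        "size": size,
        "pit": {"id": pit_id, "keep_alive": keep_alive},
        # `_shard_doc` is a unique, cheap tiebreaker for PIT searches.
        "sort": [{"_shard_doc": "asc"}],
    }
    if search_after is not None:
        # Resume after the sort values of the last hit from the previous page.
        body["search_after"] = search_after
    return body
```

A caller would check `use_point_in_time()` once per cluster and page with `pit_search_body()`, otherwise falling back to scroll.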
P. Sai Vinay
30876c8899
Switch to Point-in-Time with search_after instead of using scroll APIs
Co-authored-by: Seth Michael Larson <seth.larson@elastic.co>
2021-08-07 16:05:33 -05:00
P. Sai Vinay
8f84a315be
Add test case for pseudohubererror for XGBoost 2021-08-06 15:59:48 -05:00
P. Sai Vinay
d3f8d7b8f6
Optimize FieldMappings.aggregate_field_name() method 2021-08-06 11:27:59 -05:00
Seth Michael Larson
54b497ed9a
Update supported versions of Python, pandas, and Elasticsearch 2021-08-04 13:21:17 -05:00
P. Sai Vinay
823f01cc6c
Add type hints to 'eland.operations' and 'eland.ndframe' 2021-08-02 11:50:35 -05:00
P. Sai Vinay
c0e861dc77
Fix installed pandas version on Jenkins 2021-07-31 12:51:11 -05:00
P. Sai Vinay
4c1af42c14
Add idxmax and idxmin methods to DataFrame 2021-07-28 07:55:26 -05:00
Seth Michael Larson
c74fccbd74
Drop support for Python 3.6, pandas<1.2 2021-07-27 14:43:03 -05:00
P. Sai Vinay
193bcb73ef
Add support for Pandas v1.3 and LightGBM v3.x 2021-07-27 11:01:35 -05:00
P. Sai Vinay
22475cdc46
Add PANDAS_VERSION to Jenkins matrix 2021-07-26 11:17:46 -05:00
Seth Michael Larson
1555ea9534
Fix typo in version number
Should be `7.13.0b1` instead of `7.13.1b1`
2021-06-22 12:03:46 -05:00
Seth Michael Larson
16178dfb5d
Release 7.13.0b1 2021-06-22 11:59:27 -05:00
P. Sai Vinay
ac2efb5863
Optimize df.describe() to use aggregations instead of own query 2021-06-22 11:29:54 -05:00
P. Sai Vinay
5fe32a24df
Add quantile() to DataFrameGroupBy 2021-06-22 10:54:33 -05:00
P. Sai Vinay
7e8520a8ef
Remove deprecated code in XGBoost and test suite 2021-06-08 15:19:56 -05:00
P. Sai Vinay
e9c0b897f5
Add quantile() to DataFrame and Series 2021-06-08 13:02:44 -05:00
P. Sai Vinay
aa9d60e7e7
Add sort order to groupby dropna=False (#322)
* Add sort order to groupby dropna=False

* Fix rebase
2021-04-21 13:24:52 +00:00
Stephen Dodson
1040160451
Fix bugs with field mapping and lint issue (#346)
* Fix bugs with field mapping:

1. If no permission to call _mapping, return readable error
2. If index is wildcard, fix issues with user warnings

* Fixing lint issues

* Removing trailing backslashes in doc

* Remove pandas/matplotlib deprecation warning

This warning is due to a conflict between
pandas/matplotlib that may be resolved in a later
version. For now, ignore the warning so CI works.
2021-03-30 14:49:54 +00:00
Seth Michael Larson
985afe74e0
Release 7.10.1b1 2021-01-12 12:36:23 -06:00
Seth Michael Larson
26354622b5
Add more sections for elastic.co/guide 2021-01-12 10:26:01 -06:00
P. Sai Vinay
421d84fd20
Add mode() method to DataFrame and Series 2021-01-07 12:17:10 -06:00
P. Sai Vinay
27717eead1
Remove deprecated options and aliases 2021-01-04 13:20:45 -06:00
P. Sai Vinay
f89d79b1b4
Fix py.typed include in MANIFEST.in
Co-authored-by: Seth Michael Larson <seth.larson@elastic.co>
2020-12-30 15:07:40 -06:00
Seth Michael Larson
a552504f9b
Add support for Pandas 1.2.0 2020-12-30 14:20:36 -06:00
P. Sai Vinay
473db4576b
Move tests directory outside of eland namespace 2020-11-16 11:30:41 -06:00
P. Sai Vinay
56f6ba6c8b
Add Elasticsearch storage usage to df.info() 2020-11-16 10:07:28 -06:00
P. Sai Vinay
789f8959bc
Add support for pd.set_option("display.max_rows", None) 2020-11-06 12:23:09 -06:00
P. Sai Vinay
75451f1e93
Add pytest-cov for coverage tracking 2020-11-06 11:34:15 -06:00
P. Sai Vinay
4e92e3cf62
Fix Eland logo and update contributing documentation 2020-11-06 09:33:30 -06:00
Seth Michael Larson
31760fe02c
Release 7.10.0b1 2020-10-29 13:43:34 -05:00
Seth Michael Larson
b936e98012
Allow dict in es_type_overrides, text fields by default get keyword sub-field 2020-10-29 13:16:42 -05:00
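A mapping builder with these two behaviours might look like the sketch below. Both helpers are hypothetical, not eland's code. The `text` field with a `keyword` multi-field uses standard Elasticsearch mapping syntax, and a dict override is assumed to be used verbatim.

```python
from typing import Any, Dict, Mapping, Union

Override = Union[str, Dict[str, Any]]


def field_mapping(es_type: str) -> Dict[str, Any]:
    if es_type == "text":
        # Text gets a `keyword` sub-field so it can still be sorted and aggregated.
        return {"type": "text", "fields": {"keyword": {"type": "keyword"}}}
    return {"type": es_type}


def build_mappings(inferred: Mapping[str, str], overrides: Mapping[str, Override]) -> Dict[str, Any]:
    properties: Dict[str, Any] = {}
    for column, es_type in inferred.items():
        override = overrides.get(column, es_type)
        # A dict override is used verbatim; a string override is a type name.
        properties[column] = override if isinstance(override, dict) else field_mapping(override)
    return {"mappings": {"properties": properties}}
```

For example, `build_mappings({"title": "keyword"}, {"title": "text"})` gives `title` a `text` mapping with a `title.keyword` sub-field.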
Seth Michael Larson
cb4cd083c3
Add support for es_match() to DataFrame and Series 2020-10-29 10:16:50 -05:00
Seth Michael Larson
92a8040614
Test against Elasticsearch 7.10 2020-10-28 09:03:46 -05:00
Seth Michael Larson
ae96558075
Add source for 'elastic.co/guide' to 'docs/guide' 2020-10-28 07:57:10 -05:00
Seth Michael Larson
95b8d75e37
Fix 'Series.__repr__()' when the series is empty 2020-10-27 17:08:37 -05:00
P. Sai Vinay
54468cb85b
Add pytest --nbval of notebook examples to CI 2020-10-27 15:15:04 -05:00
P. Sai Vinay
e17b4e03ea
Error when es_type_overrides receives unknown columns 2020-10-27 13:48:31 -05:00
Seth Michael Larson
28951c0ad1
Add linting+docs to GitHub Actions, fix docs 2020-10-27 11:28:55 -05:00
Seth Michael Larson
ae70f03df3
Document DataFrame.groupby() methods 2020-10-27 10:10:57 -05:00
P. Sai Vinay
475e0f41ef
Implement DataFrameGroupBy.count() 2020-10-23 08:41:50 -05:00
Seth Michael Larson
bd7956ea72
Support typed 'elasticsearch-py' and add 'py.typed' 2020-10-20 16:26:58 -05:00
Seth Michael Larson
05a24cbe0b Add isort, rename Nox session to 'format' 2020-10-15 17:11:29 -05:00
Seth Michael Larson
18fb4af731 Document DataFrame.groupby() and rename Field.index -> .column 2020-10-15 17:11:29 -05:00
P. Sai Vinay
abc5ca927b
Add support for DataFrame.groupby() with aggregations 2020-10-15 10:52:48 -05:00
Seth Michael Larson
adafeed667
Add es_dtypes property to DataFrame and Series 2020-10-13 12:14:09 -05:00
P. Sai Vinay
b7c6c26606
Change DataFrame.filter() to preserve the order of items 2020-10-13 10:58:09 -05:00
P. Sai Vinay
0dd247b693
Improve efficiency of 'pandas_to_eland()' using 'parallel_bulk()' 2020-10-08 10:17:22 -05:00
Seth Michael Larson
225a23a59a Release 7.9.1a1 2020-09-30 11:12:33 -05:00
Seth Michael Larson
b206b851bf Remove 'include_model_definition', support ES ML <7.8 2020-09-30 11:12:33 -05:00
P. Sai Vinay
4d96ad39fd
Switch agg defaults to numeric_only=None 2020-09-22 10:32:27 -05:00
Seth Michael Larson
c86371733d
Deprecate ImportedMLModel in favor of MLModel.import_model() 2020-09-03 09:06:59 -05:00
Seth Michael Larson
1a8a301cd6
Apply Black 20.8b1 formatting 2020-08-27 15:19:56 -05:00
P. Sai Vinay
1d6311164e
Fix DataFrame.agg() with string argument to return Series 2020-08-25 12:39:34 -05:00
Seth Michael Larson
d73e8a241c
Add Conda Forge URL to shield 2020-08-18 13:24:37 -05:00
Seth Michael Larson
7180c96b80
Release 7.9.0a1 2020-08-18 11:53:40 -05:00
Seth Michael Larson
013cab0162
Fix get_feature_id() for named feature 0 2020-08-18 10:58:36 -05:00
Seth Michael Larson
4576951f37
Fix links in Implementation section 2020-08-17 16:32:48 -05:00
Seth Michael Larson
661b33dd0a Update and rearrange documentation 2020-08-17 15:55:06 -05:00
Seth Michael Larson
46533ede98 Misc documentation and name tweaks before release 2020-08-17 15:55:06 -05:00
P. Sai Vinay
66b24f9e8a
Replace MLModel(overwrite) with es_if_exists 2020-08-17 12:10:27 -05:00
Seth Michael Larson
5bf205a1e0
Fix Series.describe(), median agg dtype 2020-08-17 09:28:30 -05:00
Seth Michael Larson
f5b37e643c Update support matrix for Pandas 1.1 2020-08-14 12:55:02 -05:00
Seth Michael Larson
535ed9b334 Fix Series.median(), support median() for datetimes 2020-08-14 12:55:02 -05:00
Seth Michael Larson
a709ed589d Add 'nunique' and 'mean' aggs for datetime, improve precision of datetime aggs 2020-08-14 12:55:02 -05:00
Seth Michael Larson
d238bc5d42 Elasticsearch 7.6 only supports scalar leaf_values 2020-08-14 12:55:02 -05:00
Seth Michael Larson
92170c22d9 Add try_sort() to eland.utils
This function was deprecated and removed in Pandas v1.1
2020-08-14 12:55:02 -05:00
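A helper in the spirit of the removed pandas function can be as small as the sketch below. It returns the items sorted when they are mutually comparable and in their original order otherwise. This is an approximation, not a copy of the pandas source or of eland's `try_sort`.

```python
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def try_sort(iterable: Iterable[T]) -> List[T]:
    """Return the items sorted if they are mutually comparable, else in original order."""
    items = list(iterable)
    try:
        return sorted(items)
    except TypeError:
        # Mixed types such as ints and strings cannot be ordered in Python 3.
        return items
```

Using `sorted()` instead of an in-place `list.sort()` guarantees the fallback list is untouched when a comparison fails partway through.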
Seth Michael Larson
c6bf9b60a0
Change CI email to build-lang-clients@elastic.co 2020-08-13 10:24:17 -05:00
Benjamin Trent
f58634dc6e
[ML] Add support for LGBMClassifier models 2020-08-12 09:45:28 -05:00
Benjamin Trent
701a8008ad
[ML] Add tests for all supported objectives and boosters 2020-08-11 12:27:24 -05:00
Benjamin Trent
6ee282e19f
[ML] Add support for LGBMRegressor models 2020-08-11 07:42:59 -05:00
Benjamin Trent
efb9e3b4c4
[ML] Add support for multi:softmax|softprob XGBClassifier 2020-08-06 12:04:10 -05:00
Seth Michael Larson
5c901e8f1b
Create pytest fixture for testing behavior of Eland vs Pandas 2020-07-28 16:47:22 -05:00
Seth Michael Larson
140623283a
Support Series/collections in Series.isin(), add type hints 2020-07-14 11:39:52 -05:00
Seth Michael Larson
6e6ad04c5c
Use 'script.context.field.max_compilations_rate' instead of deprecated setting 2020-07-14 09:51:35 -05:00
Seth Michael Larson
6c2f9a2ed2
Add DataFrame.size and Series.size 2020-07-13 17:30:14 -05:00
Seth Michael Larson
d50e06dda5
Add webinar recording link to notebook 2020-07-10 14:21:55 -05:00
Seth Michael Larson
ceacf759c3
Add long Apache-2.0 license header to all files 2020-07-08 15:10:43 -05:00
Seth Michael Larson
5897b4587c
Add webinar example notebook, update prose in docs 2020-07-08 14:44:40 -05:00
Seth Michael Larson
de9c836c5e
Error when MLModel.predict fails, add es_compress_model_definition 2020-07-08 14:31:27 -05:00
Léonard Binet
5d0df757cf
Add column names to DataFrame.__dir__ for better auto-completion support 2020-07-02 08:49:52 -05:00
Seth Michael Larson
f63941014f Add support for es_if_exists='append' to pandas_to_eland() 2020-06-15 09:50:44 -05:00
Seth Michael Larson
ad2e012f1e Release 7.7.0a1 2020-05-20 13:58:40 -05:00
Seth Michael Larson
eff9625be1 Update docs with all new APIs 2020-05-20 13:58:40 -05:00
Seth Michael Larson
6000ea73d0
Add [DataFrame, Series].filter() 2020-05-20 12:45:30 -05:00
Daniel Mesejo-León
890cf6dc97
Add Series.isna() and Series.notna() 2020-05-19 16:12:59 -05:00
Seth Michael Larson
1378544933
Normalize and prune top-level APIs 2020-05-18 14:55:41 -05:00
Seth Michael Larson
d1444f8e09 Add Conda Forge installation instructions 2020-05-15 15:27:41 -05:00
Seth Michael Larson
6ca41585e9
Upgrade to elasticsearch-py v7.7 2020-05-14 10:07:10 -05:00
Seth Michael Larson
d2047aa51a
Make ML libraries optional, fix type issues 2020-05-14 09:31:01 -05:00
Daniel Mesejo-León
bfd0ee6f90
Fix DataFrame.shape when smaller than its SizedTask 2020-05-06 13:59:47 -05:00
Daniel Mesejo-León
94dbb36081
Add .sample() method to DataFrame and Series 2020-05-04 12:07:21 -05:00
Seth Michael Larson
def3a46af9
Fix bug when combining AndFilter with OrFilter 2020-05-04 07:39:05 -05:00
Seth Michael Larson
fa8dbe0eb4
Restore documentation requirements 2020-04-29 13:57:51 -05:00
Seth Michael Larson
3d81def5cc
Add support for xgboost v1 2020-04-29 13:06:35 -05:00
Seth Michael Larson
df2a21ffd4
Make QueryParams a dataclass 2020-04-27 16:21:26 -05:00
Seth Michael Larson
15a1977dcf
Add agg compatibility logic to Field class 2020-04-27 15:16:48 -05:00
Seth Michael Larson
7946eb4daa
Add and enforce license headers 2020-04-25 16:26:58 -05:00
Seth Michael Larson
33b4976f9a
Add type hints to base modules 2020-04-24 12:39:13 -05:00
Daniel Mesejo-León
fe6589ae6a
Change ScriptFilter from inline to source for script caching 2020-04-21 07:41:56 -05:00
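For reference, a script filter written with the `source` key (the current name for what older Elasticsearch versions called `inline`) looks like the sketch below. The `script_filter` helper is hypothetical. Keeping literals in `params` is standard Elasticsearch advice, because it keeps the script text identical across requests so the compiled script can be reused.

```python
from typing import Any, Dict, Optional


def script_filter(source: str, params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a `script` query clause using the `source` key (not `inline`)."""
    return {
        "script": {
            "script": {
                "source": source,
                "lang": "painless",
                # Literals live in params, so `source` stays identical across calls.
                "params": params or {},
            }
        }
    }


clause = script_filter("doc['price'].value > params.limit", {"limit": 100})
```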
Daniel Mesejo-León
a779f04a6d
Add default dtype to empty pd.Series
Suppress pandas DeprecationWarning with default dtype on empty pd.Series
2020-04-19 08:51:10 -05:00
Stephen Dodson
1bc83d15e7
Change var/std aggs to use sample instead of population 2020-04-15 14:16:12 -05:00
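The sample/population distinction is the `n - 1` versus `n` divisor. pandas' `var()` and `std()` default to `ddof=1` (sample). A population variance, such as one computed by an aggregation, can be converted with Bessel's correction. The sketch uses the standard library; that eland applies exactly this conversion is an assumption.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)

population_var = statistics.pvariance(data)  # divides by n: 32 / 8 = 4
sample_var = statistics.variance(data)       # divides by n - 1: 32 / 7, about 4.571

# Bessel's correction recovers the sample variance from a population variance.
corrected = population_var * n / (n - 1)
```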
Seth Michael Larson
e71420c883
Release 7.6.0a5 2020-04-14 11:07:32 -05:00
Stephen Dodson
50734f8bd9
Allow user to specify es data types in read_csv and pandas_to_eland (#181)
* Allow user to specify es data types in read_csv and pandas_to_eland

Also, some minor maintenance modifications:

- replaced pandas.util.testing with pandas.testing (required in 1.x)
- updated elasticsearch-py requirements to 7.6+ (to support ML code)

* linting file
2020-04-14 15:04:12 +00:00
Seth Michael Larson
e1cacead44
Add 'inference_config' on ES >=7.8 2020-04-14 07:51:50 -05:00
Seth Michael Larson
448770df78
Restrict public API, update license header 2020-04-14 07:31:23 -05:00
Daniel Mesejo-León
e8f307d2e0
Add NDFrame.median() aggregation 2020-04-13 08:48:39 -05:00
Daniel Mesejo-León
7a1c636e56
Add NDFrame.var() and .std() aggregations 2020-04-12 15:48:13 -05:00
Seth Michael Larson
064d43b9ef
Remove eland.Client, use Elasticsearch directly 2020-04-06 07:25:25 -05:00
Seth Michael Larson
29af76101e
Fix unpacking of median aggregation 2020-04-03 07:56:09 -05:00
Daniel Mesejo-León
023a35c3b4
Add instructions for how to build docs 2020-04-03 07:53:27 -05:00
Seth Michael Larson
c8bd25cbea Add doctests to CI 2020-04-02 13:06:22 -05:00
Seth Michael Larson
7e5f0d3913 Add DataFrame.es_query() to query Elasticsearch directly 2020-04-02 13:06:22 -05:00
Seth Michael Larson
38251ddf08
No spaces in delimiters for serialized ML model 2020-04-02 07:40:51 -05:00
Stephen Dodson
71f2a3f793
Added 'use_pandas_index_for_es_ids' param to pandas_to_eland() 2020-03-31 09:20:47 -05:00
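Using the pandas index as the document ID amounts to setting `_id` on each bulk action. The sketch below is hypothetical. It takes `(index, dict)` pairs, which could come from `((i, row.to_dict()) for i, row in df.iterrows())`, and yields actions in the dict format accepted by `elasticsearch.helpers.bulk`.

```python
from typing import Any, Dict, Hashable, Iterable, Iterator, Tuple


def to_bulk_actions(
    index_name: str,
    rows: Iterable[Tuple[Hashable, Dict[str, Any]]],
    use_index_as_id: bool = True,
) -> Iterator[Dict[str, Any]]:
    """Yield bulk actions from (index label, row dict) pairs."""
    for row_index, source in rows:
        action: Dict[str, Any] = {"_index": index_name, "_source": source}
        if use_index_as_id:
            # Document IDs are strings, so stringify the pandas index label.
            action["_id"] = str(row_index)
        # Without an explicit _id, Elasticsearch auto-generates one.
        yield action
```

Stable IDs make re-running an import overwrite documents instead of duplicating them.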
Daniel Mesejo-León
03582b9f5e
Import __version__ and other metadata by name 2020-03-30 07:45:04 -05:00
Seth Michael Larson
790e2b0de8
Update README with supported versions, pandas v1 outputs 2020-03-27 13:13:50 -05:00
Daniel Mesejo-León
e27a508c59
Update supported Pandas to v1.0 2020-03-27 12:21:15 -05:00
Seth Michael Larson
0c1d7222fe
Drop support for Python 3.5, add Black 2020-03-27 07:56:28 -05:00
Stephen Dodson
9e2997c00d
Bug/is scripted error (#149)
* Updating test matrix for 7.6 + removing oss for now.

* Resolving 7.6.0 docs issues

* Updating ML docs

* Minor mod to support 6.x style indices.

Currently, there is no specific test for this as
it requires a 6.x cluster. 6.x is not officially
supported by 7.x clients, but this is a friendly
option for users.

* Adding unittest for FieldMappings._extract_fields_from_mapping

* Changing to f-string formatting and adding exception test

* Reverting to OrderedDict

Will change after https://github.com/elastic/eland/pull/150 is merged.
2020-03-26 15:17:10 +00:00
Seth Michael Larson
2e74a56c0a
Release v7.6.0a4 2020-03-23 08:43:59 -05:00
Seth Michael Larson
e9a5180dac
Add python_requires to setup.py 2020-03-23 08:35:07 -05:00
Stephen Dodson
9fffbc4f39
Update README.md 2020-03-13 09:19:05 +00:00
Stephen Dodson
2c29e28a2f Updating logo 2020-03-13 09:17:56 +00:00
Stephen Dodson
43e4d03b39
Too long frame exception2 (#137)
* Updating test matrix for 7.6 + removing oss for now.

* Resolving 7.6.0 docs issues

* Updating ML docs

* Fixing too_long_frame_exception in scan/scroll
2020-02-28 12:49:59 +00:00
Stephen Dodson
a33ff45ebc
Too long frame exception fixes (#135)
* Updating test matrix for 7.6 + removing oss for now.

* Resolving 7.6.0 docs issues

* Updating ML docs

* Resolving too_long_frame_exception on large mappings

- Embedded _source parameters in body rather than URL
- Fixed bug in DataFrame.info on empty DataFrame
- Removed warning from TestImportedMLModel
2020-02-26 12:50:14 +00:00
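Moving `_source` into the body helps because Elasticsearch limits the length of the HTTP request line (`http.max_initial_line_length`, 4 KB by default). A long field list in the URL trips that limit and surfaces as `too_long_frame_exception`. The index and field names below are made up.

```python
from urllib.parse import urlencode

# Hypothetical wide index: 2,000 mapped fields.
fields = [f"customer_field_{i}" for i in range(2000)]

# Old style: the field list travels in the request line's query string.
query_string = urlencode({"_source_includes": ",".join(fields)})
request_line = f"GET /my-index/_search?{query_string} HTTP/1.1"

# New style: the same field list travels in the JSON body instead.
body = {"_source": fields, "query": {"match_all": {}}}

print(len(request_line) > 4096)  # True
```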
Stephen Dodson
206677818f Fixes to enforce xgboost==0.90
Issue raised to upgrade xgboost version
2020-02-24 09:20:36 +00:00
stevedodson
62b3133eae
7.6.0a3 (#132)
* Updating test matrix for 7.6 + removing oss for now.

* Resolving 7.6.0 docs issues

* Updating ML docs

* Bumping version following doc fixes

* Change ExternalMLModel to ImportedMLModel

* Bumping version
2020-02-15 20:33:33 +01:00
stevedodson
1a90e9232e
7.6.0a3 (#131)
* Updating test matrix for 7.6 + removing oss for now.

* Resolving 7.6.0 docs issues

* Updating ML docs

* Bumping version following doc fixes

* Change ExternalMLModel to ImportedMLModel
2020-02-15 20:29:03 +01:00
stevedodson
fa930b6cea
7.6.0a2 (#130)
* Updating test matrix for 7.6 + removing oss for now.

* Resolving 7.6.0 docs issues

* Updating ML docs

* Bumping version following doc fixes
2020-02-15 20:10:41 +01:00
stevedodson
163d18d84e
Updating ML docs (#129)
* Updating test matrix for 7.6 + removing oss for now.

* Resolving 7.6.0 docs issues

* Updating ML docs
2020-02-15 19:52:04 +01:00
stevedodson
1cfcd0ab2b
Resolving docs issues (#128)
* Updating test matrix for 7.6 + removing oss for now.

* Resolving 7.6.0 docs issues
2020-02-15 19:37:41 +01:00
stevedodson
404e658a26
Updating test matrix for 7.6 + removing oss for now. (#127) 2020-02-15 18:48:17 +01:00
stevedodson
b535e69b92
Updating to 7.6.0a1 (#126) 2020-02-15 16:14:48 +01:00
stevedodson
7c1c2945a7
ML add external models (#125)
* Partial implementation of ml.ExternalModel

* Adding eland.ml.ExternalMLModel

More testing to be added + more support for MLModels
2020-02-15 15:54:29 +01:00
stevedodson
4ac67a73ea
Bumping version (#123) 2020-02-05 09:59:54 +00:00
stevedodson
c5f5d00bb0
Adding support for df['timestamp'].min() etc. (#122)
There is still a difference between pandas/eland in terms
of min/max etc. aggregations as pandas supports this
on strings.
2020-01-30 11:03:37 +00:00
stevedodson
2ca538c49d
Feature/show progress (#120)
* Adding show_progress debug option to eland_to_pandas
2020-01-29 12:59:48 +00:00
stevedodson
409cb043c8
Refactoring of plotting + fixes for multiple charts (#117)
* Refactoring of plotting + fixes for multiple charts

Updates to plotting in line with pandas 0.25.3
Enables plotting of multiple histograms on the
same figure.

* Fix to setup.py to allow submodules

+ reformat of code and better Series.hist docs
2020-01-29 07:07:56 +00:00
stevedodson
46b428d59b
Improved read_csv docs + made 'to_eland' params consistent (#114)
* Improved read_csv docs + made 'to_eland' params consistent

Note: this will change the API.

* Removing additional args from pytest.

doctests + nbval tests in the CI are not addressed by
this PR.
2020-01-16 10:17:49 +00:00
stevedodson
1914644f93
Improve docs (#113)
* Adding more examples

* Adding more examples to README.md + pypi first page.

* Updated README.md
2020-01-13 15:32:41 +00:00
stevedodson
86c51dc267
Fix licensing headers (#112)
* Minor fixes for readthedocs compatibility.

* Adding doc templates

* Setting first version to 7.5.1
2020-01-13 11:54:43 +00:00
stevedodson
db3bb02335
Rename LICENSE to LICENSE.txt 2020-01-13 11:42:20 +00:00
stevedodson
277a52a242
Update LICENSE 2020-01-13 11:41:43 +00:00
stevedodson
2f87ca5901
Delete LICENSE.txt (#111)
* Delete LICENSE.txt

* Create LICENSE
2020-01-13 11:26:11 +00:00
stevedodson
5995e11bfd
Update README.md 2020-01-13 10:22:42 +00:00
stevedodson
a4736150f6
Update README.md 2020-01-13 09:01:34 +00:00
stevedodson
d7207bab3b
7.5.1a2 (#110)
* Updating README.md

* New version

* Fixing description for pypi
2020-01-10 15:40:15 +00:00
482 changed files with 42701 additions and 13809 deletions

9
.buildkite/Dockerfile Normal file

@@ -0,0 +1,9 @@
ARG PYTHON_VERSION=3.9
FROM python:${PYTHON_VERSION}
ENV FORCE_COLOR=1
WORKDIR /code/eland
RUN python -m pip install nox
COPY . .


@@ -0,0 +1,11 @@
#!/usr/bin/env bash
set -eo pipefail
export LC_ALL=en_US.UTF-8
echo "--- Building the Wolfi image"
# Building the linux/arm64 image takes about one hour on Buildkite, which is too slow
docker build --file Dockerfile.wolfi .
echo "--- Building the public image"
docker build .


@@ -0,0 +1,8 @@
#!/usr/bin/env bash
docker build --file .buildkite/Dockerfile --tag elastic/eland --build-arg PYTHON_VERSION=${PYTHON_VERSION} .
docker run \
--name doc_build \
--rm \
elastic/eland \
bash -c "apt-get update && apt-get install --yes pandoc && nox -s docs"

7
.buildkite/lint-code.sh Executable file
@@ -0,0 +1,7 @@
#!/usr/bin/env bash
docker build --file .buildkite/Dockerfile --tag elastic/eland --build-arg PYTHON_VERSION=${PYTHON_VERSION} .
docker run \
--name linter \
--rm \
elastic/eland \
nox -s lint

50
.buildkite/pipeline.yml Normal file
@@ -0,0 +1,50 @@
steps:
  - label: ":terminal: Lint code"
    env:
      PYTHON_VERSION: 3
    agents:
      provider: "gcp"
      machineType: "n2-standard-2"
    commands:
      - ./.buildkite/lint-code.sh
  - label: ":books: Build documentation"
    env:
      PYTHON_VERSION: 3.9-bookworm
    agents:
      provider: "gcp"
      machineType: "n2-standard-2"
    commands:
      - ./.buildkite/build-documentation.sh
  - label: ":docker: Build Wolfi image"
    env:
      PYTHON_VERSION: 3.11-bookworm
    agents:
      provider: "gcp"
      machineType: "n2-standard-2"
    commands:
      - ./.buildkite/build-docker-images.sh
  - label: ":python: {{ matrix.python }} :elasticsearch: {{ matrix.stack }} :pandas: {{ matrix.pandas }}"
    agents:
      provider: "gcp"
      machineType: "n2-standard-4"
    env:
      PYTHON_VERSION: "{{ matrix.python }}"
      PANDAS_VERSION: "{{ matrix.pandas }}"
      TEST_SUITE: "xpack"
      ELASTICSEARCH_VERSION: "{{ matrix.stack }}"
    matrix:
      setup:
        # Python and pandas versions need to be added to the nox configuration too
        # (in the decorators of the test method in noxfile.py)
        pandas:
          - '1.5.0'
          - '2.2.3'
        python:
          - '3.12'
          - '3.11'
          - '3.10'
          - '3.9'
        stack:
          - '9.0.0'
          - '9.1.0-SNAPSHOT'
    command: ./.buildkite/run-tests
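The test step's `matrix.setup` declares three dimensions, and Buildkite generates one job per combination of their values. The Python sketch below assumes a plain Cartesian product (no `adjustments` block, which this pipeline does not use) and reproduces the environment each job receives; `expand_matrix` is a hypothetical helper, not part of Buildkite.

```python
from itertools import product

# Values copied from the `matrix.setup` block of .buildkite/pipeline.yml.
MATRIX_SETUP = {
    "pandas": ["1.5.0", "2.2.3"],
    "python": ["3.12", "3.11", "3.10", "3.9"],
    "stack": ["9.0.0", "9.1.0-SNAPSHOT"],
}


def expand_matrix(setup):
    """Return one env dict per combination (hypothetical helper; assumes a
    plain Cartesian product with no matrix adjustments)."""
    dims = sorted(setup)  # deterministic order: pandas, python, stack
    jobs = []
    for values in product(*(setup[d] for d in dims)):
        combo = dict(zip(dims, values))
        # Mirror the step's `env` block, where {{ matrix.X }} is interpolated.
        jobs.append(
            {
                "PYTHON_VERSION": combo["python"],
                "PANDAS_VERSION": combo["pandas"],
                "TEST_SUITE": "xpack",
                "ELASTICSEARCH_VERSION": combo["stack"],
            }
        )
    return jobs


if __name__ == "__main__":
    print(len(expand_matrix(MATRIX_SETUP)))  # prints 16
```

With 2 pandas versions, 4 Python versions and 2 stack versions this yields 16 test jobs, which is why the comment asks for new versions to be mirrored in `noxfile.py`: each combination is eventually turned into a nox session name.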

@@ -0,0 +1,28 @@
{
  "jobs": [
    {
      "enabled": true,
      "pipeline_slug": "eland",
      "allow_org_users": true,
      "allowed_repo_permissions": ["admin", "write"],
      "build_on_commit": true,
      "build_on_comment": true,
      "trigger_comment_regex": "^(?:(?:buildkite\\W+)?(?:build|test)\\W+(?:this|it))",
      "always_trigger_comment_regex": "^(?:(?:buildkite\\W+)?(?:build|test)\\W+(?:this|it))",
      "skip_ci_labels": ["skip-ci"],
      "skip_ci_on_only_changed": ["\\.md$"]
    },
    {
      "enabled": true,
      "pipeline_slug": "docs-build-pr",
      "allow_org_users": true,
      "allowed_repo_permissions": ["admin", "write"],
      "build_on_commit": true,
      "build_on_comment": true,
      "trigger_comment_regex": "^(?:(?:buildkite\\W+)?(?:build|test)\\W+(?:this|it))",
      "always_trigger_comment_regex": "^(?:(?:buildkite\\W+)?(?:build|test)\\W+(?:this|it))",
      "skip_ci_labels": ["skip-ci"],
      "skip_ci_on_only_changed": ["\\.md$"]
    }
  ]
}
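The trigger patterns are JSON strings, so `\\W` reaches the regex engine as `\W` and `\\.` as `\.`. As an illustration only (the CI bot's regex engine may differ from Python's `re` in edge cases), the sketch below loads the same values and checks which PR comments would trigger a build. The `skip_ci_on_only_changed` semantics are an assumption: the hypothetical helper treats a change set as skippable only when every changed path matches.

```python
import json
import re
from typing import Sequence

# Values copied from the first job above; json.loads turns the JSON-escaped
# `\\W` and `\\.` into the regex tokens `\W` and `\.`.
CONFIG = json.loads(
    r'{"trigger_comment_regex": "^(?:(?:buildkite\\W+)?(?:build|test)\\W+(?:this|it))",'
    r' "skip_ci_on_only_changed": ["\\.md$"]}'
)
TRIGGER = re.compile(CONFIG["trigger_comment_regex"])
SKIP_PATTERNS = [re.compile(p) for p in CONFIG["skip_ci_on_only_changed"]]


def comment_triggers_build(comment: str) -> bool:
    """True if a PR comment matches the trigger pattern (Python `re` semantics)."""
    return TRIGGER.search(comment) is not None


def ci_would_be_skipped(changed_paths: Sequence[str]) -> bool:
    """Hypothetical reading of `skip_ci_on_only_changed`: skip only when every
    changed path matches at least one pattern."""
    return bool(changed_paths) and all(
        any(p.search(path) for p in SKIP_PATTERNS) for path in changed_paths
    )
```

Because the pattern starts with `^`, the command must open the comment: `buildkite test this` or `test it` qualify, while `please build this` does not.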

@@ -0,0 +1,28 @@
steps:
  - input: "Build parameters"
    fields:
      - text: "Release version"
        key: "RELEASE_VERSION"
        default: ""
        format: "\\d{1,}.\\d{1,}.\\d{1,}"
        hint: "The version to release e.g. '8.10.0' (without the v prefix)."
      - select: "Environment"
        key: "ENVIRONMENT"
        options:
          - label: "Staging"
            value: "staging"
          - label: "Production"
            value: "production"

  - wait

  - label: "Release Docker Artifacts for Eland"
    command: |
      set -eo pipefail
      export RELEASE_VERSION=$(buildkite-agent meta-data get RELEASE_VERSION)
      export ENVIRONMENT=$(buildkite-agent meta-data get ENVIRONMENT)
      export BUILDKIT_PROGRESS=plain
      bash .buildkite/release-docker/run.sh
    # Run on GCP to use `docker`
    agents:
      provider: gcp
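In the `format` field the YAML escape `\\d` becomes the regex token `\d`, but the dots are unescaped and therefore match any character. A minimal Python check, assuming Buildkite requires the whole input to match (not confirmed here); `STRICT_FORMAT` is a hypothetical tightened pattern, not something the pipeline uses.

```python
import re

# Pattern from the input step after YAML unescaping (`\\d` -> `\d`).
PIPELINE_FORMAT = re.compile(r"\d{1,}.\d{1,}.\d{1,}")
# Hypothetical stricter variant that only accepts literal dots.
STRICT_FORMAT = re.compile(r"\d+\.\d+\.\d+")


def is_valid(version: str, pattern: re.Pattern) -> bool:
    # Assumes the whole value must match, i.e. fullmatch semantics.
    return pattern.fullmatch(version) is not None
```

Under that assumption `8-10-0` would pass the pipeline's pattern, while `v8.10.0` (with the prefix the hint warns against) would be rejected by both.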

@@ -0,0 +1,37 @@
#!/usr/bin/env bash
set -eo pipefail
export LC_ALL=en_US.UTF-8
echo "Publishing Eland $RELEASE_VERSION Docker image to $ENVIRONMENT"
set +x
# login to docker registry
docker_registry=$(vault read -field registry "secret/ci/elastic-eland/container-library/eland-$ENVIRONMENT")
docker_username=$(vault read -field username "secret/ci/elastic-eland/container-library/eland-$ENVIRONMENT")
docker_password=$(vault read -field password "secret/ci/elastic-eland/container-library/eland-$ENVIRONMENT")
echo "$docker_password" | docker login "$docker_registry" --username "$docker_username" --password-stdin
unset docker_username docker_password
set -x
tmp_dir=$(mktemp --directory)
pushd "$tmp_dir"
git clone https://github.com/elastic/eland
pushd eland
git checkout "v${RELEASE_VERSION}"
git --no-pager show
# Create builder that supports QEMU emulation (needed for linux/arm64)
docker buildx rm --force eland-multiarch-builder || true
docker buildx create --name eland-multiarch-builder --bootstrap --use
docker buildx build --push \
--file Dockerfile.wolfi \
--tag "$docker_registry/eland/eland:$RELEASE_VERSION" \
--tag "$docker_registry/eland/eland:latest" \
--platform linux/amd64,linux/arm64 \
"$PWD"
popd
popd
rm -rf "$tmp_dir"

@@ -16,7 +16,12 @@ fi
 set -euxo pipefail
-SCRIPT_PATH=$(dirname $(realpath -s $0))
+# realpath on MacOS use different flags than on Linux
+if [[ "$OSTYPE" == "darwin"* ]]; then
+  SCRIPT_PATH=$(dirname $(realpath $0))
+else
+  SCRIPT_PATH=$(dirname $(realpath -s $0))
+fi
 moniker=$(echo "$ELASTICSEARCH_VERSION" | tr -C "[:alnum:]" '-')
 suffix=rest-test
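Two details of this hunk, sketched in Python for illustration. GNU `realpath -s` does not expand symlinks, so on Linux `SCRIPT_PATH` is the directory of the path the script was invoked through; the macOS branch drops `-s` and therefore resolves symlinks. `os.path.abspath` and `os.path.realpath` mirror those two behaviours. Separately, `tr -C "[:alnum:]" '-'` replaces every non-alphanumeric character, including the newline that `echo` appends, with `-`, so the moniker already ends in `-` before `suffix=rest-test` is appended. Both helper names are hypothetical, and ASCII `[:alnum:]` is assumed.

```python
import os
import re


def script_dir(script_path: str, resolve_symlinks: bool) -> str:
    """Directory of a script, with or without resolving symlinks.

    resolve_symlinks=False approximates GNU `realpath -s` (no symlink
    expansion); True approximates plain `realpath`, as used on macOS.
    """
    if resolve_symlinks:
        return os.path.dirname(os.path.realpath(script_path))
    return os.path.dirname(os.path.abspath(script_path))


def moniker(elasticsearch_version: str) -> str:
    """Python equivalent of: echo "$V" | tr -C "[:alnum:]" '-'

    `echo` appends a newline, which is not alphanumeric either and so becomes
    a trailing '-'. ASCII [:alnum:] is assumed.
    """
    return re.sub(r"[^A-Za-z0-9]", "-", elasticsearch_version + "\n")
```

For example `moniker("elasticsearch:9.0.0") + "rest-test"` gives `elasticsearch-9-0-0-rest-test`, matching how `CLUSTER_NAME` is built from `${moniker}${suffix}`.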
@@ -27,10 +32,6 @@ CLUSTER_NAME=${CLUSTER_NAME-${moniker}${suffix}}
 HTTP_PORT=${HTTP_PORT-9200}
 ELASTIC_PASSWORD=${ELASTIC_PASSWORD-changeme}
-SSL_CERT=${SSL_CERT-"${SCRIPT_PATH}/certs/testnode.crt"}
-SSL_KEY=${SSL_KEY-"${SCRIPT_PATH}/certs/testnode.key"}
-SSL_CA=${SSL_CA-"${SCRIPT_PATH}/certs/ca.crt"}
-SSL_CA_PEM=${SSL_CA-"${SCRIPT_PATH}/certs/ca.pem"}
 DETACH=${DETACH-false}
 CLEANUP=${CLEANUP-false}
@@ -41,6 +42,11 @@ NETWORK_NAME=${NETWORK_NAME-"$network_default"}
 set +x
+# Set vm.max_map_count kernel setting to 262144 if we're in CI
+if [[ "$BUILDKITE" == "true" ]]; then
+  sudo sysctl -w vm.max_map_count=262144
+fi
 function cleanup_volume {
   if [[ "$(docker volume ls -q -f name=$1)" ]]; then
     echo -e "\033[34;1mINFO:\033[0m Removing volume $1\033[0m"
@@ -48,7 +54,7 @@ function cleanup_volume {
   fi
 }
 function container_running {
   if [[ "$(docker ps -q -f name=$1)" ]]; then
     return 0;
   else return 1;
   fi
@@ -110,6 +116,12 @@ environment=($(cat <<-END
   --env node.attr.testattr=test
   --env path.repo=/tmp
   --env repositories.url.allowed_urls=http://snapshot.test*
+  --env ELASTIC_PASSWORD=$ELASTIC_PASSWORD
+  --env xpack.license.self_generated.type=trial
+  --env xpack.security.enabled=false
+  --env xpack.security.http.ssl.enabled=false
+  --env xpack.security.transport.ssl.enabled=false
+  --env xpack.ml.max_machine_memory_percent=90
 END
 ))
@@ -118,54 +130,31 @@ volumes=($(cat <<-END
 END
 ))
-if [[ "$ELASTICSEARCH_VERSION" != *oss* ]]; then
-  environment+=($(cat <<-END
-    --env ELASTIC_PASSWORD=$ELASTIC_PASSWORD
-    --env xpack.license.self_generated.type=trial
-    --env xpack.security.enabled=true
-    --env xpack.security.http.ssl.enabled=true
-    --env xpack.security.http.ssl.verification_mode=certificate
-    --env xpack.security.http.ssl.key=certs/testnode.key
-    --env xpack.security.http.ssl.certificate=certs/testnode.crt
-    --env xpack.security.http.ssl.certificate_authorities=certs/ca.crt
-    --env xpack.security.transport.ssl.enabled=true
-    --env xpack.security.transport.ssl.key=certs/testnode.key
-    --env xpack.security.transport.ssl.certificate=certs/testnode.crt
-    --env xpack.security.transport.ssl.certificate_authorities=certs/ca.crt
-END
-))
-  volumes+=($(cat <<-END
-    --volume $SSL_CERT:/usr/share/elasticsearch/config/certs/testnode.crt
-    --volume $SSL_KEY:/usr/share/elasticsearch/config/certs/testnode.key
-    --volume $SSL_CA:/usr/share/elasticsearch/config/certs/ca.crt
-    --volume $SSL_CA_PEM:/usr/share/elasticsearch/config/certs/ca.pem
-END
-))
-fi
-url="http://$NODE_NAME"
-if [[ "$ELASTICSEARCH_VERSION" != *oss* ]]; then
-  url="https://elastic:$ELASTIC_PASSWORD@$NODE_NAME"
-fi
-cert_validation_flags="--insecure"
-if [[ "$NODE_NAME" == "instance" ]]; then
-  cert_validation_flags="--cacert /usr/share/elasticsearch/config/certs/ca.pem --resolve ${NODE_NAME}:443:127.0.0.1"
-fi
+url="http://elastic:$ELASTIC_PASSWORD@$NODE_NAME"
+# Pull the container, retry on failures up to 5 times with
+# short delays between each attempt. Fixes most transient network errors.
+docker_pull_attempts=0
+until [ "$docker_pull_attempts" -ge 5 ]
+do
+  docker pull docker.elastic.co/elasticsearch/$ELASTICSEARCH_VERSION && break
+  docker_pull_attempts=$((docker_pull_attempts+1))
+  sleep 10
+done
 echo -e "\033[34;1mINFO:\033[0m Starting container $NODE_NAME \033[0m"
 set -x
 docker run \
   --name "$NODE_NAME" \
   --network "$NETWORK_NAME" \
-  --env ES_JAVA_OPTS=-"Xms1g -Xmx1g" \
+  --env ES_JAVA_OPTS=-"Xms2g -Xmx2g" \
   "${environment[@]}" \
   "${volumes[@]}" \
   --publish "$HTTP_PORT":9200 \
   --ulimit nofile=65536:65536 \
   --ulimit memlock=-1:-1 \
   --detach="$DETACH" \
-  --health-cmd="curl $cert_validation_flags --fail $url:9200/_cluster/health || exit 1" \
+  --health-cmd="curl --insecure --fail $url:9200/_cluster/health || exit 1" \
   --health-interval=2s \
   --health-retries=20 \
   --health-timeout=2s \
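The new pull loop retries `docker pull` up to five times, sleeping 10 seconds after each failure. Below is a Python sketch of the same control flow, with the pull and sleep injected so it can be exercised without Docker (`pull_with_retries` is a hypothetical name). Neither version aborts after the last failure: `set -e` does not fire for a failing command on the left of `&&`, so the script carries on to `docker run`.

```python
import time
from typing import Callable


def pull_with_retries(
    pull: Callable[[], bool],
    max_attempts: int = 5,
    delay_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Mirror the shell loop and return the number of failed attempts.

    Like the shell version, it does not raise after the last failure.
    """
    attempts = 0
    while attempts < max_attempts:  # until [ "$docker_pull_attempts" -ge 5 ]
        if pull():                  # docker pull ... && break
            break
        attempts += 1               # docker_pull_attempts=$((...+1))
        sleep(delay_seconds)        # sleep 10
    return attempts
```

Injecting `sleep` keeps the sketch testable; in the shell script the delay is simply hard-coded.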

@@ -12,7 +12,7 @@
 # When run in CI the test-matrix is used to define additional variables
-# TEST_SUITE -- either `oss` or `xpack`, defaults to `oss` in `run-tests`
+# TEST_SUITE -- `xpack`
 #
 PYTHON_VERSION=${PYTHON_VERSION-3.8}
@@ -21,10 +21,11 @@ echo -e "\033[34;1mINFO:\033[0m VERSION ${ELASTICSEARCH_VERSION}\033[0m"
 echo -e "\033[34;1mINFO:\033[0m CONTAINER ${ELASTICSEARCH_CONTAINER}\033[0m"
 echo -e "\033[34;1mINFO:\033[0m TEST_SUITE ${TEST_SUITE}\033[0m"
 echo -e "\033[34;1mINFO:\033[0m PYTHON_VERSION ${PYTHON_VERSION}\033[0m"
+echo -e "\033[34;1mINFO:\033[0m PANDAS_VERSION ${PANDAS_VERSION}\033[0m"
 echo -e "\033[1m>>>>> Build [elastic/eland container] >>>>>>>>>>>>>>>>>>>>>>>>>>>>>\033[0m"
-docker build --file .ci/Dockerfile --tag elastic/eland --build-arg PYTHON_VERSION=${PYTHON_VERSION} .
+docker build --file .buildkite/Dockerfile --tag elastic/eland --build-arg PYTHON_VERSION=${PYTHON_VERSION} .
 echo -e "\033[1m>>>>> Run [elastic/eland container] >>>>>>>>>>>>>>>>>>>>>>>>>>>>>\033[0m"
@@ -35,4 +36,4 @@ docker run \
   --name eland-test-runner \
   --rm \
   elastic/eland \
-  ./run_build.sh
+  nox -s "test-${PYTHON_VERSION}(pandas_version='${PANDAS_VERSION}')"
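`run-repository.sh` now selects a single nox session whose name encodes both the Python version and the `pandas_version` parameter. Assuming `noxfile.py` defines a `test` session parametrized that way (the pipeline comment refers to its decorators), the sketch below builds the same selector. Passing it as one argv element, as `subprocess` would, avoids the shell quoting needed for the parentheses and single quotes. Both helper names are hypothetical.

```python
from typing import List


def nox_session_id(python_version: str, pandas_version: str) -> str:
    """Build the selector passed to `nox -s` by run-repository.sh.

    Mirrors: "test-${PYTHON_VERSION}(pandas_version='${PANDAS_VERSION}')"
    """
    return f"test-{python_version}(pandas_version='{pandas_version}')"


def nox_argv(python_version: str, pandas_version: str) -> List[str]:
    # A single argv element needs no escaping for "(", ")" or "'".
    return ["nox", "-s", nox_session_id(python_version, pandas_version)]
```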

@@ -9,13 +9,12 @@ if [[ -z $ELASTICSEARCH_VERSION ]]; then
 fi
 set -euxo pipefail
-TEST_SUITE=${TEST_SUITE-oss}
-NODE_NAME=instance
+TEST_SUITE=${TEST_SUITE-xpack}
+NODE_NAME=localhost
+PANDAS_VERSION=${PANDAS_VERSION-1.5.0}
 elasticsearch_image=elasticsearch
-elasticsearch_url=https://elastic:changeme@${NODE_NAME}:9200
+elasticsearch_url=http://elastic:changeme@${NODE_NAME}:9200
 if [[ $TEST_SUITE != "xpack" ]]; then
   elasticsearch_image=elasticsearch-${TEST_SUITE}
   elasticsearch_url=http://${NODE_NAME}:9200
@@ -28,7 +27,7 @@ function cleanup {
     NODE_NAME=${NODE_NAME} \
     NETWORK_NAME=elasticsearch \
     CLEANUP=true \
-    bash ./.ci/run-elasticsearch.sh
+    bash ./.buildkite/run-elasticsearch.sh
   # Report status and exit
   if [[ "$status" == "0" ]]; then
     echo -e "\n\033[32;1mSUCCESS run-tests\033[0m"
@@ -40,19 +39,20 @@ function cleanup {
 }
 trap cleanup EXIT
-echo -e "\033[1m>>>>> Start [$ELASTICSEARCH_VERSION container] >>>>>>>>>>>>>>>>>>>>>>>>>>>>>\033[0m"
+echo "--- :elasticsearch: Starting Elasticsearch"
 ELASTICSEARCH_VERSION=${elasticsearch_image}:${ELASTICSEARCH_VERSION} \
   NODE_NAME=${NODE_NAME} \
-  NETWORK_NAME=elasticsearch \
+  NETWORK_NAME=host \
   DETACH=true \
-  bash .ci/run-elasticsearch.sh
-echo -e "\033[1m>>>>> Repository specific tests >>>>>>>>>>>>>>>>>>>>>>>>>>>>>\033[0m"
+  bash .buildkite/run-elasticsearch.sh
+echo "+++ :python: Run tests"
 ELASTICSEARCH_CONTAINER=${elasticsearch_image}:${ELASTICSEARCH_VERSION} \
-  NETWORK_NAME=elasticsearch \
+  NETWORK_NAME=host \
   NODE_NAME=${NODE_NAME} \
   ELASTICSEARCH_URL=${elasticsearch_url} \
-  bash .ci/run-repository.sh
+  TEST_SUITE=${TEST_SUITE} \
+  PANDAS_VERSION=${PANDAS_VERSION} \
+  bash .buildkite/run-repository.sh

@@ -1,10 +0,0 @@
ARG PYTHON_VERSION=3.7
FROM python:${PYTHON_VERSION}
WORKDIR /code/eland
COPY requirements-dev.txt .
RUN pip install -r requirements-dev.txt
COPY . .

@@ -1,20 +0,0 @@
-----BEGIN CERTIFICATE-----
MIIDSTCCAjGgAwIBAgIUIwN+0zglsexRKwE1RGHvlCcmrdwwDQYJKoZIhvcNAQEL
BQAwNDEyMDAGA1UEAxMpRWxhc3RpYyBDZXJ0aWZpY2F0ZSBUb29sIEF1dG9nZW5l
cmF0ZWQgQ0EwHhcNMTkwMjEzMDcyMjQwWhcNMjIwMjEyMDcyMjQwWjA0MTIwMAYD
VQQDEylFbGFzdGljIENlcnRpZmljYXRlIFRvb2wgQXV0b2dlbmVyYXRlZCBDQTCC
ASIwDQYJKoZIhvcNAQEBBQADggEPADCCAQoCggEBANILs0JO0e7x29zeVx21qalK
XKdX+AMlGJPH75wWO/Jq6YHtxt1wYIg762krOBXfG6JsFSOIwIv5VrzGGRGjSPt9
OXQyXrDDiQvsBT3rpzLNdDs7KMl2tZswwv7w9ujgud0cYnS1MOpn81rfPc73DvMg
xuhplofDx6fn3++PjVRU2FNiIVWyEoaxRjCeGPMBubKZYaYbQA6vYM4Z+ByG727B
AyAER3t7xmvYti/EoO2hv2HQk5zgcj/Oq3AJKhnt8LH8fnfm3TnYNM1htvXqhN05
vsvhvm2PHfnA5qLlSr/3W0aI/U/PqfsFDCgyRV097sMIaKkmavb0Ue7aQ7lgtp0C
AwEAAaNTMFEwHQYDVR0OBBYEFDRKlCMowWR1rwxE0d1lTEQe5O71MB8GA1UdIwQY
MBaAFDRKlCMowWR1rwxE0d1lTEQe5O71MA8GA1UdEwEB/wQFMAMBAf8wDQYJKoZI
hvcNAQELBQADggEBAKbCJ95EBpeuvF70KEt6QU70k/SH1NRvM9YzKryV0D975Jvu
HOSm9HgSTULeAUFZIa4oYyf3QUfVoI+2T/aQrfXA3gfrJWsHURkyNmiHOFAbYHqi
xA6i249G2GTEjc1+le/M2N2CcDKAmurW6vSGK4upXQbPd6KmnhHREX74zkWjnOa+
+tibbSSOCT4Tmja2DbBxAPuivU9IB1g/hIUmbYQqKffQrBJA0658tz6w63a/Q7xN
pCvvbSgiMZ6qcVIcJkBT2IooYie+ax45pQECHthgIUcQAzfmIfqlU0Qfl8rDgAmn
0c1o6HQjKGU2aVGgSRuaaiHaSZjbPIZVS51sOoI=
-----END CERTIFICATE-----

@@ -1,20 +0,0 @@
-----BEGIN CERTIFICATE-----
MIIDSTCCAjGgAwIBAgIUIwN+0zglsexRKwE1RGHvlCcmrdwwDQYJKoZIhvcNAQEL
BQAwNDEyMDAGA1UEAxMpRWxhc3RpYyBDZXJ0aWZpY2F0ZSBUb29sIEF1dG9nZW5l
cmF0ZWQgQ0EwHhcNMTkwMjEzMDcyMjQwWhcNMjIwMjEyMDcyMjQwWjA0MTIwMAYD
VQQDEylFbGFzdGljIENlcnRpZmljYXRlIFRvb2wgQXV0b2dlbmVyYXRlZCBDQTCC
ASIwDQYJKoZIhvcNAQEBBQADggEPADCCAQoCggEBANILs0JO0e7x29zeVx21qalK
XKdX+AMlGJPH75wWO/Jq6YHtxt1wYIg762krOBXfG6JsFSOIwIv5VrzGGRGjSPt9
OXQyXrDDiQvsBT3rpzLNdDs7KMl2tZswwv7w9ujgud0cYnS1MOpn81rfPc73DvMg
xuhplofDx6fn3++PjVRU2FNiIVWyEoaxRjCeGPMBubKZYaYbQA6vYM4Z+ByG727B
AyAER3t7xmvYti/EoO2hv2HQk5zgcj/Oq3AJKhnt8LH8fnfm3TnYNM1htvXqhN05
vsvhvm2PHfnA5qLlSr/3W0aI/U/PqfsFDCgyRV097sMIaKkmavb0Ue7aQ7lgtp0C
AwEAAaNTMFEwHQYDVR0OBBYEFDRKlCMowWR1rwxE0d1lTEQe5O71MB8GA1UdIwQY
MBaAFDRKlCMowWR1rwxE0d1lTEQe5O71MA8GA1UdEwEB/wQFMAMBAf8wDQYJKoZI
hvcNAQELBQADggEBAKbCJ95EBpeuvF70KEt6QU70k/SH1NRvM9YzKryV0D975Jvu
HOSm9HgSTULeAUFZIa4oYyf3QUfVoI+2T/aQrfXA3gfrJWsHURkyNmiHOFAbYHqi
xA6i249G2GTEjc1+le/M2N2CcDKAmurW6vSGK4upXQbPd6KmnhHREX74zkWjnOa+
+tibbSSOCT4Tmja2DbBxAPuivU9IB1g/hIUmbYQqKffQrBJA0658tz6w63a/Q7xN
pCvvbSgiMZ6qcVIcJkBT2IooYie+ax45pQECHthgIUcQAzfmIfqlU0Qfl8rDgAmn
0c1o6HQjKGU2aVGgSRuaaiHaSZjbPIZVS51sOoI=
-----END CERTIFICATE-----

@@ -1,19 +0,0 @@
-----BEGIN CERTIFICATE-----
MIIDIjCCAgqgAwIBAgIUI4QU6jA1dYSCbdIA6oAb2TBEluowDQYJKoZIhvcNAQEL
BQAwNDEyMDAGA1UEAxMpRWxhc3RpYyBDZXJ0aWZpY2F0ZSBUb29sIEF1dG9nZW5l
cmF0ZWQgQ0EwHhcNMTkwMjEzMDcyMzEzWhcNMjIwMjEyMDcyMzEzWjATMREwDwYD
VQQDEwhpbnN0YW5jZTCCASIwDQYJKoZIhvcNAQEBBQADggEPADCCAQoCggEBAJeT
yOy6EAScZxrULKjHePciiz38grivCrhFFV+dThaRCcl3DhDzb9Eny5q5iEw3WvLQ
Rqmf01jncNIhaocTt66VqveXaMubbE8O0LcG6e4kpFO+JtnVF8JTARTc+ux/1uD6
hO1VG/HItM7WQrQxh4hfB2u1AX2YQtoqEtXXEC+UHWfl4QzuzXjBnKCkO/L9/6Tf
yNFQWXxKnIiTs8Xm9sEhhSCBJPlLTQu+MX4vR2Uwj5XZmflDUr+ZTenl9qYxL6b3
SWhh/qEl4GAj1+tS7ZZOxE0237mUh3IIFYSWSaMm8K2m/BYHkLNWL5B1dMic0lsv
osSoYrQuCef4HQMCitsCAwEAAaNNMEswHQYDVR0OBBYEFFMg4l1GLW8lYbwASY+r
YeWYRzIiMB8GA1UdIwQYMBaAFDRKlCMowWR1rwxE0d1lTEQe5O71MAkGA1UdEwQC
MAAwDQYJKoZIhvcNAQELBQADggEBAEQrgh1xALpumQTzsjxFRGque/vlKTgRs5Kh
xtgapr6wjIbdq7dagee+4yNOKzS5lGVXCgwrJlHESv9qY0uumT/33vK2uduJ7NAd
fR2ZzyBnhMX+mkYhmGrGYCTUMUIwOIQYa4Evis4W+LHmCIDG03l7gLHfdIBe9VMO
pDZum8f6ng0MM49s8/rXODNYKw8kFyUhnfChqMi/2yggb1uUIfKlJJIchkgYjE13
zuC+fjo029Pq1jeMIdxugLf/7I/8NiW1Yj9aCXevUXG1qzHFEuKAinBXYOZO/vWS
LaEqOhwrzNynwgGpYAr7Rfgv4AflltYIIav4PZT03P7fbyAAf8s=
-----END CERTIFICATE-----

@@ -1,27 +0,0 @@
-----BEGIN RSA PRIVATE KEY-----
MIIEpQIBAAKCAQEAl5PI7LoQBJxnGtQsqMd49yKLPfyCuK8KuEUVX51OFpEJyXcO
EPNv0SfLmrmITDda8tBGqZ/TWOdw0iFqhxO3rpWq95doy5tsTw7Qtwbp7iSkU74m
2dUXwlMBFNz67H/W4PqE7VUb8ci0ztZCtDGHiF8Ha7UBfZhC2ioS1dcQL5QdZ+Xh
DO7NeMGcoKQ78v3/pN/I0VBZfEqciJOzxeb2wSGFIIEk+UtNC74xfi9HZTCPldmZ
+UNSv5lN6eX2pjEvpvdJaGH+oSXgYCPX61Ltlk7ETTbfuZSHcggVhJZJoybwrab8
FgeQs1YvkHV0yJzSWy+ixKhitC4J5/gdAwKK2wIDAQABAoIBAQCRFTJna/xy/WUu
59FLR4qAOj8++JgCwACpue4oU7/vl6nffSYokWoAr2+RzG4qTX2vFi3cpA8+dGCn
sLZvTi8tWzKGxBTZdg2oakzaMzLr74SeZ052iCGyrZJGbvF6Ny7srr1XEXSq6+os
ZCb6pMHOhO7saBdiKMAsY8MdjTl/33AduuE6ztqv+L92xTr2g4QlbT1KvWlEgppU
k4Gy7zdETkPBTSH/17ZwyGJoJICIAhbL4IpmOM4dPIg8nFkVPPpy6p0z4uGjtgnK
nreZ2EKMzCafBaHn7A77gpi0OrQdl6pe0fsGqv/323YjCJPbwwl5TsoNq44DzwiX
3M7XiVJxAoGBAOCne56vdN4uZmCgLVGT2JSUNVPOu4bfjrxWH6cslzrPT2Zhp3lO
M4axZ3gmcervV252YEZXntXDHHCSfrECllRN1WFD63XmyQ/CkhuvZkkeRHfzL1TE
EdqHOTqs4sRETZ7+RITFC81DZQkWWOKeyXMjyPBqd7RnThQHijB1c8Y5AoGBAKy6
CVKBx+zz5crVD0tz4UhOmz1wRNN0CL0l+FXRuFSgbzMIvwpfiqe25crgeLHe2M2/
TogdWbjZ2nUZQTzoRsSkQ6cKHpj+G/gWurp/UcHHXFVwgLSPF7c3KHDtiYq7Vqw0
bvmhM03LI6+ZIPRV7hLBr7WP7UmpAiREMF7tTnmzAoGBAIkx3w3WywFQxtblmyeB
qbd7F2IaE23XoxyjX+tBEQ4qQqwcoSE0v8TXHIBEwjceeX+NLVhn9ClJYVniLRq+
oL3VVqVyzB4RleJZCc98e3PV1yyFx/b1Uo3pHOsXX9lKeTjKwV9v0rhFGzPEgP3M
yOvXA8TG0FnM6OLUg/D6GX0JAoGAMuHS4TVOGeV3ahr9mHKYiN5vKNgrzka+VEod
L9rJ/FQOrfADpyCiDen5I5ygsXU+VM3oanyK88NpcVlxOGoMft0M+OYoQVWKE7lO
ZKYhBX6fGqQ7pfUJPXXIOgwfmni5fZ0sm+j63g3bg10OsiumKGxaQJgXhL1+3gQg
Y7ZwibUCgYEAlZoFFvkMLjpOSaHk1z5ZZnt19X0QUIultBwkumSqMPm+Ks7+uDrx
thGUCoz4ecr/ci4bIUY7mB+zfAbqnBOMxreJqCRbAIuRypo1IlWkTp8DywoDOfMW
NfzjVmzJ7EJu44nGmVAi1jw4Pbseivvi1ujMCoPgaE8I1uSh144bwN8=
-----END RSA PRIVATE KEY-----

@@ -1,78 +0,0 @@
---

##### GLOBAL METADATA

- meta:
    cluster: clients-ci

##### JOB DEFAULTS

- job:
    project-type: matrix
    logrotate:
      daysToKeep: 30
      numToKeep: 100
    parameters:
    - string:
        name: branch_specifier
        default: refs/heads/master
        description: the Git branch specifier to build (&lt;branchName&gt;, &lt;tagName&gt;,
          &lt;commitId&gt;, etc.)
    properties:
    - github:
        url: https://github.com/elastic/eland
    - inject:
        properties-content: HOME=$JENKINS_HOME
    concurrent: true
    node: flyweight
    scm:
    - git:
        name: origin
        credentials-id: f6c7695a-671e-4f4f-a331-acdce44ff9ba
        reference-repo: /var/lib/jenkins/.git-references/eland.git
        branches:
        - ${branch_specifier}
        url: git@github.com:elastic/eland.git
        basedir: ''
        wipe-workspace: 'True'
    triggers:
    - github
    axes:
    - axis:
        type: slave
        name: label
        values:
        - linux
    - axis:
        type: yaml
        filename: .ci/test-matrix.yml
        name: ELASTICSEARCH_VERSION
    - axis:
        type: yaml
        filename: .ci/test-matrix.yml
        name: TEST_SUITE
    - axis:
        type: yaml
        filename: .ci/test-matrix.yml
        name: PYTHON_VERSION
    yaml-strategy:
      exclude-key: exclude
      filename: .ci/test-matrix.yml
    wrappers:
    - ansicolor
    - timeout:
        type: absolute
        timeout: 120
        fail: true
    - timestamps
    - workspace-cleanup
    builders:
    - shell: |-
        #!/usr/local/bin/runbld
        .ci/run-tests
    publishers:
    - email:
        recipients: infra-root+build@elastic.co
    - junit:
        results: "build/output/*-junit.xml"
        allow-empty-results: true

@@ -1,14 +0,0 @@
---
- job:
    name: elastic+eland+master
    display-name: 'elastic / eland # master'
    description: Eland is a data science client with a Pandas-like interface
    junit_results: "*-junit.xml"
    parameters:
    - string:
        name: branch_specifier
        default: refs/heads/master
        description: The Git branch specifier to build
    triggers:
    - github
    - timed: '@daily'

@@ -1,19 +0,0 @@
---
- job:
    name: elastic+eland+pull-request
    display-name: 'elastic / eland # pull-request'
    description: Testing of eland pull requests.
    scm:
    - git:
        branches:
        - ${ghprbActualCommit}
        refspec: +refs/pull/*:refs/remotes/origin/pr/*
    triggers:
    - github-pull-request:
        org-list:
        - elastic
        allow-whitelist-orgs-as-admins: true
        github-hooks: true
        status-context: clients-ci
        cancel-builds-on-update: true
    publishers: []

@@ -1,18 +0,0 @@
---
ELASTICSEARCH_VERSION:
- 8.0.0-SNAPSHOT
- 7.5-SNAPSHOT
TEST_SUITE:
- oss
- xpack
PYTHON_VERSION:
- 3.8
- 3.7
- 3.6
- 3.5.3
exclude: ~

@@ -1,3 +1,62 @@
-docs/*
+# docs and example
 example/*
+# Git
 .git
+# Nox
+.nox
+# Compiled python modules.
+*.pyc
+__pycache__/
+# Setuptools distribution folder.
+dist/
+# Build folder
+build/
+# pytest results
+tests/dataframe/results/*csv
+result_images/
+# Python egg metadata, regenerated from source files by setuptools.
+/*.egg-info
+eland.egg-info/
+# PyCharm files
+.idea/
+# vscode files
+.vscode/
+# pytest files
+.pytest_cache/
+# Ignore MacOSX files
+.DS_Store
+# Jupyter Notebook
+.ipynb_checkpoints
+# IPython
+profile_default/
+ipython_config.py
+# pyenv
+.python-version
+# Environments
+.env
+.venv
+.nox
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+.mypy_cache
+# Coverage
+.coverage

26
.github/workflows/backport.yml vendored Normal file
@@ -0,0 +1,26 @@
name: Backport
on:
  pull_request_target:
    types:
      - closed
      - labeled

jobs:
  backport:
    name: Backport
    runs-on: ubuntu-latest
    # Only react to merged PRs for security reasons.
    # See https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#pull_request_target.
    if: >
      github.event.pull_request.merged
      && (
        github.event.action == 'closed'
        || (
          github.event.action == 'labeled'
          && contains(github.event.label.name, 'backport')
        )
      )
    steps:
      - uses: tibdex/backport@9565281eda0731b1d20c4025c43339fb0a23812e # v2.0.4
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
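The `if:` guard runs the backport job only for merged PRs, either when they are closed or when a label containing `backport` is added afterwards. A Python rendering for illustration; `should_backport` is a hypothetical name, and the lowercase comparison reflects GitHub's documentation that `contains()` is not case sensitive.

```python
from typing import Optional


def should_backport(merged: bool, action: str, label_name: Optional[str] = None) -> bool:
    """Python rendering of the workflow's `if:` expression."""
    if not merged:  # github.event.pull_request.merged
        return False
    if action == "closed":
        return True
    # action == 'labeled' && contains(label.name, 'backport'), case-insensitive
    return action == "labeled" and label_name is not None and "backport" in label_name.lower()
```

The `merged` check comes first because `closed` also fires for PRs that were closed without merging.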

19
.github/workflows/docs-build.yml vendored Normal file
@@ -0,0 +1,19 @@
name: docs-build

on:
  push:
    branches:
      - main
  pull_request_target: ~
  merge_group: ~

jobs:
  docs-preview:
    uses: elastic/docs-builder/.github/workflows/preview-build.yml@main
    with:
      path-pattern: docs/**
    permissions:
      deployments: write
      id-token: write
      contents: read
      pull-requests: write

14
.github/workflows/docs-cleanup.yml vendored Normal file
@@ -0,0 +1,14 @@
name: docs-cleanup

on:
  pull_request_target:
    types:
      - closed

jobs:
  docs-preview:
    uses: elastic/docs-builder/.github/workflows/preview-cleanup.yml@main
    permissions:
      contents: none
      id-token: write
      deployments: write

11
.gitignore vendored
@@ -1,5 +1,6 @@
 # Compiled python modules.
 *.pyc
+__pycache__/
 # Setuptools distribution folder.
 dist/
@@ -11,18 +12,19 @@ build/
 docs/build/
 # pytest results
-eland/tests/dataframe/results/
+tests/dataframe/results/*csv
 result_images/
 # Python egg metadata, regenerated from source files by setuptools.
 /*.egg-info
+eland.egg-info/
 # PyCharm files
 .idea/
 # vscode files
-.vscode/*
+.vscode/
 # pytest files
 .pytest_cache/
@@ -43,8 +45,13 @@ ipython_config.py
 # Environments
 .env
 .venv
+.nox
 env/
 venv/
 ENV/
 env.bak/
 venv.bak/
+.mypy_cache
+# Coverage
+.coverage

14
.readthedocs.yml Normal file
@@ -0,0 +1,14 @@
version: 2

build:
  os: ubuntu-22.04
  tools:
    python: "3.11"

python:
  install:
    - path: .
    - requirements: docs/requirements-docs.txt

sphinx:
  configuration: docs/sphinx/conf.py

784
CHANGELOG.rst Normal file
View File

@ -0,0 +1,784 @@
=========
Changelog
=========
9.0.1 (2025-04-30)
------------------
* Forbid Elasticsearch 8 client or server (`#780 <https://github.com/elastic/eland/pull/780>`_)
* Fix DeBERTa tokenization (`#769 <https://github.com/elastic/eland/pull/769>`_)
* Upgrade PyTorch to 2.5.1 (`#785 <https://github.com/elastic/eland/pull/785>`_)
* Upgrade LightGBM to 4.6.0 (`#782 <https://github.com/elastic/eland/pull/782>`_)
9.0.0 (2025-04-15)
------------------
* Drop Python 3.8, Support Python 3.12 (`#743 <https://github.com/elastic/eland/pull/743>`_)
* Support Pandas 2 (`#742 <https://github.com/elastic/eland/pull/742>`_)
* Upgrade transformers to 4.47 (`#752 <https://github.com/elastic/eland/pull/752>`_)
* Remove ML model export as sklearn Pipeline (`#744 <https://github.com/elastic/eland/pull/744>`_)
* Allow scikit-learn 1.5 (`#729 <https://github.com/elastic/eland/pull/729>`_)
* Migrate docs from AsciiDoc to Markdown (`#762 <https://github.com/elastic/eland/pull/762>`_)
8.17.0 (2025-01-07)
-------------------
* Support sparse embedding models such as SPLADE-v3-DistilBERT (`#740 <https://github.com/elastic/eland/pull/740>`_)
8.16.0 (2024-11-13)
-------------------
* Add deprecation warning for ESGradientBoostingModel subclasses (`#738 <https://github.com/elastic/eland/pull/738>`_)
8.15.4 (2024-10-17)
-------------------
* Revert "Allow reading Elasticsearch certs in Wolfi image" (`#734 <https://github.com/elastic/eland/pull/734>`_)
8.15.3 (2024-10-09)
-------------------
* Added support for DeBERTa-V2 tokenizer (`#717 <https://github.com/elastic/eland/pull/717>`_)
* Fixed ``--ca-cert`` with a shared Elasticsearch Docker volume (`#732 <https://github.com/elastic/eland/pull/732>`_)
8.15.2 (2024-10-02)
-------------------
* Fixed Docker image build (`#728 <https://github.com/elastic/eland/pull/728>`_)
8.15.1 (2024-10-01)
-------------------
* Upgraded PyTorch to version 2.3.1, which is compatible with Elasticsearch 8.15.2 or above (`#718 <https://github.com/elastic/eland/pull/718>`_)
* Migrated to distroless Wolfi base Docker image (`#720 <https://github.com/elastic/eland/pull/720>`_)
8.15.0 (2024-08-12)
-------------------
* Added a default truncation of ``second`` for text similarity (`#713 <https://github.com/elastic/eland/pull/713>`_)
* Added note about using text_similarity for rerank in the CLI (`#716 <https://github.com/elastic/eland/pull/716>`_)
* Added support for lists in result hits (`#707 <https://github.com/elastic/eland/pull/707>`_)
* Removed input fields from exported LTR models (`#708 <https://github.com/elastic/eland/pull/708>`_)
8.14.0 (2024-06-10)
-------------------
Added
^^^^^
* Added Elasticsearch Serverless support in DataFrames (`#690`_, contributed by `@AshokChoudhary11`_) and eland_import_hub_model (`#698`_)
Fixed
^^^^^
* Fixed Python 3.8 support (`#695`_, contributed by `@bartbroere`_)
* Fixed non _source fields missing from the results hits (`#693`_, contributed by `@bartbroere`_)
.. _@AshokChoudhary11: https://github.com/AshokChoudhary11
.. _#690: https://github.com/elastic/eland/pull/690
.. _#693: https://github.com/elastic/eland/pull/693
.. _#695: https://github.com/elastic/eland/pull/695
.. _#698: https://github.com/elastic/eland/pull/698
8.13.1 (2024-05-03)
-------------------
Added
^^^^^
* Added support for HTTP proxies in eland_import_hub_model (`#688`_)
.. _#688: https://github.com/elastic/eland/pull/688
8.13.0 (2024-03-27)
-------------------
Added
^^^^^
* Added support for Python 3.11 (`#681`_)
* Added ``eland.DataFrame.to_json`` function (`#661`_, contributed by `@bartbroere`_)
* Added override option to specify the model's max input size (`#674`_)
Changed
^^^^^^^
* Upgraded torch to 2.1.2 (`#671`_)
* Mirrored pandas' ``lineterminator`` instead of ``line_terminator`` in ``to_csv`` (`#595`_, contributed by `@bartbroere`_)
.. _#595: https://github.com/elastic/eland/pull/595
.. _#661: https://github.com/elastic/eland/pull/661
.. _#671: https://github.com/elastic/eland/pull/671
.. _#674: https://github.com/elastic/eland/pull/674
.. _#681: https://github.com/elastic/eland/pull/681
8.12.1 (2024-01-30)
-------------------
Fixed
^^^^^
* Fix missing value support for XGBRanker (`#654`_)
.. _#654: https://github.com/elastic/eland/pull/654
8.12.0 (2024-01-18)
-------------------
Added
^^^^^
* Supported XGBRanker model (`#649`_)
* Accepted LTR (Learning to rank) model config when importing model (`#645`_, `#651`_)
* Added LTR feature logger (`#648`_)
* Added ``prefix_string`` config option to the import model hub script (`#642`_)
* Made online retail analysis notebook runnable in Colab (`#641`_)
* Added new movie dataset to the tests (`#646`_)
.. _#641: https://github.com/elastic/eland/pull/641
.. _#642: https://github.com/elastic/eland/pull/642
.. _#645: https://github.com/elastic/eland/pull/645
.. _#646: https://github.com/elastic/eland/pull/646
.. _#648: https://github.com/elastic/eland/pull/648
.. _#649: https://github.com/elastic/eland/pull/649
.. _#651: https://github.com/elastic/eland/pull/651
8.11.1 (2023-11-22)
-------------------
Added
^^^^^
* Make demo notebook runnable in Colab (`#630`_)
Changed
^^^^^^^
* Bump Shap version to 0.43 (`#636`_)
Fixed
^^^^^
* Fix failed import of Sentence Transformer RoBERTa models (`#637`_)
.. _#630: https://github.com/elastic/eland/pull/630
.. _#636: https://github.com/elastic/eland/pull/636
.. _#637: https://github.com/elastic/eland/pull/637
8.11.0 (2023-11-08)
-------------------
Added
^^^^^
* Support E5 small multilingual model (`#625`_)
Changed
^^^^^^^
* Stream writes in ``ed.DataFrame.to_csv()`` (`#579`_)
* Improve memory estimation for NLP models (`#568`_)
Fixed
^^^^^
* Fixed deprecations in preparation of Pandas 2.0 support (`#602`_, `#603`_, contributed by `@bartbroere`_)
.. _#568: https://github.com/elastic/eland/pull/568
.. _#579: https://github.com/elastic/eland/pull/579
.. _#602: https://github.com/elastic/eland/pull/602
.. _#603: https://github.com/elastic/eland/pull/603
.. _#625: https://github.com/elastic/eland/pull/625
8.10.1 (2023-10-11)
-------------------
Fixed
^^^^^
* Fixed direct usage of TransformerModel (`#619`_)
.. _#619: https://github.com/elastic/eland/pull/619
8.10.0 (2023-10-09)
-------------------
Added
^^^^^
* Published pre-built Docker images to docker.elastic.co/eland/eland (`#613`_)
* Allowed importing private HuggingFace models (`#608`_)
* Added Apple Silicon (arm64) support to Docker image (`#615`_)
* Allowed importing some DPR models like ance-dpr-context-multi (`#573`_)
* Allowed using the Pandas API without monitoring/main permissions (`#581`_)
Changed
^^^^^^^
* Updated Docker image to Debian 12 Bookworm (`#613`_)
* Reduced Docker image size by not installing unused PyTorch GPU support on amd64 (`#615`_)
* Reduced model chunk size to 1MB (`#605`_)
Fixed
^^^^^
* Fixed deprecations in preparation of Pandas 2.0 support (`#593`_, `#596`_, contributed by `@bartbroere`_)
.. _@bartbroere: https://github.com/bartbroere
.. _#613: https://github.com/elastic/eland/pull/613
.. _#608: https://github.com/elastic/eland/pull/608
.. _#615: https://github.com/elastic/eland/pull/615
.. _#573: https://github.com/elastic/eland/pull/573
.. _#581: https://github.com/elastic/eland/pull/581
.. _#605: https://github.com/elastic/eland/pull/605
.. _#593: https://github.com/elastic/eland/pull/593
.. _#596: https://github.com/elastic/eland/pull/596
8.9.0 (2023-08-24)
------------------
Added
^^^^^
* Simplified embedding model support and loading (`#569`_)
* Made ``eland_import_hub_model`` easier to find on Windows (`#559`_)
* Updated the trained model inference endpoint (`#556`_)
* Added ``BertJapaneseTokenizer`` support with the ``bert_ja`` tokenization configuration (`#534`_)
* Added the ability to upload xlm-roberta tokenized models (`#518`_)
* Made embedding size measurement tolerate different model output formats (`#535`_)
* Generated a valid NLP model ID from the file path (`#541`_)
* Upgraded torch to 1.13.1 and added a cluster version check before uploading an NLP model (`#522`_)
* Set the ``embedding_size`` config parameter for text embedding models (`#532`_)
* Added support for the ``pass_through`` task (`#526`_)
Fixed
^^^^^
* Fixed code formatting to comply with the ``black`` code style (`#557`_)
* Fixed the ``No module named 'torch'`` error (`#553`_)
* Fixed the autosummary directive by removing the autosummary hacks (`#548`_)
* Prevented a ``TypeError`` by adding a ``None`` check (`#525`_)
.. _#518: https://github.com/elastic/eland/pull/518
.. _#522: https://github.com/elastic/eland/pull/522
.. _#525: https://github.com/elastic/eland/pull/525
.. _#526: https://github.com/elastic/eland/pull/526
.. _#532: https://github.com/elastic/eland/pull/532
.. _#534: https://github.com/elastic/eland/pull/534
.. _#535: https://github.com/elastic/eland/pull/535
.. _#541: https://github.com/elastic/eland/pull/541
.. _#548: https://github.com/elastic/eland/pull/548
.. _#553: https://github.com/elastic/eland/pull/553
.. _#556: https://github.com/elastic/eland/pull/556
.. _#557: https://github.com/elastic/eland/pull/557
.. _#559: https://github.com/elastic/eland/pull/559
.. _#569: https://github.com/elastic/eland/pull/569
8.7.0 (2023-03-30)
------------------
Added
^^^^^
* Added a new NLP model task type "text_similarity" (`#486`_)
* Added a new NLP model task type "text_expansion" (`#520`_)
* Added support for exporting an Elastic ML model as a scikit-learn pipeline via ``MLModel.export_model()`` (`#509`_)
Fixed
^^^^^
* Fixed an issue that occurred when LightGBM was installed but libomp wasn't installed on the system (`#499`_)
.. _#486: https://github.com/elastic/eland/pull/486
.. _#499: https://github.com/elastic/eland/pull/499
.. _#509: https://github.com/elastic/eland/pull/509
.. _#520: https://github.com/elastic/eland/pull/520
8.3.0 (2022-07-11)
------------------
Added
^^^^^
* Added a new NLP model task type "auto" which infers the task type based on model configuration and architecture (`#475`_)
Changed
^^^^^^^
* Changed the required version of the ``torch`` package to ``>=1.11.0,<1.12`` to match the PyTorch version required by Elasticsearch 8.3 (was ``>=1.9.0,<2``) (`#479`_)
* Changed the default value of the ``--task-type`` parameter for the ``eland_import_hub_model`` CLI to ``auto`` (`#475`_)
Fixed
^^^^^
* Fixed decision tree classifier serialization to account for probabilities (`#465`_)
* Fixed PyTorch model quantization (`#472`_)
.. _#465: https://github.com/elastic/eland/pull/465
.. _#472: https://github.com/elastic/eland/pull/472
.. _#475: https://github.com/elastic/eland/pull/475
.. _#479: https://github.com/elastic/eland/pull/479
8.2.0 (2022-05-09)
------------------
Added
^^^^^
* Added support for passing Cloud ID via ``--cloud-id`` to ``eland_import_hub_model`` CLI tool (`#462`_)
* Added support for authenticating via ``--es-username``, ``--es-password``, and ``--es-api-key`` to the ``eland_import_hub_model`` CLI tool (`#461`_)
* Added support for XGBoost 1.6 (`#458`_)
* Added support for ``question_answering`` NLP tasks (`#457`_)
.. _#457: https://github.com/elastic/eland/pull/457
.. _#458: https://github.com/elastic/eland/pull/458
.. _#461: https://github.com/elastic/eland/pull/461
.. _#462: https://github.com/elastic/eland/pull/462
8.1.0 (2022-03-31)
------------------
Added
^^^^^
* Added support for ``eland.Series.unique()`` (`#448`_, contributed by `@V1NAY8`_)
* Added ``--ca-certs`` and ``--insecure`` options to ``eland_import_hub_model`` for configuring TLS (`#441`_)
.. _#448: https://github.com/elastic/eland/pull/448
.. _#441: https://github.com/elastic/eland/pull/441
8.0.0 (2022-02-10)
------------------
Added
^^^^^
* Added support for Natural Language Processing (NLP) models using PyTorch (`#394`_)
* Added new extra ``eland[pytorch]`` for installing all dependencies needed for PyTorch (`#394`_)
* Added a CLI script ``eland_import_hub_model`` for uploading HuggingFace models to Elasticsearch (`#403`_)
* Added support for v8.0 of the Python Elasticsearch client (`#415`_)
* Added a warning if Eland detects it's communicating with an incompatible Elasticsearch version (`#419`_)
* Added support for ``number_samples`` to LightGBM and Scikit-Learn models (`#397`_, contributed by `@V1NAY8`_)
* Added ability to use datetime types for filtering dataframes (`#284`_, contributed by `@Fju`_)
* Added mapping of pandas ``datetime64`` columns to the Elasticsearch ``date`` type (`#425`_, contributed by `@Ashton-Sidhu`_)
* Added ``es_verify_mapping_compatibility`` parameter to disable schema enforcement with ``pandas_to_eland`` (`#423`_, contributed by `@Ashton-Sidhu`_)
Changed
^^^^^^^
* Changed ``to_pandas()`` to only use Point-in-Time and ``search_after`` instead of using Scroll APIs
for pagination.
.. _@Fju: https://github.com/Fju
.. _@Ashton-Sidhu: https://github.com/Ashton-Sidhu
.. _#419: https://github.com/elastic/eland/pull/419
.. _#415: https://github.com/elastic/eland/pull/415
.. _#397: https://github.com/elastic/eland/pull/397
.. _#394: https://github.com/elastic/eland/pull/394
.. _#403: https://github.com/elastic/eland/pull/403
.. _#284: https://github.com/elastic/eland/pull/284
.. _#425: https://github.com/elastic/eland/pull/425
.. _#423: https://github.com/elastic/eland/pull/423
7.14.1b1 (2021-08-30)
---------------------
Added
^^^^^
* Added support for ``DataFrame.iterrows()`` and ``DataFrame.itertuples()`` (`#380`_, contributed by `@kxbin`_)
Performance
^^^^^^^^^^^
* Simplified result collectors to improve performance when transforming Elasticsearch results to pandas (`#378`_, contributed by `@V1NAY8`_)
* Changed search pagination function to yield batches of hits (`#379`_)
.. _@kxbin: https://github.com/kxbin
.. _#378: https://github.com/elastic/eland/pull/378
.. _#379: https://github.com/elastic/eland/pull/379
.. _#380: https://github.com/elastic/eland/pull/380
7.14.0b1 (2021-08-09)
---------------------
Added
^^^^^
* Added support for Pandas 1.3.x (`#362`_, contributed by `@V1NAY8`_)
* Added support for LightGBM 3.x (`#362`_, contributed by `@V1NAY8`_)
* Added ``DataFrame.idxmax()`` and ``DataFrame.idxmin()`` methods (`#353`_, contributed by `@V1NAY8`_)
* Added type hints to ``eland.ndframe`` and ``eland.operations`` (`#366`_, contributed by `@V1NAY8`_)
Removed
^^^^^^^
* Removed support for Pandas <1.2 (`#364`_)
* Removed support for Python 3.6 to match Pandas (`#364`_)
Changed
^^^^^^^
* Changed paginated search function to use `Point-in-Time`_ and `Search After`_ features
instead of Scroll when connected to Elasticsearch 7.12+ (`#370`_ and `#376`_, contributed by `@V1NAY8`_)
* Optimized the ``FieldMappings.aggregate_field_name()`` method (`#373`_, contributed by `@V1NAY8`_)
.. _Point-in-Time: https://www.elastic.co/guide/en/elasticsearch/reference/current/point-in-time-api.html
.. _Search After: https://www.elastic.co/guide/en/elasticsearch/reference/7.14/paginate-search-results.html#search-after
.. _#353: https://github.com/elastic/eland/pull/353
.. _#362: https://github.com/elastic/eland/pull/362
.. _#364: https://github.com/elastic/eland/pull/364
.. _#366: https://github.com/elastic/eland/pull/366
.. _#370: https://github.com/elastic/eland/pull/370
.. _#373: https://github.com/elastic/eland/pull/373
.. _#376: https://github.com/elastic/eland/pull/376
7.13.0b1 (2021-06-22)
---------------------
Added
^^^^^
* Added ``DataFrame.quantile()``, ``Series.quantile()``, and
``DataFrameGroupBy.quantile()`` aggregations (`#318`_ and `#356`_, contributed by `@V1NAY8`_)
Changed
^^^^^^^
* Changed the error raised when ``es_index_pattern`` doesn't point to any indices
to be more user-friendly (`#346`_)
Fixed
^^^^^
* Fixed a warning about conflicting field types when wildcards are used
in ``es_index_pattern`` (`#346`_)
* Fixed sorting when using ``DataFrame.groupby()`` with ``dropna``
(`#322`_, contributed by `@V1NAY8`_)
* Replaced deprecated usage of ``numpy.int`` with ``numpy.int_`` (`#354`_, contributed by `@V1NAY8`_)
.. _#318: https://github.com/elastic/eland/pull/318
.. _#322: https://github.com/elastic/eland/pull/322
.. _#346: https://github.com/elastic/eland/pull/346
.. _#354: https://github.com/elastic/eland/pull/354
.. _#356: https://github.com/elastic/eland/pull/356
7.10.1b1 (2021-01-12)
---------------------
Added
^^^^^
* Added support for Pandas 1.2.0 (`#336`_)
* Added ``DataFrame.mode()`` and ``Series.mode()`` aggregation (`#323`_, contributed by `@V1NAY8`_)
* Added support for ``pd.set_option("display.max_rows", None)``
(`#308`_, contributed by `@V1NAY8`_)
* Added Elasticsearch storage usage to ``df.info()`` (`#321`_, contributed by `@V1NAY8`_)
Removed
^^^^^^^
* Removed deprecated aliases ``read_es``, ``read_csv``, ``DataFrame.info_es``,
and ``MLModel(overwrite=True)`` (`#331`_, contributed by `@V1NAY8`_)
.. _#336: https://github.com/elastic/eland/pull/336
.. _#331: https://github.com/elastic/eland/pull/331
.. _#323: https://github.com/elastic/eland/pull/323
.. _#321: https://github.com/elastic/eland/pull/321
.. _#308: https://github.com/elastic/eland/pull/308
7.10.0b1 (2020-10-29)
---------------------
Added
^^^^^
* Added ``DataFrame.groupby()`` method with all aggregations
(`#278`_, `#291`_, `#292`_, `#300`_ contributed by `@V1NAY8`_)
* Added ``es_match()`` method to ``DataFrame`` and ``Series`` for
filtering rows with full-text search (`#301`_)
* Added support for type hints of the ``elasticsearch-py`` package (`#295`_)
* Added support for passing dictionaries to ``es_type_overrides`` parameter
in the ``pandas_to_eland()`` function to directly control the field mapping
generated in Elasticsearch (`#310`_)
* Added ``es_dtypes`` property to ``DataFrame`` and ``Series`` (`#285`_)
Changed
^^^^^^^
* Changed ``pandas_to_eland()`` to use the ``parallel_bulk()``
helper instead of single-threaded ``bulk()`` helper to improve
performance (`#279`_, contributed by `@V1NAY8`_)
* Changed the ``es_type_overrides`` parameter in ``pandas_to_eland()``
to raise ``ValueError`` if an unknown column is given (`#302`_)
* Changed ``DataFrame.filter()`` to preserve the order of items
(`#283`_, contributed by `@V1NAY8`_)
* Changed ``pandas_to_eland()`` so that setting ``es_type_overrides={"column": "text"}``
  automatically adds the ``column.keyword`` sub-field, making aggregations
  available for the field as well (`#310`_)
Fixed
^^^^^
* Fixed ``Series.__repr__`` when the series is empty (`#306`_)
.. _#278: https://github.com/elastic/eland/pull/278
.. _#279: https://github.com/elastic/eland/pull/279
.. _#283: https://github.com/elastic/eland/pull/283
.. _#285: https://github.com/elastic/eland/pull/285
.. _#291: https://github.com/elastic/eland/pull/291
.. _#292: https://github.com/elastic/eland/pull/292
.. _#295: https://github.com/elastic/eland/pull/295
.. _#300: https://github.com/elastic/eland/pull/300
.. _#301: https://github.com/elastic/eland/pull/301
.. _#302: https://github.com/elastic/eland/pull/302
.. _#306: https://github.com/elastic/eland/pull/306
.. _#310: https://github.com/elastic/eland/pull/310
7.9.1a1 (2020-09-29)
--------------------
Added
^^^^^
* Added the ``predict()`` method and ``model_type``,
``feature_names``, and ``results_field`` properties
to ``MLModel`` (`#266`_)
Deprecated
^^^^^^^^^^
* Deprecated ``ImportedMLModel`` in favor of
``MLModel.import_model(...)`` (`#266`_)
Changed
^^^^^^^
* Changed DataFrame aggregations to use ``numeric_only=None``
instead of ``numeric_only=True`` by default. This is the same
behavior as Pandas (`#270`_, contributed by `@V1NAY8`_)
Fixed
^^^^^
* Fixed ``DataFrame.agg()`` so that, when given a string instead of a list of
  aggregations, it returns a ``Series`` instead of a ``DataFrame``
  (`#263`_, contributed by `@V1NAY8`_)
.. _#263: https://github.com/elastic/eland/pull/263
.. _#266: https://github.com/elastic/eland/pull/266
.. _#270: https://github.com/elastic/eland/pull/270
7.9.0a1 (2020-08-18)
--------------------
Added
^^^^^
* Added support for Pandas v1.1 (`#253`_)
* Added support for LightGBM ``LGBMRegressor`` and ``LGBMClassifier`` to ``ImportedMLModel`` (`#247`_, `#252`_)
* Added support for ``multi:softmax`` and ``multi:softprob`` XGBoost operators to ``ImportedMLModel`` (`#246`_)
* Added column names to ``DataFrame.__dir__()`` for better auto-completion support (`#223`_, contributed by `@leonardbinet`_)
* Added support for ``es_if_exists='append'`` to ``pandas_to_eland()`` (`#217`_)
* Added support for aggregating datetimes with ``nunique`` and ``mean`` (`#253`_)
* Added ``es_compress_model_definition`` parameter to ``ImportedMLModel`` constructor (`#220`_)
* Added ``.size`` and ``.ndim`` properties to ``DataFrame`` and ``Series`` (`#231`_ and `#233`_)
* Added ``.dtype`` property to ``Series`` (`#258`_)
* Added support for using ``pandas.Series`` with ``Series.isin()`` (`#231`_)
* Added type hints to many APIs in ``DataFrame`` and ``Series`` (`#231`_)
Deprecated
^^^^^^^^^^
* Deprecated the ``overwrite`` parameter in favor of ``es_if_exists`` in ``ImportedMLModel`` constructor (`#249`_, contributed by `@V1NAY8`_)
Changed
^^^^^^^
* Changed aggregations for datetimes to be higher precision when available (`#253`_)
Fixed
^^^^^
* Fixed ``ImportedMLModel.predict()`` to fail when ``errors`` are present in the ``ingest.simulate`` response (`#220`_)
* Fixed ``Series.median()`` aggregation to return a scalar instead of ``pandas.Series`` (`#253`_)
* Fixed ``Series.describe()`` to return a ``pandas.Series`` instead of ``pandas.DataFrame`` (`#258`_)
* Fixed ``DataFrame.mean()`` and ``Series.mean()`` dtype (`#258`_)
* Fixed ``DataFrame.agg()`` aggregations when using ``extended_stats`` Elasticsearch aggregation (`#253`_)
.. _@leonardbinet: https://github.com/leonardbinet
.. _@V1NAY8: https://github.com/V1NAY8
.. _#217: https://github.com/elastic/eland/pull/217
.. _#220: https://github.com/elastic/eland/pull/220
.. _#223: https://github.com/elastic/eland/pull/223
.. _#231: https://github.com/elastic/eland/pull/231
.. _#233: https://github.com/elastic/eland/pull/233
.. _#246: https://github.com/elastic/eland/pull/246
.. _#247: https://github.com/elastic/eland/pull/247
.. _#249: https://github.com/elastic/eland/pull/249
.. _#252: https://github.com/elastic/eland/pull/252
.. _#253: https://github.com/elastic/eland/pull/253
.. _#258: https://github.com/elastic/eland/pull/258
7.7.0a1 (2020-05-20)
--------------------
Added
^^^^^
* Added the package to Conda Forge, install via
``conda install -c conda-forge eland`` (`#209`_)
* Added ``DataFrame.sample()`` and ``Series.sample()`` for querying
a random sample of data from the index (`#196`_, contributed by `@mesejo`_)
* Added ``Series.isna()`` and ``Series.notna()`` for filtering out
missing, ``NaN`` or null values from a column (`#210`_, contributed by `@mesejo`_)
* Added ``DataFrame.filter()`` and ``Series.filter()`` for reducing an axis
using a sequence of items or a pattern (`#212`_)
* Added ``DataFrame.to_pandas()`` and ``Series.to_pandas()`` for converting
an Eland dataframe or series into a Pandas dataframe or series inline (`#208`_)
* Added support for XGBoost v1.0.0 (`#200`_)
Deprecated
^^^^^^^^^^
* Deprecated ``info_es()`` in favor of ``es_info()`` (`#208`_)
* Deprecated ``eland.read_csv()`` in favor of ``eland.csv_to_eland()`` (`#208`_)
* Deprecated ``eland.read_es()`` in favor of ``eland.DataFrame()`` (`#208`_)
Changed
^^^^^^^
* Changed ``var`` and ``std`` aggregations to use sample instead of
population in line with Pandas (`#185`_)
* Changed painless scripts to use ``source`` rather than ``inline`` to improve
script caching performance (`#191`_, contributed by `@mesejo`_)
* Changed minimum ``elasticsearch`` Python library version to v7.7.0 (`#207`_)
* Changed name of ``Index.field_name`` to ``Index.es_field_name`` (`#208`_)
Fixed
^^^^^
* Fixed a ``DeprecationWarning`` raised from ``pandas.Series`` when
  an empty series was created without specifying ``dtype`` (`#188`_, contributed by `@mesejo`_)
* Fixed a bug when filtering columns on complex combinations of AND and OR conditions (`#204`_)
* Fixed an issue where ``DataFrame.shape`` would return a larger value than
in the index if a sized operation like ``.head(X)`` was applied to the data
frame (`#205`_, contributed by `@mesejo`_)
* Fixed issue where both ``scikit-learn`` and ``xgboost`` libraries were
required to use ``eland.ml.ImportedMLModel``, now only one library is
required to use this feature (`#206`_)
.. _#200: https://github.com/elastic/eland/pull/200
.. _#201: https://github.com/elastic/eland/pull/201
.. _#204: https://github.com/elastic/eland/pull/204
.. _#205: https://github.com/elastic/eland/pull/205
.. _#206: https://github.com/elastic/eland/pull/206
.. _#207: https://github.com/elastic/eland/pull/207
.. _#191: https://github.com/elastic/eland/pull/191
.. _#210: https://github.com/elastic/eland/pull/210
.. _#185: https://github.com/elastic/eland/pull/185
.. _#188: https://github.com/elastic/eland/pull/188
.. _#196: https://github.com/elastic/eland/pull/196
.. _#208: https://github.com/elastic/eland/pull/208
.. _#209: https://github.com/elastic/eland/pull/209
.. _#212: https://github.com/elastic/eland/pull/212
7.6.0a5 (2020-04-14)
--------------------
Added
^^^^^
* Added support for Pandas v1.0.0 (`#141`_, contributed by `@mesejo`_)
* Added ``use_pandas_index_for_es_ids`` parameter to ``pandas_to_eland()`` (`#154`_)
* Added ``es_type_overrides`` parameter to ``pandas_to_eland()`` (`#181`_)
* Added ``NDFrame.var()``, ``.std()`` and ``.median()`` aggregations (`#175`_, `#176`_, contributed by `@mesejo`_)
* Added ``DataFrame.es_query()`` to allow modifying ES queries directly (`#156`_)
* Added ``eland.__version__`` (`#153`_, contributed by `@mesejo`_)
Removed
^^^^^^^
* Removed support for Python 3.5 (`#150`_)
* Removed ``eland.Client()`` interface, use
``elasticsearch.Elasticsearch()`` client instead (`#166`_)
* Removed all private objects from top-level ``eland`` namespace (`#170`_)
* Removed ``geo_points`` from ``pandas_to_eland()`` in favor of ``es_type_overrides`` (`#181`_)
Changed
^^^^^^^
* Changed ML model serialization to be slightly smaller (`#159`_)
* Changed minimum ``elasticsearch`` Python library version to v7.6.0 (`#181`_)
Fixed
^^^^^
* Fixed ``inference_config`` being required on ML models for ES >=7.8 (`#174`_)
* Fixed unpacking for ``DataFrame.aggregate("median")`` (`#161`_)
.. _@mesejo: https://github.com/mesejo
.. _#141: https://github.com/elastic/eland/pull/141
.. _#150: https://github.com/elastic/eland/pull/150
.. _#153: https://github.com/elastic/eland/pull/153
.. _#154: https://github.com/elastic/eland/pull/154
.. _#156: https://github.com/elastic/eland/pull/156
.. _#159: https://github.com/elastic/eland/pull/159
.. _#161: https://github.com/elastic/eland/pull/161
.. _#166: https://github.com/elastic/eland/pull/166
.. _#170: https://github.com/elastic/eland/pull/170
.. _#174: https://github.com/elastic/eland/pull/174
.. _#175: https://github.com/elastic/eland/pull/175
.. _#176: https://github.com/elastic/eland/pull/176
.. _#181: https://github.com/elastic/eland/pull/181
7.6.0a4 (2020-03-23)
--------------------
Changed
^^^^^^^
* Changed requirement for ``xgboost`` from ``>=0.90`` to ``==0.90``
Fixed
^^^^^
* Fixed issue in ``DataFrame.info()`` when called on an empty frame (`#135`_)
* Fixed issues where many ``_source`` fields would generate
a ``too_long_frame`` error (`#135`_, `#137`_)
.. _#135: https://github.com/elastic/eland/pull/135
.. _#137: https://github.com/elastic/eland/pull/137


@ -1,5 +1,4 @@
# Contributing to eland

Eland is an open source project and we love to receive contributions
from our community: you! There are many ways to contribute, from
@ -7,8 +6,7 @@ writing tutorials or blog posts, improving the documentation, submitting
bug reports and feature requests or writing code which can be
incorporated into eland itself.

## Bug reports

If you think you have found a bug in eland, first make sure that you are
testing against the [latest version of
@ -29,8 +27,7 @@ lies with your query, when actually it depends on how your data is
indexed. The easier it is for us to recreate your problem, the faster it
is likely to be fixed.

## Feature requests

If you find yourself wishing for a feature that doesn't exist in eland,
you are probably not alone. There are bound to be others out there with
@ -40,8 +37,7 @@ list](https://github.com/elastic/eland/issues) on GitHub which describes
the feature you would like to see, why you need it, and how it should
work.

## Contributing code and documentation changes

If you have a bugfix or new feature that you would like to contribute to
eland, please find or open an issue about it first. Talk about what you
@ -66,7 +62,7 @@ individual projects can be found below.
You will need to fork the main eland code or documentation repository
and clone it to your local machine. See [GitHub help
page](https://docs.github.com/en/free-pro-team@latest/github/getting-started-with-github/fork-a-repo) for help.

Further instructions for specific projects are given below.
@ -74,58 +70,69 @@ Further instructions for specific projects are given below.
Once your changes and tests are ready to submit for review:

1. Run the linter and test suite to ensure your changes do not break the existing code:

   (TODO Add link to the testing document)

   ``` bash
   # Auto-format, lint, and run the mypy type checker on your changes
   $ nox -s format

   # Launch Elasticsearch with a trial licence and ML enabled
   $ docker run --name elasticsearch -p 9200:9200 -e "discovery.type=single-node" -e "xpack.security.enabled=false" -e "xpack.license.self_generated.type=trial" docker.elastic.co/elasticsearch/elasticsearch:9.0.0

   # See all test suites
   $ nox -l
   # Run a specific test suite
   $ nox -rs "test-3.12(pandas_version='2.2.3')"
   # Run a specific test
   $ nox -rs "test-3.12(pandas_version='2.2.3')" -- -k test_learning_to_rank
   ```

2. Sign the Contributor License Agreement

   Please make sure you have signed our [Contributor License Agreement](https://www.elastic.co/contributor-agreement/).
   We are not asking you to assign copyright to us, but to give us the right to distribute your code without restriction.
   We ask this of all contributors in order to assure our users of the origin and continuing existence of the code.
   You only need to sign the CLA once.

3. Rebase your changes

   Update your local repository with the most recent code from the main
   eland repository, and rebase your branch on top of the latest main
   branch. We prefer your initial changes to be squashed into a single
   commit. Later, if we ask you to make changes, add them as separate
   commits. This makes them easier to review. As a final step before
   merging we will either ask you to squash all commits yourself or
   we'll do it for you.

4. Submit a pull request

   Push your local changes to your forked copy of the repository and
   [submit a pull request](https://docs.github.com/en/free-pro-team@latest/github/collaborating-with-issues-and-pull-requests/proposing-changes-to-your-work-with-pull-requests).
   In the pull request, choose a title which sums up the changes that you
   have made, and in the body provide more details about what your
   changes do. Also mention the number of the issue where discussion
   has taken place, e.g. "Closes #123".
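The test commands in step 1 pass a session name that contains parentheses and quotes, which is easy to mistype. Below is a minimal sketch of a hypothetical helper (not part of eland's tooling) that assembles the same `nox -rs` invocation. The session-name format is copied from the examples above rather than read from eland's noxfile. It assumes, as the `-- -k test_learning_to_rank` example suggests, that arguments after `--` are passed on to pytest.

```python
import shlex
import subprocess


def nox_test_command(python="3.12", pandas_version="2.2.3", keyword=None):
    """Return the argv list for one parametrized eland test session.

    Hypothetical helper: the session-name format is copied from the
    CONTRIBUTING examples, not read from eland's noxfile.
    """
    session = f"test-{python}(pandas_version='{pandas_version}')"
    argv = ["nox", "-rs", session]
    if keyword:
        # Arguments after "--" become nox posargs; judging from the guide's
        # example, eland's test session forwards them to pytest.
        argv += ["--", "-k", keyword]
    return argv


def nox_test_shell(python="3.12", pandas_version="2.2.3", keyword=None):
    """Return the same command as a single, safely quoted shell string."""
    return shlex.join(nox_test_command(python, pandas_version, keyword))


def run_nox_tests(**kwargs):
    """Run the session directly; passing a list avoids shell quoting entirely."""
    return subprocess.run(nox_test_command(**kwargs), check=True)
```

For example, `run_nox_tests(keyword="test_learning_to_rank")` runs the same test as the last command in step 1 without any shell quoting. `nox_test_shell()` is useful when you want a string to paste into a terminal.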
Then sit back and wait. There will probably be discussion about the pull
request and, if any changes are needed, we would love to work with you
to get your pull request merged into `eland`.

Please adhere to the general guideline that you should never force push
to a publicly shared branch. Once you have opened your pull request, you
should consider your branch publicly shared. Instead of force pushing,
you can just add incremental commits; this is generally easier on your
reviewers. If you need to pick up changes from main, you can merge
main into your branch. A reviewer might ask you to rebase a
long-running pull request, in which case force pushing is okay for that
request. Note that you should also not squash at the end of the review
process; that can be done when the pull request is [integrated
via GitHub](https://github.com/blog/2141-squash-your-commits).
## Contributing to the eland codebase

**Repository:** <https://github.com/elastic/eland>
@ -136,27 +143,91 @@ currently using a minimum version of PyCharm 2019.2.4.
(All commands should be run from the module root.)

* Create a new project via 'Check out from Version
  Control'->'Git' on the "Welcome to PyCharm" page (or other)
* Enter the URL to your fork of eland
  (e.g. `git@github.com:stevedodson/eland.git`)
* Click 'Yes' for 'Checkout from Version Control'
* Configure the PyCharm environment:
  * In 'Preferences' configure a 'Project: eland'->'Project
    Interpreter'. Generally, we recommend creating a virtual
    environment (TODO link to installing for python version support).
  * In 'Preferences' set 'Tools'->'Python Integrated
    Tools'->'Default test runner' to `pytest`
  * In 'Preferences' set 'Tools'->'Python Integrated
    Tools'->'Docstring format' to `numpy`
* To install development requirements, open a terminal in the virtual environment and run
  ``` bash
  > pip install -r requirements-dev.txt
  ```
* Set up an Elasticsearch instance with Docker
  ``` bash
  > ELASTICSEARCH_VERSION=elasticsearch:8.17.0 BUILDKITE=false .buildkite/run-elasticsearch.sh
  ```
* Now check `http://localhost:9200`
* Install the local `eland` module (required to execute notebook tests)
  ``` bash
  > python setup.py install
  ```
* To set up the test environment, run
  ``` bash
  > python -m tests.setup_tests
  ```
  (Note that this modifies Elasticsearch indices.)
* To validate the installation, open a Python console and run
  ``` python
  >>> import eland as ed
  >>> ed_df = ed.DataFrame('http://localhost:9200', 'flights')
  ```
* To run the automatic formatter and check for lint issues, run
  ``` bash
  > nox -s format
  ```
* To test specific versions of Python, run
  ``` bash
  > nox -s test-3.12
  ```
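A scripted check can replace eyeballing `http://localhost:9200` before running `python -m tests.setup_tests`. Below is a minimal stdlib-only sketch. `elasticsearch_version` and `wait_for_elasticsearch` are hypothetical helpers, not part of eland. The sketch reads `version.number` from the JSON returned by the cluster's root endpoint. It assumes the cluster accepts unauthenticated requests; with security enabled, the root endpoint answers 401 and the helper returns `None`.

```python
import http.client
import json
import time
import urllib.request


def elasticsearch_version(url="http://localhost:9200", timeout=2.0):
    """Return the cluster version string, or None if the cluster is not usable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            info = json.load(resp)
    except (OSError, http.client.HTTPException, ValueError):
        # Connection refused, timeouts and HTTP errors such as 401 all surface
        # as OSError subclasses (URLError/HTTPError); ValueError covers a body
        # that is not valid JSON.
        return None
    version = info.get("version") if isinstance(info, dict) else None
    if isinstance(version, dict):
        return version.get("number")
    return None


def wait_for_elasticsearch(url="http://localhost:9200", deadline_s=60.0,
                           poll_s=1.0, timeout=2.0):
    """Poll the root endpoint until it answers or deadline_s seconds have passed."""
    end = time.monotonic() + deadline_s
    while True:
        version = elasticsearch_version(url, timeout=timeout)
        if version is not None or time.monotonic() >= end:
            return version
        time.sleep(poll_s)
```

After starting the container, `wait_for_elasticsearch()` blocks until the node responds and returns its version string, or returns `None` after a minute. Polling is used because a freshly started node can take a while before it accepts requests.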
### Documentation

* [Install pandoc on your system](https://pandoc.org/installing.html). For Ubuntu or Debian you can run
  ``` bash
  > sudo apt-get install -y pandoc
  ```
* Install the documentation requirements. Open a terminal in the virtual environment and run
  ``` bash
  > pip install -r docs/requirements-docs.txt
  ```
* To verify/generate the documentation, run
  ``` bash
  > nox -s docs
  ```
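A docs build fails late if a system tool is missing, so a quick preflight can save a round trip. Below is a minimal sketch that uses `shutil.which` to report which executables from the steps above are absent from `PATH`. `missing_tools` and `docs_preflight` are hypothetical helpers, not part of eland's tooling. The tool list (`pandoc`, `nox`) is an assumption based on the instructions above.

```python
import shutil


def missing_tools(tools=("pandoc", "nox")):
    """Return the names in `tools` that shutil.which cannot find on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def docs_preflight(tools=("pandoc", "nox")):
    """Return a short status message saying whether the docs build can start."""
    missing = missing_tools(tools)
    if missing:
        return "missing: " + ", ".join(missing)
    return "ok: run `nox -s docs`"
```

Calling `print(docs_preflight())` from the virtual environment shows either the missing tools or a go-ahead message.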

Dockerfile

@ -0,0 +1,28 @@
# syntax=docker/dockerfile:1
FROM python:3.10-slim
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
--mount=type=cache,target=/var/lib/apt,sharing=locked \
apt-get update && apt-get install -y \
build-essential \
pkg-config \
cmake \
libzip-dev \
libjpeg-dev
ADD . /eland
WORKDIR /eland
ARG TARGETPLATFORM
RUN --mount=type=cache,target=/root/.cache/pip \
if [ "$TARGETPLATFORM" = "linux/amd64" ]; then \
python3 -m pip install \
--no-cache-dir --disable-pip-version-check --extra-index-url https://download.pytorch.org/whl/cpu \
torch==2.5.1+cpu .[all]; \
else \
python3 -m pip install \
--no-cache-dir --disable-pip-version-check \
.[all]; \
fi
CMD ["/bin/sh"]

Dockerfile.wolfi (new file)

@ -0,0 +1,42 @@
# syntax=docker/dockerfile:1
FROM docker.elastic.co/wolfi/python:3.10-dev AS builder
WORKDIR /eland
ENV VIRTUAL_ENV=/eland/venv
RUN python3 -m venv $VIRTUAL_ENV
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
ADD . /eland
ARG TARGETPLATFORM
RUN --mount=type=cache,target=/root/.cache/pip \
if [ "$TARGETPLATFORM" = "linux/amd64" ]; then \
python3 -m pip install \
--no-cache-dir --disable-pip-version-check --extra-index-url https://download.pytorch.org/whl/cpu \
torch==2.5.1+cpu .[all]; \
else \
python3 -m pip install \
--no-cache-dir --disable-pip-version-check \
.[all]; \
fi
FROM docker.elastic.co/wolfi/python:3.10
WORKDIR /eland
ENV VIRTUAL_ENV=/eland/venv
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
COPY --from=builder /eland /eland
# The eland_import_hub_model script is intended to be executed by a shell,
# which will see its shebang line and then execute it with the Python
# interpreter of the virtual environment. We want to keep this behavior even
# with Wolfi so that users can use the image as before. To do that, we use two
# tricks:
#
# * copy /bin/sh (that is, busybox's ash) from the builder image
# * revert to Docker's default entrypoint, which is the only way to pass
# parameters to `eland_import_hub_model` without needing quotes.
#
COPY --from=builder /bin/sh /bin/sh
ENTRYPOINT []
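The shebang behaviour the comment above relies on can be illustrated with a short Python sketch. On Linux, when a script is executed by path, the text after `#!` names the interpreter plus at most one optional argument, and the script path and the user's arguments are appended after it. The shebang `#!/eland/venv/bin/python3` and the script path used below are assumptions about what pip writes into the virtual environment, not values read from the image; the function names are hypothetical.

```python
def parse_shebang(first_line: str):
    """Return (interpreter, optional_arg) for a '#!' line, or None.

    Mirrors the Linux rule: the interpreter path ends at the first
    whitespace, and everything after it is passed as ONE optional argument.
    """
    if not first_line.startswith("#!"):
        return None
    rest = first_line[2:].strip()
    if not rest:
        return None
    pieces = rest.split(None, 1)
    interpreter = pieces[0]
    optional_arg = pieces[1] if len(pieces) > 1 else None
    return interpreter, optional_arg


def exec_argv(script_path: str, first_line: str, user_args: list) -> list:
    """Build the argv that ends up being executed for `script_path user_args...`."""
    parsed = parse_shebang(first_line)
    if parsed is None:
        raise ValueError(f"{script_path} has no shebang line")
    interpreter, optional_arg = parsed
    argv = [interpreter]
    if optional_arg is not None:
        argv.append(optional_arg)
    return argv + [script_path] + list(user_args)


if __name__ == "__main__":
    # Assumed paths: pip writes the venv interpreter into console-script shebangs
    print(exec_argv(
        "/eland/venv/bin/eland_import_hub_model",
        "#!/eland/venv/bin/python3",
        ["--task-type", "ner"],
    ))
```

Because the interpreter comes from the script itself, no explicit `python` prefix (and no quoting of the arguments) is needed when the image runs `eland_import_hub_model` directly.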


@ -1,13 +1,201 @@
Copyright 2019 Elasticsearch BV
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.


@ -1 +1,3 @@
include LICENSE.txt
include README.md
include eland/py.typed


@ -50,3 +50,6 @@ Permission is hereby granted, free of charge, to any person obtaining a copy of
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
--
This product contains an adapted version of the "us-national-parks" dataset, https://data.world/kevinnayar/us-national-parks, by Kevin Nayar, https://data.world/kevinnayar, is licensed under CC BY, https://creativecommons.org/licenses/by/4.0/legalcode

README.md

@ -1,150 +1,280 @@
_Note, this project is still very much a work in progress and in an alpha state; input and contributions welcome!_
<div align="center">
<a href="https://github.com/elastic/eland">
<img src="https://raw.githubusercontent.com/elastic/eland/main/docs/sphinx/logo/eland.png" width="30%"
alt="Eland" />
</a>
</div>
<br />
<div align="center">
<a href="https://pypi.org/project/eland"><img src="https://img.shields.io/pypi/v/eland.svg" alt="PyPI Version"></a>
<a href="https://anaconda.org/conda-forge/eland"><img src="https://img.shields.io/conda/vn/conda-forge/eland"
alt="Conda Version"></a>
<a href="https://pepy.tech/project/eland"><img src="https://static.pepy.tech/badge/eland" alt="Downloads"></a>
<a href="https://pypi.org/project/eland"><img src="https://img.shields.io/pypi/status/eland.svg"
alt="Package Status"></a>
<a href="https://buildkite.com/elastic/eland"><img src="https://badge.buildkite.com/d92340e800bc06a7c7c02a71b8d42fcb958bd18c25f99fe2d9.svg" alt="Build Status"></a>
<a href="https://github.com/elastic/eland/blob/main/LICENSE.txt"><img src="https://img.shields.io/pypi/l/eland.svg"
alt="License"></a>
<a href="https://eland.readthedocs.io"><img
src="https://readthedocs.org/projects/eland/badge/?version=latest" alt="Documentation Status"></a>
</div>
# eland: pandas-like Python client for analysis of Elasticsearch data
<table>
<tr>
<td>Latest Release</td>
<td>
<a href="https://pypi.org/project/eland/">
<img src="https://img.shields.io/pypi/v/eland.svg" alt="latest release" />
</a>
</td>
</tr>
<tr>
<td>Package Status</td>
<td>
<a href="https://pypi.org/project/eland/">
<img src="https://img.shields.io/pypi/status/eland.svg" alt="status" />
</a>
</td>
</tr>
<tr>
<td>License</td>
<td>
<a href="https://github.com/elastic/eland/LICENSE.txt">
<img src="https://img.shields.io/pypi/l/eland.svg" alt="license" />
</a>
</td>
</tr>
<tr>
<td>Build Status</td>
<td>
<a href="https://clients-ci.elastic.co/job/elastic+eland+master/">
<img src="https://clients-ci.elastic.co/buildStatus/icon?job=elastic%2Beland%2Bmaster" alt="Build Status" />
</a>
</td>
</tr>
</table>
# What is it?
eland is an Elasticsearch client Python package to analyse, explore and manipulate data that resides in Elasticsearch.
Where possible the package uses existing Python APIs and data structures to make it easy to switch between numpy,
pandas, scikit-learn to their Elasticsearch powered equivalents. In general, the data resides in Elasticsearch and
not in memory, which allows eland to access large datasets stored in Elasticsearch.
For example, to explore data in a large Elasticsearch index, simply create an eland DataFrame from an Elasticsearch
index pattern, and explore using an API that mirrors a subset of the pandas.DataFrame API:
## About
Eland is a Python Elasticsearch client for exploring and analyzing data in Elasticsearch with a familiar
Pandas-compatible API.
Where possible the package uses existing Python APIs and data structures to make it easy to switch between numpy,
pandas, or scikit-learn to their Elasticsearch powered equivalents. In general, the data resides in Elasticsearch and
not in memory, which allows Eland to access large datasets stored in Elasticsearch.
Eland also provides tools to upload trained machine learning models from common libraries like
[scikit-learn](https://scikit-learn.org), [XGBoost](https://xgboost.readthedocs.io), and
[LightGBM](https://lightgbm.readthedocs.io) into Elasticsearch.
## Getting Started
Eland can be installed from [PyPI](https://pypi.org/project/eland) with Pip:
```bash
$ python -m pip install eland
``` ```
If using Eland to upload NLP models to Elasticsearch, install the PyTorch extras:
```bash
$ python -m pip install 'eland[pytorch]'
```
Eland can also be installed from [Conda Forge](https://anaconda.org/conda-forge/eland) with Conda:
```bash
$ conda install -c conda-forge eland
```
### Compatibility
- Supports Python 3.9, 3.10, 3.11 and 3.12.
- Supports Pandas 1.5 and 2.
- Supports Elasticsearch 8+ clusters, recommended 8.16 or later for all features to work.
If you are using the NLP with PyTorch feature make sure your Eland minor version matches the minor
version of your Elasticsearch cluster. For all other features it is sufficient for the major versions
to match.
- You need to install the appropriate version of PyTorch to import an NLP model. Run `python -m pip
install 'eland[pytorch]'` to install that version.
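The version rule above can be expressed as a small check. This is a sketch under the assumption that version strings look like `MAJOR.MINOR.PATCH`, optionally followed by a suffix such as `-SNAPSHOT`; the function names are hypothetical and not part of Eland's API, and the check does not cover the separate "8.16 or later" recommendation.

```python
def parse_version(version: str) -> tuple[int, int]:
    """Parse 'MAJOR.MINOR[.PATCH][-suffix]' into (major, minor)."""
    core = version.split("-", 1)[0]          # drop suffixes like '-SNAPSHOT'
    major, minor = core.split(".")[:2]
    return int(major), int(minor)


def versions_compatible(eland_version: str, es_version: str, uses_pytorch: bool) -> bool:
    """Minor versions must match for the NLP/PyTorch feature;
    for every other feature matching major versions is enough."""
    eland_major, eland_minor = parse_version(eland_version)
    es_major, es_minor = parse_version(es_version)
    if eland_major != es_major:
        return False
    return eland_minor == es_minor if uses_pytorch else True
```

For example, an Eland 8.15 client would be fine for DataFrame work against an 8.16 cluster, but not for importing a PyTorch model into it.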
### Prerequisites
Users installing Eland on Debian-based distributions may need to install prerequisite packages for the transitive
dependencies of Eland:
```bash
$ sudo apt-get install -y \
build-essential pkg-config cmake \
python3-dev libzip-dev libjpeg-dev
```
Note that other distributions such as CentOS, RedHat, Arch, etc. may require using a different package manager and
specifying different package names.
### Docker
If you want to run the available scripts without installing Eland, use the Docker
image.
It can be used interactively:
```bash
$ docker run -it --rm --network host docker.elastic.co/eland/eland
```
Running installed scripts is also possible without an interactive shell, e.g.:
```bash
$ docker run -it --rm --network host \
docker.elastic.co/eland/eland \
eland_import_hub_model \
--url http://host.docker.internal:9200/ \
--hub-model-id elastic/distilbert-base-cased-finetuned-conll03-english \
--task-type ner
```
### Connecting to Elasticsearch
Eland uses the [Elasticsearch low level client](https://elasticsearch-py.readthedocs.io) to connect to Elasticsearch.
This client supports a range of [connection options and authentication options](https://elasticsearch-py.readthedocs.io/en/stable/api.html#elasticsearch).
You can pass either an instance of `elasticsearch.Elasticsearch` to Eland APIs
or a string containing the host to connect to:
```python
import eland as ed
# Connecting to an Elasticsearch instance running on 'http://localhost:9200'
df = ed.DataFrame("http://localhost:9200", es_index_pattern="flights")
# Connecting to an Elastic Cloud instance
from elasticsearch import Elasticsearch
es = Elasticsearch(
cloud_id="cluster-name:...",
basic_auth=("elastic", "<password>")
)
df = ed.DataFrame(es, es_index_pattern="flights")
```
## DataFrames in Eland
`eland.DataFrame` wraps an Elasticsearch index in a Pandas-like API
and defers all processing and filtering of data to Elasticsearch
instead of your local machine. This means you can process large
amounts of data within Elasticsearch from a Jupyter Notebook
without overloading your machine.
➤ [Eland DataFrame API documentation](https://eland.readthedocs.io/en/latest/reference/dataframe.html)
➤ [Advanced examples in a Jupyter Notebook](https://eland.readthedocs.io/en/latest/examples/demo_notebook.html)
```python
>>> import eland as ed
>>> df = ed.read_es('http://localhost:9200', 'reviews')
>>> df.head()
reviewerId vendorId rating date
0 0 0 5 2006-04-07 17:08
1 1 1 5 2006-05-04 12:16
2 2 2 4 2006-04-21 12:26
3 3 3 5 2006-04-18 15:48
4 3 4 5 2006-04-18 15:49
>>> df.describe()
reviewerId vendorId rating
count 578805.000000 578805.000000 578805.000000
mean 174124.098437 60.645267 4.679671
std 116951.972209 54.488053 0.800891
min 0.000000 0.000000 0.000000
25% 70043.000000 20.000000 5.000000
50% 161052.000000 44.000000 5.000000
75% 272697.000000 83.000000 5.000000
max 400140.000000 246.000000 5.000000
>>> import eland as ed
>>> # Connect to 'flights' index via localhost Elasticsearch node
>>> df = ed.DataFrame('http://localhost:9200', 'flights')
# eland.DataFrame instance has the same API as pandas.DataFrame
# except all data is in Elasticsearch. See .info() memory usage.
>>> df.head()
AvgTicketPrice Cancelled ... dayOfWeek timestamp
0 841.265642 False ... 0 2018-01-01 00:00:00
1 882.982662 False ... 0 2018-01-01 18:27:00
2 190.636904 False ... 0 2018-01-01 17:11:14
3 181.694216 True ... 0 2018-01-01 10:33:28
4 730.041778 False ... 0 2018-01-01 05:13:00
[5 rows x 27 columns]
>>> df.info()
<class 'eland.dataframe.DataFrame'>
Index: 13059 entries, 0 to 13058
Data columns (total 27 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 AvgTicketPrice 13059 non-null float64
1 Cancelled 13059 non-null bool
2 Carrier 13059 non-null object
...
24 OriginWeather 13059 non-null object
25 dayOfWeek 13059 non-null int64
26 timestamp 13059 non-null datetime64[ns]
dtypes: bool(2), datetime64[ns](1), float64(5), int64(2), object(17)
memory usage: 80.0 bytes
Elasticsearch storage usage: 5.043 MB
# Filtering of rows using comparisons
>>> df[(df.Carrier=="Kibana Airlines") & (df.AvgTicketPrice > 900.0) & (df.Cancelled == True)].head()
AvgTicketPrice Cancelled ... dayOfWeek timestamp
8 960.869736 True ... 0 2018-01-01 12:09:35
26 975.812632 True ... 0 2018-01-01 15:38:32
311 946.358410 True ... 0 2018-01-01 11:51:12
651 975.383864 True ... 2 2018-01-03 21:13:17
950 907.836523 True ... 2 2018-01-03 05:14:51
[5 rows x 27 columns]
# Running aggregations across an index
>>> df[['DistanceKilometers', 'AvgTicketPrice']].aggregate(['sum', 'min', 'std'])
DistanceKilometers AvgTicketPrice
sum 9.261629e+07 8.204365e+06
min 0.000000e+00 1.000205e+02
std 4.578263e+03 2.663867e+02
``` ```
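Because the filtering and aggregation in the flights example run inside Elasticsearch, each pandas-style expression has to become part of a search request. The sketch below hand-writes roughly equivalent request bodies: a `bool` filter for the row selection followed by `.head()`, and `extended_stats` aggregations for `sum`, `min` and `std`. It is illustrative only; the helper names are hypothetical, and the request Eland actually generates (and any post-processing it applies to the results) may differ. `df.es_info()` can be used to inspect Eland's own internal query.

```python
import json


def filter_query(carrier: str, min_price: float, cancelled: bool) -> dict:
    """Bool query roughly equivalent to
    (df.Carrier == carrier) & (df.AvgTicketPrice > min_price) & (df.Cancelled == cancelled)
    """
    return {
        "bool": {
            # "filter" clauses match without scoring, much like a boolean mask
            "filter": [
                {"term": {"Carrier": carrier}},
                {"range": {"AvgTicketPrice": {"gt": min_price}}},
                {"term": {"Cancelled": cancelled}},
            ]
        }
    }


def head_request(query: dict, n: int = 5) -> dict:
    """Fetch only the first n matching documents, like .head(n)."""
    return {"size": n, "query": query}


def aggregate_request(fields: list) -> dict:
    """One extended_stats aggregation per column over the whole index.

    Each result includes sum, min and std_deviation among other statistics;
    size=0 asks for aggregation results only, no documents.
    """
    aggs = {f"{field}_stats": {"extended_stats": {"field": field}} for field in fields}
    return {"size": 0, "aggs": aggs}


if __name__ == "__main__":
    print(json.dumps(head_request(filter_query("Kibana Airlines", 900.0, True)), indent=2))
    print(json.dumps(aggregate_request(["DistanceKilometers", "AvgTicketPrice"]), indent=2))
```

Only the matching documents (or just the aggregate numbers) travel back to Python, which is why the examples work on indices far larger than local memory.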
See [docs](https://eland.readthedocs.io/en/latest) and [demo_notebook.ipynb](https://eland.readthedocs.io/en/latest/examples/demo_notebook.html) for more examples.
## Where to get it
The source code is currently hosted on GitHub at:
https://github.com/elastic/eland
Binary installers for the latest released version are available at the [Python
package index](https://pypi.org/project/eland).
```sh
pip install eland
```
## Machine Learning in Eland
### Regression and classification
Eland allows transforming trained regression and classification models from scikit-learn, XGBoost, and LightGBM
libraries to be serialized and used as an inference model in Elasticsearch.
➤ [Eland Machine Learning API documentation](https://eland.readthedocs.io/en/latest/reference/ml.html)
➤ [Read more about Machine Learning in Elasticsearch](https://www.elastic.co/guide/en/machine-learning/current/ml-getting-started.html)
```python
>>> from sklearn import datasets
>>> from xgboost import XGBClassifier
>>> from eland.ml import MLModel
# Train and exercise an XGBoost ML model locally
>>> training_data = datasets.make_classification(n_features=5)
>>> xgb_model = XGBClassifier(booster="gbtree")
>>> xgb_model.fit(training_data[0], training_data[1])
>>> xgb_model.predict(training_data[0])
[0 1 1 0 1 0 0 0 1 0]
# Import the model into Elasticsearch
>>> es_model = MLModel.import_model(
es_client="http://localhost:9200",
model_id="xgb-classifier",
model=xgb_model,
feature_names=["f0", "f1", "f2", "f3", "f4"],
)
# Exercise the ML model in Elasticsearch with the training data
>>> es_model.predict(training_data[0])
[0 1 1 0 1 0 0 0 1 0]
``` ```
## Development Setup
1. Create a virtual environment in Python
For example,
```
python3 -m venv env
```
2. Activate the virtual environment
### NLP with PyTorch
For NLP tasks, Eland allows importing PyTorch trained BERT models into Elasticsearch. Models can be either plain PyTorch
models, or supported [transformers](https://huggingface.co/transformers) models from the
[Hugging Face model hub](https://huggingface.co/models).
```bash
$ eland_import_hub_model \
--url http://localhost:9200/ \
--hub-model-id elastic/distilbert-base-cased-finetuned-conll03-english \
--task-type ner \
--start
```
The example above will automatically start a model deployment. This is a
good shortcut for initial experimentation, but for anything that needs
good throughput you should omit the `--start` argument from the Eland
command line and instead start the model using the ML UI in Kibana.
The `--start` argument will deploy the model with one allocation and one
thread per allocation, which will not offer good performance. When starting
the model deployment using the ML UI in Kibana or the Elasticsearch
[API](https://www.elastic.co/guide/en/elasticsearch/reference/current/start-trained-model-deployment.html)
you will be able to set the threading options to make the best use of your
hardware.
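The difference between `--start` and a manually configured deployment comes down to two numbers. The following is a minimal sketch, assuming the Elasticsearch rule that `threads_per_allocation` must be a power of two no larger than 32 (worth checking against the API reference for your version); the helper names are hypothetical.

```python
def deployment_options(number_of_allocations: int = 1, threads_per_allocation: int = 1) -> dict:
    """Validate and return deployment sizing as keyword arguments.

    The defaults match what `--start` is described as doing:
    one allocation with one thread per allocation.
    """
    if number_of_allocations < 1:
        raise ValueError("number_of_allocations must be at least 1")
    # Assumed Elasticsearch constraint: a power of two, at most 32
    is_power_of_two = threads_per_allocation > 0 and (
        threads_per_allocation & (threads_per_allocation - 1)
    ) == 0
    if not is_power_of_two or threads_per_allocation > 32:
        raise ValueError("threads_per_allocation must be a power of two between 1 and 32")
    return {
        "number_of_allocations": number_of_allocations,
        "threads_per_allocation": threads_per_allocation,
    }


def total_threads(options: dict) -> int:
    """CPU threads the deployment asks for across all allocations."""
    return options["number_of_allocations"] * options["threads_per_allocation"]


if __name__ == "__main__":
    print(total_threads(deployment_options()))      # the --start sizing
    print(total_threads(deployment_options(2, 4)))  # a larger deployment
```

Roughly speaking, more allocations raise throughput for concurrent requests while more threads per allocation lower the latency of each request. The returned dict could be passed as keyword arguments, together with the model ID, to `start_trained_model_deployment` in the `ml` namespace of the elasticsearch-py client.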
```python
>>> import elasticsearch
>>> from pathlib import Path
>>> from eland.common import es_version
>>> from eland.ml.pytorch import PyTorchModel
>>> from eland.ml.pytorch.transformers import TransformerModel
>>> es = elasticsearch.Elasticsearch("http://elastic:mlqa_admin@localhost:9200")
>>> es_cluster_version = es_version(es)
# Load a Hugging Face transformers model directly from the model hub
>>> tm = TransformerModel(model_id="elastic/distilbert-base-cased-finetuned-conll03-english", task_type="ner", es_version=es_cluster_version)
Downloading: 100%|██████████| 257/257 [00:00<00:00, 108kB/s]
Downloading: 100%|██████████| 954/954 [00:00<00:00, 372kB/s]
Downloading: 100%|██████████| 208k/208k [00:00<00:00, 668kB/s]
Downloading: 100%|██████████| 112/112 [00:00<00:00, 43.9kB/s]
Downloading: 100%|██████████| 249M/249M [00:23<00:00, 11.2MB/s]
# Export the model in a TorchScript representation which Elasticsearch uses
>>> tmp_path = "models"
>>> Path(tmp_path).mkdir(parents=True, exist_ok=True)
>>> model_path, config, vocab_path = tm.save(tmp_path)
# Import model into Elasticsearch
>>> ptm = PyTorchModel(es, tm.elasticsearch_model_id())
>>> ptm.import_model(model_path=model_path, config_path=None, vocab_path=vocab_path, config=config)
100%|██████████| 63/63 [00:12<00:00, 5.02it/s]
``` ```
source env/bin/activate
```
3. Install dependencies from the `requirements.txt` file
```
pip install -r requirements.txt
```
## Versions and Compatibility
### Python Version Support
Officially Python 3.5.3 and above, 3.6, 3.7, and 3.8.
eland depends on pandas version 0.25.3.
#### Elasticsearch Versions
eland is versioned like the Elastic stack (eland 7.5.1 is compatible with Elasticsearch 7.x up to 7.5.1)
A major version of the client is compatible with the same major version of Elasticsearch.
No compatibility assurances are given between different major versions of the client and Elasticsearch.
Major differences likely exist between major versions of Elasticsearch,
particularly around request and response object formats, but also around API urls and behaviour.
## Connecting to Elasticsearch Cloud
```
>>> import eland as ed
>>> from elasticsearch import Elasticsearch
>>> es = Elasticsearch(cloud_id="<cloud_id>", http_auth=('<user>','<password>'))
>>> es.info()
{'name': 'instance-0000000000', 'cluster_name': 'bf900cfce5684a81bca0be0cce5913bc', 'cluster_uuid': 'xLPvrV3jQNeadA7oM4l1jA', 'version': {'number': '7.4.2', 'build_flavor': 'default', 'build_type': 'tar', 'build_hash': '2f90bbf7b93631e52bafb59b3b049cb44ec25e96', 'build_date': '2019-10-28T20:40:44.881551Z', 'build_snapshot': False, 'lucene_version': '8.2.0', 'minimum_wire_compatibility_version': '6.8.0', 'minimum_index_compatibility_version': '6.0.0-beta1'}, 'tagline': 'You Know, for Search'}
>>> df = ed.read_es(es, 'reviews')
```
## Why eland?
Naming is difficult, but as we had to call it something:
* eland: elastic and data
* eland: 'Elk/Moose' in Dutch (Alces alces)
* [Elandsgracht](https://goo.gl/maps/3hGBMqeGRcsBJfKx8): Amsterdam street near Elastic's Amsterdam office
[Pronunciation](https://commons.wikimedia.org/wiki/File:Nl-eland.ogg): /ˈeːlɑnt/

catalog-info.yaml (new file)

@ -0,0 +1,94 @@
# Declare a Backstage Component that represents the Eland application.
---
# yaml-language-server: $schema=https://json.schemastore.org/catalog-info.json
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
name: eland
description: Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
annotations:
backstage.io/source-location: url:https://github.com/elastic/eland/
github.com/project-slug: elastic/eland
github.com/team-slug: elastic/ml-core
buildkite.com/project-slug: elastic/eland
tags:
- elasticsearch
- python
- machine-learning
- big-data
- etl
links:
- title: Eland docs
url: https://eland.readthedocs.io/
spec:
type: application
owner: group:ml-core
lifecycle: production
dependsOn:
- resource:eland-pipeline
- resource:eland-release-docker-pipeline
# yaml-language-server: $schema=https://gist.githubusercontent.com/elasticmachine/988b80dae436cafea07d9a4a460a011d/raw/e57ee3bed7a6f73077a3f55a38e76e40ec87a7cf/rre.schema.json
---
apiVersion: backstage.io/v1alpha1
kind: Resource
metadata:
name: eland-pipeline
description: Run Eland tests
links:
- title: Pipeline
url: https://buildkite.com/elastic/eland
spec:
type: buildkite-pipeline
owner: group:ml-core
system: buildkite
implementation:
apiVersion: buildkite.elastic.dev/v1
kind: Pipeline
metadata:
name: Eland
description: Eland Python
spec:
pipeline_file: .buildkite/pipeline.yml
repository: elastic/eland
teams:
ml-core: {}
devtools-team: {}
es-docs: {}
everyone:
access_level: READ_ONLY
# yaml-language-server: $schema=https://gist.githubusercontent.com/elasticmachine/988b80dae436cafea07d9a4a460a011d/raw/e57ee3bed7a6f73077a3f55a38e76e40ec87a7cf/rre.schema.json
---
apiVersion: backstage.io/v1alpha1
kind: Resource
metadata:
name: eland-release-docker-pipeline
description: Release Docker Artifacts for Eland
links:
- title: Pipeline
url: https://buildkite.com/elastic/eland-release-docker
spec:
type: buildkite-pipeline
owner: group:ml-core
system: buildkite
implementation:
apiVersion: buildkite.elastic.dev/v1
kind: Pipeline
metadata:
name: Eland - Release Docker
description: Release Docker Artifacts for Eland
spec:
pipeline_file: .buildkite/release-docker/pipeline.yml
provider_settings:
trigger_mode: none
repository: elastic/eland
teams:
ml-core: {}
devtools-team: {}
everyone:
access_level: READ_ONLY


@ -5,7 +5,7 @@
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = source
SOURCEDIR = sphinx
BUILDDIR = build
# Put it first so that "make" without argument is like "make help".

docs/docset.yml (new file)

@ -0,0 +1,8 @@
project: 'Eland Python client'
cross_links:
- docs-content
toc:
- toc: reference
subs:
  es: "Elasticsearch"
  ml: "machine learning"

@ -7,7 +7,7 @@ REM Command file for Sphinx documentation
 if "%SPHINXBUILD%" == "" (
     set SPHINXBUILD=sphinx-build
 )
-set SOURCEDIR=source
+set SOURCEDIR=sphinx
 set BUILDDIR=build

 if "%1" == "" goto help

@ -0,0 +1,63 @@
---
mapped_pages:
- https://www.elastic.co/guide/en/elasticsearch/client/eland/current/dataframes.html
---
# Data Frames [dataframes]
`eland.DataFrame` wraps an Elasticsearch index in a Pandas-like API and defers all processing and filtering of data to Elasticsearch instead of your local machine. This means you can process large amounts of data within Elasticsearch from a Jupyter Notebook without overloading your machine.
```python
>>> import eland as ed
>>>
# Connect to 'flights' index via localhost Elasticsearch node
>>> df = ed.DataFrame('http://localhost:9200', 'flights')
# eland.DataFrame instance has the same API as pandas.DataFrame
# except all data is in Elasticsearch. See .info() memory usage.
>>> df.head()
AvgTicketPrice Cancelled ... dayOfWeek timestamp
0 841.265642 False ... 0 2018-01-01 00:00:00
1 882.982662 False ... 0 2018-01-01 18:27:00
2 190.636904 False ... 0 2018-01-01 17:11:14
3 181.694216 True ... 0 2018-01-01 10:33:28
4 730.041778 False ... 0 2018-01-01 05:13:00
[5 rows x 27 columns]
>>> df.info()
<class 'eland.dataframe.DataFrame'>
Index: 13059 entries, 0 to 13058
Data columns (total 27 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 AvgTicketPrice 13059 non-null float64
1 Cancelled 13059 non-null bool
2 Carrier 13059 non-null object
...
24 OriginWeather 13059 non-null object
25 dayOfWeek 13059 non-null int64
26 timestamp 13059 non-null datetime64[ns]
dtypes: bool(2), datetime64[ns](1), float64(5), int64(2), object(17)
memory usage: 80.0 bytes
Elasticsearch storage usage: 5.043 MB
# Filtering of rows using comparisons
>>> df[(df.Carrier=="Kibana Airlines") & (df.AvgTicketPrice > 900.0) & (df.Cancelled == True)].head()
AvgTicketPrice Cancelled ... dayOfWeek timestamp
8 960.869736 True ... 0 2018-01-01 12:09:35
26 975.812632 True ... 0 2018-01-01 15:38:32
311 946.358410 True ... 0 2018-01-01 11:51:12
651 975.383864 True ... 2 2018-01-03 21:13:17
950 907.836523 True ... 2 2018-01-03 05:14:51
[5 rows x 27 columns]
# Running aggregations across an index
>>> df[['DistanceKilometers', 'AvgTicketPrice']].aggregate(['sum', 'min', 'std'])
DistanceKilometers AvgTicketPrice
sum 9.261629e+07 8.204365e+06
min 0.000000e+00 1.000205e+02
std 4.578263e+03 2.663867e+02
```

docs/reference/index.md

@ -0,0 +1,90 @@
---
mapped_pages:
- https://www.elastic.co/guide/en/elasticsearch/client/eland/current/index.html
- https://www.elastic.co/guide/en/elasticsearch/client/eland/current/overview.html
navigation_title: Eland
---
# Eland Python client [overview]
Eland is a Python client and toolkit for DataFrames and {{ml}} in {{es}}. Full documentation is available on [Read the Docs](https://eland.readthedocs.io). Source code is available on [GitHub](https://github.com/elastic/eland).
## Compatibility [_compatibility]
* Supports Python 3.9+ and Pandas 1.5
* Supports {{es}} 8+ clusters; 8.16 or later is recommended for all features to work. Make sure your Eland major version matches the major version of your {{es}} cluster.
The recommended way to set your requirements in your `setup.py` or `requirements.txt` is:
```
# Elasticsearch 8.x
eland>=8,<9
```
```
# Elasticsearch 7.x
eland>=7,<8
```
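The pinning rule above can be expressed as a small helper, for example in a setup or CI script. This is a minimal sketch: `eland_requirement` and `majors_match` are hypothetical names, and the cluster version string is assumed to come from elsewhere, such as the `version.number` field that {{es}} returns from its root endpoint.

```python
def eland_requirement(es_version: str) -> str:
    """Return a pip requirement that pins Eland to the cluster's major version."""
    major = int(es_version.split(".")[0])
    return f"eland>={major},<{major + 1}"


def majors_match(eland_version: str, es_version: str) -> bool:
    """True when Eland and Elasticsearch share the same major version."""
    return eland_version.split(".")[0] == es_version.split(".")[0]


print(eland_requirement("8.16.1"))      # eland>=8,<9
print(majors_match("8.15.0", "9.0.1"))  # False
```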
## Getting Started [_getting_started]
Create a `DataFrame` object connected to an {{es}} cluster running on `http://localhost:9200`:
```python
>>> import eland as ed
>>> df = ed.DataFrame(
... es_client="http://localhost:9200",
... es_index_pattern="flights",
... )
>>> df
AvgTicketPrice Cancelled ... dayOfWeek timestamp
0 841.265642 False ... 0 2018-01-01 00:00:00
1 882.982662 False ... 0 2018-01-01 18:27:00
2 190.636904 False ... 0 2018-01-01 17:11:14
3 181.694216 True ... 0 2018-01-01 10:33:28
4 730.041778 False ... 0 2018-01-01 05:13:00
... ... ... ... ... ...
13054 1080.446279 False ... 6 2018-02-11 20:42:25
13055 646.612941 False ... 6 2018-02-11 01:41:57
13056 997.751876 False ... 6 2018-02-11 04:09:27
13057 1102.814465 False ... 6 2018-02-11 08:28:21
13058 858.144337 False ... 6 2018-02-11 14:54:34
[13059 rows x 27 columns]
```
### Elastic Cloud [_elastic_cloud]
You can also connect Eland to an Elasticsearch instance in Elastic Cloud:
```python
>>> import eland as ed
>>> from elasticsearch import Elasticsearch
# First instantiate an 'Elasticsearch' instance connected to Elastic Cloud
>>> es = Elasticsearch(cloud_id="...", api_key="...")
# then wrap the client in an Eland DataFrame:
>>> df = ed.DataFrame(es, es_index_pattern="flights")
>>> df.head(5)
AvgTicketPrice Cancelled ... dayOfWeek timestamp
0 841.265642 False ... 0 2018-01-01 00:00:00
1 882.982662 False ... 0 2018-01-01 18:27:00
2 190.636904 False ... 0 2018-01-01 17:11:14
3 181.694216 True ... 0 2018-01-01 10:33:28
4 730.041778 False ... 0 2018-01-01 05:13:00
[5 rows x 27 columns]
```
Eland can be used for complex queries and aggregations:
```python
>>> df[df.Carrier != "Kibana Airlines"].groupby("Carrier").mean(numeric_only=False)
AvgTicketPrice Cancelled timestamp
Carrier
ES-Air 630.235816 0.129814 2018-01-21 20:45:00.200000000
JetBeats 627.457373 0.134698 2018-01-21 14:43:18.112400635
Logstash Airways 624.581974 0.125188 2018-01-21 16:14:50.711798340
```

@ -0,0 +1,19 @@
---
mapped_pages:
- https://www.elastic.co/guide/en/elasticsearch/client/eland/current/installation.html
---
# Installation [installation]
Eland can be installed with [pip](https://pip.pypa.io) from [PyPI](https://pypi.org/project/eland). We recommend [using a virtual environment](https://packaging.python.org/en/latest/guides/installing-using-pip-and-virtual-environments/) when installing with pip:
```sh
$ python -m pip install eland
```
Alternatively, Eland can be installed with [Conda](https://docs.conda.io) from [Conda Forge](https://anaconda.org/conda-forge/eland):
```sh
$ conda install -c conda-forge eland
```

@ -0,0 +1,199 @@
---
mapped_pages:
- https://www.elastic.co/guide/en/elasticsearch/client/eland/current/machine-learning.html
---
# Machine Learning [machine-learning]
## Trained models [ml-trained-models]
Eland allows transforming *some*
[trained models](https://eland.readthedocs.io/en/latest/reference/api/eland.ml.MLModel.import_model.html#parameters) from the scikit-learn, XGBoost,
and LightGBM libraries so that they can be serialized and used as inference models in {{es}}.
```python
>>> from xgboost import XGBClassifier
>>> from eland.ml import MLModel
# `training_data` is assumed to be a (features, labels) tuple with five feature columns
# Train and exercise an XGBoost ML model locally
>>> xgb_model = XGBClassifier(booster="gbtree")
>>> xgb_model.fit(training_data[0], training_data[1])
>>> xgb_model.predict(training_data[0])
[0 1 1 0 1 0 0 0 1 0]
# Import the model into Elasticsearch
>>> es_model = MLModel.import_model(
...     es_client="http://localhost:9200",
...     model_id="xgb-classifier",
...     model=xgb_model,
...     feature_names=["f0", "f1", "f2", "f3", "f4"],
... )
# Exercise the ML model in Elasticsearch with the training data
>>> es_model.predict(training_data[0])
[0 1 1 0 1 0 0 0 1 0]
```
## Natural language processing (NLP) with PyTorch [ml-nlp-pytorch]
::::{important}
You need to install the appropriate version of PyTorch to import an NLP model. Run `python -m pip install 'eland[pytorch]'` to install that version.
::::
For NLP tasks, Eland enables you to import PyTorch models into {{es}}. Use the `eland_import_hub_model` script to download and install supported [transformer models](https://huggingface.co/transformers) from the [Hugging Face model hub](https://huggingface.co/models). For example:
```bash
eland_import_hub_model <authentication> \ <1>
--url http://localhost:9200/ \ <2>
--hub-model-id elastic/distilbert-base-cased-finetuned-conll03-english \ <3>
--task-type ner \ <4>
--start
```
1. Use an authentication method to access your cluster. Refer to [Authentication methods](machine-learning.md#ml-nlp-pytorch-auth).
2. The cluster URL. Alternatively, use `--cloud-id`.
3. Specify the identifier for the model in the Hugging Face model hub.
4. Specify the type of NLP task. Supported values are `fill_mask`, `ner`, `question_answering`, `text_classification`, `text_embedding`, `text_expansion`, `text_similarity` and `zero_shot_classification`.
For more information about the available options, run `eland_import_hub_model` with the `--help` option.
```bash
eland_import_hub_model --help
```
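If you drive the import from Python, you can reject an unsupported `--task-type` before launching the script. This is a minimal sketch: `build_hub_import_argv` is a hypothetical helper, the set of task types is copied from the list above (newer Eland versions may support others), and authentication flags are omitted for brevity.

```python
import shlex

# Task types listed above for `--task-type`.
SUPPORTED_TASK_TYPES = {
    "fill_mask", "ner", "question_answering", "text_classification",
    "text_embedding", "text_expansion", "text_similarity",
    "zero_shot_classification",
}


def build_hub_import_argv(url: str, hub_model_id: str, task_type: str, start: bool = True) -> list[str]:
    """Assemble an eland_import_hub_model command line, rejecting unknown task types early."""
    if task_type not in SUPPORTED_TASK_TYPES:
        raise ValueError(f"unsupported task type: {task_type!r}")
    argv = [
        "eland_import_hub_model",
        "--url", url,
        "--hub-model-id", hub_model_id,
        "--task-type", task_type,
    ]
    if start:
        argv.append("--start")
    return argv


argv = build_hub_import_argv(
    "http://localhost:9200/",
    "elastic/distilbert-base-cased-finetuned-conll03-english",
    "ner",
)
print(shlex.join(argv))
# To actually run the import: subprocess.run(argv, check=True)
```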
### Import model with Docker [ml-nlp-pytorch-docker]
::::{important}
To use the Docker container, you need to clone the Eland repository: [https://github.com/elastic/eland](https://github.com/elastic/eland)
::::
If you want to use Eland without installing it, you can use the Docker image. To use the container interactively:
```bash
docker run -it --rm --network host docker.elastic.co/eland/eland
```
Running installed scripts is also possible without an interactive shell, for example:
```bash
docker run -it --rm docker.elastic.co/eland/eland \
eland_import_hub_model \
--url $ELASTICSEARCH_URL \
--hub-model-id elastic/distilbert-base-uncased-finetuned-conll03-english \
--start
```
Replace `$ELASTICSEARCH_URL` with the URL of your {{es}} cluster. For authentication, include an administrator username and password in the URL in the following format: `https://username:password@host:port`.
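Passwords often contain characters such as `@` or `:` that would break the `https://username:password@host:port` format. A minimal sketch of building that URL with percent-encoded credentials follows. `url_with_credentials` is a hypothetical helper, and the example host and password are placeholders. This assumes the client decodes percent-encoded credentials in the URL, which is worth verifying for your client version.

```python
from urllib.parse import quote, unquote, urlsplit


def url_with_credentials(username: str, password: str, host: str, port: int, scheme: str = "https") -> str:
    """Embed credentials in an Elasticsearch URL, percent-encoding reserved characters."""
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"{scheme}://{user}:{pwd}@{host}:{port}"


url = url_with_credentials("elastic", "p@ss:word", "es.example.com", 9243)
print(url)  # https://elastic:p%40ss%3Aword@es.example.com:9243

# The encoded URL still splits into the intended parts.
parts = urlsplit(url)
print(parts.hostname, parts.port, unquote(parts.password))  # es.example.com 9243 p@ss:word
```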
### Install models in an air-gapped environment [ml-nlp-pytorch-air-gapped]
You can install models in a restricted or closed network by pointing the `eland_import_hub_model` script to local files.
For an offline install of a Hugging Face model, the model first needs to be cloned locally. Git and [Git Large File Storage](https://git-lfs.com/) must be installed on your system.
1. Select a model you want to use from Hugging Face. Refer to the [compatible third party model](docs-content://explore-analyze/machine-learning/nlp/ml-nlp-model-ref.md) list for more information on the supported architectures.
2. Clone the selected model from Hugging Face by using the model URL. For example:
```bash
git clone https://huggingface.co/dslim/bert-base-NER
```
This command results in a local copy of the model in the `bert-base-NER` directory.
3. Use the `eland_import_hub_model` script with the `--hub-model-id` set to the directory of the cloned model to install it:
```bash
eland_import_hub_model \
--url 'XXXX' \
--hub-model-id /PATH/TO/MODEL \
--task-type ner \
--es-username elastic --es-password XXX \
--es-model-id bert-base-ner
```
If you use the Docker image to run `eland_import_hub_model`, you must bind-mount the model directory so that the container can read the files:
```bash
docker run --mount type=bind,source=/PATH/TO/MODEL,destination=/model,readonly -it --rm docker.elastic.co/eland/eland \
eland_import_hub_model \
--url 'XXXX' \
--hub-model-id /model \
--task-type ner \
--es-username elastic --es-password XXX \
--es-model-id bert-base-ner
```
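The Docker variant can also be assembled programmatically. This sketch checks that the cloned model directory exists and mounts it read-only at `/model`, mirroring the command above. `docker_import_argv` is a hypothetical helper, and credentials are left out: append `-u`/`-p` or `--es-api-key` as described in the authentication section.

```python
import os
import shlex
import tempfile

CONTAINER_MODEL_DIR = "/model"  # mount point used in the docker command above


def docker_import_argv(model_dir: str, url: str, task_type: str, es_model_id: str) -> list[str]:
    """Build a `docker run` command that mounts a local model directory read-only."""
    source = os.path.abspath(model_dir)
    if not os.path.isdir(source):
        raise FileNotFoundError(f"model directory not found: {source}")
    return [
        "docker", "run",
        "--mount", f"type=bind,source={source},destination={CONTAINER_MODEL_DIR},readonly",
        "-it", "--rm", "docker.elastic.co/eland/eland",
        "eland_import_hub_model",
        "--url", url,
        "--hub-model-id", CONTAINER_MODEL_DIR,  # the path as seen inside the container
        "--task-type", task_type,
        "--es-model-id", es_model_id,
    ]


# Demo with a throwaway directory standing in for the cloned model.
with tempfile.TemporaryDirectory() as model_dir:
    print(shlex.join(docker_import_argv(model_dir, "https://es.example.com:9243", "ner", "bert-base-ner")))
```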
Once it's uploaded to {{es}}, the model has the ID specified by `--es-model-id`. If `--es-model-id` is not set, the model ID is derived from `--hub-model-id`; spaces and path delimiters are converted to double underscores (`__`).
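The derivation rule can be illustrated with a short function. This sketch covers only the rule stated above (spaces and path delimiters become `__`). `derive_model_id` is a hypothetical name, and Eland may apply further normalization to satisfy {{es}} model ID restrictions. Pass `--es-model-id` explicitly whenever you need a predictable ID.

```python
import re


def derive_model_id(hub_model_id: str) -> str:
    """Replace whitespace and forward/back slashes with double underscores."""
    return re.sub(r"[\s/\\]", "__", hub_model_id)


print(derive_model_id("elastic/distilbert-base-cased-finetuned-conll03-english"))
# elastic__distilbert-base-cased-finetuned-conll03-english
```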
### Connect to Elasticsearch through a proxy [ml-nlp-pytorch-proxy]
Behind the scenes, Eland uses the `requests` Python library, which [allows configuring proxies through an environment variable](https://requests.readthedocs.io/en/latest/user/advanced/#proxies). For example, to use an HTTP proxy to connect to an HTTPS Elasticsearch cluster, you need to set the `HTTPS_PROXY` environment variable when invoking Eland:
```bash
HTTPS_PROXY=http://proxy-host:proxy-port eland_import_hub_model ...
```
If you disabled security on your Elasticsearch cluster, you should use `HTTP_PROXY` instead.
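The choice between `HTTPS_PROXY` and `HTTP_PROXY` follows the scheme of the cluster URL. A minimal sketch that prepares the environment for a child process follows. `proxy_env` is a hypothetical helper, and the proxy address is a placeholder.

```python
import os
from urllib.parse import urlsplit


def proxy_env(es_url: str, proxy_url: str, base_env=None) -> dict:
    """Return a copy of the environment with the proxy variable matching the cluster URL's scheme."""
    scheme = urlsplit(es_url).scheme
    if scheme not in ("http", "https"):
        raise ValueError(f"unexpected URL scheme: {scheme!r}")
    env = dict(os.environ if base_env is None else base_env)
    env[f"{scheme.upper()}_PROXY"] = proxy_url
    return env


env = proxy_env("https://es.example.com:9243", "http://proxy-host:3128", base_env={})
print(env)  # {'HTTPS_PROXY': 'http://proxy-host:3128'}
# Pass it to the script, for example: subprocess.run(argv, env=env, check=True)
```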
### Authentication methods [ml-nlp-pytorch-auth]
The following authentication options are available when using the import script:
* Elasticsearch username and password authentication (specified with the `-u` and `-p` options):
```bash
eland_import_hub_model -u <username> -p <password> --cloud-id <cloud-id> ...
```
These `-u` and `-p` options also work when you use `--url`.
* Elasticsearch username and password authentication (embedded in the URL):
```bash
eland_import_hub_model --url https://<user>:<password>@<hostname>:<port> ...
```
* Elasticsearch API key authentication:
```bash
eland_import_hub_model --es-api-key <api-key> --url https://<hostname>:<port> ...
```
* Hugging Face Hub access token (for private models):
```bash
eland_import_hub_model --hub-access-token <access-token> ...
```
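To keep secrets out of scripts and shell history, a wrapper can pick the authentication flags from environment variables. This is a sketch under assumptions: `auth_args` is hypothetical, and `ES_API_KEY`, `ES_USERNAME`, `ES_PASSWORD` and `HUB_ACCESS_TOKEN` are naming conventions chosen here, not variables Eland reads itself. Values passed as command-line arguments may still be visible to other users in process listings.

```python
import os


def auth_args(env=None) -> list:
    """Pick eland_import_hub_model auth flags from (hypothetical) environment variables."""
    env = os.environ if env is None else env
    if env.get("ES_API_KEY"):
        args = ["--es-api-key", env["ES_API_KEY"]]
    elif env.get("ES_USERNAME") and env.get("ES_PASSWORD"):
        args = ["-u", env["ES_USERNAME"], "-p", env["ES_PASSWORD"]]
    else:
        raise RuntimeError("set ES_API_KEY, or both ES_USERNAME and ES_PASSWORD")
    # Optional: a Hugging Face Hub token for private models.
    if env.get("HUB_ACCESS_TOKEN"):
        args += ["--hub-access-token", env["HUB_ACCESS_TOKEN"]]
    return args


print(auth_args({"ES_API_KEY": "abc123"}))  # ['--es-api-key', 'abc123']
```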
### TLS/SSL [ml-nlp-pytorch-tls]
The following TLS/SSL options for Elasticsearch are available when using the import script:
* Specify an alternate CA bundle to verify the cluster certificate:
```bash
eland_import_hub_model --ca-certs CA_CERTS ...
```
* Disable TLS/SSL verification altogether (strongly discouraged):
```bash
eland_import_hub_model --insecure ...
```
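A wrapper can make the safe option the default by adding `--insecure` only on explicit opt-in and by checking that a CA bundle exists before passing `--ca-certs`. `tls_args` is a hypothetical helper. When neither option is given, it adds no flags and leaves verification to the script's defaults.

```python
import os


def tls_args(ca_certs=None, insecure=False) -> list:
    """Return TLS flags: a CA bundle if given, --insecure only on explicit opt-in."""
    if ca_certs and insecure:
        raise ValueError("choose either a CA bundle or --insecure, not both")
    if ca_certs:
        if not os.path.isfile(ca_certs):
            raise FileNotFoundError(f"CA bundle not found: {ca_certs}")
        return ["--ca-certs", ca_certs]
    if insecure:
        return ["--insecure"]  # strongly discouraged, as noted above
    return []  # no TLS flags: leave verification to the script's defaults
```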

docs/reference/toc.yml

@ -0,0 +1,6 @@
project: 'Eland reference'
toc:
- file: index.md
- file: installation.md
- file: dataframes.md
- file: machine-learning.md

@ -1,7 +1,5 @@
-elasticsearch>=7.0.5
-pandas==0.25.3
 matplotlib
-pytest>=5.2.1
-git+https://github.com/pandas-dev/pandas-sphinx-theme.git@master
-numpydoc>=0.9.0
+nbval
+sphinx==5.3.0
 nbsphinx
+furo

@ -1,882 +0,0 @@
.. _implementation/dataframe_supported:
===============================
pandas.DataFrame supported APIs
===============================
The following table lists both implemented and not implemented methods. If you need
an operation that is listed as not implemented, feel free to open an issue at
https://github.com/elastic/eland, or give a thumbs up to already created issues.
Contributions are also welcome!

The following table is structured as follows: The first column contains the method name.
The second column is the usage count from the prioritised list referenced below. The third
column is a flag for whether or not there is an implementation in Eland for the method in
the left column. ``y`` stands for yes, ``n`` stands for no.

https://github.com/adgirish/kaggleScape/blob/master/results/annotResults.csv represents a prioritised list.

+-------------------------+-------+------------------------------------------------+
| Method | Count | Notes |
+-------------------------+-------+------------------------------------------------+
| pd.read_csv | 1422 | y |
+-------------------------+-------+------------------------------------------------+
| pd.DataFrame | 886 | y |
+-------------------------+-------+------------------------------------------------+
| df.append | 792 | n |
+-------------------------+-------+------------------------------------------------+
| df.mean | 783 | y |
+-------------------------+-------+------------------------------------------------+
| df.head | 783 | y |
+-------------------------+-------+------------------------------------------------+
| df.drop | 761 | y |
+-------------------------+-------+------------------------------------------------+
| df.sum | 755 | y |
+-------------------------+-------+------------------------------------------------+
| df.to_csv | 693 | y |
+-------------------------+-------+------------------------------------------------+
| df.get | 669 | y |
+-------------------------+-------+------------------------------------------------+
| df.mode | 653 | n |
+-------------------------+-------+------------------------------------------------+
| df.astype | 649 | n |
+-------------------------+-------+------------------------------------------------+
| df.sub | 637 | n |
+-------------------------+-------+------------------------------------------------+
| pd.concat | 582 | n |
+-------------------------+-------+------------------------------------------------+
| df.apply | 577 | n |
+-------------------------+-------+------------------------------------------------+
| df.groupby | 557 | n |
+-------------------------+-------+------------------------------------------------+
| df.join | 544 | n |
+-------------------------+-------+------------------------------------------------+
| df.fillna | 543 | n |
+-------------------------+-------+------------------------------------------------+
| df.max | 508 | y |
+-------------------------+-------+------------------------------------------------+
| df.reset_index | 434 | n |
+-------------------------+-------+------------------------------------------------+
| pd.unique | 433 | n |
+-------------------------+-------+------------------------------------------------+
| df.le | 405 | n |
+-------------------------+-------+------------------------------------------------+
| df.count | 399 | y |
+-------------------------+-------+------------------------------------------------+
| pd.value_counts | 397 | y |
+-------------------------+-------+------------------------------------------------+
| df.sort_values | 390 | n |
+-------------------------+-------+------------------------------------------------+
| df.transform | 387 | n |
+-------------------------+-------+------------------------------------------------+
| df.merge | 376 | n |
+-------------------------+-------+------------------------------------------------+
| df.add | 346 | n |
+-------------------------+-------+------------------------------------------------+
| df.isnull | 338 | n |
+-------------------------+-------+------------------------------------------------+
| df.min | 321 | y |
+-------------------------+-------+------------------------------------------------+
| df.copy | 314 | n |
+-------------------------+-------+------------------------------------------------+
| df.replace | 300 | n |
+-------------------------+-------+------------------------------------------------+
| df.std | 261 | n |
+-------------------------+-------+------------------------------------------------+
| df.hist | 246 | y |
+-------------------------+-------+------------------------------------------------+
| df.filter | 234 | n |
+-------------------------+-------+------------------------------------------------+
| df.describe | 220 | y |
+-------------------------+-------+------------------------------------------------+
| df.ne | 218 | n |
+-------------------------+-------+------------------------------------------------+
| df.corr | 217 | n |
+-------------------------+-------+------------------------------------------------+
| df.median | 217 | n |
+-------------------------+-------+------------------------------------------------+
| df.items | 212 | n |
+-------------------------+-------+------------------------------------------------+
| pd.to_datetime | 204 | n |
+-------------------------+-------+------------------------------------------------+
| df.isin | 203 | n |
+-------------------------+-------+------------------------------------------------+
| df.dropna | 195 | n |
+-------------------------+-------+------------------------------------------------+
| pd.get_dummies | 190 | n |
+-------------------------+-------+------------------------------------------------+
| df.rename | 185 | n |
+-------------------------+-------+------------------------------------------------+
| df.info | 180 | y |
+-------------------------+-------+------------------------------------------------+
| df.set_index | 166 | n |
+-------------------------+-------+------------------------------------------------+
| df.keys | 159 | y |
+-------------------------+-------+------------------------------------------------+
| df.sample | 155 | n |
+-------------------------+-------+------------------------------------------------+
| df.agg | 140 | y |
+-------------------------+-------+------------------------------------------------+
| df.where | 138 | n |
+-------------------------+-------+------------------------------------------------+
| df.boxplot | 134 | n |
+-------------------------+-------+------------------------------------------------+
| df.clip | 116 | n |
+-------------------------+-------+------------------------------------------------+
| df.round | 116 | n |
+-------------------------+-------+------------------------------------------------+
| df.abs | 101 | n |
+-------------------------+-------+------------------------------------------------+
| df.stack | 97 | n |
+-------------------------+-------+------------------------------------------------+
| df.tail | 94 | y |
+-------------------------+-------+------------------------------------------------+
| df.update | 92 | n |
+-------------------------+-------+------------------------------------------------+
| df.iterrows | 90 | n |
+-------------------------+-------+------------------------------------------------+
| df.transpose | 87 | n |
+-------------------------+-------+------------------------------------------------+
| df.any | 85 | n |
+-------------------------+-------+------------------------------------------------+
| df.pipe | 80 | n |
+-------------------------+-------+------------------------------------------------+
| pd.eval | 73 | n |
+-------------------------+-------+------------------------------------------------+
| df.eval | 73 | n |
+-------------------------+-------+------------------------------------------------+
| pd.read_json | 72 | n |
+-------------------------+-------+------------------------------------------------+
| df.nunique | 70 | y |
+-------------------------+-------+------------------------------------------------+
| df.pivot | 70 | n |
+-------------------------+-------+------------------------------------------------+
| df.select | 68 | n |
+-------------------------+-------+------------------------------------------------+
| df.as_matrix | 67 | n |
+-------------------------+-------+------------------------------------------------+
| df.notnull | 66 | n |
+-------------------------+-------+------------------------------------------------+
| df.cumsum | 66 | n |
+-------------------------+-------+------------------------------------------------+
| df.prod | 64 | n |
+-------------------------+-------+------------------------------------------------+
| df.unstack | 64 | n |
+-------------------------+-------+------------------------------------------------+
| df.drop_duplicates | 63 | n |
+-------------------------+-------+------------------------------------------------+
| df.div | 63 | n |
+-------------------------+-------+------------------------------------------------+
| pd.crosstab | 59 | n |
+-------------------------+-------+------------------------------------------------+
| df.select_dtypes | 57 | y |
+-------------------------+-------+------------------------------------------------+
| df.pow | 56 | n |
+-------------------------+-------+------------------------------------------------+
| df.sort_index | 56 | n |
+-------------------------+-------+------------------------------------------------+
| df.product | 52 | n |
+-------------------------+-------+------------------------------------------------+
| df.isna | 51 | n |
+-------------------------+-------+------------------------------------------------+
| df.dot | 46 | n |
+-------------------------+-------+------------------------------------------------+
| pd.cut | 45 | n |
+-------------------------+-------+------------------------------------------------+
| df.bool | 44 | n |
+-------------------------+-------+------------------------------------------------+
| df.to_dict | 44 | n |
+-------------------------+-------+------------------------------------------------+
| df.diff | 44 | n |
+-------------------------+-------+------------------------------------------------+
| df.insert | 44 | n |
+-------------------------+-------+------------------------------------------------+
| df.pop | 44 | n |
+-------------------------+-------+------------------------------------------------+
| df.query | 43 | y |
+-------------------------+-------+------------------------------------------------+
| df.var | 43 | n |
+-------------------------+-------+------------------------------------------------+
| df.__init__ | 41 | y |
+-------------------------+-------+------------------------------------------------+
| pd.to_numeric | 39 | n |
+-------------------------+-------+------------------------------------------------+
| df.squeeze | 39 | n |
+-------------------------+-------+------------------------------------------------+
| df.ge | 37 | n |
+-------------------------+-------+------------------------------------------------+
| df.quantile | 37 | n |
+-------------------------+-------+------------------------------------------------+
| df.reindex | 37 | n |
+-------------------------+-------+------------------------------------------------+
| df.rolling | 35 | n |
+-------------------------+-------+------------------------------------------------+
| pd.factorize | 32 | n |
+-------------------------+-------+------------------------------------------------+
| pd.melt | 31 | n |
+-------------------------+-------+------------------------------------------------+
| df.melt | 31 | n |
+-------------------------+-------+------------------------------------------------+
| df.rank | 31 | n |
+-------------------------+-------+------------------------------------------------+
| pd.read_table | 30 | n |
+-------------------------+-------+------------------------------------------------+
| pd.pivot_table | 30 | n |
+-------------------------+-------+------------------------------------------------+
| df.idxmax | 30 | n |
+-------------------------+-------+------------------------------------------------+
| pd.test | 29 | n |
+-------------------------+-------+------------------------------------------------+
| df.iteritems | 29 | n |
+-------------------------+-------+------------------------------------------------+
| df.shift | 28 | n |
+-------------------------+-------+------------------------------------------------+
| df.mul | 28 | n |
+-------------------------+-------+------------------------------------------------+
| pd.qcut | 25 | n |
+-------------------------+-------+------------------------------------------------+
| df.set_value | 25 | n |
+-------------------------+-------+------------------------------------------------+
| df.all | 24 | n |
+-------------------------+-------+------------------------------------------------+
| df.skew | 24 | n |
+-------------------------+-------+------------------------------------------------+
| df.aggregate | 23 | y |
+-------------------------+-------+------------------------------------------------+
| pd.match | 22 | n |
+-------------------------+-------+------------------------------------------------+
| df.nlargest | 22 | n |
+-------------------------+-------+------------------------------------------------+
| df.multiply | 21 | n |
+-------------------------+-------+------------------------------------------------+
| df.set_axis | 19 | n |
+-------------------------+-------+------------------------------------------------+
| df.eq | 18 | n |
+-------------------------+-------+------------------------------------------------+
| df.resample | 18 | n |
+-------------------------+-------+------------------------------------------------+
| pd.read_sql | 17 | n |
+-------------------------+-------+------------------------------------------------+
| df.duplicated | 16 | n |
+-------------------------+-------+------------------------------------------------+
| pd.date_range | 16 | n |
+-------------------------+-------+------------------------------------------------+
| df.interpolate | 15 | n |
+-------------------------+-------+------------------------------------------------+
| df.memory_usage | 15 | n |
+-------------------------+-------+------------------------------------------------+
| df.divide | 14 | n |
+-------------------------+-------+------------------------------------------------+
| df.cov | 13 | n |
+-------------------------+-------+------------------------------------------------+
| df.assign | 12 | n |
+-------------------------+-------+------------------------------------------------+
| df.subtract | 12 | n |
+-------------------------+-------+------------------------------------------------+
| pd.read_pickle | 11 | n |
+-------------------------+-------+------------------------------------------------+
| df.applymap | 11 | n |
+-------------------------+-------+------------------------------------------------+
| df.first | 11 | n |
+-------------------------+-------+------------------------------------------------+
| df.kurt | 10 | n |
+-------------------------+-------+------------------------------------------------+
| df.truncate | 10 | n |
+-------------------------+-------+------------------------------------------------+
| df.get_value | 9 | n |
+-------------------------+-------+------------------------------------------------+
| pd.read_hdf | 9 | n |
+-------------------------+-------+------------------------------------------------+
| df.to_html | 9 | y |
+-------------------------+-------+------------------------------------------------+
| pd.read_sql_query | 9 | n |
+-------------------------+-------+------------------------------------------------+
| df.take | 8 | n |
+-------------------------+-------+------------------------------------------------+
| df.to_pickle | 7 | n |
+-------------------------+-------+------------------------------------------------+
| df.itertuples | 7 | n |
+-------------------------+-------+------------------------------------------------+
| df.to_string | 7 | y |
+-------------------------+-------+------------------------------------------------+
| df.last | 7 | n |
+-------------------------+-------+------------------------------------------------+
| df.sem | 7 | n |
+-------------------------+-------+------------------------------------------------+
| pd.to_pickle | 7 | n |
+-------------------------+-------+------------------------------------------------+
| df.to_json | 7 | n |
+-------------------------+-------+------------------------------------------------+
| df.idxmin | 7 | n |
+-------------------------+-------+------------------------------------------------+
| df.xs | 6 | n |
+-------------------------+-------+------------------------------------------------+
| df.combine | 6 | n |
+-------------------------+-------+------------------------------------------------+
| pd.rolling_mean | 6 | n |
+-------------------------+-------+------------------------------------------------+
| df.to_period | 6 | n |
+-------------------------+-------+------------------------------------------------+
| df.convert_objects | 5 | n |
+-------------------------+-------+------------------------------------------------+
| df.mask | 4 | n |
+-------------------------+-------+------------------------------------------------+
| df.pct_change | 4 | n |
+-------------------------+-------+------------------------------------------------+
| df.add_prefix | 4 | n |
+-------------------------+-------+------------------------------------------------+
| pd.read_excel | 4 | n |
+-------------------------+-------+------------------------------------------------+
| pd.rolling_std | 3 | n |
+-------------------------+-------+------------------------------------------------+
| df.to_records | 3 | n |
+-------------------------+-------+------------------------------------------------+
| df.corrwith | 3 | n |
+-------------------------+-------+------------------------------------------------+
| df.swapaxes | 3 | n |
+-------------------------+-------+------------------------------------------------+
| df.__iter__ | 3 | n |
+-------------------------+-------+------------------------------------------------+
| df.to_sql | 3 | n |
+-------------------------+-------+------------------------------------------------+
| pd.read_feather | 3 | n |
+-------------------------+-------+------------------------------------------------+
| df.to_feather | 3 | n |
+-------------------------+-------+------------------------------------------------+
| df.__len__ | 3 | n |
+-------------------------+-------+------------------------------------------------+
| df.kurtosis | 3 | n |
+-------------------------+-------+------------------------------------------------+
| df.mod | 2 | n |
+-------------------------+-------+------------------------------------------------+
| df.to_sparse | 2 | n |
+-------------------------+-------+------------------------------------------------+
| df.get_values | 2 | n |
+-------------------------+-------+------------------------------------------------+
| df.__eq__ | 2 | n |
+-------------------------+-------+------------------------------------------------+
| pd.bdate_range | 2 | n |
+-------------------------+-------+------------------------------------------------+
| df.get_dtype_counts | 2 | n |
+-------------------------+-------+------------------------------------------------+
| df.combine_first | 2 | n |
+-------------------------+-------+------------------------------------------------+
| df._get_numeric_data | 2 | n |
+-------------------------+-------+------------------------------------------------+
| df.nsmallest | 2 | n |
+-------------------------+-------+------------------------------------------------+
| pd.scatter_matrix | 2 | n |
+-------------------------+-------+------------------------------------------------+
| df.rename_axis | 2 | n |
+-------------------------+-------+------------------------------------------------+
| df.__setstate__ | 2 | n |
+-------------------------+-------+------------------------------------------------+
| df.cumprod | 2 | n |
+-------------------------+-------+------------------------------------------------+
| df.__getstate__ | 2 | n |
+-------------------------+-------+------------------------------------------------+
| df.equals | 2 | n |
+-------------------------+-------+------------------------------------------------+
| df.__getitem__ | 2 | y |
+-------------------------+-------+------------------------------------------------+
| df.clip_upper | 2 | n |
+-------------------------+-------+------------------------------------------------+
| df.floordiv | 2 | n |
+-------------------------+-------+------------------------------------------------+
| df.to_excel | 2 | n |
+-------------------------+-------+------------------------------------------------+
| df.reindex_axis | 1 | n |
+-------------------------+-------+------------------------------------------------+
| pd.to_timedelta | 1 | n |
+-------------------------+-------+------------------------------------------------+
| df.ewm | 1 | n |
+-------------------------+-------+------------------------------------------------+
| df.tz_localize | 1 | n |
+-------------------------+-------+------------------------------------------------+
| df.tz_convert | 1 | n |
+-------------------------+-------+------------------------------------------------+
| df.to_hdf | 1 | n |
+-------------------------+-------+------------------------------------------------+
| df.lookup | 1 | n |
+-------------------------+-------+------------------------------------------------+
| pd.merge_ordered | 1 | n |
+-------------------------+-------+------------------------------------------------+
| df.swaplevel | 1 | n |
+-------------------------+-------+------------------------------------------------+
| df.first_valid_index | 1 | n |
+-------------------------+-------+------------------------------------------------+
| df.lt | 1 | n |
+-------------------------+-------+------------------------------------------------+
| df.add_suffix | 1 | n |
+-------------------------+-------+------------------------------------------------+
| pd.rolling_median | 1 | n |
+-------------------------+-------+------------------------------------------------+
| df.to_dense | 1 | n |
+-------------------------+-------+------------------------------------------------+
| df.mad | 1 | n |
+-------------------------+-------+------------------------------------------------+
| df.align | 1 | n |
+-------------------------+-------+------------------------------------------------+
| df.__copy__ | 1 | n |
+-------------------------+-------+------------------------------------------------+
| pd.set_eng_float_format | 1 | n |
+-------------------------+-------+------------------------------------------------+

+---------------------------+---------------------------------+----------------------------------------------------+
| DataFrame method          | Eland Implementation? (Y/N/P/D) | Notes on current implementation                    |
+===========================+=================================+====================================================+
| ``T`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``abs`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``add`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``add_prefix`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``add_suffix`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``agg`` | Y | |
| ``aggregate`` | | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``align`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``all`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``any`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``append`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``apply`` | N | See ``agg`` |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``applymap`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``as_blocks`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``as_matrix`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``asfreq`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``asof`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``assign`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``astype`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``at`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``at_time`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``axes`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``between_time`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``bfill`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``blocks`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``bool`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``boxplot`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``clip`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``clip_lower`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``clip_upper`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``combine`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``combine_first`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``compound`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``consolidate`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``convert_objects`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``copy`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``corr`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``corrwith`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``count`` | Y | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``cov`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``cummax`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``cummin`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``cumprod`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``cumsum`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``describe`` | Y | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``diff`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``div`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``divide`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``dot`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``drop`` | Y | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``drop_duplicates`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``dropna`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``dtypes`` | Y | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``duplicated`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``empty`` | Y | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``eq`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``equals`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``eval`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``ewm`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``expanding`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``ffill`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``fillna`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``filter`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``first`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``first_valid_index`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``floordiv`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``from_csv`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``from_dict`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``from_items`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``from_records`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``ftypes`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``ge`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``get`` | Y | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``get_dtype_counts`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``get_ftype_counts`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``get_value`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``get_values`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``groupby`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``gt`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``head`` | Y | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``hist`` | Y | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``iat`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``idxmax`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``idxmin`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``iloc`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``infer_objects`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``info`` | Y | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``insert`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``interpolate`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``is_copy`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``isin`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``isna`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``isnull`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``items`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``iteritems`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``iterrows`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``itertuples`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``ix`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``join`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``keys`` | Y | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``kurt`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``kurtosis`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``last`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``last_valid_index`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``le`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``loc`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``lookup`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``lt`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``mad`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``mask`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``max`` | Y | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``mean`` | Y | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``median`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``melt`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``memory_usage`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``merge`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``min`` | Y | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``mod`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``mode`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``mul`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``multiply`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``ndim`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``ne`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``nlargest`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``notna`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``notnull`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``nsmallest`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``nunique`` | Y | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``pct_change`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``pipe`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``pivot`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``pivot_table`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``plot`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``pop`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``pow`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``prod`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``product`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``quantile`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``query`` | Y | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``radd`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``rank`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``rdiv`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``reindex`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``reindex_axis`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``reindex_like`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``rename`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``rename_axis`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``reorder_levels`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``replace`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``resample`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``reset_index`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``rfloordiv`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``rmod`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``rmul`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``rolling`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``round`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``rpow`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``rsub`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``rtruediv`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``sample`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``select`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``select_dtypes`` | Y | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``sem`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``set_axis`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``set_index`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``set_value`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``shape`` | Y | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``shift`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``size`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``skew`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``slice_shift`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``sort_index`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``sort_values`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``sortlevel`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``squeeze`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``stack`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``std`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``style`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``sub`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``subtract`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``sum`` | Y | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``swapaxes`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``swaplevel`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``tail`` | Y | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``take`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``to_clipboard`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``to_csv`` | Y | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``to_dense`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``to_dict`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``to_excel`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``to_feather`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``to_gbq`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``to_hdf`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``to_html`` | Y | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``to_json`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``to_latex`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``to_msgpack`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``to_panel`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``to_parquet`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``to_period`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``to_pickle`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``to_records`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``to_sparse`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``to_sql`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``to_stata`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``to_string`` | Y | Default sets `max_rows=60` |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``to_timestamp`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``to_xarray`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``transform`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``transpose`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``truediv`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``truncate`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``tshift`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``tz_convert`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``tz_localize`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``unstack`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``update`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``values`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``var`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``where`` | N | |
+---------------------------+---------------------------------+----------------------------------------------------+
| ``xs`` | N | Deprecated in pandas |
+---------------------------+---------------------------------+----------------------------------------------------+
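The ``to_string`` row notes that eland sets ``max_rows=60`` by default. As a rough, standard-library-only illustration of what such a row limit does (this is not eland or pandas code, and the helper name ``truncate_rows`` is hypothetical), the sketch keeps the first and last ``max_rows // 2`` rows and marks the gap with a ``...`` row, roughly the way pandas lays out a vertically truncated frame.

```python
from typing import List, Optional, Sequence


def truncate_rows(rows: Sequence[str], max_rows: Optional[int] = 60) -> List[str]:
    """Hypothetical helper, not eland or pandas code.

    Keeps the first and last ``max_rows // 2`` rows and puts a '...' marker
    row between them.
    """
    if max_rows is None or len(rows) <= max_rows:
        # No limit, or everything fits: return all rows unchanged.
        return list(rows)
    half = max(max_rows // 2, 1)
    # Head rows, one placeholder row, then tail rows.
    return list(rows[:half]) + ["..."] + list(rows[-half:])


rows = [f"row {i}" for i in range(100)]
shown = truncate_rows(rows)
print(len(shown))  # 61: 30 head rows, one '...' row, 30 tail rows
print(shown[29], shown[30], shown[31])
```

With a limit of 60, only 61 display lines are produced no matter how many rows the source holds, which is why a default limit keeps output readable for large indices.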

@ -1,11 +0,0 @@
.. _implementation:

====================
Implementation Notes
====================

.. toctree::
   :maxdepth: 2

   details.rst
   dataframe_supported.rst

@ -1,52 +0,0 @@
.. eland documentation master file, created by

.. module:: eland

****************************************************************
eland: pandas-like data analysis toolkit backed by Elasticsearch
****************************************************************

**Date**: |today| **Version**: |version|

**Useful links**:
`Source Repository <https://github.com/elastic/eland>`__ |
`Issues & Ideas <https://github.com/elastic/eland/issues>`__ |
`Q&A Support <https://discuss.elastic.co>`__

:mod:`eland` is an open source, Apache 2.0-licensed Python client for analysing, exploring and manipulating data that resides in Elasticsearch.
Where possible, the package reuses existing Python APIs and data structures, making it easy to switch from NumPy, pandas and scikit-learn to their Elasticsearch-powered equivalents.
In general, the data stays in Elasticsearch rather than in local memory, which allows eland to work with large datasets stored there.

.. toctree::
   :maxdepth: 2
   :hidden:

   reference/index
   implementation/index
   development/index
   examples/index

* :doc:`reference/index`

  * :doc:`reference/io`
  * :doc:`reference/general_utility_functions`
  * :doc:`reference/dataframe`
  * :doc:`reference/series`
  * :doc:`reference/index`
  * :doc:`reference/indexing`

* :doc:`implementation/index`

  * :doc:`implementation/details`
  * :doc:`implementation/dataframe_supported`

* :doc:`development/index`

  * :doc:`development/contributing`

* :doc:`examples/index`

  * :doc:`examples/demo_notebook`
  * :doc:`examples/online_retail_analysis`
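The index page says the data stays in Elasticsearch rather than in memory. One common way to get that behaviour, sketched here under stated assumptions, is to record operations lazily and translate them into a search request that the server evaluates. The ``LazyFrame`` class, its methods, and the ``flights`` / ``AvgTicketPrice`` names are hypothetical and are not eland's API; only the request-body shape (``size``, ``query`` / ``bool`` / ``filter``, ``range`` and a ``sum`` aggregation) follows the Elasticsearch Query DSL.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class LazyFrame:
    """Hypothetical toy, not eland's API: it records operations and turns them
    into an Elasticsearch search body instead of loading documents locally."""

    index: str
    filters: List[Dict[str, Any]] = field(default_factory=list)

    def gt(self, column: str, value: float) -> "LazyFrame":
        # Record a `range` filter in a new frame; nothing is sent or fetched.
        clause = {"range": {column: {"gt": value}}}
        return LazyFrame(self.index, self.filters + [clause])

    def sum_request(self, column: str) -> Dict[str, Any]:
        # size=0 asks for aggregation results only, not matching documents.
        body: Dict[str, Any] = {
            "size": 0,
            "aggs": {f"sum_{column}": {"sum": {"field": column}}},
        }
        if self.filters:
            body["query"] = {"bool": {"filter": list(self.filters)}}
        return body


flights = LazyFrame("flights")  # hypothetical index name
expensive = flights.gt("AvgTicketPrice", 500)  # hypothetical field name
print(json.dumps(expensive.sum_request("AvgTicketPrice"), indent=2))
```

Sending the body to a cluster (for example with the official ``elasticsearch`` client) is left out so the sketch runs without one; the point is that only a small aggregation result would travel back, not the documents.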

Binary file not shown.

Binary file not shown.

Binary file not shown.

@ -1,6 +0,0 @@
eland.DataFrame.agg
===================
.. currentmodule:: eland
.. automethod:: DataFrame.agg

@ -1,6 +0,0 @@
eland.DataFrame.aggregate
=========================
.. currentmodule:: eland
.. automethod:: DataFrame.aggregate

@ -1,6 +0,0 @@
eland.DataFrame.columns
=======================
.. currentmodule:: eland
.. autoattribute:: DataFrame.columns

@ -1,6 +0,0 @@
eland.DataFrame.describe
========================
.. currentmodule:: eland
.. automethod:: DataFrame.describe

@ -1,6 +0,0 @@
eland.DataFrame.drop
====================
.. currentmodule:: eland
.. automethod:: DataFrame.drop

@ -1,6 +0,0 @@
eland.DataFrame.dtypes
======================
.. currentmodule:: eland
.. autoattribute:: DataFrame.dtypes

@ -1,6 +0,0 @@
eland.DataFrame.empty
=====================
.. currentmodule:: eland
.. autoattribute:: DataFrame.empty

@ -1,6 +0,0 @@
eland.DataFrame.get
===================
.. currentmodule:: eland
.. automethod:: DataFrame.get

@ -1,6 +0,0 @@
eland.DataFrame.head
====================
.. currentmodule:: eland
.. automethod:: DataFrame.head

@ -1,8 +0,0 @@
eland.DataFrame.hist
====================
.. currentmodule:: eland
.. automethod:: DataFrame.hist
.. image:: eland-DataFrame-hist-1.png

@ -1,6 +0,0 @@
eland.DataFrame.index
=====================
.. currentmodule:: eland
.. autoattribute:: DataFrame.index

@ -1,6 +0,0 @@
eland.DataFrame.info
====================
.. currentmodule:: eland
.. automethod:: DataFrame.info

@ -1,6 +0,0 @@
eland.DataFrame.info_es
=======================
.. currentmodule:: eland
.. automethod:: DataFrame.info_es

@ -1,6 +0,0 @@
eland.DataFrame.keys
====================
.. currentmodule:: eland
.. automethod:: DataFrame.keys

@ -1,6 +0,0 @@
eland.DataFrame.max
===================
.. currentmodule:: eland
.. automethod:: DataFrame.max

@ -1,6 +0,0 @@
eland.DataFrame.mean
====================
.. currentmodule:: eland
.. automethod:: DataFrame.mean

@ -1,6 +0,0 @@
eland.DataFrame.min
===================
.. currentmodule:: eland
.. automethod:: DataFrame.min

@ -1,6 +0,0 @@
eland.DataFrame.nunique
=======================
.. currentmodule:: eland
.. automethod:: DataFrame.nunique

@ -1,6 +0,0 @@
eland.DataFrame.query
=====================
.. currentmodule:: eland
.. automethod:: DataFrame.query

@ -1,18 +0,0 @@
eland.DataFrame
================

.. currentmodule:: eland

.. autoclass:: DataFrame

..
   HACK -- the point here is that we don't want this to appear in the output, but the autosummary should still generate the pages.
   .. autosummary::
      :toctree:

      DataFrame.abs
      DataFrame.add
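The HACK comment appears to rely on Sphinx's autosummary stub generator scanning source files for ``autosummary`` directives textually, so entries nested inside an RST comment still get stub pages even though nothing is rendered on this page. That only happens when stub generation is enabled. A minimal ``conf.py`` fragment, assuming the standard ``sphinx.ext.autodoc`` and ``sphinx.ext.autosummary`` extensions (this is a sketch, not the project's actual configuration):

```python
# conf.py fragment (sketch; not the project's actual configuration).
extensions = [
    "sphinx.ext.autodoc",  # automethod, autoattribute and autoclass directives
    "sphinx.ext.autosummary",  # autosummary directive and stub-page generation
]

# Write stub pages for the entries listed in autosummary directives.
autosummary_generate = True
```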

@ -1,6 +0,0 @@
eland.DataFrame.select_dtypes
=============================
.. currentmodule:: eland
.. automethod:: DataFrame.select_dtypes

@ -1,6 +0,0 @@
eland.DataFrame.shape
=====================
.. currentmodule:: eland
.. autoattribute:: DataFrame.shape

@ -1,6 +0,0 @@
eland.DataFrame.sum
===================
.. currentmodule:: eland
.. automethod:: DataFrame.sum

@ -1,6 +0,0 @@
eland.DataFrame.tail
====================
.. currentmodule:: eland
.. automethod:: DataFrame.tail

@ -1,6 +0,0 @@
eland.DataFrame.to_csv
======================
.. currentmodule:: eland
.. automethod:: DataFrame.to_csv

@ -1,6 +0,0 @@
eland.DataFrame.to_html
=======================
.. currentmodule:: eland
.. automethod:: DataFrame.to_html

@ -1,6 +0,0 @@
eland.DataFrame.to_numpy
========================
.. currentmodule:: eland
.. automethod:: DataFrame.to_numpy

@ -1,6 +0,0 @@
eland.DataFrame.to_string
=========================
.. currentmodule:: eland
.. automethod:: DataFrame.to_string

@ -1,6 +0,0 @@
eland.DataFrame.values
======================
.. currentmodule:: eland
.. autoattribute:: DataFrame.values

@ -1,6 +0,0 @@
eland.Index
===========
.. currentmodule:: eland
.. autoclass:: Index

@ -1,6 +0,0 @@
eland.Series.add
================
.. currentmodule:: eland
.. automethod:: Series.add

@ -1,6 +0,0 @@
eland.Series.describe
=====================
.. currentmodule:: eland
.. automethod:: Series.describe

@ -1,6 +0,0 @@
eland.Series.div
================
.. currentmodule:: eland
.. automethod:: Series.div

@ -1,6 +0,0 @@
eland.Series.empty
==================
.. currentmodule:: eland
.. autoattribute:: Series.empty

@ -1,6 +0,0 @@
eland.Series.floordiv
=====================
.. currentmodule:: eland
.. automethod:: Series.floordiv

@ -1,6 +0,0 @@
eland.Series.head
=================
.. currentmodule:: eland
.. automethod:: Series.head

@ -1,8 +0,0 @@
eland.Series.hist
====================
.. currentmodule:: eland
.. automethod:: Series.hist
.. image:: eland-Series-hist-1.png

@ -1,6 +0,0 @@
eland.Series.index
==================
.. currentmodule:: eland
.. autoattribute:: Series.index

@ -1,6 +0,0 @@
eland.Series.info_es
====================
.. currentmodule:: eland
.. automethod:: Series.info_es

@ -1,6 +0,0 @@
eland.Series.max
================
.. currentmodule:: eland
.. automethod:: Series.max

@ -1,6 +0,0 @@
eland.Series.mean
=================
.. currentmodule:: eland
.. automethod:: Series.mean

@ -1,6 +0,0 @@
eland.Series.min
================
.. currentmodule:: eland
.. automethod:: Series.min

@ -1,6 +0,0 @@
eland.Series.mod
================
.. currentmodule:: eland
.. automethod:: Series.mod

@ -1,6 +0,0 @@
eland.Series.mul
================
.. currentmodule:: eland
.. automethod:: Series.mul

@ -1,6 +0,0 @@
eland.Series.name
=================
.. currentmodule:: eland
.. autoattribute:: Series.name

@ -1,6 +0,0 @@
eland.Series.nunique
====================
.. currentmodule:: eland
.. automethod:: Series.nunique

@ -1,6 +0,0 @@
eland.Series.pow
================
.. currentmodule:: eland
.. automethod:: Series.pow

@ -1,6 +0,0 @@
eland.Series.radd
=================
.. currentmodule:: eland
.. automethod:: Series.radd

Some files were not shown because too many files have changed in this diff.