mirror of
https://github.com/elastic/eland.git
synced 2025-07-11 00:02:14 +08:00
89 lines
3.3 KiB
Plaintext
89 lines
3.3 KiB
Plaintext
[[overview]]
|
|
== Overview
|
|
|
|
Eland is a Python client and toolkit for DataFrames and {ml} in {es}.
|
|
Full documentation is available on https://eland.readthedocs.io[Read the Docs].
|
|
Source code is available on https://github.com/elastic/eland[GitHub].
|
|
|
|
[discrete]
|
|
=== Compatibility
|
|
|
|
- Supports Python 3.7+ and Pandas 1.3
|
|
- Supports {es} clusters that are 7.11+, recommended 7.14 or later for all features to work.
|
|
Make sure your Eland major version matches the major version of your Elasticsearch cluster.
|
|
|
|
The recommended way to set your requirements in your `setup.py` or
|
|
`requirements.txt` is::
|
|
|
|
# Elasticsearch 8.x
|
|
eland>=8,<9
|
|
|
|
# Elasticsearch 7.x
|
|
eland>=7,<8
|
|
|
|
[discrete]
|
|
=== Getting Started
|
|
|
|
Create a `DataFrame` object connected to an {es} cluster running on `http://localhost:9200`:
|
|
|
|
[source,python]
|
|
------------------------------------
|
|
>>> import eland as ed
|
|
>>> df = ed.DataFrame(
|
|
... es_client="http://localhost:9200",
|
|
... es_index_pattern="flights",
|
|
... )
|
|
>>> df
|
|
AvgTicketPrice Cancelled ... dayOfWeek timestamp
|
|
0 841.265642 False ... 0 2018-01-01 00:00:00
|
|
1 882.982662 False ... 0 2018-01-01 18:27:00
|
|
2 190.636904 False ... 0 2018-01-01 17:11:14
|
|
3 181.694216 True ... 0 2018-01-01 10:33:28
|
|
4 730.041778 False ... 0 2018-01-01 05:13:00
|
|
... ... ... ... ... ...
|
|
13054 1080.446279 False ... 6 2018-02-11 20:42:25
|
|
13055 646.612941 False ... 6 2018-02-11 01:41:57
|
|
13056 997.751876 False ... 6 2018-02-11 04:09:27
|
|
13057 1102.814465 False ... 6 2018-02-11 08:28:21
|
|
13058 858.144337 False ... 6 2018-02-11 14:54:34
|
|
|
|
[13059 rows x 27 columns]
|
|
------------------------------------
|
|
|
|
[discrete]
|
|
==== Elastic Cloud
|
|
|
|
You can also connect Eland to an Elasticsearch instance in Elastic Cloud:
|
|
|
|
[source,python]
|
|
------------------------------------
|
|
>>> import eland as ed
|
|
>>> from elasticsearch import Elasticsearch
|
|
|
|
# First instantiate an 'Elasticsearch' instance connected to Elastic Cloud
|
|
>>> es = Elasticsearch(cloud_id="...", api_key=("...", "..."))
|
|
|
|
# then wrap the client in an Eland DataFrame:
|
|
>>> df = ed.DataFrame(es, es_index_pattern="flights")
|
|
>>> df.head(5)
|
|
AvgTicketPrice Cancelled ... dayOfWeek timestamp
|
|
0 841.265642 False ... 0 2018-01-01 00:00:00
|
|
1 882.982662 False ... 0 2018-01-01 18:27:00
|
|
2 190.636904 False ... 0 2018-01-01 17:11:14
|
|
3 181.694216 True ... 0 2018-01-01 10:33:28
|
|
4 730.041778 False ... 0 2018-01-01 05:13:00
|
|
[5 rows x 27 columns]
|
|
------------------------------------
|
|
|
|
Eland can be used for complex queries and aggregations:
|
|
|
|
[source,python]
|
|
------------------------------------
|
|
>>> df[df.Carrier != "Kibana Airlines"].groupby("Carrier").mean(numeric_only=False)
|
|
AvgTicketPrice Cancelled timestamp
|
|
Carrier
|
|
ES-Air 630.235816 0.129814 2018-01-21 20:45:00.200000000
|
|
JetBeats 627.457373 0.134698 2018-01-21 14:43:18.112400635
|
|
Logstash Airways 624.581974 0.125188 2018-01-21 16:14:50.711798340
|
|
------------------------------------
|