eland/docs/guide/overview.asciidoc
2021-01-12 10:26:01 -06:00


[[overview]]
== Overview
Eland is a Python client and toolkit for DataFrames and {ml} in {es}.

Full documentation is available on https://eland.readthedocs.io[Read the Docs].
Source code is available on https://github.com/elastic/eland[GitHub].

[discrete]
=== Compatibility
The library is compatible with Python 3.6 and later and with all
{es} versions since `7.6.x`, but you **have to use a matching major version**.

The recommended way to set your requirements in your `setup.py` or
`requirements.txt` is:

------------------------------------
# Elasticsearch 7.x
eland>=7,<8
------------------------------------
Because Eland uses some experimental APIs for {ml}, it
is also recommended to install the same major and minor version of `elasticsearch-py`
as your cluster. For example, if your cluster is v7.8.1, you would install
like so:

------------------------------------
$ python -m pip install 'eland>=7,<8' 'elasticsearch>=7.8,<7.9'
------------------------------------
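The pinning rule above can be sketched as a small helper that turns a cluster version string into the two requirement specifiers. This is only an illustration: `pinned_requirements` is a hypothetical name and is not part of Eland or `elasticsearch-py`.

```python
def pinned_requirements(cluster_version):
    """Build pip specifiers from a cluster version such as 'v7.8.1'.

    Hypothetical helper for illustration; not provided by Eland.
    """
    parts = cluster_version.lstrip("v").split(".")
    major, minor = int(parts[0]), int(parts[1])
    return [
        # Eland only needs to match the cluster's major version.
        f"eland>={major},<{major + 1}",
        # elasticsearch-py is pinned to the same major and minor version.
        f"elasticsearch>={major}.{minor},<{major}.{minor + 1}",
    ]


print(" ".join(pinned_requirements("v7.8.1")))
# prints: eland>=7,<8 elasticsearch>=7.8,<7.9
```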
[discrete]
=== Getting Started
Create a `DataFrame` object connected to an {es} cluster running on `localhost:9200`:

[source,python]
------------------------------------
>>> import eland as ed
>>> df = ed.DataFrame(
...     es_client="localhost:9200",
...     es_index_pattern="flights",
... )
>>> df
       AvgTicketPrice  Cancelled  ... dayOfWeek           timestamp
0          841.265642      False  ...         0 2018-01-01 00:00:00
1          882.982662      False  ...         0 2018-01-01 18:27:00
2          190.636904      False  ...         0 2018-01-01 17:11:14
3          181.694216       True  ...         0 2018-01-01 10:33:28
4          730.041778      False  ...         0 2018-01-01 05:13:00
...               ...        ...  ...       ...                 ...
13054     1080.446279      False  ...         6 2018-02-11 20:42:25
13055      646.612941      False  ...         6 2018-02-11 01:41:57
13056      997.751876      False  ...         6 2018-02-11 04:09:27
13057     1102.814465      False  ...         6 2018-02-11 08:28:21
13058      858.144337      False  ...         6 2018-02-11 14:54:34

[13059 rows x 27 columns]
------------------------------------
[discrete]
==== Elastic Cloud
You can also connect Eland to an {es} instance in Elastic Cloud:

[source,python]
------------------------------------
>>> import eland as ed
>>> from elasticsearch import Elasticsearch
>>> # First instantiate an 'Elasticsearch' instance connected to Elastic Cloud
>>> es = Elasticsearch(cloud_id="...", api_key=("...", "..."))
>>> # Then wrap the client in an Eland DataFrame
>>> df = ed.DataFrame(es, es_index_pattern="flights")
>>> df.head(5)
   AvgTicketPrice  Cancelled  ... dayOfWeek           timestamp
0      841.265642      False  ...         0 2018-01-01 00:00:00
1      882.982662      False  ...         0 2018-01-01 18:27:00
2      190.636904      False  ...         0 2018-01-01 17:11:14
3      181.694216       True  ...         0 2018-01-01 10:33:28
4      730.041778      False  ...         0 2018-01-01 05:13:00

[5 rows x 27 columns]
------------------------------------
Eland can be used for complex queries and aggregations:

[source,python]
------------------------------------
>>> df[df.Carrier != "Kibana Airlines"].groupby("Carrier").mean(numeric_only=False)
                  AvgTicketPrice  Cancelled                     timestamp
Carrier
ES-Air                630.235816   0.129814 2018-01-21 20:45:00.200000000
JetBeats              627.457373   0.134698 2018-01-21 14:43:18.112400635
Logstash Airways      624.581974   0.125188 2018-01-21 16:14:50.711798340
------------------------------------
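To make the meaning of the query above concrete, here is a rough pure-Python sketch of the same three steps (filter out one carrier, group by carrier, take the mean) over a handful of made-up rows. The rows and the helper name `mean_by_carrier` are illustrative assumptions, not Eland code or data from the `flights` index. It also shows why the `Cancelled` column comes back as a fraction: averaging booleans gives the share of rows that are `True`.

```python
from collections import defaultdict

# Made-up rows for illustration only; not taken from the flights index.
rows = [
    {"Carrier": "ES-Air", "AvgTicketPrice": 600.0, "Cancelled": False},
    {"Carrier": "ES-Air", "AvgTicketPrice": 700.0, "Cancelled": True},
    {"Carrier": "JetBeats", "AvgTicketPrice": 500.0, "Cancelled": False},
    {"Carrier": "Kibana Airlines", "AvgTicketPrice": 900.0, "Cancelled": False},
]


def mean_by_carrier(rows, excluded="Kibana Airlines"):
    """Filter out one carrier, group the rest by carrier, and average each column."""
    groups = defaultdict(list)
    for row in rows:
        if row["Carrier"] != excluded:  # same role as df.Carrier != "Kibana Airlines"
            groups[row["Carrier"]].append(row)

    result = {}
    for carrier, members in sorted(groups.items()):
        count = len(members)
        result[carrier] = {
            "AvgTicketPrice": sum(r["AvgTicketPrice"] for r in members) / count,
            # True counts as 1 and False as 0, so this is the cancelled fraction.
            "Cancelled": sum(r["Cancelled"] for r in members) / count,
        }
    return result


summary = mean_by_carrier(rows)
print(summary["ES-Air"])
# prints: {'AvgTicketPrice': 650.0, 'Cancelled': 0.5}
```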