eland/docs/guide/overview.asciidoc
2021-12-09 15:01:26 -06:00

85 lines
3.2 KiB
Plaintext

[[overview]]
== Overview
Eland is a Python client and toolkit for DataFrames and {ml} in {es}.
Full documentation is available on https://eland.readthedocs.io[Read the Docs].
Source code is available on https://github.com/elastic/eland[GitHub].
[discrete]
=== Compatibility
- Supports Python 3.7+ and Pandas 1.3
- Supports {es} clusters that are 7.11+, recommended 7.14 or later for all features to work.
The recommended way to set your requirements in your `setup.py` or
`requirements.txt` is::
# Elasticsearch 7.x
eland>=7,<8
[discrete]
=== Getting Started
Create a `DataFrame` object connected to an {es} cluster running on `http://localhost:9200`:
[source,python]
------------------------------------
>>> import eland as ed
>>> df = ed.DataFrame(
... es_client="http://localhost:9200",
... es_index_pattern="flights",
... )
>>> df
AvgTicketPrice Cancelled ... dayOfWeek timestamp
0 841.265642 False ... 0 2018-01-01 00:00:00
1 882.982662 False ... 0 2018-01-01 18:27:00
2 190.636904 False ... 0 2018-01-01 17:11:14
3 181.694216 True ... 0 2018-01-01 10:33:28
4 730.041778 False ... 0 2018-01-01 05:13:00
... ... ... ... ... ...
13054 1080.446279 False ... 6 2018-02-11 20:42:25
13055 646.612941 False ... 6 2018-02-11 01:41:57
13056 997.751876 False ... 6 2018-02-11 04:09:27
13057 1102.814465 False ... 6 2018-02-11 08:28:21
13058 858.144337 False ... 6 2018-02-11 14:54:34
[13059 rows x 27 columns]
------------------------------------
[discrete]
==== Elastic Cloud
You can also connect Eland to an Elasticsearch instance in Elastic Cloud:
[source,python]
------------------------------------
>>> import eland as ed
>>> from elasticsearch import Elasticsearch
# First instantiate an 'Elasticsearch' instance connected to Elastic Cloud
>>> es = Elasticsearch(cloud_id="...", api_key=("...", "..."))
# then wrap the client in an Eland DataFrame:
>>> df = ed.DataFrame(es, es_index_pattern="flights")
>>> df.head(5)
AvgTicketPrice Cancelled ... dayOfWeek timestamp
0 841.265642 False ... 0 2018-01-01 00:00:00
1 882.982662 False ... 0 2018-01-01 18:27:00
2 190.636904 False ... 0 2018-01-01 17:11:14
3 181.694216 True ... 0 2018-01-01 10:33:28
4 730.041778 False ... 0 2018-01-01 05:13:00
[5 rows x 27 columns]
------------------------------------
Eland can be used for complex queries and aggregations:
[source,python]
------------------------------------
>>> df[df.Carrier != "Kibana Airlines"].groupby("Carrier").mean(numeric_only=False)
AvgTicketPrice Cancelled timestamp
Carrier
ES-Air 630.235816 0.129814 2018-01-21 20:45:00.200000000
JetBeats 627.457373 0.134698 2018-01-21 14:43:18.112400635
Logstash Airways 624.581974 0.125188 2018-01-21 16:14:50.711798340
------------------------------------