487 Commits

Author SHA1 Message Date
Seth Michael Larson
38251ddf08
No spaces in delimiters for serialized ML model 2020-04-02 07:40:51 -05:00
Stephen Dodson
71f2a3f793
Added 'use_pandas_index_for_es_ids' param to pandas_to_eland() 2020-03-31 09:20:47 -05:00
Daniel Mesejo-León
03582b9f5e
Import __version__ and other metadata by name 2020-03-30 07:45:04 -05:00
Seth Michael Larson
790e2b0de8
Update README with supported versions, pandas v1 outputs 2020-03-27 13:13:50 -05:00
Daniel Mesejo-León
e27a508c59
Update supported Pandas to v1.0 2020-03-27 12:21:15 -05:00
Seth Michael Larson
0c1d7222fe
Drop support for Python 3.5, add Black 2020-03-27 07:56:28 -05:00
Stephen Dodson
9e2997c00d
Bug/is scripted error (#149)
* Updating test matrix for 7.6 + removing oss for now.

* Resolving 7.6.0 docs issues

* Updating ML docs

* Minor mod to support 6.x style indices.

Currently, there is no specific test for this as
it requires a 6.x cluster. 6.x is not officially
supported by 7.x clients, but this is a friendly
option for users.

* Adding unittest for FieldMappings._extract_fields_from_mapping

* Changing to f-string formatting and adding exception test

* Reverting to OrderedDict

Will change after https://github.com/elastic/eland/pull/150 is merged.
2020-03-26 15:17:10 +00:00
Seth Michael Larson
2e74a56c0a
Release v7.6.0a4 7.6.0a4 2020-03-23 08:43:59 -05:00
Seth Michael Larson
e9a5180dac
Add python_requires to setup.py 2020-03-23 08:35:07 -05:00
Stephen Dodson
9fffbc4f39
Update README.md 2020-03-13 09:19:05 +00:00
Stephen Dodson
2c29e28a2f
Updating logo 2020-03-13 09:17:56 +00:00
Stephen Dodson
43e4d03b39
Too long frame exception2 (#137)
* Updating test matrix for 7.6 + removing oss for now.

* Resolving 7.6.0 docs issues

* Updating ML docs

* Fixing too_long_frame_exception in scan/scroll
2020-02-28 12:49:59 +00:00
Stephen Dodson
a33ff45ebc
Too long frame exception fixes (#135)
* Updating test matrix for 7.6 + removing oss for now.

* Resolving 7.6.0 docs issues

* Updating ML docs

* Resolving too_long_frame_exception on large mappings

- Embedded _source parameters in body rather than url
- Fixed bug in DataFrame.info on empty DataFrame
- Removed warning from TestImportedMLModel
2020-02-26 12:50:14 +00:00
Stephen Dodson
206677818f
Fixes to enforce xgboost==0.90
Issue raised to upgrade xgboost version
2020-02-24 09:20:36 +00:00
stevedodson
62b3133eae
7.6.0a3 (#132)
* Updating test matrix for 7.6 + removing oss for now.

* Resolving 7.6.0 docs issues

* Updating ML docs

* Bumping version following doc fixes

* Change ExternalMLModel to ImportedMLModel

* Bumping version
7.6.0a3
2020-02-15 20:33:33 +01:00
stevedodson
1a90e9232e
7.6.0a3 (#131)
* Updating test matrix for 7.6 + removing oss for now.

* Resolving 7.6.0 docs issues

* Updating ML docs

* Bumping version following doc fixes

* Change ExternalMLModel to ImportedMLModel
2020-02-15 20:29:03 +01:00
stevedodson
fa930b6cea
7.6.0a2 (#130)
* Updating test matrix for 7.6 + removing oss for now.

* Resolving 7.6.0 docs issues

* Updating ML docs

* Bumping version following doc fixes
7.6.0a2
2020-02-15 20:10:41 +01:00
stevedodson
163d18d84e
Updating ML docs (#129)
* Updating test matrix for 7.6 + removing oss for now.

* Resolving 7.6.0 docs issues

* Updating ML docs
2020-02-15 19:52:04 +01:00
stevedodson
1cfcd0ab2b
Resolving docs issues (#128)
* Updating test matrix for 7.6 + removing oss for now.

* Resolving 7.6.0 docs issues
2020-02-15 19:37:41 +01:00
stevedodson
404e658a26
Updating test matrix for 7.6 + removing oss for now. (#127) 2020-02-15 18:48:17 +01:00
stevedodson
b535e69b92
Updating to 7.6.0a1 (#126) 7.6.0a1 2020-02-15 16:14:48 +01:00
stevedodson
7c1c2945a7
ML add external models (#125)
* Partial implementation of ml.ExternalModel

* Adding eland.ml.ExternalMLModel

More testing to be added + more support for MLModels
2020-02-15 15:54:29 +01:00
stevedodson
4ac67a73ea
Bumping version (#123) 7.5.1a4 2020-02-05 09:59:54 +00:00
stevedodson
c5f5d00bb0
Adding support for df['timestamp'].min() etc. (#122)
There is still a difference between pandas/eland in terms
of min/max etc. aggregations as pandas supports this
on strings.
2020-01-30 11:03:37 +00:00
stevedodson
2ca538c49d
Feature/show progress (#120)
* Adding show_progress debug option to eland_to_pandas
2020-01-29 12:59:48 +00:00
stevedodson
409cb043c8
Refactoring of plotting + fixes for multiple charts (#117)
* Refactoring of plotting + fixes for multiple charts

Updates to plotting in line with pandas 0.25.3
Enables plotting of multiple histograms on the
same figure.

* Fix to setup.py to allow submodules

+ reformat of code and better Series.hist docs
2020-01-29 07:07:56 +00:00
stevedodson
46b428d59b
Improved read_csv docs + made 'to_eland' params consistent (#114)
* Improved read_csv docs + made 'to_eland' params consistent

Note, will change API.

* Removing additional args from pytest.

doctests + nbval tests in the CI are not addressed by
this PR.
2020-01-16 10:17:49 +00:00
stevedodson
1914644f93
Improve docs (#113)
* Adding more examples

* Adding more examples to README.md + pypi first page.

* Updated README.md
7.5.1a3
2020-01-13 15:32:41 +00:00
stevedodson
86c51dc267
Fix licensing headers (#112)
* Minor fixes for readthedocs compatibility.

* Adding doc templates

* Setting first version to 7.5.1
2020-01-13 11:54:43 +00:00
stevedodson
db3bb02335
Rename LICENSE to LICENSE.txt 2020-01-13 11:42:20 +00:00
stevedodson
277a52a242
Update LICENSE 2020-01-13 11:41:43 +00:00
stevedodson
2f87ca5901
Delete LICENSE.txt (#111)
* Delete LICENSE.txt

* Create LICENSE
2020-01-13 11:26:11 +00:00
stevedodson
5995e11bfd
Update README.md 2020-01-13 10:22:42 +00:00
stevedodson
a4736150f6
Update README.md 2020-01-13 09:01:34 +00:00
stevedodson
d7207bab3b
7.5.1a2 (#110)
* Updating README.md

* New version

* Fixing description for pypi
7.5.1a2
2020-01-10 15:40:15 +00:00
stevedodson
00fb775d29
Feature/versioning (#109)
* Minor fixes for readthedocs compatibility.

* Adding doc templates

* Setting first version to 7.5

* Resolving pypi issues + minor docs
7.5.1a1
2020-01-10 14:38:56 +00:00
stevedodson
f93b893f9d
Setting version number to valid version (#108)
* Minor fixes for readthedocs compatibility.

* Adding doc templates

* Setting first version to 7.5
2020-01-10 11:47:52 +00:00
stevedodson
1c772d0e50
More readthedocs fixes. (#107)
* Minor fixes for readthedocs compatibility.

* Adding doc templates
2020-01-10 11:33:51 +00:00
stevedodson
1d273ae465
Update README.md 2020-01-10 11:13:29 +00:00
stevedodson
679f8f4170
Minor fixes for readthedocs compatibility. (#106) 2020-01-10 11:02:51 +00:00
stevedodson
c3c2f8a020
Minor updates to README.md + merge fixes (#105) 2020-01-10 09:26:13 +00:00
stevedodson
a3293168a1
Feature/filtered hist (#104)
* Adding python 3.5 compatibility.

Main issue is ordering of dictionaries.

* Updating notebooks with 3.7 results.

* Removing temporary code.

* Defaulting to OrderedDict for python 3.5 + lint all code

All code reformatted by PyCharm and inspection results analysed.

* Adding support for multiple arithmetic operations.

Added new 'arithmetics' file to manage this process.
More tests to be added + cleanup.

* Significant refactor to arithmetics and mappings.

Work in progress. Tests don't pass.

* Major refactor to Mappings.

Field name mappings were stored in different places
(Mappings, QueryCompiler, Operations) and needed to
be kept in sync.

With the addition of complex arithmetic operations
this became complex and difficult to maintain. Therefore,
all field naming is now in 'FieldMappings' which
replaces 'Mappings'.

Note this commit removes the cache for some of the
mapped values and so the code is SIGNIFICANTLY
slower on large indices.

In addition, the date_format support previously added
to Mappings has been removed. It added more
unnecessary complexity.

* Adding OrderedDict for 3.5 compatibility

* Fixes to ordering issues with 3.5

* Adding simple cache for mappings in flatten

Improves performance significantly on large
datasets (>10000 rows).

* Adding updated notebooks (new info_es).

All tests (doc + nbval + pytest) pass.

* Fixing issue with non-zero offset histograms.
2020-01-10 08:17:45 +00:00
stevedodson
903fbf0341
Feature/mapping cache (#103)
* Adding python 3.5 compatibility.

Main issue is ordering of dictionaries.

* Updating notebooks with 3.7 results.

* Removing temporary code.

* Defaulting to OrderedDict for python 3.5 + lint all code

All code reformatted by PyCharm and inspection results analysed.

* Adding support for multiple arithmetic operations.

Added new 'arithmetics' file to manage this process.
More tests to be added + cleanup.

* Significant refactor to arithmetics and mappings.

Work in progress. Tests don't pass.

* Major refactor to Mappings.

Field name mappings were stored in different places
(Mappings, QueryCompiler, Operations) and needed to
be kept in sync.

With the addition of complex arithmetic operations
this became complex and difficult to maintain. Therefore,
all field naming is now in 'FieldMappings' which
replaces 'Mappings'.

Note this commit removes the cache for some of the
mapped values and so the code is SIGNIFICANTLY
slower on large indices.

In addition, the date_format support previously added
to Mappings has been removed. It added more
unnecessary complexity.

* Adding OrderedDict for 3.5 compatibility

* Fixes to ordering issues with 3.5

* Adding simple cache for mappings in flatten

Improves performance significantly on large
datasets (>10000 rows).

* Adding updated notebooks (new info_es).

All tests (doc + nbval + pytest) pass.
2020-01-10 08:12:03 +00:00
stevedodson
efe21a6d87
Feature/arithmetic ops (#102)
* Adding python 3.5 compatibility.

Main issue is ordering of dictionaries.

* Updating notebooks with 3.7 results.

* Removing temporary code.

* Defaulting to OrderedDict for python 3.5 + lint all code

All code reformatted by PyCharm and inspection results analysed.

* Adding support for multiple arithmetic operations.

Added new 'arithmetics' file to manage this process.
More tests to be added + cleanup.

* Significant refactor to arithmetics and mappings.

Work in progress. Tests don't pass.

* Major refactor to Mappings.

Field name mappings were stored in different places
(Mappings, QueryCompiler, Operations) and needed to
be kept in sync.

With the addition of complex arithmetic operations
this became complex and difficult to maintain. Therefore,
all field naming is now in 'FieldMappings' which
replaces 'Mappings'.

Note this commit removes the cache for some of the
mapped values and so the code is SIGNIFICANTLY
slower on large indices.

In addition, the date_format support previously added
to Mappings has been removed. It added more
unnecessary complexity.

* Adding OrderedDict for 3.5 compatibility

* Fixes to ordering issues with 3.5
2020-01-10 08:05:43 +00:00
Martijn Laarman
617583183f
Move to latest .ci script structure (#101)
Introduces a dedicated `run-repository.sh` for the repository custom
bits.

This makes it easier to keep `run-elasticsearch.sh` and `run-tests`
in sync through file copying or patches.
2020-01-09 11:18:56 +01:00
stevedodson
bdaea4658c
Fixing addition repr test for python 3.5. (#100) 2019-12-12 15:57:52 +01:00
Camilla
a5380813a7
Adds Python 3.8 support (#96)
* Adds build status sticker to README

* Adds Python version to test matrix

* Adds debug echo message

* Adds back Python 3.5.3 to test matrix

* Adds Python version to test matrix

* Adds back Python 3.5.3 to test matrix

* Adds Python 3.8 to test matrix
2019-12-12 14:57:52 +01:00
stevedodson
5a3c73ea54
Feature/info es fix (#99)
* Resolving inconsistent __repr__ test on python 3.5

* Fixing layout for info_es + adding Series.hist doc
2019-12-12 14:36:56 +01:00
stevedodson
4bb73215a0
Resolving inconsistent __repr__ test on python 3.5 (#98) 2019-12-12 12:51:29 +01:00
Michael Hirsch
79fdb1727e
Add Support for Series Histograms (#95)
* add support for series plotting
* update docs for series plotting support
* add tests for series plotting
* fix typo
* adds comment to ed_hist_series
2019-12-11 14:51:47 -05:00