119 Commits

Author SHA1 Message Date
stevedodson
fa930b6cea
7.6.0a2 (#130)
* Updating test matrix for 7.6 + removing oss for now.

* Resolving 7.6.0 docs issues

* Updating ML docs

* Bumping version following doc fixes
2020-02-15 20:10:41 +01:00
stevedodson
163d18d84e
Updating ML docs (#129)
* Updating test matrix for 7.6 + removing oss for now.

* Resolving 7.6.0 docs issues

* Updating ML docs
2020-02-15 19:52:04 +01:00
stevedodson
b535e69b92
Updating to 7.6.0a1 (#126) 2020-02-15 16:14:48 +01:00
stevedodson
7c1c2945a7
ML add external models (#125)
* Partial implementation of ml.ExternalModel

* Adding eland.ml.ExternalMLModel

More testing to be added + more support for MLModels
2020-02-15 15:54:29 +01:00
stevedodson
4ac67a73ea
Bumping version (#123) 2020-02-05 09:59:54 +00:00
stevedodson
c5f5d00bb0
Adding support for df['timestamp'].min() etc. (#122)
There is still a difference between pandas/eland in terms
of min/max etc. aggregations as pandas supports this
on strings.
2020-01-30 11:03:37 +00:00
stevedodson
2ca538c49d
Feature/show progress (#120)
* Adding show_progress debug option to eland_to_pandas

* Adding show_progress debug option to eland_to_pandas
2020-01-29 12:59:48 +00:00
stevedodson
409cb043c8
Refactoring of plotting + fixes for multiple charts (#117)
* Refactoring of plotting + fixes for multiple charts

Updates to plotting in line with pandas 0.25.3
Enables plotting of multiple histograms on the
same figure.

* Fix to setup.py to allow submodules

+ reformat of code and better Series.hist docs
2020-01-29 07:07:56 +00:00
stevedodson
46b428d59b
Improved read_csv docs + made 'to_eland' params consistent (#114)
* Improved read_csv docs + made 'to_eland' params consistent

Note, will change API.

* Removing additional args from pytest.

doctests + nbval tests in the CI are not addressed by
this PR.
2020-01-16 10:17:49 +00:00
stevedodson
1914644f93
Improve docs (#113)
* Adding more examples

* Adding more examples to README.md + pypi first page.

* Updated README.md
2020-01-13 15:32:41 +00:00
stevedodson
86c51dc267
Fix licensing headers (#112)
* Minor fixes for readthedocs compatibility.

* Adding doc templates

* Setting first version to 7.5.1
2020-01-13 11:54:43 +00:00
stevedodson
d7207bab3b
7.5.1a2 (#110)
* Updating README.md

* New version

* Fixing description for pypi
2020-01-10 15:40:15 +00:00
stevedodson
00fb775d29
Feature/versioning (#109)
* Minor fixes for readthedocs compatibility.

* Adding doc templates

* Setting first version to 7.5

* Resolving pypi issues + minor docs
2020-01-10 14:38:56 +00:00
stevedodson
f93b893f9d
Setting version number to valid version (#108)
* Minor fixes for readthedocs compatibility.

* Adding doc templates

* Setting first version to 7.5
2020-01-10 11:47:52 +00:00
stevedodson
679f8f4170
Minor fixes for readthedocs compatibility. (#106) 2020-01-10 11:02:51 +00:00
stevedodson
c3c2f8a020
Minor updates to README.md + merge fixes (#105) 2020-01-10 09:26:13 +00:00
stevedodson
a3293168a1
Feature/filtered hist (#104)
* Adding python 3.5 compatibility.

Main issue is ordering of dictionaries.

* Updating notebooks with 3.7 results.

* Removing temporary code.

* Defaulting to OrderedDict for python 3.5 + lint all code

All code reformatted by PyCharm and inspection results analysed.

* Adding support for multiple arithmetic operations.

Added new 'arithmetics' file to manage this process.
More tests to be added + cleanup.

* Significant refactor to arithmetics and mappings.

Work in progress. Tests don't pass.

* Major refactor to Mappings.

Field name mappings were stored in different places
(Mappings, QueryCompiler, Operations) and needed to
be kept in sync.

With the addition of complex arithmetic operations
this became complex and difficult to maintain. Therefore,
all field naming is now in 'FieldMappings' which
replaces 'Mappings'.

Note this commit removes the cache for some of the
mapped values and so the code is SIGNIFICANTLY
slower on large indices.

In addition, the date_format support previously added to
Mappings has been removed, as it too added
unnecessary complexity.

* Adding OrderedDict for 3.5 compatibility

* Fixes to ordering issues with 3.5

* Adding simple cache for mappings in flatten

Improves performance significantly on large
datasets (>10000 rows).

* Adding updated notebooks (new info_es).

All tests (doc + nbval + pytest) pass.

* Fixing issue with non-zero offset histograms.
2020-01-10 08:17:45 +00:00
stevedodson
903fbf0341
Feature/mapping cache (#103)
* Adding python 3.5 compatibility.

Main issue is ordering of dictionaries.

* Updating notebooks with 3.7 results.

* Removing temporary code.

* Defaulting to OrderedDict for python 3.5 + lint all code

All code reformatted by PyCharm and inspection results analysed.

* Adding support for multiple arithmetic operations.

Added new 'arithmetics' file to manage this process.
More tests to be added + cleanup.

* Significant refactor to arithmetics and mappings.

Work in progress. Tests don't pass.

* Major refactor to Mappings.

Field name mappings were stored in different places
(Mappings, QueryCompiler, Operations) and needed to
be kept in sync.

With the addition of complex arithmetic operations
this became complex and difficult to maintain. Therefore,
all field naming is now in 'FieldMappings' which
replaces 'Mappings'.

Note this commit removes the cache for some of the
mapped values and so the code is SIGNIFICANTLY
slower on large indices.

In addition, the date_format support previously added to
Mappings has been removed, as it too added
unnecessary complexity.

* Adding OrderedDict for 3.5 compatibility

* Fixes to ordering issues with 3.5

* Adding simple cache for mappings in flatten

Improves performance significantly on large
datasets (>10000 rows).

* Adding updated notebooks (new info_es).

All tests (doc + nbval + pytest) pass.
2020-01-10 08:12:03 +00:00
stevedodson
efe21a6d87
Feature/arithmetic ops (#102)
* Adding python 3.5 compatibility.

Main issue is ordering of dictionaries.

* Updating notebooks with 3.7 results.

* Removing temporary code.

* Defaulting to OrderedDict for python 3.5 + lint all code

All code reformatted by PyCharm and inspection results analysed.

* Adding support for multiple arithmetic operations.

Added new 'arithmetics' file to manage this process.
More tests to be added + cleanup.

* Significant refactor to arithmetics and mappings.

Work in progress. Tests don't pass.

* Major refactor to Mappings.

Field name mappings were stored in different places
(Mappings, QueryCompiler, Operations) and needed to
be kept in sync.

With the addition of complex arithmetic operations
this became complex and difficult to maintain. Therefore,
all field naming is now in 'FieldMappings' which
replaces 'Mappings'.

Note this commit removes the cache for some of the
mapped values and so the code is SIGNIFICANTLY
slower on large indices.

In addition, the date_format support previously added to
Mappings has been removed, as it too added
unnecessary complexity.

* Adding OrderedDict for 3.5 compatibility

* Fixes to ordering issues with 3.5
2020-01-10 08:05:43 +00:00
stevedodson
bdaea4658c
Fixing addition repr test for python 3.5. (#100) 2019-12-12 15:57:52 +01:00
stevedodson
5a3c73ea54
Feature/info es fix (#99)
* Resolving inconsistent __repr__ test on python 3.5

* Fixing layout for info_es + adding Series.hist doc
2019-12-12 14:36:56 +01:00
stevedodson
4bb73215a0
Resolving inconsistent __repr__ test on python 3.5 (#98) 2019-12-12 12:51:29 +01:00
Michael Hirsch
79fdb1727e
Add Support for Series Histograms (#95)
* add support for series plotting
* update docs for series plotting support
* add tests for series plotting
* fix typo
* adds comment to ed_hist_series
2019-12-11 14:51:47 -05:00
stevedodson
c5730e6d38
Feature/python 3.5 (#93)
* Adding python 3.5 compatibility.

Main issue is ordering of dictionaries.

* Updating notebooks with 3.7 results.

* Removing temporary code.

* Defaulting to OrderedDict for python 3.5 + lint all code

All code reformatted by PyCharm and inspection results analysed.
2019-12-11 14:27:35 +01:00
stevedodson
e8a0fbb9f3
Feature/pandas.0.25.3 (#91)
* Added example notebooks + pytest for these notebooks

* Fixed paths

* Fixing link in docs

* Minor update for pandas 0.25.3

* Updates for pandas 0.25.3

* Fixing doc links with pandas 0.25.3 update.

* Reverting overwrite to changes to notebooks.
2019-12-10 16:05:37 +01:00
stevedodson
133b227b93
Added example notebooks + pytest for notebooks (#87)
* Added example notebooks + pytest for these notebooks

* Fixed paths

* Fixing link in docs

* Adding cleaner demo_notebook
2019-12-10 15:27:13 +01:00
stevedodson
206276c5fa
Adding Apache 2 copyright header to all .py files (#86) 2019-12-06 09:44:05 +00:00
stevedodson
f06219f0ec
Feature/refactor tasks (#83)
* Significant refactor of task list in operations.py

Classes based on composite pattern replace tuples for
tasks.

* Addressing review comments for eland/operations.py

* Minor update to review fixes

* Minor fix for some better handling of non-aggregatable fields: https://github.com/elastic/eland/issues/71

* Test for non-aggregatable value_counts

* Refactoring tasks/actions

* Removing debug and fixing doctest
2019-12-06 08:46:43 +00:00
Michael Hirsch
f263e21b8a Better Handling of Non Aggregatable Fields (#85)
* updates ecommerce mapping to include non-aggregatable text field

* updates exists tests and adds new tests for non-aggregatable field

* better handling of non-aggregatable fields

* fixes formatting

* swaps series in assertion

* adds newline
2019-12-06 08:20:09 +00:00
Francesco Vigliaturo
99bfea42b6
Added support for 2 date formats: (#70)
* Adds support for multiple date formats
2019-12-04 17:42:50 +01:00
Stephen Dodson
1423aaad2d Adding minor fixes to last PR 2019-12-03 14:07:05 +00:00
Stephen Dodson
57857277cd Merge remote-tracking branch 'upstream/master' into feature/fix_nested_not_filters 2019-12-03 14:03:03 +00:00
Stephen Dodson
bf6c56878a Correcting license files + fixing bug in filter
LICENSE and NOTICE conform to Elastic policy. Bug in
nested negated filters fixed.

Also, some limited cleanup.
2019-12-03 13:56:49 +00:00
Winterflower
3e82d43351 Merge branch 'pull-request-job' of https://github.com/Winterflower/eland into pull-request-job 2019-12-02 20:32:30 +01:00
Winterflower
10e1adb680 Removes code duplication in test code 2019-12-02 20:31:53 +01:00
Camilla
5a5f38e28a
Merge branch 'master' into pull-request-job 2019-12-02 20:26:10 +01:00
Winterflower
33674088ca Refactors eland tests to accept pre-configured client 2019-12-02 20:23:40 +01:00
Camilla
4b2daa9ddc
Merge pull request #80 from Winterflower/pull-request-job
Adds Pull Request job for eland CI
2019-12-02 18:15:54 +01:00
Winterflower
4e6305c24a Removes stripping of http from url 2019-12-02 17:29:17 +01:00
Winterflower
8aadc33687 Removes stripping of http from url 2019-12-02 17:18:40 +01:00
Winterflower
08357126cb Adds function to increase script.max_compilations_rate to prevent test failures 2019-12-02 13:08:47 +01:00
Camilla
99ae4057f4
Merge pull request #61 from Winterflower/ci-setup
Eland CI Setup
2019-11-29 10:19:06 +01:00
Winterflower
83d0c3de38 Adds helpful statement to know which ES instance you are connecting to 2019-11-27 21:03:47 +01:00
Winterflower
ce477021c1 Strips hostname from extra http if present in string to prevent failures in low level Python socket module 2019-11-27 21:03:03 +01:00
Michael Hirsch
a3dd86075a
String Arithmetics: __add__ ops (#68)
* adds support for __add__ ops for string objects and literals

* adds tests for string arithmetic

* updates comment in numeric field resolution

* adds op_type parameter for numeric_ops
2019-11-27 10:44:17 -05:00
Stephen Dodson
93dadc054c Fixing docstring format 2019-11-26 11:10:18 +00:00
Stephen Dodson
86686ebb18 Reformat and cleanup based on PyCharm 2019-11-26 11:02:46 +00:00
Winterflower
c5b479b4f9 Adding line to read ES hostname from docker env var 2019-11-25 20:41:58 +01:00
Stephen Dodson
9bbe9bbb1c Fixing issue with addition for strings
e.g. df['currency']+1
2019-11-25 16:15:50 +00:00
Stephen Dodson
85422e2023 Adding series __r* docs 2019-11-25 15:49:27 +00:00