355 Commits

Author SHA1 Message Date
stevedodson
679f8f4170
Minor fixes for readthedocs compatibility. (#106) 2020-01-10 11:02:51 +00:00
stevedodson
c3c2f8a020
Minor updates to README.md + merge fixes (#105) 2020-01-10 09:26:13 +00:00
stevedodson
a3293168a1
Feature/filtered hist (#104)
* Adding python 3.5 compatibility.

Main issue is ordering of dictionaries.

* Updating notebooks with 3.7 results.

* Removing temporary code.

* Defaulting to OrderedDict for python 3.5 + lint all code

All code reformatted by PyCharm and inspection results analysed.

* Adding support for multiple arithmetic operations.

Added new 'arithmetics' file to manage this process.
More tests to be added + cleanup.

* Significant refactor to arithmetics and mappings.

Work in progress. Tests don't pass.

* Major refactor to Mappings.

Field name mappings were stored in different places
(Mappings, QueryCompiler, Operations) and needed to
be kept in sync.

With the addition of complex arithmetic operations
this became complex and difficult to maintain. Therefore,
all field naming is now in 'FieldMappings' which
replaces 'Mappings'.

Note this commit removes the cache for some of the
mapped values and so the code is SIGNIFICANTLY
slower on large indices.

In addition, the addition of date_format to
Mappings has been removed. This again added more
unnecessary complexity.

* Adding OrderedDict for 3.5 compatibility

* Fixes to ordering issues with 3.5

* Adding simple cache for mappings in flatten

Improves performance significantly on large
datasets (>10000 rows).

* Adding updated notebooks (new info_es).

All tests (doc + nbval + pytest) pass.

* Fixing issue with non-zero offset histograms.
2020-01-10 08:17:45 +00:00
stevedodson
903fbf0341
Feature/mapping cache (#103)
* Adding python 3.5 compatibility.

Main issue is ordering of dictionaries.

* Updating notebooks with 3.7 results.

* Removing temporary code.

* Defaulting to OrderedDict for python 3.5 + lint all code

All code reformatted by PyCharm and inspection results analysed.

* Adding support for multiple arithmetic operations.

Added new 'arithmetics' file to manage this process.
More tests to be added + cleanup.

* Significant refactor to arithmetics and mappings.

Work in progress. Tests don't pass.

* Major refactor to Mappings.

Field name mappings were stored in different places
(Mappings, QueryCompiler, Operations) and needed to
be kept in sync.

With the addition of complex arithmetic operations
this became complex and difficult to maintain. Therefore,
all field naming is now in 'FieldMappings' which
replaces 'Mappings'.

Note this commit removes the cache for some of the
mapped values and so the code is SIGNIFICANTLY
slower on large indices.

In addition, the addition of date_format to
Mappings has been removed. This again added more
unnecessary complexity.

* Adding OrderedDict for 3.5 compatibility

* Fixes to ordering issues with 3.5

* Adding simple cache for mappings in flatten

Improves performance significantly on large
datasets (>10000 rows).

* Adding updated notebooks (new info_es).

All tests (doc + nbval + pytest) pass.
2020-01-10 08:12:03 +00:00
stevedodson
efe21a6d87
Feature/arithmetic ops (#102)
* Adding python 3.5 compatibility.

Main issue is ordering of dictionaries.

* Updating notebooks with 3.7 results.

* Removing temporary code.

* Defaulting to OrderedDict for python 3.5 + lint all code

All code reformatted by PyCharm and inspection results analysed.

* Adding support for multiple arithmetic operations.

Added new 'arithmetics' file to manage this process.
More tests to be added + cleanup.

* Significant refactor to arithmetics and mappings.

Work in progress. Tests don't pass.

* Major refactor to Mappings.

Field name mappings were stored in different places
(Mappings, QueryCompiler, Operations) and needed to
be kept in sync.

With the addition of complex arithmetic operations
this became complex and difficult to maintain. Therefore,
all field naming is now in 'FieldMappings' which
replaces 'Mappings'.

Note this commit removes the cache for some of the
mapped values and so the code is SIGNIFICANTLY
slower on large indices.

In addition, the addition of date_format to
Mappings has been removed. This again added more
unnecessary complexity.

* Adding OrderedDict for 3.5 compatibility

* Fixes to ordering issues with 3.5
2020-01-10 08:05:43 +00:00
stevedodson
bdaea4658c
Fixing addition repr test for python 3.5. (#100) 2019-12-12 15:57:52 +01:00
stevedodson
5a3c73ea54
Feature/info es fix (#99)
* Resolving inconsistent __repr__ test on python 3.5

* Fixing layout for info_es + adding Series.hist doc
2019-12-12 14:36:56 +01:00
stevedodson
4bb73215a0
Resolving inconsistent __repr__ test on python 3.5 (#98) 2019-12-12 12:51:29 +01:00
Michael Hirsch
79fdb1727e
Add Support for Series Histograms (#95)
* add support for series plotting
* update docs for series plotting support
* add tests for series plotting
* fix typo
* adds comment to ed_hist_series
2019-12-11 14:51:47 -05:00
stevedodson
c5730e6d38
Feature/python 3.5 (#93)
* Adding python 3.5 compatibility.

Main issue is ordering of dictionaries.

* Updating notebooks with 3.7 results.

* Removing temporary code.

* Defaulting to OrderedDict for python 3.5 + lint all code

All code reformatted by PyCharm and inspection results analysed.
2019-12-11 14:27:35 +01:00
stevedodson
e8a0fbb9f3
Feature/pandas.0.25.3 (#91)
* Added example notebooks + pytest for these notebooks

* Fixed paths

* Fixing link in docs

* Minor update for pandas 0.25.3

* Updates for pandas 0.25.3

* Fixing doc links with pandas 0.25.3 update.

* Reverting overwrite to changes to notebooks.
2019-12-10 16:05:37 +01:00
stevedodson
133b227b93
Added example notebooks + pytest for notebooks (#87)
* Added example notebooks + pytest for these notebooks

* Fixed paths

* Fixing link in docs

* Adding cleaner demo_notebook
2019-12-10 15:27:13 +01:00
stevedodson
206276c5fa
Adding Apache 2 copyright header to all .py files (#86) 2019-12-06 09:44:05 +00:00
stevedodson
f06219f0ec
Feature/refactor tasks (#83)
* Significant refactor of task list in operations.py

Classes based on composite pattern replace tuples for
tasks.

* Addressing review comments for eland/operations.py

* Minor update to review fixes

* Minor fix for some better handling of non-aggregatable fields: https://github.com/elastic/eland/issues/71

* Test for non-aggregatable value_counts

* Refactoring tasks/actions

* Removing debug and fixing doctest
2019-12-06 08:46:43 +00:00
Michael Hirsch
f263e21b8a Better Handling of Non Aggregatable Fields (#85)
* updates ecommerce mapping to include non-aggregatable text field

* updates exists tests and adds new tests for non-aggregatable field

* better handling of non-aggregatable fields

* fixes formatting

* swaps series in assertion

* adds newline
2019-12-06 08:20:09 +00:00
Francesco Vigliaturo
99bfea42b6
Added support for 2 date formats: (#70)
* Adds support for multiple date formats
2019-12-04 17:42:50 +01:00
Stephen Dodson
1423aaad2d Adding minor fixes to last PR 2019-12-03 14:07:05 +00:00
Stephen Dodson
57857277cd Merge remote-tracking branch 'upstream/master' into feature/fix_nested_not_filters 2019-12-03 14:03:03 +00:00
Stephen Dodson
bf6c56878a Correcting license files + fixing bug in filter
LICENSE and NOTICE conform to Elastic policy. Bug in
nested negated filters fixed.

Also, some limited cleanup.
2019-12-03 13:56:49 +00:00
Winterflower
3e82d43351 Merge branch 'pull-request-job' of https://github.com/Winterflower/eland into pull-request-job 2019-12-02 20:32:30 +01:00
Winterflower
10e1adb680 Removes code duplication in test code 2019-12-02 20:31:53 +01:00
Camilla
5a5f38e28a
Merge branch 'master' into pull-request-job 2019-12-02 20:26:10 +01:00
Winterflower
33674088ca Refactors eland tests to accept pre-configured client 2019-12-02 20:23:40 +01:00
Camilla
4b2daa9ddc
Merge pull request #80 from Winterflower/pull-request-job
Adds Pull Request job for eland CI
2019-12-02 18:15:54 +01:00
Winterflower
4e6305c24a Removes stripping of http from url 2019-12-02 17:29:17 +01:00
Winterflower
8aadc33687 Removes stripping of http from url 2019-12-02 17:18:40 +01:00
Winterflower
08357126cb Adds function to increase script.max_compilations_rate to prevent test failures 2019-12-02 13:08:47 +01:00
Camilla
99ae4057f4
Merge pull request #61 from Winterflower/ci-setup
Eland CI Setup
2019-11-29 10:19:06 +01:00
Winterflower
83d0c3de38 Adds helpful statement to know which ES instance you are connecting to 2019-11-27 21:03:47 +01:00
Winterflower
ce477021c1 Strips hostname from extra http if present in string to prevent failures in low level Python socket module 2019-11-27 21:03:03 +01:00
Michael Hirsch
a3dd86075a
String Arithmetics: __add__ ops (#68)
* adds support for __add__ ops for string objects and literals

* adds tests for string arithmetic

* updates comment in numeric field resolution

* adds op_type parameter for numeric_ops
2019-11-27 10:44:17 -05:00
Stephen Dodson
93dadc054c Fixing docstring format 2019-11-26 11:10:18 +00:00
Stephen Dodson
86686ebb18 Reformat and cleanup based on PyCharm 2019-11-26 11:02:46 +00:00
Winterflower
c5b479b4f9 Adding line to read ES hostname from docker env var 2019-11-25 20:41:58 +01:00
Stephen Dodson
9bbe9bbb1c Fixing issue with addition for strings
e.g. df['currency']+1
2019-11-25 16:15:50 +00:00
Stephen Dodson
85422e2023 Adding series __r* docs 2019-11-25 15:49:27 +00:00
Stephen Dodson
b99f25e4ee Adding __r* operations and resolving issues with df.info() 2019-11-25 15:00:02 +00:00
Stephen Dodson
ac8cb302de Updates based on PR review. 2019-11-25 12:43:37 +00:00
Stephen Dodson
e755a2e160 Minor doc fix for Series.to_string 2019-11-22 16:29:51 +00:00
Stephen Dodson
91c811345c Minor updates to docs and doctests 2019-11-22 16:22:16 +00:00
Stephen Dodson
84e23ab5d1 Added Series metric aggs + Series docs
Also, improved Series.to_string()
2019-11-22 15:44:55 +00:00
Stephen Dodson
5d119215f8 Fixing rename and truediv issues
tests pass
TODO - implement additional arithmetic ops
2019-11-21 20:37:54 +00:00
Stephen Dodson
c12bf9357b Series rename and arithmetic initial implementation
Partially implemented, tests fail with this commit.
2019-11-21 15:39:13 +00:00
Stephen Dodson
6564f26245 Adding 'development' section to docs
Adding contributing section based on Elasticsearch/CONTRIBUTING.md
TODO - add testing docs (based on CI)
2019-11-20 10:32:35 +00:00
stevedodson
2a409962ea
Merge pull request #55 from blaklaybul/fix-boolean-index
Instantiates Column as Series with Specified dtype
2019-11-20 08:14:59 +00:00
Michael Hirsch
f1ec6c0d8b fixes UnboundLocalError when displaying empty dataframes 2019-11-19 15:52:03 -05:00
Michael Hirsch
c90602dd65 sets max_rows=1 in case of empty dataframe 2019-11-19 15:13:18 -05:00
Michael Hirsch
9c03d5a0d4 instantiates column as series with specified dtype 2019-11-19 13:13:08 -05:00
Michael Hirsch
9c9ca90c0d
Adds Support for Series.value_counts() (#49)
* adds support for series.value_counts

* adds docs for series.value_counts

* adds tests for series.value_counts

* updates keyerror language

* adds es docs as an external source

* adds parameters for metrics and terms aggs

* adds 2 tests to check for exceptions

* explains the size parameter

* removes print statements from tests

* checks that es_size is a positive integer

* implements assert_series_equal
2019-11-19 11:27:15 -05:00
stevedodson
885a0a4aba
Merge pull request #51 from stevedodson/master
Updating docs + added supported methods doc
2019-11-19 14:09:13 +00:00