47 Commits

Author SHA1 Message Date
stevedodson
efe21a6d87
Feature/arithmetic ops (#102)
* Adding python 3.5 compatibility.

Main issue is ordering of dictionaries.

* Updating notebooks with 3.7 results.

* Removing temporary code.

* Defaulting to OrderedDict for python 3.5 + lint all code

All code reformatted by PyCharm and inspection results analysed.

* Adding support for multiple arithmetic operations.

Added new 'arithmetics' file to manage this process.
More tests to be added + cleanup.

* Significant refactor to arithmetics and mappings.

Work in progress. Tests don't pass.

* Major refactor to Mappings.

Field name mappings were stored in different places
(Mappings, QueryCompiler, Operations) and needed to
be kept in sync.

With the addition of complex arithmetic operations
this became complex and difficult to maintain. Therefore,
all field naming is now in 'FieldMappings' which
replaces 'Mappings'.

Note this commit removes the cache for some of the
mapped values and so the code is SIGNIFICANTLY
slower on large indices.

In addition, the date_format previously added to
Mappings has been removed. It added
unnecessary complexity.

* Adding OrderedDict for 3.5 compatibility

* Fixes to ordering issues with 3.5
2020-01-10 08:05:43 +00:00
stevedodson
5a3c73ea54
Feature/info es fix (#99)
* Resolving inconsistent __repr__ test on python 3.5

* Fixing layout for info_es + adding Series.hist doc
2019-12-12 14:36:56 +01:00
stevedodson
c5730e6d38
Feature/python 3.5 (#93)
* Adding python 3.5 compatibility.

Main issue is ordering of dictionaries.

* Updating notebooks with 3.7 results.

* Removing temporary code.

* Defaulting to OrderedDict for python 3.5 + lint all code

All code reformatted by PyCharm and inspection results analysed.
2019-12-11 14:27:35 +01:00
stevedodson
e8a0fbb9f3
Feature/pandas.0.25.3 (#91)
* Added example notebooks + pytest for these notebooks

* Fixed paths

* Fixing link in docs

* Minor update for pandas 0.25.3

* Updates for pandas 0.25.3

* Fixing doc links with pandas 0.25.3 update.

* Reverting overwrite of changes to notebooks.
2019-12-10 16:05:37 +01:00
stevedodson
133b227b93
Added example notebooks + pytest for notebooks (#87)
* Added example notebooks + pytest for these notebooks

* Fixed paths

* Fixing link in docs

* Adding cleaner demo_notebook
2019-12-10 15:27:13 +01:00
stevedodson
206276c5fa
Adding Apache 2 copyright header to all .py files (#86) 2019-12-06 09:44:05 +00:00
stevedodson
f06219f0ec
Feature/refactor tasks (#83)
* Significant refactor of task list in operations.py

Classes based on composite pattern replace tuples for
tasks.

* Addressing review comments for eland/operations.py

* Minor update to review fixes

* Minor fix for some better handling of non-aggregatable fields: https://github.com/elastic/eland/issues/71

* Test for non-aggregatable value_counts

* Refactoring tasks/actions

* Removing debug and fixing doctest
2019-12-06 08:46:43 +00:00
Stephen Dodson
93dadc054c Fixing docstring format 2019-11-26 11:10:18 +00:00
Stephen Dodson
86686ebb18 Reformat and cleanup based on PyCharm 2019-11-26 11:02:46 +00:00
Stephen Dodson
b99f25e4ee Adding __r* operations and resolving issues with df.info() 2019-11-25 15:00:02 +00:00
Stephen Dodson
ac8cb302de Updates based on PR review. 2019-11-25 12:43:37 +00:00
Stephen Dodson
91c811345c Minor updates to docs and doctests 2019-11-22 16:22:16 +00:00
Stephen Dodson
84e23ab5d1 Added Series metric aggs + Series docs
Also, improved Series.to_string()
2019-11-22 15:44:55 +00:00
Stephen Dodson
5d119215f8 Fixing rename and truediv issues
tests pass
TODO - implement additional arithmetic ops
2019-11-21 20:37:54 +00:00
Stephen Dodson
c12bf9357b Series rename and arithmetic initial implementation
Partially implemented, tests fail with this commit.
2019-11-21 15:39:13 +00:00
Michael Hirsch
f1ec6c0d8b fixes UnboundLocalError when displaying empty dataframes 2019-11-19 15:52:03 -05:00
Michael Hirsch
c90602dd65 sets max_rows=1 in case of empty dataframe 2019-11-19 15:13:18 -05:00
Michael Hirsch
9c9ca90c0d
Adds Support for Series.value_counts() (#49)
* adds support for series.value_counts

* adds docs for series.value_counts

* adds tests for series.value_counts

* updates keyerror language

* adds es docs as an external source

* adds parameters for metrics and terms aggs

* adds 2 tests to check for exceptions

* explains the size parameter

* removes print statements from tests

* checks that es_size is a positive integer

* implements assert_series_equal
2019-11-19 11:27:15 -05:00
Stephen Dodson
9b4fe40305 Updating docs + added supported methods doc 2019-11-19 10:42:23 +00:00
Stephen Dodson
fb2a1fae7b Updated to_string/to_html docs 2019-11-18 15:27:43 +00:00
Stephen Dodson
327f43d912 Fixing issue in to_html/to_string if max_rows is set 2019-11-18 14:47:35 +00:00
Stephen Dodson
d92ed94ef0 Improve to_string/to_html/__repr__/_repr_html_ tests
Added more rigorous tests for string representation
and fixing issue with to_html.
2019-11-18 12:55:23 +00:00
Michael Hirsch
30d307bdaf implements min rows to truncate display for large results 2019-11-15 17:38:46 -05:00
Michael Hirsch
b0be68e1db tabular display: show 10 rows if index is larger than max_rows 2019-11-15 11:10:35 -05:00
Stephen Dodson
f5025b9f39 Renamed ed_to_pd eland_to_pandas and added docs.
+ added some additions to .gitignore
+ removed DataFrame.squeeze for now
2019-11-15 11:21:27 +00:00
Stephen Dodson
5a546577f4 Resolving DataFrame.query issues + more docs 2019-11-14 20:04:38 +00:00
Stephen Dodson
dff49d01fe More doc updates. 2019-11-13 18:23:43 +00:00
Stephen Dodson
e181476dfe First effort at tidying up docs. Still work-in-progress. 2019-11-12 20:26:59 +00:00
Stephen Dodson
8de7a1db7d Resolved minor PyCharm issues 2019-11-05 13:31:10 +00:00
Stephen Dodson
c1ee409a33 Major cleanup - removed modin as dependency
modin removed as a dependency and iloc feature
removed for now - TODO add back in.
2019-11-04 13:13:42 +00:00
Stephen Dodson
9dad8613d3 Fixing tests, and upgrading to pandas 0.25.1 2019-10-18 08:06:07 +00:00
Stephen Dodson
315d4c3287 Resolving some issues with import dependencies 2019-10-08 14:39:24 +00:00
Stephen Dodson
ef289bfe78 Adding partial DataFrame.query support
Only > and == currently implemented for PoC. 'query'
language not supported yet.
2019-08-14 14:44:04 +00:00
Stephen Dodson
49bad292d3 Added DataFrame.to_csv - tests still failing 2019-08-09 07:54:44 +00:00
Stephen Dodson
c6e0c5b92b Adding smaller test and first effort to implement aggs 2019-08-06 14:58:38 +00:00
Stephen Dodson
67b7aee9c9 Adding DataFrame.hist tests and DataFrame.select_dtypes 2019-08-01 12:55:17 +00:00
Stephen Dodson
3435ffac1b Adding first implementation of eland.DataFrame.hist 2019-07-31 09:59:52 +00:00
Stephen Dodson
1fa4d3fbe7 Partial implementation of hist - does not work
Backup push
2019-07-12 15:24:32 +00:00
Stephen Dodson
d71ce9f50c Adding drop + the ability for operations to have a query
Significant refactor - needs cleanup
2019-07-11 10:11:57 +00:00
Stephen Dodson
a73c999290 iloc is (mainly) working. 2019-07-09 10:02:08 +00:00
Stephen Dodson
d0ea715c31 Added test data and additional test cases 2019-07-04 19:25:47 +00:00
Stephen Dodson
15e0c37182 Major refactor. eland is now backed by modin.
First push, still not functional.
2019-07-04 13:00:19 +00:00
Stephen Dodson
5e10b2e818 Checkpoint code before attempting major investigation into using modin 2019-07-03 09:49:58 +00:00
Stephen Dodson
30df901fce Introduction of eland.Series - big refactor
Creation of eland.NDFrame as base class for DataFrame and Series
2019-07-01 18:41:56 +00:00
Stephen Dodson
c4d2683743 Adding eland.Index features 2019-06-28 14:43:20 +00:00
Stephen Dodson
c723633526 Resolving merge issue 2019-06-22 06:55:30 +00:00
Winterflower
52cf04a97f Renaming modules to lowercase 2019-06-18 10:54:26 +02:00