37 Commits

Author SHA1 Message Date
Seth Michael Larson
05a24cbe0b Add isort, rename Nox session to 'format' 2020-10-15 17:11:29 -05:00
P. Sai Vinay
b7c6c26606
Change DataFrame.filter() to preserve the order of items 2020-10-13 10:58:09 -05:00
P. Sai Vinay
66b24f9e8a
Replace MLModel(overwrite) with es_if_exists 2020-08-17 12:10:27 -05:00
Seth Michael Larson
92170c22d9 Add try_sort() to eland.utils
This function was deprecated and removed in Pandas v1.1
2020-08-14 12:55:02 -05:00
Seth Michael Larson
140623283a
Support Series/collections in Series.isin(), add type hints 2020-07-14 11:39:52 -05:00
Seth Michael Larson
ceacf759c3
Add long Apache-2.0 license header to all files 2020-07-08 15:10:43 -05:00
Léonard Binet
5d0df757cf
Add column names to DataFrame.__dir__ for better auto-completion support 2020-07-02 08:49:52 -05:00
Seth Michael Larson
1378544933
Normalize and prune top-level APIs 2020-05-18 14:55:41 -05:00
Seth Michael Larson
7946eb4daa
Add and enforce license headers 2020-04-25 16:26:58 -05:00
Seth Michael Larson
33b4976f9a
Add type hints to base modules 2020-04-24 12:39:13 -05:00
Stephen Dodson
50734f8bd9
Allow user to specify es data types in read_csv and pandas_to_eland (#181)
* Allow user to specify es data types in read_csv and pandas_to_eland

Also, some minor maintenance modifications:

- replaced pandas.util.testing with pandas.testing (required in 1.x)
- updated elasticsearch-py requirements to 7.6+ (to support ML code)

* linting file
2020-04-14 15:04:12 +00:00
Seth Michael Larson
448770df78
Restrict public API, update license header 2020-04-14 07:31:23 -05:00
Seth Michael Larson
064d43b9ef
Remove eland.Client, use Elasticsearch directly 2020-04-06 07:25:25 -05:00
Stephen Dodson
71f2a3f793
Added 'use_pandas_index_for_es_ids' param to pandas_to_eland() 2020-03-31 09:20:47 -05:00
Seth Michael Larson
0c1d7222fe
Drop support for Python 3.5, add Black 2020-03-27 07:56:28 -05:00
stevedodson
2ca538c49d
Feature/show progress (#120)
* Adding show_progress debug option to eland_to_pandas

* Adding show_progress debug option to eland_to_pandas
2020-01-29 12:59:48 +00:00
stevedodson
46b428d59b
Improved read_csv docs + made 'to_eland' params consistent (#114)
* Improved read_csv docs + made 'to_eland' params consistent

Note, will change API.

* Removing additional args from pytest.

doctests + nbval tests in the CI are not addressed by
this PR.
2020-01-16 10:17:49 +00:00
stevedodson
efe21a6d87
Feature/arithmetic ops (#102)
* Adding python 3.5 compatibility.

Main issue is ordering of dictionaries.

* Updating notebooks with 3.7 results.

* Removing temporary code.

* Defaulting to OrderedDict for python 3.5 + lint all code

All code reformatted by PyCharm and inspection results analysed.

* Adding support for multiple arithmetic operations.

Added new 'arithmetics' file to manage this process.
More tests to be added + cleanup.

* Significant refactor to arithmetics and mappings.

Work in progress. Tests don't pass.

* Major refactor to Mappings.

Field name mappings were stored in different places
(Mappings, QueryCompiler, Operations) and needed to
be kept in sync.

With the addition of complex arithmetic operations
this became complex and difficult to maintain. Therefore,
all field naming is now in 'FieldMappings' which
replaces 'Mappings'.

Note this commit removes the cache for some of the
mapped values and so the code is SIGNIFICANTLY
slower on large indices.

In addition, the addition of date_format to
Mappings has been removed. This again added more
unnecessary complexity.

* Adding OrderedDict for 3.5 compatibility

* Fixes to ordering issues with 3.5
2020-01-10 08:05:43 +00:00
stevedodson
c5730e6d38
Feature/python 3.5 (#93)
* Adding python 3.5 compatibility.

Main issue is ordering of dictionaries.

* Updating notebooks with 3.7 results.

* Removing temporary code.

* Defaulting to OrderedDict for python 3.5 + lint all code

All code reformatted by PyCharm and inspection results analysed.
2019-12-11 14:27:35 +01:00
stevedodson
e8a0fbb9f3
Feature/pandas.0.25.3 (#91)
* Added example notebooks + pytest for these notebooks

* Fixed paths

* Fixing link in docs

* Minor update for pandas 0.25.3

* Updates for pandas 0.25.3

* Fixing doc links with pandas 0.25.3 update.

* Reverting overwrite to changes to notebooks.
2019-12-10 16:05:37 +01:00
stevedodson
133b227b93
Added example notebooks + pytest for notebooks (#87)
* Added example notebooks + pytest for these notebooks

* Fixed paths

* Fixing link in docs

* Adding cleaner demo_notebook
2019-12-10 15:27:13 +01:00
stevedodson
206276c5fa
Adding Apache 2 copyright header to all .py files (#86) 2019-12-06 09:44:05 +00:00
Stephen Dodson
86686ebb18 Reformat and cleanup based on PyCharm 2019-11-26 11:02:46 +00:00
Stephen Dodson
9b4fe40305 Updating docs + added supported methods doc 2019-11-19 10:42:23 +00:00
Stephen Dodson
2f4d601932 Adding eland.read_csv
TODO - resolve issue with ordering of eland.DataFrame compared to csv
2019-11-15 15:14:12 +00:00
Stephen Dodson
f5025b9f39 Renamed ed_to_pd to eland_to_pandas and added docs.
+ added some additions to .gitignore
+ removed DataFrame.squeeze for now
2019-11-15 11:21:27 +00:00
Stephen Dodson
dff49d01fe More doc updates. 2019-11-13 18:23:43 +00:00
Stephen Dodson
e181476dfe First effort at tidying up docs. Still work-in-progress. 2019-11-12 20:26:59 +00:00
Stephen Dodson
c1ee409a33 Major cleanup - removed modin as dependency
modin removed as a dependency and iloc feature
removed for now - TODO add back in.
2019-11-04 13:13:42 +00:00
Stephen Dodson
9dad8613d3 Fixing tests, and upgrading to pandas 0.25.1 2019-10-18 08:06:07 +00:00
Stephen Dodson
1fa4d3fbe7 Partial implementation of hist - does not work
Backup push
2019-07-12 15:24:32 +00:00
Stephen Dodson
d71ce9f50c Adding drop + the ability for operations to have a query
Significant refactor - needs cleanup
2019-07-11 10:11:57 +00:00
Stephen Dodson
15e0c37182 Major refactor. eland is now backed by modin.
First push, still not functional.
2019-07-04 13:00:19 +00:00
Stephen Dodson
5e10b2e818 Checkpoint code before attempting major investigation into using modin 2019-07-03 09:49:58 +00:00
Stephen Dodson
9030f84f4c Added __getitem__
Implementation copies DataFrame and changes underlying mappings
object.
2019-06-25 08:41:25 +00:00
Stephen Dodson
2b83edad69 Added json file for pandas comparison
+ renamed from_es to read_es
2019-06-12 12:12:40 +00:00
Stephen Dodson
f1e27f1dda First prototype code commit
Experimental prototype, for internal development use only!
2019-06-12 11:46:20 +00:00