This adds more definite types for our NLP tasks and tokenization configurations.
This is the first step in allowing users to more easily import their own transformer models from sources other than Hugging Face.
## Changes
### Better logging
Switched from `print` statements to `logging` for cleaner and more informative output: timestamps and log levels are now shown. Logging is a bit more verbose, but it will help users better understand what the script is doing.
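For illustration, a minimal sketch of this kind of setup (the format string and logger name are assumptions, not the script's verbatim configuration):

```python
import logging

# Show a timestamp and the log level on every line; the exact format
# string here is an assumption, not the script's actual configuration.
logging.basicConfig(
    format="%(asctime)s %(levelname)s : %(message)s",
    level=logging.INFO,
)
logger = logging.getLogger("eland_import_hub_model")

logger.info("Connecting to Elasticsearch")
```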
### Add support for ES authentication using username/password or API key
Instead of being limited to passing credentials in the URL, there are now two additional methods:
- username/password using `--es-username` and `--es-password`
- API key using `--es-api-key`
Credentials can also be specified as environment variables with `ES_USERNAME`/`ES_PASSWORD` or `ES_API_KEY`.
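A sketch of the resolution order this implies, with CLI flags taking precedence over environment variables (the helper name and the choice to prefer the API key are assumptions, not eland's actual implementation):

```python
import os

def resolve_es_credentials(args):
    """Hypothetical helper: resolve credentials from flags, then env vars."""
    username = args.es_username or os.environ.get("ES_USERNAME")
    password = args.es_password or os.environ.get("ES_PASSWORD")
    api_key = args.es_api_key or os.environ.get("ES_API_KEY")

    if api_key:
        # API key authentication
        return {"api_key": api_key}
    if username and password:
        # Basic authentication with username/password
        return {"http_auth": (username, password)}
    # Fall back to credentials embedded in the URL, if any
    return {}
```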
### Graceful handling of missing PyTorch requirements
Using the `eland_import_hub_model` script requires the PyTorch extras to be installed. If the user does not have the required packages installed, a helpful message is logged with a hint to install `eland[pytorch]` with `pip`.
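A sketch of such an import guard, assuming `eland.ml.pytorch.PyTorchModel` is the import that needs the extras (the exact log wording is an assumption):

```python
import logging
import sys

logger = logging.getLogger(__name__)

try:
    from eland.ml.pytorch import PyTorchModel  # needs the PyTorch extras
except ModuleNotFoundError as exc:
    # Log a readable hint instead of letting the ImportError propagate.
    logger.error(
        "Failed to import module '%s'. Please install the PyTorch extras "
        "first: python -m pip install 'eland[pytorch]'",
        exc.name,
    )
    sys.exit(1)
```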
### Graceful handling of already existing trained model
If a trained model with the same ID as the one we're trying to import already exists, and `--clear-previous` was not specified, we now log a clearer message about why the script can't proceed, along with a hint to use the `--clear-previous` flag.
Prior to this change, we let the API exception seep through, and the user was faced with a stack trace.
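A sketch of such a pre-flight check, assuming the trained models API is used to detect the conflict (the helper name and log wording are assumptions):

```python
import logging
import sys

from elasticsearch import Elasticsearch

logger = logging.getLogger(__name__)

def abort_if_model_exists(es: Elasticsearch, model_id: str, clear_previous: bool):
    """Hypothetical helper: fail fast with a hint instead of a stack trace."""
    resp = es.ml.get_trained_models(model_id=model_id, allow_no_match=True)
    if resp["count"] > 0 and not clear_previous:
        logger.error(
            "Trained model with id '%s' already exists. Use the "
            "--clear-previous flag if you want to overwrite it.",
            model_id,
        )
        sys.exit(1)
```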
### `tqdm` added to main dependencies
If the user doesn't have the `eland[pytorch]` extras installed, the first module reported as missing is `tqdm`. Since this module is [used directly in the eland codebase](8294224e34/eland/ml/pytorch/_pytorch_model.py (L24)), it makes sense to me to have it as part of the main set of requirements.
### Nit: Set tqdm unit to `parts` in `_pytorch_model.put_model`
The default unit is `it`, but `parts` better describes what the progress bar is tracking: uploading trained model definition parts.
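In tqdm terms the change boils down to passing `unit="parts"`; the loop below is illustrative, with placeholder chunks standing in for the real model definition parts:

```python
from tqdm.auto import tqdm

# Placeholder for the model's definition chunks; in eland these come
# from splitting the traced model into parts for upload.
model_parts = [b"chunk-0", b"chunk-1", b"chunk-2"]

# With unit="parts" the bar reads "3/3 parts" instead of the default "3/3 it".
for part in tqdm(model_parts, unit="parts", desc="Uploading model definition"):
    pass  # upload of each part would happen here
```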
We added the `pytorch` module, which is type checked, but it was not listed as such in the noxfile. This change also addresses type errors that arose after adding type checking.
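A sketch of what including the module in a nox type-checking session looks like (the session name, mypy flags, and path are assumptions):

```python
import nox

@nox.session(reuse_venv=True)
def type_check(session):
    # Install and run mypy over the newly type-checked pytorch module.
    session.install("mypy")
    session.run("mypy", "eland/ml/pytorch/")
```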
The padding parameter needs to be set on the tokenization call, not in the constructor. Furthermore, the value `True` only pads to the largest input in a batch; since we don't trace with batches, this value had no effect. The proper place to pass this parameter is the tokenization call itself, and the proper value to use is `"max_length"`, which pads the input to the maximum input size specified by the model. Although we measured no functional or performance impact from this setting, it has been suggested that this is a best practice.
See: https://huggingface.co/transformers/serialization.html#dummy-inputs-and-standard-lengths
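A sketch of the difference using the Hugging Face `transformers` API (the model name is an arbitrary example):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# padding=True would only pad to the longest input in a batch, which is
# a no-op when tracing with a single example. padding="max_length" pads
# up to the model's maximum input size instead.
inputs = tokenizer(
    "This is a dummy input for tracing",
    padding="max_length",
    truncation=True,
    return_tensors="pt",
)
print(inputs["input_ids"].shape)  # (1, model_max_length)
```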
PIT + `search_after` became universally safe in Elasticsearch 7.12, which added an automatic sort tiebreaker field called `_shard_doc` when using PITs. However, we now need to do feature detection to make sure we use the previous scroll method on Elasticsearch <7.12 clusters.
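A sketch of that feature detection (the helper name is an assumption):

```python
from elasticsearch import Elasticsearch

def supports_pit_search_after(es: Elasticsearch) -> bool:
    """Hypothetical helper: True when PIT + search_after is safe to use.

    Elasticsearch 7.12 added the automatic `_shard_doc` tiebreaker for
    PITs; older clusters should fall back to the scroll API.
    """
    version_number = es.info()["version"]["number"]
    major, minor = (int(part) for part in version_number.split(".")[:2])
    return (major, minor) >= (7, 12)
```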
* Fix bugs with field mapping:
  1. If there is no permission to call `_mapping`, return a readable error (a sketch appears at the end of this section)
  2. If the index is a wildcard, fix issues with user warnings
* Fix lint issues
* Remove trailing backslashes in docs
* Remove pandas/matplotlib deprecation warning
This warning is due to a conflict between pandas and matplotlib that may be resolved in a later version. For now, ignore the warning so CI works.
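A sketch of what ignoring the warning can look like (the warning category and module filter here are assumptions about what the actual filter targets):

```python
import warnings

# Suppress the pandas/matplotlib deprecation warning so CI stays green.
# The category and module matched here are assumptions; the real filter
# should target the specific warning emitted by the conflict.
warnings.filterwarnings(
    "ignore",
    category=DeprecationWarning,
    module="matplotlib",
)
```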
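And, as referenced in the field-mapping bullet above, a sketch of turning the missing `_mapping` permission into a readable error. The helper name and message wording are assumptions; `AuthorizationException` is what elasticsearch-py 7.x raises for 403 responses:

```python
from elasticsearch import Elasticsearch
from elasticsearch.exceptions import AuthorizationException

def get_index_mappings(es: Elasticsearch, index_pattern: str) -> dict:
    """Hypothetical helper: fetch mappings, failing with a readable error."""
    try:
        return es.indices.get_mapping(index=index_pattern)
    except AuthorizationException:
        # Replace the raw 403 stack trace with an actionable message.
        raise ValueError(
            f"Cannot read mappings for '{index_pattern}': the current user "
            "does not have permission to call the _mapping API"
        ) from None
```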