Josh Devins
7209f61773
Adds max_length padding to transformer tracing (#411)
...
The padding parameter must be set on the tokenization call, not in the
constructor. Furthermore, the True value only pads to the largest input
in a batch, and since we don't trace with batches it had no effect. The
proper value to use is "max_length", which pads the input to the maximum
input size specified by the model. Although we measure no functional or
performance impact from this setting, it has been suggested as a best
practice.
See: https://huggingface.co/transformers/serialization.html#dummy-inputs-and-standard-lengths
2021-11-11 13:18:55 +01:00
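The difference between the two padding values described in the commit above can be illustrated with a small, self-contained sketch. This is a toy model, not the Hugging Face tokenizer implementation: `pad_batch`, `PAD_ID`, and the token ids are hypothetical names and values chosen for illustration only.

```python
from typing import List, Union

PAD_ID = 0  # hypothetical padding token id, for illustration only


def pad_batch(
    batch: List[List[int]],
    padding: Union[bool, str],
    max_length: int,
) -> List[List[int]]:
    """Toy model of two padding strategies (not the transformers code)."""
    if padding is True:
        # Pad only up to the longest sequence present in this batch.
        target = max(len(seq) for seq in batch)
    elif padding == "max_length":
        # Pad every sequence up to the model's maximum input size.
        target = max_length
    else:
        raise ValueError(f"unsupported padding value: {padding!r}")
    # Sequences already at or beyond the target are left unchanged
    # (this sketch does not truncate).
    return [seq + [PAD_ID] * (target - len(seq)) for seq in batch]


# A "batch" of one input, as when tracing a model with a single example.
single_input = [[101, 7592, 102]]

print(pad_batch(single_input, padding=True, max_length=8))
# [[101, 7592, 102]]
print(pad_batch(single_input, padding="max_length", max_length=8))
# [[101, 7592, 102, 0, 0, 0, 0, 0]]
```

With a single input, padding to the longest item in the batch is a no-op, which matches the commit's observation that the True value had no effect during tracing.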
Benjamin Trent
a3b0907c5b
[ML] Add inference results tests for PyTorch transformer models
2021-11-10 06:50:10 -06:00
Seth Michael Larson
66e3e4eaad
Set 'script.max_compilations_rate: use-context'
2021-11-02 10:09:25 -04:00
Josh Devins
1e5b475bee
Adds NLP with PyTorch basic example to README
...
The Machine Learning section now has two sub-sections — one for
traditional regression/classification and the other for NLP with
PyTorch. The examples show two ways to upload models from the Hugging
Face model hub.
2021-11-02 08:00:33 -05:00
Josh Devins
df51f8af07
Document how to install transitive binary dependencies, add repo Dockerfile
...
Co-authored-by: Seth Michael Larson <seth.larson@elastic.co>
2021-10-28 12:05:39 -05:00
Seth Michael Larson
19014f1227
Avoid DeprecationWarnings when using the new Elasticsearch client (7.15+)
2021-10-28 09:24:36 -05:00
Benjamin Trent
79b66eb6b4
Updating node type to larger ubuntu node (#404)
...
* Updating node type to larger ubuntu node
* adding torch location
* formatting
* formatting
* removing torch location specification
2021-10-25 14:48:26 -05:00
Benjamin Trent
d39c1cd784
[ML] Make eland_import_hub_model an installable script
2021-10-19 11:29:58 -05:00
P. Sai Vinay
704c8982bc
Optimize to_pandas() internally to improve performance
...
Co-authored-by: Seth Michael Larson <seth.larson@elastic.co>
2021-10-13 13:23:04 -05:00
James Rodewig
6088f2e39d
[DOCS] Retitle Eland Python Client docs book
2021-10-12 20:22:17 -05:00
P. Sai Vinay
f9d2defb1b
Add number_samples to sklearn MLModel
2021-10-07 08:14:54 -05:00
Josh Devins
014943d3b8
Add initial implementation of PyTorch ML models
2021-10-06 08:44:40 -05:00
P. Sai Vinay
995f2432b6
Add number_samples to LightGBM MLModel and leaf_count to leaf nodes
...
* Add number_samples to lightgbm ML Model
* Add leaf_count for leaf nodes
2021-09-30 08:13:44 -05:00
P. Sai Vinay
dabb327b8b
Refactor df.info() for better readability
2021-09-28 15:12:29 -05:00
P. Sai Vinay
bc201e22dd
Improve coverage for eland.dataframe
2021-09-28 15:11:57 -05:00
Seth Michael Larson
b8e192b7d0
Rename Jenkins job to 'main'
2021-09-28 10:07:16 -05:00
P. Sai Vinay
f241ae971a
Add flynt and --cov-report=term-missing
2021-09-21 11:18:01 -05:00
Seth Michael Larson
7aabc88e4a
Rename 'master' branch to 'main'
2021-09-08 11:51:50 -05:00
Jabin Kong
77f9a455e9
Fix docstring formatting
2021-09-07 11:40:19 -05:00
P. Sai Vinay
315f94b201
Add excluded lines for coverage and improve coverage
2021-09-07 11:39:19 -05:00
Seth Michael Larson
a50c3657c4
Release v7.14.1b1
v7.14.1b1
2021-08-30 13:42:55 -05:00
Seth Michael Larson
7a2e845a76
Speedup CI by only installing Nox in Dockerfile
2021-08-20 08:39:02 -05:00
Jabin Kong
1aa193da9e
Add iterrows() and itertuples() to DataFrame
...
Co-authored-by: Seth Michael Larson <seth.larson@elastic.co>
2021-08-20 08:34:52 -05:00
Seth Michael Larson
e4f88a34a6
Yield list of hits from _search_yield_hits() instead of individual hits
2021-08-17 12:16:10 -05:00
P. Sai Vinay
011bf29816
Simplify ES->pandas logic by removing Collectors
2021-08-16 12:22:02 -05:00
Seth Michael Larson
76d83ea47f
Bump version to 7.14.0b1
v7.14.0b1
2021-08-09 09:21:49 -05:00
Seth Michael Larson
b0c8434c06
Release 7.14.0b1
2021-08-09 09:11:57 -05:00
Seth Michael Larson
15ba8d3e02
Fallback on using scroll searches for Elasticsearch <7.12
...
PIT+search_after became universally safe in Elasticsearch 7.12, which automatically adds a sort tiebreaker field called `_shard_doc` when using PITs. We now need feature detection to make sure we fall back to the previous scroll method on Elasticsearch <7.12 clusters.
2021-08-08 12:19:41 -05:00
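The version check described in the commit above might look roughly like the following sketch. It is an assumption about the general shape of such feature detection, not the code in eland: `parse_version`, `choose_search_strategy`, and the returned strategy labels are hypothetical.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a string such as '7.11.2' or '7.12.0-SNAPSHOT'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def choose_search_strategy(es_version: str) -> str:
    """Pick a pagination strategy from the cluster version (hypothetical labels)."""
    # Compare as integer tuples: comparing the raw strings would wrongly
    # rank "7.9" above "7.12".
    if parse_version(es_version) >= (7, 12):
        return "pit_search_after"
    # Older clusters lack the automatic `_shard_doc` tiebreaker, so fall
    # back to the scroll API.
    return "scroll"


for version in ("7.9.3", "7.11.2", "7.12.0", "8.0.0"):
    print(version, "->", choose_search_strategy(version))
```

Comparing parsed integer tuples rather than version strings is the main design point here, since string comparison misorders two-digit minor versions.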
P. Sai Vinay
30876c8899
Switch to Point-in-Time with search_after instead of using scroll APIs
...
Co-authored-by: Seth Michael Larson <seth.larson@elastic.co>
2021-08-07 16:05:33 -05:00
P. Sai Vinay
8f84a315be
Add test case for pseudohubererror for XGBoost
2021-08-06 15:59:48 -05:00
P. Sai Vinay
d3f8d7b8f6
Optimize FieldMappings.aggregate_field_name() method
2021-08-06 11:27:59 -05:00
Seth Michael Larson
54b497ed9a
Update supported versions of Python, pandas, and Elasticsearch
2021-08-04 13:21:17 -05:00
P. Sai Vinay
823f01cc6c
Add type hints to 'eland.operations' and 'eland.ndframe'
2021-08-02 11:50:35 -05:00
P. Sai Vinay
c0e861dc77
Fix installed pandas version on Jenkins
2021-07-31 12:51:11 -05:00
P. Sai Vinay
4c1af42c14
Add idxmax and idxmin methods to DataFrame
2021-07-28 07:55:26 -05:00
Seth Michael Larson
c74fccbd74
Drop support for Python 3.6, pandas<1.2
2021-07-27 14:43:03 -05:00
P. Sai Vinay
193bcb73ef
Add support for Pandas v1.3 and LightGBM v3.x
2021-07-27 11:01:35 -05:00
P. Sai Vinay
22475cdc46
Add PANDAS_VERSION to Jenkins matrix
2021-07-26 11:17:46 -05:00
Seth Michael Larson
1555ea9534
Fix typo in version number
...
Should be `7.13.0b1` instead of `7.13.1b1`
v7.13.0b1
2021-06-22 12:03:46 -05:00
Seth Michael Larson
16178dfb5d
Release 7.13.0b1
2021-06-22 11:59:27 -05:00
P. Sai Vinay
ac2efb5863
Optimize df.describe() to use aggregations instead of own query
2021-06-22 11:29:54 -05:00
P. Sai Vinay
5fe32a24df
Add quantile() to DataFrameGroupBy
2021-06-22 10:54:33 -05:00
P. Sai Vinay
7e8520a8ef
Remove deprecated code in XGBoost and test suite
2021-06-08 15:19:56 -05:00
P. Sai Vinay
e9c0b897f5
Add quantile() to DataFrame and Series
2021-06-08 13:02:44 -05:00
P. Sai Vinay
aa9d60e7e7
Add sort order to groupby dropna=False (#322)
...
* Add sort order to groupby dropna=False
* Fix rebase
2021-04-21 13:24:52 +00:00
Stephen Dodson
1040160451
Fix bugs with field mapping and lint issue (#346)
...
* Fix bugs with field mapping:
1. If no permission to call _mapping, return readable error
2. If index is wildcard, fix issues with user warnings
* Fixing lint issues
* Removing trailing backslashes in doc
* Remove pandas/matplotlib deprecation warning
This warning is due to a conflict between
pandas/matplotlib that may be resolved in a later
version. For now, ignore the warning so CI works.
2021-03-30 14:49:54 +00:00
Seth Michael Larson
985afe74e0
Release 7.10.1b1
7.10.1b1
2021-01-12 12:36:23 -06:00
Seth Michael Larson
26354622b5
Add more sections for elastic.co/guide
2021-01-12 10:26:01 -06:00
P. Sai Vinay
421d84fd20
Add mode() method to DataFrame and Series
2021-01-07 12:17:10 -06:00
P. Sai Vinay
27717eead1
Remove deprecated options and aliases
2021-01-04 13:20:45 -06:00