Adding 'development' section to docs

Adding contributing section based on Elasticsearch/CONTRIBUTING.md. TODO: add testing docs (based on CI).
2025-07-11 00:02:14 +08:00 · 2019-11-20 10:32:35 +00:00 · 2019-11-20 10:32:35 +00:00 · 6564f26245
commit 6564f26245
parent 2a409962ea
9 changed files with 194 additions and 65 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -10,6 +10,12 @@ build/
 # docs build folder
 docs/build/
 # pytest results
 eland/tests/dataframe/results/
 eland/tests/dataframe/results/
 result_images/
 # Python egg metadata, regenerated from source files by setuptools.
 /*.egg-info
--- a/NOTES.md
+++ b/NOTES.md
@@ -1,58 +0,0 @@
 # Implementation Notes
 The goal of an `eland.DataFrame` is to enable users who are familiar with `pandas.DataFrame` 
 to access, explore and manipulate data that resides in Elasticsearch. 
 Ideally, all data should reside in Elasticsearch and not to reside in memory.
 This restricts the API, but allows access to huge data sets that do not fit into memory, and allows
 use of powerful Elasticsearch features such as aggregations.
 ## Implementation Details
 ### 3rd Party System Access
 Generally, integrations with [3rd party storage systems](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html) 
 (SQL, Google Big Query etc.) involve accessing these systems and reading all external data into an 
 in-core pandas data structure. This also applies to [Apache Arrow](https://arrow.apache.org/docs/python/pandas.html) 
 structures.
 Whilst this provides access to data in these systems, for large datasets this can require significant
 in-core memory, and for systems such as Elasticsearch, bulk export of data can be an inefficient way
 of exploring the data.
 An alternative option is to create an API that proxies `pandas.DataFrame`-like calls to Elasticsearch
 queries and operations. This could allow the Elasticsearch cluster to perform operations such as
 aggregations rather than exporting all the data and performing this operation in-core.
 ### Implementation Options
 An option would be to replace the `pandas.DataFrame` backend in-core memory structures with Elasticsearch
 accessors. This would allow full access to the `pandas.DataFrame` APIs. However, this has issues:
 * If a `pandas.DataFrame` instance maps to an index, typical manipulation of a `pandas.DataFrame` 
 may involve creating many derived `pandas.DataFrame` instances. Constructing an index per 
 `pandas.DataFrame` may result in many Elasticsearch indexes and a significant load on Elasticsearch. 
 For example, `df_a = df['a']` should not require Elasticsearch indices `df` and `df_a`
 * Not all `pandas.DataFrame` APIs map to things we may want to do in Elasticsearch. In particular, 
 API calls that involve exporting all data from Elasticsearch into memory e.g. `df.to_dict()`. 
 * The backend `pandas.DataFrame` structures are not easily abstractable and are deeply embedded in 
 the implementation.
 Another option is to create a `eland.DataFrame` API that mimics appropriate aspects of 
 the `pandas.DataFrame` API. This resolves some of the issues above as:
 * `df_a = df['a']` could be implemented as a change to the Elasticsearch query used, rather 
 than a new index
 * Instead of supporting the entire `pandas.DataFrame` API we can support a subset appropriate for
 Elasticsearch. If additional calls are required, we could create a `eland.DataFrame._to_pandas()` 
 method which would explicitly export all data to a `pandas.DataFrame` 
 * Creating a new `eland.DataFrame` API gives us full flexibility in terms of implementation. However, 
 it does create a large amount of work which may duplicate a lot of the `pandas` code - for example,
 printing objects etc. - this creates maintenance issues etc.
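
The second option in the notes above (column selection as a change to the query rather than a new index) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not eland's actual implementation: the class name `LazyFrame`, its fields, the method `to_search_body`, and the index name `flights` are all hypothetical. The only Elasticsearch detail relied on is that a search request body may carry a `_source` list to restrict the returned fields.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass(frozen=True)
class LazyFrame:
    """Hypothetical stand-in for an eland-style DataFrame (not eland's real class)."""

    index: str  # name of the single Elasticsearch index backing this frame
    columns: Optional[Tuple[str, ...]] = None  # None means "all fields"

    def __getitem__(self, key):
        # Selecting columns returns a new view over the SAME index;
        # no data is copied and no new index is created.
        cols = (key,) if isinstance(key, str) else tuple(key)
        return LazyFrame(self.index, cols)

    def to_search_body(self) -> Dict[str, Any]:
        # Build the request body that a search call could send.
        body: Dict[str, Any] = {"query": {"match_all": {}}}
        if self.columns is not None:
            body["_source"] = list(self.columns)  # source filtering
        return body


df = LazyFrame(index="flights")  # "flights" is an assumed index name
df_a = df["a"]
print(df_a.index)
print(df_a.to_search_body())
```

Both `df` and `df_a` refer to the same index; only the request body differs, which is the property the notes ask for in the `df_a = df['a']` case.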
--- a/docs/source/conf.py
+++ b/docs/source/conf.py
@@ -25,8 +25,7 @@ sys.path.extend(
 # -- Project information -----------------------------------------------------
 project = 'eland'
-copyright = '2019, Stephen Dodson'
+copyright = '2019, Elasticsearch B.V.'
 author = 'Stephen Dodson'
 # The full version, including alpha/beta/rc tags
 release = '0.1'
@@ -95,4 +94,4 @@ html_theme = "pandas_sphinx_theme"
 # Add any paths that contain custom static files (such as style sheets) here,
 # relative to this directory. They are copied after the builtin static files,
 # so a file named "default.css" will overwrite the builtin "default.css".
-html_static_path = ['_static']
+#html_static_path = ['_static']
--- a/docs/source/development/contributing.rst
+++ b/docs/source/development/contributing.rst
@@ -0,0 +1,167 @@
 =====================
 Contributing to eland
 =====================
 Eland is an open source project and we love to receive contributions
 from our community — you! There are many ways to contribute, from
 writing tutorials or blog posts, improving the documentation, submitting
 bug reports and feature requests or writing code which can be
 incorporated into eland itself.
 Bug reports
 -----------
 If you think you have found a bug in eland, first make sure that you are
 testing against the `latest version of
 eland <https://github.com/elastic/eland>`__ - your issue may already
 have been fixed. If not, search our `issues
 list <https://github.com/elastic/eland/issues>`__ on GitHub in case a
 similar issue has already been opened.
 It is very helpful if you can prepare a reproduction of the bug. In
 other words, provide a small test case which we can run to confirm your
 bug. It makes it easier to find the problem and to fix it. Test cases
 should be provided as Python scripts, ideally with some details of your
 Elasticsearch environment and index mappings, and (where appropriate) a
 pandas example.
 Provide as much information as you can. You may think that the problem
 lies with your query, when actually it depends on how your data is
 indexed. The easier it is for us to recreate your problem, the faster it
 is likely to be fixed.
 Feature requests
 ----------------
 If you find yourself wishing for a feature that doesn't exist in eland,
 you are probably not alone. There are bound to be others out there with
 similar needs. Many of the features that eland has today have been added
 because our users saw the need. Open an issue on our `issues
 list <https://github.com/elastic/eland/issues>`__ on GitHub which
 describes the feature you would like to see, why you need it, and how it
 should work.
 Contributing code and documentation changes
 -------------------------------------------
 If you have a bugfix or new feature that you would like to contribute to
 eland, please find or open an issue about it first. Talk about what you
 would like to do. It may be that somebody is already working on it, or
 that there are particular issues that you should know about before
 implementing the change.
 We enjoy working with contributors to get their code accepted. There are
 many approaches to fixing a problem and it is important to find the best
 approach before writing too much code.
 Note that it is unlikely the project will merge refactors for the sake
 of refactoring. These types of pull requests have a high cost to
 maintainers in reviewing and testing with little to no tangible benefit.
 This especially includes changes generated by tools. For example,
 converting all generic interface instances to use the diamond operator.
 The process for contributing to any of the `Elastic
 repositories <https://github.com/elastic/>`__ is similar. Details for
 individual projects can be found below.
 Fork and clone the repository
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 You will need to fork the main eland code or documentation repository
 and clone it to your local machine. See the `GitHub help
 page <https://help.github.com/articles/fork-a-repo>`__ for help.
 Further instructions for specific projects are given below.
 Submitting your changes
 ~~~~~~~~~~~~~~~~~~~~~~~
 Once your changes and tests are ready to submit for review:
 1. Test your changes
   Run the test suite to make sure that nothing is broken (TODO add link
   to testing doc).
 2. Sign the Contributor License Agreement
   Please make sure you have signed our `Contributor License
   Agreement <https://www.elastic.co/contributor-agreement/>`__. We are
   not asking you to assign copyright to us, but to give us the right to
   distribute your code without restriction. We ask this of all
   contributors in order to assure our users of the origin and
   continuing existence of the code. You only need to sign the CLA once.
 3. Rebase your changes
   Update your local repository with the most recent code from the main
   eland repository, and rebase your branch on top of the latest master
   branch. We prefer your initial changes to be squashed into a single
   commit. Later, if we ask you to make changes, add them as separate
   commits. This makes them easier to review. As a final step before
   merging we will either ask you to squash all commits yourself or
   we'll do it for you.
 4. Submit a pull request
   Push your local changes to your forked copy of the repository and
   `submit a pull
   request <https://help.github.com/articles/using-pull-requests>`__. In
   the pull request, choose a title which sums up the changes that you
   have made, and in the body provide more details about what your
   changes do. Also mention the number of the issue where discussion has
   taken place, e.g. “Closes #123”.
 Then sit back and wait. There will probably be discussion about the pull
 request and, if any changes are needed, we would love to work with you
 to get your pull request merged into eland.
 Please adhere to the general guideline that you should never force push
 to a publicly shared branch. Once you have opened your pull request, you
 should consider your branch publicly shared. Instead of force pushing
 you can just add incremental commits; this is generally easier on your
 reviewers. If you need to pick up changes from master, you can merge
 master into your branch. A reviewer might ask you to rebase a
 long-running pull request in which case force pushing is okay for that
 request. Note that you should also not squash at the end of the review
 process; that can be done when the pull request is `integrated
 via GitHub <https://github.com/blog/2141-squash-your-commits>`__.
 Contributing to the eland codebase
 ----------------------------------
 **Repository:** https://github.com/elastic/eland
 We internally develop using the PyCharm IDE. For PyCharm, we are
 currently using a minimum version of PyCharm 2019.2.4.
 Configuring PyCharm And Running Tests
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 (All commands should be run from module root)
 -  Create a new project via 'Check out from Version Control'->'Git'
   on the "Welcome to PyCharm" page (or other)
 -  Enter the URL to your fork of eland
   (e.g. ``git@github.com:stevedodson/eland.git``)
 -  Click 'Yes' for 'Checkout from Version Control'
 -  Configure PyCharm environment:
 -  In 'Preferences' configure a 'Project: eland'->'Project Interpreter'.
   Generally, we recommend creating a virtual environment (TODO link to
   installing for python version support).
 -  In 'Preferences' set 'Tools'->'Python Integrated Tools'->'Default
   test runner' to ``pytest``
 -  In 'Preferences' set 'Tools'->'Python Integrated Tools'->'Docstring
   format' to ``numpy``
 -  Install development requirements. Open terminal in virtual
   environment and run ``pip install -r requirements-dev.txt``
 -  Set up an Elasticsearch instance (assumes ``localhost:9200``), and run
   ``python -m eland.tests.setup_tests`` to set up the test environment -
   *note this modifies Elasticsearch indices*
 -  Run ``pytest --doctest-modules`` to validate install
 Documentation
 ~~~~~~~~~~~~~
 -  Install documentation requirements. Open terminal in virtual
   environment and run ``pip install -r requirements-dev.txt``
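
The test setup step in the file above assumes an Elasticsearch instance at ``localhost:9200``. Before running ``python -m eland.tests.setup_tests``, it may help to confirm that something is answering there. The following is a small sketch using only the Python standard library. The helper name `es_is_reachable` is hypothetical and is not part of eland. It checks only that an HTTP response comes back, not that the server is actually Elasticsearch.

```python
import urllib.error
import urllib.request


def es_is_reachable(url: str = "http://localhost:9200", timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET to `url` receives any HTTP response."""
    # Ignore proxy settings from the environment; the target is a local instance.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with an error status (for example 401 when
        # security is enabled), so it is up.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout.
        return False


if __name__ == "__main__":
    print("reachable" if es_is_reachable() else "not reachable")
```

If the check reports that the instance is not reachable, the setup script would presumably fail to connect as well, so starting Elasticsearch first avoids a confusing traceback.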
--- a/docs/source/development/index.rst
+++ b/docs/source/development/index.rst
@@ -0,0 +1,10 @@
 .. _development:
 ===========
 Development
 ===========
 .. toctree::
   :maxdepth: 2
   contributing.rst 
--- a/docs/source/implementation/details.rst
+++ b/docs/source/implementation/details.rst
@@ -9,7 +9,7 @@ to access, explore and manipulate data that resides in Elasticsearch.
 Ideally, all data should reside in Elasticsearch and not to reside in memory.
 This restricts the API, but allows access to huge data sets that do not fit into memory, and allows
-use of powerful Elasticsearch features such as aggrergations.
+use of powerful Elasticsearch features such as aggregations.
 Pandas and 3rd Party Storage Systems
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -24,6 +24,7 @@ In general, the data resides in elasticsearch and not in memory, which allows el
   reference/index
   implementation/index
   development/index
 * :doc:`reference/index`
@@ -38,3 +39,7 @@ In general, the data resides in elasticsearch and not in memory, which allows el
  * :doc:`implementation/details`
  * :doc:`implementation/dataframe_supported`
 * :doc:`development/index`
  * :doc:`development/contributing`
--- a/eland/version.py
+++ b/eland/version.py
@@ -1,6 +1,6 @@
 __title__ = 'eland'
 __description__ = 'Python elasticsearch client to analyse, explore and manipulate data that resides in elasticsearch.'
-__url__ = 'https://github.com/elastic/app-search-python'
+__url__ = 'https://github.com/elastic/eland'
-__version__ = '0.1'
+__version__ = '0.1a1'
 __maintainer__ = 'Elasticsearch B.V.'
 __maintainer_email__ = 'steve.dodson@elastic.co'
--- a/setup.py
+++ b/setup.py
@@ -23,7 +23,7 @@ setup(
    maintainer_email=about['__maintainer_email__'],
    license='Apache 2.0',
    classifiers=[
-        'Development Status :: 4 - Beta',
+        'Development Status :: 3 - Alpha',
        'Intended Audience :: Developers',
        'License :: OSI Approved :: Apache Software License',
        'Programming Language :: Python :: 3.7',