New Reusable Data Science Method


This documentation section is under active development.

The RUG Docker-CDS methods are designed to run both from a command line interface (CLI) and from a web-based graphical user interface (GUI). To achieve this, we use JSON objects and JSON schemas. Each JSON object should define three main kinds of name/value pairs: input data, method parameters, and the output specification format. The JSON schema is mainly used to validate the JSON object and will eventually be used to generate a JSON form for collecting inputs.
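
For illustration, a parameters file with these three parts, a schema for it, and a small validator might look as follows in Python. The key names (input, parameters, output, data, n_components, and so on), the file contents, and the validate helper are hypothetical and not taken from RUG Docker-CDS. The helper implements only the type, required, and properties keywords of JSON Schema; a complete validator, such as the Python jsonschema package, would normally be used instead.

```python
import json

# Hypothetical parameters file with the three kinds of name/value pairs:
# input data, method parameters, and output specification.
PARAMETERS_JSON = """
{
  "input": {"data": "data/iris.csv"},
  "parameters": {"method": "pca", "n_components": 2},
  "output": {"format": "png", "path": "results/pca.png"}
}
"""

# Hypothetical JSON schema describing the object above.
SCHEMA = {
    "type": "object",
    "required": ["input", "parameters", "output"],
    "properties": {
        "input": {"type": "object", "required": ["data"],
                  "properties": {"data": {"type": "string"}}},
        "parameters": {"type": "object",
                       "properties": {"method": {"type": "string"},
                                      "n_components": {"type": "integer"}}},
        "output": {"type": "object", "required": ["format"],
                   "properties": {"format": {"type": "string"},
                                  "path": {"type": "string"}}},
    },
}

# Map JSON Schema type names to Python types (subset only).
_TYPES = {"object": dict, "array": list, "string": str,
          "integer": int, "number": (int, float), "boolean": bool}


def validate(instance, schema, path="$"):
    """Return a list of error messages; an empty list means valid.

    Supports only the 'type', 'required' and 'properties' keywords.
    """
    errors = []
    expected = schema.get("type")
    if expected is not None:
        # bool is a subclass of int in Python, but not a JSON integer/number.
        if isinstance(instance, bool) and expected in ("integer", "number"):
            ok = False
        else:
            ok = isinstance(instance, _TYPES[expected])
        if not ok:
            return [f"{path}: expected {expected}"]
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing required key '{key}'")
        for key, subschema in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(validate(instance[key], subschema, f"{path}.{key}"))
    return errors


params = json.loads(PARAMETERS_JSON)
print(validate(params, SCHEMA))  # prints []
```

Keeping the three parts in separate top-level objects lets the same file drive both the CLI and a generated web form, since each part can be described by its own sub-schema.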

The containerization of data science methods follows a hierarchical structure. Currently, two categories have been defined: Visualization Techniques and Containerized papers. New categories can be added as needed. Each new method will be placed in the appropriate branch of the hierarchical tree, or in a new category if necessary.

Implementing a new data science method

  1. Choose a method.

  2. Verify that the method has not been implemented yet.

  3. Define a JSON schema to validate the input parameters.

  4. Implement the method, taking the following into account:

    • Develop your method directly in a Docker container, for example one based on the gcc, OpenJDK, rocker, or Jupyter Project images, or build your own Docker image.

    • The main function receives its parameters from a JSON object/file.

    • The parameters are validated using the JSON schema defined in the previous step.

    • Make sure you can run the method from the command line interface. In Python, you could use the argparse module; in R, you can use the argparse R package. For example, an R method can be executed as follows:

      Rscript method.R [options] <parameters.json>

      or, if the script begins with a shebang line (#!<path to interpreter>) and is executable, directly:

      method.R [options] <parameters.json>

      In general, method.R can be any executable file, and parameters.json is a required argument containing the information needed to run the method.

    • Follow the GNU standards for command line interfaces.

    • Provide documentation that can be displayed with functions such as ? or help. In R, you could create an R package and document it with roxygen2. In Python, you could use Sphinx.

    • Create appropriate unit tests.

    • Identify an existing Docker image in which to embed the method, or build a minimal container. In the latter case, set the command that runs the method as the ENTRYPOINT in the Dockerfile.

    • Follow Dockerfile best practices.

    • Push the Docker image to Docker Hub.

  5. Update the RUG Docker-CDS documentation by adding the new method to the contents section, including details and examples of how to run it.
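
The command line requirements of step 4 can be sketched in Python with the standard argparse module, following the same method [options] <parameters.json> pattern as the R example. The script name method.py, the run_method function, the --verbose option, and the version string are hypothetical placeholders; a real method would validate the loaded object against its JSON schema (step 3) instead of the simple section check used here. argparse adds -h/--help automatically, and --version is included because the GNU standards ask every program to support both --help and --version.

```python
#!/usr/bin/env python3
"""Hypothetical CLI entry point for a containerized method (method.py)."""
import argparse
import json
import sys

__version__ = "0.1.0"  # hypothetical version string


def build_parser():
    parser = argparse.ArgumentParser(
        prog="method.py",
        description="Run a data science method configured by a JSON file.",
    )
    # argparse adds -h/--help automatically; GNU standards also ask for --version.
    parser.add_argument("--version", action="version",
                        version=f"%(prog)s {__version__}")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="print progress messages to stderr")
    parser.add_argument("parameters", metavar="parameters.json",
                        help="JSON file with input, parameters and output sections")
    return parser


def run_method(config, verbose=False):
    """Placeholder for the actual method; returns a summary dict."""
    missing = [k for k in ("input", "parameters", "output") if k not in config]
    if missing:
        # Stand-in for full JSON-schema validation (step 3).
        raise ValueError(f"missing required sections: {', '.join(missing)}")
    if verbose:
        print(f"running with {config['parameters']}", file=sys.stderr)
    return {"status": "ok", "output": config["output"]}


def main(argv=None):
    args = build_parser().parse_args(argv)
    try:
        with open(args.parameters, encoding="utf-8") as fh:
            # json.JSONDecodeError is a subclass of ValueError.
            config = json.load(fh)
        result = run_method(config, verbose=args.verbose)
    except (OSError, ValueError) as err:
        print(f"method.py: error: {err}", file=sys.stderr)
        return 2
    print(json.dumps(result))
    return 0


# To use this file as an executable script, end it with:
#     if __name__ == "__main__":
#         sys.exit(main())
```

Invoked as python3 method.py --verbose parameters.json, the sketch prints a JSON summary on standard output; unreadable files, malformed JSON, and missing sections are reported on standard error with exit status 2, the same status argparse uses for its own usage errors. Used as the Dockerfile ENTRYPOINT, this lets the container be run with only the parameters file as its argument.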

Examples of implemented methods can be found in the rvispack package.