New Reusable Data Science Method
Note: This documentation section is under active development.
The RUG Docker-CDS methods are designed to run both from a command line interface (CLI) and from a web graphical user interface (GUI). To achieve this, we use JSON objects and JSON schemas. Each JSON object should define three main kinds of name/value pairs: the input data, the method's parameters, and the output specification (such as the output format). The JSON schema is mainly used to validate the JSON object and will eventually be used to generate a JSON form for collecting inputs.
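To make the three kinds of name/value pairs concrete, here is a minimal sketch of a parameters object and a matching schema. The key names (`input`, `parameters`, `output`, `file`, `column`, `bins`, `format`) are hypothetical and not a convention defined by RUG Docker-CDS. The `validate` function is a deliberately small stand-in that understands only the JSON Schema keywords `type`, `required` and `properties`; a real method would more likely use a complete JSON Schema validator.

```python
import json

# Hypothetical parameters object: the three groups mirror the input data,
# method parameters and output specification (key names are illustrative).
PARAMETERS_JSON = """
{
  "input": {"file": "data/iris.csv"},
  "parameters": {"column": "sepal_length", "bins": 10},
  "output": {"file": "histogram.png", "format": "png"}
}
"""

# Hypothetical schema for the object above, written with JSON Schema keywords.
SCHEMA = {
    "type": "object",
    "required": ["input", "parameters", "output"],
    "properties": {
        "input": {
            "type": "object",
            "required": ["file"],
            "properties": {"file": {"type": "string"}},
        },
        "parameters": {
            "type": "object",
            "required": ["column", "bins"],
            "properties": {
                "column": {"type": "string"},
                "bins": {"type": "integer"},
            },
        },
        "output": {
            "type": "object",
            "required": ["file", "format"],
            "properties": {
                "file": {"type": "string"},
                "format": {"type": "string"},
            },
        },
    },
}

# Map the JSON Schema type names used above to Python types.
_TYPES = {"object": dict, "string": str, "integer": int, "number": (int, float)}


def validate(instance, schema, path="$"):
    """Return a list of error messages; an empty list means the instance is valid.

    Only the "type", "required" and "properties" keywords are supported.
    """
    errors = []
    expected = schema.get("type")
    if expected is not None:
        # bool is a subclass of int in Python, so reject it for numeric types.
        if isinstance(instance, bool) and expected in ("integer", "number"):
            return [f"{path}: expected {expected}, got boolean"]
        if not isinstance(instance, _TYPES[expected]):
            return [f"{path}: expected {expected}, got {type(instance).__name__}"]
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing required key '{key}'")
        # Recurse into every declared property that is actually present.
        for key, subschema in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(validate(instance[key], subschema, f"{path}.{key}"))
    return errors


if __name__ == "__main__":
    params = json.loads(PARAMETERS_JSON)
    print(validate(params, SCHEMA))  # prints [] for the valid object above
```

Returning a list of messages rather than stopping at the first problem lets a CLI report every invalid field at once, which is also what a form generated from the schema would need.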
The containerization of data science methods follows a hierarchical structure. Two categories are currently defined: Visualization Techniques and Containerized papers. New categories can be added as needed, and each new method is placed in the appropriate branch of the hierarchical tree, or in a new category if none fits.
Implementing a new data science method
1. Choose a method.
2. Verify that the method has not been implemented yet.
3. Define a JSON schema to validate the input parameters.
4. Implement the method, taking the following into account:
   - Develop your method directly in a Docker container, for example one based on the gcc, OpenJDK, rocker or Jupyter Project images, or build your own Docker image.
   - The main function receives its parameters from a JSON object/file.
   - The parameters are validated using the JSON schema defined in the previous step.
   - Make sure the method can be run from the command line interface. In Python you can use the argparse package; in R, the argparse R package. For example, an R method can be executed as follows:
       Rscript method.R [options] <parameters.json>
     or, if method.R starts with a shebang line such as
       #!<path to interpreter>
     it can be run directly as a bash-style script:
       method.R [options] <parameters.json>
     In fact, method.R can be any executable file, and parameters.json is a required argument containing the essential information needed to run the method.
   - Follow the GNU standards for command line interfaces.
   - Enable functions such as ? or help to display documentation. In R, you can create an R package and document it with roxygen2; in Python, you can use Sphinx.
   - Create appropriate unit tests.
5. Identify a suitable existing Docker image in which to embed the method, or build a minimal container. In the latter case, use the command that runs the method as the ENTRYPOINT in the Dockerfile.
   - Follow Dockerfile best practices.
6. Push the Docker image to Docker Hub.
7. Update the RUG Docker-CDS documentation by adding the new method to the contents section, including details and examples of how to run the method.
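The command line, JSON and help requirements in the steps above can be combined in a single executable script. The sketch below is a hypothetical Python counterpart of `Rscript method.R [options] <parameters.json>`: it uses the standard `argparse` module (which provides `-h`/`--help` automatically), reads the JSON file named by the required positional argument, checks that the three top-level groups are present, and calls the method. The file name `method.py`, the `--verbose` option, the keys `values`, `scale` and `file`, and the toy `run_method` computation are all illustrative assumptions, not part of RUG Docker-CDS.

```python
#!/usr/bin/env python3
"""method.py: hypothetical command-line wrapper for a data science method.

Usage: method.py [options] <parameters.json>
"""
import argparse
import json
import sys

# The three kinds of name/value pairs every parameters file must contain.
REQUIRED_GROUPS = ("input", "parameters", "output")


def build_parser():
    # argparse adds -h/--help automatically, so usage text comes for free.
    parser = argparse.ArgumentParser(
        prog="method.py",
        description="Scale a list of numbers (toy stand-in for a real method).",
    )
    parser.add_argument("parameters", help="path to the JSON parameters file")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="print progress messages to stderr")
    return parser


def load_parameters(path):
    with open(path, encoding="utf-8") as fh:
        params = json.load(fh)
    # Minimal structural check only; a real method would validate the whole
    # object against its JSON schema at this point.
    missing = [key for key in REQUIRED_GROUPS if key not in params]
    if missing:
        raise ValueError(f"missing top-level keys: {', '.join(missing)}")
    return params


def run_method(values, scale):
    """Toy method (hypothetical): multiply every value by `scale`."""
    return [v * scale for v in values]


def main(argv=None):
    args = build_parser().parse_args(argv)
    try:
        params = load_parameters(args.parameters)
    except (OSError, ValueError) as exc:
        # json.JSONDecodeError is a subclass of ValueError, so bad JSON lands here too.
        print(f"method.py: error: {exc}", file=sys.stderr)
        return 1
    if args.verbose:
        print(f"loaded {args.parameters}", file=sys.stderr)
    result = run_method(params["input"]["values"], params["parameters"]["scale"])
    # Write the result to the location given by the output specification.
    with open(params["output"]["file"], "w", encoding="utf-8") as fh:
        json.dump({"result": result}, fh)
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

For example, `python3 method.py --help` prints the usage text generated by argparse, and `python3 method.py params.json` writes the scaled values to the file named in `output.file`. Keeping `run_method` free of file handling makes it easy to cover with unit tests. In a minimal image, something like `ENTRYPOINT ["python3", "method.py"]` would let `docker run <image> params.json` pass the file name straight to the script, provided the file is mounted into the container.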
Examples of implemented methods can be found in the rvispack package.