Note

This project is under active development.

RUG Docker-CDS#


RUG Docker-CDS is a set of containerized data science methods. The containers are designed to run as black boxes from a command-line or graphical interface (CLI or GUI). The input of a method is defined in a JSON file that specifies, for example, the data file, the parameters of the method and the output formats.

The User Guide on ReadTheDocs provides the latest documentation of the containerized data science methods.

CLI Quick Start#

To run the examples below you need to have Docker installed.

Visualization techniques#

The following command pulls the venustiano/rugplot:0.1.0 image from Docker Hub if it is not already present on the local host. It then displays information about the containerized package and the available visualization techniques. Finally, the --rm flag removes the stopped container.

docker run --rm venustiano/rugplot:0.1.0

To create visualizations using this image, you need a tabular data file and a JSON file stored in the current working directory. The supported data file formats are those accepted by the fread function of the R data.table package.

Violin plots:

Download the data.

wget https://raw.githubusercontent.com/rijksuniversiteit-groningen/rvispack/master/tests/testthat/data/iris.csv

1. Create a violin JSON template#

PowerShell#
docker run --rm -v ${PWD}:/app/data venustiano/rugplot:0.1.0 `
template -p violin
Linux#
docker run --rm -v "$PWD":/app/data venustiano/rugplot:0.1.0 \
template -p violin

2. Update the following key/value pairs in the violin_params.json file as follows:#

{
    "filename": "iris.csv",
    "aesthetics": {
        "y_variable": "sepal_length"
    }
}
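Because y_variable must name a column of the data file, it can help to check the CSV header before running the container. The following is a minimal sketch, not part of rugplot: has_column is a hypothetical helper, and it assumes a comma-separated file whose first row holds the column names.

```shell
# Hypothetical helper (not part of rugplot): succeed if COLUMN appears in the
# header row of a comma-separated FILE.
has_column() {
    file=$1
    column=$2
    # Take the header row, drop quotes and carriage returns, put one column
    # name per line, then look for an exact match.
    head -n 1 "$file" | tr -d '"\r' | tr ',' '\n' | grep -qxF -- "$column"
}

# Usage: warn before running the container if the name is misspelled.
if [ -f iris.csv ] && ! has_column iris.csv sepal_length; then
    echo "sepal_length not found in the iris.csv header" >&2
fi
```

The check only looks at the header row, so it does not detect problems in the data itself.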

Create the visualization under Linux or macOS#

docker run --rm -v "$PWD":/app/data venustiano/rugplot:0.1.0 \
plot -p violin -f violin_params.json

Create the visualization using Windows PowerShell#

docker run --rm -v ${PWD}:/app/data venustiano/rugplot:0.1.0 `
plot -p violin -f violin_params.json

Either command produces a violin plot in the Rplots.pdf file.

Violin plot

The -v flag mounts the current working directory ($PWD) as the /app/data folder in the container, -p violin selects the plot function, and -f violin_params.json names the file that contains the information needed to create the violin plot.
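Since every Linux command in this guide repeats the same docker run prefix, a small shell function can shorten them. This is only a sketch: the function name rugplot and the RUGPLOT_DRY_RUN variable are hypothetical and are not provided by the image.

```shell
# Hypothetical wrapper (not part of the image): forwards its arguments to the
# container with the current directory mounted at /app/data.
rugplot() {
    if [ -n "${RUGPLOT_DRY_RUN:-}" ]; then
        # Dry run: print the docker command instead of executing it.
        echo docker run --rm -v "$PWD":/app/data venustiano/rugplot:0.1.0 "$@"
    else
        docker run --rm -v "$PWD":/app/data venustiano/rugplot:0.1.0 "$@"
    fi
}

# Usage, equivalent to the Linux commands in this guide:
#   rugplot template -p violin
#   rugplot plot -p violin -f violin_params.json
```

Setting RUGPLOT_DRY_RUN to any non-empty value prints the full command, which is a quick way to confirm which directory will be mounted.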

Another example#

Create a JSON template called mpg_params.json

PowerShell#

docker run --rm -v ${PWD}:/app/data venustiano/rugplot:0.1.0 `
template -p violin -f mpg_params.json

Linux#

docker run --rm -v "$PWD":/app/data venustiano/rugplot:0.1.0 \
template -p violin -f mpg_params.json

2. Update the following key/value pairs in the mpg_params.json file as follows:#

{
    "filename": "ggplotmpg.csv",
    "aesthetics": {
        "y_variable": "hwy",
        "x_variable": "class",
        "factorx": false,
        "fill": "class",
        "colour": "class"
    },
    "rotxlabs": 45,
    "boxplot": {
        "addboxplot": true,
        "width": 0.1
    },
    "save": {
        "save": true,
        "width": 15,
        "height": 10,
        "device": "png"
    }
}
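A hand-edited parameter file can easily end up with a stray trailing comma or a missing quote, either of which makes it invalid JSON. As a minimal sketch, the check below parses the file on the host before it is passed to the container. It assumes python3 is installed on the host; is_valid_json is a hypothetical helper, not a rugplot command.

```shell
# Hypothetical helper: succeed if FILE parses as JSON.
# Assumes python3 is available on the host; json.tool is part of its
# standard library and exits with a non-zero status on a parse error.
is_valid_json() {
    python3 -m json.tool "$1" > /dev/null 2>&1
}

# Usage: report a broken parameter file before running the container.
if [ -f mpg_params.json ] && ! is_valid_json mpg_params.json; then
    echo "mpg_params.json is not valid JSON" >&2
fi
```

This only verifies the JSON syntax. Whether the keys and values are accepted is still decided by the plot command itself.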

Download the data#

wget https://raw.githubusercontent.com/rijksuniversiteit-groningen/rvispack/master/tests/testthat/data/ggplotmpg.csv

Create the visualization#

Linux#
docker run --rm -v "$PWD":/app/data venustiano/rugplot:0.1.0 \
plot -p violin -f mpg_params.json
PowerShell#
docker run --rm -v ${PWD}:/app/data venustiano/rugplot:0.1.0 `
plot -p violin -f mpg_params.json

MPG violinplots

Using Singularity#

singularity build pcr.sif docker://venustiano/rugplot:0.1.0
./pcr.sif
./pcr.sif plot -p violin -f mpg_params.json

Contributing#

Please see the Contributor Guide on ReadTheDocs for information about how to contribute updates, features, tests and community maintained methods.
