RUG Docker-CDS
Note
This project is under active development.
RUG Docker-CDS#
RUG Docker-CDS is a set of containerized data science methods. The containers are designed to run as black boxes, executed from a command-line interface (CLI) or a graphical user interface (GUI). The input of a method is defined in a JSON file that includes information such as the data file, the parameters of the method and the output formats.
The User Guide on ReadTheDocs provides the latest documentation of the containerized data science methods.
CLI Quick Start#
To run the examples below you need to have Docker installed.
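As a minimal sketch of how that prerequisite could be checked before running anything, the helper below tests whether a command is available on the `PATH`. The function name `require_cmd` is hypothetical and not part of this project.

```shell
# Hypothetical helper (not part of RUG Docker-CDS).
# require_cmd NAME: succeed if NAME is available on PATH, otherwise print an error.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "error: '$1' not found on PATH" >&2
    return 1
}

# Possible usage: stop a script early when Docker is missing.
# require_cmd docker || exit 1
```

The check only confirms that the `docker` executable can be found; it does not verify that the Docker daemon is running.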
Visualization techniques#
The following command pulls the venustiano/rugplot:0.1.0 image from Docker Hub if it is not present on the local host. It then displays information about the containerized package and the available visualization techniques. Finally, the --rm flag removes the stopped container.
docker run --rm venustiano/rugplot:0.1.0
To create visualizations using this image, you need a tabular data file and a JSON file stored in the current working directory. The supported file formats are those accepted by the fread function implemented in the R data.table package.
Violin plots:
Download the data.
wget https://raw.githubusercontent.com/rijksuniversiteit-groningen/rvispack/master/tests/testthat/data/iris.csv
1. Create a violin JSON template#
PowerShell#
docker run --rm -v ${PWD}:/app/data venustiano/rugplot:0.1.0 `
template -p violin
Linux#
docker run --rm -v "$PWD":/app/data venustiano/rugplot:0.1.0 \
template -p violin
2. Update the following key/value pairs in the violin_params.json file#
{
  "filename": "iris.csv",
  "aesthetics": {
    "y_variable": "sepal_length"
  }
}
Create the visualization under Linux or macOS#
docker run --rm -v "$PWD":/app/data venustiano/rugplot:0.1.0 \
plot -p violin -f violin_params.json
Create the visualization using Windows PowerShell#
docker run --rm -v ${PWD}:/app/data venustiano/rugplot:0.1.0 `
plot -p violin -f violin_params.json
Either command produces a violin plot in the Rplots.pdf file.
The -v flag mounts the current working directory ($PWD) as the /app/data folder in the container, -p violin selects the plot function, and -f violin_params.json is the file that contains the information needed to create the violin plot.
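Because every invocation repeats the same mount and image name, the flags described above can be wrapped in a small shell function. This is only a sketch: the function name `rugplot` and the `RUGPLOT_IMAGE` and `DOCKER` variables are hypothetical conveniences, not something the project provides.

```shell
# Hypothetical wrapper (not part of RUG Docker-CDS).
# RUGPLOT_IMAGE: image to run; defaults to the tag used in this guide.
RUGPLOT_IMAGE="${RUGPLOT_IMAGE:-venustiano/rugplot:0.1.0}"

rugplot() {
    # Mount the current working directory as /app/data, remove the container
    # when it stops, and forward all arguments to the image.
    # DOCKER may be overridden (for example with `echo`) to preview the command.
    "${DOCKER:-docker}" run --rm -v "$PWD":/app/data "$RUGPLOT_IMAGE" "$@"
}
```

With this function defined, `rugplot plot -p violin -f violin_params.json` expands to the full `docker run` command, and `DOCKER=echo rugplot plot -p violin -f violin_params.json` prints the arguments without starting a container.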
Another example#
Create a JSON template called mpg_params.json
PowerShell#
docker run --rm -v ${PWD}:/app/data venustiano/rugplot:0.1.0 `
template -p violin -f mpg_params.json
Linux#
docker run --rm -v "$PWD":/app/data venustiano/rugplot:0.1.0 \
template -p violin -f mpg_params.json
2. Update the following key/value pairs in the mpg_params.json file#
{
  "filename": "ggplotmpg.csv",
  "aesthetics": {
    "y_variable": "hwy",
    "x_variable": "class",
    "factorx": false,
    "fill": "class",
    "colour": "class"
  },
  "rotxlabs": 45,
  "boxplot": {
    "addboxplot": true,
    "width": 0.1
  },
  "save": {
    "save": true,
    "width": 15,
    "height": 10,
    "device": "png"
  }
}
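Hand-edited parameter files are easy to break, for example with a stray trailing comma after the last key of an object. The sketch below checks that a file parses as JSON before it is passed to the container. The function name `check_json` is hypothetical, and the sketch assumes that either `jq` or `python3` is installed on the host; it checks syntax only, not whether the keys are valid for a given plot.

```shell
# Hypothetical helper (not part of RUG Docker-CDS).
# check_json FILE: succeed only if FILE parses as JSON.
check_json() {
    if command -v jq >/dev/null 2>&1; then
        # `jq empty` parses the input and prints nothing.
        jq empty "$1" >/dev/null 2>&1
    elif command -v python3 >/dev/null 2>&1; then
        # json.tool exits with a non-zero status on a parse error.
        python3 -m json.tool "$1" >/dev/null 2>&1
    else
        echo "check_json: neither jq nor python3 is available" >&2
        return 2
    fi
}

# Possible usage:
# check_json mpg_params.json || echo "mpg_params.json is not valid JSON" >&2
```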
Download the data#
wget https://raw.githubusercontent.com/rijksuniversiteit-groningen/rvispack/master/tests/testthat/data/ggplotmpg.csv
Create the visualization#
Linux#
docker run --rm -v "$PWD":/app/data venustiano/rugplot:0.1.0 \
plot -p violin -f mpg_params.json
PowerShell#
docker run --rm -v ${PWD}:/app/data venustiano/rugplot:0.1.0 `
plot -p violin -f mpg_params.json
Using Singularity#
singularity build pcr.sif docker://venustiano/rugplot:0.1.0
./pcr.sif
./pcr.sif plot -p violin -f mpg_params.json
Contributing#
Please see the Contributor Guide on ReadTheDocs for information about how to contribute updates, features, tests and community maintained methods.