rugplot: Histograms

A histogram is commonly used to visualize an approximation of the distribution of one-dimensional continuous data. The examples below show how to create histograms with the rugplot container, using the well-known iris dataset (Fisher, 1936). The dataset can be downloaded directly from DataHub by running the following command:

wget https://datahub.io/machine-learning/iris/r/iris.csv

Alternatively, the URL in that command can be used directly as the "filename" value of the JSON template, as shown in Step 2 below.
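Because "x_variable" (and, in the customization section below, "colour") must match a column name exactly, it can help to list the header of the downloaded file first. The Python sketch below uses only the standard library; the helper name list_columns is hypothetical and not part of rugplot, and it assumes the file was saved as iris.csv.

```python
import csv
from pathlib import Path


def list_columns(csv_path):
    """Return the column names from the header row of a CSV file."""
    # "utf-8-sig" drops a leading byte-order mark if the file has one,
    # so the first column name is not polluted by it.
    with Path(csv_path).open(newline="", encoding="utf-8-sig") as handle:
        reader = csv.reader(handle)
        # The first row of the file holds the column names.
        return next(reader)
```

For example, list_columns("iris.csv") returns the header of the downloaded file, which should include the sepallength and class columns used on this page.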

Creating a histogram using the rugplot container

For simplicity, it is best to create a shell alias for the container (see the Docker commands section), so that it can be invoked as rugplot in the commands below.

  1. Step 1: create a rugplot histogram template

    rugplot template -p histogram
    

    This creates a histogram_params.json file that contains, among others, the name/value pairs listed below:

    {
        "description": "Parameters to create a histogram(s) using the 'rugplot' R package",
        "filename": "<filename path>",
        "variables": null,
        "aesthetics": {
            "y_variable": null,
            "x_variable": "<X required column name>",
            "fill": null
        },
        "labels": {
            "title": null,
            "subtitle": null
        }
    }
    
  2. Step 2: add the data file, x variable and title values to the template:

    {
        "filename": "https://datahub.io/machine-learning/iris/r/iris.csv",
        "aesthetics": {
            "x_variable": "sepallength"
        },
        "labels": {
            "title": "Sepal length histogram"
        }
    }
    
  3. Step 3: create the histogram

    rugplot plot -p histogram --file histogram_params.json
    

    The result is stored in the Rplots.pdf file.

    Figure: sepal length histogram.
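Step 2 can also be scripted, which is handy when several histograms are generated from the same template. The following Python sketch is illustrative only: the function fill_histogram_template is hypothetical, and it assumes the file written by rugplot template is valid JSON with the "filename", "aesthetics" and "labels" keys shown in Step 1.

```python
import json
from pathlib import Path


def fill_histogram_template(params_path, filename, x_variable, title):
    """Set the data file, x variable and title in a rugplot params file."""
    path = Path(params_path)
    # Assumes the generated template is valid JSON (no trailing commas).
    params = json.loads(path.read_text(encoding="utf-8"))

    params["filename"] = filename
    # setdefault creates a section only if it is missing, so the other
    # keys already in "aesthetics" and "labels" are left untouched.
    params.setdefault("aesthetics", {})["x_variable"] = x_variable
    params.setdefault("labels", {})["title"] = title

    # Write the updated parameters back in place.
    path.write_text(json.dumps(params, indent=4) + "\n", encoding="utf-8")
    return params
```

Calling fill_histogram_template("histogram_params.json", "https://datahub.io/machine-learning/iris/r/iris.csv", "sepallength", "Sepal length histogram") reproduces the Step 2 values; the command in Step 3 is then run unchanged. Entries such as "fill" and "subtitle" keep their template values.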

Customizing the histogram

Several attributes can be customized, such as additional labels, colours and the output file format. For example, adding the values below to histogram_params.json (to save space, only the updated name/value pairs are listed)

"colour": "class",
"labels": {
    "x": "Sepal length"
},
"save": {
    "save": true,
    "outputfilename": "sepal-length_histogram.png",
    "device": "png"
}

and running exactly the same command as in Step 3 produces the following visualization, stored in a PNG file.

Figure: sepal length histogram coloured by class.
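Because only the updated name/value pairs are listed, nested objects such as "labels" have to be merged into the existing ones rather than replaced; otherwise the "title" set in Step 2 would be lost. A recursive merge does this. The Python sketch below applies the customization from this section, assuming histogram_params.json is valid JSON and the keys go exactly where the fragment above places them. The names deep_update, CUSTOMIZATION and apply_customization are hypothetical, not part of rugplot.

```python
import copy
import json
from pathlib import Path


def deep_update(base, updates):
    """Recursively merge `updates` into `base` and return `base`.

    Nested objects (such as "labels") are merged key by key, so existing
    entries like "title" are kept; any other value is overwritten.
    """
    for key, value in updates.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            deep_update(base[key], value)
        else:
            # Copy so the result never shares nested dicts with `updates`.
            base[key] = copy.deepcopy(value)
    return base


# The customization from this section, keys placed as in the fragment.
CUSTOMIZATION = {
    "colour": "class",
    "labels": {"x": "Sepal length"},
    "save": {
        "save": True,
        "outputfilename": "sepal-length_histogram.png",
        "device": "png",
    },
}


def apply_customization(params_path, updates=CUSTOMIZATION):
    """Merge `updates` into the JSON params file and write it back."""
    path = Path(params_path)
    params = json.loads(path.read_text(encoding="utf-8"))
    deep_update(params, updates)
    path.write_text(json.dumps(params, indent=4) + "\n", encoding="utf-8")
    return params
```

After apply_customization("histogram_params.json"), the "labels" object holds both the original "title" and the new "x" entry, and the Step 3 command can be run as before.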

By default, the PNG file is 10x15 cm (height x width) at 72 dots per inch (dpi). These properties can be changed through the "save" attributes of the JSON file.
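To estimate the pixel dimensions of the default PNG, convert each length to inches (1 inch = 2.54 cm) and multiply by the resolution: pixels = cm / 2.54 × dpi. The Python sketch below performs this arithmetic for the default 10x15 cm at 72 dpi. The helper name cm_to_pixels is hypothetical, and the R graphics device may round fractional pixels differently from Python's round().

```python
CM_PER_INCH = 2.54


def cm_to_pixels(length_cm, dpi=72):
    """Convert a physical length in centimetres to a pixel count at `dpi`."""
    return round(length_cm / CM_PER_INCH * dpi)


# Default size from the text: 10 cm high, 15 cm wide, 72 dpi.
height_px = cm_to_pixels(10)  # 283.46... rounds to 283
width_px = cm_to_pixels(15)   # 425.19... rounds to 425
print(f"{width_px} x {height_px} pixels (width x height)")
# prints "425 x 283 pixels (width x height)"
```

The same formula shows why raising the resolution in the "save" attributes enlarges the image in pixels while the physical size in centimetres stays the same.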

Other features can also be added, such as facets, interactive plots and LaTeX plots produced with tikzDevice.