rugplot: Violin Plots

Like histograms, violin plots visualize an approximation of the distribution of one-dimensional continuous data. Instead of drawing bars, a violin plot fits a smooth probability density curve to the data (a kernel density estimate); the curve is then rotated 90 degrees and mirrored to give the violin shape. Here we show how to create violin plots using the rugplot container. The famous Auto Imports Database (Jeffrey, 1936) will be used; the dataset can be downloaded directly from DataHub autos by running the following command

wget https://datahub.io/machine-learning/autos/r/autos.csv

or by adding the URL from the previous command to the JSON template, as done in step 2 below.
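
To make the construction described above concrete, here is a minimal sketch in plain Python of how a violin outline can be derived from a sample: estimate the density with Gaussian kernels, then mirror it around a central axis. It is an illustration only and not how rugplot or ggplot2 is implemented. The function names, the fixed bandwidth, and the sample values are assumptions made for this example; plotting libraries normally choose the bandwidth automatically.

```python
import math


def gaussian_kde(sample, bandwidth):
    """Return a function that estimates the density of `sample` with Gaussian kernels."""
    norm = len(sample) * bandwidth * math.sqrt(2.0 * math.pi)

    def density(y):
        return sum(math.exp(-0.5 * ((y - s) / bandwidth) ** 2) for s in sample) / norm

    return density


def violin_outline(sample, bandwidth, points=50):
    """Return (y, left, right) rows: the density at y, mirrored around zero."""
    density = gaussian_kde(sample, bandwidth)
    lo, hi = min(sample), max(sample)
    step = (hi - lo) / (points - 1)
    outline = []
    for i in range(points):
        y = lo + i * step
        half_width = density(y)  # the "rotated" curve: density becomes a width
        outline.append((y, -half_width, half_width))  # the reflection
    return outline


# Made-up values for illustration only (not taken from autos.csv).
sample = [22, 24, 25, 25, 27, 30, 30, 31, 34, 38]
outline = violin_outline(sample, bandwidth=2.0)

# The widest part of the violin marks the region where the values are most concentrated.
widest = max(outline, key=lambda row: row[2])
print("widest point at y =", round(widest[0], 2))
```

Drawing the left and right columns against y gives the familiar symmetric shape; a larger bandwidth produces a smoother, wider violin.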

Creating violin plots using the rugplot container

An alias can shorten the command (see the Docker commands section), but copying the full command, for a Unix shell or for PowerShell, is just as easy.

  1. Step 1, create a rugplot violin JSON template

    rugplot template -p violin
    

    A violin_params.json file will be created including some of the name/value pairs listed below:

    {
        "description": "Parameters to create violin plots using the 'rugplot' R package",
        "filename": "<filename path>",
        "aesthetics": {
            "y_variable": "<Y required column name>",
            "x_variable": "''",
            "fill": null,
            "colour": null
        },
        "boxplot": {
            "addboxplot": false
        },
        "rotxlabs": 0,
        "save": {
            "save": false,
            "dpi": 72,
            "device": "pdf"
        }
    }
    
  2. Step 2, add the values below to the violin_params.json file:

    {
        "description": "Parameters to create violin plots using the 'rugplot' R package",
        "filename": "https://datahub.io/machine-learning/autos/r/autos.csv",
        "aesthetics": {
            "y_variable": "highway-mpg",
            "x_variable": "body-style",
            "colour": "body-style",
            "fill": "body-style"
        },
        "boxplot": {
            "addboxplot": true
        },
        "rotxlabs": 45,
        "save": {
            "save": true,
            "dpi": 300,
            "device": "png"
        }
    }
    
  3. Step 3, create the violin plots

    rugplot plot -p violin --file violin_params.json
    

    The result is as follows:

    Error

    Error in ggplot2::geom_violin():

    ! Problem while computing aesthetics.

    ℹ Error occurred in the 1st layer.

    Note

    Special characters such as "-" in column names may cause ggplot2 to fail. In this case the easiest solution is to rename the columns highway-mpg and body-style to highwaympg and bodystyle in the autos.csv file.

    After changing the column names in the autos.csv file, update the JSON file as follows:

    {
        "filename": "autos.csv",
        "aesthetics": {
            "y_variable": "highwaympg",
            "x_variable": "bodystyle",
            "fill": "bodystyle",
            "colour": "bodystyle"
        }
    }
    

    Then run the command in step 3 again. A PNG file with a name like autos.csv-violin-20230203_121603.366.png will be saved.

    autos violin plot
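
The column renaming mentioned in the note of step 3 can be done in any editor. For repeatability, a small script is one option. The sketch below uses only the Python standard library and is not part of rugplot; the function name `rename_columns` and the output file name in the usage note are assumptions made for this example.

```python
import csv

# Old header name -> new header name, as suggested in the note of step 3.
RENAMES = {"highway-mpg": "highwaympg", "body-style": "bodystyle"}


def rename_columns(src_path, dst_path, renames):
    """Copy a CSV file, replacing header names according to `renames`."""
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader)  # the first row holds the column names
        writer.writerow([renames.get(name, name) for name in header])
        writer.writerows(reader)  # data rows are copied unchanged
```

Calling `rename_columns("autos.csv", "autos_renamed.csv", RENAMES)` writes a renamed copy and leaves the downloaded file untouched. In that case the "filename" value in the JSON file has to point to the new file name.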

The last example in this section creates a LaTeX-quality tikz plot that includes a LaTeX formula, using the tikzDevice R package. The tikz version of the previous plot looks as follows:

violin tikz example

The changes to the JSON template are the following:

{
    "labels": {
        "title": "A \\\\\\\\LaTeX formula in the $x$ axis label",
        "x": "$p(x)=\\\\\\\\frac{1}{\\\\\\\\sqrt{2\\\\\\\\pi}}e^{-\\\\\\\\frac{x^2}{2}}$"
    },
    "save": {
        "device": "tikz",
        "sanitize": false
    }
}

and finally, run the command in step 3 again.
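
The fragment above lists only the keys that change, so the remaining keys of violin_params.json stay as they are. As a hedged illustration of what applying such a fragment means, the sketch below merges a set of changes into an existing parameter set, key by key. It is plain Python, not a rugplot feature; `merge_params` is a hypothetical helper, and the values are shortened copies of the ones used in this section.

```python
import json


def merge_params(base, changes):
    """Return a copy of `base` with `changes` applied, merging nested objects."""
    merged = dict(base)
    for key, value in changes.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_params(merged[key], value)  # merge nested object
        else:
            merged[key] = value  # add or overwrite a single value
    return merged


# json.loads also acts as a validity check: it raises an error on
# malformed input such as a trailing comma.
base = json.loads("""
{
    "rotxlabs": 45,
    "save": {"save": true, "dpi": 300, "device": "png"}
}
""")
changes = json.loads("""
{
    "save": {"device": "tikz", "sanitize": false}
}
""")

merged = merge_params(base, changes)
print(json.dumps(merged, indent=4))
```

Only "device" is overwritten and "sanitize" is added; "save" and "dpi" inside the "save" object keep their earlier values.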

Note

LaTeX commands have to be escaped using eight backslashes, and the "sanitize" property must be set to false. In axis breaks, however, only four backslashes are needed; see the heatmap example.
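
Part of the reason for so many backslashes is that every layer that interprets escape sequences halves their number. The sketch below shows only the first layer, JSON decoding, using Python's standard json module. The later layers between the JSON file and LaTeX are internal to rugplot and its dependencies and are not reproduced here; that they account for the remaining backslashes is an inference from the note above, not something this example verifies. The key name "x" and the short formula are chosen for illustration.

```python
import json

# A raw string, so that Python itself leaves the backslashes alone:
# this is the text exactly as it would appear in the JSON file.
json_text = r'{"x": "$\\\\\\\\frac{1}{2}$"}'
print(json_text.count("\\"))  # 8 backslashes in the file

# JSON treats each pair of backslashes as one literal backslash.
value = json.loads(json_text)["x"]
print(value)
print(value.count("\\"))  # 4 backslashes after JSON decoding
```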