Introduction

In this vignette I explain the conventions and design choices behind dmplot. These will guide you in working with your data and in using dmplot and the ggplot2 framework effectively to derive insight from your analyses.

dmplot is not limited to financial data; its conventions and design choices apply to any time series data. However, the package does focus on financial data analysis and visualisation.

Licensing

The dmplot package is released under the MIT license, allowing free use and modification. Users must:

  1. Cite the original author (see LICENSE for details).
  2. Include the license in any redistribution.

Setup

Let’s install the necessary libraries.

I often use the following packages:

  • data.table for working with large datasets.
  • dmplot for plotting financial and time series datasets.
  • kucoin for interacting with the KuCoin API to retrieve cryptocurrency financial data.
  • ggplot2 for plotting.
  • box for loading modules in R.

I strongly recommend using data.table for any work in finance. Working efficiently with large datasets is one of the primary reasons data.table was created. To install data.table on M1 macOS, consult this guide for building from source: gist.github.com/dereckmezquita/ed860601138a46cf591a1bdcc95db0a2

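install.packages("data.table", type = "source")
install.packages("TTR")
install.packages("ggplot2")
install.packages("box")
install.packages("lubridate") # used below to build the date range
install.packages("remotes")   # needed for the GitHub installs below

remotes::install_github("dereckmezquita/dmplot")
remotes::install_github("dereckmezquita/kucoin")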

Now, let’s load the required libraries:

box::use(kucoin)
box::use(dt = data.table)

dmplot and “Tidy Data” for Financial Data Analysis and Visualization

The dmplot package provides a toolkit for plotting financial and time series datasets in the ggplot2 framework. It includes functions for plotting candlestick charts, moving averages, Bollinger Bands, MACD, RSI, and Stochastic Oscillator.

To make the most of ggplot2 when visualising our financial analyses, we must adhere to the “tidy data” convention, whereby each column is a variable, each row is an observation, and each cell is a single value. This is the format that dmplot expects.

Thus, any calculated indicators should be added to the dataset as new columns.
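
As a minimal sketch of this convention, the snippet below adds a simple moving average to a toy price table as a new column. The table `prices`, its values, and the column name `sma_3` are made up for illustration; `frollmean` is data.table's rolling mean and is used here only as a stand-in for an indicator, not as dmplot's own method.

```r
box::use(dt = data.table)

# Toy tidy table: one row per observation, one column per variable
prices <- dt$data.table(
    datetime = as.POSIXct("2024-07-06 05:00:00", tz = "UTC") + 3600 * 0:5,
    close = c(10, 11, 12, 13, 14, 15)
)

# The indicator becomes a new column; the first n - 1 rows are NA
# because the rolling window is not yet full
prices[, sma_3 := dt$frollmean(close, n = 3)]

print(prices)
```

The table stays tidy: the new indicator sits alongside the original variables, one value per row.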

I offer two points of guidance:

  1. Use data.table.
  2. Functions must return a named list of values.

The reason for the first point is that data.table is a powerful and highly efficient package for working with large datasets. The reason for the second is that a function returning a named list of values can easily be used within data.table to create new columns.

petal_ratios <- function(petal_length, petal_width, sepal_length, sepal_width) {
    petal_ratio <- petal_length / petal_width
    sepal_ratio <- sepal_length / sepal_width
    return(list(petal_ratio = petal_ratio, sepal_ratio = sepal_ratio))
}

iris2 <- dt$as.data.table(iris)

head(iris2)
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <num>       <num>        <num>       <num>  <fctr>
#> 1:          5.1         3.5          1.4         0.2  setosa
#> 2:          4.9         3.0          1.4         0.2  setosa
#> 3:          4.7         3.2          1.3         0.2  setosa
#> 4:          4.6         3.1          1.5         0.2  setosa
#> 5:          5.0         3.6          1.4         0.2  setosa
#> 6:          5.4         3.9          1.7         0.4  setosa

iris2[,
    c("petal_ratio", "sepal_ratio") := petal_ratios(
        Petal.Length,
        Petal.Width,
        Sepal.Length,
        Sepal.Width
    )
]

head(iris2)
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species petal_ratio
#>           <num>       <num>        <num>       <num>  <fctr>       <num>
#> 1:          5.1         3.5          1.4         0.2  setosa        7.00
#> 2:          4.9         3.0          1.4         0.2  setosa        7.00
#> 3:          4.7         3.2          1.3         0.2  setosa        6.50
#> 4:          4.6         3.1          1.5         0.2  setosa        7.50
#> 5:          5.0         3.6          1.4         0.2  setosa        7.00
#> 6:          5.4         3.9          1.7         0.4  setosa        4.25
#> 1 variable(s) not shown: [sepal_ratio <num>]

As you can see, sticking to this convention allows us to easily and efficiently leverage the data.table framework to calculate new columns and thus new indicators.
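
The names in the returned list matter most when the function is called in `j` without `:=`: data.table then uses them as the column names of the result. The sketch below illustrates this with a grouped summary. The helper `range_stats` is hypothetical and not part of dmplot.

```r
box::use(dt = data.table)

# Hypothetical helper returning a named list of summary values
range_stats <- function(x) {
    list(min = min(x), max = max(x), spread = max(x) - min(x))
}

iris_dt <- dt$as.data.table(iris)

# Without `:=`, the list names become the columns of a new table,
# with one row per group
summary_dt <- iris_dt[, range_stats(Petal.Length), by = Species]

print(summary_dt)
```

The same function can therefore serve both to add columns to an existing table and to build per-group summaries.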

Loading Sample Data

We’ll use the same sample data as in the README:

ticker <- "BTC/USDT"

data <- kucoin$get_market_data(
    symbols = ticker,
    from = lubridate::now() - lubridate::days(7),
    to = lubridate::now(),
    frequency = "1 hour"
)
head(data)
#>      symbol            datetime    open    high     low   close   volume
#>      <char>              <POSc>   <num>   <num>   <num>   <num>    <num>
#> 1: BTC/USDT 2024-07-06 05:00:00 56057.9 56485.8 56025.2 56363.1 79.48139
#> 2: BTC/USDT 2024-07-06 06:00:00 56363.1 56573.3 56346.6 56413.8 41.94569
#> 3: BTC/USDT 2024-07-06 07:00:00 56413.7 56672.4 56402.5 56602.2 98.02022
#> 4: BTC/USDT 2024-07-06 08:00:00 56602.2 56655.3 56508.8 56555.8 49.06419
#> 5: BTC/USDT 2024-07-06 09:00:00 56555.9 56767.0 56441.6 56759.9 42.89421
#> 6: BTC/USDT 2024-07-06 10:00:00 56757.4 56887.1 56645.3 56786.6 45.64148
#> 1 variable(s) not shown: [turnover <num>]
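
If you cannot reach the KuCoin API, a synthetic table with the same columns is enough to follow along. The sketch below is an assumption-laden stand-in and not real market data: prices follow a seeded random walk, the name `sample_data` is made up, and `turnover` is approximated as `volume * close` purely for illustration.

```r
box::use(dt = data.table)

set.seed(1)
n_bars <- 168L # 7 days of hourly bars

# Random-walk closes; each bar opens at the previous bar's close
close_px <- 56000 + cumsum(rnorm(n_bars, sd = 100))
open_px <- c(56000, close_px[-n_bars])
wick <- abs(rnorm(n_bars, sd = 50)) # distance of high/low beyond the body

sample_data <- dt$data.table(
    symbol = "BTC/USDT",
    datetime = as.POSIXct("2024-07-06 05:00:00", tz = "UTC") +
        3600 * (seq_len(n_bars) - 1),
    open = open_px,
    high = pmax(open_px, close_px) + wick,
    low = pmin(open_px, close_px) - wick,
    close = close_px,
    volume = runif(n_bars, min = 40, max = 100)
)

# Assumption: turnover approximated as volume times close
sample_data[, turnover := volume * close]

print(head(sample_data))
```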

Calculating Financial Indicators

I provide a host of functions for calculating financial indicators in the dmplot package. These functions are designed to be used within the data.table framework and return a named list of values.

However, if you need to use outside functions, you can easily wrap them in a function that returns a named list of values.

Here we demonstrate by wrapping TTR::EMA, and compute Bollinger Bands with dmplot's bb:

box::use(TTR[EMA, BBands])
box::use(dmplot[ bb ])

# redefine our function to return a list
ema <- function(x, n, wilder = TRUE) {
    return(as.list(as.data.frame(EMA(x, n = n, wilder = wilder))))
}

# calculate the short and long moving averages
data[, ema_short := ema(close, n = 10, wilder = TRUE)]
data[, ema_long := ema(close, n = 50, wilder = TRUE)]

# calculate the bollinger bands
data[,
    c("bb_lower", "bb_mavg", "bb_upper", "bb_pct") := bb(
        close, n = 10,
        sd = 2
    )
]

head(data)
#>      symbol            datetime    open    high     low   close   volume
#>      <char>              <POSc>   <num>   <num>   <num>   <num>    <num>
#> 1: BTC/USDT 2024-07-06 05:00:00 56057.9 56485.8 56025.2 56363.1 79.48139
#> 2: BTC/USDT 2024-07-06 06:00:00 56363.1 56573.3 56346.6 56413.8 41.94569
#> 3: BTC/USDT 2024-07-06 07:00:00 56413.7 56672.4 56402.5 56602.2 98.02022
#> 4: BTC/USDT 2024-07-06 08:00:00 56602.2 56655.3 56508.8 56555.8 49.06419
#> 5: BTC/USDT 2024-07-06 09:00:00 56555.9 56767.0 56441.6 56759.9 42.89421
#> 6: BTC/USDT 2024-07-06 10:00:00 56757.4 56887.1 56645.3 56786.6 45.64148
#> 7 variable(s) not shown: [turnover <num>, ema_short <num>, ema_long <num>, bb_lower <num>, bb_mavg <num>, bb_upper <num>, bb_pct <num>]
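
The same wrapping pattern extends to outside functions that return a matrix rather than a vector. As a sketch, the wrapper below turns the output of TTR::BBands into a named list. The wrapper `bbands_list`, the table `toy` and its seeded random-walk prices are made up for illustration; `dn`, `mavg`, `up` and `pctB` are the column names TTR::BBands returns.

```r
box::use(dt = data.table)
box::use(TTR[BBands])

# Hypothetical wrapper: convert the BBands matrix into a named list
bbands_list <- function(x, n = 20, sd = 2) {
    bands <- BBands(x, n = n, sd = sd)
    list(
        bb_lower = bands[, "dn"],
        bb_mavg = bands[, "mavg"],
        bb_upper = bands[, "up"],
        bb_pct = bands[, "pctB"]
    )
}

# Toy prices: a seeded random walk, not real market data
set.seed(42)
toy <- dt$data.table(close = 100 + cumsum(rnorm(30)))

toy[,
    c("bb_lower", "bb_mavg", "bb_upper", "bb_pct") := bbands_list(
        close,
        n = 5,
        sd = 2
    )
]

print(head(toy, 8))
```

Selecting the matrix columns by name, rather than by position, keeps the wrapper robust if the column order ever differs.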