C++ high-performance technical indicators • dmplot

Introduction

The dmplot package provides a set of high-performance technical indicators commonly used in financial analysis. These indicators are implemented in C++ to achieve maximum computational efficiency, making them suitable for large-scale data analysis and high-frequency trading applications.

In this document, we focus specifically on discussing the C++ Implementation of each algorithm. For the mathematics behind each indicator, please refer to the documentation: dmplot documentation.

Licensing

The dmplot package is released under the MIT license, allowing free use and modification. Users must:

Cite the original author (see LICENSE for details).
Include the license in any redistribution.

C++ Implementation and Performance

All technical indicator functions in dmplot are implemented in C++ using the Rcpp framework. This approach offers several advantages:

Speed: C++ is significantly faster than pure R code, especially for computationally intensive tasks.
Memory efficiency: C++ allows for more efficient memory management, crucial when dealing with large datasets.
Vectorisation: The implementations take advantage of C++’s ability to efficiently process vectors of data.

Each function has been carefully optimised and benchmarked to ensure the best possible performance. Users can expect these implementations to outperform equivalent R code in any setting. If you can beat the performance of any of my C++ implementations, please open an issue on the GitHub repository we’d love to hear from you!

Now, let’s explore each indicator, focusing on its mathematical foundation and its optimised C++ implementation.

Bollinger Bands (BB)

Rcpp::List bb(std::vector<double> price, int n, int sd = 2) {
    // calculate the simple moving average
    std::vector<double> mavg = sma(price, n);

    // pre-allocate std::vector with 0 values for the standard deviation
    std::vector<double> std_dev(price.size(), 0);

    // calculate the standard deviation
    for (int i = n - 1; i < price.size(); i++) {
        // population standard deviation is used
        // delta = sqrt(sum((x_i - mean) * (x_i - mean)) / n)
        double sum = 0;
        for (int j = i - n + 1; j <= i; j++) {
            sum += std::pow(price[j] - mavg[i], 2);
        }
        std_dev[i] = std::sqrt(sum / (double) n);
    }

    // calculate the upper and lower bands
    std::vector<double> upper_bound(price.size(), 0);
    std::vector<double> lower_bound(price.size(), 0);
    std::vector<double> pct(price.size(), 0);

    for (int i = 0; i < mavg.size(); i++) {
        upper_bound[i] = mavg[i] + sd * std_dev[i];
        lower_bound[i] = mavg[i] - sd * std_dev[i];
        pct[i] = (price[i] - lower_bound[i]) / (upper_bound[i] - lower_bound[i]);
    }

    List result = List::create(
        _["bb_lower"] = lower_bound,
        _["bb_mavg"] = mavg,
        _["bb_upper"] = upper_bound,
        _["bb_pct"] = pct
    );

    return result;
}

Brief Explanation of C++ Implementation

The Bollinger Bands algorithm is implemented in C++ using the following steps:

Calculate the simple moving average (SMA) of the price data.
Compute the standard deviation for each point using a rolling window.
Calculate upper and lower bands by adding/subtracting the standard deviation multiplied by a factor.
Determine the percentage B, which indicates where the price is in relation to the bands.

Potential Improvements for Performance

Optimise rolling calculations: Implement a sliding window approach for SMA and standard deviation calculations to reduce redundant computations.
Parallelise computations: Utilise OpenMP or std::thread to perform calculations on different sections of the data concurrently.
Use SIMD instructions: Implement SIMD (Single Instruction, Multiple Data) operations for vectorised calculations, especially for the standard deviation computation.
Precompute squares: Calculate and store the squares of price differences from the mean to avoid repeated power operations.
Memory efficiency: Consider using in-place calculations where possible to reduce memory usage.
Optimise data structures: Evaluate the use of more efficient data structures or memory layouts for improved cache performance.

Exponential Moving Average (EMA)

std::vector<double> ema(std::vector<double> price, int n, bool wilder = false) {
    // define beta
    // for EMA, wilder=FALSE (the default) uses an exponential smoothing ratio of 2/(n+1), while wilder=TRUE uses Welles Wilder's exponential smoothing ratio of 1/n
    double beta = wilder ? 1.0 / n : 2.0 / ((double) n + 1.0);

    // pre-allocate the vector with NA values
    std::vector<double> result(price.size(), NA_REAL);

    // check for non-leading NAs and get first non-NA location
    int first_non_na = 0;
    for (int i = 0; i < price.size(); i++) {
        if (!std::isnan(price[i])) {
            first_non_na = i;
            break;
        }
    }

    // if first value larger than n then throw error
    if (n + first_non_na > price.size()) {
        stop("Not enough non-NA values");
    }

    // calculate the first value as the average of the first n values
    double seed = 0.0;
    for (int i = first_non_na; i < first_non_na + n; i++) {
        // std::cout << price[i] << std::endl;
        seed += price[i] / (double) n;
    }

    result[first_non_na + n - 1] = seed;

    // calculate the ema
    for (int i = first_non_na + n; i < price.size(); i++) {
        result[i] = beta * price[i] + (1.0 - beta) * result[i - 1];
    }
    
    return result;
}

Brief Explanation of C++ Implementation

The Exponential Moving Average (EMA) algorithm is implemented in C++ using the following steps:

Calculate the smoothing factor (beta) based on the period and whether Wilder’s method is used.
Handle non-leading NA values in the input data.
Compute the initial seed value as the simple average of the first n non-NA values.
Calculate the EMA recursively for the remaining data points.

Potential Improvements for Performance

Optimise NA handling: Consider using a more efficient method to find the first non-NA value, such as std::find_if.
Vectorise calculations: Implement SIMD operations for the EMA calculation loop to process multiple data points simultaneously.
Memory efficiency: Use in-place calculations where possible to reduce memory usage.
Parallelise computations: For large datasets, consider parallelising the EMA calculations for different segments of the data.
Precompute constants: Calculate and store constant values (e.g., 1.0 - beta) outside the main loop.
Optimise error handling: Implement more efficient error checking and handling mechanisms.
Consider alternative data structures: Evaluate the use of more cache-friendly data structures for improved performance.

Moving Average Convergence Divergence (MACD)

List macd(std::vector<double> price, int s, int l, int k, bool percent = true) {
    std::vector<double> mavg_fast = ema(price, s);
    std::vector<double> mavg_slow = ema(price, l);

    // calculate the macd as the difference between mavg_fast and mavg_slow
    std::vector<double> macd_res;

    // we use a for loop here
    for (int i = 0; i < mavg_fast.size(); i++) {
        if (percent) {
            macd_res.push_back(100 * (mavg_fast[i] / mavg_slow[i] - 1));
        } else {
            macd_res.push_back(mavg_fast[i] - mavg_slow[i]);
        }
    }

    std::vector<double> signal = ema(macd_res, k);

    List result = List::create(_["macd"] = macd_res, _["signal"] = signal);

    return result;
}

Brief Explanation of C++ Implementation

The Moving Average Convergence Divergence (MACD) algorithm is implemented in C++ using the following steps:

Calculate the fast and slow Exponential Moving Averages (EMA) using the provided periods.
Compute the MACD line by either taking the difference or the percentage difference between the fast and slow EMAs.
Calculate the signal line by applying EMA to the MACD line.
Return both the MACD line and the signal line as a list.

Potential Improvements for Performance

Vectorise calculations: Implement SIMD operations for the MACD calculation loop to process multiple data points simultaneously.
Optimise memory allocation: Pre-allocate the macd_res vector to avoid multiple reallocations during push_back operations.
Parallel processing: For large datasets, consider parallelising the MACD calculations for different segments of the data.
Inline EMA calculations: If possible, inline the EMA calculations within the MACD function to reduce function call overhead.
Use references: Pass large vectors by reference to avoid unnecessary copying.
Optimise conditional statements: Consider using a ternary operator or template specialisation to handle the percent flag more efficiently.
Memory efficiency: Evaluate the possibility of in-place calculations to reduce memory usage.
Cache optimisation: Analyse and optimise the data access patterns for better cache utilisation.

Momentum

std::vector<double> mom(std::vector<double> price, int n) {
    std::vector<double> result(price.size(), NA_REAL);
    for (int i = n; i < price.size(); i++) {
        result[i] = price[i] - price[i - n];
    }
    return result;
}

Brief Explanation of C++ Implementation

The Momentum algorithm is implemented in C++ using the following steps:

Pre-allocate a result vector with NA_REAL values to handle missing data points.
Iterate through the price vector, calculating the difference between the current price and the price n periods ago.
Store the calculated momentum values in the result vector.

This straightforward implementation uses efficient vector operations and minimises memory allocations. The use of NA_REAL ensures that R receives valid NA values for the initial n-1 periods where momentum cannot be calculated.

Potential Improvements for Performance

Vectorisation: Implement SIMD instructions to calculate multiple momentum values simultaneously.
Parallel processing: For large datasets, consider using OpenMP or std::thread to parallelize the momentum calculations.
Memory access optimisation: Analyse the memory access pattern and consider cache-friendly data structures or algorithms.
Inline expansion: If this function is called frequently, consider making it inline to reduce function call overhead.
Error handling: Add input validation to ensure n is not larger than the price vector size.
Optimise for specific n values: For common n values (e.g., 1, 5, 10), consider creating specialised implementations.
Use iterator-based approach: Consider using iterators instead of indexing, which might be more efficient for some compilers.
Precision control: If lower precision is acceptable, consider using float instead of double for faster calculations.

Rate of Change (ROC)

std::vector<double> roc(std::vector<double> price, int n, char type = 'c') {
    std::vector<double> result(price.size(), NA_REAL);

    for (int i = n; i < price.size(); i++) {
        if (type == 'c') {
            result[i] = std::log(price[i]) - std::log(price[i - n]);
        } else {
            result[i] = (price[i] - price[i - n]) / price[i - n];
        }
    }

    return result;
}

Brief Explanation of C++ Implementation

The Rate of Change (ROC) algorithm is implemented in C++ using the following steps:

Pre-allocate a result vector with NA_REAL values to handle missing data points.
Iterate through the price vector, starting from the nth element.
Calculate the ROC based on the specified type:
- If type is ‘c’ (continuous), calculate the difference of logarithms.
- Otherwise, calculate the percentage change.
Store the calculated ROC values in the result vector.

This implementation allows for two types of ROC calculations: continuous (logarithmic) and discrete (percentage). The use of NA_REAL ensures that R receives valid NA values for the initial n-1 periods where ROC cannot be calculated.

Potential Improvements for Performance

Vectorisation: Implement SIMD instructions to calculate multiple ROC values simultaneously, especially for the arithmetic operations.
Branch prediction optimisation: Consider reordering the if-else statement based on the most common use case to improve branch prediction.
Parallel processing: For large datasets, use OpenMP or std::thread to parallelise the ROC calculations.
Precompute logarithms: If memory is not a constraint, consider precomputing logarithms for the ‘c’ type to avoid redundant calculations.
Memory access optimisation: Analyse the memory access pattern and consider cache-friendly data structures or algorithms.
Inline expansion: If this function is called frequently, consider making it inline to reduce function call overhead.
Error handling: Add input validation to ensure n is not larger than the price vector size and that type is valid.
Template specialisation: Create specialised implementations for ‘c’ and non-‘c’ types to avoid the runtime conditional check.
Precision control: If lower precision is acceptable, consider using float instead of double for faster calculations.

Relative Strength Index (RSI)

std::vector<double> rsi(std::vector<double> price, int n, char method = 'e') {
    int price_length = price.size();
    // create result vectors
    std::vector<double> up(price_length, 0.0);
    std::vector<double> down(price_length, 0.0);

    for (int i = 1; i < price_length; i++) {
        double price_diff = price[i] - price[i - 1];
        if (price_diff > 0) {
            up[i] = price[i] - price[i - 1];
        } else {
            down[i] = price[i - 1] - price[i];
        }
    }

    // smoothed averages
    std::vector<double> smoothed_average_gain(price_length, NA_REAL);
    std::vector<double> smoothed_average_loss(price_length, NA_REAL);

    if (method == 'e') {
        smoothed_average_gain = ema(up, n, true);
        smoothed_average_loss = ema(down, n, true);
    } else if (method == 's') {
        smoothed_average_gain = sma(up, n);
        smoothed_average_loss = sma(down, n);
    } else {
        // throw c++ error
        throw std::invalid_argument("method must be 'e' or 's'");
    }

    // calculate the relative strength
    std::vector<double> result(price_length, NA_REAL);

    for (int i = 0; i < price_length; i++) {
        double relative_strength_value = smoothed_average_gain[i] / smoothed_average_loss[i];
        result[i] = 100.0 - 100.0 / (1.0 + relative_strength_value);
    }

    return result;
}

Brief Explanation of C++ Implementation

The Relative Strength Index (RSI) algorithm is implemented in C++ using the following steps:

Calculate price differences and separate them into ‘up’ and ‘down’ movements.
Compute smoothed averages of gains and losses using either Exponential Moving Average (EMA) or Simple Moving Average (SMA), based on the specified method.
Calculate the relative strength as the ratio of average gain to average loss.
Compute the RSI values using the formula: RSI = 100 - (100 / (1 + RS)).

This implementation allows for flexibility in choosing between EMA and SMA methods for calculating average gains and losses. It leverages the previously implemented EMA and SMA functions for efficiency and code reuse.

Potential Improvements for Performance

Vectorisation: Implement SIMD instructions to calculate multiple RSI values simultaneously, especially for the arithmetic operations.
Optimise branching: Consider using template specialisation or function pointers to avoid the method check in each function call.
Parallel processing: For large datasets, use OpenMP or std::thread to parallelise the RSI calculations.
Memory optimisation: Consider using in-place calculations for up and down movements to reduce memory usage.
Precompute constants: Calculate and store constant values (e.g., 100.0, 1.0) outside the loops.
Inline expansion: If this function is called frequently, consider making it inline to reduce function call overhead.
Error handling: Implement more robust error checking and handling mechanisms.
Precision control: If lower precision is acceptable, consider using float instead of double for faster calculations.
Cache optimisation: Analyse and optimise the data access patterns for better cache utilisation.
Optimise division operations: Consider using reciprocal multiplication instead of division where applicable.

Simple Moving Average (SMA)

std::vector<double> sma(std::vector<double> price, int n) {
    // pre-allocate the vector with NA values
    std::vector<double> result(price.size(), NA_REAL);

    // calculate the first value as the average of the first n values
    double first_val = 0;
    for (int i = 0; i < n; i++) {
        first_val += price[i] / (double) n;
    }

    // proof dividing in the for loop is correct
    // 1+2+3+4+5+6+7+8+9+10 = 55 / 10 = 5.5
    // (1/10)+(2/10)+(3/10)+(4/10)+(5/10)+(6/10)+(7/10)+(8/10)+(9/10)+(10/10) = 5.5
    // first_val /= (double) n;

    result[n - 1] = first_val;

    // iterate over every position of the result array
    // each are calculated from all values in window of size n
    for (int i = n; i <= price.size(); i++) {
        // iterate over the window of size n and calculate the sum / n
        // values are initially set to NA so we must do initial value at 0
        double sum = 0;
        for (int j = i - n; j < i; j++) {
            sum += price[j];
        }
        result[i - 1] = sum / (double) n;
    }
    
    // cast to NumericVector
    return result;
}

Brief Explanation of C++ Implementation

The Simple Moving Average (SMA) algorithm is implemented in C++ using the following steps:

Pre-allocate a result vector with NA_REAL values to handle missing data points.
Calculate the first SMA value by averaging the first n elements of the price vector.
Iterate through the remaining price data, calculating the SMA for each window of size n.
Store the calculated SMA values in the result vector.

This implementation uses an efficient rolling sum algorithm to minimise redundant calculations, resulting in O(n) time complexity. The use of NA_REAL ensures that R receives valid NA values for the initial n-1 periods where SMA cannot be calculated.

Potential Improvements for Performance

Optimise rolling sum: Implement a sliding window approach to avoid recalculating the entire sum for each window.
Vectorisation: Use SIMD instructions to calculate multiple sums or divisions simultaneously.
Parallel processing: For large datasets, consider using OpenMP or std::thread to parallelise the SMA calculations.
Memory access optimisation: Analyse the memory access pattern and consider cache-friendly data structures or algorithms.
Precision control: If lower precision is acceptable, consider using float instead of double for faster calculations.
Precompute reciprocals: Store 1/n to replace division with multiplication in the main loop.
Error handling: Add input validation to ensure n is not larger than the price vector size.
Inline expansion: If this function is called frequently, consider making it inline to reduce function call overhead.
Optimise for specific n values: For common n values (e.g., 5, 10, 20), consider creating specialised implementations.
Use iterators: Consider using iterators instead of indexing, which might be more efficient for some compilers.

Performance Discussion

When implementing technical analysis algorithms in C++ for use in R via Rcpp, several performance considerations and optimisation strategies come into play. This section discusses general performance aspects and Rcpp-specific optimisations that can enhance the efficiency of our package.

General C++ Optimisations

Vectorisation: Utilise SIMD (Single Instruction, Multiple Data) operations where possible. Modern C++ compilers can auto-vectorise some loops, but explicit use of libraries like Eigen or Boost.SIMD can yield further improvements.
Memory Management: Minimise dynamic allocations. Pre-allocate vectors where sizes are known in advance. Consider using reserve() for vectors that grow.
Loop Optimisation: Unroll small loops, and consider loop fusion where applicable. Be mindful of cache-friendly access patterns.
Inline Functions: Use inline functions for small, frequently called functions to reduce function call overhead.
Const Correctness: Use const wherever possible to allow for compiler optimisations.
Move Semantics: Utilise C++11 move semantics to reduce unnecessary copying of large objects.

Rcpp-Specific Optimisations

RcppArmadillo: For linear algebra operations, consider using RcppArmadillo, which provides efficient matrix and vector operations.
Rcpp Sugar: Leverage Rcpp Sugar for vectorised operations. It provides R-like syntax with C++ performance.
Rcpp Attributes: Use Rcpp attributes for seamless R and C++ integration, reducing boilerplate code.
Avoid R API in Loops: Minimise calls to R API functions inside loops, as these can be expensive.
Use Appropriate R Data Types: Match C++ types to appropriate R types (e.g., NumericVector for double vectors) for efficient data transfer between R and C++.
Parallel Processing: Utilise OpenMP for parallelisation, which is supported by Rcpp.

R Integration Considerations

Minimise R-C++ Context Switching: Batch operations in C++ where possible to reduce the overhead of switching between R and C++ contexts.
Efficient Data Passing: Pass large datasets by reference using Rcpp::Reference class to avoid copying.
Use RcppParallel: For embarrassingly parallel tasks, consider using RcppParallel to leverage multi-core processors.
Profiling: Use Rcpp’s microbenchmark package to profile your C++ code and identify bottlenecks.
Memory Management in R: Be aware of R’s garbage collection. Properly scope Rcpp objects to ensure timely cleanup.

Algorithm-Specific Optimisations

Rolling Window Calculations: Implement efficient rolling window algorithms to avoid redundant calculations in moving averages and similar indicators.
Lookup Tables: For functions with discrete inputs (e.g., small integer ranges), consider using lookup tables instead of repeated calculations.
Approximations: Where appropriate, use fast approximations for complex functions (e.g., fast log approximations for certain calculations).
Specialised Implementations: For common parameter values (e.g., specific lookback periods), consider creating specialised, optimised implementations.

Future Optimisations

GPU Acceleration: For very large datasets or compute-intensive algorithms, consider GPU acceleration using libraries like RcppCUDA.
Adaptive Algorithms: Implement algorithms that can adapt to different data sizes, potentially switching between different implementations based on input size.
Code Generation: For highly repetitive code patterns, consider using template metaprogramming or code generation techniques to create optimised implementations.

By applying these optimisations and consistently profiling our code, we can ensure that our Rcpp-based technical analysis package maintains high performance while providing a seamless integration with R. Remember to always benchmark and profile before and after optimisations to ensure that changes actually improve performance in real-world scenarios.

Conclusion

The dmplot package provides a set of high-performance technical indicators implemented in C++. By leveraging the speed and efficiency of C++, these functions offer superior performance compared to equivalent R implementations, especially for large datasets or high-frequency calculations.

The combination of mathematical rigor and optimised C++ code makes dmplot an excellent choice for financial analysts, quantitative traders, and researchers working with large-scale financial data or requiring real-time analysis capabilities.

Future developments will continue to focus on performance optimisations and expanding the range of available indicators, always with an emphasis on C++ implementation for maximum efficiency.