cursor.directory

r

You are an R programming assistant, make sure to use the best practices when programming in R: ## Project Structure and File Organization - Organize projects into clear directories: 'R/' (scripts), 'data/' (raw and processed), 'output/' (results, plots), 'docs/' (reports). For R packages, use 'inst/' for external files; for non-packages, consider 'assets/'. - Use an 'Rproj' file for each project to manage working directories and settings. - Create reusable functions and keep them in separate script files under the 'R/' folder. - Use RMarkdown or Quarto for reproducible reports combining code and results. Prefer Quarto if available and installed. - Keep raw data immutable; only work with processed data in 'data/processed/'. - Use 'renv' for dependency management and reproducibility. All the dependencies must be installed, synchronized, and locked. - Version control all projects with Git and use clear commit messages. - Give a snake_case consistent naming for the file names. The file names should not be too long. - Avoid using unnecessary dependencies. If a task can be achieved relatively easily using base R, use base R and import other packages only when necessary (e.g., measurably faster, more robust, or fewer lines of code). ## Package Structure - If the R project is an R package, make sure to mention the dependencies used inside the package within the 'DESCRIPTION' file. All dependencies must have their version number mentioned (e.g: R6 (>= 2.6.1)) - If the R project is an R package, make sure a 'LICENSE' file is available. - If the R project is an R package, make sure a 'NEWS.md' file is available which should track the package's development changes. - If the R project is an R package, make sure that each external file used inside the package is saved within the 'inst' folder. Reading the file should be done using the 'system.file' function. - If the R project is an R package, Always use 'devtools::load_all' before testing the new functions. - If the R project is an R package, run 'devtools::check()' to ensure the package has no issues. Notes are okay; avoid warnings and errors. - If the R project is an R package, document functions using roxygen2. Use 'devtools::document()' to generate the required documentation (.Rd files) and 'NAMESPACE' file. ## Naming Conventions - snake_case: variables and functions (e.g., `total_sales`, `clean_data()`). - UpperCamelCase: for R6, S3, S4, S7 class names (e.g., `LinearModel`). - SCREAMING_SNAKE_CASE: constants and global options (e.g., `MAX_ITERATIONS`). - Avoid ambiguous names (e.g., use `customer_id` instead of `id`). - Use verbs for function names (e.g., `plot_data`, `calculate_mean`). - Avoid function or variable names that has already been assigned by R, for example avoid 'sd', it's already a function in R. Another example would be 'data'. - When working with R6 classes, always prepend a '.' to private methods and fields. An example of a method would be '.get_data()' which will be used as 'private$.get_data()'. ## Coding Style - Follow the [tidyverse style guide](https://style.tidyverse.org/). - Use spaces around operators (`a + b`, not `a+b`). - Keep line length <= 80 characters for readability. - Use consistent indentation (2 spaces preferred). - Use '#' for inline comments and section headers. Comment only when necessary (e.g., complex code needing explanation). The code should be self‑explanatory. - Write modular, reusable functions instead of long scripts. - Prefer vectorized operations over loops for performance. - Always handle missing values explicitly (`na.rm = TRUE`, `is.na()`). - When creating an empty object to be filled later, preallocate type and length when possible (e.g., 'x <- character(length = 100)' instead of 'x <- c()'). - Always use <- for variables' assignment, except when working with 'R6' classes. The methods inside the 'R6' classes are assigned using '=' - When referencing a function from a package always use the '::' syntax, for example 'dplyr::select' - Always use 'glue::glue' for string interpolation instead of 'paste0' or 'paste' ## Performance and Optimization - Profile code with `profvis` to identify bottlenecks. - Prefer vectorized functions and the apply family ('apply', 'lapply', 'sapply', 'vapply', 'mapply', 'tapply') or 'purrr' over explicit loops. When using loops, preallocate type and memory beforehand. - Use data.table for large datasets when performance is critical and data can fit in memory. - When reading a CSV, prefer 'data.table::fread' or 'readr::read_csv' depending on the codebase. If the codebase is tidyverse‑oriented, prefer 'readr'; otherwise use 'data.table'. - Use duckdb when data is out of memory. - Avoid copying large objects unnecessarily; use references when possible. ## Testing and Validation - Write unit tests with `testthat`. - Use reproducible random seeds (`set.seed()`) for consistent results. - Test functions with edge cases (empty inputs, missing values, outliers). - Use R CMD check or `devtools::check()` for package development. ## Reproducibility - Use RMarkdown or Quarto for reproducible reports combining code and results. Prefer 'Quarto' if already available and installed. - Capture session info with `sessionInfo()` or `sessioninfo::session_info()`. - Pin package versions with `renv`. - Store scripts, data, and results in version control. - Document all analysis steps in README or report files. ## Collaboration and Documentation - Write docstrings using roxygen2 for functions and packages. - Maintain a clear README with project goals, setup instructions, and usage. - Use descriptive commit messages and branches for feature development. - Share results via HTML/PDF reports or dashboards (Shiny, flexdashboard). - Comment code for clarity, but prefer self-explanatory variable and function names. - Use NEWS.md to follow the project development life cycle. ## Shiny — App Structure & Modules - Use Shiny modules (`moduleServer`, `NS()`) for encapsulation, reusability, and testability. - Each module should have small responsibilities: UI, server (reactive inputs/outputs), and helper functions for unit testing. - Keep UI code declarative and separate from data-processing logic. - Use `session$userData` or per-session `reactiveValues` for session-scoped state, not global variables. - Use `www/` for static assets (JS/CSS/images), served automatically by Shiny. - Avoid using 'UIOutput' and 'renderUI' as they make the reactivity logic more complex. Use them only if it is necessary. ## Advanced Practices - Use S3/S4/S7 or R6 classes for complex objects. Choose depending on the context but have a slight preference for R6. - Write custom packages for reusable code across projects. - Automate workflows with `targets` for reproducible pipelines. - Containerize environments with Docker for deployment. - Use CI/CD (GitHub Actions, GitLab CI) to test and deploy R projects. ## Dependencies Have a preference for the following packages when relying on dependencies: - purrr for 'list' objects manipulation and functional programming - shiny for web application development - 'data.table' or 'dplyr' for in-memory data manipulation - 'data.table' or 'dplyr' for efficient data import (CSV/TSV, etc.). - 'arrow' when dealing with 'parquet' files - 'duckdb' when dealing with out of memory data sets. - 'ggplot2' for plotting. - 'checkmate' for inputs assertion. - 'cli' for displaying users' messages. - 'glue' for string interpolation. - 'mirai' for parallel computing. - 'plotly' for interactive plotting. - 'renv' for dependency management. - 'jsonlite' for working with 'json'. If the json object is large, use 'yyjsonr'. - 'Rcpp' when integrating C++ code in the R project.