The Journal of Statistical Software was founded by Jan de Leeuw in 1996, the year before the Comprehensive R Archive Network (CRAN) first made R and contributed R packages widely available on the Internet. Within a few years, R came increasingly to dominate contributions to JSS. We trace the continuing development of R and CRAN, and the representation of R and other statistical software in the pages of JSS.
This paper introduces JASP, a free graphical software package for basic statistical procedures such as t tests, ANOVAs, linear regression models, and analyses of contingency tables. JASP is open-source and differentiates itself from existing open-source solutions in two ways. First, JASP provides several innovations in user interface design; specifically, results are provided immediately as the user makes changes to options, output is attractive, minimalist, and designed around the principle of progressive disclosure, and analyses can be peer reviewed without requiring a "syntax". Second, JASP provides some of the recent developments in Bayesian hypothesis testing and Bayesian parameter estimation. The ease with which these relatively complex Bayesian techniques are available in JASP encourages their broader adoption and furthers a more inclusive statistical reporting practice. The JASP analyses are implemented in R and a series of R packages.
Strict requirements of scientific journals allied to the need to prove the experimental data are (in)significant from the statistical standpoint have led to a steep increase in the use and development of statistical software. In this aspect, it is observed that the increasing number of software tools and packages and their wide usage has created a generation of ‘click and go’ users, who are eagerly destined to obtain the -values and multivariate graphs (projection of samples and variables on the factor plane), but have no idea on how the statistical parameters are calculated and the theoretical and practical reasons he/she performed such tests. However, in this paper, some published examples are listed and discussed in detail to provide a holistic insight (positive points and limitations) about the uses and misuses of some statistical methods using different available statistical software. Additionally, a brief description of several commercial and free statistical software is made highlighting their advantages and limitations.
The reliability of several statisitcal software packages was examined using the National Institute of Standards and Technology linear and nonlinear least squares datasets and models. Software tested include Excel 2007, GAMS 23.4, GAUSS 9.0, LIMDEP 8.0, Mathematica 7.0, MATLAB 7.5, R 2.10, SAS 9.1, SHAZAM 10, and Stata 10. While some of these packages have been previously examined, others, including GAMS and MATLAB, have not been extensively examined. Reliability tests indicate improvements in some of the software packages that were previously tested, but some of these packages failed reliability tests under certain conditions. The findings underscore the need to benchmark software packages to ascertain reliability before use and the importance of solving econometric problems using more than one package.
In this paper we review the state space approach to time series analysis and establish the notation that is adopted in this special volume of the Journal of Statistical Software. We first provide some background on the history of state space methods for the analysis of time series. This is followed by a concise overview of linear Gaussian state space analysis including the modelling framework and appropriate estimation methods. We discuss the important class of unobserved component models which incorporate a trend, a seasonal, a cycle, and fixed explanatory and intervention variables for the univariate and multivariate analysis of time series. We continue the discussion by presenting methods for the computation of different estimates for the unobserved state vector: filtering, prediction, and smoothing. Estimation approaches for the other parameters in the model are also considered. Next, we discuss how the estimation procedures can be used for constructing confidence intervals, detecting outlier observations and structural breaks, and testing model assumptions of residual independence, homoscedasticity, and normality. We then show how ARIMA and ARIMA components models fit in the state space framework to time series analysis. We also provide a basic introduction for non-Gaussian state space models. Finally, we present an overview of the software tools currently available for the analysis of time series with state space methods as they are discussed in the other contributions to this special volume.
Background: Estimating the health effects of multi-pollutant mixtures is of increasing interest in environmental epidemiology. Recently, a new approach for estimating the health effects of mixtures, Bayesian kernel machine regression (BKMR), has been developed. This method estimates the multivariable exposure-response function in a flexible and parsimonious way, conducts variable selection on the (potentially high-dimensional) vector of exposures, and allows for a grouped variable selection approach that can accommodate highly correlated exposures. However, the application of this novel method has been limited by a lack of available software, the need to derive interpretable output in a computationally efficient manner, and the inability to apply the method to non-continuous outcome variables. Methods: This paper addresses these limitations by (i) introducing an open-source software package in the R programming language, the bkmr R package, (ii) demonstrating methods for visualizing high-dimensional exposure-response functions, and for estimating scientifically relevant summaries, (iii) illustrating a probit regression implementation of BKMR for binary outcomes, and (iv) describing a fast version of BKMR that utilizes a Gaussian predictive process approach. All of the methods are illustrated using fully reproducible examples with the provided R code. Results: Applying the methods to a continuous outcome example illustrated the ability of the BKMR implementation to estimate the health effects of multi-pollutant mixtures in the context of a highly nonlinear, biologically-based dose-response function, and to estimate overall, single-exposure, and interactive health effects. The Gaussian predictive process method led to a substantial reduction in the runtime, without a major decrease in accuracy. In the setting of a larger number of exposures and a dichotomous outcome, the probit BKMR implementation was able to correctly identify the variables included in the exposure-response function and yielded interpretable quantities on the scale of a latent continuous outcome or on the scale of the outcome probability. Conclusions: This newly developed software, integrated suite of tools, and extended methodology makes BKMR accessible for use across a broad range of epidemiological applications in which multiple risk factors have complex effects on health.