Important empirical information on household behavior and finances is obtained from surveys, and these data are used heavily by researchers and central banks, and for policy consulting. However, various interdependent factors that can be controlled only to a limited extent lead to unit and item nonresponse, and missing data on certain items are a frequent source of difficulties in statistical practice. More than ever, it is important to explore techniques for the imputation of large survey data sets. This paper presents the theoretical underpinnings of a Markov chain Monte Carlo (MCMC) multiple imputation procedure and outlines important technical aspects of applying MCMC-type algorithms to large socio-economic data sets. In an illustrative application, it is found that MCMC algorithms have good convergence properties even on large data sets with complex patterns of missingness, and that the use of a rich set of covariates in the imputation models has a substantial effect on the distributions of key financial variables.
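As a minimal sketch of the data-augmentation idea behind MCMC imputation (not the paper's actual procedure; variable names, the bivariate normal model, and the missingness mechanism are all illustrative assumptions), the following Python chain alternates a P-step that draws regression parameters from their posterior with an I-step that draws the missing values:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data: y2 depends on y1; ~30% of y2 is missing at random
n = 500
y1 = rng.normal(0.0, 1.0, n)
y2 = 1.0 + 2.0 * y1 + rng.normal(0.0, 1.0, n)
miss = rng.random(n) < 0.3
y2_obs = y2.copy()
y2_obs[miss] = np.nan

def mi_chain(y1, y2_obs, n_iter=200, rng=rng):
    """One data-augmentation chain: alternate a P-step (draw regression
    parameters from their posterior given the completed data) and an
    I-step (draw missing y2 from the conditional imputation model)."""
    y2_cur = np.where(np.isnan(y2_obs), np.nanmean(y2_obs), y2_obs)
    X = np.column_stack([np.ones_like(y1), y1])
    m = np.isnan(y2_obs)
    for _ in range(n_iter):
        # P-step: Bayesian linear regression with a flat prior
        XtX_inv = np.linalg.inv(X.T @ X)
        beta_hat = XtX_inv @ X.T @ y2_cur
        resid = y2_cur - X @ beta_hat
        sigma2 = resid @ resid / rng.chisquare(len(y1) - 2)
        beta = rng.multivariate_normal(beta_hat, sigma2 * XtX_inv)
        # I-step: draw missing values from the conditional distribution
        y2_cur[m] = X[m] @ beta + rng.normal(0.0, np.sqrt(sigma2), m.sum())
    return y2_cur

completed = mi_chain(y1, y2_obs)
```

Running several such chains from different starting values and pooling the completed data sets with Rubin's rules yields the multiple imputation estimates; the rich-covariate point in the abstract corresponds to widening the design matrix `X`.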

In this work, we consider a hierarchical spatio-temporal model for particulate matter (PM) concentration in the North Italian region of Piemonte. The model involves a Gaussian field (GF), affected by a measurement error, and a state process characterized by a first-order autoregressive dynamic with spatially correlated innovations. This kind of model is well discussed and widely used in the air quality literature thanks to its flexibility in modelling the effect of relevant covariates (i.e. meteorological and geographical variables) as well as time and space dependence. However, Bayesian inference through Markov chain Monte Carlo (MCMC) techniques can be a challenge due to convergence problems and heavy computational loads. In particular, the computational issue refers to the infeasibility of linear algebra operations involving the large, dense covariance matrices that arise with large spatio-temporal datasets. The main goal of this work is to present an effective estimation and spatial prediction strategy for the considered spatio-temporal model. The proposal consists of representing a GF with Matérn covariance function as a Gaussian Markov random field (GMRF) through the stochastic partial differential equation (SPDE) approach. The main advantage of moving from a GF to a GMRF stems from the good computational properties that the latter enjoys. In fact, GMRFs are defined by sparse matrices that allow for computationally effective numerical methods. Moreover, when dealing with Bayesian inference for GMRFs, it is possible to adopt the integrated nested Laplace approximation (INLA) algorithm as an alternative to MCMC methods, giving rise to additional computational advantages. The implementation of the SPDE approach through the R library INLA (www.r-inla.org) is illustrated with reference to the Piemonte PM data.
In particular, by providing step-by-step R code, we show how easy it is to obtain prediction and exceedance-probability maps in a reasonable computing time.
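The paper's own implementation is in R-INLA; the following self-contained Python sketch (an illustration of the sparsity point only, with an AR(1)-type GMRF standing in for the SPDE-based Matérn field) shows why a sparse precision matrix makes the linear algebra feasible where a dense covariance would not be:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Precision matrix of a stationary first-order autoregressive GMRF on
# n sites: tridiagonal, hence sparse, so solves cost O(n) rather than
# the O(n^3) of a dense covariance factorization.
n = 2000
rho, tau = 0.9, 1.0
main = tau * np.full(n, 1.0 + rho**2)
main[0] = main[-1] = tau  # boundary entries of the stationary AR(1) precision
off = tau * np.full(n - 1, -rho)
Q = sp.diags([off, main, off], [-1, 0, 1], format="csc")

# Posterior mean of the latent field given noisy observations y with
# noise variance s2 and a zero prior mean:
#   x | y ~ N((Q + I/s2)^{-1} y / s2, (Q + I/s2)^{-1})
rng = np.random.default_rng(1)
y = rng.normal(size=n)
s2 = 0.5
post_prec = Q + sp.eye(n, format="csc") / s2
x_mean = spla.spsolve(post_prec, y / s2)
```

The same pattern scales to the two-dimensional SPDE mesh: the precision stays sparse, so kriging-type predictions reduce to sparse solves.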

The analysis of time series of counts is an emerging field of science. To obtain an ARMA-like autocorrelation structure, many models make use of thinning operations to adapt the ARMA recursion to the integer-valued case. The most popular among these probabilistic operations is the concept of binomial thinning, leading to the class of INARMA models. These models have proved useful, especially for processes of Poisson counts, but may lead to difficulties in the case of other count distributions. Therefore, several alternative thinning concepts have been developed. This article reviews such thinning operations and shows how they can be successfully applied to define integer-valued ARMA models.
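The binomial-thinning recursion is simple enough to state in a few lines. The following Python sketch (parameter values are illustrative) simulates a Poisson INAR(1) process, whose stationary marginal is Poisson with mean λ/(1 − α) and whose lag-1 autocorrelation equals the thinning probability α:

```python
import numpy as np

rng = np.random.default_rng(7)

def inar1(n, alpha, lam, rng):
    """Simulate an INAR(1) process X_t = alpha ∘ X_{t-1} + eps_t, where
    '∘' is binomial thinning (each of the X_{t-1} counts survives
    independently with probability alpha) and eps_t ~ Poisson(lam)."""
    x = np.empty(n, dtype=int)
    x[0] = rng.poisson(lam / (1.0 - alpha))  # draw from the stationary marginal
    for t in range(1, n):
        survivors = rng.binomial(x[t - 1], alpha)  # binomial thinning
        x[t] = survivors + rng.poisson(lam)
    return x

x = inar1(20_000, alpha=0.5, lam=2.0, rng=rng)
# Stationary marginal: Poisson(lam / (1 - alpha)) = Poisson(4);
# lag-1 autocorrelation: alpha = 0.5
```

The alternative thinning operators reviewed in the article replace the binomial survival step with other random operations while preserving the same ARMA-like recursion.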

In this study, we investigate and explain the level and change of six elements of group-focused enmity (GFE; see Zick et al. in J. Soc. Issues 64(2):363–383, 2008) in Germany between 2002 and 2006: racism, xenophobia, anti-Semitism, homophobia, exclusion of homeless people and support for rights of the established. For the data analysis, a representative 4-year panel study of the adult non-immigrant German population collected during the years 2002–2006 is used, and the development of each GFE component is tested using an unconditional second-order latent growth curve model (LGM) estimated with full information maximum likelihood (FIML). We find that the level of five of the six components (racism, xenophobia, anti-Semitism, homophobia, exclusion of homeless people) displays an increase at the beginning of the observed period followed by a decrease. However, the sixth aspect, rights of the established, displays a continuous linear increase over time. This differing developmental pattern stands in contrast to Allport's (The nature of prejudice. Perseus Books, Cambridge, 1954) hypothesis of a strong link between the components and their development over time. We try to explain the differing pattern by means of several sociodemographic characteristics, using a conditional second-order latent growth curve model.

Composite marginal likelihoods are pseudolikelihoods constructed by compounding marginal densities. In several applications, they are convenient surrogates for the ordinary likelihood when it is too cumbersome or impractical to compute. This paper presents an overview of the topic with emphasis on applications.
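A concrete instance of the compounding idea (illustrative only; the overview covers far more general constructions): for an exchangeable multivariate normal, the full likelihood requires a d × d covariance, whereas the pairwise composite likelihood sums bivariate log-densities over all pairs and still identifies the common correlation. A Python sketch, with synthetic data and the constant term of the bivariate density dropped:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(3)

# n draws of a d-variate normal with unit variances and exchangeable
# correlation rho_true
d, n, rho_true = 8, 1500, 0.4
Sigma = (1 - rho_true) * np.eye(d) + rho_true * np.ones((d, d))
X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)

def neg_pairwise_loglik(rho, X):
    """Negative pairwise log-likelihood (up to an additive constant):
    sum of bivariate normal log-densities over all pairs (j, k), j < k."""
    total = 0.0
    det = 1.0 - rho**2
    for j in range(X.shape[1]):
        for k in range(j + 1, X.shape[1]):
            u, v = X[:, j], X[:, k]
            q = (u**2 - 2 * rho * u * v + v**2) / det
            total += np.sum(-0.5 * q - 0.5 * np.log(det))
    return -total

res = minimize_scalar(neg_pairwise_loglik, bounds=(-0.1, 0.9),
                      args=(X,), method="bounded")
rho_hat = res.x
```

The maximum composite likelihood estimator is consistent here even though no d-dimensional density is ever evaluated, which is the practical appeal the abstract refers to.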

On the one hand, kernel density estimation has become a common tool for empirical studies in virtually any research area, a development that goes hand in hand with the fact that this kind of estimator is now provided by many software packages. On the other hand, the discussion on bandwidth selection has been going on for about three decades. Although a good part of that discussion concerns nonparametric regression, this parameter choice is by no means less problematic for density estimation. This becomes obvious when reading empirical studies in which practitioners have made use of kernel densities. New contributions typically provide simulations only to show that their own selector outperforms some of the existing methods. We review existing methods and compare them on a set of designs that exhibit few bumps and exponentially falling tails. We concentrate on small and moderate sample sizes because, for large ones, the differences between consistent methods are often negligible, at least for practitioners. As a byproduct, we find that a mixture of simple plug-in and cross-validation methods produces bandwidths with a quite stable performance.
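The two families the abstract combines can be sketched side by side (a hedged illustration with a Gaussian kernel and synthetic normal data, not the paper's simulation designs): the least-squares cross-validation criterion has a closed form for the Gaussian kernel, and Silverman's rule of thumb is the simplest plug-in:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

rng = np.random.default_rng(5)
x = rng.normal(0.0, 1.0, 300)   # a simple unimodal design
n = len(x)
diff = x[:, None] - x[None, :]  # all pairwise differences

def lscv(h):
    """Least-squares cross-validation criterion for a Gaussian kernel,
    using the closed form: integral of the squared estimate minus twice
    the mean of the leave-one-out estimates at the data points."""
    term1 = norm.pdf(diff, scale=np.sqrt(2) * h).sum() / n**2
    off_diag = norm.pdf(diff, scale=h).sum() - n * norm.pdf(0, scale=h)
    term2 = 2.0 * off_diag / (n * (n - 1))
    return term1 - term2

h_cv = minimize_scalar(lscv, bounds=(0.05, 2.0), method="bounded").x

# Silverman's plug-in rule of thumb for comparison
spread = min(x.std(ddof=1),
             (np.quantile(x, 0.75) - np.quantile(x, 0.25)) / 1.34)
h_rot = 1.06 * spread * n**(-1 / 5)
```

Cross-validated bandwidths are more variable from sample to sample than plug-in ones, which is why mixtures of the two can stabilize performance.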

With the influx of complex and detailed tracking data gathered from electronic tracking devices, the analysis of animal movement data has recently emerged as a cottage industry among biostatisticians. New approaches of ever greater complexity continue to be added to the literature. In this paper, we review what we believe to be some of the most popular and most useful classes of statistical models used to analyse individual animal movement data. Specifically, we consider discrete-time hidden Markov models, more general state-space models and diffusion processes. We argue that these models should be core components in the toolbox for quantitative researchers working on stochastic modelling of individual animal movement. The paper concludes by offering some general observations on the direction of statistical analysis of animal movement. There is a trend in movement ecology towards what are arguably overly complex modelling approaches which are inaccessible to ecologists, unwieldy with large data sets or not based on mainstream statistical practice. Additionally, some analysis methods developed within the ecological community ignore fundamental properties of movement data, potentially leading to misleading conclusions about animal movement. Corresponding approaches, e.g. based on Lévy walk-type models, continue to be popular despite having been largely discredited. We contend that there is a need for an appropriate balance between the extremes of either being overly complex or being overly simplistic, whereby the discipline relies on models of intermediate complexity that are usable by general ecologists, but grounded in well-developed statistical practice and efficient to fit to large data sets.
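As a hedged sketch of the first model class mentioned (a toy two-state movement HMM with gamma step-length distributions; all parameter values and the "encamped"/"exploring" labels are illustrative assumptions, not taken from the paper), the following Python code simulates steps and evaluates the likelihood with the scaled forward algorithm:

```python
import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(11)

# Simulate steps from a 2-state HMM: state 0 = "encamped" (short steps),
# state 1 = "exploring" (long steps); gamma state-dependent distributions.
Gamma_tp = np.array([[0.9, 0.1], [0.2, 0.8]])  # transition probabilities
shapes, scales = np.array([1.0, 3.0]), np.array([0.5, 2.0])
T = 1000
states = np.empty(T, dtype=int)
states[0] = 0
for t in range(1, T):
    states[t] = rng.choice(2, p=Gamma_tp[states[t - 1]])
steps = gamma.rvs(shapes[states], scale=scales[states], random_state=rng)

def hmm_loglik(steps, Gamma_tp, shapes, scales):
    """Forward algorithm with scaling: log-likelihood of the step
    lengths with the hidden state sequence summed out."""
    dens = gamma.pdf(steps[:, None], shapes[None, :], scale=scales[None, :])
    # stationary distribution of the chain as the initial distribution
    evals, evecs = np.linalg.eig(Gamma_tp.T)
    delta = np.real(evecs[:, np.argmax(np.real(evals))])
    delta = delta / delta.sum()
    alpha = delta * dens[0]
    ll = 0.0
    for t in range(len(steps)):
        if t > 0:
            alpha = (alpha @ Gamma_tp) * dens[t]
        c = alpha.sum()
        ll += np.log(c)
        alpha = alpha / c
    return ll

ll_true = hmm_loglik(steps, Gamma_tp, shapes, scales)
```

Handing this likelihood to a numerical optimizer gives maximum likelihood fits that run in seconds even on long tracks, which is the "intermediate complexity, efficient to fit" territory the paper advocates.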

Auxiliary information $\mathbf{x}$ is commonly used in survey sampling at the estimation stage. We propose an estimator of the finite population distribution function $F_{y}(t)$ when $\mathbf{x}$ is available for all units in the population and is related to the study variable $y$ by a superpopulation model. The new estimator integrates ideas from model calibration and penalized calibration. Calibration estimates of $F_{y}(t)$, with weights satisfying the benchmark constraints $\hat{F}_{\hat{y}}=F_{\hat{y}}$ on the fitted-values distribution function at a set of fixed values of $t$, can be found in the literature. Alternatively, our proposal $\hat{F}_{y\omega}$ seeks an estimator that takes into account a global distance $D(\hat{F}_{\hat{y}\omega},F_{\hat{y}})$ between $\hat{F}_{\hat{y}\omega}$ and $F_{\hat{y}}$, and a penalty parameter $\alpha$ that assesses the importance of this term in the objective function. The weights are obtained explicitly for the $L^2$ distance, and conditions are given under which $\hat{F}_{y\omega}$ is a distribution function. In this case $\hat{F}_{y\omega}$ can also be used to estimate the population quantiles. Moreover, results on the asymptotic unbiasedness and the asymptotic variance of $\hat{F}_{y\omega}$, for fixed $\alpha$, are obtained. The results of a simulation study, designed to compare the proposed estimator to existing ones, reveal that its performance is quite competitive.
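For readers unfamiliar with the calibration machinery the proposal builds on, the following Python sketch shows the basic (unpenalized) chi-square-distance calibration weights and the resulting weighted distribution function estimator on a synthetic population; the paper's penalized model-calibration estimator is more elaborate, and everything here is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(9)

# Finite population with an auxiliary variable x linearly related to y
N = 10_000
x = rng.uniform(1.0, 5.0, N)
y = 2.0 * x + rng.normal(0.0, 1.0, N)

# Simple random sample without replacement; design weights N/n
n = 500
s = rng.choice(N, size=n, replace=False)
d = np.full(n, N / n)

# Chi-square-distance calibration: adjust d so the weighted totals of
# (1, x) match the known population totals (closed-form GREG weights)
z = np.column_stack([np.ones(n), x[s]])
t_z = np.array([N, x.sum()])
lam = np.linalg.solve(z.T @ (d[:, None] * z), t_z - d @ z)
w = d * (1.0 + z @ lam)

def F_hat(t, w, y_s):
    """Calibration-weighted estimator of the population distribution
    function F_y(t) = N^{-1} * #{k in U : y_k <= t}."""
    return np.sum(w * (y_s <= t)) / np.sum(w)

est = F_hat(np.median(y), w, y[s])  # estimates F_y at the true median
```

The proposal in the abstract replaces the pointwise benchmark constraints by a global distance term weighted by $\alpha$, which these closed-form weights do not include.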

In this paper, a multivariate partially linear model with measurement error in the explanatory variable of the nonparametric part, where the response variable is $m$-dimensional, is considered. By a modification of the local-likelihood method, an estimator of the parametric part is derived. Moreover, the asymptotic normality of the generalized least squares estimator of the parametric component is investigated when the error distribution function is either ordinarily smooth or super smooth. Applications to Engel curves are discussed, and the performance of $\hat{\beta}_{n}$ is investigated through Monte Carlo experiments.

In science and engineering, we are often interested in learning about the lifetime characteristics of a system as well as those of the components that make up the system. However, in many cases the system lifetimes can be observed but not the component lifetimes, and we may also lack knowledge of the structure of the system. Statistical procedures for estimating the parameters of the component lifetime distribution and for identifying the system structure based on system-level lifetime data are developed here using the expectation–maximization (EM) algorithm. Different implementations of the EM algorithm based on system-level or component-level likelihood functions are proposed. A special case in which the system is known to be coherent but of unknown structure is also considered. The methodologies are then illustrated by assuming the component lifetimes follow a two-parameter Weibull distribution. A numerical example and a Monte Carlo simulation study are used to evaluate the performance and related merits of the proposed implementations of the EM algorithm. Lognormally distributed component lifetimes and a real data example are used to illustrate how the methodologies can be applied to other lifetime models in addition to the Weibull model. Finally, some recommendations along with concluding remarks are provided.
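A stripped-down instance of the EM idea (my own simplification, not the paper's Weibull implementation): for a parallel system of m i.i.d. exponential components with known structure, only the maximum is observed; conditionally on the maximum t, the maximizing component equals t and the remaining m − 1 are i.i.d. truncated exponentials on (0, t), which gives a closed-form E-step:

```python
import numpy as np

rng = np.random.default_rng(2)

# Parallel system of m i.i.d. exponential components: only the system
# lifetime (the maximum) is observed, never the component lifetimes.
m, lam_true, n = 3, 1.5, 3000
comp = rng.exponential(1.0 / lam_true, size=(n, m))
t_sys = comp.max(axis=1)

def em_exponential_parallel(t_sys, m, lam0=1.0, n_iter=200):
    """EM for the component rate lambda.
    E-step: expected total component lifetime per system given the
    observed maximum t, using E[X | X < t] = 1/lam - t e^{-lam t}/(1 - e^{-lam t}).
    M-step: complete-data MLE, n*m divided by the expected total lifetime."""
    lam = lam0
    for _ in range(n_iter):
        trunc_mean = (1.0 / lam
                      - t_sys * np.exp(-lam * t_sys) / (1.0 - np.exp(-lam * t_sys)))
        total = np.sum(t_sys + (m - 1) * trunc_mean)
        lam = len(t_sys) * m / total
    return lam

lam_hat = em_exponential_parallel(t_sys, m)
```

The paper's implementations generalize this template to Weibull and lognormal component models and to the harder case where the system structure itself must be identified.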

It is known that the shapes of planar triangles can be represented by a set of points on the surface of the unit sphere. On the other hand, most objects can easily be triangulated, and each triangle can accordingly be treated in the context of shape analysis. There is growing interest in fitting a smooth path through a cloud of shape data observed at a set of time instances. To tackle this problem, we propose a longitudinal model for shape data through a triangulation procedure. In fact, our strategy initially relies on a spherical regression model for triangles, but is extended to shape data via triangulation. To model the directional data, we use the bivariate von Mises–Fisher distribution for the density of the errors. Various forms of the composite likelihood function, constructed by altering the assumptions on the angles defined for each triangle, are invoked. The proposed regression model is applied to rat skull data, and simulation results are presented along with the real-data results.
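As a hedged one-dimensional analogue of the directional-error modelling above (the paper works with the bivariate von Mises–Fisher distribution; here a plain circular von Mises with illustrative parameters), the mean direction is estimated by the direction of the resultant vector:

```python
import numpy as np
from scipy.stats import vonmises

rng = np.random.default_rng(4)

# Circular "errors" concentrated around a true direction mu_true
mu_true, kappa = 1.0, 5.0
theta = vonmises.rvs(kappa, loc=mu_true, size=2000, random_state=rng)

# MLE of the mean direction: direction of the resultant vector
C, S = np.cos(theta).mean(), np.sin(theta).mean()
mu_hat = np.arctan2(S, C)
Rbar = np.hypot(C, S)  # mean resultant length; grows with concentration kappa
```

In the regression setting, `mu_true` would be replaced by a fitted smooth path on the sphere evaluated at each time instance, and the angles of each triangle would contribute one factor to the composite likelihood.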

In this paper, we consider the single-index measurement error model with mismeasured covariates in the nonparametric part. To solve the problem, we develop a simulation-extrapolation (SIMEX) algorithm based on the local linear smoother and an estimating equation. The proposed SIMEX estimation does not require assuming a distribution for the unobserved covariate. We transform the boundary of the unit ball in $\mathbb{R}^p$ to the interior of the unit ball in $\mathbb{R}^{p-1}$ by using the constraint $\Vert\beta\Vert=1$. The proposed SIMEX estimator of the index parameter is shown to be asymptotically normal under some regularity conditions. We also derive the asymptotic bias and variance of the estimator of the unknown link function. Finally, the performance of the proposed method is examined in simulation studies and illustrated with a real data example.
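The SIMEX mechanism itself is easy to demonstrate in the simplest possible setting (a scalar linear model rather than the paper's single-index model; all data-generating values are illustrative assumptions): add extra measurement noise at several levels λ, track how the estimate degrades, and extrapolate the trend back to λ = −1:

```python
import numpy as np

rng = np.random.default_rng(8)

# True model y = beta*x + e, but we observe w = x + u with known
# measurement-error standard deviation; the naive slope is attenuated.
n, beta, sigma_u = 2000, 2.0, 0.5
x = rng.normal(0.0, 1.0, n)
w = x + rng.normal(0.0, sigma_u, n)
y = beta * x + rng.normal(0.0, 0.5, n)

def slope(w, y):
    c = np.cov(w, y)
    return c[0, 1] / c[0, 0]

# SIMEX: add extra noise with variance lam*sigma_u^2, average the
# resulting slopes over replicates, and extrapolate the trend to lam = -1
lams = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
B = 50  # simulation replicates per lambda
mean_slopes = np.array([
    np.mean([slope(w + np.sqrt(l) * sigma_u * rng.normal(size=n), y)
             for _ in range(B)])
    for l in lams
])
coef = np.polyfit(lams, mean_slopes, 2)   # quadratic extrapolant
beta_simex = np.polyval(coef, -1.0)
beta_naive = mean_slopes[0]
```

In the paper, the quantity tracked across λ is the solution of the estimating equation for the index parameter rather than an ordinary least squares slope, but the extrapolation step is the same.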

Frailty models allow us to take into account the non-observable inhomogeneity of individual hazard functions. Although models with time-independent frailty have been studied intensively over the last decades and have found a wide range of applications in survival analysis, studies based on models with time-dependent frailty remain relatively rare. In this paper, we formulate and prove two propositions related to the identifiability of bivariate survival models with frailty given by a nonnegative bivariate Lévy process. We discuss parametric and semiparametric procedures for estimating the unknown parameters and baseline hazard functions. Numerical experiments with simulated and real data illustrate these procedures. The statements of the propositions can easily be extended to the multivariate case.
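To fix ideas for the time-independent case the abstract contrasts with (a standard textbook construction, not the paper's Lévy-process model; parameter values are illustrative), a shared gamma frailty makes two exponential lifetimes dependent, with Kendall's tau equal to θ/(θ + 2):

```python
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(6)

# Shared (time-independent) gamma frailty: given Z ~ Gamma(1/theta, theta)
# (mean 1, variance theta), the two lifetimes are conditionally
# independent exponentials with rate Z; integrating Z out induces
# dependence with Kendall's tau = theta / (theta + 2).
theta, n = 2.0, 4000
Z = rng.gamma(shape=1.0 / theta, scale=theta, size=n)
T1 = rng.exponential(1.0 / Z)
T2 = rng.exponential(1.0 / Z)

tau_hat, _ = kendalltau(T1, T2)
# theta = 2 should give tau close to 0.5
```

Replacing the single random variable Z by a nonnegative Lévy process makes the frailty time-dependent, which is the setting whose identifiability the paper's propositions address.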