phenix.refine is a program within the PHENIX package that supports crystallographic structure refinement against experimental data with a wide range of upper resolution limits using a large repertoire of model parameterizations. It has several automation features and is also highly flexible. Several hundred parameters enable extensive customizations for complex use cases. Multiple user‐defined refinement strategies can be applied to specific parts of the model in a single refinement run. An intuitive graphical user interface is available to guide novice users and to assist advanced users in managing refinement projects. X‐ray or neutron diffraction data can be used separately or jointly in refinement. phenix.refine is tightly integrated into the PHENIX suite, where it serves as a critical component in automated model building, final structure refinement, structure validation and deposition to the wwPDB. This paper presents an overview of the major phenix.refine features, with extensive literature references for readers interested in more detailed discussions of the methods.
The CCP4 (Collaborative Computational Project, Number 4) software suite is a collection of programs and associated data and software libraries which can be used for macromolecular structure determination by X‐ray crystallography. The suite is designed to be flexible, allowing users a number of methods of achieving their aims. The programs are from a wide variety of sources but are connected by a common infrastructure provided by standard file formats, data objects and graphical interfaces. Structure solution by macromolecular crystallography is becoming increasingly automated and the CCP4 suite includes several automation pipelines. After giving a brief description of the evolution of CCP4 over the last 30 years, an overview of the current suite is given. While detailed descriptions are given in the accompanying articles, here it is shown how the individual programs contribute to a complete software package.
This paper describes various components of the macromolecular crystallographic refinement program REFMAC5, which is distributed as part of the CCP4 suite. REFMAC5 utilizes different likelihood functions depending on the diffraction data employed (amplitudes or intensities), the presence of twinning and the availability of SAD/SIRAS experimental diffraction data. To ensure chemical and structural integrity of the refined model, REFMAC5 offers several classes of restraints and choices of model parameterization. Reliable models at resolutions at least as low as 4 Å can be achieved thanks to low‐resolution refinement tools such as secondary‐structure restraints, restraints to known homologous structures, automatic global and local NCS restraints, `jelly‐body' restraints and the use of novel long‐range restraints on atomic displacement parameters (ADPs) based on the Kullback–Leibler divergence. REFMAC5 additionally offers TLS parameterization and, when high‐resolution data are available, fast refinement of anisotropic ADPs. Refinement in the presence of twinning is performed in a fully automated fashion. REFMAC5 is a flexible and highly optimized refinement package that is ideally suited for refinement across the entire resolution spectrum encountered in macromolecular crystallography.
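The abstract does not reproduce the functional form of these ADP restraints. As a point of reference only, the Kullback–Leibler divergence between two k-dimensional Gaussian densities N(\mu_0, \Sigma_0) and N(\mu_1, \Sigma_1) — the quantity on which such restraints would plausibly be built, with the Gaussian displacement distributions implied by the ADPs of a pair of atoms playing the roles of the two covariance matrices — has the standard form

\[
D_{\mathrm{KL}}\!\left(N_0 \,\|\, N_1\right)
= \tfrac{1}{2}\left[\operatorname{tr}\!\left(\Sigma_1^{-1}\Sigma_0\right)
+ (\mu_1-\mu_0)^{\mathsf T}\Sigma_1^{-1}(\mu_1-\mu_0)
- k
+ \ln\frac{\det\Sigma_1}{\det\Sigma_0}\right],
\]

with k = 3 for atomic displacement distributions and the mean-offset term vanishing when only the (zero-mean) displacement distributions are compared. How REFMAC5 symmetrizes and weights this quantity between atom pairs is detailed in the paper itself, not here.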
Macromolecular X‐ray crystallography is routinely applied to understand biological processes at a molecular level. However, significant time and effort are still required to solve and complete many of these structures because of the need for manual interpretation of complex numerical data using many software packages and the repeated use of interactive three‐dimensional graphics. PHENIX has been developed to provide a comprehensive system for macromolecular crystallographic structure solution with an emphasis on the automation of all procedures. This has relied on the development of algorithms that minimize or eliminate subjective input, the development of algorithms that automate procedures that are traditionally performed by hand and, finally, the development of a framework that allows a tight integration between the algorithms.
The recent rapid development of single‐particle electron cryo‐microscopy (cryo‐EM) now allows structures to be solved by this method at resolutions close to 3 Å. Here, a number of tools to facilitate the interpretation of EM reconstructions with stereochemically reasonable all‐atom models are described. The BALBES database has been repurposed as a tool for identifying protein folds from density maps. Modifications to Coot, including new Jiggle Fit and morphing tools and improved handling of nucleic acids, enhance its functionality for interpreting EM maps. REFMAC has been modified for optimal fitting of atomic models into EM maps. As external structural information can enhance the reliability of the derived atomic models, stabilize refinement and reduce overfitting, ProSMART has been extended to generate interatomic distance restraints from nucleic acid reference structures, and a new tool, LIBG, has been developed to generate nucleic acid base‐pair and parallel‐plane restraints. Furthermore, restraint generation has been integrated with visualization and editing in Coot, and these restraints have been applied to both real‐space refinement in Coot and reciprocal‐space refinement in REFMAC.
Maximum‐likelihood X‐ray macromolecular structure refinement in BUSTER has been extended with restraints facilitating the exploitation of structural similarity. The similarity can be between two or more chains within the structure being refined, thus favouring NCS, or to a distinct `target' structure that remains fixed during refinement. The local structural similarity restraints (LSSR) approach restrains all distances of less than 5.5 Å between pairs of atoms in the chain. For each such distance, the difference from the corresponding distance in the related chain is found, and LSSR applies a restraint penalty to that difference. A functional form that reaches a plateau for large differences is used to avoid the restraints distorting parts of the structure that are not similar. Because LSSR are local, there is no need to separate out domains. Some restraint pruning is still necessary, but this has been automated. LSSR have been available to academic users of BUSTER since 2009 with the easy‐to‐use ‐autoncs and ‐target target.pdb options. The use of LSSR is illustrated in the re‐refinement of PDB entries 5rnt, where ‐target enables the correct ligand‐binding structure to be found, and 1osg, where ‐autoncs contributes to the location of an additional copy of the cyclic peptide ligand.
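To make the description above concrete, the following minimal Python sketch implements a single restraint term with the stated properties: it is active only for atom pairs closer than 5.5 Å in the reference chain, it penalizes the difference from the corresponding distance, and it saturates for large differences. The specific saturating (Geman–McClure‐like) form and the sigma value are illustrative assumptions; the abstract states only that the penalty reaches a plateau.

    def lssr_penalty(d_work, d_ref, sigma=0.3, cutoff=5.5):
        # d_work: interatomic distance (Å) in the chain being refined
        # d_ref:  distance between the corresponding atom pair in the related chain
        # sigma:  assumed tolerance scale in Å (hypothetical value)
        # cutoff: only pairs closer than 5.5 Å are restrained (from the abstract)
        if d_ref >= cutoff:
            return 0.0                            # pair is not restrained
        delta = d_work - d_ref                    # difference from the related chain
        # Saturating penalty: behaves like delta^2 for small differences but
        # plateaus at 1.0 for large ones, so dissimilar regions are not distorted.
        return delta**2 / (sigma**2 + delta**2)

In an actual refinement the total LSSR contribution would be a weighted sum of such terms over all qualifying atom pairs, with automated pruning of pairs that turn out to be genuinely dissimilar.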
Following integration of the observed diffraction spots, the process of `data reduction' initially aims to determine the point‐group symmetry of the data and the likely space group. This can be performed with the program POINTLESS. The scaling program then puts all the measurements on a common scale, averages measurements of symmetry‐related reflections (using the symmetry determined previously) and produces many statistics that provide the first important measures of data quality. A new scaling program, AIMLESS, implements scaling models similar to those in SCALA but adds some additional analyses. From the analyses, a number of decisions can be made about the quality of the data and whether some measurements should be discarded. The effective `resolution' of a data set is a difficult and possibly contentious question (particularly with referees of papers) and this is discussed in the light of tests comparing the data‐processing statistics with trials of refinement against observed and simulated data, and automated model‐building and comparison of maps calculated with different resolution limits. These trials show that adding weak high‐resolution data beyond the commonly used limits may make some improvement and does no harm.
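The abstract refers to merging statistics without restating them; for orientation, two of the conventional indicators produced at this stage are (standard definitions, not specific to AIMLESS)

\[
R_{\mathrm{merge}} = \frac{\sum_{hkl}\sum_{i}\bigl|I_i(hkl)-\overline{I(hkl)}\bigr|}{\sum_{hkl}\sum_{i} I_i(hkl)},
\qquad
R_{\mathrm{meas}} = \frac{\sum_{hkl}\sqrt{\tfrac{n_{hkl}}{n_{hkl}-1}}\;\sum_{i}\bigl|I_i(hkl)-\overline{I(hkl)}\bigr|}{\sum_{hkl}\sum_{i} I_i(hkl)},
\]

where I_i(hkl) is the ith measurement of reflection hkl, \overline{I(hkl)} the mean of its symmetry‐related measurements and n_{hkl} its multiplicity; R_meas is the multiplicity‐corrected form.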
In macromolecular X‐ray crystallography, typical data sets have substantial multiplicity. This can be used to calculate the consistency of repeated measurements and thereby assess data quality. Recently, the properties of a correlation coefficient, CC1/2, that can be used for this purpose were characterized and it was shown that CC1/2 has superior properties compared with `merging' R values. A derived quantity, CC*, links data and model quality. Using experimental data sets, the behaviour of CC1/2 and the more conventional indicators were compared in two situations of practical importance: merging data sets from different crystals and selectively rejecting weak observations or (merged) unique reflections from a data set. In these situations controlled `paired‐refinement' tests show that even though discarding the weaker data leads to improvements in the merging R values, the refined models based on these data are of lower quality. These results show the folly of such data‐filtering practices aimed at improving the merging R values. Interestingly, in all of these tests CC1/2 is the one data‐quality indicator for which the behaviour accurately reflects which of the alternative data‐handling strategies results in the best‐quality refined model. Its properties in the presence of systematic error are documented and discussed.
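For readers unfamiliar with these indicators: CC1/2 is the Pearson correlation coefficient between the averaged intensities of two randomly chosen half‐data sets, and the derived quantity CC* estimates the correlation of the full merged data with the (unmeasurable) true signal via

\[
\mathrm{CC}^{*} = \sqrt{\frac{2\,\mathrm{CC}_{1/2}}{1+\mathrm{CC}_{1/2}}},
\]

which is what allows data quality (CC1/2, CC*) and model quality (CCwork, CCfree computed against the model) to be compared on the same scale.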
Recent advances in synchrotron sources, beamline optics and detectors are driving a renaissance in room‐temperature data collection. The underlying impetus is the recognition that conformational differences are observed in functionally important regions of structures determined using crystals kept at ambient as opposed to cryogenic temperature during data collection. In addition, room‐temperature measurements enable time‐resolved studies and eliminate the need to find suitable cryoprotectants. Since radiation damage limits the high‐resolution data that can be obtained from a single crystal, especially at room temperature, data are typically collected in a serial fashion using a number of crystals to spread the total dose over the entire ensemble. Several approaches have been developed over the years to efficiently exchange crystals for room‐temperature data collection. These include in situ collection in trays, chips and capillary mounts. Here, the use of a slowly flowing microscopic stream for crystal delivery is demonstrated, resulting in extremely high‐throughput delivery of crystals into the X‐ray beam. This free‐stream technology, which was originally developed for serial femtosecond crystallography at X‐ray free‐electron lasers, is here adapted to serial crystallography at synchrotrons. By embedding the crystals in a high‐viscosity carrier stream, high‐resolution room‐temperature studies can be conducted at atmospheric pressure using the unattenuated X‐ray beam, thus permitting the analysis of small or weakly scattering crystals. The high‐viscosity extrusion injector is described, as is its use to collect high‐resolution serial data from native and heavy‐atom‐derivatized lysozyme crystals at the Swiss Light Source using less than half a milligram of protein crystals. The room‐temperature serial data allow de novo structure determination. The crystal size used in this proof‐of‐principle experiment was dictated by the available flux density. However, upcoming developments in beamline optics, detectors and synchrotron sources will enable the use of true microcrystals. This high‐throughput, high‐dose‐rate methodology provides a new route to investigating the structure and dynamics of macromolecules at ambient temperature.
iMOSFLM is a graphical user interface to the diffraction data‐integration program MOSFLM. It is designed to simplify data processing by dividing the process into a series of steps, which are normally carried out sequentially. Each step has its own display pane, allowing control over parameters that influence that step and providing graphical feedback to the user. Suitable values for integration parameters are set automatically, but additional menus provide a detailed level of control for experienced users. The image display and the interfaces to the different tasks (indexing, strategy calculation, cell refinement, integration and history) are described. The most important parameters for each step and the best way of assessing success or failure are discussed.
Coot is a molecular‐graphics application for model building and validation of biological macromolecules. The program displays electron‐density maps and atomic models and allows model manipulations such as idealization, real‐space refinement, manual rotation/translation, rigid‐body fitting, ligand search, solvation, mutations, rotamers and Ramachandran idealization. Furthermore, tools are provided for model validation as well as interfaces to external programs for refinement, validation and graphics. The software is designed to be easy to learn for novice users, which is achieved by ensuring that tools for common tasks are `discoverable' through familiar user‐interface elements (menus and toolbars) or by intuitive behaviour (mouse controls). Recent developments have focused on providing tools for expert users, with customisable key bindings, extensions and an extensive scripting interface. The software is under rapid development, but has already achieved very widespread use within the crystallographic community. The current state of the software is presented, with a description of the facilities available and of some of the underlying methods employed.
The usage and control of recent modifications of the program package XDS for the processing of rotation images are described in the context of previous versions. New features include automatic determination of spot size and reflecting range and recognition and assignment of crystal symmetry. Moreover, the limitations of earlier package versions on the number of correction/scaling factors and the representation of pixel contents have been removed. Large program parts have been restructured for parallel processing so that the quality and completeness of collected data can be assessed soon after measurement.
MolProbity is a structure‐validation web service that provides broad‐spectrum solidly based evaluation of model quality at both the global and local levels for both proteins and nucleic acids. It relies heavily on the power and sensitivity provided by optimized hydrogen placement and all‐atom contact analysis, complemented by updated versions of covalent‐geometry and torsion‐angle criteria. Some of the local corrections can be performed automatically in MolProbity and all of the diagnostics are presented in chart and graphical forms that help guide manual rebuilding. X‐ray crystallography provides a wealth of biologically important molecular data in the form of atomic three‐dimensional structures of proteins, nucleic acids and increasingly large complexes in multiple forms and states. Advances in automation, in everything from crystallization to data collection to phasing to model building to refinement, have made solving a structure using crystallography easier than ever. However, despite these improvements, local errors that can affect biological interpretation are widespread at low resolution and even high‐resolution structures nearly all contain at least a few local errors such as Ramachandran outliers, flipped branched protein side chains and incorrect sugar puckers. It is critical both for the crystallographer and for the end user that there are easy and reliable methods to diagnose and correct these sorts of errors in structures. MolProbity is the authors' contribution to helping solve this problem and this article reviews its general capabilities, reports on recent enhancements and usage, and presents evidence that the resulting improvements are now beneficially affecting the global database.
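As one concrete example of the all‐atom contact analysis mentioned above, MolProbity's clashscore summarizes steric problems as the number of serious all‐atom overlaps (0.4 Å or more) per thousand atoms,

\[
\text{clashscore} = 1000 \times \frac{N_{\text{clashes}}}{N_{\text{atoms}}},
\]

so, for instance, a 20\,000‐atom model containing 30 such overlaps would have a clashscore of 1.5.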
A single protein crystal structure contains information about dynamic properties of the protein as well as providing a static view of one three‐dimensional conformation. This additional information is to be found in the distribution of observed electron density about the mean position of each atom. It is general practice to account for this by refining a separate atomic displacement parameter (ADP) for each atomic center. However, these same displacements are often described well by simpler models based on TLS (translation/libration/screw) rigid‐body motion of large groups of atoms, for example interdomain hinge motion. A procedure, TLSMD, has been developed that analyzes the distribution of ADPs in a previously refined protein crystal structure in order to generate optimal multi‐group TLS descriptions of the constituent protein chains. TLSMD is applicable to crystal structures at any resolution. The models generated by TLSMD analysis can significantly improve the standard crystallographic residuals R and Rfree and can reveal intrinsic dynamic properties of the protein.
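The abstract does not restate the underlying TLS model. For reference, in a commonly used convention the anisotropic displacement tensor predicted for an atom at position r = (x, y, z) relative to the TLS group origin is

\[
U_{\mathrm{TLS}} = T + A L A^{\mathsf T} + A S + S^{\mathsf T} A^{\mathsf T},
\qquad
A = \begin{pmatrix} 0 & z & -y\\ -z & 0 & x\\ y & -x & 0 \end{pmatrix},
\]

where T, L and S are the translation, libration and screw tensors of the group; TLSMD's task is then to choose group boundaries such that these predicted tensors best reproduce the individually refined ADPs. Sign and definition conventions for S differ between programs, so this is a sketch rather than the exact parameterization used by any particular package.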
Small‐angle X‐ray scattering (SAXS) of macromolecules in solution is in increasing demand by an ever more diverse research community, both academic and industrial. To better serve user needs, and to allow automated and high‐throughput operation, a sample changer (BioSAXS Sample Changer) that is able to perform unattended measurements of up to several hundred samples per day has been developed. The Sample Changer is able to handle and expose sample volumes as small as 5 µl with a measurement/cleaning cycle of under 1 min. The samples are stored in standard 96‐well plates and the data are collected in a vacuum‐mounted capillary with automated positioning of the solution in the X‐ray beam. Fast and efficient capillary cleaning avoids cross‐contamination and ensures reproducibility of the measurements. Independent temperature control for the well storage and for the measurement capillary allows the samples to be kept cool while still collecting data at physiological temperatures. The Sample Changer has been installed at three major third‐generation synchrotrons: on the BM29 beamline at the European Synchrotron Radiation Facility (ESRF), the P12 beamline at the PETRA‐III synchrotron (EMBL@PETRA‐III) and the I22/B21 beamlines at Diamond Light Source, with the latter being the first commercial unit supplied by Bruker ASC.