BioStructures API
Package extensions are used in order to reduce the number of dependencies:
- To use LongAA, call using BioSequences.
- To use pairalign, superimpose!, rmsd/displacements with the superimpose option or Transformation on structural elements, call using BioAlignments.
- To use DataFrame, call using DataFrames.
- To use MetaGraph, call using MetaGraphs.
- To use MMTFDict or writemmtf, call import MMTF.
- To use rundssp!, rundssp or the run_dssp option with read/retrievepdb, call using DSSP_jll.
- To use runstride!, runstride or the run_stride option with read/retrievepdb, call using STRIDE_jll.
Exported names
BioStructures.BioStructures
BioStructures.AbstractAtom
BioStructures.AbstractResidue
BioStructures.Atom
BioStructures.AtomRecord
BioStructures.Chain
BioStructures.ContactMap
BioStructures.DisorderedAtom
BioStructures.DisorderedResidue
BioStructures.DistanceMap
BioStructures.MMCIFDict
BioStructures.MMCIFFormat
BioStructures.MMTFDict
BioStructures.MMTFFormat
BioStructures.Model
BioStructures.MolecularStructure
BioStructures.PDBConsistencyError
BioStructures.PDBFormat
BioStructures.PDBParseError
BioStructures.PDBXMLFormat
BioStructures.Residue
BioStructures.SpatialMap
BioStructures.StructuralElement
BioStructures.StructuralElementOrList
BioStructures.Transformation
BioStructures.backboneatomnames
BioStructures.calphaatomnames
BioStructures.cbetaatomnames
BioStructures.coilsscodes
BioStructures.helixsscodes
BioStructures.pdbextension
BioStructures.proteinresnames
BioStructures.sheetsscodes
BioStructures.threeletter_to_aa
BioStructures.waterresnames
Base.collect
Base.read
BioGenerics.distance
BioStructures.acidicresselector
BioStructures.aliphaticresselector
BioStructures.allselector
BioStructures.altlocid
BioStructures.altlocids
BioStructures.applyselectors
BioStructures.applyselectors!
BioStructures.applytransform
BioStructures.applytransform!
BioStructures.aromaticresselector
BioStructures.atomid
BioStructures.atomname
BioStructures.atomnames
BioStructures.atomnameselector
BioStructures.atoms
BioStructures.backboneselector
BioStructures.basicresselector
BioStructures.bondangle
BioStructures.calphaselector
BioStructures.cbetaselector
BioStructures.chain
BioStructures.chainid
BioStructures.chainid!
BioStructures.chainids
BioStructures.chains
BioStructures.charge
BioStructures.chargedresselector
BioStructures.choosedefaultaltlocid
BioStructures.coilselector
BioStructures.collectatoms
BioStructures.collectchains
BioStructures.collectmodels
BioStructures.collectresidues
BioStructures.coordarray
BioStructures.coords
BioStructures.coords!
BioStructures.countatoms
BioStructures.countchains
BioStructures.countmodels
BioStructures.countresidues
BioStructures.defaultaltlocid
BioStructures.defaultatom
BioStructures.defaultmodel
BioStructures.defaultresidue
BioStructures.defaultresname
BioStructures.dihedralangle
BioStructures.disorderedres
BioStructures.disorderselector
BioStructures.displacements
BioStructures.downloadallobsoletepdb
BioStructures.downloadentirepdb
BioStructures.downloadpdb
BioStructures.element
BioStructures.generatechainid
BioStructures.heavyatomselector
BioStructures.helixselector
BioStructures.heteroselector
BioStructures.hydrogenselector
BioStructures.hydrophobicresselector
BioStructures.inscode
BioStructures.isdisorderedatom
BioStructures.isdisorderedres
BioStructures.ishetero
BioStructures.model
BioStructures.modelnumber
BioStructures.modelnumbers
BioStructures.models
BioStructures.neutralresselector
BioStructures.nonpolarresselector
BioStructures.notwaterselector
BioStructures.occupancy
BioStructures.omegaangle
BioStructures.omegaangles
BioStructures.pdbentrylist
BioStructures.pdbline
BioStructures.pdbobsoletelist
BioStructures.pdbrecentchanges
BioStructures.pdbstatuslist
BioStructures.phiangle
BioStructures.phiangles
BioStructures.polarresselector
BioStructures.proteinselector
BioStructures.psiangle
BioStructures.psiangles
BioStructures.ramachandranangles
BioStructures.readmultimmcif
BioStructures.resid
BioStructures.resids
BioStructures.residue
BioStructures.residues
BioStructures.resname
BioStructures.resnames
BioStructures.resnameselector
BioStructures.resnumber
BioStructures.retrievepdb
BioStructures.rmsd
BioStructures.rundssp
BioStructures.rundssp!
BioStructures.runstride
BioStructures.runstride!
BioStructures.sequentialresidues
BioStructures.serial
BioStructures.sheetselector
BioStructures.showcontactmap
BioStructures.sidechainselector
BioStructures.spaceatomname
BioStructures.sqdistance
BioStructures.sscode
BioStructures.sscode!
BioStructures.sscodeselector
BioStructures.standardselector
BioStructures.structure
BioStructures.structurename
BioStructures.superimpose!
BioStructures.tempfactor
BioStructures.updatelocalpdb
BioStructures.waterselector
BioStructures.writemmcif
BioStructures.writemmtf
BioStructures.writemultimmcif
BioStructures.writepdb
BioStructures.x
BioStructures.x!
BioStructures.y
BioStructures.y!
BioStructures.z
BioStructures.z!
BioStructures.@sel_str
Non-exported names
Docstrings
BioStructures.BioStructures
— Module
Read, write and manipulate macromolecular structures.
BioStructures.AbstractAtom
— Type
An atom that is part of a macromolecule - either an Atom or a DisorderedAtom.
BioStructures.AbstractResidue
— Type
A residue (amino acid) or other molecule - either a Residue or a DisorderedResidue.
BioStructures.Atom
— Type
An atom that is part of a macromolecule.
BioStructures.AtomRecord
— Type
A record for a single atom, e.g. as represented in a Protein Data Bank (PDB) file.
BioStructures.Chain
— Type
A chain (molecule) from a macromolecular structure.
BioStructures.ContactMap
— Type
ContactMap(element, contact_distance)
ContactMap(element_one, element_two, contact_distance)
ContactMap(bit_array_2D)
Calculate the contact map for a StructuralElementOrList, or between two StructuralElementOrLists.
This returns a ContactMap type containing a BitArray{2} with true where the sub-elements are no further than the contact distance and false otherwise. When one element is given as input this returns a symmetric square matrix. To directly access the underlying data of ContactMap cm, use cm.data.
Examples
cbetas_A = collectatoms(struc["A"], cbetaselector)
cbetas_B = collectatoms(struc["B"], cbetaselector)
# Contact map of chain A using conventional Cβ and 8 Å definitions
cm = ContactMap(cbetas_A, 8.0)
# Returns true if a contact is present between the tenth and twentieth element
cm[10, 20]
# Rectangular contact map of chains A and B
cm = ContactMap(cbetas_A, cbetas_B, 8.0)
# Write the contact map to file
using DelimitedFiles
writedlm("contacts.out", Int64.(cm.data), " ")
BioStructures.DisorderedAtom
— Type
A container to hold different locations of the same atom.
BioStructures.DisorderedResidue
— Type
A container to hold different versions of the same residue (point mutations).
BioStructures.DistanceMap
— Type
DistanceMap(element)
DistanceMap(element_one, element_two)
DistanceMap(float_array_2D)
Calculate the distance map for a StructuralElementOrList, or between two StructuralElementOrLists.
This returns a DistanceMap type containing an Array{Float64, 2} with minimum distances between the sub-elements. When one element is given as input this returns a symmetric square matrix. To directly access the underlying data of DistanceMap dm, use dm.data.
Examples
cbetas_A = collectatoms(struc["A"], cbetaselector)
cbetas_B = collectatoms(struc["B"], cbetaselector)
# Distance map of chain A showing how far each Cβ atom is from the others
dm = DistanceMap(cbetas_A)
# Returns the distance between the tenth and twentieth element
dm[10, 20]
# Rectangular distance map of chains A and B
dm = DistanceMap(cbetas_A, cbetas_B)
# Write the distance map to file
using DelimitedFiles
writedlm("distances.out", dm.data, " ")
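To make concrete what the two map types hold, here is a minimal sketch that builds a pairwise distance matrix and a thresholded contact matrix from a plain 3×N coordinate array, using only the Julia standard library. The names distance_matrix and contact_matrix are hypothetical helpers invented for this illustration and are not part of BioStructures; the package itself operates on structural elements and takes minimum distances between sub-elements.

```julia
using LinearAlgebra

# Pairwise Euclidean distances for a 3×N array with one column per atom.
# Hypothetical helper for illustration, not the BioStructures implementation.
function distance_matrix(coords::AbstractMatrix{<:Real})
    n = size(coords, 2)
    d = zeros(Float64, n, n)
    for j in 1:n, i in 1:j-1
        # Fill both triangles so the result is symmetric
        d[i, j] = d[j, i] = norm(coords[:, i] - coords[:, j])
    end
    return d
end

# true where two atoms are no further apart than the cutoff, false otherwise
contact_matrix(coords, cutoff) = distance_matrix(coords) .<= cutoff
```

With a single set of coordinates both results are symmetric square matrices, which mirrors the behaviour described for ContactMap and DistanceMap when one element is given.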
BioStructures.MMCIFDict
— Type
MMCIFDict(filepath; gzip=false)
MMCIFDict(io; gzip=false)
MMCIFDict()
A macromolecular Crystallographic Information File (mmCIF) dictionary.
Can be accessed using similar functions to a standard Dict. Keys are field names as a String and values are always Vector{String}, even for multiple components or numerical data. To directly access the underlying dictionary of MMCIFDict d, use d.dict. Call MMCIFDict with a filepath or stream to read the dictionary from that source. The keyword argument gzip (default false) determines if the input is gzipped.
BioStructures.MMCIFFormat
— Type
Protein Data Bank (PDB) mmCIF file format.
BioStructures.MMTFDict
— Type
MMTFDict(filepath; gzip=false)
MMTFDict(io; gzip=false)
MMTFDict()
A Macromolecular Transmission Format (MMTF) dictionary.
Use of the dictionary requires the MMTF.jl package to be imported. Can be accessed using similar functions to a standard Dict. Keys are field names as a String and values are various types. To directly access the underlying dictionary of MMTFDict d, use d.dict. Call MMTFDict with a filepath or stream to read the dictionary from that source. The keyword argument gzip (default false) determines if the file is gzipped.
BioStructures.MMTFFormat
— Type
Protein Data Bank (PDB) MMTF file format.
BioStructures.Model
— Type
A conformation of a macromolecular structure.
BioStructures.MolecularStructure
— Type
A container for multiple Models that represents a Protein Data Bank (PDB) entry.
BioStructures.PDBConsistencyError
— Type
Error arising from an attempt to make an inconsistent structural state.
BioStructures.PDBFormat
— Type
Protein Data Bank (PDB) file format.
BioStructures.PDBParseError
— Type
Error arising from parsing a Protein Data Bank (PDB) file.
BioStructures.PDBXMLFormat
— Type
Protein Data Bank (PDB) XML file format.
BioStructures.Residue
— Type
A residue (amino acid) or other molecule.
BioStructures.SpatialMap
— Type
A map of a structural property, e.g. a ContactMap or a DistanceMap.
BioStructures.StructuralElement
— Type
A macromolecular structural element.
BioStructures.Transformation
— Type
Transformation(el1, el2, residue_selectors...)
Transformation(coords1, coords2)
Transformation(trans1, trans2, rot)
A 3D transformation to map one set of coordinates onto another, found using the Kabsch algorithm.
When called with structural elements, carries out a pairwise alignment and superimposes on atoms from aligned residues. In this case the BioSequences.jl and BioAlignments.jl packages should be imported. Keyword arguments for pairwise alignment can be given, see pairalign. The residue selectors determine which residues to do the pairwise alignment on. The keyword argument alignatoms is an atom selector that selects the atoms to calculate the superimposition on (default calphaselector). Can also be called with two sets of coordinates of the same size, with the number of dimensions in the first axis and the number of points in the second axis.
The returned Transformation object consists of the mean coordinates of the first set, the mean coordinates of the second set, the rotation to map the first centred set onto the second centred set, and the indices of the aligned residues in the first and second elements if relevant.
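For readers unfamiliar with the Kabsch algorithm, the following is a minimal sketch of the coordinate-only case, assuming two 3×N arrays whose columns are already paired. It uses only the LinearAlgebra and Statistics standard libraries. The names kabsch_sketch and apply_sketch are hypothetical and this is not the package's implementation; it only mirrors the three pieces of data described above (two mean vectors and a rotation).

```julia
using LinearAlgebra, Statistics

# Sketch of the Kabsch algorithm for paired 3×N coordinate arrays.
# Returns the two centroids and the rotation mapping centred P onto centred Q.
function kabsch_sketch(P::AbstractMatrix{<:Real}, Q::AbstractMatrix{<:Real})
    size(P) == size(Q) || throw(ArgumentError("coordinate arrays must have the same size"))
    trans1 = mean(P, dims=2)              # centroid of the first set
    trans2 = mean(Q, dims=2)              # centroid of the second set
    H = (P .- trans1) * (Q .- trans2)'    # 3×3 covariance matrix
    F = svd(H)
    d = sign(det(F.V * F.U'))             # guard against an improper rotation (reflection)
    rot = F.V * Diagonal([1.0, 1.0, d]) * F.U'
    return trans1, trans2, rot
end

# Centre on the first centroid, rotate, then move to the second centroid
apply_sketch(P, trans1, trans2, rot) = rot * (P .- trans1) .+ trans2
```

Applying the result to the first coordinate set should place it on the second set in the least-squares sense, provided the points are not degenerate (for example, all collinear).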
BioStructures.StructuralElementOrList
— Type
A StructuralElement or Vector of StructuralElements up to a Vector{Model}.
BioStructures.backboneatomnames
— Constant
Set of protein backbone atom names.
BioStructures.calphaatomnames
— Constant
Set of Cα atom names.
BioStructures.cbetaatomnames
— Constant
Set of Cβ atom names.
BioStructures.coilsscodes
— Constant
Set of secondary structure codes corresponding to a coil.
BioStructures.helixsscodes
— Constant
Set of secondary structure codes corresponding to an α-helix.
BioStructures.pdbextension
— Constant
Mapping of Protein Data Bank (PDB) formats to their file extensions.
BioStructures.proteinresnames
— Constant
Set of residue names found in proteins and peptides.
BioStructures.sheetsscodes
— Constant
Set of secondary structure codes corresponding to a β-sheet.
BioStructures.threeletter_to_aa
— Constant
Lookup table of amino acids, re-exported from BioSymbols.
BioStructures.waterresnames
— Constant
Set of residue names corresponding to water.
Base.collect
— Method
collect(el)
Returns a Vector of the sub-elements in a StructuralElementOrList, e.g. AbstractAtoms in a Residue or AbstractResidues in a Chain.
Base.read
— Method
read(filepath::AbstractString, format::Type; <keyword arguments>)
read(input::IO, format::Type; <keyword arguments>)
Read a Protein Data Bank (PDB) file and return a MolecularStructure.
Arguments
- format::Type: the format of the PDB file; options are PDBFormat, MMCIFFormat and MMTFFormat. MMTFFormat requires the MMTF.jl package to be imported.
- structure_name::AbstractString: the name given to the returned MolecularStructure; defaults to the file name.
- remove_disorder::Bool=false: whether to remove atoms with alt loc ID not ' ' or 'A'.
- read_std_atoms::Bool=true: whether to read standard ATOM records.
- read_het_atoms::Bool=true: whether to read HETATM records.
- run_dssp::Bool=false: whether to run DSSP to assign secondary structure. Requires the DSSP_jll.jl package to be imported if set to true.
- run_stride::Bool=false: whether to run STRIDE to assign secondary structure. Requires the STRIDE_jll.jl package to be imported if set to true.
- gzip::Bool=false: whether the input is gzipped, not available for PDB format.
BioGenerics.distance
— Method
distance(element_one, element_two, atom_selectors...)
Get the minimum distance in Å between two StructuralElementOrLists.
Additional arguments are atom selector functions - only atoms that return true from the functions are retained.
BioStructures.acidicresselector
— Method
acidicresselector(res)
acidicresselector(at)
Determines if an AbstractResidue is, or an AbstractAtom is part of, an acidic amino acid based on the residue name.
BioStructures.aliphaticresselector
— Method
aliphaticresselector(res)
aliphaticresselector(at)
Determines if an AbstractResidue is, or an AbstractAtom is part of, an aliphatic amino acid based on the residue name.
BioStructures.allselector
— Method
allselector(at)
allselector(res)
Trivial selector that returns true for any structural element.
BioStructures.altlocid
— Method
altlocid(at)
Get the alternative location ID of an AbstractAtom as a Char.
BioStructures.altlocids
— Method
altlocids(dis_at)
Get the list of alternative location IDs in an AbstractAtom as a Vector{Char}, sorted by atom serial.
BioStructures.applyselectors!
— Method
applyselectors!(els, selectors...)
Removes from a Vector of StructuralElements all elements that do not return true from all the selector functions.
BioStructures.applyselectors
— Method
applyselectors(els, selectors...)
Returns a copy of a Vector of StructuralElements with all elements that do not return true from all the selector functions removed.
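The selector semantics described for applyselectors and applyselectors! (keep an element only if every selector returns true) can be sketched with plain Julia collections. The function names keep_matching and keep_matching! below are hypothetical and stand in for the package functions; ordinary integers are used in place of structural elements so the example is self-contained.

```julia
# Keep only elements for which every selector function returns true.
# Hypothetical stand-ins for illustration; not the BioStructures functions.
keep_matching(els, selectors...) =
    filter(el -> all(sel -> sel(el), selectors), els)

# In-place variant: removes non-matching elements from the vector itself
keep_matching!(els, selectors...) =
    filter!(el -> all(sel -> sel(el), selectors), els)

# Usage with integers standing in for structural elements
values = collect(1:10)
selected = keep_matching(values, iseven, x -> x > 4)
```

With no selectors supplied, every element is retained, since a check over an empty set of selectors is vacuously true.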
BioStructures.applytransform!
— Method
applytransform!(el, transformation)
Modify all coordinates in an element according to a transformation.
BioStructures.applytransform
— Method
applytransform(coords, transformation)
Modify coordinates according to a transformation.
BioStructures.aromaticresselector
— Method
aromaticresselector(res)
aromaticresselector(at)
Determines if an AbstractResidue is, or an AbstractAtom is part of, an aromatic amino acid based on the residue name.
BioStructures.atomid
— Method
atomid(at)
Get a descriptive atom ID for an AbstractAtom as a Tuple of the form (full residue ID, residue name, atom name).
BioStructures.atomname
— Method
atomname(at; strip=true)
Get the atom name of an AbstractAtom as a String. strip determines whether surrounding whitespace is stripped.
BioStructures.atomnames
— Method
atomnames(res; strip=true)
Get the sorted list of atom names of the AbstractAtoms in an AbstractResidue. strip determines whether surrounding whitespace is stripped.
BioStructures.atomnameselector
— Method
atomnameselector(at, atom_names; strip=true)
Determines if an AbstractAtom has its atom name in a list of names. strip determines whether surrounding whitespace is stripped from the atom name before it is checked in the list.
BioStructures.atoms
— Method
atoms(res)
Return the dictionary of AbstractAtoms in an AbstractResidue.
BioStructures.backboneselector
— Method
backboneselector(at)
Determines if an AbstractAtom is not a hetero-atom and corresponds to a protein backbone atom.
BioStructures.basicresselector
— Method
basicresselector(res)
basicresselector(at)
Determines if an AbstractResidue is, or an AbstractAtom is part of, a basic amino acid based on the residue name.
BioStructures.bondangle
— Method
bondangle(atom_a, atom_b, atom_c)
bondangle(vec_ba, vec_bc)
Calculate the bond or pseudo-bond angle in radians between three AbstractAtoms or two vectors.
The angle between B→A and B→C is returned in the range 0 to π.
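The geometry behind the angle described above can be sketched on plain coordinate vectors. This is an illustration only, assuming 3-element vectors; angle_between and angle_at are hypothetical names and the package's own implementation may differ in detail.

```julia
using LinearAlgebra

# Angle in radians between two vectors, in the range 0 to π.
# clamp guards against rounding pushing the cosine just outside [-1, 1].
angle_between(u, v) = acos(clamp(dot(u, v) / (norm(u) * norm(v)), -1.0, 1.0))

# Angle at point b formed by points a and c, i.e. between B→A and B→C
angle_at(a, b, c) = angle_between(a - b, c - b)
```

Because the arccosine is used, the result is always non-negative, which matches the stated 0 to π range.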
BioStructures.calphaselector
— Method
calphaselector(at)
Determines if an AbstractAtom is not a hetero-atom and corresponds to a Cα atom.
BioStructures.cbetaselector
— Method
cbetaselector(at)
Determines if an AbstractAtom is not a hetero-atom and corresponds to a Cβ atom, or a Cα atom in glycine.
BioStructures.chain
— Method
chain(at)
chain(res)
Return the Chain that an AbstractAtom or AbstractResidue belongs to.
BioStructures.chainid!
— Method
chainid!(ch, id)
chainid!(res, id)
Set the chain ID of a Chain or an AbstractResidue to a new String.
For an AbstractResidue, if a chain with this ID already exists, the residue will be removed from its current chain and added to that chain. If a chain with this ID does not exist, a new chain will be added to the model and this residue will be added to it. If moving this residue from a chain to a new chain leaves the old chain without residues, the old chain will be removed from the Model.
BioStructures.chainid
— Method
chainid(el)
Get the chain ID of an AbstractAtom, AbstractResidue or Chain as a String.
BioStructures.chainids
— Method
chainids(model)
chainids(struc)
Get the sorted chain IDs of the chains in a Model, or the default Model of a MolecularStructure, as a Vector{String}.
BioStructures.chains
— Method
chains(model)
chains(struc)
Return the dictionary of Chains in a Model, or the default Model of a MolecularStructure.
BioStructures.charge
— Method
charge(at; strip=true)
Get the charge on an AbstractAtom as a String. The charge is set to " " if not specified during atom creation. strip determines whether surrounding whitespace is stripped.
BioStructures.chargedresselector
— Method
chargedresselector(res)
chargedresselector(at)
Determines if an AbstractResidue is, or an AbstractAtom is part of, a charged amino acid based on the residue name.
BioStructures.choosedefaultaltlocid
— Method
choosedefaultaltlocid(at_one, at_two)
Determine which of two Atoms representing a disordered atom better qualifies as the default location.
The Atom with the highest occupancy is chosen; in the case of ties the Atom with the lowest alternative location ID in alphabetical order is chosen.
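The selection rule (highest occupancy first, then lowest alternative location ID on ties) can be sketched as below. LocationSketch and preferred_location are hypothetical names introduced for this illustration, holding only the two fields the rule needs; they are not BioStructures types or functions.

```julia
# Hypothetical stand-in for an atom location with just the fields the rule uses
struct LocationSketch
    altloc::Char
    occupancy::Float64
end

# Pick the location that better qualifies as the default:
# higher occupancy wins, ties go to the alphabetically lower alt loc ID.
function preferred_location(a::LocationSketch, b::LocationSketch)
    if a.occupancy != b.occupancy
        return a.occupancy > b.occupancy ? a : b
    end
    return a.altloc <= b.altloc ? a : b
end
```

Under this rule a location 'B' with occupancy 0.6 is preferred over 'A' with 0.4, while 'A' is preferred over 'B' when both have occupancy 0.5.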
BioStructures.coilselector
— Method
coilselector(res)
coilselector(at)
Determines if an AbstractResidue or AbstractAtom is part of a coil, i.e. whether the secondary structure code is in coilsscodes.
BioStructures.collectatoms
— Method
collectatoms(el)
Returns a Vector of the atoms in a StructuralElementOrList.
Additional arguments are atom selector functions - only atoms that return true from all the functions are retained. The keyword argument expand_disordered (default false) determines whether to return all copies of disordered atoms separately.
BioStructures.collectchains
— Method
collectchains(el)
Returns a Vector of the chains in a StructuralElementOrList.
Additional arguments are chain selector functions - only chains that return true from all the functions are retained.
BioStructures.collectmodels
— Method
collectmodels(el)
Returns a Vector of the models in a StructuralElementOrList.
Additional arguments are model selector functions - only models that return true from all the functions are retained.
BioStructures.collectresidues
— Method
collectresidues(el)
Returns a Vector of the residues in a StructuralElementOrList.
Additional arguments are residue selector functions - only residues that return true from all the functions are retained. The keyword argument expand_disordered (default false) determines whether to return all copies of disordered residues separately.
BioStructures.coordarray
— Method
coordarray(element, atom_selectors...)
Get the atomic coordinates in Å of a StructuralElementOrList as a 2D Array.
Each column corresponds to one atom, so the size is (3, natoms). Additional arguments are atom selector functions - only atoms that return true from all the functions are retained. The keyword argument expand_disordered (default false) determines whether to return coordinates for all copies of disordered atoms separately.
BioStructures.coords!
— Method
coords!(at, new_coords)
Set the coordinates in Å of an AbstractAtom to a Vector of 3 numbers.
For DisorderedAtoms only the default atom is updated.
BioStructures.coords
— Method
coords(at)
Get the coordinates in Å of an AbstractAtom as a Vector{Float64}.
BioStructures.countatoms
— Method
countatoms(el)
Return the number of atoms in a StructuralElementOrList as an Int.
Additional arguments are atom selector functions - only atoms that return true from all the functions are counted. The keyword argument expand_disordered (default false) determines whether to return all copies of disordered atoms separately.
BioStructures.countchains
— Method
countchains(el)
Return the number of Chains in a StructuralElementOrList as an Int.
Additional arguments are chain selector functions - only chains that return true from all the functions are counted.
BioStructures.countmodels
— Method
countmodels(el)
Return the number of Models in a StructuralElementOrList as an Int.
Additional arguments are model selector functions - only models that return true from all the functions are counted.
BioStructures.countresidues
— Method
countresidues(el)
Return the number of residues in a StructuralElementOrList as an Int.
Additional arguments are residue selector functions - only residues that return true from all the functions are counted. The keyword argument expand_disordered (default false) determines whether to return all copies of disordered residues separately.
BioStructures.defaultaltlocid
— Method
defaultaltlocid(dis_at)
Get the alternative location ID of the default Atom in a DisorderedAtom as a Char.
The default is the highest occupancy, or lowest character alternative location ID for ties (i.e. 'A' beats 'B').
BioStructures.defaultatom
— Method
defaultatom(dis_at)
Return the default Atom in a DisorderedAtom.
The default is the highest occupancy, or lowest character alternative location ID for ties (i.e. 'A' beats 'B').
BioStructures.defaultmodel
— Method
defaultmodel(struc)
Get the default Model in a MolecularStructure.
This is the Model with the lowest model number.
BioStructures.defaultresidue
— Method
defaultresidue(dis_res)
Return the default Residue in a DisorderedResidue.
The default is the first name read in.
BioStructures.defaultresname
— Method
defaultresname(dis_res)
Get the name of the default Residue in a DisorderedResidue as a String.
The default is the first name read in.
BioStructures.dihedralangle
— Method
dihedralangle(atom_a, atom_b, atom_c, atom_d)
dihedralangle(vec_ab, vec_bc, vec_cd)
Calculate the dihedral angle in radians defined by four AbstractAtoms or three vectors.
The angle between the planes defined by atoms (A, B, C) and (B, C, D) is returned in the range -π to π.
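A common way to obtain a signed dihedral angle from three bond vectors uses the two plane normals and a two-argument arctangent. The sketch below follows that approach on plain 3-element vectors; dihedral_sketch is a hypothetical name and the code is an assumption about one reasonable formulation, not a copy of the package's source.

```julia
using LinearAlgebra

# Signed dihedral angle in radians from three consecutive bond vectors
# (A→B, B→C, C→D). Illustrative sketch only.
function dihedral_sketch(vec_ab, vec_bc, vec_cd)
    n1 = cross(vec_ab, vec_bc)   # normal to the plane through A, B, C
    n2 = cross(vec_bc, vec_cd)   # normal to the plane through B, C, D
    # atan(y, x) gives the signed angle in the range -π to π
    return atan(dot(cross(n1, n2), vec_bc / norm(vec_bc)), dot(n1, n2))
end

# Convenience method taking four points instead of three vectors
dihedral_sketch(a, b, c, d) = dihedral_sketch(b - a, c - b, d - c)
```

Using the two-argument arctangent rather than an arccosine is what preserves the sign, so that mirror-image arrangements give angles of opposite sign.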
BioStructures.disorderedres
— Method
disorderedres(dis_res, res_name)
Return the Residue in a DisorderedResidue with a given residue name.
BioStructures.disorderselector
— Method
disorderselector(at)
disorderselector(res)
Determines whether an AbstractAtom or AbstractResidue is disordered, i.e. has multiple locations in the case of atoms or multiple residue names (point mutants) in the case of residues.
BioStructures.displacements
— Method
displacements(element_one, element_two, residue_selectors...)
displacements(element_one, element_two, superimpose=false)
displacements(coords_one, coords_two)
Get the displacements in Å between atomic coordinates from two StructuralElementOrLists or two coordinate Arrays.
If superimpose is true (the default), the elements are superimposed before calculation and the displacements are calculated on the superimposed residues. In this case the BioSequences.jl and BioAlignments.jl packages should be imported. See Transformation for keyword arguments. If superimpose is false the elements are assumed to be superimposed and must be of the same length. The keyword argument dispatoms is an atom selector that selects the atoms to calculate displacements on (default calphaselector).
BioStructures.downloadallobsoletepdb
— Method
downloadallobsoletepdb(; <keyword arguments>)
Download all obsolete Protein Data Bank (PDB) files from the RCSB server.
Returns the list of PDB IDs downloaded. Requires an internet connection.
Arguments
- obsolete_dir::AbstractString=pwd(): the directory where the PDB files are downloaded; defaults to the current working directory.
- format::Type=PDBFormat: the format of the PDB file; options are PDBFormat, PDBXMLFormat and MMCIFFormat. MMTF files are no longer available to download.
- overwrite::Bool=false: if set true, overwrites the PDB file if it exists in dir; by default skips downloading the PDB file if it exists.
BioStructures.downloadentirepdb
— Method
downloadentirepdb(; <keyword arguments>)
Download the entire Protein Data Bank (PDB) from the RCSB server.
Returns the list of PDB IDs downloaded. Ensure you have enough disk space and time before running. The function can be stopped any time and called again to resume downloading. Requires an internet connection.
Arguments
- dir::AbstractString=pwd(): the directory to which the PDB files are downloaded; defaults to the current working directory.
- format::Type=PDBFormat: the format of the PDB file; options are PDBFormat, PDBXMLFormat and MMCIFFormat. MMTF files are no longer available to download.
- overwrite::Bool=false: if set true, overwrites the PDB file if it exists in dir; by default skips downloading the PDB file if it exists.
BioStructures.downloadpdb
— Method
downloadpdb(pdbid::AbstractString; <keyword arguments>)
downloadpdb(pdbid::AbstractArray{<:AbstractString, 1}; <keyword arguments>)
downloadpdb(f::Function, args...)
Download files from the Protein Data Bank (PDB) via RCSB.
When given an AbstractString, e.g. "1AKE", downloads the PDB file and returns the path to the file. When given an Array{<:AbstractString, 1}, downloads the PDB files in the array and returns an array of the paths to the files. When given a function as the first argument, runs the function with the downloaded filepath(s) as an argument then removes the file(s). Requires an internet connection.
Arguments
- dir::AbstractString=pwd(): the directory to which the PDB file is downloaded; defaults to the current working directory.
- format::Type=PDBFormat: the format of the PDB file; options are PDBFormat, PDBXMLFormat and MMCIFFormat. MMTF files are no longer available to download.
- obsolete::Bool=false: if set true, the PDB file is downloaded in the auto-generated "obsolete" directory inside the specified dir.
- overwrite::Bool=false: if set true, overwrites the PDB file if it exists in dir; by default skips downloading the PDB file if it exists.
- ba_number::Integer=0: if set > 0 downloads the respective biological assembly; by default downloads the PDB file.
BioStructures.element
— Method
element(at; strip=true)
Get the element of an AbstractAtom as a String.
The element is set to " " if not specified during atom creation. strip determines whether surrounding whitespace is stripped.
BioStructures.generatechainid
— Method
generatechainid(entity_id)
Convert a positive Integer into a chain ID.
Goes A to Z, then AA to ZA, AB to ZB etc. This is in line with Protein Data Bank (PDB) conventions.
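The ordering described above (A to Z, then AA to ZA, AB to ZB) corresponds to a bijective base-26 numbering written with the fastest-changing letter first. The sketch below reproduces that ordering; chainid_sketch is a hypothetical name and this is an assumed reconstruction from the description, not the package's code.

```julia
# Convert a positive integer into a chain ID where the first letter changes
# fastest: 1 → "A", 26 → "Z", 27 → "AA", 28 → "BA", 53 → "AB".
# Illustrative sketch only.
function chainid_sketch(n::Integer)
    n > 0 || throw(ArgumentError("n must be a positive integer"))
    io = IOBuffer()
    while n > 0
        n -= 1                       # shift to zero-based for this letter
        print(io, 'A' + (n % 26))    # emit the current letter
        n ÷= 26                      # move on to the next letter
    end
    return String(take!(io))
end
```

Subtracting one before each division is what makes the scheme bijective, so there is no "zero" letter and every positive integer maps to a distinct ID.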
BioStructures.heavyatomselector
— Method
heavyatomselector(at)
Determines if an AbstractAtom corresponds to a heavy (non-hydrogen) atom and is not a hetero-atom.
BioStructures.helixselector
— Method
helixselector(res)
helixselector(at)
Determines if an AbstractResidue or AbstractAtom is part of an α-helix, i.e. whether the secondary structure code is in helixsscodes.
BioStructures.heteroselector
— Method
heteroselector(at)
heteroselector(res)
Determines if an AbstractAtom represents a hetero atom, e.g. came from a HETATM record in a Protein Data Bank (PDB) file, or if an AbstractResidue represents a hetero molecule, e.g. consists of HETATM records from a PDB file.
BioStructures.hydrogenselector
— Method
hydrogenselector(at)
Determines if an AbstractAtom represents hydrogen.
Uses the element field where possible, otherwise uses the atom name.
BioStructures.hydrophobicresselector
— Method
hydrophobicresselector(res)
hydrophobicresselector(at)
Determines if an AbstractResidue is, or an AbstractAtom is part of, a hydrophobic amino acid based on the residue name.
BioStructures.inscode
— Method
inscode(at)
inscode(res)
Get the insertion code of an AbstractAtom or AbstractResidue as a Char.
BioStructures.isdisorderedatom
— Method
isdisorderedatom(at)
Determines if an AbstractAtom is a DisorderedAtom, i.e. if there are multiple locations present for an atom.
BioStructures.isdisorderedres
— Method
isdisorderedres(res)
Determines if an AbstractResidue is a DisorderedResidue, i.e. if there are multiple residue names with the same residue ID.
BioStructures.ishetero
— Method
ishetero(at)
ishetero(res)
Determines if an AbstractAtom represents a hetero atom, e.g. came from a HETATM record in a Protein Data Bank (PDB) file, or if an AbstractResidue represents a hetero molecule, e.g. consists of HETATM records from a PDB file.
BioStructures.model
— Method
model(el)
Return the Model that an AbstractAtom, AbstractResidue or Chain belongs to.
BioStructures.modelnumber
— Method
modelnumber(el)
Get the model number of a Model, Chain, AbstractResidue or AbstractAtom as an Int.
BioStructures.modelnumbers
— Method
modelnumbers(struc)
Get the sorted model numbers from a MolecularStructure as a Vector{Int}.
BioStructures.models
— Method
models(struc)
Return the dictionary of Models in a MolecularStructure.
BioStructures.neutralresselector
— Method
neutralresselector(res)
neutralresselector(at)
Determines if an AbstractResidue is, or an AbstractAtom is part of, a neutral amino acid based on the residue name.
BioStructures.nonpolarresselector
— Method
nonpolarresselector(res)
nonpolarresselector(at)
Determines if an AbstractResidue is, or an AbstractAtom is part of, a non-polar amino acid based on the residue name.
BioStructures.notwaterselector
— Method
notwaterselector(res)
notwaterselector(at)
Determines if an AbstractResidue or AbstractAtom does not represent a water molecule, i.e. whether the residue name is not in waterresnames.
BioStructures.occupancy
— Method
occupancy(at)
Get the occupancy of an AbstractAtom as a Float64.
The occupancy is set to 1.0 if not specified during atom creation.
BioStructures.omegaangle
— Method
omegaangle(res, res_previous)
omegaangle(chain, res_id)
Calculate the omega angle in radians for an AbstractResidue.
Arguments can either be a residue and the previous residue or a chain and the position as a residue ID. The first residue (or one at the given index) requires the atoms "N" and "CA" and the previous residue requires the atoms "CA" and "C". The angle is in the range -π to π.
BioStructures.omegaangles
— Method
omegaangles(element, residue_selectors...)
Calculate the Vector of omega angles of a StructuralElementOrList.
The vectors have NaN for residues where an angle cannot be calculated, e.g. due to missing atoms or lack of an adjacent residue. The angle is in the range -π to π. Additional arguments are residue selector functions - only residues that return true from the functions are retained.
BioStructures.pdbentrylist
— Method
pdbentrylist()
Obtain the list of all Protein Data Bank (PDB) entries from the RCSB server.
Requires an internet connection.
BioStructures.pdbline
— Methodpdbline(at::Atom)
pdbline(at::DisorderedAtom)
pdbline(at::AtomRecord)
Form a Protein Data Bank (PDB) format ATOM or HETATM record as a String from an Atom, DisorderedAtom or AtomRecord.
This will throw an ArgumentError if a value cannot fit into the allocated space, e.g. the chain ID is longer than one character or the atom serial is greater than 99999. In this case consider using writemmcif or writemmtf to write an mmCIF file or an MMTF file.
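The width limit comes from the fixed-column layout of PDB records. As a rough illustration (not the BioStructures.jl implementation), the sketch below right-justifies a value into a fixed number of columns and throws an ArgumentError when it does not fit. The helper name fitfield is hypothetical; the 5-column width matches the atom serial limit of 99999 mentioned above.

```julia
# Hypothetical helper, not part of BioStructures.jl: right-justify a value
# in a fixed-width column, as fixed-column PDB records require.
function fitfield(value, width::Integer)
    s = string(value)
    if length(s) > width
        throw(ArgumentError("value \"$s\" does not fit in $width columns"))
    end
    return lpad(s, width)
end

serial_ok = fitfield(42, 5)        # padded on the left to 5 columns
serial_max = fitfield(99999, 5)    # exactly fills the field
```

A serial of 100000 would need six columns, which is why a format without fixed columns, such as mmCIF, is the suggested fallback.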
BioStructures.pdbobsoletelist
— Methodpdbobsoletelist()
Obtain the list of all obsolete Protein Data Bank (PDB) entries from the RCSB server.
Requires an internet connection.
BioStructures.pdbrecentchanges
— Methodpdbrecentchanges()
Obtain three lists giving the added, modified and obsolete Protein Data Bank (PDB) entries from the recent RCSB weekly status files.
Requires an internet connection.
BioStructures.pdbstatuslist
— Methodpdbstatuslist(url::AbstractString)
Obtain the list of Protein Data Bank (PDB) entries from an RCSB weekly status file by specifying its URL.
An example URL is https://files.wwpdb.org/pub/pdb/pub/pdb/data/status/latest/added.pdb. Requires an internet connection.
BioStructures.phiangle
— Methodphiangle(res, res_previous)
phiangle(chain, res_id)
Calculate the phi angle in radians for an AbstractResidue.
Arguments can either be a residue and the previous residue or a chain and the position as a residue ID. The first residue (or one at the given index) requires the atoms "N", "CA" and "C" and the previous residue requires the atom "C". The angle is in the range -π to π.
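Phi, psi and omega are all dihedral angles over four backbone atoms. The following is a minimal sketch of a dihedral calculation using the standard atan-based formula, which naturally yields values from -π to π. It is an illustration rather than the package's own code: the function name dihedral and the coordinates are made up for the example.

```julia
using LinearAlgebra

# Illustrative only: dihedral angle in radians, in the range -π to π,
# defined by four points given as 3-element coordinate vectors.
function dihedral(p1, p2, p3, p4)
    b1 = p2 .- p1
    b2 = p3 .- p2
    b3 = p4 .- p3
    y = norm(b2) * dot(b1, cross(b2, b3))
    x = dot(cross(b1, b2), cross(b2, b3))
    return atan(y, x)
end

# Hypothetical coordinates standing in for C (previous residue), N, CA and C.
c_prev = [0.0, 1.0, 0.0]
n      = [0.0, 0.0, 0.0]
ca     = [1.0, 0.0, 0.0]
c      = [1.0, 0.0, 1.0]
phi = dihedral(c_prev, n, ca, c)
```

For phi the four points are the previous residue's "C" followed by "N", "CA" and "C" of the residue itself, which matches the atom requirements listed above.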
BioStructures.phiangles
— Methodphiangles(element, residue_selectors...)
Calculate the Vector of phi angles of a StructuralElementOrList.
The vector has NaN for residues where an angle cannot be calculated, e.g. due to missing atoms or lack of an adjacent residue. The angles are in the range -π to π. Additional arguments are residue selector functions - only residues that return true from the functions are retained.
BioStructures.polarresselector
— Methodpolarresselector(res)
polarresselector(at)
Determines if an AbstractResidue is, or an AbstractAtom is part of, a polar amino acid based on the residue name.
BioStructures.proteinselector
— Methodproteinselector(res)
proteinselector(at)
Determines if an AbstractResidue or AbstractAtom is part of a protein or peptide based on the residue name.
BioStructures.psiangle
— Methodpsiangle(res, res_next)
psiangle(chain, res_id)
Calculate the psi angle in radians for an AbstractResidue.
Arguments can either be a residue and the next residue or a chain and the position as a residue ID. The first residue (or one at the given index) requires the atoms "N", "CA" and "C" and the next residue requires the atom "N". The angle is in the range -π to π.
BioStructures.psiangles
— Methodpsiangles(element, residue_selectors...)
Calculate the Vector of psi angles of a StructuralElementOrList.
The vector has NaN for residues where an angle cannot be calculated, e.g. due to missing atoms or lack of an adjacent residue. The angles are in the range -π to π. Additional arguments are residue selector functions - only residues that return true from the functions are retained.
BioStructures.ramachandranangles
— Methodramachandranangles(element, residue_selectors...)
Calculate the Vectors of phi and psi angles of a StructuralElementOrList.
The vectors have NaN for residues where an angle cannot be calculated, e.g. due to missing atoms or lack of an adjacent residue. The angles are in the range -π to π. Additional arguments are residue selector functions - only residues that return true from the functions are retained.
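Because undefined angles are reported as NaN, they usually need to be filtered out before plotting or computing statistics. A small sketch of that step follows; the angle values are hypothetical stand-ins for the two returned vectors and are not taken from a real structure.

```julia
# Hypothetical angle vectors in radians; NaN marks residues with no defined angle.
phis = [NaN, -1.05, -2.30, -1.10]
psis = [2.60, -0.80, NaN, 2.40]

# Keep only residues where both angles are defined.
defined = [!(isnan(phi) || isnan(psi)) for (phi, psi) in zip(phis, psis)]

# Convert the retained angles from radians to degrees.
phis_deg = rad2deg.(phis[defined])
psis_deg = rad2deg.(psis[defined])
```

Filtering on both vectors at once keeps the phi and psi values paired by residue.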
BioStructures.readmultimmcif
— Methodreadmultimmcif(filepath; gzip=false)
readmultimmcif(io; gzip=false)
Read multiple MMCIFDicts from a filepath or stream. Each MMCIFDict in the returned Dict{String, MMCIFDict} corresponds to an mmCIF data block from the input. An example of such a file is the chemical component dictionary from the Protein Data Bank. The keyword argument gzip (default false) determines if the input is gzipped.
BioStructures.resid
— Methodresid(res; full=true)
Get a descriptive residue ID String for an AbstractAtom or AbstractResidue.
Format is residue number then insertion code with "H_" in front for hetero residues. If full equals true the chain ID is also added after a colon. Examples are "50A", "H_20" and "10:A".
BioStructures.resids
— Methodresids(ch)
Get the sorted list of AbstractResidues in a Chain.
BioStructures.residue
— Methodresidue(at)
Get the Residue that an AbstractAtom belongs to.
BioStructures.residues
— Methodresidues(ch)
Return the dictionary of AbstractResidues in a Chain.
BioStructures.resname
— Methodresname(at; strip=true)
resname(res; strip=true)
Get the residue name of an AbstractAtom or AbstractResidue as a String.
strip determines whether surrounding whitespace is stripped.
BioStructures.resnames
— Methodresnames(dis_res)
Get the residue names in an AbstractResidue as a Vector{String}.
For a DisorderedResidue there will be multiple residue names - in this case the default residue name is placed first, then the others are ordered alphabetically.
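The ordering rule can be sketched in a few lines. The helper order_resnames and the residue names below are hypothetical and only illustrate "default first, then alphabetical"; they are not the package's implementation.

```julia
# Hypothetical helper: default name first, remaining names alphabetically.
function order_resnames(default::AbstractString, names)
    others = sort([n for n in names if n != default])
    return vcat([String(default)], others)
end

ordered = order_resnames("SER", ["THR", "SER", "ALA"])
```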
BioStructures.resnameselector
— Methodresnameselector(res, res_names)
resnameselector(at, res_names)
Determines if an AbstractResidue or AbstractAtom has its residue name in a list of names.
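Functions that accept selectors expect one-argument predicates, so a two-argument selector such as this one is typically wrapped in an anonymous function. The sketch below shows the pattern with plain named tuples standing in for residues. The names name_in, not_water and keep are hypothetical and exist only for this illustration.

```julia
# Stand-in residue records; real code would pass AbstractResidue objects.
residues = [(name = "ALA", number = 1), (name = "HOH", number = 2), (name = "GLY", number = 3)]

# Hypothetical name-based selectors mirroring the one- and two-argument forms.
name_in(res, names) = res.name in names
not_water(res) = !(res.name in ("HOH",))

# Retain only items for which every selector returns true.
keep(items, selectors...) = filter(x -> all(sel -> sel(x), selectors), items)

# The two-argument selector is wrapped in a closure to fix the list of names.
small = keep(residues, not_water, res -> name_in(res, ["ALA", "GLY", "HOH"]))
```

With no selectors at all, every item is retained, since an empty set of conditions is trivially satisfied.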
BioStructures.resnumber
— Methodresnumber(at)
resnumber(res)
Get the residue number of an AbstractAtom or AbstractResidue as an Int.
BioStructures.retrievepdb
— Methodretrievepdb(pdbid::AbstractString; <keyword arguments>)
Download and read a Protein Data Bank (PDB) file or biological assembly from the RCSB server, returning a MolecularStructure.
Requires an internet connection.
Arguments
- pdbid::AbstractString: the PDB ID to be downloaded and read.
- dir::AbstractString=pwd(): the directory to which the PDB file is downloaded; defaults to the current working directory.
- format::Type=MMCIFFormat: the format of the PDB file; options are PDBFormat, PDBXMLFormat and MMCIFFormat. MMTF files are no longer available to download.
- obsolete::Bool=false: if set to true, the PDB file is downloaded in the auto-generated "obsolete" directory inside the specified dir.
- overwrite::Bool=false: if set to true, overwrites the PDB file if it exists in dir; by default skips downloading the PDB file if it exists.
- ba_number::Integer=0: if set > 0 downloads the respective biological assembly; by default downloads the PDB file.
- structure_name::AbstractString="$pdbid.pdb": the name given to the returned MolecularStructure; defaults to the PDB ID.
- remove_disorder::Bool=false: whether to remove atoms with alt loc ID not ' ' or 'A'.
- read_std_atoms::Bool=true: whether to read standard ATOM records.
- read_het_atoms::Bool=true: whether to read HETATM records.
- run_dssp::Bool=false: whether to run DSSP to assign secondary structure. Requires the DSSP_jll.jl package to be imported if set to true.
- run_stride::Bool=false: whether to run STRIDE to assign secondary structure. Requires the STRIDE_jll.jl package to be imported if set to true.
BioStructures.rmsd
— Methodrmsd(element_one, element_two, residue_selectors...)
rmsd(element_one, element_two, superimpose=false)
rmsd(coords_one, coords_two)
Get the root-mean-square deviation (RMSD) in Å between two StructuralElementOrLists or two coordinate Arrays.
If superimpose is true (the default), the elements are superimposed before RMSD calculation and the RMSD is calculated on the superimposed residues. In this case the BioSequences.jl and BioAlignments.jl packages should be imported. See Transformation for keyword arguments. If superimpose is false the elements are assumed to be superimposed and must be of the same length. The keyword argument rmsdatoms is an atom selector that selects the atoms to calculate RMSD on (default calphaselector).
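For the coordinate-array form, where no superimposition is carried out, the calculation reduces to the square root of the mean squared displacement per atom. A minimal sketch follows, assuming each array is a 3×N matrix with one column per atom. The name rmsd_sketch is hypothetical and the code is not the package's implementation.

```julia
# Illustrative RMSD between two pre-superimposed coordinate sets,
# each a 3×N matrix with one column per atom (an assumed layout).
function rmsd_sketch(a::AbstractMatrix, b::AbstractMatrix)
    size(a) == size(b) || throw(ArgumentError("coordinate arrays must have the same size"))
    return sqrt(sum(abs2, a .- b) / size(a, 2))
end

coords_one = [0.0 1.0 2.0; 0.0 0.0 0.0; 0.0 0.0 0.0]
coords_two = coords_one .+ [1.0, 0.0, 0.0]   # shift every atom 1 Å along x

deviation = rmsd_sketch(coords_one, coords_two)
```

The size check mirrors the requirement above that elements must be of the same length when they are not superimposed first.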
BioStructures.rundssp
— Functionrundssp(struc)
rundssp(model)
rundssp(filepath_in, dssp_filepath_out)
Return a copy of the structural element with DSSP (Define Secondary Structure of Proteins) run to assign secondary structure, or run DSSP directly on a PDB or mmCIF file.
Requires the DSSP_jll.jl package to be imported.
BioStructures.rundssp!
— Functionrundssp!(struc)
rundssp!(model)
Run DSSP (Define Secondary Structure of Proteins) on the given structural element to assign secondary structure.
Requires the DSSP_jll.jl package to be imported. A temporary PDB file is written, so this will fail if the structural element cannot be written to a PDB file, for example if there are two-letter chain IDs.
BioStructures.runstride
— Functionrunstride(struc)
runstride(model)
runstride(filepath_in, stride_filepath_out)
Return a copy of the structural element with STRIDE run to assign secondary structure, or run STRIDE directly on a PDB file.
Requires the STRIDE_jll.jl package to be imported.
BioStructures.runstride!
— Functionrunstride!(struc)
runstride!(model)
Run STRIDE on the given structural element to assign secondary structure.
Requires the STRIDE_jll.jl package to be imported. A temporary PDB file is written, so this will fail if the structural element cannot be written to a PDB file, for example if there are two-letter chain IDs.
BioStructures.sequentialresidues
— Methodsequentialresidues(res_first, res_second)
Determine if the second residue follows the first in sequence.
For this to be true the residues need to have the same chain ID, both need to be standard residues or both hetero residues, and the residue number of the second needs to be one greater than that of the first (or the residue numbers the same and the insertion code of the second greater than the first).
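The rule can be written as a short predicate. The sketch below uses a hypothetical ResInfo record and follows function rather than the package's residue types, and it assumes that "standard/hetero" means the two residues must share the same hetero status.

```julia
# Hypothetical minimal record; BioStructures.jl residues carry more fields.
struct ResInfo
    chainid::String
    number::Int
    inscode::Char
    hetero::Bool
end

# True if residue b directly follows residue a under the rule described above.
function follows(a::ResInfo, b::ResInfo)
    a.chainid == b.chainid || return false
    a.hetero == b.hetero || return false   # assumption: same hetero status
    return b.number == a.number + 1 ||
        (b.number == a.number && b.inscode > a.inscode)
end

r10  = ResInfo("A", 10, ' ', false)
r10a = ResInfo("A", 10, 'A', false)
r11  = ResInfo("A", 11, ' ', false)
```

Here r10 is followed by both r10a (same number, greater insertion code) and r11 (number one greater).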
BioStructures.serial
— Methodserial(at)
Get the serial number of an AbstractAtom as an Int.
BioStructures.sheetselector
— Methodsheetselector(res)
sheetselector(at)
Determines if an AbstractResidue or AbstractAtom is part of a β-sheet, i.e. whether the secondary structure code is in sheetsscodes.
BioStructures.showcontactmap
— Methodshowcontactmap(contact_map)
showcontactmap(io, contact_map)
Print a representation of a ContactMap to stdout, or a specified IO instance.
A fully plotted version can be obtained with plot(contact_map) but that requires Plots.jl; showcontactmap works without that dependency.
BioStructures.sidechainselector
— Methodsidechainselector(at)
Determines if an AbstractAtom is not a hetero-atom and corresponds to a protein side chain atom.
BioStructures.spaceatomname
— Methodspaceatomname(at::Atom)
Space an Atom name such that the last element letter (generally) appears in the second column.
If the element property of the Atom is set it is used to get the element, otherwise the name starts from the second column where possible. This function is generally not required as spacing is recorded when atom names are read in from a Protein Data Bank (PDB) file. However this spacing can be important, for example distinguishing between Cα and calcium atoms.
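To make the Cα versus calcium case concrete, here is a simplified sketch of the spacing rule for the 4-column atom name field. The function space_name is hypothetical and handles only the common cases; it is not the package's implementation.

```julia
# Simplified, hypothetical spacing rule for the 4-column PDB atom name field:
# a one-letter element leaves the first column blank, a two-letter element does not.
function space_name(name::AbstractString, element::AbstractString)
    if length(element) == 1 && length(name) < 4
        return rpad(" " * name, 4)
    end
    return rpad(name, 4)
end

calpha  = space_name("CA", "C")    # element letter C lands in column 2
calcium = space_name("CA", "Ca")   # two-letter element starts in column 1
```

Both atoms are named "CA", so the column at which the name starts is the only difference between them in the name field.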
BioStructures.sqdistance
— Methodsqdistance(element_one, element_two, atom_selectors...)
Get the minimum squared distance in Å² between two StructuralElementOrLists.
Additional arguments are atom selector functions - only atoms that return true from the functions are retained.
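Squared distances are useful when only a comparison against a cutoff is needed, since the square root can be skipped. A minimal sketch of the idea on raw coordinates follows, assuming 3×N matrices with one column per atom. The name min_sqdistance is hypothetical and the code is not the package's implementation.

```julia
# Illustrative only: minimum squared distance between two coordinate sets,
# each a 3×N matrix with one column per atom (an assumed layout).
function min_sqdistance(a::AbstractMatrix, b::AbstractMatrix)
    return minimum(sum(abs2, ca .- cb) for ca in eachcol(a), cb in eachcol(b))
end

set_one = [0.0 1.0; 0.0 0.0; 0.0 0.0]
set_two = [4.0 9.0; 0.0 0.0; 0.0 0.0]

# Compare against the square of a 5 Å cutoff instead of taking a square root.
within_5 = min_sqdistance(set_one, set_two) <= 5.0^2
```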
BioStructures.sscode!
— Methodsscode!(res, ss_code)
Set the secondary structure code of an AbstractResidue to a Char.
BioStructures.sscode
— Methodsscode(res)
sscode(at)
Get the secondary structure code of an AbstractResidue or AbstractAtom as a Char.
'-' represents unassigned secondary structure. Secondary structure can be assigned using rundssp! or runstride!.
BioStructures.sscodeselector
— Methodsscodeselector(res, ss_codes)
sscodeselector(at, ss_codes)
Determines if an AbstractResidue or AbstractAtom has its secondary structure code in a list of secondary structure codes.
BioStructures.standardselector
— Methodstandardselector(at)
standardselector(res)
Determines if an AbstractAtom represents a standard atom, e.g. came from an ATOM record in a Protein Data Bank (PDB) file, or if an AbstractResidue represents a standard molecule, e.g. consists of ATOM records from a PDB file.
BioStructures.structure
— Methodstructure(el)
Return the MolecularStructure that an AbstractAtom, AbstractResidue, Chain or Model belongs to.
BioStructures.structurename
— Methodstructurename(el)
Get the name of the MolecularStructure that a StructuralElement belongs to as a String.
BioStructures.superimpose!
— Methodsuperimpose!(el1, el2, residue_selectors...)
Calculate the Transformation that maps the first element onto the second, and modify all coordinates in the first element according to the transformation.
Requires the BioSequences.jl and BioAlignments.jl packages to be imported. See Transformation for keyword arguments.
BioStructures.tempfactor
— Methodtempfactor(at)
Get the temperature factor of an AbstractAtom as a Float64.
The temperature factor is set to 0.0 if not specified during atom creation.
BioStructures.updatelocalpdb
— Methodupdatelocalpdb(; dir::AbstractString=pwd(), format::Type=PDBFormat)
Update a local copy of the Protein Data Bank (PDB).
Obtains the recent weekly lists of new, modified and obsolete PDB entries and automatically updates the PDB files of the given format inside the local dir directory. Requires an internet connection.
BioStructures.waterselector
— Methodwaterselector(res)
waterselector(at)
Determines if an AbstractResidue or AbstractAtom represents a water molecule, i.e. whether the residue name is in waterresnames.
BioStructures.writemmcif
— Methodwritemmcif(output, element, atom_selectors...; gzip=false)
writemmcif(output, mmcif_dict; gzip=false)
Write a StructuralElementOrList or an MMCIFDict to an mmCIF format file or output stream.
Atom selector functions can be given as additional arguments - only atoms that return true from all the functions are retained. The keyword argument expand_disordered (default true) determines whether to return all copies of disordered residues and atoms. The keyword argument gzip (default false) determines if the output is gzipped.
BioStructures.writemmtf
— Functionwritemmtf(output, element, atom_selectors...; gzip=false)
writemmtf(output, mmtf_dict; gzip=false)
Write a StructuralElementOrList or an MMTFDict to an MMTF file or output stream.
Requires the MMTF.jl package to be imported. Atom selector functions can be given as additional arguments - only atoms that return true from all the functions are retained. The keyword argument expand_disordered (default true) determines whether to return all copies of disordered residues and atoms. The keyword argument gzip (default false) determines if the file should be gzipped.
BioStructures.writemultimmcif
— Methodwritemultimmcif(filepath, cifs; gzip=false)
writemultimmcif(io, cifs; gzip=false)
Write multiple MMCIFDicts as a Dict{String, MMCIFDict} to a filepath or stream. The keyword argument gzip (default false) determines if the output is gzipped.
BioStructures.writepdb
— Methodwritepdb(output, element, atom_selectors...)
Write a StructuralElementOrList to a Protein Data Bank (PDB) format file or output stream.
Only ATOM, HETATM, MODEL and ENDMDL records are written - there is no header and there are no TER records. Atom selector functions can be given as additional arguments - only atoms that return true from all the functions are retained. The keyword argument expand_disordered (default true) determines whether to return all copies of disordered residues and atoms.
BioStructures.@sel_str
— Macro
String selection syntax.
BioStructures.x
— Functionx(at)
Get the x coordinate in Å of an AbstractAtom as a Float64.
BioStructures.x!
— Functionx!(at, val)
Set the x coordinate in Å of an AbstractAtom to val.
For DisorderedAtoms only the default atom is updated.
BioStructures.y
— Functiony(at)
Get the y coordinate in Å of an AbstractAtom as a Float64.
BioStructures.y!
— Functiony!(at, val)
Set the y coordinate in Å of an AbstractAtom to val.
For DisorderedAtoms only the default atom is updated.
BioStructures.z
— Functionz(at)
Get the z coordinate in Å of an AbstractAtom as a Float64.
BioStructures.z!
— Functionz!(at, val)
Set the z coordinate in Å of an AbstractAtom to val.
For DisorderedAtoms only the default atom is updated.