Bio.Structure: Macromolecular Structures

The Bio.Structure module provides functionality to manipulate macromolecular structures, and in particular to read and write Protein Data Bank (PDB) files. It is designed to be used for standard structural analysis tasks, as well as acting as a platform on which others can build to create more specific tools.

Parsing PDB files

To download a PDB file:

downloadpdb("1EN2")

To parse a PDB file into a Structure-Model-Chain-Residue-Atom framework:

julia> struc = read(filepath_1EN2, PDB)
Bio.Structure.ProteinStructure
Name                        -  1EN2.pdb
Number of models            -  1
Chain(s)                    -  A
Number of residues          -  85
Number of point mutations   -  5
Number of other molecules   -  5
Number of water molecules   -  76
Number of atoms             -  614
Number of hydrogens         -  0
Number of disordered atoms  -  27

The elements of struc can be accessed as follows:

Command	Returns	Return type
`struc[1]`	Model 1	`Model`
`struc[1]['A']`	Model 1, chain A	`Chain`
`struc['A']`	The lowest model (model 1), chain A	`Chain`
`struc['A']["50"]`	Model 1, chain A, residue 50	`AbstractResidue`
`struc['A'][50]`	Shortcut to above if it is not a hetero residue and the insertion code is blank	`AbstractResidue`
`struc['A']["H_90"]`	Model 1, chain A, hetero residue 90	`AbstractResidue`
`struc['A'][50]["CA"]`	Model 1, chain A, residue 50, atom name CA	`AbstractAtom`
`struc['A'][15]["CG"]['A']`	For disordered atoms, access a specific location	`Atom`

Disordered atoms are stored in a DisorderedAtom container but calls fall back to the default atom, so disorder can be ignored if you are not interested in it.

Disordered residues (i.e. point mutations with different residue names) are stored in a DisorderedResidue container.

The idea is that disorder will only bother you if you want it to. See the Biopython discussion for more.

Properties can be retrieved as follows:

Function	Returns	Return type
`serial`	Serial number of an atom	`Int`
`atomname`	Name of an atom	`String`
`altlocid`	Alternative location ID of an atom	`Char`
`x`	x coordinate of an atom	`Float64`
`y`	y coordinate of an atom	`Float64`
`z`	z coordinate of an atom	`Float64`
`coords`	coordinates of an atom	`Array{Float64,1}`
`occupancy`	Occupancy of an atom (default is `1.0`)	`Float64`
`tempfactor`	Temperature factor of an atom (default is `0.0`)	`Float64`
`element`	Element of an atom (default is `" "`)	`String`
`charge`	Charge of an atom (default is `" "`)	`String`
`residue`	Residue an atom belongs to	`Residue`
`ishetero`	`true` if the residue or atom is a hetero residue/atom	`Bool`
`isdisorderedatom`	`true` if the atom is disordered	`Bool`
`pdbline`	PDB ATOM/HETATM record for an atom	`String`
`resname`	Residue name of a residue or atom	`String`
`resnumber`	Residue number of a residue or atom	`Int`
`inscode`	Insertion code of a residue or atom	`Char`
`resid`	Residue ID of an atom or residue (`full=true` includes chain)	`String`
`atomnames`	Atom names of the atoms in a residue, sorted by serial	`Array{String,1}`
`atoms`	Dictionary of atoms in a residue	`Dict{String, AbstractAtom}`
`isdisorderedres`	`true` if the residue has multiple residue names	`Bool`
`disorderedres`	Access a particular residue name in a `DisorderedResidue`	`Residue`
`chain`	Chain a residue or atom belongs to	`Chain`
`chainid`	Chain ID of a chain, residue or atom	`Char`
`resids`	Sorted residue IDs in a chain	`Array{String,1}`
`residues`	Dictionary of residues in a chain	`Dict{String, AbstractResidue}`
`model`	Model a chain, residue or atom belongs to	`Model`
`modelnumber`	Model number of a model, chain, residue or atom	`Int`
`chainids`	Sorted chain IDs in a model or structure	`Array{Char,1}`
`chains`	Dictionary of chains in a model or structure	`Dict{Char, Chain}`
`structure`	Structure a model, chain, residue or atom belongs to	`ProteinStructure`
`structurename`	Name of the structure an element belongs to	`String`
`modelnumbers`	Sorted model numbers in a structure	`Array{Int,1}`
`models`	Dictionary of models in a structure	`Dict{Int, Model}`

The strip keyword argument determines whether surrounding whitespace is stripped for atomname, element, charge, resname and atomnames (default true).

The coordinates of an atom can be set using x!, y!, z! and coords!.

Manipulating structures

Elements can be looped over to reveal the sub-elements in the correct order:

for mod in struc
    for ch in mod
        for res in ch
            for at in res
                # Do something
            end
        end
    end
end

Models are ordered numerically; chains are ordered by character, except the empty chain is last; residues are ordered by residue number and insertion code with hetero residues after standard residues; atoms are ordered by atom serial.

collect can be used to get arrays of sub-elements. collectatoms, collectresidues, collectchains and collectmodels return arrays of a particular type from a structural element or element array.

Selectors are functions passed as additional arguments to these functions. Only elements that return true when passed to the selector are retained.

Command	Action	Return type
`collect(struc['A'][50])`	Collect the sub-elements of an element, e.g. atoms from a residue	`Array{AbstractAtom,1}`
`collectresidues(struc)`	Collect the residues of an element	`Array{AbstractResidue,1}`
`collectatoms(struc)`	Collect the atoms of an element	`Array{AbstractAtom,1}`
`collectatoms(struc, calphaselector)`	Collect the C-alpha atoms of an element	`Array{AbstractAtom,1}`
`collectatoms(struc, calphaselector, disorderselector)`	Collect the disordered C-alpha atoms of an element	`Array{AbstractAtom,1}`

It is easy to define your own atom, residue, chain or model selectors. The below will collect all atoms with x coordinate less than 0:

xselector(at::AbstractAtom) = x(at) < 0
collectatoms(struc, xselector)

Alternatively, you can use an anonymous function:

collectatoms(struc, at -> x(at) < 0)

countatoms, countresidues, countchains and countmodels can be used to count elements. For example:

julia> countatoms(struc)
754

julia> countatoms(struc, calphaselector)
85

julia> countresidues(struc, standardselector)
85

The sequence of a protein can be retrieved by passing a Chain or array of residues to AminoAcidSequence:

julia> AminoAcidSequence(struc['A'], standardselector)
85aa Amino Acid Sequence:
RCGSQGGGSTCPGLRCCSIWGWCGDSEPYCGRTCENKCWSGERSDHRCGAAVGNPPCGQDRCCSVHGWCGGGNDYCSGGNCQYRC

Writing PDB files

PDB format files can be written:

writepdb("1EN2_out.pdb", struc)

Any element type can be given as input to writepdb. Atom selectors can also be given as additional arguments:

writepdb("1EN2_out.pdb", struc, backboneselector)

Spatial calculations

Various functions are provided to calculate spatial quantities for proteins:

Command	Returns
`distance`	Minimum distance between two elements
`sqdistance`	Minimum square distance between two elements
`bondangle`	Angle between three atoms
`dihedralangle`	Dihedral angle defined by four atoms
`omegaangle`	Omega angle between a residue and the previous residue
`phiangle`	Phi angle between a residue and the previous residue
`psiangle`	Psi angle between a residue and the next residue
`ramachandranangles`	`Vector`s of phi and psi angles of an element
`contactmap`	Contact map of two element, or one element with itself
`rmsd`	RMSD between two elements of the same size - assumes they are superimposed
`displacements`	`Vector` of displacements between two elements of the same size - assumes they are superimposed

For example:

julia> distance(struc['A'][10], struc['A'][20])
10.782158874733762

julia> rad2deg(bondangle(struc['A'][50]["N"], struc['A'][50]["CA"], struc['A'][50]["C"]))
110.77765846083398

julia> rad2deg(dihedralangle(struc['A'][50]["N"], struc['A'][50]["CA"], struc['A'][50]["C"], struc['A'][51]["N"]))
-177.38288114072924

julia> rad2deg(psiangle(struc['A'][50], struc['A'][51]))
-177.38288114072924

Examples

A few further examples of Bio.Structure usage are given below.

A) To plot the temperature factors of a protein, if you have Gadfly installed:

using Gadfly
calphas = collectatoms(struc, calphaselector)
plot(x=resnumber.(calphas),
     y=tempfactor.(calphas),
     Guide.xlabel("Residue number"),
     Guide.ylabel("Temperature factor"),
     Geom.line)

B) To print the PDB records for all C-alpha atoms within 5 Angstroms of residue 38:

for at in calphas
    if distance(struc['A'][38], at) < 5.0 && resnumber(at) != 38
        println(pdbline(at))
    end
end

D) To view the contact map of a structure:

cbetas = collectatoms(struc, cbetaselector)
contacts = contactmap(cbetas, 7.0)
for i in 1:length(cbetas)
    for j in 1:length(cbetas)
        if contacts[i,j]
            print("█")
        else
            print(" ")
        end
    end
    println()
end

cbetaselector selects C-beta atoms, or C-alpha atoms for glycine residues. contactmap can also be given two structural elements as arguments, in which case a non-symmetrical 2D array is returned showing contacts between the elements.

E) To show the Ramachandran phi/psi angle plot of a structure, if you have Gadfly installed:

using Gadfly
phi_angles, psi_angles = ramachandranangles(struc, standardselector)
plot(x=rad2deg.(phi_angles),
     y=rad2deg.(psi_angles),
     Guide.xlabel("Phi / degrees"),
     Guide.ylabel("Psi / degrees"),
     Guide.xticks(ticks=[-180,-90,0,90,180]),
     Guide.yticks(ticks=[-180,-90,0,90,180]))

F) To calculate the RMSD and displacements between the heavy (non-hydrogen) atoms of two models in an NMR structure:

downloadpdb("1SSU")
struc_nmr = read("1SSU.pdb", PDB)
rmsd(struc_nmr[5], struc_nmr[10], heavyatomselector)
displacements(struc_nmr[5], struc_nmr[10], heavyatomselector)