Compute secondary structures

First, install the package by following the Installation instructions.

Load the package with:

julia> using ProteinSecondaryStructures

Here, we illustrate the computation of the secondary structure of a PDB file provided as an example:

julia> pdbfile = ProteinSecondaryStructures.Testing.examples[1].filename
"/home/user/.julia/dev/ProteinSecondaryStructures/test/data/pdb/pdb1fmc.pdb"

Then, to compute the secondary structure using the STRIDE algorithm, use:

julia> ss = stride_run(pdbfile)
510-element Vector{SSData}:
 SSData("MET", "A", 1, "C", 360.0, 150.62)
 SSData("PHE", "A", 2, "C", -69.01, 138.78)
 ⋮
 SSData("ASN", "B", 255, "C", -130.75, 360.0)

The output is a vector of SSData elements, each containing the residue name, chain, residue number, secondary structure code, and phi and psi angles of one residue. The list of codes follows the DSSP convention, described in Secondary structure classes.

Note

As an alternative to the stride_run function, the dssp_run function can be used to compute the secondary structures as defined by DSSP.

The details of the SSData structure, which contains the output for each residue, are described in the Data structure section.

Reference functions:

ProteinSecondaryStructures.stride_run - Function
stride_run(pdb_file::AbstractString; adjust_pdb=false)

Run STRIDE on the PDB file and return a vector containing the detailed STRIDE secondary structure information for each residue.

  • adjust_pdb=true is used to adjust the format of the PDB file before running stride, which requires a specific header and a specific empty-chain identifier. In this case, only the ATOM lines are kept in the PDB file.
  • If adjust_pdb=false, the PDB file provided is used as is.

Note that STRIDE will fail if residue or atom types are not recognized, or if the header of the PDB file does not follow the necessary pattern.

Note
  • STRIDE might ignore some residues in the PDB file if they are not recognized or are incomplete.
  • STRIDE does not support structures in mmCIF format.

ProteinSecondaryStructures.dssp_run - Function
dssp_run(input_file::String; adjust_pdb=false)

Run DSSP on the PDB or mmCIF file provided and return a vector containing the detailed secondary structure information for each residue.

  • adjust_pdb=true is used to fix the header of PDB files before running DSSP, since malformed headers are a common problem when computing the secondary structure from PDB files. In this case, only the ATOM lines are kept in the PDB file.
  • If adjust_pdb=false, the input file provided is used as is. This (default) option must be used when the input file is in mmCIF format.

Note that DSSP will fail if residue or atom types are not recognized, or if the header of the PDB file does not follow the necessary pattern.

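Since both functions accept the adjust_pdb keyword, one possible pattern is to try the input file as is and retry with adjust_pdb=true only if the first attempt fails. The sketch below is a minimal illustration and not part of the package: run_with_fallback is a hypothetical helper, it assumes that a failed run surfaces as a Julia exception, and it identifies mmCIF files by a .cif extension, which is also an assumption.

```julia
# Hypothetical helper (not part of ProteinSecondaryStructures): call `runner`
# on `file` as is, and retry with `adjust_pdb=true` only if that call throws.
# `runner` is expected to be `stride_run` or `dssp_run`.
function run_with_fallback(runner, file::AbstractString)
    try
        return runner(file; adjust_pdb=false)
    catch
        # Assumption: mmCIF files are recognized by their extension. The
        # documentation above requires adjust_pdb=false for mmCIF input,
        # so in that case the original error is propagated unchanged.
        endswith(lowercase(file), ".cif") && rethrow()
        @warn "Run failed on the unmodified file; retrying with adjust_pdb=true"
        return runner(file; adjust_pdb=true)
    end
end
```

With the package loaded, the helper could be called as run_with_fallback(stride_run, pdbfile) or run_with_fallback(dssp_run, pdbfile). Keep in mind that adjust_pdb=true keeps only the ATOM lines, so the retry does not run on exactly the same input.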

Secondary structure composition

Given the ss output of the stride_run or dssp_run functions, an overview of the secondary structure content can be obtained with the ss_composition function:

julia> comp = ss_composition(ss)
Dict{String, Int64} with 12 entries:
  "bend"        => 0
  "kappa helix" => 0
  "beta strand" => 77
  "strand"      => 81
  "loop"        => 0
  "310 helix"   => 21
  "turn"        => 70
  "helix"       => 263
  "beta bridge" => 4
  "alpha helix" => 242
  "pi helix"    => 0
  "coil"        => 96

julia> comp["alpha helix"]
242

The output is a dictionary containing the number of residues assigned to each class. As shown above, the count for a single class can be retrieved by indexing the dictionary with the class name.
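The counts can be turned into percentages by dividing by the number of residues. The following sketch uses only base Julia and copies the counts from the output above, so it runs without the package; with the package loaded, comp would come from ss_composition(ss) and nres from length(ss). The helper name percent is hypothetical, and the observation that "helix" and "strand" are aggregate classes is inferred from these particular numbers, not taken from the package documentation.

```julia
# Counts copied from the ss_composition output shown above.
comp = Dict(
    "bend" => 0, "kappa helix" => 0, "beta strand" => 77, "strand" => 81,
    "loop" => 0, "310 helix" => 21, "turn" => 70, "helix" => 263,
    "beta bridge" => 4, "alpha helix" => 242, "pi helix" => 0, "coil" => 96,
)

# Number of residues in the example (the length of the `ss` vector).
nres = 510

# Hypothetical helper: percentage of residues in a class, to one decimal place.
percent(class) = round(100 * comp[class] / nres; digits=1)

helix_percent = percent("helix")    # 263 of 510 residues
strand_percent = percent("strand")  # 81 of 510 residues

# In this output, "helix" and "strand" equal the sums of their subclasses.
helix_sum = sum(comp[k] for k in ("alpha helix", "310 helix", "pi helix", "kappa helix"))
strand_sum = sum(comp[k] for k in ("beta strand", "beta bridge"))

println("helix: ", helix_percent, "%, strand: ", strand_percent, "%")
println("helix is an aggregate: ", helix_sum == comp["helix"])
println("strand is an aggregate: ", strand_sum == comp["strand"])
```

Because of these aggregate entries, summing all values of the dictionary would count helix and strand residues twice; the total number of residues is better taken from the length of the input vector.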

Reference function

ProteinSecondaryStructures.ss_composition - Function
ss_composition(data::AbstractVector{<:SSData})

Calculate the secondary structure composition of the data. Returns a dictionary of the secondary structure types and their counts.


Retrieving names, codes, and numeric codes

The names, single-character codes, or numeric codes of the secondary structure of each residue can be retrieved with the ss_name, ss_code, and ss_number functions. The input of each function can be an instance of SSData or a value in one of the other two classification forms (name, code, or number):

ProteinSecondaryStructures.ss_name - Function
ss_name(ss::Union{SSData, Integer, String, Char})

Return the secondary structure name. The input may be an SSData object, a secondary structure Integer code (1-10), or a secondary structure code (G, H, ..., C).

The classification follows the DSSP standard classes, described in the ProteinSecondaryStructures.jl documentation.

Example

julia> using ProteinSecondaryStructures

julia> ss_name("H")
"alpha helix"

julia> ss_name(1)
"310 helix"

julia> ss = SSData("ARG", "A", 1, "H", 0.0, 0.0)
SSData("ARG", "A", 1, "H", 0.0, 0.0)

julia> ss_name(ss)
"alpha helix"

ProteinSecondaryStructures.ss_code - Function
ss_code(code::Union{SSData,String,Integer})

Return the one-letter secondary structure code. The input may be a secondary structure Integer code, a secondary structure name ("310 helix", "alpha helix", ..., "coil"), or an SSData object.

The classification follows the DSSP standard classes, described in the ProteinSecondaryStructures.jl documentation.

Example

julia> using ProteinSecondaryStructures

julia> ss_code(2)
"H"

julia> ss_code("beta bridge")
"B"

julia> ss = SSData("ARG", "A", 1, "H", 0.0, 0.0)
SSData("ARG", "A", 1, "H", 0.0, 0.0)

julia> ss_code(ss)
"H"

ProteinSecondaryStructures.ss_number - Function
ss_number(code::Union{SSData,AbstractString,AbstractChar})

Return the secondary structure number code. The input may be a secondary structure code given as a String or a Char, a secondary structure name ("310 helix", "alpha helix", ..., "coil"), or an SSData object.

The classification follows the DSSP standard classes, described in the ProteinSecondaryStructures.jl documentation.

Example

julia> using ProteinSecondaryStructures

julia> ss_number("H")
2

julia> ss_number('B')
7

julia> ss_number("beta bridge")
7

julia> ss = SSData("ARG", "A", 1, "H", 0.0, 0.0)
SSData("ARG", "A", 1, "H", 0.0, 0.0)

julia> ss_number(ss)
2

These functions can be used to obtain arrays of names, codes, or numbers by broadcasting over the vector of secondary structure data. For example:

julia> using ProteinSecondaryStructures

julia> using ProteinSecondaryStructures.Testing: examples

julia> ss = stride_run(examples[1].filename);

julia> ss_name.(ss)[1:5]
5-element Vector{String}:
 "coil"
 "coil"
 "coil"
 "310 helix"
 "310 helix"

julia> join(ss_code.(ss)[1:15])
"CCCGGGGCTTTTEEE"

The last example joins the one-letter codes of the first 15 residues into a single string, which shows the sequence of secondary structure elements along the chain.
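A code string like this one can be grouped into contiguous segments with a short run-length pass. The sketch below uses only base Julia and the string printed above; ss_segments is a hypothetical helper, not a function of the package. The positions it reports are indices into the string (that is, into the ss vector), which are not necessarily the residue numbers stored in the PDB file.

```julia
# Hypothetical helper: group a string of one-letter secondary structure codes
# into (code, first_index, last_index) segments of identical consecutive codes.
function ss_segments(codes::AbstractString)
    segments = Tuple{Char,Int,Int}[]
    chars = collect(codes)
    isempty(chars) && return segments
    start = 1
    # Iterate one position past the end so that the last segment is closed.
    for i in 2:(length(chars) + 1)
        if i > length(chars) || chars[i] != chars[start]
            push!(segments, (chars[start], start, i - 1))
            start = i
        end
    end
    return segments
end

# String taken from the output of join(ss_code.(ss)[1:15]) shown above.
for (code, first_idx, last_idx) in ss_segments("CCCGGGGCTTTTEEE")
    println(code, ": ", first_idx, "-", last_idx)
end
```

For the 15 residues above, this yields five segments: C at positions 1-3, G at 4-7, C at 8, T at 9-12, and E at 13-15.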