Compute secondary structures
First, install the package by following the Installation instructions.
Load the package with:
julia> using ProteinSecondaryStructures
Here, we illustrate the computation of the secondary structure of a PDB file provided as an example:
julia> pdbfile = ProteinSecondaryStructures.Testing.examples[1].filename
"/home/user/.julia/dev/ProteinSecondaryStructures/test/data/pdb/pdb1fmc.pdb"
Then, to compute the secondary structure using the STRIDE algorithm, use:
julia> ss = stride_run(pdbfile)
510-element Vector{SSData}:
SSData("MET", "A", 1, "C", 360.0, 150.62)
SSData("PHE", "A", 2, "C", -69.01, 138.78)
⋮
SSData("ASN", "B", 255, "C", -130.75, 360.0)
The output is a vector of SSData elements, each of which contains the residue name, chain, residue number, secondary structure code, and phi and psi angles of the residue. The codes follow the DSSP convention, described in Secondary structure classes.
As an alternative to the stride_run function, the dssp_run function can be used to compute the secondary structures as defined by DSSP.
The details of the SSData structure, which contains the output for each residue, are described in the Data structure section.
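As a minimal sketch of the dssp_run alternative mentioned above, the call below mirrors the stride_run example on the same example file. That DSSP accepts this file as is, without adjust_pdb=true, is an assumption, and the number of residues classified may differ from the STRIDE result.

```julia
using ProteinSecondaryStructures

# Example PDB file shipped with the package (the same file used above)
pdbfile = ProteinSecondaryStructures.Testing.examples[1].filename

# Same call pattern as stride_run; the result is also a vector of SSData
ss_dssp = dssp_run(pdbfile)

println(length(ss_dssp), " residues classified by DSSP")
```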
Reference functions:
ProteinSecondaryStructures.stride_run — Function
stride_run(pdb_file::AbstractString; adjust_pdb=false)
Run STRIDE on the PDB file and return a vector containing the detailed secondary structure information for each residue.
- adjust_pdb=true is used to adjust the format of the PDB file before running stride, which requires a specific header and a specific empty-chain identifier. In this case, only the ATOM lines are kept in the PDB file.
- If adjust_pdb=false, the PDB file provided is used as is.
Note that:
- STRIDE will fail if residue or atom types are not recognized, or if the header of the PDB file does not follow the necessary pattern.
- STRIDE might ignore some residues in the PDB file if they are not recognized or are incomplete.
- STRIDE does not support structures in mmCIF format.
ProteinSecondaryStructures.dssp_run — Function
dssp_run(input_file::String; adjust_pdb=false)
Run DSSP on the PDB or mmCIF file provided and return a vector containing the detailed secondary structure information for each residue.
- The adjust_pdb=true option is used to fix the header of PDB files before running dssp; a malformed header is a common problem when computing the secondary structure from PDB files. In this case, only the ATOM lines are kept in the PDB file.
- If adjust_pdb=false, the PDB file provided is used as is. This (default) option must be used when the input file is in mmCIF format.
Note that DSSP will fail if residue or atom types are not recognized, or if the header of the PDB file does not follow the necessary pattern.
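Since both functions may fail on PDB files with nonstandard headers, one possible pattern is to try the file as is and retry with adjust_pdb=true. The sketch below assumes that a failed run surfaces as a thrown Julia exception, which is not confirmed by the text above; run_with_fallback is a hypothetical helper, not part of the package. It should not be used with mmCIF input, for which adjust_pdb must remain false.

```julia
using ProteinSecondaryStructures

# Hypothetical helper (not part of the package): try the file as is and,
# if the run throws, retry keeping only the ATOM lines (adjust_pdb=true).
# Assumption: a failed run raises an exception.
function run_with_fallback(run::Function, file::AbstractString)
    try
        return run(file; adjust_pdb=false)
    catch err
        @warn "Run failed on the unmodified file; retrying with adjust_pdb=true" exception = err
        return run(file; adjust_pdb=true)
    end
end

# Example PDB file shipped with the package
pdbfile = ProteinSecondaryStructures.Testing.examples[1].filename
ss = run_with_fallback(stride_run, pdbfile)
```

The same helper accepts dssp_run as its first argument, because both functions share the adjust_pdb keyword.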
Secondary structure composition
Given the ss output of the stride_run or dssp_run functions, an overview of the secondary structure content can be obtained with the ss_composition function:
julia> comp = ss_composition(ss)
Dict{String, Int64} with 12 entries:
"bend" => 0
"kappa helix" => 0
"beta strand" => 77
"strand" => 81
"loop" => 0
"310 helix" => 21
"turn" => 70
"helix" => 263
"beta bridge" => 4
"alpha helix" => 242
"pi helix" => 0
"coil" => 96
julia> comp["alpha helix"]
242
The output is a dictionary containing the number of residues assigned to each class. As shown above, the count for a single class can be retrieved by indexing the dictionary with the class name.
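The counts can be turned into fractions of the structure with a few lines of plain Julia. The sketch below copies four counts from the output above into a dictionary literal, so it runs without the package. It relies on the observation that, in this output, "helix" and "strand" are consistent with being aggregate classes (242 + 21 = 263 and 77 + 4 = 81), so these four entries add up to the 510 residues; whether this holds in general is an assumption.

```julia
# Counts copied from the ss_composition output shown above.
# Only the four non-overlapping classes are used, so each residue is counted once.
comp = Dict("helix" => 263, "strand" => 81, "turn" => 70, "coil" => 96)

total = sum(values(comp))

# Fraction of residues in each class
fractions = Dict(class => n / total for (class, n) in comp)

for (class, f) in sort(collect(fractions); by=last, rev=true)
    println(rpad(class, 8), round(100 * f; digits=1), " %")
end
```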
Reference function
ProteinSecondaryStructures.ss_composition — Function
ss_composition(data::AbstractVector{<:SSData})
Calculate the secondary structure composition of the data. Returns a dictionary of the secondary structure types and their counts.
Retrieving names, codes, and numeric codes
The name, single-character code, or numeric code of the secondary structure of each residue can be retrieved with the ss_name, ss_code, and ss_number functions. The input of these functions can be an instance of SSData or one of the other two secondary structure classification types (name, code, or number):
ProteinSecondaryStructures.ss_name — Function
ss_name(ss::Union{SSData, Integer, String, Char})
Return the secondary structure name. The input may be an SSData object, a secondary structure Integer code (1-10), or a secondary structure code (G, H, ..., C).
The classification follows the DSSP standard classes, described in the ProteinSecondaryStructures.jl documentation.
Example
julia> using ProteinSecondaryStructures
julia> ss_name("H")
"alpha helix"
julia> ss_name(1)
"310 helix"
julia> ss = SSData("ARG", "A", 1, "H", 0.0, 0.0)
SSData("ARG", "A", 1, "H", 0.0, 0.0)
julia> ss_name(ss)
"alpha helix"
ProteinSecondaryStructures.ss_code — Function
ss_code(code::Union{SSData,String,Integer})
Returns the one-letter secondary structure code. The input may be a secondary structure Integer code, a secondary structure name ("310 helix", "alpha helix", ..., "coil"), or an SSData object.
The classification follows the DSSP standard classes, described in the ProteinSecondaryStructures.jl documentation.
Example
julia> using ProteinSecondaryStructures
julia> ss_code(2)
"H"
julia> ss_code("beta bridge")
"B"
julia> ss = SSData("ARG", "A", 1, "H", 0.0, 0.0)
SSData("ARG", "A", 1, "H", 0.0, 0.0)
julia> ss_code(ss)
"H"
ProteinSecondaryStructures.ss_number — Function
ss_number(code::Union{SSData,AbstractString,AbstractChar})
Returns the secondary structure numeric code. The input may be a secondary structure String code, a secondary structure name ("310 helix", "alpha helix", ..., "coil"), or an SSData object.
The classification follows the DSSP standard classes, described in the ProteinSecondaryStructures.jl documentation.
Example
julia> using ProteinSecondaryStructures
julia> ss_number("H")
2
julia> ss_number('B')
7
julia> ss_number("beta bridge")
7
julia> ss = SSData("ARG", "A", 1, "H", 0.0, 0.0)
SSData("ARG", "A", 1, "H", 0.0, 0.0)
julia> ss_number(ss)
2
These functions can be used to obtain arrays of codes, by broadcasting over the vector of secondary structure data. For example:
julia> using ProteinSecondaryStructures
julia> using ProteinSecondaryStructures.Testing: examples
julia> ss = stride_run(examples[1].filename);
julia> ss_name.(ss)[1:5]
5-element Vector{String}:
"coil"
"coil"
"coil"
"310 helix"
"310 helix"
julia> join(ss_code.(ss)[1:15])
"CCCGGGGCTTTTEEE"
In the last case, the sequence of secondary structure elements of the first 15 residues is shown.
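Broadcasting also combines with standard Julia functions to count or select residues by class. The following sketch uses a small hand-built vector, with the positional SSData constructor shown in the examples above, so it does not need to run STRIDE or DSSP. The residues and angles are made up for illustration.

```julia
using ProteinSecondaryStructures

# Hand-built data for illustration only (not computed from a real structure)
ss = [
    SSData("ARG", "A", 1, "C", 0.0, 0.0),
    SSData("ALA", "A", 2, "H", 0.0, 0.0),
    SSData("LEU", "A", 3, "H", 0.0, 0.0),
    SSData("GLY", "A", 4, "B", 0.0, 0.0),
]

codes = ss_code.(ss)             # one-letter code of each residue
numbers = ss_number.(ss)         # numeric code of each residue

n_alpha = count(==("H"), codes)  # number of residues classified as alpha helix
helix_mask = codes .== "H"       # Boolean mask selecting alpha-helix residues
sequence = join(codes)           # secondary structure string for the whole vector
```

The mask can be used to index the original vector, for example ss[helix_mask], to keep only the residues assigned to a given class.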