Compute secondary structures
First, install the package, according to the Installation instructions.
Load the package with:
julia> using ProteinSecondaryStructures
Here, we illustrate the computation of the secondary structure of a PDB file provided as an example:
julia> pdbfile = ProteinSecondaryStructures.Testing.examples[1].filename
"/home/user/.julia/dev/ProteinSecondaryStructures/test/data/pdb/pdb1fmc.pdb"
Then, to compute the secondary structure using the STRIDE algorithm, use:
julia> ss = stride_run(pdbfile)
510-element Vector{SSData}:
SSData("MET", "A", 1, "C", 360.0, 150.62)
SSData("PHE", "A", 2, "C", -69.01, 138.78)
⋮
SSData("ASN", "B", 255, "C", -130.75, 360.0)
The output is a vector of SSData elements, each containing the residue name, chain, residue number, secondary structure code, and phi and psi angles of the residue. The list of codes follows the DSSP convention, described in Secondary structure classes.
As an alternative to the stride_run function, the dssp_run function can be used to compute the secondary structures as defined by DSSP.
The details of the SSData structure, which contains the output for each residue, are described in the Data structure section.
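Because the output is an ordinary Julia vector, standard collection functions apply to it. The sketch below selects helical residues from a few hand-written records. The records are illustrative values, not the output of STRIDE or DSSP, and `is_helix` is a hypothetical helper introduced here, not part of the package.

```julia
using ProteinSecondaryStructures

# Hand-written records for illustration only (not real STRIDE/DSSP output).
# Argument order follows the printed form of SSData shown above:
# residue name, chain, residue number, code, phi, psi.
ss = [
    SSData("MET", "A", 1, "C", 360.0, 150.62),
    SSData("PHE", "A", 2, "H", -69.01, 138.78),
    SSData("ASP", "A", 3, "H", -60.0, -45.0),
    SSData("GLY", "A", 4, "T", 80.0, 10.0),
]

# Hypothetical helper: true for alpha helix ("H") or 3-10 helix ("G") residues.
is_helix(r) = ss_code(r) in ("H", "G")

helical = filter(is_helix, ss)
println(length(helical), " of ", length(ss), " residues are helical")
```

The same pattern should work on the vector returned by stride_run or dssp_run, since both return one SSData element per residue.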
Reference functions:
ProteinSecondaryStructures.stride_run — Function
stride_run(pdb_file::AbstractString; adjust_pdb=false)
Run stride on the PDB file and return a vector containing the stride detailed secondary structure information for each residue.
- adjust_pdb=true is used to adjust the format of the PDB file before running stride, which requires a specific header and a specific empty-chain identifier. In this case, only the ATOM lines are kept in the PDB file.
- If adjust_pdb=false, the PDB file provided is used as is.
Note that STRIDE will fail if residue or atom types are not recognized or if the header of the PDB file does not follow the necessary pattern.
- STRIDE might ignore some residues in the PDB file if they are not recognized or are incomplete.
- STRIDE does not support structures in mmCIF format.
ProteinSecondaryStructures.dssp_run — Function
dssp_run(input_file::String; adjust_pdb=false)
Run DSSP on the PDB or mmCIF file provided and return a vector containing the detailed secondary structure information for each residue.
- The adjust_pdb=true option is used to fix the header of PDB files before running dssp; malformed headers are a common problem when computing the secondary structure from PDB files. In this case, only the ATOM lines are kept in the PDB file.
- If adjust_pdb=false, the PDB file provided is used as is. This (default) option must be used when the input file is in mmCIF format.
Note that DSSP will fail if residue or atom types are not recognized or if the header of the PDB file does not follow the necessary pattern.
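As a rough illustration of how the two functions can be used side by side, the sketch below runs both methods on the example file used above and prints the counts of a few classes. Passing adjust_pdb=true to dssp_run is an assumption made here for robustness; whether it is needed depends on the header of the input file.

```julia
using ProteinSecondaryStructures

# Example PDB file shipped with the package for testing.
pdbfile = ProteinSecondaryStructures.Testing.examples[1].filename

ss_stride = stride_run(pdbfile)
# adjust_pdb=true fixes the header and keeps only ATOM lines
# (assumed here; it may not be required for this particular file).
ss_dssp = dssp_run(pdbfile; adjust_pdb=true)

comp_stride = ss_composition(ss_stride)
comp_dssp = ss_composition(ss_dssp)

# Print the counts of a few classes for each method.
for class in ("helix", "strand", "turn", "coil")
    println(rpad(class, 8), " STRIDE: ", comp_stride[class], "  DSSP: ", comp_dssp[class])
end
```

The two methods use different criteria, so the counts are not expected to match exactly.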
Secondary structure composition
Given the ss output of the stride_run or dssp_run functions, an overview of the secondary structure content can be obtained with the ss_composition function:
julia> comp = ss_composition(ss)
Dict{String, Int64} with 12 entries:
"bend" => 0
"kappa helix" => 0
"beta strand" => 77
"strand" => 81
"loop" => 0
"310 helix" => 21
"turn" => 70
"helix" => 263
"beta bridge" => 4
"alpha helix" => 242
"pi helix" => 0
"coil" => 96
julia> comp["alpha helix"]
242
The output is a dictionary containing the number of residues that were classified in each class. As shown above, this number can be retrieved individually.
Reference function
ProteinSecondaryStructures.ss_composition — Function
ss_composition(data::AbstractVector{<:SSData})
Calculate the secondary structure composition of the data. Returns a dictionary of the secondary structure types and their counts.
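The counts can be turned into fractions with plain dictionary operations. The sketch below uses a hand-copied subset of the example output above rather than calling the package. In that output, "helix" and "strand" appear to aggregate the specific helix and strand classes (242 + 21 = 263 and 77 + 4 = 81); this reading is inferred from the numbers shown and is an assumption.

```julia
# Counts copied from the ss_composition example output above.
comp = Dict(
    "helix" => 263, "strand" => 81, "turn" => 70,
    "coil" => 96, "bend" => 0, "loop" => 0,
)

# In this example these six classes add up to the 510 residues of the structure.
nres = sum(values(comp))

# Fraction of residues in each class.
fractions = Dict(class => n / nres for (class, n) in comp)

# Print the classes from most to least populated.
for (class, f) in sort(collect(fractions); by = last, rev = true)
    println(rpad(class, 8), round(100 * f; digits = 1), " %")
end
```

With real data, length(ss) gives the total number of residues directly and avoids relying on which classes overlap.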
Retrieving names, codes, and numeric codes
The names, single-character codes, or numeric codes of the secondary structure of each residue can be retrieved with the ss_name, ss_code, and ss_number functions. The input of these functions can be an instance of SSData or one of the other two secondary structure classification types (name, code, or number):
ProteinSecondaryStructures.ss_name — Function
ss_name(ss::Union{SSData, Integer, String, Char})
Return the secondary structure name. The input may be a SSData object, a secondary structure Integer code (1-10) or a secondary structure code (G, H, ..., C).
The classification follows the DSSP standard classes, described in the ProteinSecondaryStructures.jl documentation.
Example
julia> using ProteinSecondaryStructures
julia> ss_name("H")
"alpha helix"
julia> ss_name(1)
"310 helix"
julia> ss = SSData("ARG", "A", 1, "H", 0.0, 0.0)
SSData("ARG", "A", 1, "H", 0.0, 0.0)
julia> ss_name(ss)
"alpha helix"
ProteinSecondaryStructures.ss_code — Function
ss_code(code::Union{SSData,String,Integer})
Returns the one-letter secondary structure code. The input may be a secondary structure Integer code, a secondary structure name ("310 helix", "alpha helix", ..., "coil"), or a SSData object.
The classification follows the DSSP standard classes, described in the ProteinSecondaryStructures.jl documentation.
Example
julia> using ProteinSecondaryStructures
julia> ss_code(2)
"H"
julia> ss_code("beta bridge")
"B"
julia> ss = SSData("ARG", "A", 1, "H", 0.0, 0.0)
SSData("ARG", "A", 1, "H", 0.0, 0.0)
julia> ss_code(ss)
"H"
ProteinSecondaryStructures.ss_number — Function
ss_number(code::Union{SSData,AbstractString,AbstractChar})
Returns the secondary structure number code. The input may be a secondary structure String code, a secondary structure name ("310 helix", "alpha helix", ..., "coil"), or a SSData object.
The classification follows the DSSP standard classes, described in the ProteinSecondaryStructures.jl documentation.
Example
julia> using ProteinSecondaryStructures
julia> ss_number("H")
2
julia> ss_number('B')
7
julia> ss_number("beta bridge")
7
julia> ss = SSData("ARG", "A", 1, "H", 0.0, 0.0)
SSData("ARG", "A", 1, "H", 0.0, 0.0)
julia> ss_number(ss)
2
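Since the three functions convert between the same set of classes, they can be combined to list all classes or to check a round trip. The sketch below assumes the integer codes run from 1 to 10, as stated in the ss_name docstring.

```julia
using ProteinSecondaryStructures

# List number, one-letter code, and name for every class
# (range 1-10 taken from the ss_name docstring).
for i in 1:10
    println(i, "  ", ss_code(i), "  ", ss_name(i))
end

# Round trip: name -> code -> number -> name.
code = ss_code("beta bridge")
number = ss_number(code)
name = ss_name(number)
println(code, " ", number, " ", name)
```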
These functions can be used to obtain arrays of codes, by broadcasting over the vector of secondary structure data. For example:
julia> using ProteinSecondaryStructures
julia> using ProteinSecondaryStructures.Testing: examples
julia> ss = stride_run(examples[1].filename);
julia> ss_name.(ss)[1:5]
5-element Vector{String}:
"coil"
"coil"
"coil"
"310 helix"
"310 helix"
julia> join(ss_code.(ss)[1:15])
"CCCGGGGCTTTTEEE"
In the last case, the sequence of secondary structure elements of the first 15 residues is shown.
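A code string like the one above can be split into contiguous segments with a short run-length pass. The sketch below is plain Julia and does not call the package; `ss_segments` is a hypothetical helper name, and the positions it reports are indices into the string, not residue numbers from the PDB file.

```julia
# Hypothetical helper: split a string of one-letter codes into runs of equal
# characters, returning (code, first index, last index) for each run.
function ss_segments(codes::AbstractString)
    segs = Tuple{Char,Int,Int}[]
    chars = collect(codes)
    isempty(chars) && return segs
    start = 1
    for i in 2:length(chars)
        if chars[i] != chars[i-1]
            # The run that started at `start` ends at the previous position.
            push!(segs, (chars[i-1], start, i - 1))
            start = i
        end
    end
    # Close the final run.
    push!(segs, (chars[end], start, length(chars)))
    return segs
end

# Code string of the first 15 residues, copied from the example above.
segs = ss_segments("CCCGGGGCTTTTEEE")
for (code, first_idx, last_idx) in segs
    println(code, ": ", first_idx, "-", last_idx)
end
```

With real data, the input string can be built as in the example above, with join(ss_code.(ss)).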