Clustering.jl
PopGen.jl/src/KMeans.jl
📦 not exported | 🔵 exported by PopGen.jl
🔵 cutree
cutree(::PopData, hcres::Hclust; krange::UnitRange{Int64}, height::Union{Int64, Nothing} = nothing)
cutree(::PopData, hcres::Hclust; krange::Vector{Int64}, height::Union{Int64, Nothing} = nothing)
An expansion of the Clustering.cutree method (from Clustering.jl) that performs cluster assignments over krange on the Hclust output from hclust(). Returns a DataFrame of sample names and columns corresponding to assignments per k in krange. The PopData object is used only for retrieving the sample names.
Keyword Arguments
- krange: the number of desired clusters, given as a vector (e.g. [2,4,5]) or range (2:5)
- height::Integer: the height at which the tree is cut (optional)
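To illustrate what "assignments per k in krange" means, here is a minimal sketch that calls Clustering.jl directly on a small, made-up distance matrix. The matrix D, the sample names, and the Dict layout are assumptions for illustration only; the PopGen.jl method returns a DataFrame instead and may be implemented differently.

```julia
using Clustering

# Hypothetical symmetric distance matrix for four samples
D = [0.0 1.0 4.0 5.0;
     1.0 0.0 3.0 4.5;
     4.0 3.0 0.0 1.5;
     5.0 4.5 1.5 0.0]
sample_names = ["s1", "s2", "s3", "s4"]

# Build the tree once
hc = hclust(D, linkage = :average)

# Cut the same tree once per requested k and keep every result
krange = 2:3
assignments = Dict(k => cutree(hc, k = k) for k in krange)

for k in krange
    println("k = ", k, ": ", collect(zip(sample_names, assignments[k])))
end
```

Each entry of assignments holds one cluster label per sample, which corresponds to one column per k in the DataFrame described above.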
🔵 cluster
cluster(::PopData, method::Function ; kwargs)
A convenience wrapper to perform clustering on a PopData object using a designated method (see below). The chosen method must also be supplied with the appropriate keyword arguments for that method. For more information on a specific method, see its docstring with ?methodname.
Clustering Methods
- kmeans: K-means++ clustering
  - kwargs: k, iterations, matrixtype
- kmedoids: K-medoids clustering
  - kwargs: k, iterations, distance, matrixtype
- hclust: Hierarchical clustering
  - kwargs: linkage, branchorder, distance, matrixtype
- fuzzycmeans: Fuzzy C-means clustering
  - kwargs: c, fuzziness, iterations, matrixtype
- dbscan: Density-based Spatial Clustering of Applications with Noise (DBSCAN)
  - kwargs: radius, minpoints, distance, matrixtype
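A short usage sketch, assuming the signature documented above and the nancycats example data used elsewhere on this page. The values k = 3 and linkage = :average are arbitrary choices for illustration, not recommendations.

```julia
using PopGen

cats = @nancycats

# K-means with 3 clusters on the PCA matrix (k = 3 is an arbitrary choice)
km = cluster(cats, kmeans, k = 3, matrixtype = :pca)

# Hierarchical clustering; the keyword arguments are those listed for hclust
hc = cluster(cats, hclust, linkage = :average)
```

The method is passed as a function, so the keyword arguments must match those listed for that method.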
🔵 dbscan
dbscan(::PopData; radius::Float64, minpoints::Int64 = 2, distance::PreMetric = euclidean, matrixtype::Symbol = :pca)
An expansion of Clustering.dbscan (from Clustering.jl) to perform Density-based Spatial Clustering of Applications with Noise (DBSCAN) on a PopData object. This is a convenience method which converts the PopData object to either an allele frequency or PCA matrix, and performs DBSCAN clustering on the distance matrix computed from it. Returns a DbscanResult object, which contains the assignments in the .assignments field.
Keyword Arguments
- radius::Float64: the radius of a point neighborhood
- minpoints::Int: the minimum number of neighbors of a core point (default: 2)
- distance: type of distance matrix to calculate on matrixtype (default: euclidean)
  - see Distances.jl for a list of options (e.g. sqeuclidean, etc.)
- matrixtype: type of input matrix (default: :pca)
  - :pca: matrix of Principal Components
  - :freq: matrix of allele frequencies
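The roles of radius and minpoints can be sketched in plain Julia on a toy distance matrix. This is an illustration of the core-point idea only, not the code used by Clustering.jl or PopGen.jl. In particular, whether a point counts itself toward minpoints depends on the implementation; this sketch assumes it does not. The points and thresholds are made up.

```julia
# Four made-up points on a line and their pairwise distances
pts = [0.0, 1.0, 2.0, 10.0]
D = [abs(a - b) for a in pts, b in pts]

radius = 1.5     # neighborhood radius
minpoints = 2    # neighbors required for a core point

# Count the other points within `radius` of each point (subtract 1 for the point itself)
n_neighbors = [count(<=(radius), D[i, :]) - 1 for i in axes(D, 1)]

# A point is a core point when it has at least `minpoints` neighbors
is_core = n_neighbors .>= minpoints

println("neighbors per point: ", n_neighbors)
println("core points: ", pts[is_core])
```

Under these assumptions only the point at 1.0 has two neighbors within the radius, and the point at 10.0 has none, so it would be left out of any cluster.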
🔵 fuzzycmeans
fuzzycmeans(data::PopData; c::Int64, fuzziness::Int64 = 2, iterations::Int64 = 100, matrixtype::Symbol = :pca)
An expansion of Clustering.fuzzy_cmeans (from Clustering.jl) to perform Fuzzy C-means clustering on a PopData object. This is a convenience method which converts the PopData object to either an allele frequency or PCA matrix, and performs Fuzzy C-means clustering on the Euclidean distance matrix of that. Returns a FuzzyCMeansResult object, which contains the assignment weights in the .weights field.
Keyword Arguments
- c: the number of desired clusters, given as an Integer
- fuzziness::Integer: clusters' fuzziness, must be >1 (default: 2)
  - a fuzziness of 2 is common for systems with unknown numbers of clusters
- iterations::Int64: the maximum number of iterations to attempt to reach convergence (default: 100)
- matrixtype: type of input matrix to compute (default: :pca)
  - :pca: matrix of Principal Components
  - :freq: matrix of scaled allele frequencies
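Because the result stores weights rather than hard assignments, it can help to see how the .weights field is typically read. The sketch below calls Clustering.fuzzy_cmeans directly on a made-up feature matrix (features in rows, samples in columns); the data and the choice of c = 2 are assumptions for illustration.

```julia
using Clustering

# Made-up data: 2 features × 6 samples, forming two well-separated groups
X = [0.0 0.1 0.2 5.0 5.1 5.2;
     0.0 0.2 0.1 5.0 5.2 5.1]

# 2 clusters, fuzziness of 2, at most 100 iterations
res = fuzzy_cmeans(X, 2, 2.0; maxiter = 100)

# One row per sample, one column per cluster; each row sums to 1
weights = res.weights

# Optional hard assignment: pick the cluster with the largest weight
hard = [argmax(weights[i, :]) for i in 1:size(weights, 1)]

println(round.(weights, digits = 3))
println(hard)
```

Taking the argmax discards the membership information, so it is only a convenience for comparing against hard clustering methods.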
🔵 hclust
hclust(data::PopData; linkage::Symbol = :single, branchorder::Symbol = :r, distance::PreMetric = euclidean, matrixtype::Symbol = :pca)
An expansion of Clustering.hclust (from Clustering.jl) to perform hierarchical clustering on a PopData object. This is a convenience method which converts the PopData object to either an allele frequency or PCA matrix, converts that into a distance matrix, and performs hierarchical clustering on that distance matrix. Returns an Hclust object, which contains many metrics but does not include cluster assignments. Use cutree(::PopData, ::Hclust; krange...) to compute the sample assignments for a range of k clusters.
Keyword Arguments
- linkage: defines how the distances between the data points are aggregated into the distances between the clusters (default: :single)
  - :single: use the minimum distance between any of the cluster members
  - :average: use the mean distance between any of the cluster members
  - :complete: use the maximum distance between any of the members
  - :ward: the distance is the increase of the average squared distance of a point to its cluster centroid after merging the two clusters
  - :ward_presquared: same as :ward, but assumes that the distances in the distance matrix are already squared
- branchorder: algorithm to order leaves and branches (default: :r)
  - :r: ordering based on the node heights and the original elements order (compatible with R's hclust)
  - :optimal: branches are ordered to reduce the distance between neighboring leaves from separate branches using the "fast optimal leaf ordering" algorithm
- distance: type of distance matrix to calculate on matrixtype (default: euclidean)
  - see Distances.jl for a list of options (e.g. sqeuclidean, etc.)
- matrixtype: type of input matrix (default: :pca)
  - :pca: matrix of Principal Components
  - :freq: matrix of allele frequencies
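The effect of the linkage option can be seen on a tiny example. The sketch below calls Clustering.hclust directly on a made-up distance matrix for three points on a line and compares the merge heights; the data are an assumption for illustration and have nothing to do with PopData.

```julia
using Clustering

# Three made-up points on a line at 0, 1 and 3, and their pairwise distances
pts = [0.0, 1.0, 3.0]
D = [abs(a - b) for a in pts, b in pts]

# Merge heights of the resulting tree for each linkage
heights = Dict(l => hclust(D, linkage = l).heights
               for l in (:single, :average, :complete))

for l in (:single, :average, :complete)
    println(l, " merge heights: ", heights[l])
end
```

In all three cases the points at 0 and 1 merge first at height 1. The second merge joins that pair with the point at 3, at the minimum (2), mean (2.5), or maximum (3) of the two remaining distances, matching the definitions listed above.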
🔵 kmeans
kmeans(data::PopData; k::Int64, iterations::Int64 = 100, matrixtype::Symbol = :pca)
Perform K-means clustering (using K-means++) on a PopData object. Returns a KmeansResult object. Use the keyword argument iterations (default: 100) to set the maximum number of iterations allowed to achieve convergence. Internally, K-means clustering is performed on either the principal components of the scaled allele frequencies, or just the scaled allele frequencies themselves. In both cases, missing values are replaced by the global mean allele frequency.
Keyword Arguments
- k: the number of desired clusters, given as an Integer
- iterations::Int64: the maximum number of iterations to attempt to reach convergence (default: 100)
- matrixtype: type of input matrix to compute (default: :pca)
  - :pca: matrix of Principal Components
  - :freq: matrix of scaled allele frequencies
Example
julia> cats = @nancycats ;
julia> km = kmeans(cats, k = 2)
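The replacement of missing values mentioned above can be sketched in plain Julia. This is not PopGen.jl's code: it assumes that "global mean allele frequency" means the mean of each allele (column) across all samples, and the frequency matrix is made up for illustration.

```julia
using Statistics

# Made-up allele frequency matrix: rows are samples, columns are alleles
freqs = [0.5 missing 1.0;
         0.0 0.5     missing;
         1.0 0.5     0.5]

filled = copy(freqs)
for j in axes(freqs, 2)
    # Mean of the observed values for this allele across all samples
    col_mean = mean(skipmissing(freqs[:, j]))
    # Overwrite only the missing entries in this column
    filled[ismissing.(freqs[:, j]), j] .= col_mean
end

# With no missing values left, the matrix can be converted to plain Float64
filled = Matrix{Float64}(filled)
println(filled)
```

After this step every sample has a complete row, which is what a clustering routine such as K-means needs as input.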
🔵 kmedoids
kmedoids(data::PopData; k::Int64, iterations::Int64 = 100, distance::PreMetric = euclidean, matrixtype::Symbol = :pca)
Perform K-medoids clustering on a PopData object. Returns a KmedoidsResult object. Use the keyword argument iterations (default: 100) to set the maximum number of iterations allowed to achieve convergence. Internally, K-medoids clustering is performed on either the principal components of the scaled allele frequencies, or just the scaled allele frequencies themselves. In both cases, missing values are replaced by the global mean allele frequency.
Keyword Arguments
- k: the number of desired clusters, given as an Integer
- iterations::Int64: the maximum number of iterations to attempt to reach convergence (default: 100)
- distance: type of distance matrix to calculate on matrixtype (default: euclidean)
  - see Distances.jl for a list of options (e.g. sqeuclidean, etc.)
- matrixtype: type of input matrix to compute (default: :pca)
  - :pca: matrix of Principal Components
  - :freq: matrix of scaled allele frequencies
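Unlike K-means, K-medoids works from a distance matrix and uses actual samples as cluster centers, which is why a distance option appears above. The sketch below calls Clustering.kmedoids directly on a made-up distance matrix; the matrix and k = 2 are assumptions for illustration.

```julia
using Clustering

# Hypothetical symmetric distance matrix for four samples
D = [0.0 1.0 4.0 5.0;
     1.0 0.0 3.0 4.5;
     4.0 3.0 0.0 1.5;
     5.0 4.5 1.5 0.0]

# 2 clusters, at most 100 iterations
res = kmedoids(D, 2; maxiter = 100)

println("medoids (sample indices): ", res.medoids)
println("assignments: ", res.assignments)
```

The medoids field holds indices into the samples, so each cluster center can be traced back to a real individual.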
🔵 show
Base.show(io::IO, data::KMeansResults)