DBSCAN Clustering
Cluster detection.
Description
The DBSCAN Clusterer block clusters data using the density-based spatial clustering of applications with noise (DBSCAN) algorithm. The block can cluster any type of data. It can also estimate the clustering threshold (epsilon) automatically and resolve data ambiguity in up to two dimensions.
Ports
Input
X - input data
real N-by-P matrix
Input data, specified as a real N-by-P matrix, where N is the number of data points to cluster and P is the number of feature dimensions.
The DBSCAN algorithm can cluster any type of data, provided that the Minimum number of points in a cluster and Cluster threshold epsilon parameters are set appropriately.
Data types: Float16, Float32, Float64, Int8, Int16, Int32, Int64, UInt8, UInt16, UInt32, UInt64.
Update - enable automatic epsilon update
false (by default) | true
Enables automatic updating of the epsilon estimate, specified as false or true.
- true - the epsilon threshold is first estimated as the average of the knees (points of maximum curvature) of the k-NN search curves. The estimate is then added to a buffer of length L, set by the Length of cluster threshold epsilon history parameter. The final epsilon value is the average of the epsilon history buffer. If the Length of cluster threshold epsilon history parameter is set to 1, the estimate is memoryless: each epsilon estimate is used immediately and no moving-average smoothing takes place.
- false - the previous epsilon estimate is used.
Epsilon estimation is computationally intensive and is not recommended for large datasets.
Dependencies
To use this port, set the Source of cluster threshold epsilon parameter to Auto and set the Maximum number of points for 'Auto' epsilon parameter.
Data types: bool.
AmbLims - ambiguity bounds
real 1-by-2 row vector | real 2-by-2 matrix
Ambiguity bounds, specified as a real 1-by-2 row vector or a real 2-by-2 matrix.
For one ambiguous dimension, specify the limits as a 1-by-2 vector [MinAmbiguityLimitDimension1, MaxAmbiguityLimitDimension1]. For two ambiguous dimensions, specify the limits as a 2-by-2 matrix [MinAmbiguityLimitDimension1, MaxAmbiguityLimitDimension1; MinAmbiguityLimitDimension2, MaxAmbiguityLimitDimension2].
Clustering can occur across these boundaries so that ambiguous detections are clustered appropriately in up to two dimensions. The columns of the X input data that contain ambiguous values are set by the Indices of ambiguous dimensions parameter. The AmbLims values define the minimum and maximum ambiguity limits in the same units as the corresponding columns of the X input data.
Dependencies
To use this port, select the Enable disambiguation of dimensions checkbox.
Data types: Float16, Float32, Float64, Int8, Int16, Int32, Int64, UInt8, UInt16, UInt32, UInt64.
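The idea of clustering across an ambiguity boundary can be illustrated with a wrap-around distance. The sketch below is an assumption about how such a distance might be computed, not the block's actual implementation; the function name `wrapped_difference` and the sample limits are hypothetical.

```python
import numpy as np

def wrapped_difference(a, b, amb_min, amb_max):
    """Smallest absolute difference between a and b on an interval that wraps around.

    Assumption: values at amb_min and amb_max are treated as the same point.
    """
    span = amb_max - amb_min
    d = np.abs(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)) % span
    return np.minimum(d, span - d)

# Two values near opposite edges of a [-10, 10] ambiguity interval
# are far apart in plain terms (19) but close once the interval wraps.
print(wrapped_difference(9.5, -9.5, -10.0, 10.0))  # 1.0
```

With a distance like this, two detections on either side of the ambiguity limit can fall inside the same epsilon neighbourhood and end up in one cluster.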
Output
Idx - cluster indices
integer N-by-1 column vector
Cluster indices, returned as an integer N-by-1 column vector. The indices represent the clustering results of the DBSCAN algorithm. A value of -1 indicates a DBSCAN noise point. Positive Idx values correspond to clusters that satisfy the DBSCAN clustering criteria.
Dependencies
To use this port, set the Define outputs for Engee block parameter to Index or Index and ID.
Data types: Float16, Float32, Float64, Int8, Int16, Int32, Int64, UInt8, UInt16, UInt32, UInt64.
Clusters - alternative identifiers of clusters
integer 1-by-N row vector
Alternative cluster identifiers, returned as a 1-by-N row vector of positive integers. Each value is a unique identifier that points to a hypothetical target cluster. This output contains unique positive cluster identifiers for all points, including noise. In contrast, the Idx output labels noise points with the value -1.
Dependencies
To use this port, set the Define outputs for Engee block parameter to Cluster ID or Index and ID.
Data types: Float16, Float32, Float64, Int8, Int16, Int32, Int64, UInt8, UInt16, UInt32, UInt64.
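The relationship between the Idx and Clusters outputs can be sketched as follows. The numbering scheme used here (noise points receive new identifiers after the largest cluster index, in order of appearance) is an assumption made for illustration; the documentation only states that every point, including noise, gets a positive identifier. The function name `idx_to_cluster_ids` is hypothetical.

```python
import numpy as np

def idx_to_cluster_ids(idx):
    """Turn an Idx-style vector (-1 marks noise) into Clusters-style positive IDs.

    Assumption: each noise point gets its own new ID after the largest cluster index.
    """
    idx = np.asarray(idx).ravel()
    ids = idx.copy()
    noise = idx == -1
    start = max(int(idx.max()), 0) + 1 if idx.size else 1
    ids[noise] = np.arange(start, start + noise.sum())
    return ids.reshape(1, -1)  # 1-by-N row vector

print(idx_to_cluster_ids([1, -1, 2, 1, -1]))  # [[1 3 2 1 4]]
```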
Parameters
Define outputs for Engee block - cluster data output type
Index and ID (by default) | Cluster ID | Index
The cluster data output type, specified as:
- Index and ID - enables both the Idx and Clusters output ports.
- Cluster ID - enables only the Clusters output port.
- Index - enables only the Idx output port.
Source of cluster threshold epsilon - source of epsilon
Property (by default) | Auto
Source of epsilon for the cluster threshold:
- Property - epsilon is taken from the Cluster threshold epsilon parameter.
- Auto - epsilon is calculated automatically using a k-nearest neighbours (k-NN) search. The search is performed for k ranging from the value of the Minimum number of points in a cluster parameter minus one to the value of the Maximum number of points for 'Auto' epsilon parameter minus one. One is subtracted because the neighbourhood of a point includes the point itself.
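The Auto estimation described above can be sketched roughly as follows. This is a minimal illustration, not the block's implementation: the knee of each k-NN distance curve is located here as the point farthest from the chord joining the curve's endpoints, which is an assumed heuristic, and all function names are hypothetical. Pairwise distances are computed by brute force, so the sketch suits small datasets only.

```python
import numpy as np

def knn_distance_curve(X, k):
    """Sorted distances from each point to its k-th nearest neighbour (self excluded)."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    dist.sort(axis=1)  # column 0 is the zero distance from each point to itself
    return np.sort(dist[:, k])

def knee_value(curve):
    """Curve value at the point farthest from the chord between the endpoints (assumed heuristic)."""
    n = len(curve)
    x = np.linspace(0.0, 1.0, n)
    span = curve[-1] - curve[0]
    y = (curve - curve[0]) / span if span > 0 else np.zeros(n)
    return curve[np.argmax(np.abs(y - x))]

def estimate_epsilon(X, min_pts, max_pts):
    """Average the knees of the k-NN curves for k = min_pts - 1 ... max_pts - 1."""
    X = np.asarray(X, dtype=float)
    ks = range(min_pts - 1, max_pts)
    return float(np.mean([knee_value(knn_distance_curve(X, k)) for k in ks]))

X = np.array([[0.0], [1.0], [2.0], [10.0]])
print(estimate_epsilon(X, min_pts=2, max_pts=3))  # 1.5
```

The largest k must stay below the number of points N, because each point has only N - 1 neighbours other than itself.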
Cluster threshold epsilon - cluster neighbourhood size
10.0 (by default) | positive scalar | positive real 1-by-P row vector
The size of the cluster neighbourhood for the search query, specified as a positive scalar or a positive real 1-by-P row vector. P is the number of clustering dimensions in the input data X.
Epsilon defines the radius around a point within which detections are counted. If epsilon is a scalar, the same value applies to all clustering dimensions. To use different epsilon values for different dimensions, specify a real 1-by-P row vector. A row vector creates a multidimensional elliptical search area, which is useful when the data columns have different physical units, such as range and Doppler.
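One way to picture the elliptical search area is to scale each dimension by its own epsilon and compare the scaled Euclidean distance with 1. The sketch below assumes this interpretation; the block's internal distance computation is not specified here, and the function name `in_neighbourhood` is hypothetical.

```python
import numpy as np

def in_neighbourhood(center, points, epsilon):
    """True for each point inside the epsilon region around center.

    Assumption: a vector epsilon gives the semi-axes of an axis-aligned ellipsoid;
    a scalar epsilon is broadcast to every dimension and gives a sphere.
    """
    center = np.asarray(center, dtype=float)
    eps = np.broadcast_to(np.asarray(epsilon, dtype=float), center.shape)
    scaled = (np.asarray(points, dtype=float) - center) / eps
    return np.sqrt((scaled ** 2).sum(axis=1)) <= 1.0

points = [[3.0, 0.0], [0.0, 3.0]]
print(in_neighbourhood([0.0, 0.0], points, [5.0, 1.0]))  # [ True False]
print(in_neighbourhood([0.0, 0.0], points, 3.5))         # [ True  True]
```

With epsilon [5, 1], an offset of 3 along the first column stays inside the region while the same offset along the second column does not, which is the behaviour wanted when the columns are in different units.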
Minimum number of points in a cluster - the minimum number of points required for a cluster
3 (by default) | positive integer
The minimum number of points required for a cluster, specified as a positive integer. This parameter sets the minimum number of points that an epsilon neighbourhood must contain for a point to be treated as a core point.
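The core point test can be sketched as a neighbour count. This is a minimal illustration under two assumptions: epsilon is a scalar, and a point counts as its own neighbour. The function name `core_point_mask` is hypothetical.

```python
import numpy as np

def core_point_mask(X, epsilon, min_pts):
    """True for each point whose epsilon neighbourhood holds at least min_pts points."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    neighbour_counts = (dist <= epsilon).sum(axis=1)  # each point counts itself
    return neighbour_counts >= min_pts

X = [[0.0], [1.0], [2.0], [10.0]]
print(core_point_mask(X, epsilon=1.0, min_pts=3))  # [False  True False False]
```

Only the middle point has three points (itself and both neighbours) within distance 1, so it is the only core point in this example.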
Maximum number of points for 'Auto' epsilon - maximum number of points required for a cluster
10 (by default) | positive integer
The maximum number of points in a cluster, specified as a positive integer. This parameter is used to estimate epsilon when the block performs a k-NN search.
Dependencies
To use this parameter, set the Source of cluster threshold epsilon parameter to Auto.
Length of cluster threshold epsilon history - length of cluster threshold (epsilon) history
10 (by default) | positive integer
The length of the stored cluster threshold (epsilon) history, specified as a positive integer. If set to 1, the history is memoryless: each epsilon estimate is used immediately and no moving-average smoothing takes place. If the value is greater than 1, the epsilon value is averaged over the specified history length.
Example: 5
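The smoothing effect of the history length can be sketched with a fixed-length buffer. The class name `EpsilonHistory` is hypothetical, and the behaviour while the buffer is still filling (averaging only the estimates received so far) is an assumption.

```python
from collections import deque

class EpsilonHistory:
    """Moving average over the last `length` epsilon estimates."""

    def __init__(self, length):
        self.buffer = deque(maxlen=length)  # oldest estimate is dropped when full

    def update(self, estimate):
        self.buffer.append(estimate)
        # Assumption: a partially filled buffer is averaged over what it holds.
        return sum(self.buffer) / len(self.buffer)

history = EpsilonHistory(3)
for estimate in (2.0, 4.0, 6.0, 8.0):
    print(history.update(estimate))  # 2.0, 3.0, 4.0, 6.0
```

With a length of 1 the buffer holds only the latest estimate, so `update` returns its argument unchanged, which matches the memoryless case.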
Enable disambiguation of dimensions - enable unambiguous measurement
disabled (by default) | enabled
A checkbox that enables disambiguation of dimensions.
If checked, clustering is performed across the boundaries defined by the AmbLims input port values at run time, so that ambiguous detections are clustered appropriately.
Use the Indices of ambiguous dimensions parameter to specify the X column indices where ambiguities may occur. Up to two ambiguous dimensions are allowed. Enabling disambiguation is not recommended for large datasets.
Indices of ambiguous dimensions - indices of ambiguous dimensions
1 (by default) | positive integer | 1-by-2 vector of positive integers
Indices of ambiguous dimensions, specified as a positive integer or a 1-by-2 vector of positive integers.
This parameter specifies the indices of the X input data columns where ambiguities may occur. A positive integer corresponds to one ambiguous dimension in the input data matrix X. A 1-by-2 row vector corresponds to two ambiguous dimensions. The size and order of the Indices of ambiguous dimensions parameter must match the AmbLims input port value.
Example: [3 4]
Dependencies
To use this parameter, select the Enable disambiguation of dimensions checkbox.