by Alexander Selb
Abstract:
This diploma thesis presents the use of the minimum description length (MDL) principle for unsupervised clustering. Two frameworks are presented that determine the right size of neural networks used for crisp/hard and fuzzy clustering. The optimization procedure starts with a network consisting of a large number of nodes and iteratively reduces the network size until a complexity criterion is met; the MDL principle is used to measure the complexity of the network. Different instantiations of the MDL algorithm, depending on the coding procedure, are possible and are derived in the thesis. In addition, we demonstrate how the MDL principle increases the robustness of the training algorithm by providing a criterion for outlier detection. A growing component based on MDL is also presented to cope with the formation of new data clusters in non-stationary data. To overcome the initialization problem, the growing neural gas (GNG) algorithm is used, which also increases the computational efficiency of the MDL algorithm; further methods to increase computational efficiency are presented as well. Typical applications, such as two-dimensional clustering, data compression, RGB color image segmentation, and learning hand-eye coordination for robot control, demonstrate the behaviour of the MDL algorithm.
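The thesis derives the exact coding schemes; as a rough illustration of the pruning idea summarized above (start with many nodes, remove nodes while a two-part description length keeps shrinking), here is a minimal sketch. The specific bit costs used here (a fixed `bits_per_param` for the model and a Gaussian residual code for the data) are illustrative assumptions, not the coding procedures derived in the thesis.

```python
import math
import random

def kmeans(data, centers, iters=20):
    # Standard Lloyd iterations on 2-D points: assign each point to its
    # nearest center, then move each center to the mean of its cluster.
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for x, y in data:
            i = min(range(len(centers)),
                    key=lambda j: (x - centers[j][0])**2 + (y - centers[j][1])**2)
            clusters[i].append((x, y))
        centers = [(sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
                   if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

def description_length(data, centers, clusters, bits_per_param=16):
    # Two-part MDL (illustrative coding): bits for the model parameters,
    # bits for each point's cluster index, and bits for the residuals
    # under a shared Gaussian code.
    k, n, d = len(centers), len(data), 2
    model_bits = k * d * bits_per_param
    index_bits = n * math.log2(k) if k > 1 else 0.0
    sse = sum((x - cx)**2 + (y - cy)**2
              for c, (cx, cy) in zip(clusters, centers)
              for x, y in c)
    var = max(sse / (n * d), 1e-12)
    residual_bits = 0.5 * n * d * math.log2(2 * math.pi * math.e * var)
    return model_bits + index_bits + residual_bits

def mdl_prune(data, k_max=8, seed=0):
    # Start with a deliberately oversized network, then greedily remove
    # the node whose removal most reduces the description length; stop
    # when no removal improves it (the complexity criterion is met).
    random.seed(seed)
    centers, clusters = kmeans(data, random.sample(data, k_max))
    best = description_length(data, centers, clusters)
    while len(centers) > 1:
        candidates = []
        for i in range(len(centers)):
            trial, tc = kmeans(data, centers[:i] + centers[i + 1:])
            candidates.append((description_length(data, trial, tc), trial))
        dl, trial = min(candidates)
        if dl >= best:
            break
        best, centers = dl, trial
    return centers
```

The trade-off is the usual MDL one: each extra node costs model and index bits, and is kept only if it shortens the residual code by more than it costs.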
Reference:
Modellselektion von Clusteringverfahren mittels minimaler Beschreibungslaenge (Alexander Selb), Technical report, PRIP, TU Wien, 2000.
BibTeX Entry:
@TechReport{TR062,
author = "Alexander Selb",
institution = "PRIP, TU Wien",
number = "PRIP-TR-062",
title = {Modellselektion von {C}lusteringverfahren mittels
minimaler {B}eschreibungslaenge},
year = "2000",
url = "https://www.prip.tuwien.ac.at/pripfiles/trs/tr62.pdf",
abstract = "This diploma thesis presents the use of the
\emph{minimum description length} (\emph{MDL})
principle for unsupervised clustering. Two frameworks
are presented that determine the right size of neural
networks used for \emph{crisp/hard} and \emph{fuzzy}
clustering. The optimization procedure starts with a
network consisting of a large number of nodes and
iteratively reduces the network size until a
complexity criterion is met; the \emph{MDL} principle
is used to measure the complexity of the network.
Different instantiations of the \emph{MDL} algorithm,
depending on the coding procedure, are possible and
are derived in the thesis. In addition, we demonstrate
how the \emph{MDL} principle increases the robustness
of the training algorithm by providing a criterion for
\emph{outlier} detection. A \emph{growing} component
based on \emph{MDL} is also presented to cope with the
formation of new data clusters in non-stationary data.
To overcome the initialization problem, the
\emph{growing neural gas} (GNG) algorithm is used,
which also increases the computational efficiency of
the \emph{MDL} algorithm; further methods to increase
computational efficiency are presented as well.
Typical applications, such as two-dimensional
clustering, data compression, \emph{RGB} color image
segmentation, and learning \emph{hand-eye}
coordination for robot control, demonstrate the
behaviour of the \emph{MDL} algorithm.",
}