by Alexander Selb
Abstract:
This diploma thesis presents the use of the minimum description length (MDL) principle for unsupervised clustering. Two frameworks are presented that determine the right size of neural networks used for crisp/hard and fuzzy clustering. The optimization procedure starts with a network consisting of a large number of nodes and iteratively reduces the network size until a complexity criterion is met; the MDL principle is used to measure the complexity of the network. Different instantiations of the MDL algorithm, depending on the coding procedure, are possible and are derived in the thesis. In addition, we demonstrate how the MDL principle increases the robustness of the training algorithm by providing a criterion for outlier detection. A growing component based on MDL is also presented to cope with the formation of new data clusters in non-stationary data. To overcome the initialization problem, the growing neural gas (GNG) algorithm is used, which also increases the computational efficiency of the MDL algorithm; further methods to increase computational efficiency are presented as well. Typical applications, such as two-dimensional clustering, data compression, RGB color image segmentation, and learning hand-eye coordination for robot control, demonstrate the behaviour of the MDL algorithm.
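The thesis derives the exact coding schemes; as a rough illustration of the pruning idea summarized above (start with many nodes, remove nodes while a two-part description length keeps shrinking), here is a minimal sketch. The specific bit costs used here (a fixed `bits_per_param` for the model and a Gaussian residual code for the data) are illustrative assumptions, not the coding procedures derived in the thesis.

```python
import math
import random

def kmeans(data, centers, iters=20):
    # Standard Lloyd iterations on 2-D points: assign each point to its
    # nearest center, then move each center to the mean of its cluster.
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for x, y in data:
            i = min(range(len(centers)),
                    key=lambda j: (x - centers[j][0])**2 + (y - centers[j][1])**2)
            clusters[i].append((x, y))
        centers = [(sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
                   if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

def description_length(data, centers, clusters, bits_per_param=16):
    # Two-part MDL (illustrative coding): bits for the model parameters,
    # bits for each point's cluster index, and bits for the residuals
    # under a shared Gaussian code.
    k, n, d = len(centers), len(data), 2
    model_bits = k * d * bits_per_param
    index_bits = n * math.log2(k) if k > 1 else 0.0
    sse = sum((x - cx)**2 + (y - cy)**2
              for c, (cx, cy) in zip(clusters, centers)
              for x, y in c)
    var = max(sse / (n * d), 1e-12)
    residual_bits = 0.5 * n * d * math.log2(2 * math.pi * math.e * var)
    return model_bits + index_bits + residual_bits

def mdl_prune(data, k_max=8, seed=0):
    # Start with a deliberately oversized network, then greedily remove
    # the node whose removal most reduces the description length; stop
    # when no removal improves it (the complexity criterion is met).
    random.seed(seed)
    centers, clusters = kmeans(data, random.sample(data, k_max))
    best = description_length(data, centers, clusters)
    while len(centers) > 1:
        candidates = []
        for i in range(len(centers)):
            trial, tc = kmeans(data, centers[:i] + centers[i + 1:])
            candidates.append((description_length(data, trial, tc), trial))
        dl, trial = min(candidates)
        if dl >= best:
            break
        best, centers = dl, trial
    return centers
```

The trade-off is the usual MDL one: each extra node costs model and index bits, and is kept only if it shortens the residual code by more than it costs.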
Reference:
Modellselektion von Clusteringverfahren mittels minimaler Beschreibungslaenge (Alexander Selb), Technical report, PRIP, TU Wien, 2000.
BibTeX Entry:
@TechReport{TR062,
author = "Alexander Selb",
institution = "PRIP, TU Wien",
number = "PRIP-TR-062",
title = {Modellselektion von {C}lusteringverfahren mittels
minimaler {B}eschreibungslaenge},
year = "2000",
url = "https://www.prip.tuwien.ac.at/pripfiles/trs/tr62.pdf",
abstract = "This diploma thesis presents the use of the
\emph{minimum description length} (\emph{MDL})
principle for unsupervised clustering. Two frameworks
are presented that determine the right size of neural
networks used for \emph{crisp/hard} and \emph{fuzzy}
clustering. The optimization procedure starts with a
network consisting of a large number of nodes and
iteratively reduces the network size until a
complexity criterion is met; the \emph{MDL} principle
is used to measure the complexity of the network.
Different instantiations of the \emph{MDL} algorithm,
depending on the coding procedure, are possible and
are derived in the thesis. In addition, we demonstrate
how the \emph{MDL} principle increases the robustness
of the training algorithm by providing a criterion for
\emph{outlier} detection. A \emph{growing} component
based on \emph{MDL} is also presented to cope with the
formation of new data clusters in non-stationary data.
To overcome the initialization problem, the
\emph{growing neural gas} (GNG) algorithm is used,
which also increases the computational efficiency of
the \emph{MDL} algorithm; further methods to increase
computational efficiency are presented as well.
Typical applications, such as two-dimensional
clustering, data compression, \emph{RGB} color image
segmentation, and learning \emph{hand-eye}
coordination for robot control, demonstrate the
behaviour of the \emph{MDL} algorithm.",
}