Saturday
1 March 2014

AutoClass solves the problem of automatic discovery of classes in data.


AutoClass solves the problem of automatic discovery of classes in data (sometimes called clustering or unsupervised learning), as distinct from the generation of class descriptions from labeled examples (called supervised learning). It aims to discover the 'natural' classes in the data.

AutoClass applies to observations of things that can each be described by a set of attributes, without reference to other things.

The value of each attribute must be either a number or an element of a fixed set of symbols. For numeric attributes, a measurement error must also be provided.
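To make this data model concrete, here is a minimal Python sketch of a schema in which every attribute is either real-valued (with a measurement error) or discrete (drawn from a fixed symbol set), plus a check that a case fits it. The names `RealAttr`, `DiscreteAttr` and `check_case` are hypothetical illustrations. They are not part of AutoClass, which has its own input formats.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence, Union


@dataclass(frozen=True)
class RealAttr:
    """A numeric attribute; `error` is its measurement error (assumed > 0)."""
    name: str
    error: float


@dataclass(frozen=True)
class DiscreteAttr:
    """A symbolic attribute whose values come from a fixed set."""
    name: str
    symbols: FrozenSet[str]


Attr = Union[RealAttr, DiscreteAttr]


def check_case(schema: Sequence[Attr], case: Sequence[object]) -> List[str]:
    """Return the problems found in one case; an empty list means it fits the schema."""
    if len(case) != len(schema):
        return [f"expected {len(schema)} values, got {len(case)}"]
    problems = []
    for attr, value in zip(schema, case):
        if isinstance(attr, RealAttr):
            if attr.error <= 0:
                problems.append(f"{attr.name}: measurement error must be positive")
            # bool is a subclass of int in Python, so reject it explicitly
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                problems.append(f"{attr.name}: expected a number, got {value!r}")
        elif value not in attr.symbols:
            problems.append(f"{attr.name}: {value!r} is not one of {sorted(attr.symbols)}")
    return problems


schema = [RealAttr("length", 0.01), DiscreteAttr("color", frozenset({"red", "green"}))]
print(check_case(schema, [3.2, "red"]))     # []
print(check_case(schema, ["3.2", "blue"]))  # two problems: not a number, unknown symbol
```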

Introduction.

In previous years, the Bayes group at Ames Research Center developed the basic theory and associated algorithms for several general-purpose data analysis techniques. Our earliest efforts were applied to the problem of automatic classification of data. We implemented this theory in the AutoClass series of programs.


AutoClass takes a database of cases described by a combination of real and discrete valued attributes, and automatically finds the natural classes in that data. It does not need to be told how many classes are present or what they look like; it extracts this information from the data itself.
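AutoClass itself is Bayesian: it scores candidate classifications, including their number of classes, by an approximation to their marginal likelihood. As a simpler, well-known stand-in for that criterion, the sketch below fits one-dimensional Gaussian mixtures with k = 1, 2, ... classes by expectation-maximization (EM) and keeps the k with the lowest Bayesian Information Criterion (BIC). The function names, the quantile-based starting point and `sd_floor` (a lower bound on each class's standard deviation, loosely standing in for the measurement error) are assumptions for illustration, not AutoClass's own method.

```python
import math
from typing import List, Sequence, Tuple


def _log_normal(x: float, mu: float, sd: float) -> float:
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mu) ** 2 / (2 * sd * sd)


def _e_step(xs, weights, means, sds) -> Tuple[List[List[float]], float]:
    """Return responsibilities resp[i][j] = P(class j | x_i) and the total log-likelihood."""
    resp, loglik = [], 0.0
    for x in xs:
        logs = [math.log(w) + _log_normal(x, m, s) for w, m, s in zip(weights, means, sds)]
        top = max(logs)
        total = top + math.log(sum(math.exp(l - top) for l in logs))  # log-sum-exp
        loglik += total
        resp.append([math.exp(l - total) for l in logs])
    return resp, loglik


def fit_mixture(xs: Sequence[float], k: int, sd_floor: float,
                iters: int = 500, tol: float = 1e-9):
    """Fit a k-class 1-D Gaussian mixture by EM; return (weights, means, sds, loglik)."""
    n = len(xs)
    srt = sorted(xs)
    mean_all = sum(xs) / n
    spread = math.sqrt(sum((x - mean_all) ** 2 for x in xs) / n)
    # Deterministic start: means at evenly spaced quantiles, one shared wide spread.
    means = [srt[int((j + 0.5) * n / k)] for j in range(k)]
    sds = [max(spread, sd_floor)] * k
    weights = [1.0 / k] * k
    prev = -math.inf
    for _ in range(iters):
        resp, loglik = _e_step(xs, weights, means, sds)
        if loglik - prev < tol:
            break  # converged; loglik matches the current parameters
        prev = loglik
        # M-step: re-estimate each class from its soft share of the data.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            sds[j] = max(math.sqrt(var), sd_floor)  # never narrower than the floor
            weights[j] = nj / n
    else:
        _, loglik = _e_step(xs, weights, means, sds)  # ran out of iterations
    return weights, means, sds, loglik


def choose_num_classes(xs: Sequence[float], max_k: int, sd_floor: float) -> int:
    """Return the k in 1..max_k with the lowest BIC = -2*loglik + p*ln(n)."""
    best_k, best_bic = 1, math.inf
    for k in range(1, max_k + 1):
        loglik = fit_mixture(xs, k, sd_floor)[3]
        p = 3 * k - 1  # k means, k standard deviations, k - 1 free weights
        bic = -2.0 * loglik + p * math.log(len(xs))
        if bic < best_bic:
            best_k, best_bic = k, bic
    return best_k


# Two well-separated groups of readings, evenly spaced around -5 and +5.
xs = [-6 + 0.1 * i for i in range(21)] + [4 + 0.1 * i for i in range(21)]
print(choose_num_classes(xs, max_k=4, sd_floor=0.1))  # prints 2
```

The penalty term p·ln(n) is what stops the search from simply adding classes. In this example, splitting either group into two classes raises the likelihood slightly, but not by enough to pay for three extra parameters.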

The classes are described probabilistically, so that an object can have partial membership in the different classes, and the class definitions can overlap. AutoClass generates reports on the classes it has found at the end of its search. AutoClass has been used and tested on many data sets, both within NASA and by industry, academia and other agencies. These applications typically find surprising classifications that show patterns in the data unknown to the user.
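In a mixture model, partial membership is the posterior probability of each class given the case, obtained by Bayes' rule from the class weights and the class-conditional densities. Below is a minimal sketch for a single real-valued attribute, assuming Gaussian classes. The function name and its parameters are hypothetical, and the numbers are illustrative.

```python
import math
from typing import List, Sequence


def class_membership(x: float, weights: Sequence[float],
                     means: Sequence[float], sds: Sequence[float]) -> List[float]:
    """Return P(class j | x) for every class j of a 1-D Gaussian mixture."""
    # log(weight_j) + log N(x | mean_j, sd_j); the shared -0.5*log(2*pi)
    # term cancels when normalising, so it is left out.
    logs = [math.log(w) - math.log(s) - 0.5 * ((x - m) / s) ** 2
            for w, m, s in zip(weights, means, sds)]
    top = max(logs)  # subtract the max for numerical stability
    unnorm = [math.exp(l - top) for l in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Two equally likely, overlapping classes centred at -1 and +1.
print(class_membership(0.0, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]))  # [0.5, 0.5]
print(class_membership(1.0, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]))  # about [0.12, 0.88]
```

A case halfway between the two class centres belongs equally to both. This is the overlap the paragraph above describes: no hard boundary is drawn.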

Examples include the discovery of new classes of infrared stars in the IRAS Low Resolution Spectral catalogue, new classes of airports in a database of all US airports, and classes of proteins, introns and other patterns in DNA and protein sequence data, among others.

Key features are:

  • determines the number of classes automatically;
  • can use mixed discrete and real valued data;
  • can handle missing values;
  • processing time is roughly linear in the amount of data;
  • cases have probabilistic class membership;
  • allows correlation between attributes within a class;
  • generates reports describing the classes found; and
  • predicts "test" case class memberships from a "training" classification.
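Two of these features, mixed attribute types and missing values, come together in the likelihood of a case under one class. The sketch below scores a case with a Gaussian for each real attribute and a table of symbol probabilities for each discrete one. It assumes the attributes are independent within the class, so it omits the within-class correlation AutoClass can model. It also simply marginalizes missing values out, which is one standard treatment and not necessarily the one AutoClass uses. All names are hypothetical.

```python
import math
from typing import Mapping, Optional, Tuple, Union

Value = Optional[Union[float, str]]  # None marks a missing value


def class_log_likelihood(case: Mapping[str, Value],
                         real_params: Mapping[str, Tuple[float, float]],
                         discrete_params: Mapping[str, Mapping[str, float]]) -> float:
    """log P(case | class) for one class, assuming attributes are independent within it."""
    total = 0.0
    for name, value in case.items():
        if value is None:
            continue  # missing: summing/integrating over all values contributes log(1) = 0
        if name in real_params:
            mu, sd = real_params[name]  # Gaussian parameters for a real attribute
            total += -0.5 * math.log(2 * math.pi * sd * sd) - (value - mu) ** 2 / (2 * sd * sd)
        else:
            total += math.log(discrete_params[name][value])  # symbol probability
    return total


real = {"length": (5.0, 1.0)}
discrete = {"color": {"red": 0.25, "blue": 0.75}}
print(class_log_likelihood({"length": 5.0, "color": "red"}, real, discrete))
print(class_log_likelihood({"length": None, "color": "red"}, real, discrete))  # log(0.25)
```

To get class memberships for a new "test" case, add each class's log weight to these per-class scores and normalize them with Bayes' rule.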

Copyright © 2014 Linuxlandit & The Conqueror Penguin