Most of the information in the original Weka documentation also applies to AstroWeka; this website mainly deals with the differences.
When loading VOTables there are a few things to take into consideration:
- All numerical data will be converted to double precision floating point values.
- Weka needs to be specifically told if an attribute is categorical.
The first point generally isn't a problem when working with data inside Weka; it only becomes one when Weka is used as part of a work flow involving other tools. For example, floating point numbers can't be reliably tested for exact equality and have to be tested over an interval, so if you process a VOTable using AstroWeka and then try to extract points based on a long integer id, the ids may no longer match. Currently the best way to deal with ids is to edit the VOTable header so that they are read in as strings.
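The id problem can be reproduced outside AstroWeka. The following is a minimal sketch in Python, whose floats are IEEE 754 double precision values; the id values are made up for illustration and do not come from any real catalogue.

```python
# Hypothetical 64-bit source ids (illustrative values only).
id_a = 9007199254740993  # 2**53 + 1
id_b = 9007199254740992  # 2**53

# Convert to double precision, as happens to numeric columns on load.
as_double_a = float(id_a)
as_double_b = float(id_b)

# Two distinct ids collapse to the same double value.
ids_collide = as_double_a == as_double_b

# Converting back to an integer does not recover the original id.
round_trip_ok = int(as_double_a) == id_a

# Keeping the id as a string preserves every digit.
string_round_trip_ok = int(str(id_a)) == id_a

print(ids_collide, round_trip_ok, string_round_trip_ok)  # True False True
```

Ids small enough to fit in the 53-bit significand of a double survive the conversion, so the problem only shows up with large identifiers.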
AstroWeka has three graphical user interfaces, which extend Weka's GUIs with Virtual Observatory tools. They are accessed from the GUIChooser.
The AstroExplorer GUI
The AstroExplorer provides an interactive way to extract data from AstroGrid and experiment with different machine learning tools on it.
All of Weka's machine learning tools are available from the AstroExplorer, and
can be accessed and configured using menus and forms. It provides the most
convenient way to quickly set up and evaluate a machine learning task.
The Experimenter GUI
Often, finding the best learning scheme for a given task is a matter of trial and error. Several techniques will need to be tested with different parameters, and their results analyzed to find the most suitable one. The Experimenter automates this process: it can queue up multiple machine learning algorithms to be run on multiple data sets, and collect statistics on their performance.
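The idea of running every scheme on every data set and collecting the results can be sketched in a few lines of Python. This is only an illustration of the pattern, not how the Experimenter is implemented; the data sets, the threshold "schemes" and all names are hypothetical.

```python
from itertools import product
from statistics import mean

# Toy data sets of (feature, label) pairs. Purely illustrative.
datasets = {
    "set_a": [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1)],
    "set_b": [(0.2, 0), (0.3, 0), (0.7, 1), (0.8, 1)],
}

# Hypothetical "learning schemes": one threshold rule per parameter value.
def make_threshold_classifier(threshold):
    return lambda x: 1 if x >= threshold else 0

schemes = {
    "threshold_0.25": make_threshold_classifier(0.25),
    "threshold_0.50": make_threshold_classifier(0.50),
}

def accuracy(classifier, records):
    """Fraction of records whose predicted label matches the true label."""
    return mean(1.0 if classifier(x) == y else 0.0 for x, y in records)

# Run every scheme on every data set and collect the scores in one table.
results = {
    (scheme_name, data_name): accuracy(classifier, records)
    for (scheme_name, classifier), (data_name, records)
    in product(schemes.items(), datasets.items())
}

# Pick the scheme with the best average accuracy across all data sets.
best_scheme = max(
    schemes,
    key=lambda name: mean(results[(name, d)] for d in datasets),
)
```

Here `results` plays the role of the collected statistics: one score per combination of scheme and data set, which can then be compared or summarised.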
The Knowledge Flow GUI
The Knowledge Flow provides a work flow type environment for AstroWeka. It provides an alternative way of using AstroWeka for those who like to think in terms of data flowing through a system. In addition, this interface can sometimes be more efficient than the Experimenter, as it can be used to perform some tasks on data sets one record at a time without loading the entire set into memory.
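Processing a data set one record at a time can be illustrated with a short Python sketch. It is not AstroWeka code; the column names and values are invented, and an in-memory string stands in for a large file on disk.

```python
import csv
import io

# Stand-in for a large file; a real data set would be opened with open().
raw = io.StringIO("id,magnitude\n1,17.5\n2,18.0\n3,16.0\n")

def stream_records(handle):
    """Yield one record at a time instead of building a list of all rows."""
    for row in csv.DictReader(handle):
        yield row

count = 0
running_mean = 0.0
for record in stream_records(raw):
    count += 1
    value = float(record["magnitude"])
    # Incremental mean update: only the current record is held in memory.
    running_mean += (value - running_mean) / count
```

Only tasks that can be updated incrementally, such as the running mean here, benefit from this approach; anything that needs to see the whole data set at once still has to load it.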