Additional insight into this database is provided in this note from NASA Ames' John Stutz to UCI's John Gennari: ====================================================================== Date: Wed 25 Apr 90 18:27:22-PST From: JOHN STUTZ To: gennari@ICS.UCI.EDU Subject: RE: [Resent-MM Mail] Astral data. John There was an error in the IRAS-LRS data documentation. We ignored the presence of 5 fields of scaling information used to decompress the original data. This probably occurred because, once we had done the decompression, we never again considered these fields. The corrected documentation has been sent to Dave Aha and placed in the database. I include the changes below. With respect to the choice of attributes for clustering, the information is all in the location and spectral intensity values. The 'standard IRAS-LRC classification was based on spectral features. These include the peaks, plateaus and valleys, and their locations and relative magnitudes. They also considered the trend in intensity at each end of each spectra. If you could extract these features, you might expect to find classes fairly close to the official ones. But that could be a fairly complex modeling process that would be very specific to the field of spectral analysis. The plain truth is that this is not a good database for making predictions from a small set of features. AutoClass did well, despite it's vastly oversimplified probability model, because it used all 93 spectral attributes and 10 times as many instances. I doubt you can get anything interesting from the sky locations alone. However, the greatest variations in our classes were in the region of 8 to 11 microns. So this is a good region to concentrate on. You might also look for variations of the ratio of the higher and lower wavelength intensities. Given that the high magnitude declination sources mostly represent those which are relatively close, you might find declination dependencies. AutoClass found several pairs in which the two classes had very similar class mean spectra and quite different declination distributions. Since we did not use the location attributes there must be a relation between spectrum and position. Wishing you the best of success with your work. John Stutz - ------------------------------------------------------------------- 4. Relevant Information: The Infra-Red Astronomy Satellite (IRAS) was the first attempt to map the full sky at infra-red wavelengths. This could not be done from ground observatories because large portions of the infra-red spectrum is absorbed by the atmosphere. The primary observing program was the full high resolution sky mapping performed by scanning at 4 frequencies. The Low Resolution Observation (IRAS-LRS) program observed high intensity sources over two continuous spectral bands. This database derives from a subset of the higher quality LRS observations taken between 12h and 24h right ascension. This database contains 531 high quality spectra derived from the IRAS-LRS database. The original data contained 100 spectral measurements in each of two overlapping bands. Of these, 44 blue band and 49 red band channels contain usable flux measurements. Only these are included here. The original spectral intensities values are compressed to 4-digits, and each spectrum includes 5 rescaling parameters. We have used the LRS specified algorithm to rescale these to units of spectral intensity (Janskys). Total intensity differences have been eliminated by normalizing each spectrum to a mean value of 5000. This database was originally obtained for use in development and testing of our AutoClass system for Bayesian classification. We have not retained any results from this development, having concentrated our efforts of a 5425 element version of the same data. Our classifications were based upon simultaneous modeling of all 93 spectral intensities. With the larger database we were able to find classes that correspond well with known spectral types associated with particular stellar types. We also found classes that match with the spectra expected of certain stellar processes under investigation by Ames astronomers. These classes have considerably enlarged the set of stars being investigated by those researchers. 5. Number of Instances: 531 6. Number of Attributes: 103 (including the 10-attribute "header") 7. Attribute Information: 1. LRS-name: (Suspected format: 5 digits, "+" or "-", 4 digits) 2. LRS-class: integer - The LRS-class values range from 0 - 99 with the 10's digit giving the basic class and the 1's digit giving the subclass. These classes are based on features (peaks, valleys, and trends) of the spectral curves. 3. ID-type: integer 4. Right-Ascension: float - Astronomical longitude. 1h = 15deg 5. Declination: float - Astronomical lattitude. -90 <= Dec <= 90 6. Scale Factor: float - Proportional to source strength 7. Blue base 1: integer - linear rescaling coefficient 8. Blue base 2: integer - linear rescaling coefficient 9. Red base 1: integer - linear rescaling coefficient 10. Red base 2: integer - linear rescaling coefficient 11-54: fluxes from the following 44 blue-band channel wavelengths: 55-103: fluxes from the following 49 red-band channel wavelengths: