Supplementary data for
Kalman filtering for disease-state estimation from microarray data
Abstract
Motivation:
In this paper we propose using the Kalman filter as a pre-processing step in
microarray-based molecular diagnosis. Incorporating the expression covariance
between genes is important in such classification problems, since this
represents the functional relationships that govern tissue state. Failing to
fulfil such requirements may result in biologically implausible class
prediction models. Here we show that employing the Kalman filter to remove
noise (while retaining meaningful covariance and thus being able to estimate
the underlying biological state from microarray measurements) yields linearly
separable data suitable for most classification algorithms.
Results:
We demonstrate the utility and performance of the Kalman filter as a robust
disease-state estimator on publicly available binary and multiclass microarray
datasets in combination with the most widely used classification methods to
date. Moreover, using popular graphical representation schemes we show that our
filtered datasets also have an improved visualization capability.
Contact:
kelli@nucleus.szbk.u-szeged.hu
Code (in Matlab)
Download source Matlab code (.zip)
Datasets
A short description of the datasets used is given in the Excel file [datasets.xsl].
Results
SVM results
ANN results
1NN results
RF results
Most performance measures (e.g. ROC, specificity, recall, true positive rate) are defined for two-class classification problems, so for a multi-class dataset the scores are calculated separately for each class. To obtain a single score per dataset we averaged these per-class scores; the suffix ` - mean` denotes this average. The training time and the testing time for a whole dataset are both given in seconds.
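The per-class averaging described above can be sketched as follows. This is a minimal illustration in Python rather than the authors' Matlab code, and it assumes each class is scored one-vs-rest and the per-class scores are averaged with equal weight. The function names and the toy labels are hypothetical and are not taken from the supplementary files.

```python
def one_vs_rest_counts(y_true, y_pred, positive):
    """Return (TP, FP, TN, FN), treating `positive` as the positive class
    and every other class as negative."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t != positive and p == positive:
            fp += 1
        elif t != positive and p != positive:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def mean_scores(y_true, y_pred):
    """Compute recall and specificity per class, then average over classes."""
    classes = sorted(set(y_true))
    recalls, specificities = [], []
    for c in classes:
        tp, fp, tn, fn = one_vs_rest_counts(y_true, y_pred, c)
        # Guard against empty denominators (assumed convention: score 0.0).
        recalls.append(tp / (tp + fn) if (tp + fn) else 0.0)
        specificities.append(tn / (tn + fp) if (tn + fp) else 0.0)
    return {
        "Recall - mean": sum(recalls) / len(classes),
        "Specificity - mean": sum(specificities) / len(classes),
    }


# Toy three-class example (made-up labels, for illustration only).
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "B", "B", "B", "C", "A"]
scores = mean_scores(y_true, y_pred)
print(scores)
```

With equal weighting, a rare class influences the mean as much as a common one. Whether the reported scores use equal or class-size weighting is not stated here, so treat the equal weighting as an assumption.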
A short description of the performance measures used can be found, for example, in
t-test results
The significance test results for all performance measures are also downloadable as an Excel file [t-test.xls].
Name of selected features
The names of the features selected by RFE are also downloadable as a list in the file [features.zip].
Figures
figures for ALL-AML dataset (Golub et al., 1999)
figures for Breast Cancer (BC) dataset (van't Veer et al., 2002)
figures for Leukaemia dataset (Yeoh et al., 2002)
figures for Lung Cancer (LC) dataset (Gordon et al., 2002)
figures for MLL dataset (Armstrong et al., 2002)
figures for SRBCT dataset (Khan et al., 2001)
figures for Tumours (Various Tumour Types, VTT) dataset (Ramaswamy et al., 2001)
[Figure panels: images not reproduced here. Each of the seven figure sets contains panels for 2, 3, 5, 7, 10, 15, 20, 30, 50, 75 and 100 selected features and for all features. The total feature counts of the seven sets, in page order, are 7129, 24188, 10342, 12533, 12582, 2308 and 16063.]