Datasets of the allergen and non-allergen amino acid sequences used in the manuscript “Computational Detection of Allergenic Proteins Reaches a New Level of Accuracy with In Silico Variable-length Peptide Extraction and Machine Learning” by Soeria-Atmadja et al. (2006) Nucl Acids Res 34:3779-3793.
The zip-compressed file “Datasets.zip” contains a README file and four separate directories of sequence files, named “Training Datasets For SwissProt Estimation”, “Datasets For Performance Evaluation (Holdout)”, “Datasets For Parameter selection (CV)” and “Datasets For Family Evaluation”. More information on the sequence files in these directories is given in the README file.
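As a quick check after downloading, the top-level contents of the archive can be listed without unpacking it. The following is only a minimal sketch using Python's standard `zipfile` module; the helper name `top_level_entries` is hypothetical, and it assumes the archive has been saved locally as `Datasets.zip`. Whether the four folders sit at the root of the archive or inside a wrapping folder is not stated here, so the output may differ from the list of names above.

```python
import zipfile
from pathlib import PurePosixPath


def top_level_entries(archive):
    """Return the sorted top-level names (files and folders) in a zip archive.

    `archive` may be a path or a binary file-like object.
    Hypothetical helper for illustration; not part of the dataset.
    """
    with zipfile.ZipFile(archive) as zf:
        # Zip member names always use "/" as separator, so the first
        # path component is the top-level file or folder name.
        return sorted({PurePosixPath(name).parts[0]
                       for name in zf.namelist() if name})


# Example usage (assumes the file is in the current working directory):
#     for entry in top_level_entries("Datasets.zip"):
#         print(entry)
```

The sequence files themselves are described in the README, so that file is the place to look before parsing anything.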