Datasets of the allergen and non-allergen amino acid sequences used in the manuscript “Supervised identification of allergen-representative peptides for in silico detection of potentially allergenic proteins” by Björklund et al. (2005) Bioinformatics 21:39-50.
The zip-compressed file named “Sequence Data Files.zip” contains three separate sequence data files:
- The file named “Allergens.txt”, which consists of allergen amino acid sequences.
- The two files named “NonTrain.txt” and “NonTest.txt”, which consist of non-allergen sequences for training and validation, respectively.
The sequences are named by their SwissProt, TrEMBL or Entrez entries, and the entries were downloaded from either the SWALL or Entrez database.
A link to “Sequence Data Files.zip” (WinZip) can be found on the right of this page.
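For readers who want to load the three files programmatically, the following is a minimal sketch rather than a documented procedure. The file names are those listed above. Everything else is an assumption: that the files are FASTA-style text (a ">" header line followed by sequence lines), that Latin-1 decoding is acceptable, and the helper names `parse_fasta` and `load_datasets`, which are hypothetical and not part of the published material.

```python
import zipfile

# File names as listed in the description above.
EXPECTED_FILES = ("Allergens.txt", "NonTrain.txt", "NonTest.txt")


def parse_fasta(text):
    """Parse FASTA-style text into a list of (name, sequence) pairs.

    Assumption: each record starts with a '>' header line followed by
    one or more sequence lines. The real files may use another layout.
    """
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def load_datasets(archive):
    """Return {file name: [(name, sequence), ...]} for the data files found.

    `archive` is a path or a binary file-like object for the zip archive.
    Members are matched by base name, so a folder prefix inside the
    archive is tolerated. Latin-1 decoding is an assumption.
    """
    datasets = {}
    with zipfile.ZipFile(archive) as zf:
        for member in zf.namelist():
            base = member.rsplit("/", 1)[-1]
            if base in EXPECTED_FILES:
                text = zf.read(member).decode("latin-1")
                datasets[base] = parse_fasta(text)
    return datasets
```

Calling `load_datasets("Sequence Data Files.zip")` would then give one list of records per file, for example `datasets["Allergens.txt"]` for the allergen sequences. If the files turn out not to be FASTA-style, only `parse_fasta` needs to change.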
A substantially improved method, including new datasets, is available here: Computational Detection of Allergenic ...