Predicting protein functional association in microbial and viral genomes by analysis of conservation of genomic context.

Functional associations of proteins can be predicted by conservation of the genomic neighbourhood surrounding the gene encoding the protein of interest. Our tool FlaGs clusters neighbourhood-encoded proteins into homologous groups and outputs the identity of the groups, a graphical visualization of the gene neighbourhood and its conservation, and optionally, a phylogenetic tree annotated with flanking gene conservation.

Source code and documentation are available at the GitHub Page.

Output examples


If you find webFlaGs useful, please cite:
Chayan Kumar Saha, Rodrigo Sanches Pires, Harald Brolin, Maxence Delannoy, Gemma Catherine Atkinson, FlaGs and webFlaGs: discovering novel biology through the analysis of gene neighbourhood conservation, Bioinformatics, Volume 37, Issue 9, 1 May 2021, Pages 1312–1314; doi:10.1093/bioinformatics/btaa788

Authors & Contact

FlaGs was originally developed by Chayan Kumar Saha and Gemma C. Atkinson. It is now developed by the Atkinson FlaGs team: Jose Nakamoto, Artyom Egorov and Veda Bojar at the Department of Experimental Medical Science, Lund University, Sweden.

We are open to suggestions on how to extend and improve webFlaGs functionality. Please don't hesitate to share your ideas or feature requests.

Please contact us by e-mail or use GitHub Issues to report any technical problems related to FlaGs.

Submission form

You can try an example with a set of proteins.

To create an input file for FlaGs from the results of an online BlastP or PSI-Blast search at the NCBI, you can follow this guideline.

Mandatory arguments

Select only one of the allowed input formats.

RefSeq protein accession number.
In this case, BlastP will be run to find homologues in the RefSeq database. BlastP parameters can be changed below. (Pattern: ^[ANYXW]P_[0-9]+.[0-9]$)
Amino acid sequence in FASTA format, or as a bare sequence (without the >id line; it can span multiple lines). BlastP will be run to find homologues in the RefSeq database.
File with list of accession numbers.
File format: .txt or .tsv with no header. Each line holds either one accession number, or an assembly ID and an accession number separated by a space.
Line-break-separated list of protein accession numbers.
To specify an assembly for each protein, use the space-separated format: assembly_id protein_id on each line. The maximum number of entries is 200.
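Before submitting a list, it can help to check it locally. The following is a minimal sketch, not part of FlaGs or webFlaGs: the function name parse_entries and the example identifiers are hypothetical, and applying the RefSeq accession pattern quoted above to list entries is an assumption (the form states that pattern only for the single-accession input).

```python
import re

# Pattern copied verbatim from the form text above. Note that the "." is
# unescaped, so it matches any single character, not only a literal dot.
ACCESSION_RE = re.compile(r"^[ANYXW]P_[0-9]+.[0-9]$")
MAX_ENTRIES = 200  # limit stated on the form


def parse_entries(text):
    """Parse a line-break-separated list into (assembly_id, protein_id) tuples.

    assembly_id is None when a line holds only a protein accession.
    """
    entries = []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        fields = raw.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) == 1:
            assembly_id, protein_id = None, fields[0]
        elif len(fields) == 2:
            assembly_id, protein_id = fields
        else:
            raise ValueError(
                f"line {line_no}: expected 1 or 2 fields, got {len(fields)}"
            )
        # Assumption: list entries follow the same pattern as the single input.
        if not ACCESSION_RE.match(protein_id):
            raise ValueError(f"line {line_no}: unexpected accession {protein_id!r}")
        entries.append((assembly_id, protein_id))
    if len(entries) > MAX_ENTRIES:
        raise ValueError(f"{len(entries)} entries exceed the limit of {MAX_ENTRIES}")
    return entries


# Hypothetical identifiers, used for illustration only.
example = "WP_012345678.1\nGCF_000001234.1 WP_012345678.1\n"
entries = parse_entries(example)
```

Raising on the first malformed line keeps the sketch short; a real pre-check might instead collect all problems and report them together.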
Optional arguments
BLAST database to use for the BlastP search.
Our reduced BLAST database contains around 52 million protein sequences which were found in 13548 bacterial, 467 archaeal and 10449 viral genomes.
Maximum number of BlastP hits in the homologue search. Allowed values: [2:200]
E-value cutoff for the BlastP search with your query protein. This parameter is meaningful only for short query proteins.
Allowed values: [1-5]
Allowed values: [1-15]
For options that include a phylogenetic tree, the input accessions should be homologous.
We're sorry, this option is currently not available.
Run parameters
Please enter your e-mail address to receive a link to your results.
Will be used in output folder name.
Perform calculations in our high-priority queue 🚀