uORF4u is a bioinformatics tool for annotating conserved uORFs in the 5′ upstream sequences of a user-defined protein of interest or a set of protein homologues. It can also be used to find small ORFs in a set of nucleotide sequences.
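The basic idea of finding small ORFs in a nucleotide sequence can be illustrated with a short sketch. It assumes a simple forward-strand scan from each start codon to the first in-frame stop codon (TAA, TAG, TGA). The function `find_orfs` and its parameters are hypothetical and are not part of uORF4u's API or its actual implementation.

```python
# Illustrative ORF scanner (hypothetical helper, not uORF4u's own code).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, start_codons=("ATG",), min_codons=1):
    """Return (start, end, orf_sequence) tuples for forward-strand ORFs.

    `start` is the 0-based index of the start codon's first base and `end`
    is the exclusive index just past the stop codon. `min_codons` counts
    codons from the start codon up to, but not including, the stop codon.
    Start codons without an in-frame stop before the end of `seq` are skipped.
    """
    seq = seq.upper()
    orfs = []
    for i in range(len(seq) - 2):
        if seq[i:i + 3] not in start_codons:
            continue
        # walk in-frame codons until the first stop codon
        for j in range(i + 3, len(seq) - 2, 3):
            if seq[j:j + 3] in STOP_CODONS:
                if (j - i) // 3 >= min_codons:
                    orfs.append((i, j + 3, seq[i:j + 3]))
                break
    return orfs
```

Passing `start_codons=("ATG", "TTG", "CTG")` mimics the idea behind the alternative start codon option for the Standard genetic code; this sketch ignores strand, Kozak or SD context and any overlap with the main CDS.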

The output includes publication-quality figures in vector graphics format: multiple sequence alignments, sequence logos and locus annotations of the predicted uORFs.

Source code is available on GitHub.

We also recommend visiting the tool's documentation website, which provides an example-driven guide to the command-line version and the Python API.


If you find uORF4u useful, please cite:
Artyom A. Egorov, Gemma C. Atkinson. uORF4u: a tool for annotation of conserved upstream open reading frames. bioRxiv 2022.10.27.514069; doi: 10.1101/2022.10.27.514069

Authors & Contact

uORF4u is developed by Artyom Egorov at the Atkinson Lab, Department of Experimental Medical Science, Lund University, Sweden.
We are open to suggestions on how to extend and improve uORF4u's functionality. Please don't hesitate to share your ideas or feature requests.

Please contact us by e-mail or use GitHub Issues to report any technical problems related to uORF4u. You can also use the GitHub Discussions section to share ideas or feature requests.

Submission form

You can try an example with a list of six ErmC homologues.

Mandatory arguments
Select only one of the allowed input formats.
RefSeq protein accession number, including the version suffix (pattern: ^[ANYXW]P_[0-9]+.[0-9]$).
File with a list of protein accession numbers.
File format: .txt or .tsv, one accession number per line, no header.
FASTA file with upstream sequences.
File format: .fa or .fasta; at most 200 sequences, each no longer than 1000 nt.
Space- or line-break-separated list of protein accession numbers. Maximum number of values: 200.
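The input limits above can be checked locally before submission. The sketch below is a hedged illustration, not the server's validation code: the helper names are hypothetical, and it escapes the dot in the accession pattern (`\.`) on the assumption that a literal period before the version number is intended.

```python
import re

# Accession pattern from the form, with the dot escaped so that it only
# matches a literal period before the version number (assumption).
ACCESSION_RE = re.compile(r"^[ANYXW]P_[0-9]+\.[0-9]$")
MAX_ITEMS = 200      # max accessions or FASTA records, per the form
MAX_SEQ_LEN = 1000   # max length (nt) of each FASTA sequence


def check_accessions(text):
    """Split a space- or line-break-separated list and validate each accession."""
    accessions = text.split()
    if len(accessions) > MAX_ITEMS:
        raise ValueError(f"too many accessions: {len(accessions)} > {MAX_ITEMS}")
    bad = [acc for acc in accessions if not ACCESSION_RE.match(acc)]
    if bad:
        raise ValueError(f"invalid accession(s): {bad}")
    return accessions


def parse_fasta(text):
    """Parse FASTA text into {header: sequence}, enforcing the size limits."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            if header in records:
                raise ValueError(f"duplicate header: {header}")
            records[header] = ""
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            records[header] += line.upper()
    if len(records) > MAX_ITEMS:
        raise ValueError(f"too many sequences: {len(records)} > {MAX_ITEMS}")
    too_long = [h for h, s in records.items() if len(s) > MAX_SEQ_LEN]
    if too_long:
        raise ValueError(f"sequences longer than {MAX_SEQ_LEN} nt: {too_long}")
    return records
```

With an unescaped dot, a string with no version separator such as `WP_0000011` would also match, because `.` matches any character.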
Optional arguments
Maximum number of BLASTp hits in the homologue search. Allowed values: [10:200].
Identity cutoff for BLASTp hits relative to your query protein. Allowed values: [0:1].
Maximum number of assemblies to analyse when an identical protein sequence is found in several assemblies. Allowed values: [0:3].
Assemblies list file: a filtered table of assemblies generated by a uORF4u run.
Length of upstream sequences to retrieve, in nt. Allowed values: [50:1000].
Length of downstream sequences (the gene's CDS) to retrieve, in nt. Allowed values: [0:300].
Retrieve sequence annotation (to show when a predicted ORF overlaps with a previously annotated ORF).
Include alternative start codons in uORF annotation. The list is based on the selected NCBI genetic code (Standard code: ATG, TTG, CTG).
Shine-Dalgarno (SD) sequence annotation is based on the Gibbs free energy of the SD/anti-SD interaction. It is automatically deactivated if 'eukaryotes mode' is selected.
Alignment type used in the conservation analysis step.
Fraction of sequences that must contain the ORF for it to be called conserved. Allowed values: [0:1].
Configuration file with additional parameters. You can choose a premade config file or upload your own based on a template.
Upload a file if the 'Uploaded' option is selected for the config file type. Use one of the premade config files as a template: Bacteria, Eukaryotes.
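Two of the options above, the upstream/downstream lengths and the conservation fraction, can be illustrated with a short sketch. It assumes 0-based, half-open CDS coordinates on a contig string, clips regions at the contig ends and caps the downstream part at the CDS length. `extract_region` and `is_conserved` are hypothetical helpers; in particular, the sketch does not model how uORF4u decides that ORFs in different sequences correspond to one another, it only applies the cutoff to a given presence table.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string (N is kept as N)."""
    return seq.translate(str.maketrans("ACGTacgtNn", "TGCAtgcaNn"))[::-1]


def extract_region(genome, cds_start, cds_end, strand, upstream, downstream=0):
    """Return `upstream` nt before the CDS start plus up to `downstream` nt of the CDS.

    Coordinates are 0-based, half-open [cds_start, cds_end) on the forward
    strand of `genome`. The result is written 5'->3' relative to the gene.
    """
    if strand == "+":
        left = max(0, cds_start - upstream)
        right = min(cds_end, cds_start + downstream)  # stay inside the CDS
        return genome[left:right]
    if strand == "-":
        # on the minus strand the gene's 5' end is at cds_end
        left = max(cds_start, cds_end - downstream)
        right = min(len(genome), cds_end + upstream)
        return reverse_complement(genome[left:right])
    raise ValueError("strand must be '+' or '-'")


def is_conserved(orf_present, cutoff):
    """Call an ORF conserved if the fraction of sequences containing it >= cutoff.

    `orf_present` maps a sequence ID to True/False (ORF found or not).
    """
    if not 0 <= cutoff <= 1:
        raise ValueError("cutoff must be in [0, 1]")
    if not orf_present:
        return False
    fraction = sum(orf_present.values()) / len(orf_present)
    return fraction >= cutoff
```

For example, an ORF found in two of three homologues has a fraction of about 0.67, so it counts as conserved with a cutoff of 0.6 but not with 0.7.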
Run parameters
Please enter your e-mail address to receive a link to your results.
Will be used in the output folder name.
Perform calculations in our high-priority queue 🚀