Skip to content

GFF3 coordinate offsets don't make sense #112

@jblachly

Description

@jblachly

When looking at this unit test:

unittest{
import dhtslib.coordinates;
auto rec = GTFRecord("chr1\tHAVANA\tgene\t11869\t14409\t.\t+\t.\tID \"ENSG00000223972.5\" ; gene_id ENSG00000223972.5 ; gene_id ENSG00000223972.5 ; gene_type transcribed_unprocessed_pseudogene ; gene_name DDX11L1 ; level 2 ; havana_gene OTTHUMG00000000961.2"); // @suppress(dscanner.style.long_line)
auto rec_neg= GTFRecord("chr1\tHAVANA\tgene\t11869\t14409\t.\t-\t.\tID \"ENSG00000223972.5\" ; gene_id ENSG00000223972.5 ; gene_id ENSG00000223972.5 ; gene_type transcribed_unprocessed_pseudogene ; gene_name DDX11L1 ; level 2 ; havana_gene OTTHUMG00000000961.2"); // @suppress(dscanner.style.long_line)
assert(rec.seqid=="chr1");
assert(rec.source=="HAVANA");
assert(rec.type=="gene");
assert(rec.start==11_869);
assert(rec.end==14_409);
assert(rec.score==-1.0);
assert(rec.strand()=='+');
assert(rec.phase==-1);
assert(rec["ID"] == "ENSG00000223972.5");
assert(rec["gene_id"] == "ENSG00000223972.5");
assert(rec["gene_type"] == "transcribed_unprocessed_pseudogene");
assert(rec["gene_name"] == "DDX11L1");
assert(rec["level"] == "2");
assert(rec["havana_gene"] == "OTTHUMG00000000961.2");
assert(rec.length == 2541);
assert(rec.relativeStart == 1);
assert(rec.relativeEnd == 2541);
// Test forward and backward offsets
assert(rec.coordinateAtOffset(2) == 11_870);
assert(rec_neg.coordinateAtOffset(2) == 14_408);

I was confused that an offset of 2 would translate from 11869 -> 11870

Looking at the code its clear that this is a mistake:

    /// Genomic coordinate at offset into feature, taking strandedness into account
    @property coordinateAtOffset(long offset) const
    {
        // GFF3 features are 1-based coordinates
        assert(offset > 0);
        offset--;   // necessary in a 1-based closed coordinate system
        
        // for - strand count from end; for +, ., and ? strand count from start
        immutable auto begin = (this.strand == '-' ? this.end : this.start);


        // for - strand count backward; for +, ., and ? strand count forward
        immutable auto direction = (this.strand == '-' ? -1 : 1);


        return OB(begin + (direction * offset));
    }

Agree that GFF3 is 1-based, and agree that GFF3 features are 1-based coordinates, but since you are dealing with OFFSETS (deltas), zero is still a reasonable input value.

Function should be revised to take >= 0 values, and should not offset--. Tests will then have to be updated.

Metadata

Metadata

Labels

bugSomething isn't working

Type

No type

Projects

No projects

Relationships

None yet

Development

No branches or pull requests

Issue actions