When looking at this unit test:
|
unittest{ |
|
import dhtslib.coordinates; |
|
auto rec = GTFRecord("chr1\tHAVANA\tgene\t11869\t14409\t.\t+\t.\tID \"ENSG00000223972.5\" ; gene_id ENSG00000223972.5 ; gene_id ENSG00000223972.5 ; gene_type transcribed_unprocessed_pseudogene ; gene_name DDX11L1 ; level 2 ; havana_gene OTTHUMG00000000961.2"); // @suppress(dscanner.style.long_line) |
|
auto rec_neg= GTFRecord("chr1\tHAVANA\tgene\t11869\t14409\t.\t-\t.\tID \"ENSG00000223972.5\" ; gene_id ENSG00000223972.5 ; gene_id ENSG00000223972.5 ; gene_type transcribed_unprocessed_pseudogene ; gene_name DDX11L1 ; level 2 ; havana_gene OTTHUMG00000000961.2"); // @suppress(dscanner.style.long_line) |
|
|
|
assert(rec.seqid=="chr1"); |
|
assert(rec.source=="HAVANA"); |
|
assert(rec.type=="gene"); |
|
assert(rec.start==11_869); |
|
assert(rec.end==14_409); |
|
assert(rec.score==-1.0); |
|
assert(rec.strand()=='+'); |
|
assert(rec.phase==-1); |
|
assert(rec["ID"] == "ENSG00000223972.5"); |
|
assert(rec["gene_id"] == "ENSG00000223972.5"); |
|
assert(rec["gene_type"] == "transcribed_unprocessed_pseudogene"); |
|
assert(rec["gene_name"] == "DDX11L1"); |
|
assert(rec["level"] == "2"); |
|
assert(rec["havana_gene"] == "OTTHUMG00000000961.2"); |
|
|
|
assert(rec.length == 2541); |
|
assert(rec.relativeStart == 1); |
|
assert(rec.relativeEnd == 2541); |
|
|
|
// Test forward and backward offsets |
|
assert(rec.coordinateAtOffset(2) == 11_870); |
|
assert(rec_neg.coordinateAtOffset(2) == 14_408); |
I was confused that an offset of 2 would translate from 11869 -> 11870
Looking at the code its clear that this is a mistake:
/// Genomic coordinate at offset into feature, taking strandedness into account
@property coordinateAtOffset(long offset) const
{
// GFF3 features are 1-based coordinates
assert(offset > 0);
offset--; // necessary in a 1-based closed coordinate system
// for - strand count from end; for +, ., and ? strand count from start
immutable auto begin = (this.strand == '-' ? this.end : this.start);
// for - strand count backward; for +, ., and ? strand count forward
immutable auto direction = (this.strand == '-' ? -1 : 1);
return OB(begin + (direction * offset));
}
Agree that GFF3 is 1-based, and agree that GFF3 features are 1-based coordinates, but since you are dealing with OFFSETS (deltas), zero is still a reasonable input value.
Function should be revised to take >= 0 values, and should not offset--. Tests will then have to be updated.
When looking at this unit test:
dhtslib/source/dhtslib/gff/package.d
Lines 21 to 47 in ca928a7
I was confused that an offset of 2 would translate from 11869 -> 11870
Looking at the code its clear that this is a mistake:
Agree that GFF3 is 1-based, and agree that GFF3 features are 1-based coordinates, but since you are dealing with OFFSETS (deltas), zero is still a reasonable input value.
Function should be revised to take >= 0 values, and should not
offset--. Tests will then have to be updated.