Detect whether an audio file is actually a song or just a podcast, interview, lecture, or random YouTube rip. Returns a confidence score so you can gate your pipeline.
Built as a companion to anysong — since anysong can pull from YouTube, not everything that comes back is music.
notasong check ~/music/lollipop_by_lil_wayne.mp3
# => song confidence=0.94
notasong check ~/downloads/joe_rogan_clip.mp3
# => not_song confidence=0.12

notasong analyzes audio files using ffmpeg/ffprobe and scores them across multiple signals that distinguish music from speech and non-music content.
| Signal | What it measures | Music vs Speech |
|---|---|---|
| Spectral spread | How energy is distributed across frequencies | Music uses the full spectrum; speech clusters in 300Hz-3kHz |
| Beat regularity | Presence of a consistent rhythmic pattern | Songs have steady BPM; podcasts don't |
| Harmonic ratio | Tonal vs noisy content | Instruments are harmonic; speech is mixed |
| Dynamic range | Loudness variation over time | Produced music is compressed; speech has wide swings |
| Silence ratio | Percentage of near-silent segments | Podcasts have conversational pauses; songs don't |
| Duration | Track length | Short songs are still songs; podcasts/lectures run long |
| Zero-crossing rate | How often the waveform crosses zero | Speech has higher ZCR variability than music |
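To make one of the signals in the table concrete, zero-crossing rate can be computed directly from decoded PCM samples. The sketch below is only an illustration of the measurement: the function name `zeroCrossingRate` is hypothetical, and it is not taken from notasong's `zcr.go`.

```go
package main

import "fmt"

// zeroCrossingRate returns the fraction of adjacent sample pairs whose signs
// differ. Hypothetical helper for illustration; not notasong's actual code.
func zeroCrossingRate(samples []float64) float64 {
	if len(samples) < 2 {
		return 0
	}
	crossings := 0
	for i := 1; i < len(samples); i++ {
		// A crossing happens when one sample is negative and its neighbor is not.
		if (samples[i-1] >= 0) != (samples[i] >= 0) {
			crossings++
		}
	}
	return float64(crossings) / float64(len(samples)-1)
}

func main() {
	alternating := []float64{1, -1, 1, -1, 1} // flips sign at every step
	steady := []float64{0.2, 0.5, 0.9, 0.4}   // never changes sign
	fmt.Println(zeroCrossingRate(alternating), zeroCrossingRate(steady))
}
```

Since the table describes ZCR *variability*, a detector along these lines would presumably compute this value per short frame and then look at how much it varies across frames.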
Each signal produces a sub-score between 0.0 and 1.0. The final confidence is a weighted average. A file scoring >= 0.80 is classified as a song.
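The scoring rule described here can be sketched in a few lines of Go. The weights are the ones documented in this README; everything else (the `weightedSignal` type, the `confidence` and `classify` functions, and the example sub-scores) is an assumption made for illustration and may not match the real `analyzer` package.

```go
package main

import "fmt"

// weightedSignal pairs a signal name with its weight. Hypothetical type.
type weightedSignal struct {
	name   string
	weight float64
}

// weights uses the values documented in this README.
var weights = []weightedSignal{
	{"spectral_spread", 0.20},
	{"beat_regularity", 0.25},
	{"harmonic_ratio", 0.20},
	{"dynamic_range", 0.10},
	{"silence_ratio", 0.10},
	{"duration", 0.10},
	{"zcr", 0.05},
}

// confidence returns the weighted average of the sub-scores.
// A signal missing from scores contributes 0 (an assumption of this sketch).
func confidence(scores map[string]float64) float64 {
	var sum, total float64
	for _, w := range weights {
		sum += w.weight * scores[w.name]
		total += w.weight
	}
	return sum / total
}

// classify applies the threshold: scores at or above it count as a song.
func classify(conf, threshold float64) string {
	if conf >= threshold {
		return "song"
	}
	return "not_song"
}

func main() {
	// Made-up sub-scores, purely for demonstration.
	scores := map[string]float64{
		"spectral_spread": 0.90, "beat_regularity": 0.95, "harmonic_ratio": 0.90,
		"dynamic_range": 0.80, "silence_ratio": 1.00, "duration": 0.90, "zcr": 0.80,
	}
	conf := confidence(scores)
	fmt.Printf("%s confidence=%.2f\n", classify(conf, 0.80), conf)
	// prints: song confidence=0.91
}
```

Dividing by the total weight keeps the result in the 0.0 to 1.0 range even if the weights are later tuned and no longer sum to exactly 1.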
notasong/
  cmd/                # CLI (cobra)
    root.go
    check.go          # `notasong check <file>` command
    batch.go          # `notasong batch <dir>` command
  analyzer/
    analyzer.go       # Orchestrates all detectors, produces final score
    spectral.go       # Spectral spread + harmonic ratio via ffprobe
    rhythm.go         # Beat detection via onset analysis
    dynamics.go       # Dynamic range + silence ratio
    zcr.go            # Zero-crossing rate analysis
    duration.go       # Duration heuristic
    types.go          # Report struct, signal weights, thresholds
  ffutil/
    probe.go          # Wrapper around ffprobe JSON output
    extract.go        # Raw PCM / stats extraction via ffmpeg
  main.go
  go.mod
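The layout lists `ffutil/probe.go` as a wrapper around ffprobe's JSON output. A minimal sketch of what such a wrapper might look like follows. The names `probeOutput`, `parseDuration`, and `probeDuration` are hypothetical and not taken from the repository; the ffprobe flags (`-v quiet -print_format json -show_format`) are standard ffprobe options.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"os/exec"
	"strconv"
)

// probeOutput models only the part of ffprobe's JSON used here.
// ffprobe reports the duration as a string of seconds.
type probeOutput struct {
	Format struct {
		Duration string `json:"duration"`
	} `json:"format"`
}

// parseDuration extracts the duration in seconds from ffprobe JSON.
func parseDuration(raw []byte) (float64, error) {
	var out probeOutput
	if err := json.Unmarshal(raw, &out); err != nil {
		return 0, fmt.Errorf("decode ffprobe output: %w", err)
	}
	return strconv.ParseFloat(out.Format.Duration, 64)
}

// probeDuration runs ffprobe on path and returns the duration in seconds.
func probeDuration(path string) (float64, error) {
	cmd := exec.Command("ffprobe", "-v", "quiet",
		"-print_format", "json", "-show_format", path)
	raw, err := cmd.Output()
	if err != nil {
		return 0, fmt.Errorf("run ffprobe: %w", err)
	}
	return parseDuration(raw)
}

func main() {
	if len(os.Args) > 1 {
		seconds, err := probeDuration(os.Args[1])
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		fmt.Printf("%s: %.1fs\n", os.Args[1], seconds)
		return
	}
	// With no file argument, parse a canned sample so the sketch runs
	// even where ffmpeg is not installed.
	sample := []byte(`{"format":{"duration":"163.456000"}}`)
	seconds, err := parseDuration(sample)
	fmt.Println(seconds, err)
}
```

Keeping the JSON parsing separate from the `exec.Command` call makes the parsing logic testable without ffmpeg on the machine.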
git clone https://github.com/damoahdominic/notasong.git
cd notasong
go build -o notasong .
sudo mv notasong /usr/local/bin/

- ffmpeg + ffprobe: `apt install ffmpeg` or `brew install ffmpeg`
No Python. No ML models. Just ffmpeg and a single Go binary.
notasong check song.mp3
# song confidence=0.91 file=song.mp3
notasong check --json song.mp3
# {"file":"song.mp3","classification":"song","confidence":0.91,"signals":{...}}

notasong batch ~/music/
# song 0.94 lollipop_by_lil_wayne.mp3
# song 0.89 wild_thoughts_by_rihanna.mp3
# not_song 0.23 random_yt_clip.mp3
notasong batch ~/music/ --threshold 0.80 --fail
# exits non-zero if any file scores below threshold

import "github.com/damoahdominic/notasong/analyzer"

report, err := analyzer.Check("track.mp3")
if err != nil { ... }
if report.IsSong(0.80) {
	fmt.Println("it's a song")
}

The end goal is for anysong to call notasong after each download to verify the result:
anysong download "Lil Wayne Lollipop"
# downloads → ~/music/lollipop_by_lil_wayne.mp3
# notasong check → 0.94 → keep
anysong download "some weird query"
# downloads → ~/music/some_weird_query.mp3
# notasong check → 0.35 → warn/skip/delete

Tested against a library of real audio files spanning hip-hop, pop, jazz, electronic, and ambient genres, including both full-length tracks and 30-second clips.
| Artist | Track | Duration | Score | Result |
|---|---|---|---|---|
| MF DOOM | Absolutely | 2:43 | 0.88 | song |
| Sarkodie | CEO Flow | 4:13 | 0.88 | song |
| Meduza | Don't Wanna Go Home | 2:38 | 0.88 | song |
| Migos | Hannah Montana | 3:33 | 0.88 | song |
| Lil Wayne | Love Me | 4:13 | 0.88 | song |
| Static Garden | Nightfall | 4:10 | 0.88 | song |
| George Benson | Breezin' | 5:42 | 0.81 | song |
| Midnight Mind | Sibilance | 3:32 | 0.80 | song |
8/8 full-length songs correctly classified (100%)
A 30-second clip of music is still music. Duration doesn't decide what's a song — the audio content does. Only recordings of people talking (podcasts, interviews, lectures) should be classified as not_song.
| Artist | Track | Duration | Result |
|---|---|---|---|
| Michael Jackson | Black or White | 0:30 | song |
| Drake | Hotline Bling | 0:30 | song |
| 50 Cent | Just a Lil Bit | 0:30 | song |
| 2Pac | Life Goes On | 0:30 | song |
| Lil Wayne | Lollipop | 0:30 | song |
| Lil Wayne | Mrs. Officer | 0:30 | song |
| Robyn | Sucker for Love | 0:30 | song |
| Stat Quo | Billion Bucks | 0:30 | song |
| Jay-Z | Renegade | 0:30 | song |
| George Benson | People Get Ready | 0:30 | song |
| Michael Jackson | Heal the World | 0:30 | song |
- Short songs are songs. A 30-second clip of Lollipop is still Lollipop.
- Jazz (George Benson) scores slightly lower due to wider dynamic range and less compressed mastering — but still passes.
- The 0.80 default threshold separates music from speech/non-music content across all tested genres and durations.
spectral_spread 0.20
beat_regularity 0.25
harmonic_ratio 0.20
dynamic_range 0.10
silence_ratio 0.10
duration 0.10
zcr 0.05
Beat regularity gets the highest weight — it's the strongest single indicator that something is a produced song vs spoken word.
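One plausible way to turn "steady rhythm" into a number is to look at how evenly spaced the detected onsets are. The sketch below assumes onset timestamps (in seconds) are already available and scores them by the coefficient of variation of the gaps between them. This is an illustrative assumption, not the algorithm in `rhythm.go`, and `beatRegularity` is a hypothetical name.

```go
package main

import (
	"fmt"
	"math"
)

// beatRegularity maps onset timestamps to a score in [0, 1]:
// 1 means perfectly even spacing, 0 means highly irregular spacing.
func beatRegularity(onsets []float64) float64 {
	if len(onsets) < 3 {
		return 0 // too few onsets to judge regularity
	}
	// Collect the gaps between consecutive onsets and their mean.
	intervals := make([]float64, 0, len(onsets)-1)
	var mean float64
	for i := 1; i < len(onsets); i++ {
		d := onsets[i] - onsets[i-1]
		intervals = append(intervals, d)
		mean += d
	}
	mean /= float64(len(intervals))
	if mean <= 0 {
		return 0
	}
	// Population variance of the gaps.
	var variance float64
	for _, d := range intervals {
		variance += (d - mean) * (d - mean)
	}
	variance /= float64(len(intervals))
	// Coefficient of variation: spread of the gaps relative to their mean.
	cv := math.Sqrt(variance) / mean
	return math.Max(0, 1-cv)
}

func main() {
	steady := []float64{0.0, 0.5, 1.0, 1.5, 2.0, 2.5} // evenly spaced, 120 BPM
	speech := []float64{0.0, 0.2, 1.4, 1.6, 3.9, 4.1} // bursts and pauses
	fmt.Printf("steady=%.2f speech=%.2f\n",
		beatRegularity(steady), beatRegularity(speech))
}
```

Evenly spaced onsets give a coefficient of variation near zero and therefore a score near 1, while the clustered, pause-heavy pattern typical of conversation pushes the score toward 0.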
MIT