This repository provides a trained aesthetic evaluation toolkit based on SongEval, the first large-scale, open-source dataset for human-perceived song aesthetics. The toolkit enables automatic scoring of generated songs across five perceptual aesthetic dimensions aligned with professional musician judgments.
- 🧠 Pretrained neural models for perceptual aesthetic evaluation
- 🎼 Predicts five aesthetic dimensions:
- Overall Coherence
- Memorability
- Naturalness of Vocal Breathing and Phrasing
- Clarity of Song Structure
- Overall Musicality
- 🎧 Accepts full-length songs (vocals + accompaniment) as input
- ⚙️ Simple inference interface
- 📦 Installable Python package for easy integration
Install the package in development mode:

```shell
git clone https://github.com/ASLP-lab/SongEval.git
cd SongEval
pip install -e .
```

Alternatively, clone the repository and install the dependencies:

```shell
git clone https://github.com/ASLP-lab/SongEval.git
cd SongEval
pip install -r requirements.txt
```

```python
import songeval

# Evaluate a single song (model loaded automatically on first use)
scores = songeval.evaluate_song("path/to/your/song.wav")
print(scores)
# Output: {'Coherence': 3.2456, 'Musicality': 3.1234, 'Memorability': 2.9876, 'Clarity': 3.4567, 'Naturalness': 3.1111}

# Evaluate multiple songs efficiently (model loaded once and reused)
evaluator = songeval.get_evaluator()
results = evaluator.evaluate_songs(["song1.wav", "song2.mp3", "song3.wav"])
print(results)

# Force CPU mode if needed
scores = songeval.evaluate_song("path/to/song.wav", use_cpu=True)
```

- Evaluate a single audio file:

```shell
songeval -i /path/to/audio.mp3 -o /path/to/output
```

- Evaluate a list of audio files:

```shell
songeval -i /path/to/audio_list.txt -o /path/to/output
```

- Evaluate all audio files in a directory:

```shell
songeval -i /path/to/audio_directory -o /path/to/output
```

- Force evaluation on CPU (⚠️ CPU evaluation may be significantly slower):

```shell
songeval -i /path/to/audio.wav -o /path/to/output --use_cpu
```

You can also run the evaluation script directly:

```shell
python eval.py -i /path/to/audio.mp3 -o /path/to/output
```

Evaluate a single song file efficiently.
Parameters:
- audio_path (str): Path to the audio file (.wav or .mp3)
- use_cpu (bool): Force CPU mode even if GPU is available

Returns:
- dict: Dictionary with scores for each dimension:
  - 'Coherence': Overall Coherence score
  - 'Musicality': Overall Musicality score
  - 'Memorability': Memorability score
  - 'Clarity': Clarity of Song Structure score
  - 'Naturalness': Naturalness of Vocal Breathing and Phrasing score
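The returned dictionary can be post-processed directly in plain Python. A minimal sketch (the `summarize_scores` helper is hypothetical, not part of the songeval API) that averages the five documented dimensions and flags the weakest one:

```python
# Hypothetical helper, NOT part of songeval: summarize the five-dimension
# score dict that evaluate_song is documented to return.
def summarize_scores(scores: dict) -> dict:
    """Return the mean score and the lowest-scoring dimension."""
    mean = sum(scores.values()) / len(scores)
    weakest = min(scores, key=scores.get)  # dimension with the smallest score
    return {"mean": round(mean, 4), "weakest": weakest}

# Example dict in the shape documented above
example = {
    "Coherence": 3.2456,
    "Musicality": 3.1234,
    "Memorability": 2.9876,
    "Clarity": 3.4567,
    "Naturalness": 3.1111,
}
print(summarize_scores(example))
# {'mean': 3.1849, 'weakest': 'Memorability'}
```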
Get a global evaluator instance. The model is loaded only once and reused.
Parameters:
- use_cpu (bool): Force CPU mode even if GPU is available

Returns:
- SongEvaluator: The evaluator instance
Evaluate multiple song files efficiently.
Parameters:
- audio_paths (list): List of paths to audio files

Returns:
- dict: Dictionary mapping file IDs to their scores
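Since the batch result is a mapping from file IDs to score dicts, common follow-up steps such as ranking need no extra tooling. A sketch (the `rank_by_dimension` helper and the sample scores are illustrative, not part of the songeval API):

```python
# Hypothetical post-processing, NOT part of songeval: rank the
# file-ID -> scores mapping returned by evaluate_songs on one dimension.
def rank_by_dimension(results: dict, dimension: str) -> list:
    """Return (file_id, score) pairs sorted best-first on one dimension."""
    return sorted(
        ((fid, scores[dimension]) for fid, scores in results.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )

# Illustrative results dict in the documented shape
results = {
    "song1": {"Musicality": 3.12, "Coherence": 3.25},
    "song2": {"Musicality": 3.40, "Coherence": 3.01},
    "song3": {"Musicality": 2.88, "Coherence": 3.33},
}
print(rank_by_dimension(results, "Musicality"))
# [('song2', 3.4), ('song1', 3.12), ('song3', 2.88)]
```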
This project is mainly organized by the Audio, Speech and Language Processing Lab (ASLP@NPU).
We sincerely thank the Shanghai Conservatory of Music for their expert guidance on music theory, aesthetics, and annotation design. We also thank AISHELL for helping organize the song annotations.
This project is released under the CC BY-NC-SA 4.0 license.
You are free to use, modify, and build upon it for non-commercial purposes, with attribution.
If you use this toolkit or the SongEval dataset, please cite the following:
@article{yao2025songeval,
title = {SongEval: A Benchmark Dataset for Song Aesthetics Evaluation},
author = {Yao, Jixun and Ma, Guobin and Xue, Huixin and Chen, Huakang and Hao, Chunbo and Jiang, Yuepeng and Liu, Haohe and Yuan, Ruibin and Xu, Jin and Xue, Wei and others},
journal = {arXiv preprint arXiv:2505.10793},
  year    = {2025}
}