
Variational Autoencoder-Based Synthesis of Marmoset Vocalizations Using Linear Spectrograms

Lookup NU author(s): Emma Woolgar, Dr Yuki Kikuchi

Downloads

Full text for this publication is not currently held within this repository. Alternative links are provided below where available.


Abstract

Synthetic voice generation for socially assistive robotics requires biologically validated approaches to ensure effective human-robot interaction. This paper presents a Variational Autoencoder (VAE) based system for generating species-specific vocalizations, with behavioral validation in marmosets. Our approach processes linear spectrograms through a symmetric encoder-decoder architecture with Kullback-Leibler (KL) divergence regularization and adaptive KL annealing. The system was trained on 18 marmoset ‘twitter’ calls and validated through controlled behavioral experiments with three adult female marmosets. Generated vocalizations achieved 86.79% Mel-Frequency Cepstral Coefficient (MFCC) similarity to natural calls and had a significant main effect on two marmoset behaviors (stationary behavior: χ² = 11.47, p = 0.04; leg-stand contact behavior: χ² = 12.12, p = 0.03), although the behavioral responses differed from those seen for the equivalent natural call type. The results demonstrate the feasibility of VAE-based vocalization synthesis while highlighting the importance of biological validation for developing emotionally appropriate synthetic voices in assistive robotics applications.
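
As a rough illustration of the KL-regularized objective mentioned in the abstract, the sketch below computes the KL term for a diagonal Gaussian posterior against a standard normal prior and scales it with a warm-up weight. It is not the authors' implementation: the abstract does not specify the adaptive annealing rule, so a simple linear warm-up is substituted here, and the names `kl_diag_gaussian`, `kl_weight`, `vae_loss`, `warmup_steps`, and `beta_max` are all hypothetical.

```python
import math


def kl_diag_gaussian(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) for one latent vector."""
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var)
    )


def kl_weight(step, warmup_steps, beta_max=1.0):
    """Linear warm-up of the KL weight from 0 to beta_max.

    Assumption: a stand-in for the paper's adaptive annealing, whose
    exact rule is not given in the abstract.
    """
    if warmup_steps <= 0:
        return beta_max
    return beta_max * min(1.0, step / warmup_steps)


def vae_loss(recon_error, mu, log_var, step, warmup_steps):
    """Reconstruction error plus the annealed KL penalty."""
    return recon_error + kl_weight(step, warmup_steps) * kl_diag_gaussian(mu, log_var)
```

Early in training the KL weight is near zero, so the loss is dominated by spectrogram reconstruction; the regularization pressure on the latent space then increases as `step` approaches `warmup_steps`.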


Publication metadata

Author(s): Du Y, Woolgar E, Kikuchi Y, Ogawa T

Publication type: Conference Proceedings (inc. Abstract)

Publication status: Published

Conference Name: IEEE Cyber Science and Technology Congress (CyberSciTech 2025)

Year of Conference: 2025

Pages: 806-810

Online publication date: 14/01/2026

Acceptance date: 21/10/2025

Publisher: IEEE

URL: https://doi.org/10.1109/CyberSciTech68397.2025.00123

DOI: 10.1109/CyberSciTech68397.2025.00123

ISBN: 9798331590963
