Lookup NU author(s): Emma Woolgar, Dr Yuki Kikuchi
Full text for this publication is not currently held within this repository. Alternative links are provided below where available.
Synthetic voice generation for socially assistive robotics requires biologically validated approaches to ensure effective human-robot interaction. This paper presents a Variational Autoencoder (VAE) based system for generating species-specific vocalizations, with behavioral validation in marmosets. Our approach processes linear spectrograms through a symmetric encoder-decoder architecture with Kullback-Leibler (KL) divergence regularization and adaptive KL annealing. The system was trained on 18 marmoset ‘twitter’ calls and validated through controlled behavioral experiments with three adult female marmosets. Generated vocalizations achieved 86.79% Mel-frequency cepstral coefficient (MFCC) similarity to natural calls and had a significant main effect on two marmoset behaviors (stationary behavior: χ² = 11.47, p = 0.04; leg-stand contact behavior: χ² = 12.12, p = 0.03), although the behavioral responses differed from those seen for the equivalent natural call type. Results demonstrate the feasibility of VAE-based vocalization synthesis while highlighting the importance of biological validation for developing emotionally appropriate synthetic voices in assistive robotics applications.
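As a rough illustration of the KL-regularized objective mentioned in the abstract, the following minimal Python sketch computes the closed-form KL divergence between a diagonal Gaussian posterior and a standard normal prior, and scales it with a warm-up weight. The function names (`kl_to_standard_normal`, `kl_weight`, `vae_loss`) and the linear warm-up schedule are assumptions made for illustration only; the record does not describe the paper's adaptive annealing rule or its reconstruction loss, so this is not the authors' implementation.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var)
    )


def kl_weight(step, warmup_steps, max_weight=1.0):
    """Hypothetical linear warm-up: the KL weight rises from 0 to max_weight.

    This stands in for the paper's adaptive annealing, whose exact rule
    is not given in the abstract.
    """
    if warmup_steps <= 0:
        return max_weight
    return max_weight * min(1.0, step / warmup_steps)


def vae_loss(recon_error, mu, log_var, step, warmup_steps):
    """Reconstruction error plus the annealed KL regularization term."""
    beta = kl_weight(step, warmup_steps)
    return recon_error + beta * kl_to_standard_normal(mu, log_var)
```

Early in training the KL weight is near zero, so the model is driven mostly by reconstruction error; as the weight grows, the latent distribution is pulled toward the prior.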
Author(s): Du Y, Woolgar E, Kikuchi Y, Ogawa T
Publication type: Conference Proceedings (inc. Abstract)
Publication status: Published
Conference Name: IEEE Cyber Science and Technology Congress (CyberSciTech 2025)
Year of Conference: 2025
Pages: 806-810
Online publication date: 14/01/2026
Acceptance date: 21/10/2025
Publisher: IEEE
URL: https://doi.org/10.1109/CyberSciTech68397.2025.00123
DOI: 10.1109/CyberSciTech68397.2025.00123
ISBN: 9798331590963