Technical Program

Oral and Poster Sessions


Tuesday, September 13

Keynote Session 1 (KN1)

Tuesday, September 13, 9:30 - 10:30

Large-scale finite element simulations of the physics of voice
Oriol Guasch

Oral Session 1

Tuesday, September 13

Automatic, model-based detection of pause-less phrase boundaries from fundamental frequency and duration features
Mahsa Sadat Elyasi Langarani, Jan van Santen
Synthesising Filled Pauses: Representation and Datamixing
Rasmus Dall, Marcus Tomalin, Mirjam Wester
Emphasis recreation for TTS using intonation atoms
Pierre-Edouard Honnet, Philip N. Garner
Prediction of Emotions from Text using Sentiment Analysis for Expressive Speech Synthesis
Eva Vanmassenhove, João P. Cabral, Fasih Haider

Poster Session 1

Tuesday, September 13

Non-filter waveform generation from cepstrum using spectral phase reconstruction
Yasuhiro Hamada, Nobutaka Ono, Shigeki Sagayama
Investigating Spectral Amplitude Modulation Phase Hierarchy Features in Speech Synthesis
Alexandros Lazaridis, Milos Cernak, Pierre-Edouard Honnet, Philip N. Garner
Multidimensional scaling of systems in the Voice Conversion Challenge 2016
Mirjam Wester, Zhizheng Wu, Junichi Yamagishi
An Automatic Voice Conversion Evaluation Strategy Based on Perceptual Background Noise Distortion and Speaker Similarity
Dong-Yan Huang, Lei Xie, Yvonne Siu Wa Lee, Jie Wu, Huaiping Ming, Xiaohai Tian, Shaofei Zhang, Chuang Ding, Mei Li, Quy Hy Nguyen, Minghui Dong, Haizhou Li
Nonaudible murmur enhancement based on statistical voice conversion and noise suppression with external noise monitoring
Yusuke Tajiri, Tomoki Toda
Prosodic and Spectral iVectors for Expressive Speech Synthesis
Igor Jauk, Antonio Bonafonte
Development of a statistical parametric synthesis system for operatic singing in German
Michael Pucher, Fernando Villavicencio, Junichi Yamagishi
DNN-based Speech Synthesis for Indian Languages from ASCII text
Srikanth Ronanki, Siva Reddy, Bajibabu Bollepalli, Simon King
Experiments with Cross-lingual Systems for Synthesis of Code-Mixed Text
Sunayana Sitaram, Sai Krishna Rallabandi, Shruti Rijhwani, Alan W. Black
Jerk Minimization for Acoustic-To-Articulatory Inversion
Avni Rajpal, Hemant A. Patil
How to select a good voice for TTS
Sunhee Kim
WikiSpeech – enabling open source text-to-speech for Wikipedia
John Andersson, Sebastian Berlin, André Costa, Harald Berthelsen, Hanna Lindgren, Nikolaj Lindberg, Jonas Beskow, Jens Edlund, Joakim Gustafson

Wednesday, September 14

Keynote Session 2 (KN2)

Wednesday, September 14, 9:30 - 10:30

Siri’s voice gets deep learning
Alex Acero

Oral Session 2

Wednesday, September 14

Parallel and cascaded deep neural networks for text-to-speech synthesis
Manuel Sam Ribeiro, Oliver Watts, Junichi Yamagishi
Temporal modeling in neural network based statistical parametric speech synthesis
Keiichi Tokuda, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku
Multi-output RNN-LSTM for multiple speaker speech synthesis with α-interpolation model
Santiago Pascual, Antonio Bonafonte
A Comparative Study of the Performance of HMM, DNN, and RNN based Speech Synthesis Systems Trained on Very Large Speaker-Dependent Corpora
Xin Wang, Shinji Takaki, Junichi Yamagishi

Poster Session 2

Wednesday, September 14

Non-intrusive Quality Assessment of Synthesized Speech using Spectral Features and Support Vector Regression
Meet H. Soni, Hemant A. Patil
Novel Pre-processing using Outlier Removal in Voice Conversion
Sushant V. Rao, Nirmesh J Shah, Hemant A. Patil
Emotional Voice Conversion Using Neural Networks with Different Temporal Scales of F0 based on Wavelet Transform
Zhaojie Luo, Tetsuya Takiguchi, Yasuo Ariki
Investigating RNN-based speech enhancement methods for noise-robust Text-to-Speech
Cassia Valentini-Botinhao, Xin Wang, Shinji Takaki, Junichi Yamagishi
Speaker Adaptation of Various Components in Deep Neural Network based Speech Synthesis
Shinji Takaki, SangJin Kim, Junichi Yamagishi
Mandarin Prosodic Phrase Prediction based on Syntactic Trees
Zhengchen Zhang, Fuxiang Wu, Chenyu Yang, Minghui Dong, Fugen Zhou
Investigating Very Deep Highway Networks for Parametric Speech Synthesis
Xin Wang, Shinji Takaki, Junichi Yamagishi
Contextual Representation using Recurrent Neural Network Hidden State for Statistical Parametric Speech Synthesis
Sivanand Achanta, Rambabu Banoth, Ayushi Pandey, Anandaswarup Vadapalli, Suryakanth V Gangashetty
Wide Passband Design for Cosine-Modulated Filter Banks in Sinusoidal Speech Synthesis
Nobuyuki Nishizawa, Tomonori Yazaki
Utterance Selection Techniques for TTS Systems Using Found Speech
Pallavi Baljekar, Alan W. Black
Open-Source Consumer-Grade Indic Text To Speech
Andrew Wilkinson, Alok Parlikar, Sunayana Sitaram, Tim White, Alan W. Black, Suresh Bazaj
On the impact of phoneme alignment in DNN-based speech synthesis
Mei Li, Zhizheng Wu, Lei Xie
Merlin: An Open Source Neural Network Speech Synthesis System
Zhizheng Wu, Oliver Watts, Simon King

Thursday, September 15

Keynote Session 3 (KN3)

Thursday, September 15, 9:30 - 10:30

End-to-end Learning for Text and Speech
Quoc V. Le

Oral Session 3

Thursday, September 15

A hybrid harmonics-and-bursts modelling approach to speech synthesis
Jonas Beskow, Harald Berthelsen
A Pulse Model in Log-domain for a Uniform Synthesizer
Gilles Degottex, Pierre Lanchantin, Mark Gales
Using instantaneous frequency and aperiodicity detection to estimate F0 for high-quality speech synthesis
Hideki Kawahara, Yannis Agiomyrgiannakis, Heiga Zen
Wideband Harmonic Model: Alignment and Noise Modeling for High Quality Speech Synthesis
Slava Shechtman, Alex Sorin
ISCA

International Speech Communication Association.

SynSIG: promoting the study of Speech Synthesis