An open source implementation of Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning


This page provides audio samples for the open source implementation of Deep Voice 3. Samples from single speaker and multi-speaker models follow.

Single speaker

Scientists at the CERN laboratory say they have discovered a new particle.

(74 chars, 13 words)

There’s a way to measure the acute emotional intelligence that has never gone out of style.

(91 chars, 18 words)

President Trump met with other leaders at the Group of 20 conference.

(69 chars, 13 words)

The Senate’s bill to repeal and replace the Affordable Care Act is now imperiled.

(81 chars, 16 words)

Generative adversarial network or variational auto-encoder.

(59 chars, 7 words)

The buses aren’t the problem, they actually provide a solution.

(63 chars, 13 words)

Multi-speaker

Same text with 12 different speakers

Some have accepted this as a miracle without any physical explanation

(69 chars, 11 words)

225, 23, F, English, Southern England (ID, AGE, GENDER, ACCENTS, REGION)

226, 22, M, English, Surrey

227, 38, M, English, Cumbria

228, 22, F, English, Southern England

229, 23, F, English, Southern England

230, 22, F, English, Stockton-on-Tees

231, 23, F, English, Southern England

232, 23, M, English, Southern England

233, 23, F, English, Staffordshire

234, 22, F, Scottish, West Dumfries

236, 23, F, English, Manchester

237, 22, M, Scottish, Fife

Five unknown texts with two speakers (male/female), with attention plots

Scientists at the CERN laboratory say they have discovered a new particle.

(74 chars, 13 words)

There’s a way to measure the acute emotional intelligence that has never gone out of style.

(91 chars, 18 words)

President Trump met with other leaders at the Group of 20 conference.

(69 chars, 13 words)

The Senate’s bill to repeal and replace the Affordable Care Act is now imperiled.

(81 chars, 16 words)

Generative adversarial network or variational auto-encoder.

(59 chars, 7 words)

The buses aren’t the problem, they actually provide a solution.

(63 chars, 13 words)

Single speaker (arXiv:1710.08969 [cs.SD])

These samples are not from Deep Voice 3; they come from a very similar approach that I also implemented.

Scientists at the CERN laboratory say they have discovered a new particle.

(74 chars, 13 words)

There’s a way to measure the acute emotional intelligence that has never gone out of style.

(91 chars, 18 words)

President Trump met with other leaders at the Group of 20 conference.

(69 chars, 13 words)

The Senate’s bill to repeal and replace the Affordable Care Act is now imperiled.

(81 chars, 16 words)

Generative adversarial network or variational auto-encoder.

(59 chars, 7 words)

The buses aren’t the problem, they actually provide a solution.

(63 chars, 13 words)

References


  1. I’m afraid I don’t remember exactly; I may have trained a bit more.