Modelling prosody in the task of human speech synthesis with the use of machine learning
dc.contributor.advisor | Крюкова, Галина | |
dc.contributor.author | Процик, Олексій | |
dc.date.accessioned | 2020-12-05T21:08:26Z | |
dc.date.available | 2020-12-05T21:08:26Z | |
dc.date.issued | 2020 | |
dc.description.abstract | Generating high-fidelity speech with a text-to-speech (TTS) system remains a challenging task despite decades of research. Modern TTS systems are very complex. For example, a statistical TTS system commonly has a linguistic front end that extracts various linguistic features from the text. It is followed by a duration model, which estimates how long the speech for a given text will last, and an acoustic feature prediction model. The predicted features are then fed into a vocoder, which synthesizes speech from them. All these components are trained independently and require extensive domain knowledge to produce good results. Because the design is modular, errors made in one module propagate to the following modules and can accumulate. | uk_UA |
dc.identifier.uri | https://ekmair.ukma.edu.ua/handle/123456789/18999 | |
dc.language.iso | en | uk_UA |
dc.status | first published | uk_UA |
dc.subject | modelling prosody | uk_UA |
dc.subject | human speech synthesis | uk_UA |
dc.subject | machine learning | uk_UA |
dc.subject | bachelor's thesis | uk_UA |
dc.title | Modelling prosody in the task of human speech synthesis with the use of machine learning | uk_UA |
dc.type | Other | uk_UA |