About me

I’m a third-year PhD student in the UKRI Centre for Doctoral Training in Artificial Intelligence and Music at C4DM, Queen Mary University of London, supervised by Dr. György Fazekas. My PhD research focuses on leveraging deep generative models and signal processing techniques for controllable and expressive voice synthesis. The projects I have built towards this goal are:

  • diffwave-sr: unsupervised speech super-resolution (bandwidth extension) using posterior sampling in diffusion models.
  • duet-svs-diffusion: unsupervised duet singing voice separation using posterior sampling in diffusion models.
  • golf: a lightweight neural voice synthesiser with glottal-flow models and differentiable time-varying linear prediction synthesis.
  • torchlpc: an efficient PyTorch implementation of differentiable time-varying linear prediction synthesis (see the usage sketch after this list).
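
For a flavour of how torchlpc is used, here’s a minimal sketch. It assumes the sample_wise_lpc entry point with a (batch, time) input and a (batch, time, order) coefficient layout; the random coefficients are purely illustrative:

```python
import torch
from torchlpc import sample_wise_lpc

batch, time, order = 4, 1000, 2

x = torch.randn(batch, time)               # excitation signal
a = 0.1 * torch.randn(batch, time, order)  # time-varying LPC coefficients, kept small for stability
a.requires_grad_()

# All-pole filtering, evaluated sample by sample so the coefficients
# can change at every step: y[:, t] = x[:, t] - sum_i a[:, t, i] * y[:, t - i - 1]
y = sample_wise_lpc(x, a)

# The recursion is differentiable end to end, so gradients reach the coefficients.
y.pow(2).mean().backward()
print(a.grad.shape)  # torch.Size([4, 1000, 2])
```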

Before starting my PhD, I worked in software engineering with HTC and Rayark Inc. I do independent research in my spare time; my research interests include audio signal processing, music information retrieval, binaural audio, and machine learning.

I love making contributions on GitHub, where you can find my most starred projects.

I’m also the main contributor to making torchaudio.functional.lfilter differentiable. To see what I’m working on at the moment, check out my Kanban board on GitHub.
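Differentiability here means gradients can flow through the recursive IIR computation, so filter coefficients can be learned from a loss on the filtered signal. Here’s a toy sketch (the one-pole filter values below are made up purely for illustration):

```python
import torch
import torchaudio.functional as AF

x = torch.randn(1, 16000)  # (channel, time) input signal

# Hypothetical one-pole low-pass filter; both coefficient tensors are learnable.
a = torch.tensor([1.0, -0.95], requires_grad=True)  # denominator (feedback) coefficients
b = torch.tensor([0.05, 0.0], requires_grad=True)   # numerator (feed-forward) coefficients

y = AF.lfilter(x, a, b, clamp=False)  # differentiable IIR filtering
y.pow(2).mean().backward()            # gradients flow back to both coefficient tensors

print(a.grad, b.grad)
```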

My hobbies are music, manga, anime, and VTubers. My music taste is broad, ranging from Heavy Metal, Djent, and Post-Hardcore to Happy Hardcore, Trance, Future Funk, Ani-Song, Lo-Fi, and more. I also make music (see my portfolio page) under the name Y.C.Y.

If you want to contact me for any reason, you can reach me via email or Twitter. If you’re interested in what I’m doing and want to support me, consider buying me a coffee on GitHub, where you can also book me for a 1-on-1 code review session.