This is a path I walk almost every day; it leads to the Audio Information Research Lab. I'm currently down to my last few weeks in college at the University of Rochester, studying Audio & Music Engineering with a minor in Computer Science. My academic advisor, Professor Sarah Smith, and my research advisor, Professor Zhiyao Duan, along with the amazing faculty and student members here, have helped me greatly along this journey.
My Background with Audio Deep Learning
I feel extremely lucky to have had the opportunity to be immersed in audio research at the Audio Information Research Lab here at the University of Rochester. On music projects, I’ve worked with Christodoulos Benetatos, an incredible engineer with a background in classical guitar, to create Euterpe, an interactive web framework. On speech, I’ve worked with Ge Zhu, You (Neil) Zhang, and Meiying (Melissa) Chen on various speech-related projects, exploring how to leverage artifacts in computer-generated speech to tell it apart from human-produced speech for more trustworthy AI.
All of this work is possible because of my incredible research advisor, Zhiyao Duan, who provides support and insight for whatever project I’m interested in working on.
My Background with Traditional DSP and Acoustics
The curriculum for the AME major at the University of Rochester leans heavily toward traditional signal processing from day one (literally: the intro course, AME140, covers the fast Fourier transform on day one). I’ve been exposed to a wide range of topics in this domain, primarily thanks to Professor Michael Heilemann, my advisor Sarah Smith, and Tre DiPassio.
My Story
From the moment I laid my fingers on that cheap Yamaha electric piano my dad gifted me at five, I was captivated by the magic of music. Although I hated practicing (and I still do today - no one should endure 8-hour practice sessions), I adored playing the instrument and dreamt of constructing one myself someday.
This passion for music first led me to a career as a music producer. However, I soon realized that my true enthusiasm lay not only in producing pop songs but also in developing innovative tools and software that change the way we perceive and interact with sound. My journey led me to discover PianoGenie, a remarkable innovation that simplifies the piano keyboard, enabling even those without formal training to experience the joy of playing.
As I delved deeper into the world of audio development, I became fascinated with digital signal processing and the vast toolkit we have in the frequency domain. Using frequency-domain representations, I built a guitar FX tone transfer AI system that helps guitarists recreate their favorite guitar tones. With the rapid development of the Web in the last few years, I was able to build Euterpe, an interactive web-based framework, to help researchers quickly deploy their algorithms for end users. This curiosity also drove me to investigate the often-overlooked phase domain, where I sought clues to distinguish computer-generated speech from human-produced speech for the development of trustworthy AI.
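To make the frequency/phase split concrete, here is a minimal NumPy sketch (not the actual systems mentioned above) that frames a signal, takes its short-time Fourier transform, and separates the complex spectrum into the familiar magnitude spectrogram and the often-overlooked phase spectrogram; the frame and hop sizes are arbitrary illustrative choices.

```python
import numpy as np

def stft_mag_phase(x, n_fft=512, hop=256):
    """Split a signal into magnitude and phase spectrograms.

    Most audio pipelines operate on the magnitude; the phase is
    the other half of the complex spectrum, where synthesis
    artifacts can hide.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    # Slice the signal into overlapping windowed frames
    frames = np.stack([x[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    spec = np.fft.rfft(frames, axis=-1)   # complex spectrum per frame
    return np.abs(spec), np.angle(spec)   # magnitude, phase (radians)

# Example: one second of a 440 Hz tone at a 16 kHz sample rate
sr = 16000
t = np.arange(sr) / sr
mag, phase = stft_mag_phase(np.sin(2 * np.pi * 440 * t))
print(mag.shape, phase.shape)  # (frames, n_fft // 2 + 1)
```

For a pure sine tone the magnitude concentrates in one frequency bin, while the phase advances predictably from frame to frame; deviations from that kind of consistency are one class of cue a detector can look for.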
My grandpa turned 73 this year. He helped me learn math during my childhood, and I’ve always been grateful for his support. As he grew older, he began to lose his hearing. I joined Neosensory as an intern during my last semester in college to explore multimodal potential: could we compensate for high-frequency hearing loss with stimuli in another modality? I built a hybrid deep learning + DSP solution that performs on-device speech enhancement, then maps high-frequency content to motors for their Clarify project. This helps elderly people with hearing loss understand speech better without actually hearing better. I’m also currently trying to preserve his voice using TTS systems.
Throughout my journey, I never stopped working as a music producer. By continuing to work in this capacity, I’ve been able to maintain a strong connection with musicians and truly understand their needs and experiences. Similar to copiloting tools in the software development and design industries, I believe music copiloting is more than possible, and is currently held back only by a lack of data. I’m hoping to harness the power of large language models to learn the underlying symbolic relationships in this domain.
My ambition is to create the next generation of audio solutions that will augment the way human beings perceive and interact with sound. Academia is great at developing new algorithms, and I would love to see those solutions not only live in an NVIDIA A100-powered research paper, but also in the modern equivalent of my childhood home, where a five-year-old wrote his first song on a cheap plastic piano.