Wang Rui - Research CV

Birth: 30 October 1993
Email: wangrui1win@g.ecc.u-tokyo.ac.jp
Languages: Chinese (Native), Japanese, English
Scholar: Google Scholar Profile

Summary

I have several years of research experience in speech signal processing, focusing on spatial hearing and speech enhancement in challenging environments. At JAIST, I worked on monaural 3D sound localization using HRTF features under Prof. Masashi Unoki. For my Ph.D. at Nagoya University with Prof. Tomoki Toda, my main topic was directional target speaker extraction (TSE) in noisy and underdetermined conditions, resulting in publications in journals such as IEEE/ACM TASLP. My future goal is to extend statistical signal processing (such as independent/low-rank and spatial covariance modeling) by coupling it with DNN priors and recent LLM-based context, aiming for identifiable, sample-efficient, real-time, low-latency streaming speech enhancement.

Research

Research Areas: Spatial audio, Speech signal processing, Speech enhancement/separation, Target speaker extraction, Text-to-speech, LLM, Deep learning

Work Experience

The University of Tokyo, Graduate School of Information Science and Technology

2026.4- | Tokyo, Japan

Specially Appointed Researcher: Research on speech enhancement-related topics

Midea Group, AI Research Institute

2025.5-2026.4 | Shanghai, China

Research Engineer: Research on robust multi-task speech interaction systems in challenging environments; research on speech LLMs and device agents

Nippon Telegraph and Telephone Corporation (NTT), CS lab

2022.3-2022.4 | Tokyo, Japan

Winter internship: Research on robust speech separation

National Institute of Information and Communications Technology (NICT), ASTREC

2021.8-2021.10 | Kyoto, Japan

Summer internship: Research on robust speech recognition

Education & Research

Nagoya University

2021.4-2025.3 | Nagoya, Japan

Doctor's degree: Computer Science, focus on target speaker extraction in challenging environments

Toda Laboratory of speech

Japan Advanced Institute of Science and Technology (JAIST)

2018.10-2021.3 | Ishikawa, Japan

Master's degree: Computer Science, focus on HRTF-based DOA estimation and spatial hearing

Akagi & Unoki Laboratory of speech

National Institute of Metrology, China

2016.9-2018.8 | Beijing, China

Master's course: Fluid Mechanics (withdrew due to lack of interest)

China Jiliang University

2012.9-2016.6 | Hangzhou, China

BS degree: Measurement and Control Technology and Instruments

Publications & Awards

Journal Papers

[2025] R. Wang, T. Fujimura, and T. Toda, "Target Speaker Extraction under Noisy Underdetermined Conditions Using Conditional Variational Autoencoder, Global Style Token, and Neural Postfilter," APSIPA Transactions on Signal and Information Processing, Vol. 14, No. 1, e2, pp. 1-26, Jan. 2025.
[2024] R. Wang, L. Li, and T. Toda, "Dual-channel target speaker extraction based on conditional variational autoencoder and directional information," IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 32, pp. 1968-1979, Mar. 2024.
[2023] R. Wang, B. N. Khanh, D. Morikawa, and M. Unoki, "Method of estimating three dimensional direction-of-arrival based on monaural modulation spectrum," Applied Acoustics, 203, 109215, 9 pages, Feb. 2023.

Conference Papers

[2021] N. Li, L. Wang, M. Unoki, S. Li, R. Wang, M. Ge, and J. Dang, "Robust Voice Activity Detection Using a Masked Auditory Encoder Based Convolutional Neural Network," in 2021 IEEE ICASSP, pp. 6828-6832, Jun. 2021.
[2021] R. Wang, B. N. Khanh, D. Morikawa, and M. Unoki, "Method of Estimating 3D DOA based on Monaural Modulation Spectrum," in 2021 RISP NCSP, pp. 137-140, Mar. 2021.
[2022] R. Wang, L. Li, and T. Toda, "Direction-aware target speaker extraction with a dual-channel system based on conditional variational autoencoders under underdetermined conditions," in Proc. IEEE Asia-Pacific Signal Inf. Process. Assoc. Annu. Summit Conf., 2022, pp. 347-354.
[2023] R. Wang and T. Toda, "Directional target speaker extraction under noisy underdetermined conditions through conditional variational autoencoder with global style tokens," in Proc. IEEE WASPAA, New Paltz, USA, Oct. 2023, pp. 1-5.
[2026] R. Wang, Z. Zhang, Y. Gao, X. Mou, and Y. Xu, "End-to-End Direction-Aware Keyword Spotting with Spatial Priors in Noisy Environments," arXiv preprint arXiv:2603.09505, 2026.

Other Papers

[2022] R. Wang, L. Li, and T. Toda, "Direction-aware target speaker extraction with conditional variational autoencoders and its sensitivity to direction-of-arrival error," Proc. Spring Meeting of the Acoustical Society of Japan (日本音響学会春季研究発表会講演論文集), 2-2-6, pp. 195-196, Sep. 2022.
[2022] R. Wang, L. Li, and T. Toda, "Target speaker extraction based on conditional variational autoencoder and directional information in underdetermined condition," Technical Report of IEICE, Vol. 121, No. 383, EA2021-76, pp. 76-81, Mar. 2022.
[2021] R. Wang, B. N. Khanh, D. Morikawa, and M. Unoki, "Method of estimating DOA based on monaural modulation spectrum," Proc. Spring Meeting of the Acoustical Society of Japan (日本音響学会春季研究発表会講演論文集), 3-1-21, pp. 321-324, Mar. 2021.

Awards

[2023] IEEE WASPAA 2023 Travel Grant.
[2022] Acoustical Society of Japan (ASJ)-Student paper award.
[2021] Acoustical Society of Japan (ASJ)-Student paper award (Hokuriku branch).
[2021] RISP International Workshop on Nonlinear Circuits, Communications and Signal Processing (NCSP)-Student paper award.