[1]
|
S. Wisdom, A. Jansen, R. J. Weiss, H. Erdogan, and J. R. Hershey.
Sparse, Efficient, and Semantic Mixture Invariant Training:
Taming In-the-Wild Unsupervised Sound Separation.
In Proc. IEEE Workshop on Applications of Signal Processing to
Audio and Acoustics (WASPAA), October 2021.
[ bib |
arxiv ]
|
[2]
|
N. Chen, Y. Zhang, H. Zen, R. J. Weiss, M. Norouzi, N. Dehak, and
W. Chan.
WaveGrad 2: Iterative Refinement for Text-to-Speech
Synthesis.
In Proc. Interspeech, August 2021.
[ bib |
arxiv |
web ]
|
[3]
|
P. Wang, T. N. Sainath, and R. J. Weiss.
Multitask Training with Text Data for End-to-End Speech
Recognition.
In Proc. Interspeech, August 2021.
[ bib |
arxiv ]
|
[4]
|
R. J. Weiss, R. J. Skerry-Ryan, E. Battenberg, S. Mariooryad, and D. P.
Kingma.
Wave-Tacotron: Spectrogram-free end-to-end text-to-speech
synthesis.
In Proc. IEEE International Conference on Acoustics, Speech,
and Signal Processing (ICASSP), June 2021.
[ bib |
DOI |
video |
arxiv |
web |
poster |
slides ]
|
[5]
|
I. Elias, H. Zen, J. Shen, Y. Zhang, Y. Jia, R. J. Weiss, and
Y. Wu.
Parallel Tacotron: Non-Autoregressive and Controllable
TTS.
In Proc. IEEE International Conference on Acoustics, Speech,
and Signal Processing (ICASSP), June 2021.
[ bib |
DOI |
arxiv |
web ]
|
[6]
|
N. Chen, Y. Zhang, H. Zen, R. J. Weiss, M. Norouzi, and W. Chan.
WaveGrad: Estimating Gradients for Waveform
Generation.
In Proc. International Conference on Learning Representations
(ICLR), May 2021.
[ bib |
reviews |
arxiv |
web ]
|
[7]
|
S. Wisdom, E. Tzinis, H. Erdogan, R. J. Weiss, K. Wilson, and J. R.
Hershey.
Unsupervised Sound Separation Using Mixture Invariant
Training.
In Advances in Neural Information Processing Systems (NeurIPS),
December 2020.
[ bib |
reviews |
arxiv |
web ]
|
[8]
|
S. Wisdom, E. Tzinis, H. Erdogan, R. J. Weiss, K. Wilson, and J. R.
Hershey.
Unsupervised Speech Separation Using Mixtures of
Mixtures.
In ICML 2020 Workshop on Self-supervision in Audio and Speech,
July 2020.
[ bib |
reviews |
web ]
|
[9]
|
G. Sun, Y. Zhang, R. J. Weiss, Y. Cao, H. Zen, A. Rosenberg, B. Ramabhadran,
and Y. Wu.
Generating diverse and natural text-to-speech samples using a
quantized fine-grained VAE and auto-regressive prosody prior.
In Proc. IEEE International Conference on Acoustics, Speech,
and Signal Processing (ICASSP), pages 6699–6703, May 2020.
[ bib |
DOI |
arxiv |
web ]
|
[10]
|
G. Sun, Y. Zhang, R. J. Weiss, Y. Cao, H. Zen, and Y. Wu.
Fully-hierarchical fine-grained prosody modeling for
interpretable speech synthesis.
In Proc. IEEE International Conference on Acoustics, Speech,
and Signal Processing (ICASSP), pages 6264–6268, May 2020.
[ bib |
DOI |
arxiv |
web ]
|
[11]
|
T. N. Sainath, R. Pang, R. J. Weiss, Y. He, C.-C. Chiu, and
T. Strohman.
An Attention-Based Joint Acoustic and Text On-Device End-To-End
Model.
In Proc. IEEE International Conference on Acoustics, Speech,
and Signal Processing (ICASSP), pages 7039–7043, May 2020.
[ bib |
DOI |
.pdf ]
|
[12]
|
J. Chorowski, R. J. Weiss, S. Bengio, and A. van den Oord.
Unsupervised speech representation learning using WaveNet
autoencoders.
IEEE/ACM Transactions on Audio, Speech, and Language
Processing, 27(12):2041–2053, December 2019.
[ bib |
DOI |
arxiv ]
|
[13]
|
Y. Zhang, R. J. Weiss, H. Zen, Y. Wu, Z. Chen, R. J. Skerry-Ryan, Y. Jia,
A. Rosenberg, and B. Ramabhadran.
Learning to Speak Fluently in a Foreign Language: Multilingual
Speech Synthesis and Cross-Language Voice Cloning.
In Proc. Interspeech, Graz, Austria, September
2019.
[ bib |
DOI |
arxiv |
web ]
|
[14]
|
H. Zen, V. Dang, R. Clark, Y. Zhang, R. J. Weiss, Y. Jia, Z. Chen, and
Y. Wu.
LibriTTS: A Corpus Derived from LibriSpeech for
Text-to-Speech.
In Proc. Interspeech, Graz, Austria, September
2019.
[ bib |
DOI |
arxiv |
web ]
|
[15]
|
F. Biadsy, R. J. Weiss, P. J. Moreno, D. Kanevsky, and Y. Jia.
Parrotron: An End-to-End Speech-to-Speech Conversion Model and
its Applications to Hearing-Impaired Speech and Speech
Separation.
In Proc. Interspeech, Graz, Austria, September
2019.
[ bib |
DOI |
arxiv |
web ]
|
[16]
|
Y. Jia, R. J. Weiss, F. Biadsy, W. Macherey, M. Johnson, Z. Chen, and
Y. Wu.
Direct Speech-to-Speech Translation with a Sequence-to-Sequence
Model.
In Proc. Interspeech, Graz, Austria, September
2019.
[ bib |
DOI |
arxiv |
web ]
|
[17]
|
Q. Wang, H. Muckenhirn, K. Wilson, P. Sridhar, Z. Wu, J. Hershey, R. A.
Saurous, R. J. Weiss, Y. Jia, and I. Lopez-Moreno.
VoiceFilter: Targeted Voice Separation by Speaker-Conditioned
Spectrogram Masking.
In Proc. Interspeech, Graz, Austria, September
2019.
[ bib |
DOI |
arxiv |
web ]
|
[18]
|
J. M. Antognini, M. Hoffman, and R. J. Weiss.
Audio Texture Synthesis with Random Neural Networks: Improving
Diversity and Quality.
In Proc. IEEE International Conference on Acoustics, Speech,
and Signal Processing (ICASSP), Brighton, UK, May 2019.
[ bib |
DOI |
web |
poster |
.pdf ]
|
[19]
|
J. Guo, T. N. Sainath, and R. J. Weiss.
A Spelling Correction Model for End-to-End Speech
Recognition.
In Proc. IEEE International Conference on Acoustics, Speech,
and Signal Processing (ICASSP), Brighton, UK, May 2019.
[ bib |
DOI |
arxiv |
slides ]
|
[20]
|
Y. Jia, M. Johnson, W. Macherey, R. J. Weiss, Y. Cao, C.-C. Chiu, N. Ari,
S. Laurenzo, and Y. Wu.
Leveraging Weakly Supervised Data to Improve End-to-End
Speech-to-Text Translation.
In Proc. IEEE International Conference on Acoustics, Speech,
and Signal Processing (ICASSP), Brighton, UK, May 2019.
[ bib |
DOI |
arxiv |
slides ]
|
[21]
|
W. N. Hsu, Y. Zhang, R. J. Weiss, H. Zen, Y. Wu, Y. Wang, Y. Cao, Y. Jia,
Z. Chen, J. Shen, P. Nguyen, and R. Pang.
Hierarchical Generative Modeling for Controllable Speech
Synthesis.
In Proc. International Conference on Learning Representations
(ICLR), New Orleans, USA, May 2019.
[ bib |
reviews |
arxiv |
web ]
|
[22]
|
W. N. Hsu, Y. Zhang, R. J. Weiss, Y. A. Chung, Y. Wang, Y. Wu, and
J. Glass.
Disentangling Correlated Speaker and Noise for Speech Synthesis
via Data Augmentation and Adversarial Factorization.
In NeurIPS 2018 Workshop on Interpretability and Robustness in
Audio, Speech, and Language, Montréal, Canada, December
2018.
Also presented at ICASSP 2019.
[ bib |
reviews |
web ]
|
[23]
|
Y. Jia, Y. Zhang, R. J. Weiss, Q. Wang, J. Shen, F. Ren, Z. Chen, P. Nguyen,
R. Pang, I. Lopez-Moreno, and Y. Wu.
Transfer Learning from Speaker Verification to Multispeaker
Text-To-Speech Synthesis.
In Advances in Neural Information Processing Systems (NeurIPS),
Montréal, Canada, December 2018.
[ bib |
reviews |
arxiv |
web |
poster ]
|
[24]
|
R. J. Skerry-Ryan, E. Battenberg, Y. Xiao, Y. Wang, D. Stanton, J. Shor, R. J.
Weiss, R. Clark, and R. A. Saurous.
Towards End-to-End Prosody Transfer for Expressive Speech
Synthesis with Tacotron.
In Proc. International Conference on Machine Learning (ICML),
Stockholm, Sweden, July 2018.
[ bib |
arxiv |
web ]
|
[25]
|
J. Antognini, M. Hoffman, and R. J. Weiss.
Synthesizing Diverse, High-Quality Audio Textures.
arXiv preprint arXiv:1806.08002, June 2018.
[ bib |
arxiv |
web ]
|
[26]
|
C.-C. Chiu, T. N. Sainath, Y. Wu, R. Prabhavalkar, P. Nguyen, Z. Chen,
A. Kannan, R. J. Weiss, K. Rao, K. Gonina, N. Jaitly, B. Li, J. Chorowski,
and M. Bacchiani.
State-of-the-art Speech Recognition With Sequence-to-Sequence
Models.
In Proc. IEEE International Conference on Acoustics, Speech,
and Signal Processing (ICASSP), Calgary, Canada, April 2018.
[ bib |
arxiv |
web ]
|
[27]
|
S. Toshniwal, T. N. Sainath, R. J. Weiss, B. Li, P. Moreno, E. Weinstein, and
K. Rao.
Multilingual Speech Recognition With A Single End-To-End
Model.
In Proc. IEEE International Conference on Acoustics, Speech,
and Signal Processing (ICASSP), Calgary, Canada, April 2018.
[ bib |
arxiv |
web ]
|
[28]
|
J. Chorowski, R. J. Weiss, R. A. Saurous, and S. Bengio.
On Using Backpropagation for Speech Texture Generation and Voice
Conversion.
In Proc. IEEE International Conference on Acoustics, Speech,
and Signal Processing (ICASSP), Calgary, Canada, April 2018.
[ bib |
arxiv |
web ]
|
[29]
|
J. Shen, R. Pang, R. J. Weiss, M. Schuster, N. Jaitly, Z. Yang, Z. Chen,
Y. Zhang, Y. Wang, R. J. Skerry-Ryan, R. A. Saurous, Y. Agiomyrgiannakis, and
Y. Wu.
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram
Predictions.
In Proc. IEEE International Conference on Acoustics, Speech,
and Signal Processing (ICASSP), Calgary, Canada, April 2018.
[ bib |
arxiv |
web ]
|
[30]
|
J. P. Bello, P. Grosche, M. Müller, and R. Weiss.
Content-Based Methods for Knowledge Discovery in
Music.
In Springer Handbook of Systematic Musicology, pages 823–840.
Springer, March 2018.
[ bib |
DOI ]
|
[31]
|
B. Li, T. N. Sainath, A. Narayanan, J. Caroselli, M. Bacchiani, A. Misra,
I. Shafran, H. Sak, G. Pundak, K. Chin, K. C. Sim, R. J. Weiss, K. Wilson,
E. Variani, C. Kim, O. Siohan, M. Weintraub, E. McDermott, R. Rose, and
M. Shannon.
Acoustic Modeling for Google Home.
In Proc. Interspeech, Stockholm, Sweden, August
2017.
[ bib |
DOI |
.pdf ]
|
[32]
|
R. J. Weiss, J. Chorowski, N. Jaitly, Y. Wu, and Z. Chen.
Sequence-to-Sequence Models Can Directly Translate Foreign
Speech.
In Proc. Interspeech, Stockholm, Sweden, August
2017.
[ bib |
DOI |
arxiv |
slides ]
|
[33]
|
Y. Wang, R. J. Skerry-Ryan, D. Stanton, Y. Wu, R. J. Weiss, N. Jaitly, Z. Yang,
Y. Xiao, Z. Chen, S. Bengio, Q. Le, Y. Agiomyrgiannakis, R. Clark, and R. A.
Saurous.
Tacotron: Towards End-To-End Speech Synthesis.
In Proc. Interspeech, Stockholm, Sweden, August
2017.
[ bib |
DOI |
arxiv ]
|
[34]
|
C. Raffel, T. Luong, P. J. Liu, R. J. Weiss, and D. Eck.
Online and Linear-Time Attention by Enforcing Monotonic
Alignments.
In Proc. International Conference on Machine Learning (ICML),
Sydney, Australia, August 2017.
[ bib |
arxiv |
http ]
|
[35]
|
S. Hershey, S. Chaudhuri, D. P. W. Ellis, J. F. Gemmeke, A. Jansen, R. C.
Moore, M. Plakal, D. Platt, R. A. Saurous, B. Seybold, M. Slaney, R. J.
Weiss, and K. Wilson.
CNN Architectures for Large-Scale Audio
Classification.
In Proc. IEEE International Conference on Acoustics, Speech,
and Signal Processing (ICASSP), New Orleans, USA, March 2017.
[ bib |
DOI |
arxiv |
.pdf ]
|
[36]
|
T. N. Sainath, R. J. Weiss, K. W. Wilson, B. Li, A. Narayanan, E. Variani,
M. Bacchiani, I. Shafran, A. Senior, K. W. Chin, A. Misra, and
C. Kim.
Multichannel Signal Processing with Deep Neural Networks for
Automatic Speech Recognition.
IEEE/ACM Transactions on Audio, Speech, and Language
Processing, 25(5):965–979, February 2017.
[ bib |
DOI |
.pdf ]
|
[37]
|
T. N. Sainath, R. J. Weiss, K. W. Wilson, B. Li, A. Narayanan, E. Variani,
M. Bacchiani, I. Shafran, A. Senior, K. W. Chin, A. Misra, and
C. Kim.
Raw Multichannel Processing Using Deep Neural
Networks.
In New Era for Robust Speech Recognition: Exploiting Deep
Learning. Springer, 2017.
[ bib |
DOI |
.pdf ]
|
[38]
|
T. N. Sainath, A. Narayanan, R. J. Weiss, E. Variani, K. W. Wilson,
M. Bacchiani, and I. Shafran.
Reducing the Computational Complexity of Multimicrophone
Acoustic Models with Integrated Feature Extraction.
In Proc. Interspeech, San Francisco, USA, September
2016.
[ bib |
DOI |
.pdf ]
|
[39]
|
B. Li, T. N. Sainath, R. J. Weiss, K. W. Wilson, and M. Bacchiani.
Neural Network Adaptive Beamforming for Robust Multichannel
Speech Recognition.
In Proc. Interspeech, San Francisco, USA, September
2016.
[ bib |
DOI |
.pdf ]
|
[40]
|
T. N. Sainath, R. J. Weiss, K. W. Wilson, A. Narayanan, and
M. Bacchiani.
Factored Spatial and Spectral Multichannel Raw Waveform
CLDNNs.
In Proc. IEEE International Conference on Acoustics, Speech,
and Signal Processing (ICASSP), Shanghai, China, March 2016.
[ bib |
DOI |
.pdf ]
|
[41]
|
T. N. Sainath, R. J. Weiss, K. W. Wilson, A. Narayanan, M. Bacchiani, and
A. Senior.
Speaker Location and Microphone Spacing Invariant Acoustic
Modeling from Raw Multichannel Waveforms.
In Proc. IEEE Automatic Speech Recognition and Understanding
Workshop (ASRU), Scottsdale, USA, December 2015.
[ bib |
DOI |
.pdf ]
|
[42]
|
T. N. Sainath, R. J. Weiss, A. Senior, K. W. Wilson, and
O. Vinyals.
Learning the Speech Front-End with Raw Waveform
CLDNNs.
In Proc. Interspeech, Dresden, Germany, September
2015.
[ bib |
.pdf ]
|
[43]
|
Y. Hoshen, R. J. Weiss, and K. W. Wilson.
Speech Acoustic Modeling from Raw Multichannel
Waveforms.
In Proc. IEEE International Conference on Acoustics, Speech,
and Signal Processing (ICASSP), Brisbane, Australia, April
2015.
[ bib |
DOI |
.pdf ]
|
[44]
|
J. Weston, R. Weiss, and H. Yee.
Affinity Weighted Embedding.
In Proc. International Conference on Machine Learning (ICML),
pages 1215–1223, Beijing, China, June 2014.
[ bib |
http |
.pdf ]
|
[45]
|
J. Weston, H. Yee, and R. J. Weiss.
Learning to Rank Recommendations with the k-order Statistic
Loss.
In Proc. ACM Conference on Recommender Systems (RecSys),
pages 245–248, Hong Kong, October 2013.
[ bib |
DOI |
.pdf ]
|
[46]
|
J. Weston, R. J. Weiss, and H. Yee.
Nonlinear Latent Factorization by Embedding Multiple User
Interests.
In Proc. ACM Conference on Recommender Systems (RecSys),
pages 65–68, Hong Kong, October 2013.
[ bib |
DOI |
.pdf ]
|
[47]
|
J. Weston, R. Weiss, and H. Yee.
Affinity Weighted Embedding.
In Proc. International Conference on Learning Representations
(ICLR), Scottsdale, USA, May 2013.
[ bib |
arxiv |
http |
.pdf ]
|
[48]
|
J. Weston, C. Wang, R. Weiss, and A. Berenzweig.
Latent Collaborative Retrieval.
In Proc. International Conference on Machine Learning (ICML),
Edinburgh, Scotland, June 2012.
[ bib |
arxiv |
http |
.pdf ]
|
[49]
|
F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel,
M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos,
D. Cournapeau, M. Brucher, M. Perrot, and É. Duchesnay.
scikit-learn: Machine Learning in Python.
Journal of Machine Learning Research, 12:2825–2830, October
2011.
[ bib |
arxiv |
http |
.pdf ]
|
[50]
|
R. J. Weiss and J. P. Bello.
Unsupervised Discovery of Temporal Structure in
Music.
IEEE Journal of Selected Topics in Signal Processing,
5(6):1240–1251, October 2011.
[ bib |
DOI |
.pdf ]
|
[51]
|
T. Bertin-Mahieux, G. Grindlay, R. J. Weiss, and D. P. W. Ellis.
Evaluating Music Sequence Models Through Missing
Data.
In Proc. IEEE International Conference on Acoustics, Speech,
and Signal Processing (ICASSP), pages 177–180, Prague, Czech Republic,
May 2011.
[ bib |
DOI |
.pdf ]
|
[52]
|
R. J. Weiss, M. I. Mandel, and D. P. W. Ellis.
Combining Localization Cues and Source Model Constraints for
Binaural Source Separation.
Speech Communication, 53(5):606–621, May 2011.
Special issue on Perceptual and Statistical Audition.
[ bib |
DOI |
.pdf ]
|
[53]
|
T. Bertin-Mahieux, R. J. Weiss, and D. P. W. Ellis.
Clustering Beat-Chroma Patterns in a Large Music
Database.
In Proc. International Society for Music Information Retrieval
Conference (ISMIR), pages 111–116, Utrecht, Netherlands, August
2010.
[ bib |
web |
.pdf ]
|
[54]
|
R. J. Weiss and J. P. Bello.
Identifying Repeated Patterns in Music Using Sparse Convolutive
Non-Negative Matrix Factorization.
In Proc. International Society for Music Information Retrieval
Conference (ISMIR), pages 123–128, Utrecht, Netherlands, August
2010.
Best Paper Award.
[ bib |
web |
slides |
.pdf ]
|
[55]
|
T. Cho, R. J. Weiss, and J. P. Bello.
Exploring Common Variations in State of the Art Chord
Recognition Systems.
In Proc. Sound and Music Computing Conference (SMC), pages
1–8, Barcelona, Spain, July 2010.
[ bib |
.pdf ]
|
[56]
|
M. I. Mandel, R. J. Weiss, and D. P. W. Ellis.
Model-Based Expectation-Maximization Source Separation and
Localization.
IEEE Transactions on Audio, Speech, and Language Processing,
18(2):382–394, February 2010.
[ bib |
DOI |
web |
.pdf ]
|
[57]
|
R. J. Weiss and D. P. W. Ellis.
Speech Separation Using Speaker-Adapted Eigenvoice Speech
Models.
Computer Speech and Language, 24(1):16–29, January
2010.
Speech Separation and Recognition Challenge.
[ bib |
DOI |
.pdf ]
|
[58]
|
R. J. Weiss and D. P. W. Ellis.
A Variational EM Algorithm for Learning Eigenvoice Parameters
in Mixed Signals.
In Proc. IEEE International Conference on Acoustics, Speech,
and Signal Processing (ICASSP), pages 113–116, Taipei, Taiwan, April
2009.
[ bib |
DOI |
poster |
.pdf ]
|
[59]
|
R. J. Weiss.
Underdetermined Source Separation Using Speaker Subspace
Models.
PhD thesis, Department of Electrical Engineering, Columbia
University, 2009.
[ bib |
slides |
.pdf ]
|
[60]
|
R. J. Weiss and T. Kristjansson.
DySANA: Dynamic Speech and Noise Adaptation for Voice
Activity Detection.
In Proc. Interspeech, pages 127–130, Brisbane, Australia,
September 2008.
[ bib |
http |
poster |
.pdf ]
|
[61]
|
R. J. Weiss, M. I. Mandel, and D. P. W. Ellis.
Source Separation Based on Binaural Cues and Source Model
Constraints.
In Proc. Interspeech, pages 419–422, Brisbane, Australia,
September 2008.
[ bib |
http |
poster |
.pdf ]
|
[62]
|
R. J. Weiss and D. P. W. Ellis.
Monaural Speech Separation Using Source-Adapted
Models.
In Proc. IEEE Workshop on Applications of Signal Processing to
Audio and Acoustics (WASPAA), pages 114–117, New Paltz, USA, October
2007.
[ bib |
DOI |
web |
slides |
.pdf ]
|
[63]
|
R. J. Weiss and D. P. W. Ellis.
Estimating Single-Channel Source Separation Masks: Relevance
Vector Machine Classifiers vs. Pitch-Based Masking.
In Proc. ISCA Tutorial and Research Workshop on Statistical
Perceptual Audition (SAPA), pages 31–36, Pittsburgh, USA, September
2006.
[ bib |
http |
slides |
.pdf ]
|
[64]
|
D. P. W. Ellis and R. J. Weiss.
Model-Based Monaural Source Separation Using a Vector-Quantized
Phase-Vocoder Representation.
In Proc. IEEE International Conference on Acoustics, Speech,
and Signal Processing (ICASSP), pages V-957–V-960, Toulouse, France, May
2006.
[ bib |
DOI |
.pdf ]
|