Monaural Speech Separation Using Source-Adapted Models
R. J. Weiss and D. P. W. Ellis, "Monaural Speech Separation Using Source-Adapted Models", in Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2007. pp. 114-117
We proposed a system for speech separation based on source-adapted speech signal models. We evaluated the system on the test data from the 2006 Speech Separation Challenge and compared its performance to that of speaker-dependent and speaker-independent model-based systems. Below you can find performance numbers and audio examples of the separated signals. You can also mouse over the spectrograms to the right to listen to the corresponding audio.
Performance
For all three systems, we report the word accuracy on the letter and digit spoken by the talker who says "white". See the paper for a detailed discussion.
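As a rough illustration of the metric described above, the following sketch scores only the letter and digit tokens of each recognized target utterance. It assumes the six-word sentence grammar of the Speech Separation Challenge data (command, color, preposition, letter, digit, adverb), so the letter and digit sit at word positions 3 and 4; the function name `letter_digit_accuracy` and the example transcripts are hypothetical and are not taken from the paper or its scoring scripts.

```python
def letter_digit_accuracy(references, hypotheses):
    """Fraction of letter and digit tokens that were recognized correctly.

    Assumes each transcript follows the six-word pattern
    "command color preposition letter digit adverb" (an assumption,
    not something stated on this page).
    """
    correct = 0
    total = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words = ref.lower().split()
        hyp_words = hyp.lower().split()
        for idx in (3, 4):  # positions of the letter and the digit
            total += 1
            if idx < len(hyp_words) and hyp_words[idx] == ref_words[idx]:
                correct += 1
    return correct / total if total else 0.0


# Hypothetical transcripts for the talker who says "white".
refs = ["place white at l three now", "bin white by f two again"]
hyps = ["place white at l nine now", "bin white by f two again"]
print(letter_digit_accuracy(refs, hyps))  # 3 of 4 tokens correct -> 0.75
```

Errors in the other four words of the sentence do not affect this score, which is why the figures in the tables reflect letter and digit recognition only.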
Separation using speaker-adapted (SA) models (after 5 iterations):
SNR | Same Talker | Same Gender | Diff Gender | Avg. |
---|---|---|---|---|
6 dB | 38.96% | 50.56% | 62.50% | 50.25% |
3 dB | 33.56% | 47.21% | 59.25% | 46.17% |
0 dB | 26.80% | 42.46% | 52.50% | 40.02% |
-3 dB | 23.42% | 33.52% | 49.75% | 35.19% |
-6 dB | 17.79% | 25.14% | 33.75% | 25.29% |
-9 dB | 13.06% | 21.51% | 26.00% | 19.88% |
Separation using speaker-adapted (SA) models (after 15 iterations):
SNR | Same Talker | Same Gender | Diff Gender | Avg. |
---|---|---|---|---|
6 dB | 41.89% | 63.41% | 71.00% | 57.99% |
3 dB | 32.43% | 58.38% | 71.25% | 53.08% |
0 dB | 29.05% | 53.35% | 64.25% | 48.00% |
-3 dB | 22.07% | 43.02% | 56.50% | 39.77% |
-6 dB | 19.59% | 39.39% | 40.25% | 32.36% |
-9 dB | 14.64% | 24.30% | 30.25% | 22.71% |
Separation using speaker-dependent (SD) models:
SNR | Same Talker | Same Gender | Diff Gender | Avg. |
---|---|---|---|---|
6 dB | 38.29% | 78.49% | 74.25% | 62.23% |
3 dB | 37.84% | 74.58% | 77.75% | 62.06% |
0 dB | 28.60% | 72.07% | 76.00% | 57.32% |
-3 dB | 22.75% | 62.29% | 66.00% | 48.92% |
-6 dB | 15.32% | 46.93% | 51.25% | 36.69% |
-9 dB | 9.01% | 27.93% | 27.50% | 20.80% |
Separation using speaker-independent (SI) models:
SNR | Same Talker | Same Gender | Diff Gender | Avg. |
---|---|---|---|---|
6 dB | 31.08% | 34.08% | 35.50% | 33.44% |
3 dB | 26.80% | 30.45% | 31.50% | 29.45% |
0 dB | 24.55% | 26.26% | 31.00% | 27.20% |
-3 dB | 18.02% | 22.35% | 21.75% | 20.55% |
-6 dB | 14.19% | 18.44% | 18.75% | 16.97% |
-9 dB | 9.46% | 9.50% | 11.75% | 10.23% |
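
The Avg. column in the tables above is not the plain mean of the three condition columns. It appears consistent with an average weighted by the number of test utterances in each condition. The sketch below checks this for one row; the per-condition counts (222, 179, 200) are an assumption inferred from the reported percentages and are not stated on this page, and the helper name `weighted_average` is hypothetical.

```python
# Assumed utterance counts per condition, inferred from the reported
# percentages; they are NOT given on this page.
ASSUMED_COUNTS = {"same_talker": 222, "same_gender": 179, "diff_gender": 200}


def weighted_average(accuracies, counts=ASSUMED_COUNTS):
    """Average per-condition accuracies, weighted by utterance count."""
    total = sum(counts.values())
    return sum(accuracies[name] * counts[name] for name in counts) / total


# SA models after 15 iterations, 6 dB row.
row = {"same_talker": 41.89, "same_gender": 63.41, "diff_gender": 71.00}
plain_mean = sum(row.values()) / len(row)

print(f"weighted: {weighted_average(row):.2f}")  # weighted: 57.99
print(f"plain:    {plain_mean:.2f}")             # plain:    58.77
```

Under these assumed counts the weighted value matches the 57.99% reported for that row, whereas the unweighted mean does not.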
Audio examples
Signal | 6 dB | 3 dB | 0 dB | -3 dB | -6 dB | -9 dB |
---|---|---|---|---|---|---|
Mixture | mixture | mixture | mixture | mixture | mixture | mixture |
SA | target masker | target masker | target masker | target masker | target masker | target masker |
SD | target masker | target masker | target masker | target masker | target masker | target masker |
SI | target masker | target masker | target masker | target masker | target masker | target masker |