This study investigates the transmission characteristics of BC speech, focusing on how the OC sound pressure originating from the glottal source affects BC transmission to the RT and EC, in order to better understand the transmission process from the glottal source to the soft tissue/skull bone and the EC. The transfer functions from the OC to the RT (|H_ocrt(f)|) and from the OC to the EC (|H_ocec(f)|) are physically measured to explore whether and how the transmission characteristics assessed at the RT or in the EC are strongly related to the perceptual low-pass effect of BC speech.
Discussion
Transmission characteristics of the RT vibration and the EC sound radiation induced by excitation in the OC were measured under the hypothesized transmission model shown in Figure 1. Regarding the excitation setup shown in Figure 2, the effects of the rubber enclosure on the derived results have not been taken into account in this study. The rubber enclosure itself may partly transmit the mechanical vibration of the loudspeaker to the palate, which would act as a secondary source of excitation of the skull bone. Hence, the obtained transfer functions may differ from those during natural speaking. Overall, as shown in Figure 7, the OC sound pressure was relatively stable in the frequency band below 3 kHz but partially decreased, with two troughs of approximately 20 dB, in the frequency band between 3 and 5 kHz, which may correspond to the common peaks at 3.5 kHz in the derived transfer functions (see Figure 8). The troughs might come from some interaction between the rubber enclosure and the ER-10C probe. Standing waves in the enclosure may have partially altered the probe output and canceled the desired probe response to the sound pressure inside the enclosure through phase interaction. The rubber cavity length was 39 mm, but it could slightly increase or decrease because of the flexibility of the rubber. Considering this point, the troughs in the OC sound pressure may correspond to the first eigenfrequency (approximately 3.5–4.5 kHz) of the standing waves. Additionally, the rubber enclosure may have altered the transmission and slightly affected the transfer function, since the enclosure touched each participant's palate. The hardness of the rubber was Shore 30 A, which is assumed to be more flexible than the palate. Hence, the rubber could affect the soft tissue/skull bone transmission at lower frequencies. This might also cause the nonflat characteristics of the OC SPLs, especially below 0.35 kHz, as shown in Figure 7. In this case, not only the sound from the loudspeaker but also the mechanical vibration of the loudspeaker or the enclosure could influence the RT vibration and EC sound pressure. The concerns above should be further investigated in future measurements.
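As a rough plausibility check of the standing-wave interpretation, the following sketch idealizes the rubber cavity as a one-dimensional tube with identical acoustic boundary conditions at both ends, so that the first eigenfrequency is c/(2L). The speed of sound (343 m/s), the tube idealization, and the function names are assumptions made for illustration; they are not taken from the measurement setup.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about 20 degrees C


def first_eigenfrequency_hz(length_m, c=SPEED_OF_SOUND_M_S):
    """First eigenfrequency f1 = c / (2 L) of an idealized tube with identical ends."""
    if length_m <= 0:
        raise ValueError("length_m must be positive")
    return c / (2.0 * length_m)


def length_for_eigenfrequency_m(freq_hz, c=SPEED_OF_SOUND_M_S):
    """Effective tube length whose first eigenfrequency equals freq_hz."""
    if freq_hz <= 0:
        raise ValueError("freq_hz must be positive")
    return c / (2.0 * freq_hz)


if __name__ == "__main__":
    # Nominal cavity length reported in the text: 39 mm.
    print(f"39 mm cavity: {first_eigenfrequency_hz(0.039):.0f} Hz")
    # Effective lengths that would correspond to the 3.5-4.5 kHz range.
    for f in (3500.0, 4500.0):
        length_mm = 1000.0 * length_for_eigenfrequency_m(f)
        print(f"{f:.0f} Hz -> effective length {length_mm:.1f} mm")
```

Under these assumptions, the nominal 39 mm length gives about 4.4 kHz, and the 3.5–4.5 kHz range corresponds to effective lengths of roughly 49 to 38 mm. A warmer air column or different boundary conditions would shift these values.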
There could be other factors behind the nonflat characteristics of the OC SPL shown in Figure 7. Although the rubber enclosure in the OC was individually designed and molded for each participant so that it fitted their hard palate well, individual differences in palate shape could influence the characteristics of the sound fields. Even slight movements of the participants' mouths or bodies may have changed the OC SPL during the measurements, especially for Participant P05. If the nonflat characteristics in the thin-darkened frequency bands shown in Figure 7 were compensated, more accurate transfer functions might be estimated. Improvement of the excitation setup (see Figure 2) may reduce the nonflat frequency bands shown in Figure 7. Additionally, intersubject variability of the OC SPL should be considered. One possible reason for this variability is intersubject variability in the force with which the enclosure was held between the upper and lower jaws. Differences in this force may influence the volumes of the sound fields or the vibration modes of the speaker/enclosure, which might cause the intersubject variability of the sound pressure in the rubber enclosure.
In this study, the RT vibration was characterized as a change in displacement, while the EC sound radiation was characterized as a change in sound pressure. In principle, these two characteristics are difficult to compare directly. However, it has been pointed out that capacitor microphone responses to body vibration indicate displacement characteristics (Watanabe et al., 2001). Hence, in this study, the displacement characteristics of the RT vibration and the sound-pressure characteristics of the EC sound radiation (which are derived from a capacitor microphone response) were compared. It should be noted that this manipulation only enables the shapes of the RT and EC transfer functions to be compared globally or relatively, not quantitatively. The correspondence of the derived transfer function |H_ocrt(f)| to perceptual properties of BC speech should be clarified.
The measurement results of the RT vibration were regarded as the vibrations of the skull bone (or the soft tissue). Naturally, the vibration characteristics of the temporal bone are not identical to those of the whole skull bone. In this study, the BC microphone was fixed at a specific position in the left RT, as shown in Figure 4. In general, measurements at different BC microphone locations during BC stimulation yield different characteristics (Stenfelt et al., 2000). In the case of recording BC speech during vocalization, obvious effects of BC microphone placement on the intensity and spectral characteristics of the recorded BC speech have also been pointed out (Tran et al., 2013). However, the aim of the RT vibration measurement here was to obtain the BC transmission characteristics from the OC to the EC via soft tissue/skull bone, not to explore the detailed vibration characteristics of the whole skull bone.
Stenfelt et al. (2002) stimulated a cored temporal bone specimen to simulate the middle ear part of BC transmission. Likewise, the RT vibration characteristics obtained in this study (|H_ocrt(f)|) may be the main factors contributing to the middle or inner ear parts of BC speech transmission.
Another concern is the EC occlusion during the measurement. The purpose of the occlusion was to attenuate the AC sound leaking from the excitation setup to the EC microphone. During the experiment, the EC SPL due to the OC excitation was approximately 70–80 dB. At that time, the SPL of the AC leakage at the OC entrance was less than 64 dB. Assuming that the ER10C-14A foam ear tips provided approximately 15 dB of attenuation, the AC SPL leaked from the OC entrance to the occluded EC is assumed to be less than 50 dB. The effect of the AC leakage on the measured EC sound pressure should be carefully considered. Here, the present measurement results of the EC sound radiation include a compensation for the mean occlusion effect derived from the EC sound pressure induced by mastoid BC stimulation (Stenfelt, Wild, et al., 2003). This value was reported to be derived from nine normal intact ECs with BC stimulation from a transducer on the skull, for a probe insertion depth of 8 mm. In fact, the actual magnitude of the occlusion effect is reported to depend strongly on the probe insertion depth (Stenfelt & Reinfeldt, 2007). In this study, the probe insertion depth ranged from approximately 7 to 14 mm. Considering this, the compensation in this study is assumed to be typical for a shallower placement. Additionally, the frequency characteristics of the occlusion effect induced by frontal BC stimulation (Fagelson & Martin, 1998; Goldstein & Hayes, 1965) are reported to be different. In the case of BC speech transmission during vocalization, the frequency characteristics of the occlusion effect are likely to change, too. Instead of EC occlusion and occlusion effect compensation, using large earmuffs is another method to attenuate AC sound without causing the occlusion effect (Pörschmann, 2000; Reinfeldt et al., 2010). The influence of the occlusion effect compensation on the measurement results in this study should be further investigated.
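The leakage estimate in this paragraph can be restated as simple level arithmetic. The sketch below is illustrative only: it uses the upper bounds quoted in the text (64 dB leakage at the OC entrance, about 15 dB ear-tip attenuation) and the lower end of the EC SPL range (70 dB), and it assumes that the leaked AC sound and the BC-induced sound add either as uncorrelated powers or, as a pessimistic bound, fully in phase. The function names are hypothetical.

```python
import math


def leaked_spl_db(source_spl_db, attenuation_db):
    """SPL remaining after a broadband attenuation given in dB."""
    return source_spl_db - attenuation_db


def incoherent_increase_db(signal_spl_db, interferer_spl_db):
    """Level increase when an uncorrelated interferer adds to the signal (power sum)."""
    diff_db = interferer_spl_db - signal_spl_db
    return 10.0 * math.log10(1.0 + 10.0 ** (diff_db / 10.0))


def coherent_worst_case_increase_db(signal_spl_db, interferer_spl_db):
    """Level increase if the interferer adds fully in phase (pressure sum)."""
    diff_db = interferer_spl_db - signal_spl_db
    return 20.0 * math.log10(1.0 + 10.0 ** (diff_db / 20.0))


if __name__ == "__main__":
    leak = leaked_spl_db(64.0, 15.0)   # upper bound on leakage inside the occluded EC
    ec_spl = 70.0                      # lower end of the reported EC SPL range
    print(f"leak <= {leak:.0f} dB SPL")
    print(f"power-sum increase: {incoherent_increase_db(ec_spl, leak):.3f} dB")
    print(f"in-phase increase:  {coherent_worst_case_increase_db(ec_spl, leak):.3f} dB")
```

With these bounds the leakage lies at least 21 dB below the measured EC SPL, so its broadband contribution would be a few hundredths of a dB for a power sum and under 1 dB even in the in-phase case. This says nothing about individual frequency bands, where the margin could be smaller.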
The global magnitude attenuation of around −78 dB/octave as the frequency increases, seen in the averaged |H_ocrt(f)|, is assumed to be derived especially from the low-pass effect of the soft tissue. Although the local peaks between 0.7 and 1 kHz in |H_ocrt(f)| for all participants may correspond to the peaks shown in the OC SPL, those peaks in |H_ocrt(f)| might also be affected by the first resonance of the skull bone itself, which is reported to be 0.8–1.2 kHz (Håkansson et al., 1994). On the other hand, the relative increase of magnitude around 2–2.5 kHz in |H_ocec(f)| for most participants is assumed to be related to the open EC resonance due to the occlusion effect compensation. The EC transfer function for AC sound is known to have a peak around 2–3 kHz (Mehrgardt & Mellert, 1977). A similar trend is also shown in the EC sound pressure during BC stimulation (Stenfelt, Wild, et al., 2003). However, the range of the relative magnitude between |H_ocrt(f)| and |H_ocec(f)| (45–50 dB) was greater than that of the reported EC transfer functions (around 15 dB; Mehrgardt & Mellert, 1977). This suggests that the EC transmission characteristics in BC and AC speech transmission differ.
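A global attenuation expressed in dB/octave can be obtained by fitting a straight line to the magnitude in dB against the base-2 logarithm of frequency. The following sketch shows one such least-squares fit; it is not necessarily the procedure used to obtain the value quoted in this paragraph, and the example data are synthetic (a hypothetical response falling by exactly 6 dB per octave), not measured values.

```python
import math


def slope_db_per_octave(freqs_hz, mags_db):
    """Least-squares slope of magnitude (dB) versus log2(frequency), in dB/octave."""
    if len(freqs_hz) != len(mags_db) or len(freqs_hz) < 2:
        raise ValueError("need at least two matching frequency/magnitude pairs")
    xs = [math.log2(f) for f in freqs_hz]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(mags_db) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, mags_db))
    return sxy / sxx


# Synthetic example: octave-spaced frequencies, -6 dB per octave (hypothetical).
EXAMPLE_FREQS_HZ = [125.0 * 2 ** k for k in range(6)]   # 125 Hz ... 4 kHz
EXAMPLE_MAGS_DB = [-6.0 * k for k in range(6)]

if __name__ == "__main__":
    slope = slope_db_per_octave(EXAMPLE_FREQS_HZ, EXAMPLE_MAGS_DB)
    print(f"fitted slope: {slope:.2f} dB/octave")
```

Fitting against log2(f) rather than f makes the slope directly readable as dB per doubling of frequency; local peaks such as those between 0.7 and 1 kHz would appear as residuals around the fitted line.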
As shown in Figure 9, the previous measurement of transmission characteristics from excitation in the larynx to the RT vibration and EC sound radiation showed global trends similar to the current characteristics of |H_ocrt(f)| and |H_ocec(f)|, except for the frequency components below 400 Hz and around 2000 Hz (Toya et al., 2019). In this article, the currently and previously measured transfer functions could not be quantitatively compared. However, the local differences between the two measurement results may suggest an effect of the mechanical coupling of the vocal folds and soft tissue. Although it is still unclear whether BC speech transmission via OC sound pressure is more effective than that via the mechanical coupling of the vocal folds and soft tissue, the OC sound pressure is hypothesized to be a meaningful contributor to BC speech perception. The reasons for this hypothesis are as follows:
1. Delayed auditory feedback presented through BC causes stuttering speech, just as it does through AC (Toya et al., 2016). This suggests that BC speech includes phoneme information (which comes from the vocal tract filter), and one reason for the phenomenon may be the temporal mismatch of the phoneme information between production and perception.
2. Phoneme-related characteristics are observed in BC speech signals recorded by a BC microphone (Rahman & Shimamura, 2019).
Future studies should further investigate which of the two pathways is more effective.
Perceptual properties of BC speech during vocalization have been investigated on the basis of the hearing threshold. Pörschmann (2000) measured the masked threshold for AC and BC speech to determine the perceptual relationship between AC and BC speech as a function of frequency. He showed the transfer function of the BC part of one's own voice, in which a gradual attenuation of amplitude (around 20 dB) was found as the frequency exceeds 1 kHz. This behavior is similar to that of the transfer function |H_ocrt(f)| measured in our study. Considering that his perceptually obtained transfer functions should be influenced by all parts (outer, middle, and inner ear) of BC speech transmission, own-voice perception is likely to be strongly affected by the low-pass effect of the tissue/skull bone vibration, as assumed in Figure 1.
Reinfeldt et al. (2010) measured the physical and perceptual relationship between AC and BC speech during vocalization on the basis of EC sound pressure at hearing thresholds for AC and BC stimulation. Overall, they showed that the BC relative to AC sound pressure is greater in the 1–2 kHz frequency region but less in the frequency region below 500 Hz. Theoretically, their physically measured characteristics of EC sound pressure should be affected by both the soft tissue/skull bone vibration and the EC sound radiation, without effects of the middle or inner ear parts of the transmission. The low-attenuation and high-emphasis effects of |H_ocec(f)| found in this study are likely to account for their results. In this study, the change in the transmission characteristics depending on articulation was not examined, while previous measurements (Reinfeldt et al., 2010) showed phoneme-dependent transmission characteristics. Since the amplitude behavior of recorded BC speech is reported to be sensitive to the first formant locations (Rahman & Shimamura, 2019), the transmission characteristics related to BC speech are likely to change depending on the size of the OC or the tongue height.
One's own perceived voice (i.e., the combination of AC and BC speech) exhibits a perceptual low-pass effect relative to AC speech (Nakayama, 1997). Therefore, BC speech obviously has a greater magnitude in the lower frequency region and less magnitude in the higher frequency region. Considering the 2- to 3-kHz bandpass effect of EC sound radiation or the BC relative to AC sensitivity due to EC sound pressure reported in Reinfeldt et al. (2010), it is suggested that the BC transmission to the outer ear may not be a dominant contributor to BC speech perception during vocalization.
Stenfelt and Goode (2005) hypothesize that the inertia of the cochlear fluids is the most important contributor to BC hearing. The frequency dependency of the fluid displacement ratio at the oval and round windows due to BC stimuli (Stenfelt, Hato, & Goode, 2003) may suggest a nonflat frequency response in the cochlea. However, vibration of the soft tissue and the skull bone is likely to affect the inner ear directly through inertial forces in the cochlear fluid. Hence, BC speech, initially characterized by the low-pass effect (i.e., |H_ocrt(f)|) in the soft tissue and the skull bone, can be assumed to be transmitted to the inner ear as a main contributor to BC speech perception, especially in the lower frequency region. Unfortunately, no physical method has been developed for measuring the inner ear part of BC speech transmission in vivo. Instead, for example, subjective modification of the AC speech spectrum using the |H_ocrt(f)| and |H_ocec(f)| transfer functions to simulate the outer/inner ear parts of BC speech transmission may be useful to further investigate perceptual properties. At the same time, transmission characteristics in the middle ear part of BC should also be considered. The middle ear velocity level is reported to be greater in the frequency region around 2 kHz than in other regions (Stenfelt et al., 2002). The effect of these characteristics on BC speech perception should be further investigated.
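One conceivable way to modify an AC speech spectrum with a measured magnitude response is to weight each frequency bin of the signal by the interpolated gain and transform back. The sketch below illustrates this with a plain DFT so that it needs only the standard library. The gain curve, the test signal, and all function names are hypothetical placeholders rather than measured |H_ocrt(f)| or |H_ocec(f)| data, and the filter is magnitude-only (zero phase), in line with the amplitude-only measurements of this study.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (adequate for short illustrative signals)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def interp_gain_db(freq_hz, points):
    """Piecewise-linear interpolation of sorted (freq_hz, gain_db) points, clamped at the ends."""
    if freq_hz <= points[0][0]:
        return points[0][1]
    for (f0, g0), (f1, g1) in zip(points, points[1:]):
        if freq_hz <= f1:
            return g0 + (g1 - g0) * (freq_hz - f0) / (f1 - f0)
    return points[-1][1]


def apply_magnitude_response(signal, fs_hz, points):
    """Scale each DFT bin by the interpolated gain; the phase is left unchanged."""
    n = len(signal)
    shaped = []
    for k, value in enumerate(dft(signal)):
        # Bins above n/2 mirror the lower half, so they receive the same real gain.
        freq_hz = min(k, n - k) * fs_hz / n
        shaped.append(value * 10.0 ** (interp_gain_db(freq_hz, points) / 20.0))
    return idft(shaped)


# Hypothetical low-pass-like gain curve: 0 dB at 500 Hz falling to -20 dB at 2 kHz.
EXAMPLE_GAIN_POINTS = [(500.0, 0.0), (2000.0, -20.0)]
FS_HZ = 8000.0
N = 64
# Two unit-amplitude tones that fall exactly on DFT bins 4 (500 Hz) and 16 (2 kHz).
EXAMPLE_SIGNAL = [math.sin(2 * math.pi * 500.0 * t / FS_HZ)
                  + math.sin(2 * math.pi * 2000.0 * t / FS_HZ) for t in range(N)]

if __name__ == "__main__":
    out = apply_magnitude_response(EXAMPLE_SIGNAL, FS_HZ, EXAMPLE_GAIN_POINTS)
    mags = [abs(v) for v in dft(out)]
    print(f"500 Hz bin: {mags[4]:.2f}, 2 kHz bin: {mags[16]:.2f}")
```

For real speech, a fast Fourier transform with block-wise overlap-add processing would be the practical choice; the naive transform here only keeps the example self-contained.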
During the measurement of the transfer functions |H_ocrt(f)| and |H_ocec(f)| for each participant (see Figure 8), both the soft tissue/skull bone vibration and the EC sound radiation were assumed to be linear systems, as shown in Figure 1. Skull bone sound transmission is reported to behave linearly in the frequency region between 0.1 and 10 kHz (Håkansson et al., 1996). In this study, a nonlinear relationship between the soft tissue/skull bone vibration and the EC sound radiation was not considered. For example, a nonlinear relationship should be considered if the middle ear muscle reflex (MEMR) is activated by loud excitation in the OC. Although the EC SPL during the experiments was approximately 70–80 dB, the experimental stimuli (i.e., the excitation signal in the OC) could be transmitted not only to the EC but also to the middle ear. Therefore, the total energy reaching the auditory system could be greater. An obvious impedance change due to the muscle reflex once the stimulus SPL exceeds 90 dB was reported long ago (Møller, 1962). If the MEMR were activated, the assumption that each system is linear would not hold. However, even if the ossicular chain were stiffened by the MEMR, this would not be an important factor affecting the transfer function from the OC to the EC via soft tissue/skull bone. It is argued that the EC sound radiation due to BC excitation is produced mainly by the deformation of the EC itself (Stenfelt, 2011). Considering this point, impedance changes of the ossicular chain or the tympanic membrane would barely affect the transfer functions derived in this article.
In the transmission model in Figure 1, |H_ocec(f)| was assumed to be a serial composite system of |H_ocrt(f)| and the EC sound radiation effect. Here, the trends of the relative magnitude difference between |H_ocrt(f)| and |H_ocec(f)| shown in Figure 10 were similar to those of the open EC sound pressure due to BC stimulation derived from Stenfelt, Wild, et al. (2003). If the difference between |H_ocrt(f)| and |H_ocec(f)| represents a modifier from the soft tissue/skull bone vibration to the EC sound radiation, the assumption of the serial composite system may account for the similarity stated above. However, this is only a simple interpretation, and much uncertainty remains. For example, the relative velocity between the RT vibration and the middle ear ossicle vibration could cause sound radiation in the EC via the tympanic membrane (Stenfelt et al., 2002). If the effect of this transmission is strong, the assumption of the serial composite system is not entirely valid. To further investigate this point, subjective evaluations of the voice quality of the AC speech spectrum modified using |H_ocrt(f)| and |H_ocec(f)| may be useful.
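The serial composite assumption can be written compactly as follows. The symbol H_rad(f), denoting the EC sound radiation effect, is introduced here only for illustration and does not appear in the transmission model as published; the relation holds only if both stages behave as linear systems in cascade.

```latex
\begin{align}
|H_{\mathrm{ocec}}(f)| &= |H_{\mathrm{ocrt}}(f)| \cdot |H_{\mathrm{rad}}(f)|, \\
20\log_{10}|H_{\mathrm{rad}}(f)| &= 20\log_{10}|H_{\mathrm{ocec}}(f)| - 20\log_{10}|H_{\mathrm{ocrt}}(f)|.
\end{align}
```

Under this reading, the magnitude difference in dB between the two measured transfer functions is an estimate of the radiation stage alone. Because the RT data represent displacement and the EC data represent sound pressure, only the shape of this difference over frequency is meaningful, not its absolute level.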
In this study, the amplitude characteristics of the transfer functions related to BC speech were measured. However, in real life, speakers perceive both their own AC and BC speech at the same time. Hence, this measurement does not entirely represent the speakers' real-life situation. For example, there could be phase interactions between the AC and BC speech perceived by speakers themselves. In this article, phase characteristics of the transfer functions were not investigated because of the several compensations and manipulations applied to the amplitude characteristics. However, considering the effect of phase interaction, not only the amplitude characteristics but also the phase characteristics of the transfer functions for both AC and BC speech transmission need to be investigated. In future studies, the amplitude/phase characteristics of the EC sound pressure induced by the OC sound pressure could be measured using nonoccluding microphones. The transfer functions measured in this study would then be useful to determine whether and how the perception of one's own voice is affected by the phase interaction between AC and BC speech.