June - September 2004: Clinical implications of OAE generation theory for the prediction of behavioral hearing thresholds




Summarized from Shaffer, Withnell, Dhar, Lilly, Harmon and Goodman (2003), Ear and Hearing, 24, 367-379

Lauren A. Shaffer1 and Sumitrajit Dhar2

1Department of Speech Pathology and Audiology, Ball State University, Muncie, IN 47306

2Department of Speech and Hearing Sciences, Indiana University, Bloomington, IN 47405



The clinical utility of otoacoustic emissions (OAEs) is limited because hearing thresholds cannot be accurately or reliably predicted from OAE test results.  We recently published a paper that discusses this limitation in light of current theories of how OAEs are generated.  The following summarizes some of the main points from the paper.  For further detail, additional figures, and analysis methods, the reader is referred to Shaffer et al. (2003) (please see footnote 1).


Theory of the generation mechanisms and sources of OAEs


Over the last decade, a coherent theory has developed suggesting that the different OAE types share common mechanisms of generation (Talmadge, Tubis, Long & Piskorski, 1998; Zweig & Shera, 1995).  Shera and Guinan (1999) suggested a re-classification of OAEs based on two mechanisms of generation, nonlinear distortion and linear coherent reflection.  Nonlinear distortion and linear coherent reflection can also be thought of in the terms suggested by Kemp (1986), “wave-fixed” and “place-fixed”.  Nonlinear distortion is believed to arise from physiological nonlinearities associated with the action of the cochlear amplifier as it injects energy into the basilar membrane motion.  Therefore, OAE energy arising from nonlinear distortion is “fixed” to the traveling wave.  Reflection, on the other hand, may occur anywhere along the cochlear partition where an irregularity causes energy to be turned around.  There is still much uncertainty about what types of irregularities actually cause reflection in the cochlea.  In fact, Kemp (2002) suggested a further branching of generation terminology to distinguish between potentially passive mechanisms, such as an irregularity in hair cell number, and active mechanisms, such as variation in cochlear amplifier gain.

While the question of what causes reflection remains unanswered, what is known is that the irregularities are randomly distributed (Zweig & Shera, 1995) such that energy can be reflected at sites all along the cochlear partition.  Most of the reflected wavelets will occur with random phase relations, causing the energy to be cancelled; however, energy that is reflected under the peak of the traveling wave will have a coherent phase that allows the positive summation of energy, creating a substantial reflection component of the OAE that is recorded in the ear canal.
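The contrast between random-phase cancellation and coherent summation can be illustrated numerically.  The following is a minimal sketch, not taken from Shaffer et al. (2003): it assumes unit-amplitude wavelets, a hypothetical count of 1000 reflection sites, and a hypothetical phase spread of 0.1 radian for the "coherent" case.  It only shows the general principle that aligned phases add nearly linearly while random phases largely cancel.

```python
import cmath
import math
import random


def summed_magnitude(phases):
    """Magnitude of the sum of unit-amplitude wavelets with the given phases (radians)."""
    return abs(sum(cmath.exp(1j * p) for p in phases))


random.seed(0)       # fixed seed so the illustration is repeatable
n_wavelets = 1000    # hypothetical number of reflection sites

# Wavelets reflected away from the traveling-wave peak: phases spread over a full cycle.
random_phases = [random.uniform(0.0, 2.0 * math.pi) for _ in range(n_wavelets)]

# Wavelets reflected under the peak: phases nearly aligned (hypothetical 0.1 rad spread).
coherent_phases = [random.uniform(-0.1, 0.1) for _ in range(n_wavelets)]

incoherent = summed_magnitude(random_phases)
coherent = summed_magnitude(coherent_phases)

print(f"random phases:   |sum| = {incoherent:.1f} of a possible {n_wavelets}")
print(f"coherent phases: |sum| = {coherent:.1f} of a possible {n_wavelets}")
```

With random phases the summed magnitude grows roughly as the square root of the number of wavelets, whereas with aligned phases it grows in proportion to that number.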

One problem with the classification system suggested by Shera and Guinan (1999) is that most emissions are not generated purely from one mechanism, but are a mix of the two different generation mechanisms, particularly as stimulus level is increased.  For example, stimulus frequency emissions, which are believed to arise from linear coherent reflection at low stimulus levels, may involve both nonlinear distortion and linear coherent reflection at higher stimulus levels (Long, Talmadge, & Thorpe, 2001; Shera & Guinan, 1999).  Distortion product emissions and high-level transient evoked emissions also arise from a mix of nonlinear distortion and linear coherent reflection (Shera & Guinan, 1999; Yates & Withnell, 1999).


The term ‘source’ can cause some confusion in a discussion of OAE generation.  The term is sometimes used synonymously with generation mechanism.  In Shaffer et al. (2003) we use the term ‘source’ to refer to the location or site where OAE energy arises regardless of the mechanism that generates the emission.  The 2f1-f2 DPOAE arises predominantly from two sources (e.g. Brown, Harris & Beveridge, 1996; Gaskill & Brown, 1996; Kemp & Brown, 1983; Kummer, Janssen & Arnold, 1995; Talmadge et al., 1996, 1997).  Energy of the 2f1-f2 DPOAE first arises at the site on the basilar membrane where the stimulus-induced traveling waves interact.  For simplicity, we refer to this region as the “overlap” region.  The overlap region under many stimulus conditions can be approximated as the characteristic frequency location of f2.  Energy arising in this location is generated predominantly by nonlinear distortion.  Thus, we refer to this energy as the nonlinear component.  The energy of the nonlinear component, which has a frequency of 2f1-f2, travels bi-directionally, basally toward the ear canal, and apically toward the characteristic frequency location of 2f1-f2 (CFdp).  At the CFdp location the energy undergoes linear coherent reflection.  This reflection component of the 2f1-f2 DPOAE then travels basally toward the ear canal.  At the stapes, some distortion product energy will pass on to the middle ear, while some energy will be reflected at this boundary, causing a standing wave pattern to develop in the cochlea and having dramatic effects on the level of the emission recorded in the ear canal (Talmadge et al., 1998; Dhar, Talmadge, Long & Tubis, 2002).



Systematic variation in the amplitude of the composite DPOAE


The 2f1-f2 DPOAE measured in the ear canal is a vector sum of the nonlinear and reflection components that originate from two different cochlear sources and can be thought of as a composite of these different components.  The energy from these different sources interacts as it travels in the cochlea, arising first at the overlap region, then being reflected at the CFdp region and again at the middle ear boundary.  The resulting interference pattern leads to variation in the sound pressure level and phase of the composite DPOAE that is measured in the ear canal.  This variation in the sound pressure level of the DPOAE is quasi-periodic with frequency and is known as “fine structure” (see Figure 1 and footnote [2]).

All evoked OAEs show evidence of amplitude and phase fine structure when observed with fine frequency resolution.  In fact, the commonality in the frequency spacing of fine structure amplitude peaks across different emission types (including the minimum spacing between spontaneous emission peaks) provided early evidence of common mechanisms of generation (reviewed in Talmadge et al., 1998).
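The vector-sum idea can be sketched with a two-phasor toy model.  The sketch below is an illustration under stated assumptions, not the model used in the paper: the nonlinear component is given unit amplitude and constant phase, the reflection component is given a hypothetical relative amplitude of 0.5 and a hypothetical delay of 8 ms (so that its phase rotates through one cycle every 125 Hz), and multiple internal reflections are ignored.

```python
import cmath
import math


def composite_level_db(freq_hz, refl_ratio=0.5, refl_delay_s=0.008):
    """Level (dB re the nonlinear component) of the two-component vector sum.

    refl_ratio and refl_delay_s are hypothetical values chosen for illustration.
    """
    nonlinear = 1.0 + 0j                                    # constant phase (wave-fixed)
    reflection = refl_ratio * cmath.exp(-2j * math.pi * freq_hz * refl_delay_s)
    return 20.0 * math.log10(abs(nonlinear + reflection))


# Sample 2000-2500 Hz with fine (2.5 Hz) resolution.
freqs = [2000.0 + 2.5 * k for k in range(201)]
levels = [composite_level_db(f) for f in freqs]

peak_to_valley = max(levels) - min(levels)

# Interior local maxima of the level-versus-frequency function.
peaks = [
    freqs[i]
    for i in range(1, len(freqs) - 1)
    if levels[i] > levels[i - 1] and levels[i] > levels[i + 1]
]

print(f"peak-to-valley variation: {peak_to_valley:.2f} dB")
print("amplitude peaks at (Hz):", peaks)
```

In this toy model the peak-to-valley depth depends only on the amplitude ratio of the two components, 20 log10((1 + r) / (1 - r)), and the peak spacing is the reciprocal of the assumed delay.  Real fine structure is only quasi-periodic because both quantities vary with frequency.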




Figure 1.  Amplitude and phase fine structure for the 2f1-f2 DPOAE of a subject (KG) with normal hearing.  Note that peak-to-valley amplitude variation can be greater than 20 dB, and that valleys in the amplitude fine structure sometimes drop into the noise floor.  Talmadge, Long, Tubis and Dhar (1999) used a phasor model to show that characteristic phase patterns (ramp and sawtooth patterns) result from the interaction of the nonlinear and reflection components of the 2f1-f2 DPOAE.  Which pattern arises depends on the relative strength of the two components, which can vary with frequency.  In these data, ramp patterns appear in the frequency range from approximately 1500 to 2000 Hz and sawtooth patterns appear above 2000 Hz.


Fine structure is not observed in clinical measurements of DPOAEs because common protocols call for recording the DPOAEs at 1/3 octave intervals.  This resolution is too coarse to observe fine structure.  The typical spacing between amplitude peaks in the fine structure varies with frequency, but in the mid audiometric frequencies ranges from approximately 100-200 Hz.  Peak-to-valley variation in amplitude can be 20 dB or greater in normal hearing subjects (He & Schmiedt, 1993).  So, while fine structure is not observed in clinical testing, its presence can greatly complicate the interpretation of the clinical DP-gram and clinical DP input/output (I/O) function (Heitmann, Waldmann & Plinkert, 1996).
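A short calculation shows why 1/3 octave sampling cannot resolve a pattern that repeats every 100-200 Hz.  This is a simple arithmetic sketch; the function name and the choice of example frequencies are ours, and the comparison is approximate because clinical test points and fine structure are not always specified on the same frequency axis.

```python
def third_octave_step_hz(freq_hz):
    """Distance in Hz from freq_hz to the next test point one-third octave higher."""
    return freq_hz * (2.0 ** (1.0 / 3.0) - 1.0)


# Fine-structure peak spacing quoted in the text for the mid audiometric frequencies.
spacing_low_hz, spacing_high_hz = 100.0, 200.0

for test_freq in (1000.0, 2000.0, 4000.0):
    step = third_octave_step_hz(test_freq)
    print(
        f"{test_freq:6.0f} Hz: next test point is {step:4.0f} Hz away, "
        f"spanning about {step / spacing_high_hz:.1f} to {step / spacing_low_hz:.1f} "
        f"fine-structure periods"
    )
```

Near 2000 Hz, adjacent 1/3 octave test points are more than 500 Hz apart, so several complete peak-valley cycles fall between two clinical samples and any single sample may land on a peak, a valley, or a slope.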

The problem in interpreting individual test results is that valleys or “dips” in the amplitude fine structure may drop below the confidence intervals of normative data and can also drop below the noise floor (see Figure 1).   Such points will be interpreted as evidence of dysfunction in the frequency region where the dip occurs, when, in fact, higher resolution recordings would show that such dips are part of a normal pattern of fine structure.

Problems equally arise in attempting to interpret DP I/O functions.  The peaks and valleys of amplitude fine structure shift with stimulus level, causing changes in the shape and slope of the I/O function.  This problem is illustrated schematically in Figure 2.  Shifts in fine structure with stimulus level may partly explain why in recent studies attempting to extrapolate thresholds from DPOAE I/O data, 30% and 60% of the I/O functions did not meet the criteria for inclusion (Boege & Janssen, 2002; Gorga, Neely, Dorn & Hoover, 2003).  Among normal hearing subjects, the majority did not meet the slope criterion (Gorga et al., 2003).  While fine structure was not considered in these studies, it may well explain the variability in I/O function slope among normal hearing subjects.



Figure 2.  Schematic of shift in fine structure with increasing stimulus level.  The colors represent different L2 levels for a constant L1.  Resulting I/O functions will vary in slope depending on where in the fine structure the sample is taken.  Frequencies near a “dip” or on the slope of fine structure can produce very different I/O functions.  Frequency shifts in DPOAE fine structure were first described by He and Schmiedt (1993).  The change in the shape or slope of I/O functions resulting from fine structure shifts with level illustrates that I/O functions are affected by the interaction of multiple source components.
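The effect of a level-dependent shift in fine structure on I/O slope can be sketched with a toy model.  Every numerical value below is a hypothetical choice made for illustration (the growth rate of the nonlinear component, the amplitude ratio and delay of the reflection component, and the size and direction of the shift); none is taken from the studies cited.  The sketch compares the I/O slope at a frequency that sits on a fine-structure peak at the lower stimulus level with the slope at a frequency that sits in a valley.

```python
import cmath
import math

GROWTH_DB_PER_DB = 0.5   # hypothetical growth rate of the nonlinear component
REFL_RATIO = 0.5         # hypothetical reflection-to-nonlinear amplitude ratio
REFL_DELAY_S = 0.008     # hypothetical delay (125 Hz fine-structure spacing)
SHIFT_HZ_PER_DB = 2.0    # hypothetical downward shift of the pattern per dB of level
REF_LEVEL_DB = 40.0      # level at which the pattern is unshifted


def dpoae_level_db(freq_hz, stim_level_db):
    """Toy composite DPOAE level: linear growth plus a level-shifted interference term."""
    shifted_freq = freq_hz + SHIFT_HZ_PER_DB * (stim_level_db - REF_LEVEL_DB)
    interference = abs(
        1.0 + REFL_RATIO * cmath.exp(-2j * math.pi * shifted_freq * REFL_DELAY_S)
    )
    return GROWTH_DB_PER_DB * stim_level_db + 20.0 * math.log10(interference)


def io_slope(freq_hz, low_db=40.0, high_db=60.0):
    """Average I/O slope (dB/dB) between two stimulus levels at a fixed frequency."""
    rise = dpoae_level_db(freq_hz, high_db) - dpoae_level_db(freq_hz, low_db)
    return rise / (high_db - low_db)


slope_on_peak = io_slope(2000.0)      # on a peak at the 40 dB reference level
slope_in_valley = io_slope(2062.5)    # in a valley at the 40 dB reference level

print(f"I/O slope starting on a peak:   {slope_on_peak:.2f} dB/dB")
print(f"I/O slope starting in a valley: {slope_in_valley:.2f} dB/dB")
```

Although the underlying nonlinear component grows at the same rate at both frequencies, the measured slopes differ substantially because the interference pattern moves relative to the fixed test frequency.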


At present the only way to deal with the clinical problems associated with low-resolution recording is to record with high stimulus resolution around any frequency in question to ascertain whether fine structure is influencing the DP-gram or I/O function.


Predicting behavioral thresholds from the composite DPOAE


So, how can theory of generation mechanisms and cochlear sources shed light on the limitations of the composite OAE in predicting behavioral hearing thresholds?  First, studies that have attempted to relate OAE amplitude to hearing thresholds typically correlate a single emission frequency to a single audiometric frequency (in the case of distortion products, the DPOAE amplitude at a given f2 frequency is correlated to the behavioral threshold at the same audiometric frequency).  From generation theory and experimental data, we now know that for DPOAEs and high-level TEOAEs there are multiple cochlear sources that contribute to the emission (Avan, Bonfils, Loth & Wit, 1993; Withnell, Yates & Kirk, 2000).  Therefore, the amplitude of the composite OAE at a single frequency represents the sensitivity of all the cochlear sources that have contributed to the emission, not just the sensitivity at a single frequency.  This “mismatch” between the cochlear locations that give rise to the emission and the emission test frequency may be responsible for some of the variability observed in simple correlations.

Amplitude fine structure also plays a role in limiting the prediction of hearing thresholds from DPOAE level.  When data from large numbers of subjects are averaged to obtain normative values, the amplitude variation is essentially low-pass filtered and the fine structure pattern is no longer obvious in the averaged data.  What is obvious in the averaged data, however, is the large amplitude variance (and correspondingly large confidence interval range) resulting from fine structure (Gorga, Neely, Ohlrich, Hoover, Redner, & Peters, 1997).  For this reason, fine structure amplitude variation contributes to the overlap in amplitude distributions between normal hearing and impaired ears (Gorga et al., 1997), and probably also contributes to the tremendous variability seen in correlations of DPOAE level and hearing thresholds.
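The smoothing effect of averaging, and the variance it leaves behind, can be sketched with a small simulation.  This is an illustration under loudly stated assumptions and uses no real data: each simulated "subject" has the same two-component fine structure (hypothetical amplitude ratio 0.5 and delay 8 ms) but a random phase offset, so that peaks and valleys fall at different frequencies in different ears.

```python
import cmath
import math
import random
import statistics

REFL_RATIO = 0.5       # hypothetical reflection-to-nonlinear amplitude ratio
REFL_DELAY_S = 0.008   # hypothetical delay (125 Hz fine-structure spacing)
N_SUBJECTS = 200       # hypothetical sample size


def level_db(freq_hz, phase_offset):
    """Composite level (dB re the nonlinear component) for one simulated ear."""
    phase = phase_offset - 2.0 * math.pi * freq_hz * REFL_DELAY_S
    return 20.0 * math.log10(abs(1.0 + REFL_RATIO * cmath.exp(1j * phase)))


random.seed(1)  # fixed seed so the illustration is repeatable
offsets = [random.uniform(0.0, 2.0 * math.pi) for _ in range(N_SUBJECTS)]
freqs = [2000.0 + 12.5 * k for k in range(41)]

mean_by_freq = []
sd_by_freq = []
for f in freqs:
    levels = [level_db(f, off) for off in offsets]
    mean_by_freq.append(statistics.mean(levels))
    sd_by_freq.append(statistics.stdev(levels))

single_ear = [level_db(f, offsets[0]) for f in freqs]
single_ptv = max(single_ear) - min(single_ear)
mean_ptv = max(mean_by_freq) - min(mean_by_freq)
typical_sd = statistics.mean(sd_by_freq)

print(f"peak-to-valley, one ear:        {single_ptv:.1f} dB")
print(f"peak-to-valley, group average:  {mean_ptv:.1f} dB")
print(f"between-ear SD at a frequency:  {typical_sd:.1f} dB")
```

The group average is nearly flat across frequency, yet the spread across ears at any one frequency remains several dB, which is the pattern described above for normative data.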


“Single source” or “component” DP-grams and DP I/O functions


Two different methods have been used to separate the nonlinear and reflection components of the composite DPOAE.  Waldmann, Heitmann, & Plinkert (1997) first conceptualized the “single-source” DPOAE (sgDPOAE), later showing that a suppressor tone close in frequency to 2f1-f2 could be used to reduce the amplitude variation in fine structure (Heitmann, Waldmann, Schnitzler, Plinkert & Zenner, 1998) by suppressing the reflection component from the CFdp region.

The other technique used to separate the nonlinear and reflection components involves digital signal processing methods that exploit the observation that the phases of the two components exhibit different behaviors as a function of frequency. In fact, the two components have quite distinct phase behaviors.  Because the nonlinear component is associated with the stimulus traveling waves (wave-fixed), its phase is relatively invariant with frequency.  This behavior arises because the traveling wave exhibits an approximately constant number of cycles of vibration to the peak regardless of the frequency of the stimulus.  The result is that the nonlinear distortion produced always shares the same phase relation to the stimulus phase producing a flat phase versus frequency function.  The reflection component, on the other hand, which arises from fixed locations (place-fixed) on the cochlear partition has a rapidly rotating phase that accumulates across frequency.  The resulting phase versus frequency function has a steep slope.  From this discussion, it may be much clearer how two components, one with approximately constant phase and one with rapidly changing phase, sum together to produce the quasi-periodic pattern of amplitude and phase fine structure.

Stover, Neely, and Gorga (1996) first showed that inverse fast Fourier transform (IFFT) of the DP-gram could be used to separate the components in the time domain.  Because of their phase properties, the nonlinear and reflection components resolve as independent peaks when an IFFT is applied to high-resolution DP-gram data.  Time-windowing can then be used to isolate the component peaks, and FFT used to convert the time-windowed data back into “component” DP-grams.  Figure 3 illustrates the nonlinear and reflection component DP-grams that result from using this analysis.  Kalluri and Shera (2001) showed that a suppression paradigm and the IFFT/time-windowing analysis produce equivalent results in isolating the nonlinear component.  For further details about this analysis or about the possible errors associated with the analysis, readers are referred to Kalluri and Shera (2001) and Shaffer et al. (2003).
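The sequence of inverse transform, time-windowing, and forward transform can be sketched on synthetic data.  The sketch below is an idealized illustration, not the analysis code used in the studies cited: the complex DP-gram is synthesized from two components with hypothetical delays (0 ms and 8 ms) and a hypothetical amplitude ratio of 0.5, the 4 ms window cutoff is arbitrary, and a naive discrete Fourier transform is used in place of an FFT library so that the example depends only on the Python standard library.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform; inverse=True includes the 1/N scaling."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = sum(x[j] * cmath.exp(sign * 2j * math.pi * j * k / n) for j in range(n))
        out.append(acc / n if inverse else acc)
    return out


N = 128                # number of DP-gram points
DF_HZ = 7.8125         # frequency step; gives 1 ms time bins, since 1 / (N * DF_HZ) = 1 ms
F_START_HZ = 1500.0
REFL_RATIO = 0.5       # hypothetical reflection-to-nonlinear amplitude ratio
REFL_DELAY_S = 0.008   # hypothetical reflection delay; nonlinear delay taken as 0
CUTOFF_S = 0.004       # arbitrary boundary between the two time windows

# Synthetic complex DP-gram: flat-phase component plus rapidly rotating component.
freqs = [F_START_HZ + k * DF_HZ for k in range(N)]
dp_gram = [1.0 + REFL_RATIO * cmath.exp(-2j * math.pi * f * REFL_DELAY_S) for f in freqs]

# Step 1: inverse transform to the time (delay) domain.
time_domain = dft(dp_gram, inverse=True)
bin_s = 1.0 / (N * DF_HZ)


def in_nonlinear_window(n):
    """True if time bin n lies within the short-delay window (bins wrap to signed delay)."""
    delay = n * bin_s if n < N // 2 else (n - N) * bin_s
    return abs(delay) < CUTOFF_S


# Step 2: time-window each component.
nl_time = [v if in_nonlinear_window(n) else 0.0 for n, v in enumerate(time_domain)]
refl_time = [0.0 if in_nonlinear_window(n) else v for n, v in enumerate(time_domain)]

# Step 3: forward transform back to "component" DP-grams.
nl_gram = dft(nl_time)
refl_gram = dft(refl_time)

nl_levels_db = [20.0 * math.log10(abs(v)) for v in nl_gram]
refl_levels_db = [20.0 * math.log10(abs(v)) for v in refl_gram]
print(f"nonlinear component level range:  {min(nl_levels_db):.2f} to {max(nl_levels_db):.2f} dB")
print(f"reflection component level range: {min(refl_levels_db):.2f} to {max(refl_levels_db):.2f} dB")
```

Because the synthetic delays fall exactly on time bins, the separation here is essentially perfect and both component DP-grams are flat.  With measured data the component amplitudes and delays vary with frequency, the time-domain peaks are broader, and the choice of window affects the result.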



Figure 3.  The IFFT and time-windowing analysis were applied to the fine structure data given in Figure 1.  After IFFT, two dominant peaks result in the time domain, the nonlinear component (blue) and the reflection component (light purple).  For the time domain data, axes are given at the top and to the right.  The dominant nonlinear and reflection component peaks were then isolated by time-windowing and converted back into the frequency domain by FFT.  The nonlinear component DP-gram is shown in red and the reflection component DP-gram is shown in green.  Axes for the frequency domain data appear at the bottom and to the left.  Note that while some amplitude variation remains in the component DP-grams, it does not show the periodicity of the original fine structure suggesting that it is not related to two-source interference.  Such residual amplitude variation may come from the unmixing of the sources during analysis (Kalluri & Shera, 2001) or may represent natural variation in the components across frequency.



The appeal of a “single-source” or “component” DP-gram is that limiting the DP-gram to a single component removes the fine structure-related amplitude variation that arises from multiple components.  Therefore, normative data based on single-source DP-grams may exhibit less amplitude variance.  To the extent that fine structure is responsible for the tremendous variability noted in correlations of behavioral threshold to DPOAE level, correlations based on a component DP-gram might improve threshold predictions.  Toward this end, we correlated DPOAE level obtained using a suppression paradigm to behavioral thresholds.  Results suggest that correlation coefficients, while statistically significant at some frequencies, in general do not improve and are still quite variable when suppression is used to remove the reflection component.  A manuscript of these findings and how they can be interpreted in light of generation theory was recently submitted for review.




OAE generation theory suggests that DPOAEs arise from two different generation mechanisms and may involve multiple generation sources.  The interaction of multiple components leads to amplitude variation in the composite ear canal signal.  Clinical interpretation of OAE test results is complicated by the presence of this amplitude variation, which is known as fine structure.  Not only do multiple sources challenge the idea that the test frequency is assessing the sensitivity of a single cochlear location, but the amplitude variation resulting from multiple sources yields normative data with a large range of variance causing overlap of response distributions between normal and hearing impaired populations, and limiting the potential of correlational analyses to consistently predict hearing thresholds.  Methods that allow the isolation of DPOAE components may have some clinical utility.   Studies are needed to determine whether component DP-grams and component DP-I/O functions can be used to better predict behavioral thresholds.




Avan, P., Bonfils, P., Loth, D., & Wit, H.P. (1993). Temporal patterns of transient-evoked otoacoustic emissions in normal and impaired cochleae.  Hearing Research, 70, 109-120.


Boege, P., & Janssen, T. (2002).  Pure-tone threshold estimation from extrapolated distortion product emission I/O functions in normal and cochlear hearing loss ears. Journal of the Acoustical Society of America, 111, 1810-1818.


Brown, A.M., Harris, F.P., & Beveridge, H.A. (1996).  Two sources of acoustic distortion products from the human cochlea. Journal of the Acoustical Society of America, 100, 3260-3267.


Dhar, S., Talmadge, C.L., Long, G.R., & Tubis, A. (2002).  Multiple internal reflections in the cochlea and their effect on DPOAE fine structure.  Journal of the Acoustical Society of America, 112, 2883-2897.


Gaskill, S.A., & Brown, A.M. (1996). Suppression of human acoustic distortion product: dual origin of 2f1-f2. Journal of the Acoustical Society of America, 100, 3268-3274.


Gorga, M.P., Neely, S.T., Ohlrich, B., Hoover, B., Redner, J., & Peters, J. (1997). From the laboratory to clinic:  A large scale study of distortion product otoacoustic emissions in ears with normal hearing and ears with hearing loss.  Ear & Hearing, 18, 440-455.


Gorga, M., Neely, S., Dorn, P., & Hoover, B. (2003).  Further efforts to predict pure-tone thresholds from distortion product otoacoustic emission input/output functions.  Journal of the Acoustical Society of America, 113, 3275-3284.


He, N.H., & Schmiedt, R.A. (1993).  Fine structure of the 2f1-f2 acoustic distortion product:  Changes with primary level.  Journal of the Acoustical Society of America, 94, 2659-2669.


Heitmann, J., Waldmann, B., & Plinkert, P.K. (1996).  Limitations in the use of distortion product otoacoustic emissions in objective audiometry as the result of fine structure. European Archives of Otorhinolaryngology, 253, 167-171.


Heitmann, J., Waldmann, B., Schnitzler, H.P., Plinkert, P.K., & Zenner, H.P. (1998). Suppression of distortion product otoacoustic emissions (DPOAE) near 2f1-f2 removes DP-gram fine structure - Evidence for a second generator. Journal of the Acoustical Society of America, 103, 1527-1531.


Kalluri R., & Shera, C.A. (2001). Distortion-product source unmixing:  A test of the two-mechanism model for DPOAE generation. Journal of the Acoustical Society of America, 109, 622-637.


Kemp, D.T. (1986). Otoacoustic emissions, traveling waves and cochlear mechanisms. Hearing Research, 22, 95-104.


Kemp, D. T. (2002).  Exploring cochlear status with otoacoustic emissions:  The potential for new clinical applications, In M.S. Robinette and T.J. Glattke (Eds), Otoacoustic Emissions:  Clinical Applications, New York: Thieme.


Kemp, D.T., & Brown, A.M. (1983). An integrated view of the cochlear mechanical nonlinearities observable in the ear canal. In E. de Boer & M.A. Viergever  (Eds) Mechanics of Hearing (pp. 75-82).  The Hague, The Netherlands:  Martinus Nijhoff.


Kummer, P., Janssen, T., & Arnold, W. (1995). Suppression tuning characteristics of the 2f1-f2 distortion product otoacoustic emission in humans. Journal of the Acoustical Society of America, 98, 197-210.


Long, G.R., Talmadge, C.L., & Thorpe, C.A. (2001). Experimental measurement of level dependence of stimulus frequency otoacoustic emissions fine structure.  Association for Research in Otolaryngology:  Abstracts of the Twenty-fourth Midwinter Meeting, 46, 13.


Shaffer, Withnell, Dhar, Lilly, Harmon, & Goodman (2003). Sources and Mechanisms of DPOAE Generation:  Implications for the Prediction of Auditory Sensitivity. Ear and Hearing, 24, 367-379.


Shera, C.A., & Guinan, J.J. Jr. (1999). Evoked otoacoustic emissions arise by two fundamentally different mechanisms:  A taxonomy for mammalian OAEs. Journal of the Acoustical Society of America, 105, 782-798.


Stover, L.J., Neely, S.T., & Gorga, M.P. (1996). Latency and multiple sources of distortion product emissions. Journal of the Acoustical Society of America, 99, 1016-1024.


Talmadge, C.L., Long, G.R., Tubis, A., & Dhar, S. (1999). Experimental confirmation of the two-source interference model for the fine structure of distortion product otoacoustic emissions. Journal of the Acoustical Society of America, 105, 275-292.


Talmadge, C.L., Tubis, A., Long, G.R., & Piskorski, P. (1998). Modeling otoacoustic and hearing threshold fine structure. Journal of the Acoustical Society of America, 104, 1517-1543.


Talmadge, C.L., Tubis, A., Piskorski, P., & Long, G.R. (1997).  Modeling otoacoustic emissions fine structure.  In E. Lewis, G. Long, R. Lyon, P. Narins and C. Steele (Eds), Diversity in Auditory Mechanics, Singapore:  World Scientific, pp. 462-471.


Talmadge, C.L., Tubis, A., Long, G.R., & Piskorski, P. (1996).  Evidence for the spatial origins of the fine structure of distortion product otoacoustic emissions in humans, and its implications:  Experimental and modeling results.  Abstracts of the Nineteenth Midwinter Meeting of the Association for Research in Otolaryngology, p. 94.


Waldmann, B., Heitmann, J., & Plinkert, P. (1997).  Distorsionsprodukte (sgDPOAE):  Entwicklung eines neuen Präzisionsmeßsystems.  Audiologische Akustik, 1, 22-31.


Withnell, R.H., Yates, G.K., & Kirk, D.L. (2000).  Changes to low-frequency components of the TEOAE following acoustic trauma to the base of the cochlea.  Hearing Research, 139, 1-12.


Yates, G.K., & Withnell, R.H. (1999). The role of intermodulation distortion in transient-evoked otoacoustic emissions, Hearing Research, 136, 49-64.


Zweig, G., &  Shera, C. (1995). The origins of periodicity in the spectrum of evoked otoacoustic emissions. Journal of the Acoustical Society of America, 98, 2018-2047.



1 The publication entitled “Sources and Mechanisms of DPOAE Generation:  Implications for the Prediction of Auditory Sensitivity” contains an error in publishing.  References to “2f2-f2” are in error and should read “2f1-f2”.

[2] Data used for illustration in this paper were collected with the approval of the Indiana University Bloomington Campus Committee


   About the Author  ( Lauren Shaffer Ph.D.)

Current Position:  Assistant Professor of Audiology, Ball State University
2000  Ph.D.  Hearing Science   Purdue University
1991  M.S.   Biology                Ball State University
1986  B.A.    Biology                Wilmington College