A different way of looking at /ay/

In October of last year, I gave a talk at New Ways of Analyzing Variation, with some of my dissertation findings. One of the things that was most interesting to the audience there was the question of what exactly is happening with /ay/ monophthongization in my data, and in African American English more broadly. This post gives a quick summary of a new way I devised to analyze that variation (see what I did there?), which I employ in my dissertation, and my findings.

What is /ay/ monophthongization?

As much as I have extolled the virtues of the International Phonetic Alphabet (IPA), I am actually using what’s often called Labovian notation here, which Bill Labov calls binary notation, and which represents vowel classes more abstractly. In this system, /ay/ represents the sound in words like night and time. In many varieties of English, this is a diphthong: two vowel sounds pronounced together within a single syllable. In this case, it’s /a/ as in cot, followed by something between /i/ as in bit and /iy/ as in beat. However, some varieties of English simplify this to just /a/ as in cot. This reduction of two sounds to one is referred to as monophthongization (mono as in ‘one’).

For many white southerners, /ay/ is unconditionally monophthongized, meaning it happens in all words that have “that vowel” and for those speakers “that vowel” is now [ɑ] or something like it. So both night and time would have [ɑ] for those speakers (sounding like “Not Tom” in some accents). However, in the literature on African American English, it’s generally characterized as a conditioned shift, meaning this change only happens under certain conditions, and not others.

The general consensus on AAE is that it retains the diphthong, so /ay/ is pronounced [ɑ͡ɪ] before voiceless consonants, and it is monophthongized to something like [ɑ] elsewhere. So night has [ɑ͡ɪ], but time has [ɑ].

My data

For my dissertation, I gathered recordings of native AAE speakers reading a passage called “Junebug Goes to the Barber,” in order to measure regional differences in AAE accents. For my final data analysis, I used 181 of these recordings. The reading passage has a decent number of words that contain /ay/ in both conditions.

The challenge

Historically, the way that some linguists evaluated /ay/ monophthongization was simply to listen, and to determine whether a given pronunciation sounded more like ah or ay. Later, linguists supplemented this with instrumental measurements, and would read a spectrogram, looking for a characteristic change in a particular measure called a formant. Neither of these is a particularly satisfying way of doing things, in part because it’s very time intensive, and in part because it always relies on a lot of researcher decision-making during the process.

Another challenge was that when I looked at vowel plots of the formant trajectories for my speakers, there was a lot of different stuff happening. At NWAV I asked for some ideas of how to pursue this further, and after NWAV I had some good conversations with Mark Liberman, who suggested the method I ultimately used.

The New Approach

Instead of using formants (higher-energy peaks in the spectrum that correspond pretty well to resonances in the mouth, which are affected by the position of the tongue), I chose to use Mel Frequency Cepstral Coefficients (MFCCs).

Just as the “spectrum” is a Fourier transformation of the observed sound wave, the “cepstrum” is essentially a Fourier transform of the spectrum (a spectrum of a spectrum, so to speak). Using a Mel transformation, a warping of the frequency scale that mimics human hearing, we can then gather 12 measurements at each time point instead of the usual two formants. MFCCs are often used in speech-to-text and automatic speech recognition (ASR) contexts but, weirdly, not used much by linguists.

Because the difference between /a/ and /ay/ is where the vowel ends up, not where it starts, I measured the 12 MFCCs at roughly 80% of the duration of the vowel.
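
In code, that measurement looks roughly like the sketch below (in Python, with librosa). The file handling, sampling rate, and frame settings are illustrative rather than my exact pipeline, and the vowel start and end times would come from whatever forced alignment you have.

```python
import librosa

def mfccs_at_point(wav_path, vowel_start, vowel_end, point=0.8, n_mfcc=12):
    """Return n_mfcc MFCCs measured at a given proportion of a vowel's duration.

    vowel_start and vowel_end are times in seconds (e.g. from a force-aligned
    TextGrid); point=0.8 targets roughly 80% of the way through the vowel.
    """
    y, sr = librosa.load(wav_path, sr=16000)
    hop = int(0.010 * sr)  # 10 ms hop between analysis frames
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                n_fft=int(0.025 * sr), hop_length=hop)
    # Pick the analysis frame closest to the chosen point in the vowel
    target_time = vowel_start + point * (vowel_end - vowel_start)
    frame = min(int(round(target_time * sr / hop)), mfcc.shape[1] - 1)
    return mfcc[:, frame]  # a 12-dimensional vector for this token
```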

I took this measure for all instances of /iy/ as in beat in words like he, she, we, see, treat, meet, etc., and for all instances of /a/ as in bot in words like cot, ma, etc. I then performed dimension reduction using Principal Components Analysis (PCA), reducing the 12-dimensional space to 3 dimensions. In those 3 dimensions, there is a clear separation between /iy/ and /a/.
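
The dimension reduction step is only a few lines. Here is a sketch with scikit-learn; `X_iy` and `X_a` are stand-in names for arrays of the 12 MFCC measurements per token, and fitting the PCA on just the anchor vowels is one reasonable way to set it up, not necessarily the only one.

```python
import numpy as np
from sklearn.decomposition import PCA

# X_iy and X_a: arrays of shape (n_tokens, 12), one row of MFCCs per vowel token
X_anchor = np.vstack([X_iy, X_a])

# Fit PCA on the anchor vowels and keep the first 3 principal components
pca = PCA(n_components=3)
pca.fit(X_anchor)

iy_3d = pca.transform(X_iy)   # /iy/ tokens in the reduced 3-D space
a_3d = pca.transform(X_a)     # /a/ tokens in the reduced 3-D space
```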

Then, I found the mean of each of these clouds of vowel measurements and drew a line between them (not really, but I did the code equivalent). I then shifted and scaled the data so that the mean of /iy/ was at the origin (that is, at (0,0,0) in my coordinate system) and the mean of /a/ was at 1 on the x axis. That gives me a single measure of how /iy/-like or /a/-like something is, if I just project it down onto that line.
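
The “code equivalent” of drawing that line amounts to some vector arithmetic: project a point onto the line between the two means, scaled so that the /iy/ mean lands at 0 and the /a/ mean at 1. A sketch, using the `iy_3d` and `a_3d` arrays from above:

```python
import numpy as np

iy_mean = iy_3d.mean(axis=0)   # centre of the /iy/ cloud
a_mean = a_3d.mean(axis=0)     # centre of the /a/ cloud
axis_vec = a_mean - iy_mean    # the line between the two means

def iy_to_a_score(points):
    """Project points onto the /iy/-to-/a/ line.

    Returns 0 for a point at the /iy/ mean, 1 for a point at the /a/ mean,
    and values in between (or beyond) for everything else.
    """
    return (points - iy_mean) @ axis_vec / (axis_vec @ axis_vec)
```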

Lastly, I projected all the /ay/s down onto that line. For each speaker, I then took a mean for /ay/ before C (any voiceless consonant) and a mean for /ay/ before V (another vowel, a voiced segment, or the end of a word).
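
Getting per-speaker means by context is then a standard split-apply-combine step. A sketch with pandas, where `X_ay` is a stand-in for the MFCC measurements of the /ay/ tokens and `ay_info` is a stand-in table with one row per token recording the speaker and the following context:

```python
import pandas as pd

ay_3d = pca.transform(X_ay)          # map /ay/ tokens into the same 3-D space
ay_scores = iy_to_a_score(ay_3d)     # 0 = fully /iy/-like, 1 = fully /a/-like

df = pd.DataFrame({
    "speaker": ay_info["speaker"],
    "context": ay_info["context"],   # "ayC" (pre-voiceless) vs. "ayV" (elsewhere)
    "score": ay_scores,
})

# One mean per speaker per context; a higher score means a more /a/-like,
# i.e. more monophthongal, realization at the measurement point
speaker_means = df.groupby(["speaker", "context"])["score"].mean().unstack()
```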

The findings

Broadly, my findings correspond to what was expected. /ay/ in words like time is generally monophthongized, and /ay/ before voiceless consonants in words like night is generally not. However, this pattern was not universal. In fact, for some speakers, there was a fair amount of monophthongization before voiceless consonants. This was new to me.

Monophthongization by speaker and word class. Red are ayV — that’s /ay/ before voiced consonants, vowels, or word finally, and Blue are ayC — that’s /ay/ before voiceless consonants.

Monophthongization by speaker and word class, matched by speaker. Red are ayV — that’s /ay/ before voiced consonants, vowels, or word finally, and Blue are ayC — that’s /ay/ before voiceless consonants.

Interestingly, there is a regional pattern to this: it happens more in the North, especially in Michigan, than it does elsewhere. It’s not happening in the South, where white folks have unconditional monophthongization, but rather in the North, where monophthongization is rare to begin with.

[Map: /ay/ monophthongization before voiceless consonants (ayC), by region]

Dennis Preston suggested this may be a result of redlining and other policies of segregation: (white) Appalachian English speakers who had moved to the North pursuing factory work were segregated from white Northerners, and lived in close proximity to black folks who had moved up during the Great Migration. He pointed me toward Bridget Anderson’s 2002 paper on dialect leveling in Detroit.

What this means, ultimately, is that my methods were solid enough, and my sample good enough, that I picked up on a regionally constrained sociolinguistic phenomenon that had been previously described, but which I did not know about, and couldn’t have made up if I tried.

-----

©Taylor Jones 2020
