In this post, I'm going to explain why knowing the International Phonetic Alphabet is like seeing the matrix.

A common misconception that linguists often have to deal with, be it from students in Intro to Linguistics or from family members at holiday gatherings, is that Language (capital L) is basically written language. This has all kinds of ramifications, from people thinking that stylistic conventions for writing are somehow "rules" that languages follow, to thinking that people whose pronunciation differs significantly from what we think of as being "how it's spelled" are somehow dumb.

But linguists know a secret: writing is a secondary technology, and spoken language (as opposed to signed language) is all about sounds.

There are tons of different writing systems, and languages can be represented --- well or poorly --- by many different systems. For instance, Turkish was historically written using the Arabic abjad (basically an alphabet that only has consonants), but is now written with Roman letters. Tajik is essentially the same language as Farsi, but one's written in Cyrillic and the other in a modified abjad. Hell, people get tattoos in English that are written in Tengwar (Tolkien's Elvish script). All these different scripts represent sounds in ways that are sometimes more and sometimes less faithful to the (abstract) sound units that a given language uses. For instance, standard varieties of English have two sounds that we totally suck at writing, so we just slap two letters together. For both of them. When you blow air between your tongue and front teeth, whether your vocal cords are vibrating or not, we just write it <th> in English and call it a day, even though <t> refers to one sound and <h> refers to another and this combination makes no damn sense.

Rather than wade into the ridiculous morass of writing systems every time we want to talk about a language, even if it's just to give one example before moving on to other data from other languages, linguists have developed basically the best tool ever: The International Phonetic Alphabet.

How is it different from other alphabets? First, it's based on all the different sounds that human mouths and vocal tracts can make, and second, it's got one (and only one) character for each sound.

The beauty of it is that it is independent of accents, and in fact, you can represent different accents clearly using the IPA. So every linguist's pet peeve is reading language learning books that say things like "i as in the i in kit." Well, that's great, except that different accents/dialects/varieties of English pronounce the word "kit" differently.

So how does it work? Let's take a brief tour of SOUNDS YOU MAKE WITH YOUR MOUTH AND STUFF:

Sounds You Make With Your Mouth (and Stuff)!

I'm going to simplify a lot of this discussion, so expect nitpicky comments from fellow linguists about what a consonant really is, but basically, you can divide the sounds we make into two main classes, if you don't think too hard about it:

Consonants
Vowels

Consonants, for our purposes, are basically any sound where the flow of air out from your lungs is obstructed in some way. Vowels are sounds where air flows freely.

Notice: I did not say "vowels are [insert list of letters (and sometimes other letters!)]." I said "vowels are sounds where air flows freely."

The cool thing is that each of these two classes can be completely described (for our purposes, again), using 3 parameters.

For Consonants:

place
manner
voicing

Let's take these one at a time.

Place is the location of the obstruction of airflow. This can be closure at the lips, the tongue at the teeth, at the alveolar ridge, at the hard palate, the back of the tongue at the velum, etcetera.

Manner is the way airflow is obstructed. If it's completely blocked off, it's a "stop." If it's just partially blocked and creates a turbulent airflow, it's a "fricative" (think "friction").

Voicing refers to whether your vocal cords are vibrating.

Armed with these three, we can start specifying sounds. For instance, the <t> in <stop> is an unvoiced coronal stop (it's not voiced, it's made with the "crown" of the tongue -- that is, the tip -- and airflow is completely stopped for a second. Technically, less than a second, but whatever).

Since we can characterize all the meaningful units of sound that a language uses in this way (#thatsThePoint), wouldn't it be nice if there were one and only one symbol for that sound? GOOD NEWS: THERE IS! And, since the IPA was made up by a bunch of Europeans, it's exactly what we'd expect: <t>. If it were voiced? <d>. If it was in the same place, unvoiced and a fricative? <s>. Voiced? <z>. Same manner and voicing (stop, unvoiced), but made at the lips? <p>.
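
If it helps to see the "one bundle of parameters, one symbol" idea as a lookup table, here's a minimal Python sketch. It only covers the handful of sounds mentioned above (plus <b>), and the table name, function names, and feature labels ("labial" for "made at the lips") are my own inventions for illustration, not any official encoding of the IPA chart.

```python
# Toy slice of the consonant chart, keyed by (voicing, place, manner).
# All names and labels here are assumptions made up for this sketch.
CONSONANTS = {
    ("unvoiced", "coronal", "stop"): "t",
    ("voiced", "coronal", "stop"): "d",
    ("unvoiced", "coronal", "fricative"): "s",
    ("voiced", "coronal", "fricative"): "z",
    ("unvoiced", "labial", "stop"): "p",
    ("voiced", "labial", "stop"): "b",
}


def symbol_for(voicing, place, manner):
    """Return the symbol for a feature bundle, or None if this toy table lacks it."""
    return CONSONANTS.get((voicing, place, manner))


def describe(symbol):
    """Go the other way: from a symbol back to its three parameters."""
    for features, sym in CONSONANTS.items():
        if sym == symbol:
            return features
    return None


print(symbol_for("unvoiced", "coronal", "stop"))  # t
print(describe("z"))  # ('voiced', 'coronal', 'fricative')
```

Because every symbol has exactly one feature bundle and vice versa, the lookup works in both directions, which is the whole point of the one-symbol-per-sound design.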

What's really amazing about this, and will be the subject of a different post, is that the way languages work seems to be by reference not to spelling and stuff, but to these sub-classifications of sound, which we call distinctive features because (1) they're features that (2) distinguish sounds from one another. In fact, languages tend to change based on natural classes of these features. For instance, some sound change might affect all stops, but not fricatives.
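
To make "natural classes" a bit more concrete, here's a rough Python sketch under some loud assumptions: the six-consonant inventory is a toy, the feature labels are mine, and the sound change (every unvoiced stop becomes voiced) is completely made up for illustration rather than taken from any real language.

```python
from typing import NamedTuple


class Consonant(NamedTuple):
    voicing: str
    place: str
    manner: str


# Toy inventory; the labels are assumptions for this sketch.
INVENTORY = {
    "p": Consonant("unvoiced", "labial", "stop"),
    "b": Consonant("voiced", "labial", "stop"),
    "t": Consonant("unvoiced", "coronal", "stop"),
    "d": Consonant("voiced", "coronal", "stop"),
    "s": Consonant("unvoiced", "coronal", "fricative"),
    "z": Consonant("voiced", "coronal", "fricative"),
}


def natural_class(**wanted):
    """All symbols whose features match every keyword given, e.g. manner='stop'."""
    return sorted(
        sym
        for sym, feats in INVENTORY.items()
        if all(getattr(feats, name) == value for name, value in wanted.items())
    )


def voice_stops(word):
    """Hypothetical sound change: unvoiced stops become voiced, everything else stays."""
    out = []
    for sym in word:
        feats = INVENTORY.get(sym)
        if feats and feats.manner == "stop" and feats.voicing == "unvoiced":
            target = feats._replace(voicing="voiced")
            sym = next(s for s, f in INVENTORY.items() if f == target)
        out.append(sym)
    return "".join(out)


print(natural_class(manner="stop"))  # ['b', 'd', 'p', 't']
print(voice_stops("stɑp"))           # sdɑb
```

Notice that the rule never mentions <p> or <t> by name. It only refers to features, so the fricative <s> sails through untouched while both stops change.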

The IPA chart for consonants can be found on Wikipedia, and you can click on each symbol and hear the audio. In fact, each sound has its own Wikipedia article.

Similarly, vowels can be classified along three parameters:

For Vowels:

tongue height
tongue backness
lip rounding

Height refers to how high the body of your tongue is in your mouth.

Backness refers to how far front or back the highest point of your tongue is (again, a simplification, but basically right).

Rounding refers to whether you round your lips or not.

Vowel chart, and schematic of tongue height for front vowels.

So for instance, linguists interested in dialects of American English may talk about whether a particular variety, say California English, is fronting /u/ to [ʉ] or even [y] and this is meaningful -- we're describing an accent, but doing so in a more precise way than if you were to say "it sounds like 'kyewl' when they say 'cool'!"
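
Here's a small Python sketch of that fronting story, treating each vowel as a (height, backness, rounding) bundle. The table only has four high vowels, and the names `VOWELS`, `BACKNESS`, and `front_one_step` are hypothetical, picked just for this example.

```python
# Toy table of high vowels keyed by (height, backness, rounding).
# Names and labels are assumptions for this sketch.
VOWELS = {
    ("high", "front", "unrounded"): "i",
    ("high", "front", "rounded"): "y",
    ("high", "central", "rounded"): "ʉ",
    ("high", "back", "rounded"): "u",
}

# Ordered from the front of the mouth to the back.
BACKNESS = ["front", "central", "back"]


def front_one_step(symbol):
    """Move a vowel one step forward, keeping height and rounding.

    Returns the symbol unchanged if it is already front, or if this
    toy table has no vowel at the target position.
    """
    for (height, backness, rounding), sym in VOWELS.items():
        if sym == symbol:
            idx = BACKNESS.index(backness)
            if idx == 0:
                return symbol
            return VOWELS.get((height, BACKNESS[idx - 1], rounding), symbol)
    return symbol


print(front_one_step("u"))  # ʉ
print(front_one_step("ʉ"))  # y
```

Only backness changes at each step. Height and rounding stay put, which is exactly what "fronting /u/" means.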

Wikipedia, again, has a super useful chart, where you can hear the sounds. It's the same as the above chart --- "front" vowels are on the left, "high" vowels are on the top, and they come in pairs with the unrounded form on the left and rounded on the right. Add a little tilde on the top of the vowel (literally it's just a little n above the vowel) and you have a nasalized vowel --- a vowel where your velum is dropped a bit and you allow airflow out through your nose. French has a ton of these, so what's written as <on> is pronounced /ɔ̃/. Without the little squiggly, it's /ɔ/, the vowel in a New York accent's "coffee" (/kɔfi/), as opposed to a Canadian or Californian's "coffee" (/kɑfi/).
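
If you're typing these on a computer, the little squiggly is usually a separate Unicode character (U+0303, COMBINING TILDE) stacked onto the vowel before it. A minimal Python sketch, where the helper names are my own and only the standard library's `unicodedata` module is assumed:

```python
import unicodedata

COMBINING_TILDE = "\u0303"  # Unicode name: COMBINING TILDE


def nasalize(vowel):
    """Stack a combining tilde on a vowel symbol."""
    return vowel + COMBINING_TILDE


def is_nasalized(segment):
    """True if the segment carries a tilde, including precomposed letters like ã."""
    return COMBINING_TILDE in unicodedata.normalize("NFD", segment)


def denasalize(segment):
    """Strip the tilde and return the plain vowel."""
    decomposed = unicodedata.normalize("NFD", segment)
    return unicodedata.normalize("NFC", decomposed.replace(COMBINING_TILDE, ""))


nasal = nasalize("ɔ")
print(nasal, len(nasal))  # ɔ̃ 2
print(denasalize(nasal))  # ɔ
```

The length of 2 is the thing to watch out for: what looks like one symbol on screen can be two code points, so naive character counting on IPA strings can surprise you.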

So how many vowels does English have? Well, many accents have around 20 once you count diphthongs (the exact number depends on the accent), not 5. And you can discuss them all using the IPA: bead = /bid/, bed = /bɛd/, bad = /bæd/, and so on. If I boo someone, I /bu/ them, but someone from California might /bʉ/ them, and we have a precise way of telling, and discussing, the difference.

Some helpful observations

The IPA was designed to be intuitive, and useful. So most of the symbols are exactly what you would expect. The vowels are a little more complicated, but think about what would make sense for Europeans: /i/ is the vowel in 'beat', and we have a special character for the sound in 'bid' (it's /bɪd/). Consonants are basically what you'd expect, except <y> is /j/ (like in German Ja!). That shitty combination in English of <ng> for a voiced velar nasal is just one symbol, an <n> with the tail of a <g>, called engma: /ŋ/.

Brackets and slashes: Linguists use slashes to indicate a more abstract level, and brackets to deal with the sounds that are actually made. So in English /t/ can have a bunch of different realizations in speech:

[t] in 'stop' [stɑp]
[tʰ] in 'tap' [tʰæp]
[ɾ] in American English 'butter' [bʌɾəɹ]
[ʔ] (a glottal stop, the sound in the middle of 'uh-oh!') in Cockney 'butter' [bʌʔə]

and so on. Most normal people are just not aware of these differences, at all (especially the first two, but put your hand directly in front of your mouth and see how different the airflow is!).
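
The slash/bracket split is really a one-to-many mapping: one abstract /t/, several concrete realizations. Here's a rough Python sketch using only the four examples listed above. The names `ALLOPHONES`, `EXAMPLES`, `phonemic`, `phonetic`, and `phoneme_of` are hypothetical, and a real analysis would also need the environments that trigger each realization, which this sketch skips.

```python
# One abstract phoneme, several surface realizations (from the list above).
# All names are assumptions for this sketch.
ALLOPHONES = {
    "t": ["t", "tʰ", "ɾ", "ʔ"],
}

# (word, narrow transcription, which realization of /t/ shows up in it)
EXAMPLES = [
    ("stop", "stɑp", "t"),
    ("tap", "tʰæp", "tʰ"),
    ("butter (American English)", "bʌɾəɹ", "ɾ"),
    ("butter (Cockney)", "bʌʔə", "ʔ"),
]


def phonemic(symbols):
    """Wrap in slashes: the abstract level."""
    return f"/{symbols}/"


def phonetic(symbols):
    """Wrap in brackets: what actually comes out of the mouth."""
    return f"[{symbols}]"


def phoneme_of(phone):
    """Map a surface sound back to the phoneme it realizes, if we know it."""
    for phoneme, phones in ALLOPHONES.items():
        if phone in phones:
            return phoneme
    return None


for word, transcription, phone in EXAMPLES:
    print(word, phonetic(transcription), phonetic(phone), "realizes", phonemic(phoneme_of(phone)))
```

Every row ends with "realizes /t/", even though the bracketed sounds are all different, and that's the distinction the two notations are there to capture.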

Notice, too, that you can clearly represent accents using this tool. 'butter' is [bʌɾəɹ] in General American, but [bʌʔə] in Cockney, and [bəˈtœʁ] in a bad French accent. With the slashes, we can just talk about things that happen to (abstract) segments (of sound) in a language without having to specify all the different little things that happen in specific environments, and with the brackets we can get as specific as we want to, so I might say 'specific' /spəsɪfɪk/ as [spəsiɪfɪˀ]. And once you know the IPA, you can tell from that HOW I SOUND WHEN I SAY IT.

YOU CAN FREAKING SPELL ACCENTS.

With the IPA, we don't have to resort to all kinds of weird ways of talking about things ("the vowel in a New York accent when they say 'coffee'" or "the ü sound in German, if you know German," or "That thing that some French people do when they say 'oui' in a nonstandard way.") We can just use the IPA and talk about place, manner, and voicing, or height, backness, and rounding (/ɔ/, /y/, and /ɕ/, for the previous long-winded and confusing examples).

While it takes a little (let's be real, very little) effort to learn the IPA, the payoff is immense for anyone who wants to learn another language, learn another accent, or understand any discussion of sounds that humans make in a clear and concise way. It's basically seeing the matrix.
