The Writer in Black

Tuesday, March 18, 2014

Creating synthetic languages--my way

I've done some work with synthetic languages for use in my fiction.

J. R. R. Tolkien popularized the practice with his varieties of Elvish, Dwarvish, Black Speech, and other languages in his Middle-earth stories.  Other writers hint at synthetic languages with select words and phrases, sometimes including a glossary showing what those words and phrases mean.  A few build something approximating a complete language, with grammar and vocabulary.

It's this latter that I'm going to talk about here, using examples from some of my own writing (forthcoming works).

Languages, at least spoken languages, are made up of sounds.  This may seem trivial, but the choice of what sounds are, and are not, used within a language determines to a great extent how the language sounds.  Consider two of the languages constructed by Tolkien.

First Quenya:
"Ai laurie lantar lassi surinen"

Now the Black Speech:
"Ash nazg gimbuktul"

The two languages have a very different sound to them.  Real languages are the same in that way.  German sounds quite different from Latin, which sounds quite different from Thai, which sounds quite different from Japanese.

So I start with the sounds and how they are put together.

In the appendixes for The Silmarillion, there's a table for Elvish script that ties in to how the sounds are produced.  Generally speaking, consonant sounds were categorized by where they were produced in the mouth and how open that portion of the mouth was.  Consonants were produced at the lips, the teeth, the hard palate, or the soft palate.  There were four levels of "openness," and each sound was either voiced or unvoiced (using the vocal cords or not).  Four positions, four levels of openness, and two voicings gave a total of 32 separate sounds that could be written in Elvish script.
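
As a rough sketch of that grid, the 32 slots can be enumerated in a few lines of Python.  The numeric openness labels are placeholders (the names of the four levels aren't given above), and the function name is made up for illustration.

```python
from itertools import product

# Places of articulation as listed in the text.
PLACES = ["lips", "teeth", "hard palate", "soft palate"]
# Placeholder labels: the four "openness" levels are not named in the text.
OPENNESS = [1, 2, 3, 4]
VOICING = ["unvoiced", "voiced"]

def consonant_grid():
    """Return every (place, openness, voicing) combination."""
    return list(product(PLACES, OPENNESS, VOICING))

grid = consonant_grid()
print(len(grid))  # 4 places * 4 openness levels * 2 voicings
for slot in grid[:4]:
    print(slot)
```

Picking the sounds for a language then amounts to choosing which of these slots to keep.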

There are other characterizations possible for the sounds the human throat and mouth can produce but I found this one useful and adapted it to my own ends.

Vowels were a separate matter.  Many years ago I took choir in high school.  Our instructor spent a great deal of time on vowels because vowels are what you sing.  They're what you use to produce notes.  He characterized vowels by whether they are produced in the front or back of the mouth and how open that portion is.  This process produced nine basic vowel sounds, plus some "blended" versions (mixing production in front and back of the mouth) which are normally written as "umlauted" vowels.  Note that many of what we think of as vowels would be, in this system, diphthongs.  The long-i sound would be "ah" followed rapidly by "ee" and so forth.

Most languages do not use all the sounds that can be produced.  English uses neither the umlauted vowels nor several of the consonant values, except in borrowed words. (But then, English borrows, or outright steals, so many words that that's not much of a limit.)

So, the first step is to select sounds from the list.  Which ones does your language use, and which ones does it not use?  This choice helps determine the overall sound of your language.

As important as which sounds are included is how frequently the specific sounds are used.  Look at the two Tolkien examples again.  They use largely the same sounds, and if you looked at larger passages of Quenya you'd see that all the sounds in the Black Speech passage are present.  What differs is the frequency.  In Quenya, "l", "r", "n" and so on feature heavily.  In Black Speech, "full stops" (g, t) and fricative/sibilant sounds (sh, z) are prominent.  This gives the two languages a very different sound.  Quenya has a very "flowing" sound.  Black Speech is harsher on the ear.

So pick what sounds are more common.  Rank them.  This will help determine what your language sounds like.
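
To make the frequency point concrete, here is a crude Python sketch that counts letters in the two Tolkien samples quoted earlier.  It counts letters rather than true sounds (so "sh" is treated as two letters), and two short phrases are far too little text for real statistics; the function names are made up for illustration.

```python
from collections import Counter

QUENYA = "Ai laurie lantar lassi surinen"
BLACK_SPEECH = "Ash nazg gimbuktul"

def letters_of(text):
    """Lower-case letters only; spaces and punctuation are dropped."""
    return [ch for ch in text.lower() if ch.isalpha()]

def letter_frequencies(text):
    """Relative frequency of each letter, most common first."""
    letters = letters_of(text)
    counts = Counter(letters)
    return {letter: n / len(letters) for letter, n in counts.most_common()}

def share(text, sounds):
    """Fraction of the letters in text that belong to the given set."""
    letters = letters_of(text)
    return sum(ch in sounds for ch in letters) / len(letters)

print(letter_frequencies(QUENYA))
print(letter_frequencies(BLACK_SPEECH))
print(share(QUENYA, "lrn"), share(BLACK_SPEECH, "lrn"))
```

Even in these tiny samples, "l", "r", and "n" account for 9 of the 26 letters in the Quenya line but only 2 of the 16 in the Black Speech line.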

From sounds you have words.  Words are made of syllables.  Syllables are collections of the sounds you've selected for that language.  Different languages build their syllables differently.  Japanese is very strict consonant-vowel (or simply vowel) and sometimes with an "n" at the end of the syllable.  Russian, on the other hand, tends to string consonant sounds together in ways that frustrate native English speakers and cause native Japanese speakers to figuratively throw up their hands in despair.

So how does your language build syllables?  As an example for "Old Aeriochi" I decided that syllables would use the following pattern:

Where "C" is "Consonant", "V" is "Vowel", and brackets indicate an optional item.  Using the sounds I'd selected, and all the possible combinations of consonants and vowels in that pattern, I had all the possible syllables in the language.

Problems arise, of course, when you try to combine consonants.  Not all combinations work well.  So I needed some rules for combining consonants in order to keep the words pronounceable.  For Old Aeriochi the rules were as follows:

  1. When a syllable starts with two consonants, the first consonant must be a full stop (p, b, k, or g in this language).  The second consonant must be an "approximant" (w, r, l, or y).
  2. When a trailing consonant in one syllable is followed by a leading consonant in a second, the second syllable's leading consonant moves to the mouth position of the first, and if it's a stop it opens to a fricative (p becomes f, and so on).

This gives me the tools to build words.  As an aid, I did some programming in Excel to create a list of all the possible syllables, assign a numerical value to each based on the frequency I'd assigned to the sounds it contains, and then sort them so I have a large table with the syllables using more frequent sounds at the top and the less frequent at the bottom.  When building common words, I can pick from the top of the list, and for less common words from farther down.  And so my vocabulary reflects the linguistic sound I determined in setting up the sound system.
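
The same idea can be sketched outside Excel.  The Python below is only an illustration, not the actual spreadsheet: the sound inventory and the frequency weights are invented, the syllable shape (optional one- or two-consonant onset, a vowel, optional final consonant) is an assumed stand-in for the real pattern, and scoring a syllable by the average weight of its sounds is just one plausible choice.  Only rule 1 (stop plus approximant clusters) is taken from the text.

```python
from itertools import product

# Hypothetical inventory; the numbers are made-up frequency weights
# (higher = more common), not the real Old Aeriochi values.
CONSONANTS = {"p": 3, "b": 2, "k": 5, "g": 2, "w": 1,
              "r": 6, "l": 7, "y": 1, "n": 6, "s": 4}
VOWELS = {"a": 8, "e": 6, "i": 5, "o": 3, "u": 2}

STOPS = "pbkg"         # rule 1: first consonant of a cluster
APPROXIMANTS = "wrly"  # rule 1: second consonant of a cluster

def onsets():
    """No onset, one consonant, or a stop + approximant cluster."""
    yield ""
    yield from CONSONANTS
    for first, second in product(STOPS, APPROXIMANTS):
        yield first + second

def codas():
    """No final consonant, or exactly one (an assumption)."""
    yield ""
    yield from CONSONANTS

def score(syllable):
    """Average frequency weight of the sounds in the syllable."""
    weights = {**CONSONANTS, **VOWELS}
    return sum(weights[sound] for sound in syllable) / len(syllable)

def ranked_syllables():
    """All allowed syllables, those built from common sounds first."""
    syllables = [o + v + c for o, v, c in product(onsets(), VOWELS, codas())]
    return sorted(syllables, key=lambda s: (-score(s), s))

table = ranked_syllables()
print(len(table))
print(table[:10])   # candidates for common words
print(table[-10:])  # candidates for rare words
```

Common words would be drawn from the head of `table` and rarer ones from farther down, mirroring the sorted spreadsheet.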

It takes more than just words to make a language.  It also takes grammar and syntax.  And this is where a lot of people who create synthetic languages for use in fiction fall down.  Here, having studied formal grammar is quite helpful.  When I was in the Air Force, before I was assigned to foreign language school, the Air Force put me through an intense six-week English grammar course.  The purpose of the course was not to ensure we used "good grammar".  That wasn't how the course was structured.  It was so that when we encountered "dative case" or "subjunctive mood" or "perfect tense" in the _foreign_ language class, we'd know what it meant.

For Old Aeriochi, I used a fairly common pattern:  inflected language with loose word order.  Tense and mood were determined by suffixes on the verb.  The role that a noun plays in a sentence (declension) was determined by suffixes on the noun.  This is a good system to use for synthetic languages because it's easy to define.  It's also easy to expand.  For instance, English has "imperative" for verbs.  In Old Aeriochi there are "strong imperative" and "weak imperative". (The difference is illustrated in that the weak imperative is used in a spell to help someone sleep despite pain from injuries, while the strong imperative is used to send an opponent into slumberland in the middle of a fight and make sure they stay that way despite all that's going on around them.)

For the Oruk language I used in another forthcoming piece, I used an abbreviated version of the above process.  I didn't need as much language. (Old Aeriochi was written for a series where I anticipate using it quite a bit; the Oruk language was written for a single short, although I may reuse it at a different time.)  I simply went "by ear" in selecting sounds and creating syllables and words.  For syntax I made it a word order language.  The word order is defined as follows:

[Interrogative particle] Verb [verb modifiers] [te direct object [object modifiers]] [subject [subject modifiers]]

That's a very simple pattern but then I don't need much for this story.  The interrogative particle is a short "word" that indicates that the sentence is a question.  "Te" indicates that the noun that follows is a direct object.  If I need more for future stories (no reason I can't use the same language elsewhere), I can expand on it.
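
A word order rule like this is easy to mechanize.  The Python sketch below assembles a sentence in the order given above; the function name, its parameters, the "Q" stand-in for the interrogative particle, and the capitalized English glosses are all placeholders, since no actual Oruk vocabulary other than "te" appears here.

```python
def build_sentence(verb, subject=None, direct_object=None,
                   verb_modifiers=(), object_modifiers=(),
                   subject_modifiers=(), question=False,
                   interrogative="Q"):
    """Assemble words in the order:
    [particle] verb [verb mods] [te object [object mods]] [subject [subject mods]]
    """
    words = []
    if question:
        words.append(interrogative)  # placeholder particle, not real Oruk
    words.append(verb)
    words.extend(verb_modifiers)
    if direct_object is not None:
        words.append("te")           # marks the following noun as direct object
        words.append(direct_object)
        words.extend(object_modifiers)
    if subject is not None:
        words.append(subject)
        words.extend(subject_modifiers)
    return " ".join(words)

# "Does the warrior see the grey wolf?" using English glosses as stand-ins.
print(build_sentence("SEE", subject="WARRIOR", direct_object="WOLF",
                     object_modifiers=["GREY"], question=True))
# prints "Q SEE te WOLF GREY WARRIOR"
```

Because every optional slot is handled separately, extending the pattern later (an indirect object, say) only means adding another block in the right position.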

And that's the basics.  It's a pretty involved process, but even simple languages are pretty complex.  To make the language more real you should have multiple types of verbs and nouns that follow related but slightly different rules, and a few irregulars that follow rules all their own.

So, do you need a language for your stories?  Now you have at least a basic primer on one way to create one.


  1. It also helps to keep Zipf's Law and its corollaries in mind. "Wandor's Ride," by Roland Green, was a violator in one of its early scenes.

    The protagonist, Bertan Wandor, had given indications that he may unknowingly be related to the royal class of an underdog race, the Sthi. An old crone gives him drugs to bring out hidden knowledge, then asks the question "Do you understand me?" in "the kingly tongue of the Sthi." However, the first word in the sentence is something like, "vryulokom," and the remainder of the sentence consists of words equally long and unwieldy.

    The two problems with the sentence are that common concepts in a language tend to get shorter words, and the people's name for themselves seems to come from a different language. I suppose I can let him get away with it (magnanimous of me, isn't it?) because I don't remember the language being used again in the series, which unfortunately ended with what looked like the next-to-last novel.

    Speaking of names, their etymology is also important when developing a society, and should normally bear some relationship to the spoken language.

    If you haven't read it, "Loglan: A Logical Language" by James Cooke Brown contains his ideas on various language-related subjects, which includes the various consonant-vowel combination schemes that he considered and how to deal with building abbreviated compound words.

    I haven't seen your word order used. I know Esperanto uses suffixes to indicate parts of speech, so word order isn't as important, and Japanese and Hungarian use Subject Object Verb order, but I'm unaware of any that use Verb Object Subject.

  2. The reference to Zipf's Law is well taken. I hadn't formulated it, but it was what I tended to do instinctively. After all, if the word for a common concept is long, people will start to abbreviate it, then the abbreviation will become the word in its own right. We have the progression, as self-propelled motor vehicles became more common, from automobile carriage to automobile to car (or "auto" sometimes, although that may have been a regional thing where I was growing up, or it may have just been superseded by "car". Don't hear it these days.) Same thing with "semi-tractor-trailer" to just "semi".

    I specifically gave my Oruk language that word order precisely because it was different from any natural language I have encountered. In Russian, word order is flexible. You could, I suppose, start a sentence with a verb but I've never heard anyone actually do that in Russian.

    In Japanese, the sentence ends with a verb, possibly followed by any of several particles. Japanese is complicated. You've got grammatical subject (indicated by the particle "ga") and "topic" (indicated by "ha", pronounced "wa"). And, of course, Japanese is very big on simply leaving out words that would be understood from context, much more so than English. And then you've got the convention of using a verb, with its subject and object, as a modifier for a noun. There's a reason why two years of study usually gives one a good conversational grasp of most Romance languages while two years of Japanese is barely a beginning.

    And, of course, there's Klingon, which is object, verb, subject. (Note, I don't know Klingon but it's one of several synthetic languages I researched when I started looking at doing my own.)

  3. Yep, shortening certainly happens. I once read that the two major forces on language development are the cradle and the market.

    I remember "auto" as short for automobile. My favorite example of shortening is "taximeter-cabriolet," which went through "taxicab" to both "taxi" and "cab."

    Another thing to think about is the Sapir-Whorf Hypothesis, which was first formulated in the 1920s, IIRC. Probably not useful for constructed languages in novels, unless you're directly using it as a major plot device, as in Vance's "The Languages of Pao." The strong version of the hypothesis can be expressed succinctly as "language limits thought." Of course, it can't limit it totally, or there'd be little development of new concepts to express. It was ignored for a number of years because Noam Chomsky had a competing theory of "deep structure in the brain," and he and his supporters in the field had sufficient influence to suppress Sapir-Whorf somewhat. I understand that that's why Loglan was a personal effort, rather than funded research.

    Recently, though, they've found some intriguing evidence in support of the hypothesis. In particular, there's a tribe of Australian aborigines that order time differently. When someone asks me to position several things in chronological order, I tend to go from left to right, possibly because that's how I write language. This is common for people who speak European languages.

    The people of this tribe, however, place things from East to West for chronological order - possibly because that's the path the sun takes - and appear to have a built-in compass because of it.

    I took German in high school (a waste), and four years of Japanese starting when I was 40. The "official" reason I gave was "to prevent brain ossification." I chose it because I figured it would stretch my mind more than the other easily-accessible lessons (French - Alliance Francaise is fairly prominent locally - and Spanish, which would be rather more useful given the number of illegals and others who speak it hereabouts).

    I have one or two editions of the Klingon Language Dictionary paperback, but I've never studied it. I've got bits and pieces of reference material for a number of other languages (Swedish, Hebrew, Latin, Esperanto, Vietnamese, and Hawaiian among them), but don't even look at them much. I am trying to keep up enough Japanese to get the gist of anime and manga, but there are so many other shiny things around to attract my attention.