Cryptography: Solution to Cipher 1

Recall that our original text is:

Cryptogram 1

In solving such a message, there are several things we start by examining. First and foremost is the frequency of each letter in the message. In total, there are 292 letters in the message. Their frequency is as follows:

Φ - 43, 14.7%
Π - 33, 11.3%
Ψ - 33, 11.3%
Η - 27, 9.2%
Θ - 25, 8.6%
Ο - 25, 8.6%

Μ - 16, 5.5%
Κ - 15, 5.1%
Τ - 12, 4.1%
Γ - 11, 3.8%
Χ - 8, 2.7%
Δ - 7, 2.4%

Σ - 7, 2.4%
Υ - 7, 2.4%
Α - 6, 2.1%
Ι - 5, 1.7%
Λ - 4, 1.4%
Ω - 4, 1.4%

Ε - 2, 0.7%
Ξ - 1, 0.3%
Ρ - 1, 0.3%
Β - 0, 0.0%
Ζ - 0, 0.0%
Ν - 0, 0.0%
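If you would rather not count by hand, a tally like the one above is easy to script. What follows is only a minimal sketch in Python, not the method used to produce the table. The function name letter_frequencies and the short sample string are made up for illustration, and the sample is not the real cryptogram.

```python
from collections import Counter

GREEK_ALPHABET = "ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩ"

def letter_frequencies(ciphertext):
    """Return (letter, count, percent) rows, most frequent first.

    Letters of the alphabet that never occur are listed with a count of 0.
    Anything that is not a capital Greek letter (spaces, line breaks) is ignored.
    """
    letters = [ch for ch in ciphertext if ch in GREEK_ALPHABET]
    counts = Counter(letters)
    total = len(letters)
    rows = []
    for letter in GREEK_ALPHABET:
        n = counts[letter]
        pct = round(100 * n / total, 1) if total else 0.0
        rows.append((letter, n, pct))
    # sort() is stable, so letters with equal counts stay in alphabetical order
    rows.sort(key=lambda row: -row[1])
    return rows

# Hypothetical sample, NOT the real message.
for letter, n, pct in letter_frequencies("ΠΜ ΘΓ ΠΜ"):
    if n:
        print(f"{letter} - {n}, {pct}%")
```

Run against the full ciphertext, this should produce rows in the same "letter - count, percent" layout used above.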

Another point to note is which letters end words. Although 21 of the 24 letters appear in our cipher, only nine occur as the final letter of a word. The list below shows these letters, with their frequency:

Γ - 6
Δ - 3
Η - 6

Ι - 1
Μ - 11
Π - 6

Τ - 3
Φ - 9
Ψ - 12
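The same sort of tally works for final letters. This is a rough sketch, assuming the ciphertext is a plain string with words separated by spaces; terminal_letter_counts is a name invented here, and the sample string is hypothetical. The second half simply ranks the counts already reported in the list above.

```python
from collections import Counter

def terminal_letter_counts(ciphertext):
    """Count how often each letter is the last letter of a word."""
    return Counter(word[-1] for word in ciphertext.split())

# Hypothetical snippet, NOT the real message.
sample = "ΠΜ ΘΓ ΙΠ ΠΜ"
print(terminal_letter_counts(sample).most_common())

# The terminal-letter counts reported above, ranked by frequency.
reported = Counter({"Γ": 6, "Δ": 3, "Η": 6, "Ι": 1, "Μ": 11,
                    "Π": 6, "Τ": 3, "Φ": 9, "Ψ": 12})
print(reported.most_common(2))  # [('Ψ', 12), ('Μ', 11)]
```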

Finally, we look at short words. There is only one single-letter word in our message: Δ.

There are seven two-letter words:

ΗΦ
ΘΓ (x2)
ΘΔ
ΙΠ (x2)
ΛΔ
ΟΤ (x2)
ΠΜ (x3)
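Pulling out the short words can be scripted in much the same way. The sketch below assumes space-separated words; the helper name short_words and the sample string are illustrative inventions, not part of the original message.

```python
from collections import Counter

def short_words(ciphertext, length):
    """Return a Counter of the words that have exactly `length` letters."""
    return Counter(w for w in ciphertext.split() if len(w) == length)

# Hypothetical snippet, NOT the real message.
sample = "Δ ΠΜ ΘΓ ΠΜ ΙΠ ΗΙΠΚΩΟΦ"
print(short_words(sample, 1))  # single-letter words
print(short_words(sample, 2))  # two-letter words, with repeat counts
```

Calling it with length 3 gives the three-letter words, which turn out to matter later in the solution.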

Armed with this data, we get to work. We start with the frequency table for Biblical Greek. Based on the UBS text of Matthew, the frequency of the various letters is:


Α -- 11.0%
Ε -- 10.1%
Ο -- 10.1%
Ι -- 9.5%
Ν -- 8.2%
Σ -- 7.6%

Τ -- 7.4%
Υ -- 6.0%
Η -- 3.9%
Ρ -- 3.3%
Κ -- 3.3%
Ω -- 3.2%

Π -- 3.1%
Λ -- 2.8%
Μ -- 2.6%
Δ -- 2.0%
Θ -- 1.7%
Γ -- 1.6%

Β -- 0.6%
Χ -- 0.6%
Φ -- 0.6%
Ξ -- 0.3%
Ζ -- 0.2%
Ψ -- 0.1%

There are several ways to start our attack. One is to work directly with the above table of letter frequencies. This is usually the best approach in English or German, where the letter "E" predominates so much that it will be the most common letter in almost any monoalphabetic cipher message where the sample exceeds 100 letters. But it will be obvious that this is not true in Greek. Α Ε Ο Ι are all almost tied, with Ν Σ Τ close enough that one of them might be more common in a short message than one of the big four.

An easier line of attack, in this case, is the last letters. Two letters -- Ν Σ -- are overwhelmingly the most common terminal letters for Greek words. And we note that there are two letters which are overwhelmingly the most common in our terminal letters list: Μ Ψ. Thus it is highly likely that one of these represents Ν and the other Σ.

(This approach, incidentally, has had great use in Biblical linguistics, in the deciphering of Ugaritic. Hans Bauer's attack on the Ugaritic alphabet started with the assumption that it was a Semitic language, and that this regulated which letters were to be found at the beginnings and ends of words. That let him make up a short list of possible meanings for several letters, which could be tested -- whereupon many more fell into his lap. Several rounds of this broke the Ugaritic alphabet, and then it was a matter of figuring out the language -- no easy task, to be sure, but a lot easier than it was when the alphabet was unknown!)

Having a guess at two letters, we look at our two-letter words. The most common is ΠΜ. Among the most common words in Biblical Greek is ΕΝ. And Ε is a common letter in Greek, and Π is the second-most-common letter in our sample. So it's a pretty good bet that ΠΜ is ΕΝ. Which, incidentally, gives us another likely word: ΔΕ is another common word, and we see that ΙΠ, which would then end in Ε, occurs twice in our two-letter-word list.

But if Μ stands for Ν, then the other letter common at the end of words, Ψ, must be Σ. So let's assume
Π → Ε
Μ → Ν
Ι → Δ
Ψ → Σ

That makes our message appear as this (note: we will show "solved" letters in UPPER CASE, unsolved in lower):

ηΔΕκωοφ λδ χηφΔφη υφΝΕΣεΕ θηφΣ ωσΕΣφΝ ηκκη θδ αηαφη ΝδχφηξΕθΕ θηφΣ ΔΕ ωσΕΣφΝ θΕκΕφοφ υφΝΕΣεΕ ΕΝ θγ Νολγ υΕυσηχθηφ οθφ ΕΝ ΕθΕσουκγΣΣοφΣ αηφ ΕΝ ρΕφκΕΣφΝ ΕθΕσγΝ κηκδΣγ θγ κηγ θοτθγ αηφ οτΔ οτθγΣ ΕφΣηαοτΣοΝθηφ λοτ κΕυΕφ ατσφοΣ γΣθΕ ηφ υκγΣΣηφ ΕφΣ ΣδλΕφοΝ ΕφΣφΝ οτ θοφΣ χφΣθΕτοτΣφΝ ηκκη θοφΣ ηχφΣθοφΣ δ ΔΕ χσοωδθΕφη οτ θοφΣ ηχφΣθοφΣ ηκκη θοφΣ χφΣθΕτοτΣφΝ
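The display convention used above (solved letters in upper case, unsolved in lower) can be sketched in a few lines. This is an illustration only: apply_partial_key is a made-up name, and the two ciphertext words are reconstructed here by reversing the four substitutions in the line above, so treat them as an assumption rather than a quotation of the original cryptogram.

```python
def apply_partial_key(ciphertext, key):
    """Replace solved cipher letters with their plaintext (upper case)
    and show every still-unsolved cipher letter in lower case."""
    out = []
    for ch in ciphertext:
        if ch in key:
            out.append(key[ch])      # solved: plaintext letter, upper case
        else:
            out.append(ch.lower())   # unsolved: cipher letter, lower case
    return "".join(out)

# The four assumptions made so far (cipher letter -> plaintext letter).
key = {"Π": "Ε", "Μ": "Ν", "Ι": "Δ", "Ψ": "Σ"}

# First two ciphertext words, reconstructed (an assumption, see above).
print(apply_partial_key("ΗΙΠΚΩΟΦ ΛΔ", key))  # ηΔΕκωοφ λδ
```

Each new guess is then just another entry in key, and re-running the function redraws the whole message.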

At this point, things start to get a little more tricky. But note that interesting word ωσΕΣφΝ which occurs twice in the first line or two. That looks very much like a verb. Can we, then, do something with that letter φ? We know it has to be a vowel. It isn't Ε, because we've assigned that. It could be Η or Ω. But observe that φ is the most common letter in our frequency list. Η and Ω aren't common enough. Υ is unlikely in such a situation. Our choice is between Α, Ι, and Ο.
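The elimination in the paragraph above can be written out as a small filter. This is a sketch under stated assumptions: the vowel percentages are copied from the Matthew table earlier in this article, the function name vowel_candidates is invented, and the 9% cutoff is an arbitrary threshold chosen for illustration. Note that the cutoff happens to drop Υ on frequency alone, whereas the text sets it aside on other grounds.

```python
# Reference frequencies (percent) of the Greek vowels, from the table above.
VOWEL_FREQ = {"Α": 11.0, "Ε": 10.1, "Ο": 10.1, "Ι": 9.5,
              "Υ": 6.0, "Η": 3.9, "Ω": 3.2}

def vowel_candidates(assigned, min_percent):
    """Vowels not yet assigned whose reference frequency is at least min_percent."""
    return [v for v, pct in VOWEL_FREQ.items()
            if v not in assigned and pct >= min_percent]

# Plaintext letters already assigned: Ε, Ν, Δ, Σ. The cutoff is an assumption.
print(vowel_candidates(assigned={"Ε", "Ν", "Δ", "Σ"}, min_percent=9.0))
# ['Α', 'Ο', 'Ι']
```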

But note that nine words end with φ, and that two of them are αηφ -- a rare letter followed by two common letters. Note also that there are only seven three-letter words in the whole message, and that two of them are αηφ. The obvious conclusion? αηφ represents ΚΑΙ -- the most amazing thing being that it shows up only twice in the message!

That gives us three more letters, and the following version of the message:

ΑΔΕκωοΙ λδ χΑΙΔΙΑ υΙΝΕΣεΕ θΑΙΣ ωσΕΣΙΝ ΑκκΑ θδ ΚΑΚΙΑ ΝδχΙΑξΕθΕ θΑΙΣ ΔΕ ωσΕΣΙΝ θΕκΕΙοΙ υΙΝΕΣεΕ ΕΝ θγ Νολγ υΕυσΑχθΑΙ οθΙ ΕΝ ΕθΕσουκγΣΣοΙΣ ΚΑΙ ΕΝ ρΕΙκΕΣΙΝ ΕθΕσγΝ κΑκδΣγ θγ κΑγ θοτθγ ΚΑΙ οτΔ οτθγΣ ΕΙΣΑΚοτΣοΝθΑΙ λοτ κΕυΕΙ ΚτσΙοΣ γΣθΕ ΑΙ υκγΣΣΑΙ ΕΙΣ ΣδλΕΙοΝ ΕΙΣΙΝ οτ θοΙΣ χΙΣθΕτοτΣΙΝ ΑκκΑ θοΙΣ ΑχΙΣθοΙΣ δ ΔΕ χσοωδθΕΙΑ οτ θοΙΣ ΑχΙΣθοΙΣ ΑκκΑ θοΙΣ χΙΣθΕτοτΣΙΝ

We don't have much in the way of complete words yet, but the form of what we're seeing looks good. This looks like Greek. That's promising.

From here we have several ways we could proceed. For example, look at that word ΑκκΑ, which occurs three times. There aren't many words which fit this pattern. You could argue that it's ΑΒΒΑ or ΑΝΝΑ (though Ν is already spoken for, which counts against the latter) -- but what are the odds of either of those words appearing three times in a short message that's worth encrypting? A much better bet is ΑΛΛΑ.
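Matching a half-solved word against candidate words is mechanical enough to sketch in code. Everything here is illustrative: the function name fits and the four-word candidate list are assumptions (a real attempt would use a proper Greek word list). The check relies on the fact that in a monoalphabetic cipher an unsolved cipher letter cannot stand for a plaintext letter that has already been assigned.

```python
def fits(pattern, candidate, used_plaintext):
    """True if `candidate` could be the plaintext behind `pattern`.

    Upper-case letters in `pattern` are solved and must match exactly.
    Lower-case letters are unsolved cipher letters: each must map to one
    plaintext letter, consistently, and not to a letter already assigned.
    """
    if len(pattern) != len(candidate):
        return False
    mapping = {}
    for p, c in zip(pattern, candidate):
        if p.isupper():
            if p != c:
                return False
        else:
            if c in used_plaintext:
                return False
            if mapping.setdefault(p, c) != c:
                return False
    # Two different cipher letters may not share one plaintext letter.
    return len(set(mapping.values())) == len(mapping)

used = set("ΕΝΔΣΚΑΙ")                        # plaintext letters assigned so far
candidates = ["ΑΒΒΑ", "ΑΝΝΑ", "ΑΛΛΑ", "ΑΛΛΟ"]  # hypothetical word list
print([w for w in candidates if fits("ΑκκΑ", w, used)])
# ['ΑΒΒΑ', 'ΑΛΛΑ']
```

The filter only narrows the field; choosing between the survivors still takes the kind of judgment about likely content described above.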

And then look at all those words like θοΙΣ and θΑΙΣ. This is a strong indication that θ is Τ -- and hence that ο in fact represents itself. (Note that having an occasional letter represent itself does not represent a weakness in the cipher; in fact, not allowing a letter to represent itself is a weakness, because it reduces the number of possible ciphers.) If we make those changes, we have:

ΑΔΕΛωΟΙ λδ χΑΙΔΙΑ υΙΝΕΣεΕ ΤΑΙΣ ωσΕΣΙΝ ΑΛΛΑ Τδ ΚΑΚΙΑ ΝδχΙΑξΕΤΕ ΤΑΙΣ ΔΕ ωσΕΣΙΝ ΤΕΛΕΙΟΙ υΙΝΕΣεΕ ΕΝ Τγ ΝΟλγ υΕυσΑχΤΑΙ ΟΤΙ ΕΝ ΕΤΕσΟυΛγΣΣΟΙΣ ΚΑΙ ΕΝ ρΕΙΛΕΣΙΝ ΕΤΕσγΝ ΛΑΛδΣγ Τγ ΛΑγ ΤΟτΤγ ΚΑΙ ΟτΔ ΟτΤγΣ ΕΙΣΑΚΟτΣΟΝΤΑΙ λΟτ ΛΕυΕΙ ΚτσΙΟΣ γΣΤΕ ΑΙ υΛγΣΣΑΙ ΕΙΣ ΣδλΕΙΟΝ ΕΙΣΙΝ Οτ ΤΟΙΣ χΙΣΤΕτΟτΣΙΝ ΑΛΛΑ ΤΟΙΣ ΑχΙΣΤΟΙΣ δ ΔΕ χσΟωδΤΕΙΑ Οτ ΤΟΙΣ ΑχΙΣΤΟΙΣ ΑΛΛΑ ΤΟΙΣ χΙΣΤΕτΟτΣΙΝ
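The parenthetical remark above, that forbidding letters from representing themselves shrinks the number of possible ciphers, can be checked by counting. The sketch below assumes a 24-letter alphabet and counts all substitution keys (24 factorial) against those in which no letter maps to itself (the "derangements", computed with the standard recurrence). The function name is my own.

```python
import math

def derangements(n):
    """Number of permutations of n items in which no item stays in place.

    Uses the recurrence D(k) = (k - 1) * (D(k - 1) + D(k - 2)),
    starting from D(0) = 1 and D(1) = 0.
    """
    if n == 0:
        return 1
    prev, curr = 1, 0  # D(0), D(1)
    for k in range(2, n + 1):
        prev, curr = curr, (k - 1) * (prev + curr)
    return curr

all_keys = math.factorial(24)     # every letter may map to any letter
restricted = derangements(24)     # no letter may map to itself
print(f"{restricted / all_keys:.4f}")  # 0.3679
```

In other words, a rule against self-mapping throws away roughly 63% of the keys, leaving a fraction very close to 1/e.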

We're really almost there. We have quite a few unidentified letters -- but almost three-quarters of the message text is cracked, and we can easily figure out most of the remaining letters from context. For example, the first word is obviously ΑΔΕΛΦΟΙ, so ω represents Φ. Also, it's quite clear that the combination Τδ is ΤΗ. That, by elimination, means that Τγ is ΤΩ. Consider, too, the phrase ΟτΔ Οτ. Clearly τ stands for Υ.

At this point we have:

ΑΔΕΛΦΟΙ λΗ χΑΙΔΙΑ υΙΝΕΣεΕ ΤΑΙΣ ΦσΕΣΙΝ ΑΛΛΑ ΤΗ ΚΑΚΙΑ ΝΗχΙΑξΕΤΕ ΤΑΙΣ ΔΕ ΦσΕΣΙΝ ΤΕΛΕΙΟΙ υΙΝΕΣεΕ ΕΝ ΤΩ ΝΟλΩ υΕυσΑχΤΑΙ ΟΤΙ ΕΝ ΕΤΕσΟυΛΩΣΣΟΙΣ ΚΑΙ ΕΝ ρΕΙΛΕΣΙΝ ΕΤΕσΩΝ ΛΑΛΗΣΩ ΤΩ ΛΑΩ ΤΟΥΤΩ ΚΑΙ ΟΥΔ ΟΥΤΩΣ ΕΙΣΑΚΟΥΣΟΝΤΑΙ λΟΥ ΛΕυΕΙ ΚΥσΙΟΣ ΩΣΤΕ ΑΙ υΛΩΣΣΑΙ ΕΙΣ ΣΗλΕΙΟΝ ΕΙΣΙΝ ΟΥ ΤΟΙΣ χΙΣΤΕΥΟΥΣΙΝ ΑΛΛΑ ΤΟΙΣ ΑχΙΣΤΟΙΣ Η ΔΕ χσΟΦΗΤΕΙΑ ΟΥ ΤΟΙΣ ΑχΙΣΤΟΙΣ ΑΛΛΑ ΤΟΙΣ χΙΣΤΕΥΟΥΣΙΝ

From the second word it would appear that λ is Μ, and checking the remaining words seems to confirm this. Again, it seems clear that σ is Ρ and χ is Π. Making those changes, we get:

ΑΔΕΛΦΟΙ ΜΗ ΠΑΙΔΙΑ υΙΝΕΣεΕ ΤΑΙΣ ΦΡΕΣΙΝ ΑΛΛΑ ΤΗ ΚΑΚΙΑ ΝΗΠΙΑξΕΤΕ ΤΑΙΣ ΔΕ ΦΡΕΣΙΝ ΤΕΛΕΙΟΙ υΙΝΕΣεΕ ΕΝ ΤΩ ΝΟΜΩ υΕυΡΑΠΤΑΙ ΟΤΙ ΕΝ ΕΤΕΡΟυΛΩΣΣΟΙΣ ΚΑΙ ΕΝ ρΕΙΛΕΣΙΝ ΕΤΕΡΩΝ ΛΑΛΗΣΩ ΤΩ ΛΑΩ ΤΟΥΤΩ ΚΑΙ ΟΥΔ ΟΥΤΩΣ ΕΙΣΑΚΟΥΣΟΝΤΑΙ ΜΟΥ ΛΕυΕΙ ΚΥΡΙΟΣ ΩΣΤΕ ΑΙ υΛΩΣΣΑΙ ΕΙΣ ΣΗΜΕΙΟΝ ΕΙΣΙΝ ΟΥ ΤΟΙΣ ΠΙΣΤΕΥΟΥΣΙΝ ΑΛΛΑ ΤΟΙΣ ΑΠΙΣΤΟΙΣ Η ΔΕ ΠΡΟΦΗΤΕΙΑ ΟΥ ΤΟΙΣ ΑΠΙΣΤΟΙΣ ΑΛΛΑ ΤΟΙΣ ΠΙΣΤΕΥΟΥΣΙΝ

At this point I'm not even going to bother any more. You should be able to figure out the rest for yourself. The solution is from 1 Corinthians 14:20-22:

Solution to Cryptogram 1

Thus the complete key is (note: I've included the three letters not found in the above sample of text, since I didn't know I wouldn't be using them):

Cryptogram 1 Key

As a footnote: It always seems, when I read one of these examples, that the person solving the cryptogram cheats, knowing the solution in advance. This can happen; had I not known the answer, for instance, I might have tried assuming that φ, the most common letter in the ciphertext, represented α, the most common letter in most passages. But this sample was short enough that this was not the desirable way to go about things. Of course, you could have tried it and run into roadblocks and started over; that often happens. But this illustrates an important principle. It's best to attack, as we did here, from all angles: counting letters, counting last letters of words, counting short words. That led me to a shorter solution without errors.

That's if you know where the words end. Since classical works were generally written without word divisions, they are usually encrypted the same way. If you want another challenge, you may try this:

Cryptogram 2

ΝΖΓΟΨΓΘΕΥΖΦΟΞΟΓΤΝΘΡΤΟΣΘΕΑΝΡΧΝΦΖΞΧΔΨΡΝΤΧΝΖΧΝΡΤΟΦΔ
ΔΧΘΩΟΨΘΟΖΧΝΡΞΑΨΝΤΡΦΝΡΝΧΡΩΥΨΥΞΡΞΟΩΝΦΘΨΝΖΞΧΥΡΞΨΟΔΕ
ΓΘΓΝΤΘΤΟΒΨΡΞΝΖΧΝΩΕΔΨΥΦΟΧΥΤΘΓΤΥΤΘΡΞΘΕΓΔΜΟΡΝΖΧΥΞΩΟ
ΞΡΞΨΟΔΕΞΥΓΔΞΘΧΟΡ

(Please note: the word/line breaks here are purely arbitrary, to make this fit on your screen. They should be ignored for frequency analysis.)
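Since the line breaks carry no information, any counting should be done on the joined-up stream. Here is a minimal sketch of that, with the ciphertext copied from above. The function name stream_counts is invented, and counting adjacent pairs as well as single letters is my own suggestion for a text with no word divisions, not something the walkthrough above relies on.

```python
from collections import Counter

CRYPTOGRAM_2 = """
ΝΖΓΟΨΓΘΕΥΖΦΟΞΟΓΤΝΘΡΤΟΣΘΕΑΝΡΧΝΦΖΞΧΔΨΡΝΤΧΝΖΧΝΡΤΟΦΔ
ΔΧΘΩΟΨΘΟΖΧΝΡΞΑΨΝΤΡΦΝΡΝΧΡΩΥΨΥΞΡΞΟΩΝΦΘΨΝΖΞΧΥΡΞΨΟΔΕ
ΓΘΓΝΤΘΤΟΒΨΡΞΝΖΧΝΩΕΔΨΥΦΟΧΥΤΘΓΤΥΤΘΡΞΘΕΓΔΜΟΡΝΖΧΥΞΩΟ
ΞΡΞΨΟΔΕΞΥΓΔΞΘΧΟΡ
"""

def stream_counts(text, n=1):
    """Count n-letter groups after discarding all spaces and line breaks."""
    stream = "".join(text.split())
    return Counter(stream[i:i + n] for i in range(len(stream) - n + 1))

singles = stream_counts(CRYPTOGRAM_2)        # single-letter frequencies
pairs = stream_counts(CRYPTOGRAM_2, n=2)     # adjacent-pair frequencies
print(singles.most_common(5))
print(pairs.most_common(5))
```

Because the text is joined before counting, pairs that straddle a line break are counted like any other.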

I won't walk you through the solution to this one. But if you want to see it, it's here.

Finally, here is one that offers the full range of difficulty known to the ancients. This is a monoalphabetic substitution, but it is not a simple cipher. Rather, it is a nomenclator (from Latin nomen calator, "name caller"), which is a cipher with code elements. A proper nomenclator will eliminate certain common words by replacing them with symbols (e.g. in English, the word "the" might be replaced by % or some other token), will probably include two forms of common letters (e.g. in English, "E" might be replaced by either G or !), will eliminate letters often found together (e.g. English "SH" might become $), may include nulls (that is, characters simply to be eliminated; e.g. # might simply stand for "ignore me"), and may include modifiers (e.g. 2 might stand for "repeat previous letter" or "repeat following letter").
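To make those elements concrete, here is a toy decoder built only from the English examples in the paragraph above. It is a sketch, not the scheme used in the cryptogram that follows: the symbol assignments are the ones just mentioned, "2" is taken to mean "repeat previous letter", and every other character is assumed to stand for itself, which a real nomenclator would of course never allow.

```python
# Toy nomenclator tables, taken from the English examples in the text.
CODE_ELEMENTS = {"%": "THE", "$": "SH"}   # symbols for a word / a letter pair
HOMOPHONES = {"G": "E", "!": "E"}         # two symbols for one common letter
NULL = "#"                                # ignore me
REPEAT_PREVIOUS = "2"                     # repeat the previous plaintext letter

def decode(ciphertext):
    """Decode the toy nomenclator; unlisted characters stand for themselves
    (an assumption made purely to keep the example short)."""
    out = []
    for ch in ciphertext:
        if ch == NULL:
            continue
        if ch == REPEAT_PREVIOUS:
            if out:
                out.append(out[-1])
            continue
        out.extend(CODE_ELEMENTS.get(ch) or HOMOPHONES.get(ch, ch))
    return "".join(out)

print(decode("%#$!2P"))  # THESHEEP
print(decode("%$G2P"))   # THESHEEP
```

The two ciphertexts decode to the same plaintext, which is the point: nulls and alternate symbols let the same message be written in different ways, and that flattens the frequency counts a solver depends on.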

Cryptogram 3

For this, because it's much harder, we'll give you several samples. Some are Biblical, some are not; two are from classical authors, although the Greek will be understandable to those who know only koine. You'll notice something of a theme to these quotations. If you can solve them, there is at least some chance it applies to you.

All quotes are reasonably grammatically complete.

Σ=$ΑΥΒΛΩΑΙ*ΔΛΜΥΑΨΜΥΩΞΑΖΥ~ΨΑΥ%ΜΗ+ΥΠΝΞΑΖ**ΜΗΩΠΓΖΛ~ΝΩΗ ΩΔΨΞΑΖΥΨ~ΑΥ%Μ\ΓΔΜΛΨΝΞ*ΑΖΜΛ~ΚΛΜΛ~

ΒΛΤΑΥΜΥ~%Ε%ΧΑΥ+ΠΟΑ*ΙΔΩ\ΚΑΝΓΖΛΩΑΥΩ

ΧΑΟΔΒΠ*ΥΒΖΥ~Ε%Σ=ΟΟ**ΛΩΗ+ΛΨΚΑ=ΠΚΔΩΑΥΩ%Υ

^ΚΜΗΨΗΚΠΚΘ=ΔΩ%ΩΒΖΔΩΦ=ΛΩΑΥ*ΒΗ^ΒΑΦΗΟΔ~ΗΨΜ%Ψ\Β\ΨΠΝΜΔΩ

ΞΖΛΜΑΖ%ΞΠΩΜΔΩΑΚΜΥ~ΜΠΥ+Υ%$ΨΝΩΑ~ΥΨΓΖΛΩΗΨΑΔ~ΑΤΠΥΔΩΛΨ

ΑΥΒΑΜΥΨΝΕΔΩΟΑΥΞΑΜ%Υ+ΥΠ~ΠΥΜΑΥΜΔΞΠΖΠΜ\ΒΥΒΛΩΜΛΨΧΑ\Ξ%~Υ Ω%ΞΟΔΨ$^ΛΩΑΥΒΥΦΛΩΜΣ=Λ~$ΒΛΧ**ΗΨΑΜΠΥ%ΝΜΔΠΥΜΑΥΜΔΒΑΑΩΞΥ ~ΜΑΥ^ΒΑΩΒΥ%ΚΖΥΩΛΕ*ΑΩΛΨΛΙΠΖΒΥΓ=%ΚΖΥΩΛΕ*ΑΩΛ~ΑΛΥΚΑΩΚΟΝ ΒΔΩΥΧ%ΟΠΨ~ΗΨΠΩΑΕΥΦΛΕΑΩΔ$ΖΥΞΥΦΛΕΑΩΔ

ΛΙ%ΖΜΛΥΞΖΛ*^ΧΑΝΨΗΞΠΩΜ%ΑΞΥΜΖΛΞΑΝ\~%ΜΠΧΗΩΜΜ%ΞΖΛΩΡ=ΛΥ%Ξ ΩΑΝΣ=ΕΠΑΩΧ*ΑΖΕΛΩΔΨΞΑΖΛΖΙ%ΩΛΩΝΞΛΡ%ΟΟ\~ΠΜΗΓΝΨΑΥΠΞΠ~ΥΕ ΑΜΓ=ΑΒΔΚΑΩΠΨΔΕΜ\ΟΛΙ\ΕΑΜΑ~ΘΑΒΑΑΚ**%ΨΜΛΩ\ΞΑΖΗΒΝΩΠΜΛ

ΗΒΑ~ΞΓΥ%ΞΛ**ΧΑΩΑΝΖΑΧΗΞΛΥΛΨΒΑΜΛΞΛΨΑ~ΜΥ**ΩΜΗΨΑΞΥΨΜΗ^~ \ΚΛΥΒΑΩΡΖΛΜΛΨΛΒΑΩΠ*ΝΜΗ~\ΒΑ^ΑΝΖΑΧΗΑΩ%ΩΧΖΔΞΛΥ~Λ=

Because this is so difficult, I'll give you a series of hints if you choose to take them. To help those who don't want to cheat (you don't get hints in a real cryptogram!), I've put them in white type. You can copy them and paste them into another program to read them, or simply drag across them; they should show up in inverse type. The goal, of course, is to use as few hints as possible.

HINT 1: ==All letters stand for letters; the only special symbols are the symbols =, *, %, etc.==

HINT 2: ==The only whole word to be replaced by a symbol is και==

HINT 3: ==There are three other combinations of letters replaced by a single symbol, but these may be replaced within words. The three combinations are ου με σοφ==

HINT 4: ==Two letters have been split in two (i.e. are represented by two different symbols): Α Σ==

HINT 5: ==The two remaining special symbols are NULL (i.e. simply omit from the plaintext) and DELETE PREVIOUS (i.e. this character and the character before it should be omitted). To prevent confusion, the latter symbol is not allowed to be doubled.==

HINT 6: ==The only New Testament passage is James 1:5-6.==

HINT 7: ==The passage from James is the sixth message.==

HINT 8: ==There are four passages from LXX: Ecclesiastes 2:13, Proverbs 3:31, Sirach 1:4, Job 28:12-13, in that order -- but of course there are some other quotes intervening.==

HINT 9: ==The symbol used for Ε is Α.==

HINT 10: ==The first letters of the eight actual plaintexts are Κ Δ Θ Μ Π Ε Ο Η ==

The whole answer to this cryptogram is here.