Contents:
Introduction
Accuracy and Precision
Ancient Mathematics
Assuming the Solution
Average: see under Mean, Median, and Mode
Binomials and the Binomial Distribution
Checksums: see under Ancient Mathematics
Cladistics
Corollary
Definitions
Dimensional Analysis
[The Law of the] Excluded Middle
Exponential Growth
Fallacy
Game Theory
The Golden Ratio (The Golden Mean)
Curve Fitting, Least Squares, and Correlation
Mean, Median, and Mode
Necessary and Sufficient Conditions: see under Rigour
p-Hacking
Probability
Arithmetic, Exponential, and Geometric Progressions
Rigour, Rigorous Methods
Sampling and Profiles
Saturation
Significant Digits
Standard Deviation and Variance
Statistical and Absolute Processes
Statistical Significance
Tree Theory
Utility Theory: see under Game Theory

Appendix: Assessments of Mathematical Treatments of Textual Criticism


Mathematics -- most particularly statistics -- is frequently used in text-critical treatises. Unfortunately, most textual critics have little or no training in advanced or formal mathematics. This series of short items tries to give examples of how mathematics can be correctly applied to textual criticism, with "real world" examples to show how and why things work.

What follows is not enough to teach, say, probability theory. It might, however, prevent some errors -- such as one that seems to be increasingly common: observing the behavior of individual manuscripts and assuming that text-types behave the same way. For example, manuscripts tend to lose parts of the text through haplographic error. Does it follow that text-types do so? It does not. We can observe this in a mathematical text, Euclid's Elements: almost all of our manuscripts of it are in fact of Theon's recension, which on the evidence is fuller than the original. If manuscripts are never compared or revised, then yes, texts will always get shorter over time. But we know that they are revised, and there may be other processes at work. The ability to generalize must be proved; it cannot be assumed.

The appendix at the end assesses several examples of "mathematics" perpetrated on the text-critical world by scholars who, sadly, were permitted to publish without being reviewed by a competent mathematician (or even by a half-competent one like me. It says something about how bad the math is that I, who have only a bachelor's degree in math and haven't used most of that for fifteen years, can instantly see its extreme and obvious defects).

There are several places in this document in which I frankly shout at textual critics (e.g. under Definitions and Fallacy). These are instances where errors are particularly blatant and common. I can only urge textual critics to heed these warnings.

One section -- that on Ancient Mathematics -- is separate: It is concerned not with mathematical technique but with the notation and abilities of ancient mathematicians. This can be important to textual criticism, because it reminds us of what errors they could make with numerals, and what calculations they could make.

Accuracy and Precision

"Accuracy" and "Precision" are terms which are often treated as synonymous. They are not.

Precision is a measure of how much information you are offering. Accuracy is a more complicated term, but if it is used at all, it is a measure of how close an approximation is to an ideal.

(I have to add a caution here: Richards Fields heads a standards-writing committee concerned with such terms, and he tells me they deprecate the use of "accuracy." Their feeling is that it blurs the boundary between the two measures above. Unfortunately, their preferred substitute is "bias" -- a term which has a precise mathematical meaning, referring to the difference between a sample and what you would get if you tested a whole population. But "bias" in common usage is usually taken to be deliberate distortion. I can only advise that you choose your terminology carefully. What I call "accuracy" here is in fact a measure of sample bias. But that probably isn't a term that it's wise to use in a TC context. I'll talk of "accuracy" below, to avoid the automatic reaction to the term "bias," but I mean "bias." In any case, the key is to understand the difference between a bunch of decimal places and having the right answer.)

To give an example, take the number we call "pi" -- the ratio of the circumference of a circle to its diameter. The actual value of π is known to be 3.14159265....

Suppose someone writes that π is roughly equal to 3.14. This is an accurate number (the first three digits of π are indeed 3.14), but it is not overly precise. Suppose another person writes that the value of π is 3.32456789. This is a precise number -- it has eight decimal digits -- but it is very inaccurate (it's wrong by more than five per cent).
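To make the distinction concrete, here is a minimal Python sketch that scores the two approximations above separately: precision as the number of decimal places written, accuracy as the percentage error against the true value of π (taken from Python's `math.pi`). The function names are illustrative only.

```python
import math

def relative_error_pct(approx: float, true_value: float = math.pi) -> float:
    """Accuracy: how far the approximation is from the ideal, as a percentage."""
    return abs(approx - true_value) / true_value * 100

def decimal_places(written: str) -> int:
    """Precision: how many decimal digits the writer is offering, counted as written."""
    return len(written.split(".")[1]) if "." in written else 0

for written in ("3.14", "3.32456789"):
    value = float(written)
    print(f"{written}: {decimal_places(written)} decimal places, "
          f"{relative_error_pct(value):.2f}% error")
# 3.14: 2 decimal places, 0.05% error
# 3.32456789: 8 decimal places, 5.82% error
```

The more precise figure is the less accurate one: eight decimal places do nothing to rescue a value that is wrong in the first decimal.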

When taking a measurement (e.g. the rate of agreement between two manuscripts), one should be as accurate as possible and as precise as the data warrants.

As a good rule of thumb, you can add an additional significant digit each time you multiply your number of data points by ten. That is, if you have ten data points, you only have precision enough for one digit; if you have a hundred data points, your analysis may offer two digits.

Example: Suppose you compare two manuscripts (which could be compared at thousands of potential points of variation) at eleven points of variation, and they agree in six of them. 6 divided by 11 is precisely 0.5454545..., or 54.5454...%. However, with only eleven data points, you are only allowed one significant digit. So the rate of agreement here, to one significant digit, is 50% (at least, that's your estimated rate of agreement over the larger sample set for which they could be compared).

Now let's say you took a slightly better sample of 110 data points, and the two manuscripts agree in sixty of them. Their percentage agreement is still 54.5454...%, but now you are allowed two significant digits, and so can write your results as 55% (54.5% rounds to 55%).

If you could increase your sample to 1100 data points, you could increase the precision of your results to three digits, and say that the agreement is 54.5%.
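The rule of thumb above can be sketched in a few lines of Python. This is a hypothetical helper, assuming that "one significant digit per factor of ten" means floor(log10(n)) digits for n data points, with a minimum of one; it reproduces the three worked figures.

```python
import math

def allowed_significant_digits(n_points: int) -> int:
    """One significant digit per power of ten in the sample size (minimum one).
    len(str(n)) - 1 equals floor(log10(n)) for positive integers, with no float surprises."""
    if n_points < 1:
        raise ValueError("need at least one data point")
    return max(1, len(str(n_points)) - 1)

def round_significant(x: float, digits: int) -> float:
    """Round x to the given number of significant digits."""
    if x == 0:
        return 0.0
    return round(x, digits - 1 - math.floor(math.log10(abs(x))))

def reported_agreement(agreements: int, comparisons: int) -> float:
    """Percentage agreement, rounded to the precision the sample size warrants."""
    pct = 100 * agreements / comparisons
    return round_significant(pct, allowed_significant_digits(comparisons))

print(reported_agreement(6, 11))      # 50.0
print(reported_agreement(60, 110))    # 55.0
print(reported_agreement(600, 1100))  # 54.5
```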

Chances are that no comparison of manuscripts will ever allow you more than three significant digits. When Goodspeed gave the Syrian element in the Newberry Gospels as 42.758962%, Frederick Wisse cleverly and accurately remarked, "The six decimals tell us, of course, more about Goodspeed than about the MS." (Frederick Wisse, The Profile Method for Classifying and Evaluating Manuscript Evidence, (Studies and Documents 44, 1982), page 23.)

Ancient Mathematics

Modern mathematics is essentially universal (or at least planet-wide): Every serious mathematician uses Arabic numerals, and the same basic notations such as + - * / ( ) ° > < ∫. This was by no means true in ancient times; each nation had its own mathematics, which did not translate at all. (If you want to see what I mean, try reading a copy of The Sand Reckoner by Archimedes sometime.) Understanding these differences can sometimes have an effect on how we understand ancient texts.

There is evidence that the earliest peoples had only two "numbers" -- one and two, which we might think of as "odd" and "even" -- though most primitive peoples could count to at least four: "one, two, two-and-one, two-and-two, many." This theory is supported not just by the primitive peoples who still used such systems into the twentieth century but even, implicitly, by the structure of language. Greek is one of many Indo-European languages with singular, dual, and plural numbers (though of course the dual was nearly dead by New Testament times); enough of the Germanic languages of the first century C.E. also had a dual form to make it clear that there was such a thing in proto-Germanic. Certain Oceanic languages actually have five number cases: Singular, dual, triple, quadruple, and plural. In what follows, observe how many number systems use dots or lines for the numbers 1-4, then some other symbol for 5. Indeed, we still do this today in hashmark tallies: count one, two, three, four, then strike through the lot for five: I, II, III, IIII, and then IIII with a stroke drawn through it.

But while such curiosities still survive in out-of-the-way environments, or for quick tallies, every society we are interested in had evolved much stronger counting methods. We see evidence of a money economy as early as Genesis 23 (Abraham's purchase of the burial cave), and such an economy requires a proper counting system. Even Indo-European seems to have had proper counting numbers, something like oino, dwo, treyes, kwetores, penkwe, seks, septm, okta, newn, dekm, most of which surely sound familiar. In Sanskrit, probably the closest attested language to proto-Indo-European, this becomes eka, dvau, trayas, catvaras, panca, sat, sapta, astau, nava, dasa, and we also have a system for higher numbers -- e.g. eleven is one-ten, eka-dasa; twelve is dva-dasa, and so forth; there are also words for 20, 30, 40, 50, 60, 70, 80, 90, 100, and compounds for 200, etc. (100 is satam, so 200 is dvisata, 300 trisata, etc.) Since there is also a name for 1000 (sahasra), Sanskrit actually has provisions for numbers up to a million (e.g. 200,000 is dvi-sata-sahasra). This may be post-Indo-European (since the larger numbers don't resemble Greek or German names for the same numbers), but clearly counting is very old.

You've probably encountered Roman Numerals at some time:
1 = I
2 = II
3 = III
4 = IIII (in recent times, sometimes IV, but this is modern)
5 = V
6 = VI
7 = VII
8 = VIII
9 = VIIII (now sometimes IX)
10 = X
11 = XI
15 = XV
20 = XX
25 = XXV

etc. This is one of those primitive counting systems, with a change from one form to another at 5. Like so many things Roman (e.g. their calendar), this is incredibly and absurdly complex. This may help to explain why Roman numerals went through so much evolution over the years; the first three symbols (I, V, and X) seem to have been in use from the very beginning, but the higher symbols took centuries to standardize -- they were by no means entirely fixed in the New Testament period. The table at right shows some of the phases of the evolution of the numbers. Some, not all.

In the graphic showing the variant forms, the evolution seems to have been fairly straightforward in the case of the smaller symbols -- that is, if you see ⩛ instead of L for 50, you can be pretty certain that the document is old. The same is not true for the symbols for 1000; the evolution toward a form like a Greek Φ, in Georges Ifrah's view, was fairly direct, but from there we see all sorts of variant forms emerging -- and others have proposed other histories. I didn't even try to trace the evolution of the various forms. The table in Ifrah shows a tree with three major and half a dozen minor branches, and even so appears to omit some forms. The variant symbols for 1000 in particular were still in widespread use in the first century C. E.; we still find the form ❨|❩ in use in the ruins of Pompeii, e.g., and there are even printed books which use this notation. The use of the symbol M for 1000 has not, to my knowledge, been traced back before the first century B.C.E. It has also been theorized that, contrary to Ifrah's proposed evolutionary scheme, the notation D for 500 is in fact derived from the ❨|❩ notation for 1000 -- as 500 is half of 1000, so D=|❩ is half of ❨|❩. The ❨|❩ notation also lent itself to expansion; one might write ❨❨|❩❩ for 10000, e.g., and hence ❨❨❨|❩❩❩ for 100000. Which in turn implies |❩❩ for 5000, etc.

What's more, there were often various ways to represent a number. An obvious example is the number 9, which can be written as VIIII or as IX. For higher numbers, though, it gets worse. In school, they probably taught you to write 19 as XIX. But in manuscripts it could also be written IXX (and similarly 29 could be IXXX), or as XVIIII. The results aren't really ambiguous, but they certainly aren't helpful!
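Both conventions can be sketched in Python. This assumes only the late standard symbols I, V, X, L, C, D, M (the variant forms for 50, 500 and 1000 discussed above are not modelled), and the function names are illustrative. The writer produces the older, purely additive forms; the reader is deliberately permissive, so that VIIII and IX, or XIX, IXX and XVIIII, all come out the same.

```python
# Late standard symbols only; the older variant forms are ignored in this sketch.
ROMAN_VALUES = [("M", 1000), ("D", 500), ("C", 100), ("L", 50), ("X", 10), ("V", 5), ("I", 1)]

def to_roman_additive(n: int) -> str:
    """Write n purely additively, in the older style: 4 = IIII, 9 = VIIII, 19 = XVIIII."""
    if n <= 0:
        raise ValueError("Roman numerals have no zero or negative numbers")
    parts = []
    for symbol, value in ROMAN_VALUES:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)

def from_roman(numeral: str) -> int:
    """Read additive and subtractive forms alike: a symbol worth less than the one
    after it is subtracted, otherwise added. Deliberately lax, so it will also
    accept combinations no scribe would have written."""
    lookup = dict(ROMAN_VALUES)
    values = [lookup[ch] for ch in numeral.upper()]
    total = 0
    for i, v in enumerate(values):
        if i + 1 < len(values) and v < values[i + 1]:
            total -= v
        else:
            total += v
    return total

print(to_roman_additive(19))                              # XVIIII
print([from_roman(s) for s in ("XIX", "IXX", "XVIIII")])  # [19, 19, 19]
```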

Fortunately, while paleographers and critics of Latin texts sometimes have to deal with this, we don't have to worry too much about the actual calculations it represents. Roman mathematics didn't really even exist; they left no texts at all on theoretical math, and very few on applied math, and those very low-grade. (Their best work was by Boethius, long after New Testament times, and even it was nothing more than a rehash of works like Euclid's with all the rigour and general rules left out. The poverty of useful material is shown by the nature of the books available in the nations of the post-Roman world. There is, for example, only one pre-Conquest English work with any mathematical content: Byrhtferth's Enchiridion. Apart from a little bit of geometry given in an astronomical context, its most advanced element is a multiplication table expressed as a sort of mnemonic poem.) No one whose native language was not Latin would ever use Roman mathematics if an alternative were available; the system had literally no redeeming qualities. In any case, as New Testament scholars, we are interested mostly in Greek mathematics, though we should glance at Babylonian and Egyptian and Hebrew maths also. (We'll ignore, e.g., Chinese mathematics, since it can hardly have influenced the Bible in any way. Greek math was obviously relevant to the New Testament, and Hebrew math -- which in turn was influenced by Egyptian and Babylonian -- may have influenced the thinking of the NT authors.) The above is mostly by way of preface: It indicates something about how numbers and numeric notations evolved.

The Greek system of numerals, as used in New Testament and early Byzantine times, was at least more compact than the Roman, though it (like all ancient systems) lacked the zero and so was not really suitable for advanced computation. The 24 letters of the alphabet were all used as numerals, as were three obsolete letters, bringing the total to 27. This allowed the representation of all numbers less than 1000 using a maximum of three symbols, as shown at right:

Thus 155, for instance, would be written as ρνε, 23 would be κγ, 999 would be ϡϘΘ, etc.
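The scheme is simple enough to sketch in Python: one row of nine letters for each place value, so any number from 1 to 999 needs at most three symbols and no zero. The glyphs used for the three obsolete letters (ϛ for 6, ϙ for 90, ϡ for 900) are an assumption; manuscripts and fonts vary, as the forms of 666 cited below show.

```python
# Ionian (alphabetic) numerals: one row per place value. Index 0 is empty because
# a missing place is simply left out; no zero symbol is needed.
UNITS = ["", "α", "β", "γ", "δ", "ε", "ϛ", "ζ", "η", "θ"]
TENS = ["", "ι", "κ", "λ", "μ", "ν", "ξ", "ο", "π", "ϙ"]
HUNDREDS = ["", "ρ", "σ", "τ", "υ", "φ", "χ", "ψ", "ω", "ϡ"]

def greek_numeral(n: int) -> str:
    """Write 1..999 in Greek alphabetic numerals, at most one letter per place."""
    if not 1 <= n <= 999:
        raise ValueError("this sketch only covers 1 to 999")
    hundreds, rest = divmod(n, 100)
    tens, units = divmod(rest, 10)
    return HUNDREDS[hundreds] + TENS[tens] + UNITS[units]

for n in (155, 23, 999, 666, 616, 665):
    print(n, greek_numeral(n))
```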

And, to take a famous example, in Rev. 13:18, 666, as in P47 A 046 051 1611 2329 2377 would be χξς (or χξϝ or χξϹ); the variant 616 of P115 C Irenaeuspt is χις (in P115, ΧΙϹ); 2344's reading 665 is χξε.

Numbers over 1000 could also be expressed, simply by application of a divider. So the number 875875 would become ωοη,ωοη'. Note that this allowed the Greeks to express numbers actually larger than the largest "named" number in the language, the myriad (ten thousand). (Some deny this; they say the system only allowed four digits, up to 9999. This may have been initially true, but both Archimedes and Apollonius were eventually forced to extend the system -- in different and conflicting ways. In practice, it probably didn't come up very often.)

Of course, this was a relatively recent invention. The Mycenaean Greeks in the Linear B tablets had used a completely different system: | for digits, - for tens, o for hundreds, * for thousands. So, for instance, the number we would now express as 2185 would have been expressed in Pylos as **o====|||||. But this system, like all things about Linear B, seems to have been completely forgotten by classical times.

A second system, known as the "Herodian," or "Attic," was still remembered in New Testament times, though rarely if ever used. It was similar to Roman numerals in that it used symbols for particular numbers repeatedly -- in this system, we had
I = 1
Δ = 10
H = 100
X = 1000
M = 10000

(the letters being derived from the first letters of the names of the numbers).

However, as with Roman numerals, the Greeks added a trick to simplify, or at least compress, the symbols. To the above five symbols, they added Π for five -- but it could be five of anything -- five ones, five tens, five hundreds, with a subscripted figure showing which it was. In addition, in practice, the five-sign was often written as Γ rather than Π to allow numbers to be fitted under it. So, e.g., 28,641 would be written as MM, then Γ with a small X under it (5000), XXX, then Γ with a small H under it (500), H, ΔΔΔΔ, I.


In that context, it's perhaps worth noting that a Greek verb for "to count" is πεμπαζω (literally, to count by fives), related to πεντε, five. The use of a system such as this was almost built into the language. But its sheer inconvenience obviously helped assure the later success of the Ionian system, which -- to the best of my knowledge -- is the one followed in all New Testament manuscripts which employ numerals at all.
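For comparison with the Ionian scheme, the Attic rules described above can be sketched in Python. This is an illustration under stated assumptions: it uses the five letters listed above plus the pente, and writes "pente with a letter under it" as `Π(X)`, `Π(H)` and so on, since plain text cannot show the ligature. The function name is hypothetical.

```python
# Powers of ten in the Attic (acrophonic) system, using the letters listed in the text.
ATTIC_POWERS = [(10000, "M"), (1000, "X"), (100, "H"), (10, "Δ"), (1, "I")]

def attic_numeral(n: int) -> str:
    """Write 1..99999 in Attic numerals. 'Π(X)' is a plain-text stand-in for the
    pente (Π or Γ) with a small X written under it, i.e. five thousands."""
    if not 1 <= n <= 99999:
        raise ValueError("this sketch only covers 1 to 99999")
    parts = []
    for value, symbol in ATTIC_POWERS:
        digit, n = divmod(n, value)
        if digit >= 5:
            # Five of this unit become a single pente; a bare Π means five ones.
            parts.append("Π" if value == 1 else f"Π({symbol})")
            digit -= 5
        parts.append(symbol * digit)
    return "".join(parts)

print(attic_numeral(28641))  # MMΠ(X)XXXΠ(H)HΔΔΔΔI
```

Eleven signs against the Ionian system's four or five for the same number goes some way to explaining which system won.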

The whole situation became so complicated that texts were actually written to facilitate conversions between systems -- e.g. at St. Gall there is a manuscript (#459) which offers a Latin/Greek conversion table, part of which is transcribed here:


Here the top line is the Latin word for the number (note the use of informal, vernacular names such as "ogda" rather than "okta"), the second the Roman numeral (observe VIIII rather than IX), the third the pronunciation of the Greek letter (often with hints of later pronunciations, e.g. "mi" and "ni" for mu and nu), and the last the Greek letter itself.

And it should be remembered that these numerals were very widely used. Pick up a New Testament today and look up the "Number of the Beast" in Rev. 13:18 and you will probably find the number spelled out (so, e.g., in Merk, NA27, and even Westcott and Hort; Bover and Hodges-Farstad use the numerals). It's not so in the manuscripts; most of them use the numerals (and numerals are even more likely to appear in the margins, e.g. for the Eusebian tables). This should be kept in mind when assessing variant readings. Since, e.g., σ and ο can be confused in most scripts, one should be alert to scribes confusing these numerals even when they would be unlikely to confuse the names of the numbers they represent. O. A. W. Dilke in Greek and Roman Maps (a book devoted as much to measurement as to actual maps) notes, for instance, that "the numbers preserved in their manuscripts tend to be very corrupt" (p. 43). Numbers aren't words; they are easily corrupted -- and, because they have little redundancy, if a scribe makes a copying error, a later scribe probably can't correct it. It's just possible that this might account for the variant 70/72 in Luke 10:1, for instance, though it would take an unusual hand to produce a confusion in that case.

There is at least one variant where a confusion involving numerals is nearly a certainty -- Acts 27:37. Simply reading the UBS text here, which spells out the numbers, is flatly deceptive. One should look at the numerals. The common text here is

εν τω πλοιω σος

In B, however, supported by the Sahidic Coptic (which of course uses its own number system), we have


Which would become

εν τω πλοιω ως ος

This latter reading is widely rejected. I personally think it deserves respect. The difference, of course, is only a single omega, added or deleted. But I think dropping it, which produces a smoother reading, is more likely. Also, while much ink has been spilled justifying the possibility of a ship with 276 people aboard (citing Josephus, e.g., to the effect that the ship that took him to Rome had 600 people in it -- a statement hard to credit given the size of all known Roman-era wrecks), possible is not likely.

We should note some other implications of this system -- particularly for gematria (the finding of mathematical equivalents to a text). Observe that there are three numerals -- those for 6, 90, and 900 -- which will never be used in a text (unless one counts a terminal sigma, and that doesn't really count). Other letters, while they do occur, are quite rare (see the section on Cryptography for details), meaning that the numbers corresponding to them are also rare. The distribution of available numbers means that any numeric sum is possible, at least if one allows impossible spellings, but some will be much less likely than others. This might be something someone should study, though there is no evidence that it actually affects anything.

Gematria had other uses than mysticism. It could be used to validate a text. This method is what is now known as a checksum: If you are sent a transmission (in this case, perhaps, a message), and the checksum for the message is correct, then you know the message was accurately received.

Suppose, for instance, that someone is supposed to travel north or south at a crossroads, depending on a message he has been given. So the message would read either βορεας for north or νοτος for south. Going back to the table above, we find that these words have the following values:

βορεας = 2 + 70 + 100 + 5 + 1 + 200 = 378
νοτος = 50 + 70 + 300 + 70 + 200 = 690

So if a message shows up reading

βορεας τοη = North 378

Then we know the instruction has been correctly transmitted. If it were to show up as, say,

βορεας χϙ = North 690

then the checksum doesn't match the message; we have a problem.
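The check amounts to comparing two letter-sums, which can be sketched in Python. This assumes unaccented text as printed here (accented letters would need normalizing first) and counts final ς like σ; `greek_value` and `checksum_ok` are illustrative names, not an established tool.

```python
# Ionian letter values (lower case). The obsolete signs ϛ, ϙ, ϡ are included so
# that numerals such as χϙ can be read back as well as words.
GREEK_VALUES = {
    "α": 1, "β": 2, "γ": 3, "δ": 4, "ε": 5, "ϛ": 6, "ζ": 7, "η": 8, "θ": 9,
    "ι": 10, "κ": 20, "λ": 30, "μ": 40, "ν": 50, "ξ": 60, "ο": 70, "π": 80, "ϙ": 90,
    "ρ": 100, "σ": 200, "ς": 200, "τ": 300, "υ": 400, "φ": 500, "χ": 600,
    "ψ": 700, "ω": 800, "ϡ": 900,
}

def greek_value(text: str) -> int:
    """Sum the numeric values of the Greek letters in text; anything else counts 0."""
    return sum(GREEK_VALUES.get(ch, 0) for ch in text.lower())

def checksum_ok(message: str, checksum: str) -> bool:
    """True if the message's letter-sum equals the value of the numeral sent with it."""
    return greek_value(message) == greek_value(checksum)

print(greek_value("βορεας"), greek_value("νοτος"))  # 378 690
print(checksum_ok("βορεας", "τοη"))  # True
print(checksum_ok("νοτος", "τοη"))   # False
```

A match makes a correct transmission likely rather than certain: any other word with the same total would pass the check just as well.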

This sort of checksum was used to validate some ancient documents, although in a way that reminds me a bit of medieval locks (which were too simple as locks to really trouble a lockpick, but often so ornately covered with flaps and panels that it was hard to figure out just what to pick): Anyone could theoretically figure out the authentication, but it was a huge amount of work that probably wasn't worth doing. Notker Balbulus explained the method, which he claimed was derived from the Council of Nicea. The trick was to take the numeric values of the first letter of the name of the writer, the second letter of the name of the recipient, the third letter of the name of the bearer, and the fourth letter of the name of the city in which it was written, and add the number of the current indiction. To this was added 561. Only it wasn't explained as 561; it was explained as the first letters of Father (Π=80), Son (Υ=400), Holy Spirit (Α=1), Peter (Π=80). I don't know if Notker was actually stupid enough to believe that 80+400+1+80 sometimes added up to something other than 561, but he sure acted like it. As a second validation, one was supposed to put 99, the total for ΑΜΗΝ, somewhere in the letter.

Anyway, let's work one of these things out. Suppose I, Robert (Ροβεαρτ), writing from Berea, Kentucky (βεροια), want to send a letter to Andrea (ανδρεα). Thomas (Θωμας) is to carry it. I send it in the year 2019 (the year as I write this). 2019 is indiction 12. So the checksum of my letter is given by:

 100 = Ρ, first letter of Ρ[οβεαρτ]
+ 50 = Ν, second letter of [α]Ν[δρεα]
+ 40 = Μ, third letter of [Θω]Μ[ας]
+ 70 = Ο, fourth letter of [Βερ]Ο[ια]
+ 12 for the indiction
+561 for the sake of being a mathematical nitwit
=833 = ωλγ (ω = 800, λ = 30, γ = 3)

So this letter would have been authenticated by inclusion of the checksum ωλγ.

Apparently the fact that this was utterly obvious, once you knew the trick, and that anything could be authenticated this way, never occurred to anyone in the church. The only actual security that this provided was that it was a lot of work. Which, in fact, meant that the system did not work well, because many of the validations were miscalculated. But this sort of checksum will be found in copies of some old church letters.
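Notker's recipe, as described above, fits in a small Python function, which also shows how mechanical (and forgeable) it is. The indiction rule used here (add 3 to the year, divide by 15, and read a remainder of 0 as 15) is a common rule of thumb and ignores the fact that the indiction year did not begin on January 1; all names are hypothetical.

```python
# Ionian letter values, built from one row of nine letters per place value.
UNITS = ["", "α", "β", "γ", "δ", "ε", "ϛ", "ζ", "η", "θ"]
TENS = ["", "ι", "κ", "λ", "μ", "ν", "ξ", "ο", "π", "ϙ"]
HUNDREDS = ["", "ρ", "σ", "τ", "υ", "φ", "χ", "ψ", "ω", "ϡ"]
GREEK_VALUES = {
    letter: digit * place
    for place, row in ((1, UNITS), (10, TENS), (100, HUNDREDS))
    for digit, letter in enumerate(row)
    if letter
}
GREEK_VALUES["ς"] = 200  # final sigma counts like σ

NOTKER_CONSTANT = 80 + 400 + 1 + 80  # Π (Father), Υ (Son), Α (Holy Spirit), Π (Peter) = 561

def indiction(year_ce: int) -> int:
    """Rule of thumb: add 3, take the remainder mod 15, read 0 as 15.
    Ignores when in the year the indiction actually changed."""
    remainder = (year_ce + 3) % 15
    return 15 if remainder == 0 else remainder

def greek_numeral(n: int) -> str:
    """Write 1..999 in Greek alphabetic numerals."""
    if not 1 <= n <= 999:
        raise ValueError("this sketch only covers 1 to 999")
    hundreds, rest = divmod(n, 100)
    tens, units = divmod(rest, 10)
    return HUNDREDS[hundreds] + TENS[tens] + UNITS[units]

def notker_checksum(writer: str, recipient: str, bearer: str, city: str, year_ce: int) -> int:
    """1st letter of writer + 2nd of recipient + 3rd of bearer + 4th of city,
    plus the indiction, plus 561."""
    letters = [writer[0], recipient[1], bearer[2], city[3]]
    return sum(GREEK_VALUES[ch.lower()] for ch in letters) + indiction(year_ce) + NOTKER_CONSTANT

total = notker_checksum("Ροβεαρτ", "ανδρεα", "Θωμας", "βεροια", 2019)
print(total, greek_numeral(total))  # 833 ωλγ
```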

Of course, Greek mathematics was not confined simply to arithmetic. Indeed, Greek mathematics must be credited with first injecting the concept of rigour into mathematics -- for all intents and purposes, turning arithmetic into math. This is most obvious in geometry, where they formalized the concept of the proof.

According to Greek legend, which can no longer be verified, it was the famous Thales of Miletus who gave some of the first proofs, showing such things as the fact that a circle is bisected by a diameter (i. e. there is a line -- in fact, an infinite number of lines -- passing through the center of the circle which divides the area of a circle into equal halves), that the base angles of an isosceles triangle (the ones next to the side which is not the same length as the other two) are equal, that the vertical angles formed by two intersecting lines (that is, the pairs of opposite angles, which are not next to each other) are equal, and that two triangles are congruent if they have two equal angles and one equal side. We have no direct evidence of the proofs by Thales -- everything we have of his work is at about third hand -- but he was certainly held up as an example by later mathematicians.

The progress in the field was summed up by Euclid (fourth/third century B.C.E.), whose Elements of geometry remains fairly definitive for plane geometry even today.

Euclid also produced the (surprisingly easy) proof that the number of primes is infinite -- giving, incidentally, a nice example of a proof by contradiction, a method developed by the Greeks: Suppose there is a largest prime (call it p). So take all the primes: 2, 3, 5, 7, ... p. Multiply all of these together and add one. This number, since it is one more than a multiple of all the primes, cannot be divisible by any of them. It is therefore either prime itself or a multiple of a prime larger than p. So p cannot be the largest prime, which is a contradiction.
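Euclid's construction is easy to try by machine. In this Python sketch (illustrative names; the primes are found with a simple sieve of Eratosthenes), note that "the product plus one" need not itself be prime: for the primes up to 13 it is 30031 = 59 × 509. But its smallest prime factor is still larger than 13, exactly as the argument requires.

```python
import math

def primes_up_to(limit: int) -> list[int]:
    """Sieve of Eratosthenes: all primes <= limit."""
    if limit < 2:
        return []
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for k in range(2, math.isqrt(limit) + 1):
        if is_prime[k]:
            for multiple in range(k * k, limit + 1, k):
                is_prime[multiple] = False
    return [k for k, flag in enumerate(is_prime) if flag]

def smallest_prime_factor(n: int) -> int:
    """Trial division; adequate for the small numbers used here."""
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return d
    return n

def euclid_witness(p: int) -> tuple[int, int]:
    """Multiply all primes up to p, add one, and return (that number, its smallest
    prime factor). Euclid's argument guarantees the factor is larger than p."""
    n = math.prod(primes_up_to(p)) + 1
    return n, smallest_prime_factor(n)

print(euclid_witness(5))   # (31, 31): 2*3*5 + 1 is itself prime
print(euclid_witness(13))  # (30031, 59): not prime, but its factors exceed 13
```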

A similar proof shows that the square root of 2 is irrational -- that is, it cannot be expressed as the ratio of any two whole numbers. The trick is to express the square root of two as a ratio and reduce the ratio p/q to simplest form, so that p and q have no common factors. Since p/q is the square root of two, (p/q)² = 2, so p² = 2q². Since 2q² is even, it follows that p² is even. Which in turn means that p is even (the square of an odd number is always odd). So p² must be divisible by 4. So 2q² must be divisible by 4, so q² must be divisible by 2. By the same reasoning as before, q must then be even. Which means that p and q have a common factor of 2. This contradiction proves that there is no p/q which represents the square root of two.

This is one of those crucial discoveries. The Egyptians, as we shall see, barely understood fractions. The Babylonians did understand them, but had no theory of fractions. They could not step from the rational numbers (fractions) to the irrational numbers (endless non-repeating decimals). The Greeks, with results such as the above, not only invented mathematical logic -- crucial to much that followed, including statistical analysis such as many textual critics used -- but also, in effect, the whole system of real numbers.

The fact that the square root of two was irrational had been known as early as the time of Pythagoras, but the Pythagoreans hated the fact and tried to hide it. Euclid put it squarely in the open. (Pythagoras, who lived in the sixth century, of course, did a better service to math in introducing the Pythagorean Theorem. This was not solely his discovery -- several other peoples had right triangle rules -- but Pythagoras deserves credit for proving it analytically.)

Relatively little of Euclid's work was actually original; he derived most of it from earlier mathematicians, though often the exact source is uncertain (Boyer, in going over the history of this period, seems to spend about a quarter of his space discussing how particular discoveries are attributed to one person but perhaps ought to be credited to someone else; I've made no attempt to reproduce all these cautions and credits). That does not negate the importance of his work. Euclid gathered it, and organized it, and so allowed all that work to be maintained. In another sense, he did even more than that. The earlier work had been haphazard. Euclid turned it into a system. This is crucial -- equivalent, say, to the change which took place in biology when species were classified into genera and families and such. Before Euclid, mathematics, like biology before Linnaeus, was essentially descriptive. But Euclid made it a unity. To do so, he set forth ten postulates, and took everything from there.

Let's emphasize that. Euclid set forth ten postulates (properly, five axioms and five postulates, but this is a difference that makes no difference). Euclid, and those on whom he relied, set forth what they knew, and defined their rules. This is the fundamental basis to all later mathematics -- and is something textual critics still haven't figured out! (Quick -- define the Alexandrian text!)

Euclid in fact hadn't figured everything out; he made some assumptions he didn't realize he was making. Also, since his time, it has proved possible to dispense with certain of his postulates, so geometry has been generalized. But, in the realm where his postulates (stated and unstated) apply, Euclid remains entirely correct. The Elements is still occasionally used in math classes today. And the whole idea of postulates and working from them is essential in mathematics. I can't say it often enough: this was the single most important discovery in the history of math, because it defines rigour. Euclid's system, even though the individual results preceded him, made most future maths possible.

The sufficiency of Euclid's work is shown by the extent to which it eliminated all that came before. There is only one Greek mathematical work which survives from the period before Euclid, and it is at once small and very specialized -- and survived because it was included in a sort of anthology of later works. It's not a surprise, of course, that individual works have perished (much of the work of Archimedes, e.g., has vanished, and much of what has survived is known only from a single tenth-century palimpsest, which obviously is both hard to interpret and far removed from the original). But all of the pre-Euclidean writings? Clearly Euclid was considered sufficient.

And for a very long time. The first printed edition of Euclid came out in 1482, and it is estimated that over a thousand editions have been published since; it has been claimed that it is the most-published book of all time other than the Bible.

Not that the Greeks stopped working once Euclid published his work. Apollonius, who did most of the key work on conic sections, came later, as did Eratosthenes, perhaps best remembered now for accurately measuring the circumference of the earth but also noteworthy for inventing the "sieve" that bears his name for finding prime numbers. And the greatest Greek mathematician was no more than a baby when Euclid wrote the Elements. Archimedes -- surely the greatest scientific and mathematical genius prior to Newton, and possibly even Newton's equal had he had the data and tools available to the latter -- was scientist, engineer, the inventor of mathematical physics, and a genius mathematician. In the latter area, several of his accomplishments stand out. One is his work on large numbers in The Sand Reckoner, in which he set out to determine the maximum number of sand grains the universe might possibly hold. To do this, he had to invent what amounted to exponential notation. He also, in so doing, produced the notion of an infinitely extensible number system. The notion of infinity was known to the Greeks, but had been the subject of rather unfruitful debate. Archimedes gave them many of the tools they needed to address some of the problems -- though few later scholars made use of the advances.

Archimedes also managed, in one step, to create one of the tools that would turn into the calculus (though he didn't know it) and to calculate an extremely accurate value for π, the ratio of the circumference of a circle to its diameter. The Greeks were unable to find an exact way to calculate the value -- they did not know that π is irrational; this was not known with certainty until Lambert proved it in 1761. The only way the Greeks could prove a number irrational was by finding the equivalent of an algebraic equation to which it was a solution. They couldn't find such an equation for π, for the good and simple reason that there is no such equation. This point -- that π is what we now call a transcendental number -- was finally proved by Ferdinand Lindemann in 1882.

Archimedes didn't know that π is irrational, but he did know he didn't know how to calculate it. He had no choice but to seek an approximation. He did this by the beautifully straightforward method of inscribed and circumscribed polygons. The diagram at right shows how this works: The circumference of the circle is clearly greater than the perimeter of the square inscribed inside it, and less than the perimeter of the square circumscribed around it. If we assume the circle has a radius of 1 (i.e. a diameter of 2), then the perimeter of the inner square can be shown to be 4 times the square root of two, or about 5.66. The perimeter of the outer square (whose sides are the same length as the diameter of the circle) is 8. Thus the circumference of the circle, which is equal to 2π, is somewhere between 5.66 and 8. (And, in fact, 2π is about 6.283, so Archimedes is right). But now notice the second figure, in which an octagon has been inscribed in and circumscribed around the circle. It is obvious that the inner octagon is closer to the circle than the inner square, so its perimeter will be closer to the circumference of the circle while still remaining less. And the outer octagon is closer to the circle while still remaining greater.

If we repeat this procedure, inscribing and circumscribing polygons with more and more sides, we come closer and closer to "trapping" the value of π. Archimedes, despite having only the weak Greek mathematical notation at his disposal, managed to trap the value of π as somewhere between 223/71 (3.14085) and 22/7 (3.14286). The first of these values is about 0.024% below the actual value of π; the latter is about 0.04% above it; the midpoint (average) of the two is accurate to within 0.008%. That is an error too small to be detected by any measurement device known in Archimedes's time; there aren't many outside an advanced science lab that could detect it today.
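The doubling procedure can be sketched in a few lines. This is a hedged illustration, not Archimedes' own arithmetic: it assumes the standard modern recurrence in which the new circumscribed perimeter is the harmonic mean of the old circumscribed and inscribed perimeters, and the new inscribed perimeter is the geometric mean of the new circumscribed and old inscribed ones. The function name `polygon_bounds` is hypothetical. Because the circle has radius 1, each perimeter divided by 2 is a bound on π.

```python
import math


def polygon_bounds(outer, inner, sides, doublings):
    """Return (sides, lower, upper) bounds on pi for successive polygons.

    outer, inner: perimeters of the circumscribed and inscribed regular
    polygons with `sides` sides, around a circle of radius 1 (so the
    circumference, 2*pi, lies between them). Each step doubles the sides.
    """
    results = [(sides, inner / 2, outer / 2)]
    for _ in range(doublings):
        outer = 2 * outer * inner / (outer + inner)  # harmonic mean
        inner = math.sqrt(outer * inner)             # geometric mean (uses new outer)
        sides *= 2
        results.append((sides, inner / 2, outer / 2))
    return results


# Square: outer perimeter 8 (side = diameter 2), inner perimeter 4*sqrt(2).
for n, lo, hi in polygon_bounds(8, 4 * math.sqrt(2), 4, 3):
    print(f"{n:3d} sides: {lo:.5f} < pi < {hi:.5f}")

# Archimedes began with hexagons (outer 4*sqrt(3), inner 6) and doubled
# four times to reach 96 sides.
n, lo, hi = polygon_bounds(4 * math.sqrt(3), 6, 6, 4)[-1]
print(f"{n} sides: {lo:.5f} < pi < {hi:.5f}")
```

The exact 96-sided bounds come out near 3.14103 and 3.14271; Archimedes' 223/71 and 22/7 lie just outside them because he had to round his square roots by hand, always in the safe direction.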

Which is nice enough. But there is also a principle there. Archimedes couldn't demonstrate it, because he hadn't the numbering system to do it -- but his principle was to add more and more sides to the inscribing and circumscribing polygons (sometimes called a method of exhaustion). Suppose he had taken infinitely many sides? In that case, the inscribing and circumscribing polygons would have merged with the circle, and he would have had the exact value of π. Archimedes also did something similar to prove the area of circles. (In effect, the same proof.) This is the principle of the limit, and it is the basis on which the calculus is defined. It is sometimes said that Archimedes would have invented the calculus had he had Arabic numerals. This statement is too strong. But he might well have created a tool which could have led to the calculus.

An interesting aspect of Greek mathematics was their search for solutions even to problems with no possible use. A famous example is the attempt to "square the circle" -- starting from a circle, to construct a square with the same area using nothing but straight edge and compass. This problem goes back all the way to Anaxagoras, who died in 428 B.C.E. The Greeks never found an answer to that one -- it is in fact impossible using the tools they allowed themselves -- but the key point is that they were trying for general and theoretical rather than practical and specific solutions. That's the key to a true mathematics.

In summary, Greek mathematics was astoundingly flexible, capable of handling nearly any engineering problem found in the ancient world. The lack of Arabic numbers made it difficult to use that knowledge (odd as it sounds, it was easier back then to do a proof than to simply add up two numbers in the one million range). But the basis was there.

To be sure, there was a dark -- or at least a goofy -- side to Greek mathematics. Plato actually thought mathematics more meaningful than data -- in the Republic, 7.530B-C, he more or less said that, where astronomical observations and mathematics disagreed, too bad for the facts. Play that game long enough, and you'll start distorting the math as well as the facts....

The goofiness is perhaps best illustrated by some of the uses to which mathematics was put. The Pythagoreans were famous for their silliness (e.g. their refusal to eat beans), but many of their nutty ideas were quasi-mathematical. An example of this is their belief that 10 was a very good and fortunate number because it was equal to 1+2+3+4. Different Greek schools had different numerological beliefs, and even good mathematicians could fall into the trap; Ptolemy, whose Almagest was a summary of much of the best of Greek math, also produced the Tetrabiblos, a book of mystical claptrap. The good news is, relatively few of the nonsense works have survived, and as best I can tell, none of the various superstitions influenced the NT writers. The Babylonians also did this sort of thing -- they in fact kept it all secret, concealing some of their knowledge with cryptography, and we at least hear of this sort of mystic knowledge in the New Testament, with Matthew's mention of (Babylonian) Magi -- but all he seems to have cared about was that they had secret knowledge, not what that knowledge was.

At least the Greeks had the sense to separate the rigourous from the silly, which many other peoples did not. Maybe they were just frustrated with the difficulty of achieving results. The above description repeatedly laments the lack of Arabic numbers -- i.e. with positional notation and a zero. This isn't just a matter of notational difficulty; without a zero, you can't have the integers, nor negative numbers, let alone the real and complex numbers that let you solve all algebraic equations. Arabic numbers are the mathematical equivalent of an alphabet, only even more essential. The advantage they offer is shown by an example we gave above: The determination of π by means of inscribed and circumscribed polygons. Archimedes could manage only about three decimal places even though he was a genius. François Viète (1540-1603) and Ludolph van Ceulen (1540-1610) were not geniuses, but they managed to calculate π to ten and 35 decimal places, respectively, using the method of Archimedes -- and they could do it because they had Arabic numbers.

The other major defect of Greek mathematics was that the geometry was not analytic. They could draw squares, for instance -- but they couldn't graph them; they didn't have Cartesian coordinates or anything like that. Indeed, without a zero, they couldn't draw graphs; there was no way to have a number line or a meeting point of two axes. This may sound trivial -- but modern geometry is almost all analytic; it's much easier to derive results using non-geometric tools. It has been argued that the real reason Greek mathematics stalled in the Roman era was not lack of brains but lack of scope: There wasn't much else you could do just with pure geometric tools.

The lack of a zero (and hence of a number line) wasn't just a problem for the Greeks. We must always remember a key fact about early mathematics: there was no universal notation; every people had to re-invent the whole discipline. Hence, e.g., though Archimedes calculated the value of π to better than three decimal places, we find 1 Kings 7:23, in its description of the bronze sea, rounding off the dimensions to the ratio 30:10. (Of course, the sea was built and the account written before Archimedes. More to the point, both measurements could be accurate to the nearest whole cubit without implying a wrong value for π -- if, e.g., the diameter were 9.7 cubits, the circumference would be just under 30.5 cubits. It's also worth noting that the Hebrews at this time were probably influenced by Egyptian mathematics -- and the Egyptians did not have any notion of number theory, and so, except in problems involving whole numbers or simple fractions, could not distinguish between exact and approximate answers.)
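The rounding point can be checked directly. The sketch below assumes, purely as a working hypothesis, that both figures in 1 Kings 7:23 were rounded to the nearest whole cubit, and computes the range of diameters consistent with both "10" and "30".

```python
import math

# Assumption (for illustration): both figures are rounded to the nearest
# whole cubit, halves rounding up. A diameter d fits if d rounds to 10 and
# pi * d rounds to 30, i.e. 9.5 <= d < 10.5 and 29.5 <= pi * d < 30.5.
low = max(9.5, 29.5 / math.pi)
high = min(10.5, 30.5 / math.pi)
print(f"diameters from {low:.3f} to just under {high:.3f} cubits fit both figures")

d = 9.7  # the example diameter used in the text
print(f"d = {d}: circumference = {math.pi * d:.2f} cubits")
```

Any diameter from 9.5 to roughly 9.71 cubits yields both round figures, so the text need not imply that π equals 3.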

Still, Hebrew mathematics was quite primitive. There really wasn't much there apart from the use of the letters to represent numbers. I sometimes wonder if the numerical detail found in the so-called "P" source of the Pentateuch doesn't somehow derive from the compilers' pride in the fact that they could actually count that high!

Much of what the Hebrews did know may well have been derived from the Babylonians, who had probably the best mathematics other than the Greek; indeed, in areas other than geometry, the Babylonians were probably stronger. And they started earlier; we find advanced mathematical texts as early as 1600 B.C.E., with some of the basics going all the way back to the Sumerians, who seem to have been largely responsible for the complex 10-and-60 notation used in Babylon. How much of this survived to the time of the Chaldeans and the Babylonian Captivity is an open question; Ifrah says the Babylonians converted their mathematics to a simpler form around 1500 B.C.E., but Neugebauer, generally the more authoritative source, states that their old forms were still in use as late as Seleucid times. Trying to combine the data leads me to guess the Chaldeans had a simpler form, but that the older, better maths were retained in some out-of-the-way places.

It is often stated that the Babylonians used Base 60. This statement is somewhat deceptive. The Babylonians used a mixed base, partly 10 and partly 60. The chart below, showing the cuneiform symbols they used for various numbers, may make this clearer.

Babylonian Numbers

This mixed system is important, because base 60 is too large to be a comfortable base -- a multiplication table, for instance, has 3600 entries, compared to 100 entries in Base 10. The mixed notation allowed for relatively simple addition and multiplication tables -- but also for simple representation of fractions.

For very large numbers, they had still another system -- a partial positional notation, based on using a space to separate digits. So, for instance, if they wrote |  ||  ||| (note the spaces between the wedges), that would mean one times 60 squared (i.e. 3600) plus two times 60 plus three, or 3723. This style is equivalent to our 123 = one times ten squared plus two times ten plus three. The problem with this notation (here we go again) is that it had no zero; if they wrote IIII  II, say, there was no way to tell if this meant 14402 (4×60² + 0×60 + 2) or 242 (4×60 + 2). And there was no way, in this notation, to represent 14520 (4×60² + 2×60 + 0). (The Babylonians did eventually -- perhaps in or shortly before Seleucid times -- invent a placeholder to separate the two parts, though it wasn't a true zero; they didn't have a number to represent what you got when you subtracted, e.g., nine minus nine.)
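The ambiguity is easy to reproduce. The sketch below converts integers to base-60 digits and then models a notation with no zero by simply dropping empty places; the function names, and the `zeroless` model itself, are hypothetical simplifications that ignore whatever spacing a scribe might have left.

```python
def to_base60(n):
    """Base-60 digits of a non-negative integer, most significant first."""
    if n == 0:
        return [0]
    digits = []
    while n:
        n, r = divmod(n, 60)
        digits.append(r)
    return digits[::-1]


def from_base60(digits):
    """Inverse of to_base60: combine base-60 digits into an integer."""
    value = 0
    for d in digits:
        value = value * 60 + d
    return value


def zeroless(digits):
    """Hypothetical model of a notation with no zero: empty places vanish."""
    return [d for d in digits if d != 0]


print(to_base60(3723))  # one 3600, two 60s, three units
for n in (14402, 242, 14520):
    print(n, to_base60(n), "written without zero as", zeroless(to_base60(n)))
```

All three numbers collapse to the same written form, [4, 2], which is exactly the problem described above: 14402 and 242 cannot be told apart, and 14520 has no distinct representation at all.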

On the other hand, Babylonian notation did allow representation of fractions, at least as long as they had no zero elements: Instead of using positive powers of 60 (60² = 3600, 60¹ = 60, etc.), they could use negative powers -- 60⁻¹ = 1/60, 60⁻² = 1/3600, etc. So they could represent, say, 1/15 (= 4/60) as ||||, or 1/40 (= 1/60 + 30/3600) as I  <<<, making them the only ancient people with a true fractional notation.
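The fractional places can be generated mechanically: multiply by 60, take the whole part as the next place, and repeat with the remainder. A minimal sketch using Python's exact `fractions.Fraction` follows; the function name `sexagesimal_places` and the six-place cutoff are illustrative choices, not anything the Babylonians used.

```python
from fractions import Fraction


def sexagesimal_places(q, max_places=6):
    """Expand a fraction 0 <= q < 1 into base-60 places (60**-1, 60**-2, ...).

    Returns (places, exact), where exact is True if the expansion ended
    within max_places.
    """
    q = Fraction(q)
    places = []
    while q and len(places) < max_places:
        q *= 60
        digit = q.numerator // q.denominator  # whole part of q
        places.append(digit)
        q -= digit
    return places, q == 0


print(sexagesimal_places(Fraction(1, 15)))  # 4/60
print(sexagesimal_places(Fraction(1, 40)))  # 1/60 + 30/3600
print(sexagesimal_places(Fraction(1, 7)))   # does not terminate
```

The 1/7 case shows why tables of repeating expansions were needed: its places cycle through 8, 34, 17 indefinitely.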

Thus it will be seen that the Babylonians actually used Base 10 -- but generally did calculations in Base 60.

There is a good reason for the use of Base 60, the reason being that 60 has so many factors: It's divisible by 2, 3, 4, 5, 6, 10, 12, 15, 20, and 30. This means that all fractions involving these denominators are easily expressed (important, in a system where decimals were impossible due to the lack of a zero and even fractions didn't have a proper means of notation). This let the Babylonians set up fairly easy-to-use computation tables. This proved to be so much more useful for calculating angles and fractions that even the Greeks took to expressing ratios and angles in Base 60, and we retain a residue of it today (think degrees/minutes/seconds). The Babylonians, by using Base 60, were able to express almost every important fraction simply, making division simple; multiplication by fractions was also simplified. This fact also helped them discover the concept (though they wouldn't have understood the term) of repeating decimals; they had tables calculating these, too.
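The advantage of 60's many factors can be made precise: 1/n has a finite expansion in a given base exactly when every prime factor of n also divides the base. The sketch below (the function name `terminates` is hypothetical) compares bases 10 and 60 for denominators up to 20.

```python
import math


def terminates(n, base):
    """True if 1/n has a finite expansion in the given base.

    That happens exactly when every prime factor of n also divides the base,
    so we strip from n every factor it shares with the base and see if 1 is left.
    """
    g = math.gcd(n, base)
    while g > 1:
        while n % g == 0:
            n //= g
        g = math.gcd(n, base)
    return n == 1


for base in (10, 60):
    easy = [n for n in range(2, 21) if terminates(n, base)]
    print(f"base {base}: 1/n terminates for n = {easy}")
```

Base 10 handles only seven of these denominators cleanly, while base 60 handles thirteen, including 3, 6, 9, 12, 15 and 18, which are exactly the thirds that base 10 turns into repeating expansions.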

Base 60 also has an advantage related to human physiology. We can count up to five at a glance; assessing numbers six or greater requires counting. So, given the nature of the cuneiform numbers, expressing 60 or 70 by the same method as 50 (six or seven pairs of brackets as opposed to five) would have required more careful reading of the results. Whereas, in Babylonian notation, numbers could be read quickly and accurately. A minor point, but still an advantage.

Nor were the Babylonians limited to calculating fractions. The Babylonians calculated the square root of two to be roughly 1.414213, an error of about one part in one million! (As a rough approximation, they used 85/60, or 1.417, still remarkably good.) All of this was part of their approach to what we would call algebra, seeking the solution to various types of equations. Many of the surviving mathematics tablets are what my elementary school called "story problems" -- a problem described, and then solved in such a way as to permit general solutions to problems of the type.
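To check these figures, the sketch below evaluates sexagesimal approximations of √2 exactly and measures their relative error. The four-place value 1;24,51,10 is the reading commonly cited from the Old Babylonian tablet YBC 7289; that attribution is an addition here, not something stated above. "whole;p1,p2,..." is the modern transcription convention for sexagesimal numbers, and the helper name `from_sexagesimal` is hypothetical.

```python
import math
from fractions import Fraction


def from_sexagesimal(whole, *places):
    """Exact value of a sexagesimal number transcribed as whole;p1,p2,..."""
    value = Fraction(whole)
    for i, p in enumerate(places, start=1):
        value += Fraction(p, 60 ** i)
    return value


# Commonly cited reading of YBC 7289 (an assumption here): 1;24,51,10
tablet = from_sexagesimal(1, 24, 51, 10)
rough = from_sexagesimal(1, 25)  # 1;25 = 85/60, the rough approximation

for label, approx in [("1;24,51,10", tablet), ("1;25 = 85/60", rough)]:
    value = float(approx)
    rel_err = abs(value - math.sqrt(2)) / math.sqrt(2)
    print(f"{label:>12}: {value:.7f}  relative error {rel_err:.1e}")
```

The four-place value comes out at about 1.4142130, off by roughly four parts in ten million, consistent with the "one part in one million" order of accuracy claimed above; 85/60 is off by under two parts in a thousand.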

There were theoretical complications, to be sure. Apart from the problem that they sometimes didn't distinguish between exact and approximate solutions, their use of units would drive a modern scientist at least half mad -- there is, for instance, a case of a Babylonian tablet adding a "length" to an "area." It has been proposed that "length" and "width" came to be the Babylonian terms for variables, as we would use x, y, and z. This is possible -- but the result still permits confusion and imprecision.

We should incidentally look at the mathematics of ancient Mari, since it is believed that many of the customs followed by Abraham came from that city. Mari appears to have used a modification of the Babylonian system that was purely 10-based: It used a system exactly identical to the Babylonian for numbers 1-59 -- i.e. vertical wedges for the numbers 1-9, and chevrons ( < ) for the tens. So <<II, e.g., would represent 22, just as in Babylonian.

The first divergence came at 60. The Babylonians adopted a different symbol here, but in Mari they just went on with what they were doing, using six chevrons for 60, seven for seventy, etc. (This frankly must have been highly painful for scribes -- not just because it took 18 strokes, e.g., to express the number 90, but because 80 and 90 are almost indistinguishable.) (Interestingly, they used the true Babylonian notation for international and "scientific" documents.)

For numbers in the hundreds, they would go back to the symbol used for ones, using positions to show which was which -- e.g. 212 would be ||<||. But they did not use this to develop a true positional notation (and they had no zero); rather, they had a complicated symbol for 1000 (four parallel horizontal wedges, a vertical to their right, and another horizontal to the right of that), which they used as a separator -- much as we would use the , in the number 1,000 -- and expressed the number of thousands with the same old unit for ones.

This system did not, however, leave any descendants that we know of; after Mari was destroyed, the other peoples in the area went back to the standard Babylonian/Akkadian notation.

The results of Babylonian math are quite sophisticated; it is most unfortunate that the theoretical work could not have been combined with the Greek concept of rigour. The combination might have advanced mathematics by hundreds of years. It is a curious irony that Babylonian mathematics was immensely sophisticated but completely pointless; like the Egyptians and the Hebrews, they had no theory of numbers, and so while they could solve problems of particular types with ease, they could not generalize to larger classes of problems. Which may not sound like a major drawback, but realize what this means: If the parameters of a problem changed, even slightly, the Babylonians had no way to know if their old techniques would accurately solve it or not.

None of this matters now, since we have decimals and Arabic numerals. Little of it matters even to Biblical scholars, even though, as noted, Hebrew math probably derives from Babylonian (since the majority of Babylonian tablets come from the era when the Hebrew ancestors were still under Mesopotamian influence, and they could have been re-exposed during the Babylonian Captivity, since Babylonian math survived until the Seleucid era) or perhaps Egyptian; there is little math in the Old Testament, and what there is has been "translated" into Hebrew forms. Nonetheless the pseudo-base of 60 has genuine historical importance: The 60:1 ratio of talent: mina: shekel is almost certainly based on the Babylonian use of Base 60.

Much of Egyptian mathematics resembles the Babylonian in that it seeks directly for the solution, rather than creating rigourous methods, though the level of sophistication is much less.

A typical example of Egyptian dogmatism in mathematics is their insistence that fractions could only have unitary numerators -- that is, that 1/2, 1/3, 1/4, 1/5 were genuine fractions, but that a fraction such as 3/5 was impossible. If the solution to a problem, therefore, happened to be 3/5, they would have to find some alternate formulation -- 1/2 + 1/10, perhaps, or 1/5 + 1/5 + 1/5, or even 1/3 + 1/4 + 1/60. Thus a fraction had no unique expression in Egyptian mathematics -- making rigour impossible; in some cases, it wasn't even possible to tell if two people had come up with the same answer to a problem!
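One systematic way to split a fraction into unit fractions is the greedy method later described by Fibonacci: repeatedly take the largest unit fraction that does not exceed what is left. The sketch below uses it; it is a modern illustration, not a claim about how the Egyptians themselves worked, and the name `greedy_unit_fractions` is hypothetical. It also shows that the only way to recognize two different expansions as the same answer is to add them up.

```python
from fractions import Fraction


def greedy_unit_fractions(q):
    """Split a fraction 0 < q < 1 into distinct unit fractions (greedy method).

    At each step take the largest unit fraction 1/d not exceeding what is left.
    Returns the list of denominators d.
    """
    q = Fraction(q)
    denominators = []
    while q > 0:
        d = -(-q.denominator // q.numerator)  # ceiling of 1/q
        denominators.append(d)
        q -= Fraction(1, d)
    return denominators


print(greedy_unit_fractions(Fraction(3, 5)))  # 1/2 + 1/10
print(greedy_unit_fractions(Fraction(2, 5)))  # 1/3 + 1/15

# Different expansions of the same value can only be recognized by summing them.
candidates = [[2, 10], [5, 5, 5], [3, 4, 60]]
for denoms in candidates:
    total = sum(Fraction(1, d) for d in denoms)
    print(denoms, "sums to", total)
```

For 2/5 the greedy method gives 1/3 + 1/15, the same split mentioned in the discussion of hieroglyphic fractions.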

Similarly, they had a fairly accurate way of calculating the area of a circle (in modern terms, 256/81, or about 3.16) -- but they didn't define this in terms of a number π (their actual formula was (8d/9)², where d is the diameter), and apparently did not realize that this number had any other uses such as calculating the circumference of the circle.
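The 256/81 figure comes from comparing the Egyptian rule with the modern area formula for a circle of diameter d (radius d/2):

```latex
\left(\frac{8d}{9}\right)^2 = \frac{64d^2}{81}
  = \frac{256}{81}\left(\frac{d}{2}\right)^2
  \quad\text{versus}\quad
  \pi\left(\frac{d}{2}\right)^2,
\qquad\text{so}\qquad
\pi_{\text{Egypt}} = \frac{256}{81} \approx 3.1605
```

That is about 0.6% above the true value of π.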

Egyptian notation was of the basic count-the-symbols type we've seen, e.g., in Roman and Mycenaean numbers. In hieroglyphic, the units were shown with the so-very-usual straight line |. Tens were a symbol like an upside-down U -- ∩. So 43, for instance, would be ∩∩∩∩III. For hundreds, they used a spiral; a (lotus?) flower and stem stood for thousands. An image of a bent finger stood for ten thousands. A tadpole-like creature represented hundred thousands. A kneeling man with arms upraised counted millions -- and those high numbers were used, usually in boasts of booty captured. They also had four symbols for fractions: special symbols for 1/2, 2/3, and 3/4, plus the generic reciprocal symbol, a horizontal oval we would read as "one over." So some typical fractions would be:

Hieratic Numerals

It will be seen that there is no way to express, say, 2/5 in this system; it would be either 1/5+1/5 or, since the Egyptians don't seem to have liked repeating fractions either, something like 1/3+1/15. (Except that they seem to have preferred to put the smaller fraction first, so this would be written 1/15+1/3.)

The Egyptians actually had a separate fractional notation for volume measure, fractions-of-a-heqat. I don't think this comes up anywhere we would care about, so I'm going to skip trying to explain it. Nonetheless, it illustrates a common problem in ancient math -- the inability to realize that numbers were numbers. It often was not realized that, say, the "three" in three drachma was the same as the "three" in three sheep or three logs of oil. Various ancient systems had different number-names, or at least different symbols, for all these numbers -- as if we wrote "3 sheep" but "III drachma." We have vestiges of this today -- consider currency, where instead of saying, e.g., "3 $," we write "$3" -- a significant notational difference.

We also still have some hints of the ancient problems with fractions, especially in English units: Instead of measuring a one and a half pound loaf of bread as weighing "1.5 pounds," it will be listed as consisting of "1 pound 8 ounces." A quarter of a gallon of milk is not ".25 gallon"; it's "1 quart." (This is why scientists use the metric system!) This was even more common in ancient times, when fractions were so difficult: Instead of measuring everything in shekels, say, we have shekel/mina/talent, and homer/ephah, and so forth.

Even people who use civilized units of measurement often preserve the ancient fractional messes in their currency. The British have rationalized away shillings and guineas -- but they still have pounds and pence. Americans use dollars and cents, with the truly peculiar notation that dollars are expressed (as noted above) "$1.00," while cents are "100¢"; the whole should ideally be rationalized. Germans, until the coming of the Euro, had marks and pfennig. And so forth. Similarly, we have a completely non-decimal time system; 1800 seconds are 30 minutes or 1/2 hour or 1/48 day. Oof!

We of course are used to these funny cases. But it should always be kept in mind that the ancients used this sort of system for everything -- and had even less skill than we in converting.

But let's get back to Egyptian math....

The hieratic/demotic had a more compact, though more complicated, system than hieroglyphic. I'm not going to try to explain this, just show the various symbols as listed in figure 14.23 (p. 176) of Ifrah. This is roughly the notation used in the Rhind Papyrus, though screen resolution makes it hard to display the strokes clearly.

This, incidentally, does much to indicate the difficulty of ancient notations. The Egyptians, in fact, do not seem even to have had a concept of general "multiplication"; their method -- which is ironically similar to a modern computer -- was the double-and-add. For example, to multiply 23 by 11 (which we could either do by direct multiplication or by noting that 11=10+1, so 23x11 = 23x(10+1)=23x10 + 23x1 =230+23=253), they would go through the following steps:
23x1 = 23
23x2 = 46
23x4 = 92
23x8 = 184
and 11=8+2+1
so 23x11 = (23x8) + (23x2) + (23x1) = 184 + 46 + 23 = 253

This works, but oy. A problem I could do by inspection takes six major steps, with all the chances for error that implies.
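The doubling procedure is, in effect, binary multiplication: writing 11 as 8+2+1 is writing it in base 2. A minimal Python sketch of the same steps follows; the function name `double_and_add` is hypothetical, and the code models the arithmetic only, not the layout of any particular papyrus.

```python
def double_and_add(a, b):
    """Multiply a by b using only doubling and addition.

    Returns the product and the doubling rows (power of two, multiple of a)
    that were added together.
    """
    rows = []
    power, multiple = 1, a
    while power <= b:  # build the doubling table: 1a, 2a, 4a, 8a, ...
        rows.append((power, multiple))
        power, multiple = power * 2, multiple * 2

    total, remaining, used = 0, b, []
    for power, multiple in reversed(rows):  # pick powers of two summing to b
        if power <= remaining:
            remaining -= power
            total += multiple
            used.append((power, multiple))
    return total, used


product, used = double_and_add(23, 11)
print(used)     # the rows for 8, 2 and 1
print(product)
```

A computer does essentially the same thing, with the doubling done by shifting bits.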

The same, incidentally, is true in particular of Roman numerals. This is thought to be the major reason various peoples invented the abacus: Even addition was very difficult in their systems, so they didn't do actual addition; they counted out the numbers on the abacus and then translated them back into their notation.

That description certainly seems to fit the Hebrews. Hebrew mathematics frankly makes one wonder why God didn't do something to educate these people. Their mathematics seems to have been even more primitive than the Romans'; there is nothing original, nothing creative, nothing even particularly efficient. It's almost frightening to think of a Hebrew designing Solomon's Temple, for instance, armed with all the (lack of) background on stresses and supports that a people who still lived mostly in tents had at their disposal. (One has to suspect that the actual temple construction was managed by either a Phoenician or an Egyptian.)

This doubtless explains a problem which bothers mathematicians even if it does not bother the average Bible scholar: The claim in Revelation 7:9 that the number of the saved was "uncountable" or was so large that "no one could count it." This of course is pure balderdash. If you believe in Adam and Eve, then the total number of humans that ever lived can be numbered in the tens of billions, which is easily countable by computers. Even if you make the assumption based on evolutionary biology that the human race is a hundred thousand or so years old, the number rises only slightly; even if you go back and count all australopithecines as human, it's still only hundreds of billions; and even if there are races on other planets throughout the universe which are counted among the saved, well, the universe had a beginning time (the Big Bang), and its mass is finite, so we can categorically say that the number of saved is a finite, countable number. A human being might not be able to actually do the counting, but a modern human, or Archimedes, would have been able to write down the number if God were to supply the information. But someone who knew only Hebrew mathematics could not, and so could say that the number was beyond counting when in fact it was merely beyond his comprehension.

The one thing that the Hebrews could call their own was their numbering system (and even that probably came from the Phoenicians along with the alphabet). They managed to produce a system with most of the handicaps, and few of the advantages, of both the alphabetic systems such as the Greek and the cumulative systems such as the Roman. As with the Greeks, they used letters of the alphabet for numbers -- which meant that numbers could be confused with words, so they often prefixed ' or a dot over the number to indicate that it was a numeral. But, of course, the Hebrew alphabet had only 22 letters -- and, unlike the Greeks, they did not invoke other letters to supply the lack (except that a few texts use the terminal forms of the letters with two shapes, but this is reportedly rare). So, for numbers in the high hundreds, they ended up duplicating letters -- e.g. since one tau meant 400, two tau meant 800. Thus, although the basic principle was alphabetic, you still had to count letters to an extent.

The basic set of Hebrew numbers is shown at right.
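The additive scheme can be sketched as a greedy conversion: take the largest letter value that fits, repeat, and move down. This is a hedged illustration assuming the plain rule described above (tav repeated for 800 and beyond, no final-form letters for 500-900, no thousands); later scribal refinements are ignored, and the names `HEBREW_NUMERALS` and `hebrew_numeral` are hypothetical. Letters are written as Unicode escapes, and the example prints letter names so the output stays readable in any terminal.

```python
# (value, letter, name) triples, largest value first. Letters are given as
# Unicode escapes for the ordinary (non-final) forms.
HEBREW_NUMERALS = [
    (400, "\u05ea", "tav"), (300, "\u05e9", "shin"),
    (200, "\u05e8", "resh"), (100, "\u05e7", "qof"),
    (90, "\u05e6", "tsadi"), (80, "\u05e4", "pe"),
    (70, "\u05e2", "ayin"), (60, "\u05e1", "samekh"),
    (50, "\u05e0", "nun"), (40, "\u05de", "mem"),
    (30, "\u05dc", "lamed"), (20, "\u05db", "kaf"),
    (10, "\u05d9", "yod"), (9, "\u05d8", "tet"),
    (8, "\u05d7", "het"), (7, "\u05d6", "zayin"),
    (6, "\u05d5", "vav"), (5, "\u05d4", "he"),
    (4, "\u05d3", "dalet"), (3, "\u05d2", "gimel"),
    (2, "\u05d1", "bet"), (1, "\u05d0", "alef"),
]


def hebrew_numeral(n):
    """Return the (letter, name) pairs spelling 1 <= n <= 999 additively.

    Values above 400 simply repeat tav (800 = tav tav).
    """
    if not 1 <= n <= 999:
        raise ValueError("this sketch handles 1..999 only")
    used = []
    for value, letter, name in HEBREW_NUMERALS:
        while n >= value:
            used.append((letter, name))
            n -= value
    return used


for n in (54, 800, 999):
    names = [name for _, name in hebrew_numeral(n)]
    print(f"{n}: {' + '.join(names)} ({len(names)} letters)")
```

The letter string itself is `"".join(letter for letter, _ in hebrew_numeral(n))`; 999 needs five letters, two of them a repeated tav, which is the counting problem described above.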

An interesting and uncertain question is whether this notation preceded, supplanted, or existed alongside Aramaic numerals. The Aramaeans seem to have used a basic additive system. The numbers from one to nine were simple tally marks, usually grouped in threes -- e.g. 5 would be || ||| (read from right to left, of course); 9 would be ||| ||| |||. For 10 they used a curious pothook, perhaps the remains of a horizontal bar, something like a ∼ or ∩ or ⏜. They also had a symbol for 20, apparently based on two of these things stuck together; the result often looked rather like an Old English yogh (Ȝ) or perhaps ≈. Thus the number 54 would be written | ||| ∼ȜȜ.
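The additive rule can be sketched as follows for numbers below 100. This is a hedged reconstruction generalized only from the examples just given (5, 9 and 54): the grouping of strokes in threes with the leftover strokes at the far left, and the placement of the signs, are inferred from those examples, and ∼ and Ȝ are merely screen stand-ins for the actual signs. The names `TEN`, `TWENTY` and `aramaic_numeral` are hypothetical.

```python
TEN = "\u223c"     # tilde operator, a screen stand-in for the 10 sign
TWENTY = "\u021c"  # capital yogh, a screen stand-in for the 20 sign


def aramaic_numeral(n):
    """Render 1 <= n <= 99 in the additive Aramaic style, left to right on screen.

    The script is read right to left, so the largest signs sit at the right:
    twenties, then at most one ten, then unit strokes grouped in threes,
    with any leftover strokes at the far left.
    """
    if not 1 <= n <= 99:
        raise ValueError("this sketch handles 1..99 only")
    twenties, rest = divmod(n, 20)
    tens, units = divmod(rest, 10)
    partial, full = units % 3, units // 3
    groups = (["|" * partial] if partial else []) + ["|||"] * full
    strokes = " ".join(groups)
    tens_part = TEN * tens + TWENTY * twenties
    return " ".join(part for part in (strokes, tens_part) if part)
```

For example, `aramaic_numeral(54)` returns the form shown in the text, `| ||| ∼ȜȜ`, and `aramaic_numeral(5)` returns `|| |||`.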

There is archaeological evidence for the use of both "Hebrew" and "Aramaic" forms in Judea. Coins of Alexander Jannaeus (first century B.C.E.) use alphabetic numbers. But we find Aramaic numbers among the Dead Sea Scrolls. This raises at least a possibility that the number form one used depended upon one's politics. The Jews at Elephantine (early Persian period) appear to have used Aramaic numbers -- but they of course were exiles, and living in a period before Jews as a whole had adopted Aramaic. On the whole, the evidence probably favors the theory that Aramaic numbering preceded Hebrew, but we cannot be dogmatic. In any case, Hebrew numbers were in use by New Testament times; we note, in fact, that coins of the first Jewish Revolt -- which are of course very nearly contemporary with the New Testament books -- use the Hebrew numerals.

There is perhaps one other point we should make about mathematics, and that is the timing of the introduction of Arabic numerals. An early manuscript of course cannot contain such numbers; if it has numerals (in the Eusebian apparatus, say), they will be Greek (or Roman, or something). A late minuscule, however, can contain Arabic numbers -- and, indeed, many have pages numbered in this way.

History of Arabic Numerals

Arabic numerals underwent much change over the years. The graphic at right barely sketches the evolution. The first three samples are based on actual manuscripts (in the first case, based on scans of the actual manuscript; the others are composite).

The first line is from the Codex Vigilanus, generally regarded as the earliest use of Arabic numerals in the west (though it uses only the digits 1-9, not the zero). It was written, not surprisingly, in Spain, which was under Islamic influence. The codex (Escurial, Ms. lat. d.1.2) was copied in 976 C. E. by a monk named Vigila at the Albelda monastery. The next several samples are (based on the table of letterforms in Ifrah) typical of the next few centuries. Following this, I show the evolution of forms described in E. Maunde Thompson, An Introduction to Greek and Latin Paleography, p. 92. Thompson notes that Hindu/Arabic numerals were used mostly in mathematical works until the thirteenth century, becoming universal in the fourteenth century. Singer, p. 175, describes a more complicated path: Initially they were used primarily in connection with the calendar. The adoption of Arabic numerals for mathematics apparently can be credited to one Leonardo of Pisa, who had done business in North Africa and seen the value of the system. He'll perhaps sound more familiar if we note that he was usually called "Fibonacci," the "Son of Bonaccio" -- now famous for his series (0, 1, 1, 2, 3, 5, 8...) in which each term is the sum of the previous two. But his greatest service to mathematics was his support of modern notation. In 1202 he put forth the Book of the Abacus, a manual of calculation (which also promoted the horizontal stroke - to separate the numerators and denominators of fractions, though his usage was, by modern standards, clumsy, and it took centuries for this notation to catch on). The use of Arabic numerals was further encouraged when the Yorkshireman John Holywood (died 1250) produced his own book on the subject, which was quite popular; Singer, p. 173, reports that Holywood "did more to introduce the Arabic notation than any other." Within a couple of centuries, they were commonly used. (In Chaucer's Treatise on the Astrolabe I.7, for instance, addressed to his ten-year-old son, he simply refers to them as "noumbers of augrym" -- i.e., in effect, abacus numbers -- and then proceeds to scatter them all through the text.) If someone has determined the earliest instance of Arabic numbers in a Biblical manuscript, I confess I do not know what it is.

Most other modern mathematical symbols are even more recent. The symbols + and - for addition and subtraction, for instance, are first found in print in Johann Widman's 1489 publication Rechnung uff allen Kauffmanschafften. (Prior to that, it was typical to use the letters p and m.) The = sign seems to go back to England's Robert Recorde (died 1558), who published several works dating back to 1541 -- though Recorde's equality symbol was much wider than ours, looking more like ====. (According to John Gribbin, Recorde developed this symbol on the basis that two parallel lines were as equal as two things could be. Gribbin also credits Recorde with the + and - symbols, but it appears he only introduced them into English. The symbols x for multiplication and ÷ for division were not adopted until the seventeenth century -- and, we note, are still not really universal, since we also use a dot for multiplication and a / for division.) The = notation became general about a century later. The modern notation of variables (and parameters) can be credited to François Viète (1540-1603), who also pushed for use of decimal notation in fractions and experimented with notations for the radix point (what we tend to call the "decimal point," but it's only a decimal point in Base 10; in Base 2, e.g., it's the binary point. In any case, it's the symbol for the division between whole number and fractional parts -- usually, in modern notation, either a point or a comma).

The table below briefly shows the forms of numerals in some of the languages in which New Testament versions exist. Some of these probably require comment -- e.g. Coptic numerals are theoretically based on the Greek, but they had a certain amount of time to diverge. Observe in particular the use of the chi-rho for 900; I assume this is primarily a Christian usage, but have not seen this documented. Many of the number systems (e.g. the Armenian) have symbols for numbers larger than 900, but I had enough trouble trying to draw these clearly!

Various Number Systems

Addendum: Textual Criticism of Mathematical Works

Most ancient mathematical documents exist in only a single copy (e.g. the Rhind Papyrus is unique), so any textual criticism must proceed by conjecture. And this is in fact trickier than it sounds. If an ancient document adds, say, 536 and 221 and reaches a total of 758 instead of the correct 757, can we automatically assume the document was copied incorrectly? Not really; while this is a trivial sum using Arabic numerals, there are no trivial sums in most ancient systems; they were just too hard to use!

But the real problems are much deeper. Copying a mathematical manuscript is a tricky proposition indeed. Mathematics has far less redundancy than words do. In words, we have "mis-spellings," e.g., which formally are errors but which usually are transparent. In mathematics -- it's right or it's wrong. And any copying error makes it wrong. And, frequently, you not only have to copy the text accurately, but any drawings. And labels to the drawings. And the text that describes those labels. To do this right requires several things not in the standard scribe's toolkit -- Greek mathematics was built around compass and straight edge, so you had to have a good one of each and the ability to use them. Plus the vocabulary was inevitably specialized.

The manuscripts of Euclid, incidentally, offer a fascinating parallel with the New Testament tradition, especially as the latter was seen by Westcott and Hort. The majority of manuscripts belong to a single type, which we know to be recensional: It was created by the editor Theon. Long after Euclid was rediscovered, a single manuscript was found in the Vatican, containing a text from a different recension. This form of the text is generally thought to be earlier. Such papyrus scraps as are available generally support the Vatican manuscript, without by any means agreeing with it completely. Still, it seems clear that the majority text found in Theon has been somewhat smoothed and prettied up, though few of the changes are radical and it sometimes seems to retain the correct text where the Vatican type has gone astray.

Bibliography to the section on Ancient Mathematics

The study of ancient mathematics is difficult; one has to understand language and mathematics, and have the ability to figure out completely alien ways of thinking. I've consulted quite a few books to compile the above (e.g. Chadwick's publications on Linear B for Mycenaean numerals), and read several others in a vain hope of learning something useful, but most of the debt is to five books (which took quite a bit of comparing!). The "select bibliography:"

In addition, if you're interested in textual criticism of mathematical works, you might want to check Thomas L. Heath's translation of Euclid (published by Dover), which includes an extensive discussion of Euclid's text and Theon's recension, as well as a pretty authoritative translation with extensive notes.

Assuming the Solution

"Assuming the solution" is a mathematical term for a particularly vicious fallacy (which can easily occur in textual criticism) in which one assumes something to be true, operates on that basis, and then "proves" that (whatever one assumed) is actually the case. It is much like saying "because it is raining, it is raining." That is just fine as long as it is, in fact, actually raining -- but if it isn't, the statement is false. In any case, it doesn't have any logical value. It is, therefore, one of the most serious charges which can be levelled at a demonstration, because it says that the demonstration is not merely incomplete but is founded on error.

As examples of assuming the solution, we may offer either von Soden's definition of the I text or Streeter's definition of the "Cæsarean" text. Both, particularly von Soden's, are based on the principle of "any non-Byzantine reading" -- that is, von Soden assumes that any reading which is not Byzantine must be part of the I text, and therefore the witness containing it must also be part of the I text.

The problem with this is that von Soden had created a definition which guaranteed that something would emerge, and naturally something did. A definition with such a negative basis means that everything can potentially be classified as an I manuscript, including (theoretically) two manuscripts which have not a single reading in common at points of variation. It obviously can include manuscripts which agree only in Byzantine readings. This follows from the fact that most points of variation are binary (that is, only two readings are found in the tradition). One reading will necessarily be Byzantine. Therefore the other is not Byzantine. Therefore, to von Soden, it was an I reading. It doesn't matter where it actually came from, or what sort of reading it is; it's listed as characteristic of I.
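A minimal Python sketch makes the trap concrete. The two manuscripts "X" and "Y" and their readings are invented for illustration; each of the four variation units is assumed to be binary, with True meaning the manuscript has the Byzantine reading. Applying the "any non-Byzantine reading" test as described above, both manuscripts are classed as I witnesses, even though they do not share a single reading.

```python
# Hypothetical binary variation units (invented data):
# True = the Byzantine reading, False = the only other reading.
readings = {
    "X": [False, False, True, True],
    "Y": [True, True, False, False],
}

def is_i_witness(units):
    """Negative test as described above: any non-Byzantine reading
    is counted as an 'I' reading, so the witness is classed as I."""
    return any(not byzantine for byzantine in units)

def shared_readings(a, b):
    """Count the variation units at which two manuscripts read alike."""
    return sum(x == y for x, y in zip(a, b))

for name, units in readings.items():
    print(name, "classified as I:", is_i_witness(units))
print("readings shared by X and Y:", shared_readings(readings["X"], readings["Y"]))
```

Because every unit is binary, disagreeing with the Byzantine text at different places means X and Y disagree with each other everywhere, yet the definition lumps them together.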

This sort of error has been historically very common in textual criticism. Critics must strive vigorously to avoid it -- to be certain they do not take something on faith. Many results of past criticism were founded on assuming the solution (including, e.g., identifying the text of P46 and B with the Alexandrian text in Paul). All such results need to be re-verified using definitions which are not self-referencing.

Note: This is not a blanket condemnation of recognizing manuscripts based on agreements in non-Byzantine readings. That is, Streeter's method of finding the Cæsarean text is not automatically invalid if properly applied. Streeter simply applied it inaccurately -- in two particulars. First, he assumed the Textus Receptus was identical with the Byzantine text. Second, he assumed that any non-Textus Receptus reading was Cæsarean. The first assumption is demonstrably false, and the second too broad. To belong to a text-type, manuscripts must display significant kinship in readings not associated with the Byzantine text. This was not the case for Streeter's secondary and tertiary witnesses, which included everything from A to the purple uncials to 1424. The Cæsarean text must be sought in his primary witnesses (which would, be it noted, be regarded as secondary witnesses in any text-type which included a pure representative): Θ 28 565 700 f1 f13 arm geo.

Binomials and the Binomial Distribution

Probability is not a simple matter. The odds of a single event happening do not translate across multiple events. For instance, the fact that a coin has a 50% chance to land heads does not mean that two coins together have a 50% chance of both landing heads. Calculating the odds of such events requires the use of distributions.
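A quick enumeration in Python makes the coin example concrete, assuming two fair coins tossed independently: of the four equally likely outcomes, only one is heads-heads, so the chance is 1/4, not 1/2.

```python
from fractions import Fraction
from itertools import product

# Every equally likely outcome of tossing two fair coins: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))
both_heads = sum(1 for outcome in outcomes if outcome == ("H", "H"))
p_both = Fraction(both_heads, len(outcomes))
print(p_both)  # 1/4
```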

The most common distribution in discrete events such as coin tosses or die rolls is the binomial distribution. This distribution allows us to calculate the odds of independent events occurring a fixed number of times. That is, suppose you try an operation n times. What are the odds that the "desired" outcome (call it o) will happen m and only m times? The answer is determined by the binomial distribution.

Observe that the binomial distribution applies only to events where there are two possible outcomes, o and not o. (It can be generalized to cover events with multiple outcomes, but only by clever definition of the event o.) The binomial probabilities are calculated as follows:

If n is the number of times a trial is taken, m is the number of successes, and p(o) is the probability of the event taking place in a single trial, then the probability P(m,n) of the result occurring m times in n trials is given by the formula

$$P(m,n) = \frac{n!}{m!\,(n-m)!}\; p(o)^{m}\,\bigl(1-p(o)\bigr)^{n-m}$$

where n! (read "n factorial") is defined as 1x2x3x...x(n-1)xn. So, e.g., 4! = 1x2x3x4 = 24, 5! = 1x2x3x4x5 = 120. (Note: For purposes of calculation, the value 0! is defined as 1.)

(Note further: The notation used here, especially the symbol P(m,n), is not universal. Other texts will use different symbols for the various terms.)

The various coefficients of P(m,n) are also those of the well-known "Pascal's Triangle" (the left-hand column gives the row number n):

0           1
1         1   1
2       1   2   1
3     1   3   3   1
4   1   4   6   4   1
5 1   5  10  10   5   1

where the coefficient n!/(m!(n-m)!) for P(m,n) is item m+1 in row n. For n greater than about six or seven, however, it is usually easier to calculate the terms (known as the "binomial coefficients") using the formula above.
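The correspondence between the triangle and the formula can be checked mechanically. This Python sketch builds each row by adding adjacent entries of the row above, then compares it with n!/(m!(n-m)!) computed from factorials (and, as a cross-check, with the standard library's `math.comb`).

```python
from math import comb, factorial

def coefficient(m, n):
    """n! / (m! (n-m)!), the coefficient in the formula for P(m,n)."""
    return factorial(n) // (factorial(m) * factorial(n - m))

def pascal_rows(count):
    """Build Pascal's Triangle: each inner entry is the sum of the
    two entries diagonally above it."""
    rows, row = [], [1]
    for _ in range(count):
        rows.append(row)
        row = [1] + [a + b for a, b in zip(row, row[1:])] + [1]
    return rows

for n, row in enumerate(pascal_rows(6)):
    # Item m+1 of row n (list index m) should equal the coefficient.
    assert row == [coefficient(m, n) for m in range(n + 1)]
    assert row == [comb(n, m) for m in range(n + 1)]  # standard-library cross-check
    print(n, row)
```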

Example: What are the odds of rolling the value one exactly twice if you roll one die ten times? In this case, the odds of rolling a one (what we have called p(o)) are one in six, or about .16667. So we want to calculate

$$P(2,10) = \frac{10!}{2!\,(10-2)!} \times .16667^{2} \times (1-.16667)^{10-2} = \frac{10\times 9\times 8\times 7\times 6\times 5\times 4\times 3\times 2\times 1}{(2\times 1)\times(8\times 7\times 6\times 5\times 4\times 3\times 2\times 1)} \times .16667^{2} \times .83333^{8}$$

which simplifies as

$$P(2,10) = \frac{10\times 9}{2\times 1} \times .16667^{2} \times .83333^{8} = 45 \times .02778 \times .23257 = .2907$$

In other words, there is a 29% chance that you will get exactly two ones if you roll the die ten times.
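The same arithmetic can be checked with a short Python sketch. It uses the exact p(o) = 1/6 rather than a rounded decimal, so it is free of the rounding in the hand calculation; the function name `binomial_probability` is just an illustrative name for the P(m,n) formula.

```python
from math import comb

def binomial_probability(m, n, p):
    """P(m,n): chance of exactly m successes in n independent trials,
    each with success probability p."""
    return comb(n, m) * p**m * (1 - p) ** (n - m)

# One die rolled ten times; a "success" is rolling a one (p = 1/6).
print(round(binomial_probability(2, 10, 1 / 6), 4))  # 0.2907
```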

For an application of this to textual criticism, consider a manuscript with a mixed text. Assume (as a simplification) that we have determined (by whatever means) that the manuscript has a text that is two-thirds Alexandrian and one-third Byzantine (i.e., at a place where the Alexandrian and Byzantine text-types diverge, there are two chances in three, or .6667, that the manuscript will have the Alexandrian reading, and one chance in three, or .3333, that the reading will be Byzantine). We assume (an assumption that needs to be tested, of course) that mixture is random. In that case, what are the odds, if we test (say) eight readings, that exactly three will be Byzantine? The procedure is just as above. We calculate:

$$P(3,8) = \frac{8!}{3!\,(8-3)!} \times .3333^{3} \times .6667^{5} = \frac{8\times 7\times 6\times 5\times 4\times 3\times 2\times 1}{(3\times 2\times 1)\times(5\times 4\times 3\times 2\times 1)} \times .3333^{3} \times .6667^{5} = \frac{8\times 7\times 6}{3\times 2\times 1} \times .0370 \times .1317 = .2729$$

In other words, in a random sample of eight readings, there is just over a 27% chance that exactly three will be Byzantine.

We can also apply this over a range of values. For example, we can calculate the odds that, in a sample of eight readings, between two and four will be Byzantine. One way to do this is to calculate the values for two, three, and four readings. We have already calculated the value for three. Doing the calculations (without belabouring them as above) gives us

P(2,8) = .2731
P(4,8) = .1707

(In exact arithmetic P(2,8) and P(3,8) are equal; the difference in the last digit is round-off.) So if we add these up, the probability of 2, 3, or 4 Byzantine readings is .2729+.2731+.1707 = .7167. In other words, there is nearly a 72% chance that, in our sample of eight readings, between two and four readings will be Byzantine. Since the probabilities of all possible outcomes must add up to 1, this means that there is just over a 28% chance that there will be fewer than two, or more than four, Byzantine readings.
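Here is a Python sketch of the same range calculation, again assuming random mixture with p(o) = 1/3 exactly. Its result differs slightly from the hand-rounded total in the text only because of the rounding of .3333 and .6667.

```python
from math import comb

def binomial_probability(m, n, p):
    """P(m,n) from the binomial formula."""
    return comb(n, m) * p**m * (1 - p) ** (n - m)

n, p = 8, 1 / 3  # eight readings; chance 1/3 that each one is Byzantine
between = sum(binomial_probability(m, n, p) for m in range(2, 5))  # m = 2, 3, 4
print(round(between, 4))      # chance of 2, 3, or 4 Byzantine readings
print(round(1 - between, 4))  # chance of fewer than 2 or more than 4
```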

We can, in fact, verify this and check our calculations by determining all values.


Observe that, if we add up all these terms, they sum to .9992 -- which is as good an approximation of 1 as we can expect with these figures; the difference is roundoff and computational imperfection. Chances are that we don't have four significant digits of accuracy in our figures anyway; see the section on Accuracy and Precision.
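Recomputing P(m,8) for every m with exact fractions in Python (a sketch, not a reproduction of any hand-made table) shows that the gap from 1 is purely a matter of round-off: with p(o) = 1/3 held exactly, the nine terms sum to exactly 1.

```python
from fractions import Fraction
from math import comb

n, p = 8, Fraction(1, 3)  # exact 1/3 instead of the rounded .3333
table = {m: comb(n, m) * p**m * (1 - p) ** (n - m) for m in range(n + 1)}

for m, prob in table.items():
    print(f"P({m},{n}) = {float(prob):.4f}")
print("sum =", sum(table.values()))  # exactly 1
```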

(It is perhaps worth noting that binomials do not have to use only two items, or only equal probabilities. All that is required is that the probabilities add up to 1. So if we were examining the so-called "Triple Readings" of Hutton, which are readings where the Alexandrian, Byzantine, and "Western" texts have distinct readings, we might find that 90% of manuscripts have the Byzantine reading, 8% have the Alexandrian, and 2% the "Western." We could then apply binomials in this case, calculating the odds of a reading being Alexandrian or non-Alexandrian, Byzantine or non-Byzantine, "Western" or non-"Western." We must, however, be very aware of the difficulties here. The key one is that the "triple readings" are both rare and insufficiently controlled. In other words, they do not constitute anything remotely resembling a random variable.)

The Binomial Distribution has other interesting properties. For instance, it can be shown that the Mean of the distribution is given by

μ = np

(So, for instance, in our example above, where n=8 and p=.33333, the mean, or the average number of Byzantine readings we would expect if we took many, many tests of eight readings, is 8*.33333, or 2.6667.)

Similarly, the variance is given by

σ² = np(1-p)

while the standard deviation σ is, of course, the square root of the above.
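Both closed forms can be checked against the distribution itself. This Python sketch computes the mean and variance directly from the nine probabilities for n = 8, p = 1/3 (exact fractions), and compares them with np and np(1-p).

```python
from fractions import Fraction
from math import comb

n, p = 8, Fraction(1, 3)
probs = [comb(n, m) * p**m * (1 - p) ** (n - m) for m in range(n + 1)]

# Mean and variance computed directly from the distribution...
mean = sum(m * pr for m, pr in enumerate(probs))
variance = sum((m - mean) ** 2 * pr for m, pr in enumerate(probs))

# ...agree with the closed forms np and np(1-p).
print(mean, n * p)                # 8/3 8/3
print(variance, n * p * (1 - p))  # 16/9 16/9
print(float(variance) ** 0.5)     # standard deviation, about 1.333
```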

Our next point is perhaps best made graphically. Let's make a plot of the values given above for P(m,8) in the case of a manuscript two-thirds Alexandrian, one-third Byzantine.

      *  *
      *  *
      *  *
      *  *  *
   *  *  *  *
   *  *  *  *
   *  *  *  *
   *  *  *  *  *
*  *  *  *  *  *
-------------------------
0  1  2  3  4  5  6  7  8

This graph is, clearly, not symmetric. But let's change things again. Suppose, instead of using p(o)=.3333, we use p(o)=.5 -- that is, a manuscript with equal parts Byzantine and Alexandrian readings. Then our table is as follows:


Our graph then becomes:

            *
            *
         *  *  *
         *  *  *
      *  *  *  *  *
      *  *  *  *  *
   *  *  *  *  *  *  *
-------------------------
0  1  2  3  4  5  6  7  8
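The p(o) = .5 values plotted here are easy to recompute. In this Python sketch the scale of one star per .04 of probability is an assumption, chosen so that the bars come out 1, 3, 5, 7, 5, 3, 1 stars high as in the graph; the final line checks the symmetry P(m,8) = P(8-m,8).

```python
from math import comb

n = 8
# With p(o) = .5 every sequence of readings is equally likely,
# so P(m,8) = C(8,m) / 2**8.
probs = [comb(n, m) / 2**n for m in range(n + 1)]

for m, prob in enumerate(probs):
    # One star per .04 of probability (an assumed display scale).
    print(f"{m}  {prob:.4f}  {'*' * round(prob / 0.04)}")

# Symmetry: P(m,8) equals P(8-m,8) for every m.
assert all(probs[m] == probs[n - m] for m in range(n + 1))
```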

This graph is obviously symmetric. More importantly (though it is perhaps not obvious with such a crude graph and so few points), it resembles a sketch of the so-called "bell-shaped" or "normal" curve.

It can, in fact, be shown that the one is an approximation of the other. The proof is sufficiently complex, however, that most introductory statistics texts don't get into it; certainly we won't burden you with it here!

We should note at the outset that the "normal distribution" has no direct application to NT criticism. This is because the normal distribution is continuous rather than discrete. That is, it is defined at any value at all -- the curve has a height (a probability density) at 1, or 2, or 3.8249246, or √3307/π. A discrete distribution applies only at fixed values, usually integers. But NT criticism deals with discrete units -- a variant here, a variant there. Although these variants are myriad, they are still countable and discrete.

But we are often dealing with discrete real-world distributions which approximate the normal distribution. Because the behavior of the normal distribution is known and well-defined, we can use it to model the behavior of a discrete distribution which approximates it.

The general formula for a normal distribution, centered around the mean μ and with standard deviation σ, is given by

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\; e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}$$

This means that it is possible to approximate the value of the binomial distribution for a series of points by calculating the area of the equivalent normal distribution (the one with μ = np and σ² = np(1-p)) between corresponding points.

Unfortunately, this latter cannot be reduced to a simple formula (for those who care, it is an integral without a closed-form solution). The results generally have to be read from a table (unless one has a calculator with the appropriate statistical functions). Such tables, and information on how to use them, are found in all modern statistics books.
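Here is a sketch of the approximation in Python, under stated assumptions. Instead of a printed table it uses `math.erf`, which evaluates the needed integral numerically. It treats each whole number m as the interval from m - .5 to m + .5 (the usual "continuity correction," an assumption not spelled out in the text), and compares the normal area for 2 to 4 Byzantine readings out of eight (p = 1/3) with the exact binomial sum.

```python
from math import comb, erf, sqrt

def normal_cdf(x, mu, sigma):
    """Area under the normal curve to the left of x."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

n, p = 8, 1 / 3
mu, sigma = n * p, sqrt(n * p * (1 - p))  # mean np, variance np(1-p)

# Exact binomial chance of 2, 3, or 4 Byzantine readings out of eight.
exact = sum(comb(n, m) * p**m * (1 - p) ** (n - m) for m in range(2, 5))

# Normal approximation: the area between 1.5 and 4.5.
approx = normal_cdf(4.5, mu, sigma) - normal_cdf(1.5, mu, sigma)

print(round(exact, 3), round(approx, 3))
```

With only eight readings the two figures agree to within about one percentage point; the approximation generally improves as n grows.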

It's worth asking if textual distributions follow anything resembling a normal curve. This, to my knowledge, has never been investigated in any way. And this point becomes very important in assessing such things as the so-called "Colwell rule" (see the section on E. C. Colwell & Ernest W. Tune: "Method in Establishing Quantitative Relationships Between Text-Types of New Testament Manuscripts.") This is a perfectly reasonable dissertation topic for someone -- taking a significant group of manuscripts and comparing their relationships over a number of samples. We shall only do a handful, as an example. For this, we use the data from Larry W. Hurtado, Text-Critical Methodology and the Pre-Caesarean Text: Codex W in the Gospel of Mark. We'll take the three sets of texts which he finds clearly related: ℵ and B, A and the TR, Θ and 565.

Summarizing Hurtado's data gives us the following (we omit Hurtado's decimal digit, as he does not have enough data to allow three significant digits):

Chapter | % of ℵ with B | % of A with TR | % of Θ with 565
STD DEV | 4.0 | 5.2 | 9.6

Let's graph each of these as variations around the mean. That is, let's count how many elements are within half a standard deviation (s) of the mean m, and how many are in the region one standard deviation beyond that, and so forth.
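The banding procedure can be sketched in Python. The agreement percentages in `sample` are invented for illustration (Hurtado's per-chapter figures are not reproduced here), and a value lying exactly on a boundary is assumed to go into the lower band; the hand tallies that follow do not all treat boundaries the same way.

```python
def band_limits(mean, sd):
    """Boundaries at m-1.5s, m-.5s, m+.5s and m+1.5s."""
    return [mean - 1.5 * sd, mean - 0.5 * sd, mean + 0.5 * sd, mean + 1.5 * sd]

def band_counts(values, mean, sd):
    """Tally values into five bands, lowest (below m-1.5s) to highest
    (above m+1.5s). A value exactly on a boundary goes into the lower
    band (an assumption)."""
    counts = [0] * 5
    for v in values:
        band = sum(v > limit for limit in band_limits(mean, sd))
        counts[band] += 1
    return counts

# Invented per-chapter agreement percentages, for illustration only.
sample = [72, 75, 78, 79, 80, 83, 86]
for count in band_counts(sample, mean=79, sd=4.0):
    print("|" + "*" * count)
```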

For ℵ and B, m is 79 and s is 4.0. So:

         %agree < m-1.5s, i.e. % < 73       |*
m-1.5s < %agree < m-.5s, i.e. 73 <= % < 77  |**
m-.5s  < %agree < m+.5s, i.e. 77 <= % <= 81 |********
m+.5s  < %agree < m+1.5s, i.e. 81 < % <= 85 |***
         %agree > m+1.5s, i.e. % > 85       |*

For A and TR, m is 86.9 and s is 5.2. So:

         %agree < m-1.5s, i.e. % < 80       |*
m-1.5s < %agree < m-.5s, i.e. 80 <= % < 85  |**
m-.5s  < %agree < m+.5s, i.e. 85 <= % <= 90 |*********
m+.5s  < %agree < m+1.5s, i.e. 90 < % <= 95 |***
         %agree > m+1.5s, i.e. % > 95       |

For Θ and 565, m is 70 and s is 9.6. So:

         %agree < m-1.5s, i.e. % < 55       |*
m-1.5s < %agree < m-.5s, i.e. 55 <= % < 66  |*****
m-.5s  < %agree < m+.5s, i.e. 66 <= % <= 74 |**
m+.5s  < %agree < m+1.5s, i.e. 74 < % <= 84 |*******
         %agree > m+1.5s, i.e. % > 84       |

With only very preliminary results, it's hard to draw conclusions. The first two graphs do look normal. The third looks just plain strange. This is not anything like a binomial/normal distribution. The strong implication is that one or the other of these manuscripts is block-mixed.

This hints that distribution analysis might be a useful tool in assessing textual kinship. But this is only a very tentative result; we must test it by, e.g., looking at manuscripts of different Byzantine subgroups.


Cladistics

WARNING: Cladistics is a mathematical discipline arising out of the needs of evolutionary biology. It should be recalled, however, that mathematics is independent of its uses. The fact that cladistics is useful in biology should not cause prejudice against it; it has since been applied to other fields. For purposes of illustration, however, I will use evolutionary examples because they're what is found in all the literature.

A further warning: I knew nothing about cladistics before Stephen C. Carlson began to discuss the matter with reference to textual criticism. I am still not expert. You will not learn cladistics from this article; the field is too broad. The goal of this article is not to teach cladistics but to explain generally what it does.

Consider a problem: Are dolphins and fish related?

At first glance, it would certainly seem so. After all, both are streamlined creatures, living in water, with fins, which use motions of their lower bodies to propel themselves.

And yet, fish reproduce by laying eggs, while dolphins produce live young. Fish breathe water through gills; dolphins breathe air through lungs. Fish are cold-blooded; dolphins are warm-blooded. Fish do not produce milk for their young; dolphins do. More subtly, fish tails are vertical; dolphin tails are horizontal.

Based on the latter characteristics, dolphins would seem to have more in common with rabbits or cattle or humans than with fish. So how do we decide if dolphins are fish-like or rabbit-like? This is the purpose of cladistics: Based on a variety of characteristics (be it the egg-laying habits of a species or the readings of a manuscript), to determine which populations are related, and how.

Biologists have long believed that dolphins are more closely related to the other mammals, not the fish. The characteristics shared with the mammals go back to the "ur-mammal"; the physical similarities to fish are incidental. (The technical term for characteristics which evolved independently is an "analogous feature" or a "homoplasy." Cases of similar characteristics which derive from common ancestry are called "homologous features" or "homologies." In biology, homologies are often easy to detect -- for example, all mammals have very similar skeletons if you just count all the bones, although the sizes of the bones vary greatly. A fish -- even a bony fish -- has a very different skeleton, so you can tell a dolphin is not a fish by its bones. Obviously such hints are less common when dealing with manuscript variants.)

This is the point at which textual critics become interested, because kinship based on homology is very similar to the stemmatic concept of agreement in error. Example: Turtles and lizards and horses all have four legs. Humans and chimpanzees have two arms and two legs -- and robins and crows also have only two legs. Are we more like robins or horses? Answer: Like horses. Four legs is the "default mode" for amphibians, reptiles, and mammals; the separation into distinct arms and legs is a recent adaption -- not, in this case, an error, but a divergence from the original stock. This is true even though birds, like humans, also have two legs and two limbs which are not legs. Similarly, a text can develop homoplasies: assimilation of parallels, h.t. errors, and expansion of epithets are all cases where agreement in reading can be the result of coincidence rather than common origin.

To explain some of this, we need perhaps to look at some biological terminology. There are two classes of relationship in biology: clades and grades. This topic is covered more fully in the separate article, but a summary here is perhaps not out of place, because the purpose of cladistics is to find clades. A clade is a kinship relationship. A grade is a similarity relationship. A biological example of a grade involves reptiles, birds, and mammals. Birds and mammals are both descended from reptiles. Thus, logically speaking, birds and mammals should be treated as reptiles. But they aren't; we call a turtle a reptile and a robin a bird. Thus "reptile" is a grade name: Reptiles are cold-blooded, egg-laying, air-breathing creatures. A warm-blooded creature such as a human, although descended from the same proto-reptile as every snake, lizard, and tortoise, has moved into a different grade. (We observe that grade definitions are somewhat arbitrary. The reptile/bird/mammal distinction is common because it is useful. Classifying creatures by whether they are green, brown, blue, red, or chartreuse is a grade distinction that has little use -- creatures are too likely to change color based on their local environment -- and so is not done.)

Clades are based solely on descent from a common ancestor. Thus the great apes are a small clade within the larger clade of apes, within the yet larger clade of primates, within the very large clade of mammals.

This distinction very definitely exists in textual criticism. Consider, for example, the versions. Versions such as the Latin, Syriac, Coptic, and Old Church Slavonic are taken directly from the Greek. The Anglo-Saxon version is a translation of a translation, taken from the Latin; similarly, the Bulgarian is a translation (or, more properly, an adaption) of a translation; it comes from the Old Church Slavonic.

Thus we can divide these by clades and grades. The Latin, Syriac, Coptic, and Old Church Slavonic belong to the grade of "direct translations from the Greek;" the Anglo-Saxon and Bulgarian belong to the grade of "translations from other versions." But the Anglo-Saxon and Bulgarian are not related; they have no link closer than the Greek. They belong to a grade but not a clade. By contrast, the Latin and Anglo-Saxon do not belong to a grade -- the former is translated directly from the Greek, and the latter from the Latin -- but they do form a clade: The Anglo-Saxon comes from the Latin. Anything found in the Anglo-Saxon must either be from the Latin or must be a corruption dating from after the translation. It cannot have a reading characteristic of the Greek which did not make it into its Latin source.

Cladistics is, at its heart, a method for sorting out grades and combining the data to produce clades. It proceeds by examining each point of variation, and trying to find the "optimum tree." ("Optimum" meaning, more or less, "simplest.") For this we can take a New Testament example. Let's look at Mark 3:18 and the disciple called either Lebbaeus or Thaddaeus. Taking as our witnesses A B D E L, we find that D reads Lebbaeus, while A B E L read Thaddaeus. That gives us a nice simple tree (though this isn't the way you'll usually see it in a biological stemma):

-----------*-----
|  |  |  |      |
A  B  E  L      D

Which in context is equivalent to

      Autograph
           |
-----------*-----
|  |  |  |      |
A  B  E  L      D

The point shown by * is a node -- a point of divergence. At this point in the evolution of the manuscripts, something changed. In this case, this is the point at which D (or, perhaps, A B E L) split off from the main tree.

This, obviously, is very much like an ordinary stemma, which would express the same thing as

        Autograph
            |
     --------------
     |            |
     X            Y
     |            |
----------        |
|  |  |  |        |
A  B  E  L        D

But now take the very next variant in the Nestle/Aland text: Canaanite vs. Canaanean. Here we find A and E reading Canaanite, while B D L have Canaanean. That produces a different view:

----------*------
|  |  |      |  |
B  D  L      A  E

Now we know, informally, that the explanation for this is that B and L are Alexandrian, A and E Byzantine, and D "Western." But the idea is to verify that, and to extend it to larger data sets and to cases where the data is more mixed up. This is where cladistics comes in. Put very simply, it takes all the possible trees for a set of data, identifies possible nodes, and looks for the simplest tree capable of explaining the data. With only our two variants, it's not easy to demonstrate this concept -- but we'll try.

There are actually four possible trees capable of explaining the above data:

                            Autograph
                                :
----*----*----    i.e.    ----*----*----
| |   |    | |            | |   |    | |
B L   D    A E            B L   D    A E

                               Autograph
                                  :
--*---*----*----   i.e.   --*---*----*----
|   |   |    | |          |   |   |    | |
B   L   D    A E          B   L   D    A E

                            Autograph
                                :
----*----*---*--   i.e.   ----*----*---*--
| |   |    |   |          | |   |    |   |
B L   D    A   E          B L   D    A   E

                               Autograph
                                  :
--*---*----*---*--  i.e.  --*---*----*---*--
|   |   |    |   |        |   |   |    |   |
B   L   D    A   E        B   L   D    A   E

To explain: The first diagram, with two nodes, defines three families, B+L, D, and A+E. The second, with three nodes, defines four families: B, L, D, and A+E. The third, also with three nodes, has four families, but not the same four: B+L, D, A, E. The last, with four nodes, has five families: B, L, D, A, E.

In this case, it is obvious that the first design, with only two nodes, is the simplest. It also corresponds to our sense of what is actually happening. This is why people trust cladistics.
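The node-counting argument can be imitated by brute force for five witnesses. This Python sketch is not a real cladistic program: it simplifies a "tree" to a division of the witnesses into families (as in the four diagrams above), keeps only the divisions in which every family reads alike at both variants, and picks the one with the fewest families. Of the 52 possible divisions of five witnesses, it finds exactly four that fit the data, and chooses B+L, D, A+E.

```python
def partitions(items):
    """Yield every way of dividing items into non-empty families."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        # Put `first` into each existing family in turn...
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        # ...or give it a family of its own.
        yield [[first]] + smaller

# Readings at the Thaddaeus/Lebbaeus and Canaanite/Canaanean variants.
variants = [
    {"A": "Thaddaeus", "B": "Thaddaeus", "E": "Thaddaeus", "L": "Thaddaeus", "D": "Lebbaeus"},
    {"A": "Canaanite", "E": "Canaanite", "B": "Canaanean", "D": "Canaanean", "L": "Canaanean"},
]
witnesses = ["A", "B", "D", "E", "L"]

def consistent(grouping):
    """A grouping explains the data if each family reads alike at every variant."""
    return all(len({v[w] for w in family}) == 1 for family in grouping for v in variants)

candidates = [g for g in partitions(witnesses) if consistent(g)]
best = min(candidates, key=len)  # fewest families = fewest nodes
print(len(list(partitions(witnesses))), "groupings examined,", len(candidates), "fit the data")
print(sorted(sorted(family) for family in best))  # [['A', 'E'], ['B', 'L'], ['D']]
```

Even this toy search has to look at every division; the number of divisions grows explosively with the number of witnesses, which is the difficulty the text turns to next.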

But while we could detect the simplest tree in this case by inspection, it's not that simple as the trees get more complex. There are two tasks: creating the trees, and determining which is simplest.

This is where the math gets hairy. You can't just look at all the trees by brute force; it's difficult to generate them, and even harder to test them. (This is the real problem with classical stemmatics: It's not in any way exhaustive, even when it's objective. How do we know this? By the sheer number of possibilities. Suppose you have fifty manuscripts, and any one can be directly descended from two others -- an original and a corrector. Thus any one manuscript can have any of 49 possible originals and, for each original, 49 possible correctors [the other 48 manuscripts plus no corrector at all]. That's 2401 linkages just for that manuscript. And we have fifty of them! An informal examination of one of Stephen C. Carlson's cladograms shows 49 actual manuscripts -- plus 27 hypothesized manuscripts and a total of 92 links between manuscripts!) So there is just too much data to assess to make "brute force" a workable method. And, other than brute force, there is no absolutely assured method for finding the best tree. This means that, in a situation like that for the New Testament, we simply don't have the computational power yet to guarantee the optimal tree.
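The paragraph's arithmetic can be carried through in Python. The first part simply multiplies out the 49 × 49 choices for each of fifty manuscripts, ignoring any other constraints. The second part adds a standard combinatorial result that is not from the text: the number of distinct unrooted, fully bifurcating trees on n labelled witnesses is (2n-5)!! = 1 × 3 × 5 × ... × (2n-5), and that count ignores mixture entirely.

```python
from math import prod

manuscripts = 50
# 49 possible exemplars times 49 choices of corrector
# (the other 48 manuscripts, or none) for each manuscript.
per_manuscript = 49 * 49
combinations = per_manuscript ** manuscripts  # ignores any other constraints
print(per_manuscript)          # 2401
print(len(str(combinations)))  # 170 digits, roughly 10**169

def unrooted_binary_trees(n):
    """Number of distinct unrooted, fully bifurcating trees on n labelled
    tips (n >= 3): (2n-5)!! = 1 * 3 * 5 * ... * (2n-5)."""
    return prod(range(1, 2 * n - 4, 2))

print(unrooted_binary_trees(5))                      # 15
print(len(str(unrooted_binary_trees(manuscripts))))  # 75 digits
```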

Plus there is the possibility that multiple trees can satisfy the data, as we saw above. Cladistics cannot prove that its chosen tree is the correct tree, only that it is the simplest of those examined. It is, in a sense, Ockham's Razor turned into a mathematical tool.

Does this lack of absolute certainty render cladistics useless? By no means; it is the best available mathematical tool for assessing stemmatic data. But we need to understand what it is, and what it is not. Cladistics, as used in biology, applies to group characteristics (a large or a small beak, red or green skin color, etc.) and processes (the evolution of species). The history of the text applies to a very different set of data. Instead of species and groups of species, it deals with individual manuscripts. Instead of characteristics of large groups within a species, we are looking at particular readings. Evolution proceeds by groups, over many, many generations. Manuscript copying proceeds one manuscript at a time, and for all the tens of thousands of manuscripts and dozens of generations between surviving manuscripts, it is a smaller, more compact tradition than an evolutionary tree.

An important point, often made in the literature, is that the results of cladistics can prove non-intuitive. The entities which "seem" most closely related may not prove to be so. (This certainly has been the case with Stephen C. Carlson's preliminary attempts, which by and large confirm my own results on the lower levels of textual grouping -- including finding many groups not previously published by any other scholars. But Carlson's larger textual groupings, if validated by larger studies, will probably force a significant reevaluation of our assessments of text-types.) This should not raise objections among textual critics; the situation is analogous to one Colwell described (Studies in Methodology, p. 33): "Weak members of a Text-type may contain no more of the total content of a text-type than strong members of some other text-type may contain. The comparison in total agreements of one manuscript with another manuscript has little significance beyond that of confirmation, and then only if the agreement is large enough to be distinctive."

There are other complications, as well. A big one is mixture. You don't see hawks breeding with owls; once they developed into separate species, that was it. There are no reunions of types, only separations. But manuscripts can join. One manuscript of one type can be corrected against another. This means that the tree doesn't just produce "splits" (A is the father of B and C, B is the father of D and E, etc.) but also "joins" (A is the offspring of a mixture of X and Y, etc.). This results in vastly more complicated linkages -- and this is an area mathematicians have not really explored in detail.

Another key point is that cladograms -- the diagrams produced by cladistics -- are not stemmata. Above, I called them trees, but they aren't. They aren't "rooted" -- i.e. we don't know where things start. In the case of the trees I showed for Mark, we know that none of the manuscripts is the autograph, so they have to be descendants. But this is not generally true, and in fact we can't even assume it for a cladogram of the NT. A cladogram -- particularly one for something as interrelated as the NT -- is not really a "tree" but more of a web. It's a set of connections, but the connections don't have a direction or starting point. Think, by analogy, of the hexagon below:


If you think of the red dots at the vertices (nodes) as manuscripts, it's obvious what the relationship between each manuscript is: It's linked to three others. But how do you tell where the first manuscript is? Where do you start?

Cladistics can offer no answer to this. In the case of the NT stemma, it appears that most of the earliest manuscripts are within a few nodes of each other, implying that the autograph is somewhere near there. But this is not proof.

Great care, in fact, must be taken to avoid reading too much into a cladogram. Take the example we used above, of A, B, D, E, L. A possible cladogram of this tree would look like

     /\
    /  \
   /    \
  /     /\
 /     /  \
/ \   /  / \
B  L  D  A  E

This cladogram, if you just glance at it, would seem to imply that D (i.e. the "Western" text) falls much closer to A and E (the Byzantine text) than to B and L (the Alexandrian text), and that the original text is to be found by comparing the Alexandrian text to the consensus of the other two. However, this cladogram is exactly equivalent to

     /\
    /  \
   /    \
  / \    \
 /   \    \
/ \   \  / \
B  L  D  A  E

And this diagram would seem to imply that D goes more closely with the Alexandrian text. Neither (based on our data) is true; the three are, as best we can tell, completely independent. The key is not the shape of the diagram but the location of the nodes. In the first, our nodes are at

     *\
    /  \
   /    \
  /     /*
 /     /  \
/ \   /  / \
B  L  D  A  E

In the second, it's

     /*
    /  \
   /    \
  * \    \
 /   \    \
/ \   \  / \
B  L  D  A  E

But it's the same tree, differently drawn. The implications are false inferences based on an illusion in the way the trees are drawn.
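One way to check that two drawings are the same cladogram is to ignore where the top of the drawing sits and list only how each internal branch divides the manuscripts into two groups (in phylogenetics these divisions are usually called "splits"). In this Python sketch the two nested lists are one interpretation of the two drawings above, written with the root where each drawing puts its apex. Identical sets of divisions mean identical unrooted trees.

```python
def leaves(tree):
    """All manuscripts below a node; a leaf is a plain string."""
    return {tree} if isinstance(tree, str) else set().union(*map(leaves, tree))

def splits(tree):
    """Each internal branch divides the manuscripts into two sets.
    Collect those divisions, skipping ones that cut off a single leaf
    (and the drawing's apex, which cuts off nothing)."""
    everyone = frozenset(leaves(tree))
    result = set()

    def walk(node):
        if isinstance(node, str):
            return
        side = frozenset(leaves(node))
        other = everyone - side
        if len(side) > 1 and len(other) > 1:
            result.add(frozenset([side, other]))
        for child in node:
            walk(child)

    walk(tree)
    return result

first_drawing = [["B", "L"], ["D", ["A", "E"]]]
second_drawing = [[["B", "L"], "D"], ["A", "E"]]
print(splits(first_drawing) == splits(second_drawing))  # True
```

Both drawings yield the same two divisions, B L | D A E and A E | B L D, which is the sense in which they are one tree.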

We note, incidentally, that the relations we've drawn as treesor stemmas can be drawn "inline," with a sort of a modified settheory notation. In this notation, square brackets [] indicate a relationor a branch point. For example, the above stemma would be
[ [ B L ] D [ A E ] ]

This shows, without ambiguity of branch points, that B and L gotogether, as do A and E, with D rather more distant from both.

This notation can be extended. For example, it is generally agreed that, within the Byzantine text, the uncials E F G H are more closely related to each other than they are to A; K and Π are closely related to each other, less closely to A, and less closely still to E F G H. So, if we add F G H K Π to the above notation, we get

[[B L] D [[A [K Π]][E F G H]]]

It will be evident that this gets confusing fast. Although the notation is unequivocal, it's hard to convert it to a tree in one's mind. And, with this notation, there is no possibility of describing mixture, which can be shown with a stemmatic diagram, if sometimes a rather complex one.
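
Converting the inline notation back into a tree is mechanical, even if it is hard to do in one's head. A minimal Python sketch, assuming sigla are separated by spaces and each bracket pair marks exactly one branch point; `parse_stemma` and `draw` are hypothetical helper names, not an existing tool. Like the notation itself, it has no way to represent mixture.

```python
import re


def parse_stemma(text):
    """Parse bracket notation such as "[ [ B L ] D [ A E ] ]" into nested lists."""
    tokens = re.findall(r"\[|\]|[^\[\]\s]+", text)
    stack = [[]]
    for tok in tokens:
        if tok == "[":
            stack.append([])          # open a new branch point
        elif tok == "]":
            if len(stack) < 2:
                raise ValueError("unmatched ']'")
            branch = stack.pop()      # close it and attach it to its parent
            stack[-1].append(branch)
        else:
            stack[-1].append(tok)     # a manuscript siglum
    if len(stack) != 1 or len(stack[0]) != 1:
        raise ValueError("unbalanced brackets or more than one top-level tree")
    return stack[0][0]


def draw(tree, depth=0):
    """Print the tree sideways: '+' marks a branch point, indentation shows depth."""
    pad = "    " * depth
    if isinstance(tree, str):
        print(pad + tree)
        return
    print(pad + "+")
    for child in tree:
        draw(child, depth + 1)


byzantine = parse_stemma("[[B L] D [[A [K Π]][E F G H]]]")
draw(byzantine)
```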

It has been objected that cladistics cannot handle mixture, and so is irrelevant to stemmatics. It is true that, as originally done, cladistics always assumes splits, not combinations. However, at least two attempts have been made to resolve this. Stephen C. Carlson's modified cladistic methods allow for a manuscript to have multiple parents, plus there has been extensive mathematical work on what is called "lateral gene transfer." I can't describe either, since I don't know the maths, but I believe the methods of cladistics can, and should, be used even in contaminated traditions.

Cladistics is a field that is evolving rapidly, and new methods and applications are being found regularly. I've made no attempt to outline the methods for this reason (well, that reason, and because I don't fully understand it myself, and because the subject really requires more space than I can reasonably devote). To this point, the leading exponent of cladistics in NT criticism is Stephen C. Carlson, who as mentioned has been evolving new methods to adapt the discipline to TC circumstances. I cannot comprehensively assess his math, but I have seen his preliminary results, and am impressed.

Corollary

In mathematical jargon, a corollary is a result that follows immediately from another result. Typically it is a more specific case of a general rule. An elementary example of this might be as follows:

Theorem: 0 is the "additive identity." That is, for any x, x+0=x.

Corollary: 1+0=1
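
For readers who like to see the machinery, here is the same theorem and corollary written in Lean 4, as a minimal sketch restricted to natural numbers. The names `additive_identity` and `corollary_one_add_zero` are hypothetical; `Nat.add_zero` is the core library lemma. The corollary's entire proof is the theorem applied to the particular case x = 1.

```lean
-- Theorem: 0 is the additive identity (stated here for natural numbers).
theorem additive_identity (x : Nat) : x + 0 = x := Nat.add_zero x

-- Corollary: the case x = 1 follows at once by applying the theorem.
theorem corollary_one_add_zero : 1 + 0 = 1 := additive_identity 1
```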

This is a very obvious example, but the concept has value, as it allows logical simplification of the rules we use. For example, there are quite a few rules of internal criticism offered by textual critics. All of these, however, are special cases of the rule "That reading is best which best explains the others." That is, they are corollaries of this rule. Take, for example, the rule "Prefer the harder reading." Why should one prefer the harder reading? Because it is easier to assume that a scribe would change a hard reading to an easy one than the reverse. In other words, the hard reading explains the easy. Thus we prove that the rule "Prefer the harder reading" is a corollary of "That reading is best which best explains the others." QED. (Yes, you just witnessed a logical proof. Of course, we did rather lightly glide by some underlying assumptions....)

Why do we care about what is and is not a corollary? Among other things, because it tells us when we should and should not apply rules. For example, in the case of "prefer the harder reading," the fact that it is a corollary reminds us that it applies only when we are looking at internal evidence. The rule does not apply to cases of clear errors in manuscripts (which are a province of external evidence).

Let's take another corollary of the rule "That reading is best which best explains the others." In this case, let's examine "Prefer the shorter reading." This rule is applied in all sorts of cases. It should only be applied when scribal error or simplification can be ruled out -- as would be obvious if we examine the situation in light of "That reading is best which best explains the others."

Definitions

It may seem odd to discuss the word "definition" in a section on mathematics. After all, we all know what a definition is, right -- it's a way to tell what a word or term means.

Well, yes and no. That's the informal definition of definition. But that's not a sufficient description.

Consider this "definition": "The Byzantine text is the text typically found in New Testament manuscripts."

In a way, that's correct -- though it might serve better as a definition of the "Majority Text." But while, informally, it tells us what we're talking about, it's really not sufficient. How typical is "typical?" Does a reading supported by 95% of the tradition qualify? It certainly ought to. How about one supported by 75%? Probably, though it's less clear. 55%? By no means obvious. What about one supported by 40% when no other reading is supported by more than 30% of the tradition? Uh....

And how many manuscripts must we survey to decide what fraction of the tradition is involved, anyway? Are a few manuscripts sufficient, or must we survey dozens or hundreds?

To be usable in research settings, the first requirement for a definition is that it be precise. So, for instance, a precise definition of the Majority Text might be the text found in at least 50% plus one of all manuscripts of a particular passage. Alternately, and more practically, the Majority Text might be defined as "In the gospels, the reading found in the most witnesses of the test group A E K M S V 876 1010 1424." This may not be "the" Majority reading, but it's likely that it is. And, of great importance, this definition can be applied without undue effort, and is absolutely precise: It never admits more than one reading (though there will be passages where, due to lacunose or widely divergent witnesses, it will not define a particular reading).
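
The practical definition is precise enough to be written as a procedure. A minimal Python sketch, assuming each witness's reading at a passage has already been collated and normalized to a simple label; `majority_reading` is a hypothetical helper, and the sample readings are invented for illustration. Returning None for ties and for passages with no extant test witness makes explicit the cases where the definition picks out no reading.

```python
from collections import Counter

# Test group from the practical definition of the Majority Text in the gospels.
TEST_GROUP = ["A", "E", "K", "M", "S", "V", "876", "1010", "1424"]


def majority_reading(readings, group=TEST_GROUP):
    """Reading found in the most extant members of `group`, or None.

    `readings` maps a witness siglum to the reading it supports at one
    passage; lacunose witnesses are simply left out. None means the
    definition picks out no single reading (no extant witness, or a tie).
    """
    counts = Counter(readings[w] for w in group if w in readings)
    if not counts:
        return None
    (top, top_count), *others = counts.most_common()
    if others and others[0][1] == top_count:
        return None  # tie between the leading readings
    return top


# Hypothetical collation of one passage ("x" and "y" stand for two readings);
# 1010 is treated as lacunose here.
passage = {"A": "x", "E": "x", "K": "y", "M": "x", "S": "x",
           "V": "y", "876": "y", "1424": "x"}
print(majority_reading(passage))  # x
```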

But a definition may be precise without being useful. For example, we could define the Byzantine text as follows: The plurality reading of all manuscripts written after the year 325 C. E. within 125 kilometers of the present site of the Hagia Sophia in Constantinople. This definition is relentlessly precise: It defines one and only one reading everywhere in the New Testament (and, for that matter, in the Old, and in classical works such as the Iliad). The problem is, we can't tell what that reading is! Even among surviving manuscripts, we can't tell which were written within the specified distance of Constantinople, and of course the definition, as stated, also includes lost manuscripts! Thus this definition of the Byzantine text, while formally excellent, is something we can't work with in practice.

Thus a proper definition must always meet two criteria: It must be precise and it must be applicable.

I can hear you saying, "Sure, in math, they need good definitions. But we're textual critics. Does this matter?" That is, do we really care, in textual criticism, if a definition is precise and applicable?

The answer is assuredly yes. Failure to apply both precise and applicable definitions is almost certain to be fatal to good method. An example is the infamous "Cæsarean" text. Streeter's definition was, in simplest terms, any non-Textus Receptus reading found in two or more "Cæsarean" witnesses. This definition is adequately precise. It is nonetheless fatally flawed in context, for three reasons: First, it's circular; second, the TR is not the Byzantine text, so many of Streeter's "Cæsarean" readings are in fact nothing more nor less than Byzantine readings; third, most readings are binary, so one reading will always agree with the TR and one will not, meaning that every manuscript except the TR will show up, by his method, as "Cæsarean"!

An example of a definition that isn't even precise is offered by Harry Sturz. He defined (or, rather, failed to define) the Byzantine text as being the same as individual Byzantine readings! In other words, Sturz showed that certain Byzantine readings were in existence before the alleged fourth-century recension that produced the Byzantine text. (Which, be it noted, no one ever denied!) From this he alleged that the Byzantine text as a whole is old. This is purely fallacious (not wrong, necessarily, but fallacious; you can't make that step based on the data) -- but Sturz, because he didn't have a precise definition of the Byzantine text, thought he could do it.

The moral of the story is clear and undeniable: If you wish to work with factual data (i.e. if you want to produce statistics, or even just generalizations, about external evidence), you must start with precise and applicable definitions.

THIS MEANS YOU. Yes, YOU. (And me, and everyone else, of course. But the point is the basis of all scientific work: Definitions must be unequivocal.)

Dimensional Analysis

Also known as, Getting the units right!

Have you ever heard someone say something like "That's at least a light-year from now?" Such statements make physicists cringe. A light-year is a unit of distance (the distance light travels in a year), not of time.

Improper use of units leads to meaningless results, and correct use of units can be used to verify results.

As an example, consider this: The unit of mass is (mass). The unit of acceleration is (distance)/(time)/(time). The unit of force is (mass)(distance)/(time)/(time). So the product of mass times acceleration is (mass)(distance)/(time)/(time) -- which happens to be the same as the unit of force. And lo and behold, Newton's second law states that force equals mass times acceleration. And that means that if a result does not have the units of force (mass times distance divided by time squared, so for instance kilograms times metres divided by seconds squared, or slugs times feet divided by hours squared), it is not a force.

This may sound irrelevant to a textual critic, but it is not. Suppose you want to estimate, say, the number of letters in the extant New Testament portion of B. How are you going to do it? Presumably by estimating the amount of text per page, and then multiplying by the number of pages. But that, in fact, is dimensional analysis: letters per page times pages per volume equals letters per volume. We can express this as an equation to demonstrate the point:

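$$\frac{\text{letters}}{\text{page}} \times \frac{\text{pages}}{\text{volume}} = \frac{\text{letters}}{\cancel{\text{page}}} \times \frac{\cancel{\text{pages}}}{\text{volume}} = \frac{\text{letters}}{\text{volume}}$$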

We can make things even simpler: Instead of counting letters per page, we can count letters per line, lines per column, and columns per page. This time let us work the actual example. B has the following characteristics:


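$$
\begin{aligned}
&142\,\frac{\text{pages}}{\text{volume}} \times 3\,\frac{\text{columns}}{\text{page}} \times 42\,\frac{\text{lines}}{\text{column}} \times 16\,\frac{\text{letters}}{\text{line}} \\
&= (142 \times 3 \times 42 \times 16)\,\frac{\text{pages}}{\text{volume}} \cdot \frac{\text{columns}}{\text{page}} \cdot \frac{\text{lines}}{\text{column}} \cdot \frac{\text{letters}}{\text{line}} \\
&= 286272\,\frac{\text{pages}}{\text{volume}} \cdot \frac{\text{columns}}{\text{page}} \cdot \frac{\text{lines}}{\text{column}} \cdot \frac{\text{letters}}{\text{line}} \\
&= 286272\,\frac{\text{letters}}{\text{volume}} \quad \text{(approximately)}
\end{aligned}
$$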
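
The bookkeeping of units can be done mechanically, so that a wrong set-up shows up as wrong units rather than just a wrong number. A minimal Python sketch, assuming units are plain labels with integer exponents; the `Quantity` class is a hypothetical illustration, not a real unit library. Multiplying the four rates for B leaves exactly letters per volume.

```python
class Quantity:
    """A number carrying units, stored as {unit name: integer exponent}."""

    def __init__(self, value, units):
        self.value = value
        self.units = dict(units)

    def __mul__(self, other):
        units = dict(self.units)
        for name, power in other.units.items():
            units[name] = units.get(name, 0) + power
        # Units whose exponents reach zero have cancelled, e.g. page/page.
        units = {name: power for name, power in units.items() if power != 0}
        return Quantity(self.value * other.value, units)

    def __repr__(self):
        return f"Quantity({self.value}, {self.units})"


# B's New Testament portion, using the figures quoted above.
pages_per_volume = Quantity(142, {"page": 1, "volume": -1})
columns_per_page = Quantity(3, {"column": 1, "page": -1})
lines_per_column = Quantity(42, {"line": 1, "column": -1})
letters_per_line = Quantity(16, {"letter": 1, "line": -1})

total = pages_per_volume * columns_per_page * lines_per_column * letters_per_line
print(total)  # Quantity(286272, {'volume': -1, 'letter': 1})
```

Had we multiplied by the wrong rate (say, lines per page where lines per column belonged), the leftover units would not reduce to letters per volume, and the mistake would be flagged before the number was ever trusted.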

The Law of the Excluded Middle

This, properly, is a rule of logic, not mathematics, but it is a source of many logical fallacies. The law of the excluded middle is a method of simplifying problems. It reduces problems to one of two possible "states." For example, the law of the excluded middle tells us that a reading is either original or not original; there are no "somewhat original" readings. (In actual fact, of course, there is some fuzziness here, as e.g. readings in the original collection of Paul's writings as opposed to the reading in the original separate epistles. But this is a matter of definition of the "original." A reading will either agree with that original, whatever it is, or will disagree.)

The problem with the law of the excluded middle lies in applying it too strongly. Very many fallacies occur in pairs, in cases where there are two polar opposites and the truth falls somewhere in between. An obvious example is the fallacy of number. Since it has repeatedly been shown that you can't "count noses" -- i.e. that the majority is not automatically right -- there are some who go to the opposite extreme and claim that numbers mean nothing. This extreme may be worse than the other, as it means one can simply ignore the manuscripts. Any reading in any manuscript -- or even a conjecture, found in none -- may be correct. This is the logical converse of the Majority Text position.

The truth unquestionably lies somewhere in between. Counting noses -- even counting noses of text-types -- is not the whole answer. But counting does have value, especially at higher levels of abstraction such as text-types or sub-text-types. All other things being equal, the reading found in the majority of text-types must surely be considered more probable than the one in the minority. And within text-types, the reading found within the most sub-text-types is more probably original. And so on, down the line. One must weigh manuscripts, not count them -- but once they are weighed, their numbers have meaning.

Other paired fallacies include excessive stress on either internal evidence (which, if taken to its extreme, allows the critic to simply write his own text) or external evidence (which, taken to its extreme, would include clear errors in the text) and over/under-reliance on certain forms of evidence (e.g. Boismard would adopt readings solely based on silence in fathers, clearly placing too much emphasis on the fathers, while others ignore their evidence entirely. We see much the same range of attitude toward the versions. Some would adopt readings based solely on versional evidence, while others will not even accept evidence from so-called secondary versions such as Armenian and Georgian).

Exponential Growth

Much of the material in this article parallels that in the section on Arithmetic, Exponential, and Geometric Progressions, but perhaps it should be given its own section to demonstrate the power of exponential growth.

The technical definition of an exponential curve is a function of the form

$$y = a^x$$
where a is a positive constant. When a is greater than one, the result is exponential growth.

To show you how fast exponential growth can grow, here are some results of the function for various values of a:


It will be seen that an exponential growth curve can grow very quickly!
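
Such a table is easy to regenerate for any values one likes. A short Python sketch of the function y = a^x; the particular values of a and x below are chosen for illustration only and are not taken from the original table.

```python
def exponential(a, x):
    """y = a**x: copies after x generations if each manuscript yields a copies."""
    return a ** x


# Values of a and x picked for illustration only.
rates = [1.1, 1.5, 2, 3]
print("x    " + "".join(f"a={a:<10}" for a in rates))
for x in (0, 5, 10, 20):
    print(f"{x:<5}" + "".join(f"{exponential(a, x):<12.1f}" for a in rates))
```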

This is what makes exponential growth potentially of significance for textual critics: It represents one possible model of manuscript reproduction. The model is to assume each manuscript is copied a certain number of times in a generation, then destroyed. In that case, the constant a above represents the number of copies made. x represents the number of generations. y represents the number of surviving copies.

Why does this matter? Because a small change in the value of the constant a can have dramatic effects. Let's demonstrate this by demolishing the argument of the Byzantine Prioritists that numeric preponderance means something. The only thing it necessarily means is that the Byzantine text had a constant a that is large enough to keep it alive.

For these purposes, let us assume that the Alexandrian text is the original, in circulation by 100 C.E. Assume it has a reproductive constant of 1.2. (I'm pulling these numbers out of my head, be it noted; I have no evidence that this resembles the actual situation. This is a demonstration, not an actual model.) We'll assume a manuscript "generation" of 25 years. So in the year 100 x=0. The year 125 corresponds to x=1, etc. Our second assumption is that the Byzantine text came into existence in the year 350 (x=10), but that it has a reproductive constant of 1.4.

If we make those assumptions, we get these results for the number of manuscripts at each given date:

generation   year   Alexandrian manuscripts   Byzantine manuscripts   ratio, Byzantine to Alexandrian mss.

The first column, "generation," counts the generations from the year 100. The second column, "year," gives the year. The next two columns, "Alexandrian manuscripts" and "Byzantine manuscripts," give the number of manuscripts of each type we could expect at that particular time. (Yes, we get fractions of manuscripts. Again, this is a model!) The final column, the "ratio," tells us how many Byzantine manuscripts there are for each Alexandrian manuscript. For the first 250 years, there are no Byzantine manuscripts. For a couple of centuries after that, Byzantine manuscripts start to exist, but are outnumbered. But by 625 -- a mere 275 years after the type came into existence -- they are as numerous as (in fact, slightly more numerous than) Alexandrian manuscripts. By the year 800, when the type is only 450 years old, it constitutes three-quarters of the manuscripts. By the year 1000, it has more than a 10:1 dominance, and it just keeps growing.

This doesn't prove that the Byzantine type came to dominate by means of being faster to breed. All the numbers above are made up. The point is, exponential growth -- which is the model for populations allowed to reproduce without constraint -- can allow a fast-breeding population to overtake a slower-breeding population even if the slow-breeding population has a head start.
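
The model is simple enough to recompute. A minimal Python sketch, assuming one counting convention the text leaves open: each type is credited with `rate` copies in the generation in which it appears (so 1.2 in the year 100 and 1.4 in 350). Under that assumption the output agrees with the description above: the Byzantine text first outnumbers the Alexandrian at generation 21 (the year 625), holds about three-quarters of the total in 800, and leads by more than 10:1 in 1000. The helper name `copies` is hypothetical.

```python
def copies(rate, start_gen, gen):
    """Manuscripts of a text-type at generation `gen`.

    Assumed convention: the type already has `rate` copies in the generation
    it appears, and multiplies by `rate` in every generation after that.
    """
    if gen < start_gen:
        return 0.0
    return rate ** (gen - start_gen + 1)


ALEX_RATE, ALEX_START = 1.2, 0    # Alexandrian: in circulation by 100 (x = 0)
BYZ_RATE, BYZ_START = 1.4, 10     # Byzantine: appears in 350 (x = 10)
YEARS_PER_GENERATION = 25

print(" gen  year  Alexandrian  Byzantine  ratio")
for gen in (0, 10, 20, 21, 28, 36, 40):
    year = 100 + YEARS_PER_GENERATION * gen
    alex = copies(ALEX_RATE, ALEX_START, gen)
    byz = copies(BYZ_RATE, BYZ_START, gen)
    print(f"{gen:>4} {year:>5} {alex:>12.1f} {byz:>10.1f} {byz / alex:>6.2f}")
```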

We can show this another way, by modelling extinction. Suppose we start with a population of 1000 (be it manuscripts or members of a species or speakers of a language). We'll divide them into two camps. Call them "A" and "B" for Alexandrian and Byzantine -- but it could just as well be Neandertals and modern humans, or Russian and non-Russian speakers in one of the boundary areas of Russia. We'll start with 500 of A and 500 of B, but give A a reproductive rate of 1.1 and B a reproductive rate of 1.2. And remember, we're constraining the population. That is, at the end of each generation, there can still only be 1000 individuals. All that changes is the ratio of individuals: each generation, both groups grow at their own rates, and the total is then scaled back to 1000. We will also assume that there must be at least 100 individuals to be sustainable. In other words, once one or the other population falls below 100, it goes extinct and the other text-type/species/language takes over.

So here are the numbers:

Generation  population of A  population of B
0           500              500
1           478              522
2           457              543
3           435              565
4           414              586
5           393              607
6           372              628
7           352              648
8           333              667
9           314              686
10          295              705
11          277              723
12          260              740
13          244              756
14          228              772
15          213              787
16          199              801
17          186              814
18          173              827
19          161              839
20          149              851
21          139              861
22          129              871
23          119              881
24          110              890
25          102              898
26          94               906

Observe that it takes only 26 generations for Population A to die out.

How long the die-off takes depends, of course, on the difference in breeding rates. But 26 generations of (say) dodos is only 26 years, and for people it's only 500-800 years.
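
The constrained model can be reproduced in a few lines. A minimal Python sketch, assuming (as the table implies) that both groups are multiplied by their rates each generation and then rescaled to a combined total of 1000, with no rounding carried between generations; `extinction_generation` is a hypothetical name.

```python
def extinction_generation(rate_a, rate_b, total=1000.0, start_a=500.0,
                          threshold=100.0, max_generations=100_000):
    """First generation in which population A falls below `threshold`.

    Each generation both groups grow at their own rates; the two are then
    scaled back so the combined population is again `total`. Returns None
    if A never drops below the threshold within `max_generations`.
    """
    a, b = start_a, total - start_a
    for generation in range(1, max_generations + 1):
        a, b = a * rate_a, b * rate_b
        scale = total / (a + b)
        a, b = a * scale, b * scale
        if a < threshold:
            return generation
    return None


print(extinction_generation(1.1, 1.2))  # 26
```

Varying `rate_a` while holding `rate_b` at 1.2 produces generations-to-extinction figures of the kind discussed next; depending on exactly how the threshold and any rounding are handled, such counts can differ by a generation.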

It may be argued that a difference in breeding rate of 1.1 versus 1.2 is large. This is true. But exponential growth will always dominate in the end. Let's take a few other numbers to show this point. If we hold B's rate of increase to 1.2, and set various values for A's rate of population increase, the table below shows how many generations it takes for A to go extinct.

Reproductive constant for A   Generations to extinction

Note the first column, comparing a reproductive rate for A of 1.19 with a rate of 1.2 for B. That's only a 5% difference in the rate of increase (0.19 versus 0.20 per generation). Population A still goes extinct in 264 generations -- if this were a human population, that would be about 6000 years.

In any case, to return to something less controversial than political genetics, the power of exponential growth cannot be denied. Given enough generations, any population with a high growth rate can outpace any population with a slow growth rate, no matter how big the initial advantage of the latter. One cannot look at current numbers of a population and predict past numbers, unless one knows the growth factor.

Fallacy

The dictionary definition of "fallacy" is simply something false or based on false information.

This is, of course, a largely useless definition. We have the word "wrong" to apply to things like that. In practice, "fallacy" has a special meaning -- a false belief based on attractive but inaccurate data or appealing but incorrect logic. It's something we want to believe for some reason, even though there are no actual grounds for belief.

A famous example of this is the Gambler's Fallacy. This is the belief that, if you've had a run of bad luck in a game of chance (coin-tossing or dice-playing, for instance), you can expect things to even out because you are due a run of good luck.

This is an excellent example because it shows how the fallacy comes about. The gambler knows that, over a large sample, half of coin tosses will be heads, one sixth of the rolls of a die will produce a six, and so forth. So the "expected" result of two tosses of a coin is one heads, one tails. Therefore, if the coin tossed tails last time, heads is "expected" next time.

This is, of course, not true. The next toss of the coin is independent of the previous. The odds of a head are 50% whether the previous coin toss was a head, a tail, or the-coin-fell-down-a-sewer-drain-and-we-can't-get-it-back.

Thus the gambler who has had a run of bad luck has no better prospects for the future than the gambler who has had a run of good luck, or a gambler who has thrown an exactly even number of heads and tails. Yes, if the gambler tosses enough coins, the ratio of heads to tails will eventually start to approach 1:1 -- but that's not because later tosses make up for the earlier ones; it's just that, with enough coin tosses, the previous run of "bad luck" will be overwhelmed by all the coin tosses which come after.
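
A quick simulation makes the independence concrete. A minimal Python sketch, assuming a fair coin modelled with Python's standard `random` module and a fixed seed; `heads_after_tails_run` is a hypothetical helper. The frequency of heads immediately after a run of three tails comes out close to one half, just like the overall frequency.

```python
import random


def heads_after_tails_run(flips, run_length=3):
    """Fraction of heads among tosses that immediately follow `run_length` tails."""
    after_run = [flips[i] for i in range(run_length, len(flips))
                 if not any(flips[i - run_length:i])]
    return sum(after_run) / len(after_run)


rng = random.Random(1)  # fixed seed so the run is repeatable
flips = [rng.random() < 0.5 for _ in range(200_000)]  # True = heads

print(f"heads overall:       {sum(flips) / len(flips):.3f}")
print(f"heads after 3 tails: {heads_after_tails_run(flips):.3f}")
# Nothing pulls the raw excess of heads over tails back to zero;
# only the ratio settles toward 1:1 as the tosses pile up.
print(f"heads minus tails:   {2 * sum(flips) - len(flips)}")
```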

A typical trait of fallacies is that they make the impersonal personal. In the Gambler's Fallacy, the errant assumption is that the statistical rule covering all coin tosses applies specially and specifically to the next coin toss. Indeed, to your (or the gambler's) next coin toss. The pathetic fallacy is to believe that, if something bad happens, it's because the universe is "out to get us" -- that some malevolent fate caused the car to blow a tire and the bus to be late all in the same day in order to cause me to be late to a meeting. This one seems actually to be hard-wired into our brains, in a sense -- it's much easier to remember a piece of unexpected "bad luck" than good.

These two fallacies are essentially fallacies of observation -- misunderstanding of the way the universe works. The other type of fallacy is the fallacy of illogic -- the assumption that, because a particular situation has occurred, there is some logical reason behind it.

The great critical example of this is the Fallacy of Number. This is the belief that, because the Byzantine text-type is the most common, it must also be the most representative of the original text.

This illustrates another sort of logical flaw -- the notion of reversibility. The fallacy of number begins with the simple mathematical model of exponential growth. This model says that, if a population is capable of reproducing faster than it dies off, then the population will grow explosively, and the longer it is allowed to reproduce, the larger the population becomes.

The existence of exponential growth is undeniable; it is why there are so many humans (and so many bacteria) on earth. But notice the condition: if a population is capable of reproducing faster than it dies off. Exponential growth does not automatically happen even in a population capable of it. Human population, for instance, did not begin its rapid growth until the late nineteenth century, and the population explosion did not begin until the twentieth century. Until then, deaths from disease and accident and starvation meant that the population grew very slowly -- in many areas, it grew not at all.

The fallacy of number makes the assumption that all manuscripts have the same number of offspring. If this were true, then the conclusion would be correct: The text with the most descendants would be the earliest, with the others being mutations which managed to leave a few descendants of their own. However, the assumption in this case cannot be proved -- which by itself is sufficient to make the argument from number fallacious. There are in fact strong reasons to think that not all manuscripts leave the same number of descendants. So this makes the fallacy of number especially unlikely to be correct.

We can, in fact, demonstrate this mathematically. Let's assume that the Byzantine Text is original, and see where this takes us. Our goal is to test the predictive capability of the model (always the first test to which a model must be subjected). Can Byzantine priority be used to model the Alexandrian text?

We start from the fact that, as of this writing, there are just about 3200 continuous-text Greek manuscripts known. Roughly three-fourths of these contain the Gospels -- let's say there are 2400 gospel manuscripts in all. The earliest mass production of gospel manuscripts can hardly have been before 80 C.E. For simplicity, let's say that the manuscript era ended in 1580 C.E. -- 1500 years. We assume that a manuscript "generation" is twenty years. (A relatively minor assumption. We could even use continuous compounding, such as banks now use to calculate interest. The results would differ only slightly; I use generations because, it seems to me, this method is clearer for those without background in taking limits and other such calculus-y stuff.) That means that the manuscript era was 75 generations.

So we want to solve the equation $(1+x)^{75} = 2400$. The variable x, in this case, is a measure of how many new surviving manuscripts each manuscript gives rise to in each generation. It turns out that $1+x = 1.10935$, or $x = 0.10935$.

Of our 2400 Gospel manuscripts, at most 100 can be considered primarily Alexandrian. On this basis, we can estimate when the Alexandrian text originated. We simply count the number of generations needed to produce 100 Alexandrian manuscripts in a situation where each manuscript gives rise to 0.10935 new surviving manuscripts per generation. That means we want to solve the equation $(1.10935)^y = 100$, where y is the number of generations. The answer turns out to be about 44.4 generations, or roughly 890 years.

890 years before the end of the manuscript era is 690 C. E. -- the very end of the seventh century.

P75 dates from the third century. B and ℵ date from the fourth. Thus our three primary Alexandrian witnesses are at least three centuries earlier than the model based on equal descendants allows.

Of our 2400 Gospel manuscripts, at most five can be considered "Western." Solving the equation $(1.10935)^z = 5$ gives about 15.5 generations, so the earliest "Western" manuscript would date from about 310 years before the end of the manuscript era -- around 1270.

I have never seen D dated later than the seventh century.

Thus a model of exponential growth fails catastrophically to explain the number and distribution of both Alexandrian and "Western" manuscripts. We can state quite confidently that manuscripts do not reproduce exponentially. Therefore the argument based on exponential reproduction of manuscripts operates on a false assumption, and the argument from number is fallacious.
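
The three calculations behind this argument can be checked in a few lines. A minimal Python sketch using only the standard `math` module and the figures given in the text (2400 Gospel manuscripts, 75 generations of 20 years, an era ending in 1580); the helper name `generations_to_reach` is hypothetical.

```python
import math

TOTAL_GOSPEL_MSS = 2400
GENERATIONS = 75          # 80 to 1580 C.E. at 20 years per generation
YEARS_PER_GENERATION = 20
END_OF_ERA = 1580


def generations_to_reach(count, growth):
    """Solve growth**y == count for y (starting from a single manuscript)."""
    return math.log(count) / math.log(growth)


growth = TOTAL_GOSPEL_MSS ** (1 / GENERATIONS)   # solves (1+x)**75 == 2400
print(f"1 + x = {growth:.5f}")

for label, count in [("Alexandrian", 100), ('"Western"', 5)]:
    gens = generations_to_reach(count, growth)
    years = gens * YEARS_PER_GENERATION
    print(f"{label}: {gens:.1f} generations = {years:.0f} years, "
          f"so earliest ms. about {END_OF_ERA - years:.0f} C.E.")
```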

The fallacy of number (like most fallacies) demonstrates one of the great rules of logic: "Unnecessary assumptions are the root of all evil."

The above argument, note, does not prove that the Byzantine text is not the original text. The Byzantine text may be original. But if it is, the point will have to be proved on grounds other than number of manuscripts.

Game Theory

As far as I know, there is no working connection between game theory and textual criticism. I do not think there can be one with the actual practice of textual criticism. But I know someone who hoped to find one, so I suppose I should discuss the topic here. And I find it very interesting, so I'm going to cover it in enough depth to let you perhaps do some useful work -- or at least realize why it's useless for textual criticism.

There is one very indirect connection for textual scholars, having to do with the acquisition of manuscripts and artifacts. Many important relics have been found by native artifact-hunters in places such as Egypt and the region of Palestine. Often they have broken them up and sold them piecemeal -- as happened with the Stone of Mesha and several manuscripts, divided into separate leaves or even having individual leaves or rolls torn to shreds and the shreds sold individually.

To prevent this, dealers need to create a pricing structure which rewards acquisition of whole pages and whole manuscripts, without making the bonus so high that the hunters will ignore anything less than a whole manuscript. Unfortunately, we cannot really state a rule for how the prices should be structured -- it depends on the economic circumstances in the locality and on the location where collection is occurring and on the nature of expected finds in the vicinity (so at Qumran, where there is the possibility of whole books, one might use a different pricing structure than at Oxyrhynchus, where one finds mostly fragments. But how one sets prices for Egypt as a whole, when one does not know where manuscripts like P66 and P75 were found, is a very tricky question indeed. Since I do not know enough about the antiquities markets to offer good examples, I'm going to skip that and just do an elementary overview of game theory.)

Although this field of mathematics is called "game theory," a better name might be something like "strategy theory." The purpose is to examine strategies and outcomes under situations with rigid rules. These situations may be genuine games, such as tic-tac-toe -- but they may equally be real-world situations such as buying and selling stocks, or even deciding whether to launch a nuclear war. The rules apply in all cases. Indeed, the economics case is arguably the most important; several Nobel prizes have been awarded for applications of game theory to market situations.

Game theory is a relatively new field in mathematics; it first came into being in the works of John von Neumann, whose proof of the minimax theorem in 1926 gave the field its first foundations; von Neumann's 1944 Theory of Games and Economic Behavior (written with Oskar Morgenstern) is considered the foundation of the field. (There are mentions of "game theory" before that, and even some French research in the field, but it was von Neumann who really founded it as a discipline.)

For the record, an informal statement of the minimax theorem is that, if two "players" have completely opposed interests -- that is, if they're in a situation where one wins if and only if the other loses -- then there is always a rational course of action for both players: A best strategy (possibly a "mixed" strategy, in which each move is chosen at random with fixed probabilities). It is called a minimax because it holds the loser's loss to a guaranteed (on average) minimum while keeping the winner's wins at a guaranteed maximum. Put another way, the minimax theorem says that there is a strategy which will assure a guaranteed consistent maximum result for one party and a minimum loss for the other.

Not all games meet this standard -- e.g. if two competing companies are trying to bolster their stock prices, a rising stock market can allow them both to win -- but games that do involve opposed interests can often illustrate even the cases that don't meet the criterion. The minimax theorem doesn't say those other games don't have best strategies, after all -- it's just that it isn't guaranteed.
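
For the smallest interesting case, a two-by-two game with opposed interests, the minimax strategy can be computed directly. A minimal Python sketch, assuming integer payoffs written from the row player's point of view; `solve_2x2` is a hypothetical name, and the mixing formula is the standard one for 2x2 zero-sum games without a saddle point. Matching pennies (each player shows heads or tails; one wins if the coins match, the other if they differ) shows why the best strategy may have to be a random mix.

```python
from fractions import Fraction


def solve_2x2(m):
    """Minimax solution of a 2x2 zero-sum game, from the row player's side.

    m[i][j] is the (integer) amount the row player wins, and the column
    player loses, when row i meets column j. Returns (chance of playing
    row 0, value of the game).
    """
    (a, b), (c, d) = m
    maximin = max(min(a, b), min(c, d))  # best the row player can guarantee with a pure choice
    minimax = min(max(a, c), max(b, d))  # least the column player can concede with a pure choice
    if maximin == minimax:  # saddle point: a pure strategy is already optimal
        p = Fraction(1) if min(a, b) == maximin else Fraction(0)
        return p, Fraction(maximin)
    # No saddle point: mix the rows so the column player's choice makes no difference.
    denom = a - b - c + d
    return Fraction(d - c, denom), Fraction(a * d - b * c, denom)


# Matching pennies: row wins 1 if the coins match, loses 1 if they differ.
print(solve_2x2([[1, -1], [-1, 1]]))  # (Fraction(1, 2), Fraction(0, 1))
```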

To try to give an idea of what game theory is like, let's look at a problem I first met in Ivan Morris's The Lonely Monk and Other Puzzles. It shows up in many forms (apparently it was originally described by Martin Shubik, whom we will meet again below), so I'll tell this my own way.

A mafia boss suspects that one of his hit men, Alonzo, may have been cheating him, and puts him under close guard. A week later, he discovers that Bertrand might have been in the plot, and hands him over to the guard also. Finally, evidence turns up against Cesar.

At this point, the boss decides it's time to make an example. He decides to stage a Trial by Ordeal, with the three fighting to the finish. Alonzo, however, has been in custody for two weeks, and has been severely debilitated; once a crack shot, he now can hit a target only one time in three. Bertrand too has suffered, though not quite as much; he can hit one time in two. Cesar, newly placed in detention, is still able to hit every time.

So the boss chains the three to three equidistant stakes, and gives each one in turn a single-shot pistol. Alonzo is granted the first shot, then Bertrand, then Cesar, and repeat, with a re-loaded pistol, until two are dead.

There are two questions here: First, at whom should Alonzo shoot, and second, what are his odds of survival in each case?

Assume first that Alonzo shoots at Bertrand. If he hits Bertrand (33% chance), Bertrand dies, and Cesar instantly shoots Alonzo dead. Not such a good choice.

But if Alonzo shoots at Bertrand and misses, then Bertrand, knowing Cesar to be the greater threat, shoots at Cesar. If he misses (50% chance), then Cesar shoots Bertrand (the more dangerous of his two opponents), and Alonzo has one chance in three to kill Cesar before being killed. If, on the other hand, Bertrand kills Cesar, then we have a duel that could go on forever, with Alonzo and Bertrand alternating shots. Alonzo has one chance in three of hitting on the first shot, and two chances in three of missing; Bertrand thus has one chance in three of dying on Alonzo's first shot, and two chances in three of surviving; if he survives, he has one chance in two of killing Alonzo. The rules of compound probability therefore say that Alonzo has one chance in three of killing Bertrand on his first shot, one chance in three (1/2 times 2/3) of being killed by Bertrand on Bertrand's first shot, and one chance in three of neither one being killed and the process repeating. The process may last forever, but the odds are even; each has an equal likelihood of surviving. So, once Alonzo has shot at Bertrand and missed, his chances of survival are 1/2*1/3 = 1/6 for the case where Bertrand misses Cesar, and 1/2*1/2 = 1/4 for the case where Bertrand hits Cesar. That's a total of 5/12.

Thus if Alonzo shoots at Bertrand, he has one chance in three of instant death (because he kills Bertrand), and 2/3*5/12 = 5/18 of surviving (if he misses Bertrand).

Less than one chance in three. Ow.

What about shooting at Cesar?

If Alonzo shoots at Cesar and misses, then we're back in the situation covered in the case where he shoots at Bertrand and misses. So he has a 5/12 chance in that case, which, we note incidentally, is better than fair; if this were a fair contest, his chance of survival would be 1/3, or 4/12.

But what if he hits Cesar? Then, of course, he's in a duel with Bertrand, this time with Bertrand shooting first. And while the odds between the two are even if Alonzo shoots first, it's easy enough to show that, if Bertrand shoots first, Alonzo has only one chance in four of winning, er, living. (Bertrand must first miss, a 1/2 chance, after which Alonzo is in the even duel: 1/2*1/2 = 1/4.)
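The duel arithmetic can be checked mechanically. If the first shooter hits with probability p and the second with probability q, each round in which both miss (probability (1 - p)(1 - q)) simply restarts the duel, so the first shooter's overall chance of winning is p / (1 - (1 - p)(1 - q)). What follows is a minimal Python sketch using exact fractions; the function name `first_shooter_survives` is our own invention, not anything from the sources cited here.

```python
from fractions import Fraction

def first_shooter_survives(p: Fraction, q: Fraction) -> Fraction:
    """Chance that the first shooter (hit chance p) wins a duel to the finish
    against a second shooter (hit chance q), with shots alternating.

    Summing the geometric series p + (1-p)(1-q)p + ... gives
    p / (1 - (1-p)(1-q)).
    """
    return p / (1 - (1 - p) * (1 - q))

alonzo, bertrand = Fraction(1, 3), Fraction(1, 2)

# Alonzo shoots first: even odds, as argued above.
print(first_shooter_survives(alonzo, bertrand))        # 1/2
# Bertrand shoots first: Alonzo survives only 1 - 3/4 of the time.
print(1 - first_shooter_survives(bertrand, alonzo))    # 1/4

# Alonzo has shot at Bertrand and missed; Bertrand now fires at Cesar.
# Bertrand misses (1/2): Cesar kills Bertrand, Alonzo gets one shot at Cesar.
# Bertrand hits (1/2): Alonzo opens an even duel with Bertrand.
after_miss = (Fraction(1, 2) * alonzo
              + Fraction(1, 2) * first_shooter_survives(alonzo, bertrand))
print(after_miss)                                       # 5/12
```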

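To this point, we've simply been calculating probabilities. Game theory comes in as we try to decide the optimal strategy. Let's analyze our four basic outcomes, giving Alonzo's chance of survival after each:

Alonzo's shot                   His chance of survival
Aims at Bertrand and hits       0 (Cesar kills him at once)
Aims at Bertrand and misses     5/12
Aims at Cesar and hits          1/4
Aims at Cesar and misses        5/12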

And, suddenly, Alonzo's strategy becomes clear: He shoots in the air! Since his odds of survival are best if he misses both Bertrand and Cesar, he wants to take the strategy that ensures missing.
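Weighting each outcome by Alonzo's one-in-three chance of hitting gives his overall survival odds for each choice. The sketch below takes the four outcome values (0, 5/12, 1/4 and 5/12) as derived in the text; the dictionary layout and the function name `overall` are our own.

```python
from fractions import Fraction

HIT = Fraction(1, 3)  # Alonzo's chance of hitting whatever he aims at

# Alonzo's survival chance after each of the four basic outcomes.
survival = {
    ("Bertrand", "hit"): Fraction(0),       # Cesar kills Alonzo at once
    ("Bertrand", "miss"): Fraction(5, 12),
    ("Cesar", "hit"): Fraction(1, 4),       # duel with Bertrand shooting first
    ("Cesar", "miss"): Fraction(5, 12),
}

def overall(target: str) -> Fraction:
    """Alonzo's total survival chance if he aims at `target`."""
    return HIT * survival[(target, "hit")] + (1 - HIT) * survival[(target, "miss")]

strategies = {
    "aim at Bertrand": overall("Bertrand"),   # 5/18
    "aim at Cesar": overall("Cesar"),         # 13/36
    "fire in the air": Fraction(5, 12),       # a guaranteed miss
}
for name, p in strategies.items():
    print(f"{name:16} {p}  (~{float(p):.3f})")
print(max(strategies, key=strategies.get))    # fire in the air
```

Aiming at Cesar (13/36, about .36) is better than aiming at Bertrand (5/18, about .28), but deliberately missing (5/12, about .42) beats both.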

This analysis, however, is only the beginning of game theory; the three-way duel (which has been called a "truel") is essentially a closed situation, with only three possible outcomes, and those outcomes, well, terminal. Although three people took part, it was essentially a solitaire game; Bertrand's and Cesar's strategies were fixed even though the actual outcome wasn't. As J. D. Williams writes on p. 13 of The Compleat Strategyst, "One-person games are uninteresting, from the Game Theory point of view, and therefore are not really studied here. Their solution is quite straightforward, conceptually: You simply select the course of action that yields the most and do it. If there are chance elements, you select the action which yields the most on average...." This is, of course, one demonstration of why game theory isn't much help in dealing with textual criticism: Reconstructing a text is a solitaire game, guessing what a scribe did. As Rapoport-Strategy, p. 73, says, "there are formidable conceptual difficulties in assigning definitive probabilities to unique events," adding that "With respect to... these, the 'rationality' of human subjects leaves a great deal to be desired.... [T]he results do indicate that a rational decision theory based on an assumption that others follow rational principles of risky decisions could be extremely misleading." He also warns (p. 85) that the attempt to reduce a complex model to something simple enough to handle with the tools of game theory is almost certainly doomed to fail: "the strategist [read, in our case, textual critic] has no experiments to guide him in his theoretical development.... Accordingly he simplifies not in order to build a science from the bottom up but in order to get answers. The answers he gets are to the problem he poses, not necessarily, not even usually, to the problems with which the world we have made confronts us."

Still, this example illustrates an important point about game theory: It's not about what we ordinarily call games. Game theory, properly so called, is not limited to, say, tic tac toe, or even a game like chess -- though what von Neumann proved with the minimax theorem is that such games have an optimal strategy that works every time. (Not that it wins, necessarily, but that it gives the best chance for the best outcome. It has been said that the purpose of game theory is not really to determine how to win -- since that depends on your opponent as well as yourself -- but how to be sure you do not regret your actions if you lose. Von Neumann applied game theory to poker, e.g., and the result produced a lot of surprises: You often have to bet big on poor hands, and even so, your expected payoff, assuming you face opponents who are also playing the optimal strategy, is merely to break even! See Ken Binmore, Game Theory: A Very Short Introduction, pp. 89-92. It appears that the players who win a lot in poker aren't the ones who have the best strategy but the ones who are best at reading their opponents.)

If we look at the simple game of tic tac toe, we know the possible outcomes, and can write out the precise strategies both players play to achieve a draw (or to win if the opponent makes a mistake). By contrast, the game of chess is so complicated that we don't know the players' strategies, nor even who wins if both play their best strategies (it's estimated that the "ideal game" would last around five thousand moves, meaning that the strategy book would probably take more space than is found in every hard drive in, say, all of Germany. What's more, according to Binmore, p. 37, the number of pure strategies is believed to be greater than the number of electrons in the universe -- which also means that there are more strategies than can be individually examined by any computer that can possibly be built. It isn't even possible to store a table which says that each individual strategy has been examined or not!). But not all games are so rigidly determined -- e.g. an economic "game," even if it takes all human activity into account, could not know in advance the effects of weather, solar flares, meteors....

Most game theory is devoted to finding a long-term strategy for dealing with games that happen again and again -- investing in the stock market, playing hundreds of hands of blackjack, something like that. In the three-way duel, the goal was to improve one's odds of survival once. But ordinarily one is looking for the best long-term payoff.

Some such games are trivial. Take a game where, say, two players bet on the result of a coin toss. There is, literally, no optimal strategy, assuming the coin is fair. Or, rather, there is no strategy that is less than optimal: Anything you guess is as likely to work as any other. If you guess "heads" every time, you'll win roughly 50% of the bets. If you guess "tails," you'll also win just about 50% in the long run. If you guess at random, you'll still win 50% of the time, because, on every toss, there is a 50% chance the coin will agree with you.

Things get much, much more interesting in games with somewhat unbalanced payoffs. Let's design a game and see where it takes us. (This will again be a solitaire game, but at least it will show us how to calculate a strategy.) Our hypothetical game will again use coin tosses -- but this time we'll toss them ten at a time, not one at a time. Here is the rule (one so simple that it's even been stolen by a TV game show): before the ten coins are tossed, the player picks a number, from 0 to 10, representing the number of heads that will show up. If the number of heads is greater than or equal to the player's chosen number, he gets points equal to the number he guessed. If the number of heads is less than his number, he gets nothing. So, e.g., if he guesses four, and six heads turn up, then he gets four points.

So how many should our player guess, each time, to earn the greatest payoff in the long term?

We can, easily enough, calculate the odds of 0, 1, 2, etc. heads, using the data on the Binomial Distribution. It turns out to be as follows:

# of heads (n)   Ways (out of 1024)   Odds of n heads
      0                  1                0.001
      1                 10                0.010
      2                 45                0.044
      3                120                0.117
      4                210                0.205
      5                252                0.246
      6                210                0.205
      7                120                0.117
      8                 45                0.044
      9                 10                0.010
     10                  1                0.001
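The middle column is simply the binomial coefficient C(10, n), and the odds are that count divided by the 2^10 = 1024 equally likely sequences of ten tosses. A short sketch that regenerates the table with Python's standard `math.comb`:

```python
from math import comb

TOSSES = 10
total = 2 ** TOSSES  # 1024 equally likely head/tail sequences

for n in range(TOSSES + 1):
    ways = comb(TOSSES, n)          # sequences with exactly n heads
    print(f"{n:2}  {ways:4}  {ways / total:.3f}")
```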

Now we can determine the payoffs for each strategy. For example, the "payoff" for the strategy of guessing "10" is 10 points times .001 probability = .01. In other words, if you repeatedly guess 10, you can expect to earn, on average, .01 points per game. Not much of a payoff.

For a strategy of "9," there are actually two ways to win: if nine heads show up, or if ten heads show up. So your odds of winning are .010+.001=.011. The reward in points is 9. So your projected payoff is 9*.011=.099. Quite an improvement!

We're balancing two factors here: the reward of the strategy against its probability. For example, if you choose "0" every time, you'll win every game -- but get no payoff. Choose "1" every time, and you'll win almost all the time, and get some payoff, but not much. So what is the best strategy?

This we can demonstrate with another table. This shows the payoff for each strategy (rounded off slightly, of course):


So the best strategy for this game is to consistently guess "4."
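That conclusion is easy to check. The expected payoff for always guessing k is k times the probability of at least k heads. The sketch below recomputes every payoff from exact binomial counts rather than the rounded probabilities above, so its figures can differ in the third decimal (about .097 rather than .099 for a guess of 9); the function names are our own.

```python
from fractions import Fraction
from math import comb

def prob_at_least(k: int, tosses: int = 10) -> Fraction:
    """Probability of at least k heads in `tosses` fair coin tosses."""
    return Fraction(sum(comb(tosses, n) for n in range(k, tosses + 1)), 2 ** tosses)

def payoff(k: int) -> Fraction:
    """Expected points per game for always guessing k (no penalty for missing)."""
    return k * prob_at_least(k)

for k in range(11):
    print(f"guess {k:2}: {float(payoff(k)):.3f}")

best = max(range(11), key=payoff)
print("best guess:", best)   # best guess: 4
```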

But now let's add another twist. In the game above, there was no penalty for guessing high, except that you didn't win. Suppose that, instead, you suffer for going over. If, say, you guess "5," and only four heads turn up, you lose five points. If you guess "10," then, you have one chance in 1024 of earning 10 points -- and 1023 chances in 1024 of earning -10 points. Does that change the strategy?


This shows a distinct shift. In the first game, every guess except "0" had at least a slight payoff, and the best payoffs were in the area of "4"-"5". Now, we have large penalties for guessing high (a guess of "6" or more loses points on average), the largest payoffs are for "3" and "4," and "3" is the optimal strategy.
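With the penalty, the expected payoff for a guess of k becomes k times the chance of at least k heads, minus k times the chance of fewer. Again a sketch with exact fractions (function names ours); it confirms that "3" narrowly beats "4," roughly 2.67 points per game against 2.63.

```python
from fractions import Fraction
from math import comb

def prob_at_least(k: int, tosses: int = 10) -> Fraction:
    """Probability of at least k heads in `tosses` fair coin tosses."""
    return Fraction(sum(comb(tosses, n) for n in range(k, tosses + 1)), 2 ** tosses)

def payoff_with_penalty(k: int) -> Fraction:
    """Expected points for always guessing k: win k if heads >= k, else lose k."""
    p = prob_at_least(k)
    return k * p - k * (1 - p)

for k in range(11):
    print(f"guess {k:2}: {float(payoff_with_penalty(k)):+.3f}")

print("best guess:", max(range(11), key=payoff_with_penalty))   # best guess: 3
```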

Again, though, we must stress that this is a solitaire game. There is no opponent. So there is no actual game theory involved -- it's just probability theory.

True games involve playing against an opponent of some sort, human or computer (or stock market, or national economy, or something). Let's look at a two-person game, though a very simple one: We'll again use coins. The game starts with A and B each putting a fixed amount in the bank, and agreeing on a number of turns. In each round of the game, players A and B set out a coin. Each can put out a dime (ten cents, or a tenth of a dollar) or a quarter (25 cents). Whatever coins they put out, A gets to claim a value equivalent to the combined value from the bank. At the end of the game, whatever is left in the bank belongs to B.

This game proves to have a very simple strategy for each player. A can put out a quarter or a dime. If he puts out a quarter, he is guaranteed to claim at least 35 cents from the bank, and it might be 50 cents; if he puts out a dime, the most he can pick up is 35 cents, and it might be only 20.

B can put out a quarter or a dime; if he does the former, he loses at least 35 cents, and it might be 50; if he plays the dime, he limits his losses to a maximum of 35 cents, and it might be only 20.

Clearly, A's best strategy is to put out a quarter, ensuring that he wins at least 35 cents; B's best strategy is to put out a dime, ensuring that he loses no more than 35 cents. These are what are called "dominant strategies" -- strategies which produce the best results no matter what the other guy does. The place the two settle on is called the saddle point. Williams, on p. 27, notes that a saddle point is a situation where one player can announce his strategy in advance, and it will not affect the other's strategy!
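A saddle point can be found mechanically: A picks the choice whose worst case (row minimum) is largest, B picks the choice whose worst case for him (column maximum) is smallest, and when those two numbers agree, that cell is the saddle point. A small sketch for the dime-and-quarter game, with the payoffs written as the amounts A claims from the bank; the variable names are ours.

```python
# Amount (in cents) A claims from the bank; rows = A's coin, columns = B's coin.
COINS = ["dime", "quarter"]
claim = [[20, 35],   # A plays dime
         [35, 50]]   # A plays quarter

row_minima = [min(row) for row in claim]                        # A's guaranteed take per choice
col_maxima = [max(row[j] for row in claim) for j in range(2)]   # B's worst loss per choice

maximin = max(row_minima)   # the most A can guarantee himself
minimax = min(col_maxima)   # the least B can hold A to
a_choice = COINS[row_minima.index(maximin)]
b_choice = COINS[col_maxima.index(minimax)]

print(a_choice, b_choice, maximin, minimax)   # quarter dime 35 35
if maximin == minimax:
    print("saddle point: value", maximin)
```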

Note that games exist where both players have a dominant strategy, or where only one has a dominant strategy, or where neither player has a dominant strategy. Note also that a dominant strategy does not inherently require always doing the same thing. The situation in which both have a dominant strategy produces a "Nash Equilibrium," named after John Nash, the mathematician (artificially famous as a result of the movie "A Beautiful Mind") who introduced the concept. In general, the Nash Equilibrium is simply the state a game achieves if all parties involved play their optimal strategies -- or, put another way, if each takes the course he would take should he know his opponents' strategies (Binmore, p. 14).

Note also that a game can have multiple Nash Equilibria -- the requirement for a Nash Equilibrium is simply that it is stable once both players reach it. Think, perhaps, of a ball rolling over a non-smooth surface. Every valley is a Nash Equilibrium -- once the ball rolls into it, it can't roll its way out. But there may be several valleys into which it might fall, depending on the exact initial conditions.

The game below is an example of one which has an optimal strategy more complicated than always playing the same value -- it's a game with an equilibrium but no saddle point. We will play it with coins although it's usually played with fingers -- it's the game known as "odds and evens." In the classical form, A and B each show one or two fingers, with A winning if they show the same number of fingers and B winning if they show different numbers. In our coin version, we'll again use dimes and quarters, with A earning a point if both play the same coin, and B winning if they play different coins. It's one point to the winner either way. But this time, let's show the result as a table (there is a reason for this, which we'll get to).

                  B plays dime   B plays quarter
A plays dime            1               -1
A plays quarter        -1                1

The results are measured in payoff to A: a 1 means A earns one point, a -1 means A loses one point.

One thing is obvious about this game: Unlike the dime-and-quarter case, you should not always play the same coin. Your opponent will quickly see what you are doing, and change strategies to take advantage. The only way to keep your opponent honest is to play what is called a "mixed strategy" -- one in which you randomly mix together multiple moves. (One in which you always do the same thing is a "pure strategy." Thus a mixed strategy consists of playing multiple rounds of a game and shuffling between pure strategies from game to game. If a game has a saddle point, as defined above, then the best strategy is a pure strategy. If it does not have a saddle point, then a mixed strategy will be best.)

Binmore, p. 23, notes that many people already understand the need for a random strategy in certain games, even if they don't know exactly what ratio of choices to make. The reason is a classic aphorism: "You have to keep them guessing."

Davis, pp. 27-28, offers a different version of the argument, based on a plot in Poe's "The Purloined Letter." In that story, one boy involved in a playground game of matching marbles could always win eventually, because he evolved techniques for reading an opponent's actions. How, then, could one hold off this super-kid? Only one way: By making random choices. It wouldn't let you beat him, but at least, on average, you wouldn't lose. There is an interesting corollary here, as pointed out by Davis, p. 31: If you are smarter than your opponent, you can perhaps win by successfully second-guessing him. But if you are not as smart as your opponent, you can hold him to a draw by using a random mixed strategy.

This may seem like a lot of rigmarole for a game we all know is fair, and with such a simple outcome. But There Are Reasons. The above table can be used to calculate the value (average payout to A), and even the optimal strategy (or ratio of strategies, for a mixed strategy), for any zero-sum game (i.e. one where the amount gained by Player A is exactly equal to that lost by Player B, or vice versa) with two options for each player.

The system is simple. Call the options for Player A "A1" and "A2" and the options for Player B "B1" and "B2." Let the outcomes (payoffs to A) be a, b, c, and d. Then our table becomes:

             B plays B1   B plays B2
A plays A1       a            b
A plays A2       c            d

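The value of the game, in all cases meeting the above conditions and having no saddle point (if there is a saddle point, the value is simply the payoff at that point, and the formula below breaks down), is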

$$\frac{ad - bc}{a + d - b - c}$$

With this formula, it is trivially easy to prove that the value for the "odds and evens" game above is 0: (1*1 - (-1)(-1))/(1+1+1+1) = 0/4 = 0. Just as we would have expected. There is no advantage to either side.

But wait, there's more! Not only do we know the value of the game, but we can tell the optimal strategy for each player! We express it as a ratio of strategies, chosen so that the opponent gains nothing by switching. For player A, the ratio of A1 to A2 is given by (d - c)/(a - b). For B, the ratio of B1 to B2 is (d - b)/(a - c). In the odds and evens case, since
a = 1
b = -1
c = -1
d = 1,
that works out to the optimal ratio for A being
A1:A2 = [1-(-1)]/[1-(-1)] = 2/2 = 1,
i.e. we play A1 as often as A2.
Similarly, the optimal ratio for B is [1-(-1)]/[1-(-1)] = 1. As we expected. The Nash Equilibrium is for each player to play a random mix of dimes and quarters, half of each, and the value of the game if they do is zero.
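These formulas are easy to mechanize. The sketch below (the function name `solve_2x2` is our own) first checks for a saddle point, since the formulas apply only when there is none, and otherwise returns the value and the probability with which each player should choose his first option. Each player's mix is chosen so that the opponent earns the same whichever option he picks, which gives A1:A2 = (d - c):(a - b) and B1:B2 = (d - b):(a - c).

```python
from fractions import Fraction

def solve_2x2(a, b, c, d):
    """Solve a 2x2 zero-sum game with payoffs to A of [[a, b], [c, d]]
    (rows A1, A2; columns B1, B2).

    Returns (value, P(A plays A1), P(B plays B1)).
    """
    a, b, c, d = (Fraction(x) for x in (a, b, c, d))
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:
        raise ValueError(f"saddle point with value {maximin}: use pure strategies")
    denom = a + d - b - c
    value = (a * d - b * c) / denom
    p_a1 = (d - c) / denom   # A1 : A2 = (d - c) : (a - b)
    q_b1 = (d - b) / denom   # B1 : B2 = (d - b) : (a - c)
    return value, p_a1, q_b1

# Odds and evens: value 0, and each player picks his first option half the time.
print(solve_2x2(1, -1, -1, 1))
```

The dime-and-quarter game from earlier (payoffs 20, 35, 35, 50) would raise the error, because it has a saddle point; there the denominator a + d - b - c is zero.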

We must add an important note here, one which we mentioned above but probably didn't emphasize enough. The above applies only in games where the players have completely opposed interests: if one gains, the other loses. Many games, such as the Prisoner's Dilemma we shall meet below, do not meet this criterion; the players' interests partly coincide. And even a game which, at first glance, appears to be zero-sum may not be. For example, a situation in which two opposing companies are striving for a share of the market may appear to be zero-sum, with their interests completely opposed. But that is only true if the size of their market is fixed. If (for instance) they can expand the market by cooperating, then the game ceases to be zero-sum. And that changes the situation completely.

There is also a slight problem if the numbers in the results table are average payouts. Suppose, for instance, that the above game, the odds and evens game, has two "phases." In the first phase, you play odds and evens. The winner in the first phase plays a second phase, in which he rolls a single die. If it comes up 2, 3, 4, 5, or 6, the player earns $2. But if he rolls a 1, he loses $4. The average value of this second phase is (5*$2 - $4)/6 = $1, so in terms of payouts, we haven't changed the game at all. But in terms of danger, we've altogether changed things. Suppose you come in with only $3 in your bank. In all likelihood, you could play "regular" odds and evens for quite a long time without going bankrupt. But in the modified two-phase game, there is one chance in 12 (1/2 to win the first phase, times 1/6 to roll a 1) that you will go bankrupt on the first turn -- and that even though you won against your opponent! Sure, in the long run it would average out, if you had a bigger initial bankroll -- but that's no help if you go bankrupt early on.
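The two numbers in that argument, the unchanged average and the one-in-twelve risk of ruin on the first turn, can be verified with exact fractions. A minimal sketch; the constant names are ours, and it assumes, as the text does, that the first phase is an even contest.

```python
from fractions import Fraction

P_WIN_ROUND = Fraction(1, 2)   # odds and evens is an even contest between the players
P_ROLL_ONE = Fraction(1, 6)    # the second-phase die shows 1

# Second phase: +$2 on a 2-6, -$4 on a 1.
phase_two_mean = (1 - P_ROLL_ONE) * 2 + P_ROLL_ONE * (-4)
print(phase_two_mean)          # 1  (the same average as a plain $1 win)

# With a $3 bankroll, the only way to be ruined on the first turn is to
# win phase one and then roll a 1, losing $4 (3 - 4 < 0).
ruin_first_turn = P_WIN_ROUND * P_ROLL_ONE
print(ruin_first_turn)         # 1/12
```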

This sort of thing can affect a player's strategy. There are two ways this can happen -- though both involve the case where only some of the results have second phases. Let's take our example above, and make one result and one result only lead to the second phase:

                  B plays dime   B plays quarter
A plays dime            1               -1
A plays quarter        -1                1

That is, if both players play a dime, then B "wins" but has to play our second phase where he risks a major loss.

(Note that this simple payoff matrix applies only to zero-sum games, where what one player loses is paid to the other. In a game which is not zero-sum, we have to list the individual payoffs to A and B in the same cell, because one may well gain more than the other loses.)

Now note what happens: If B has a small bankroll, he will want to avoid this outcome, while A is perhaps trying to force him out of the game. Therefore B will wish to avoid playing the dime. But if B always plays the quarter, A need only play the quarter too: the coins match every time, A wins every round, and A promptly drives B bankrupt because B wanted to avoid bankruptcy!

The net result is that, to avoid being exploited, B has to maintain the strategy he had all along, of playing Dime and Quarter randomly. Or, at least, he has to play Dime often enough to keep A honest. This is a topic we will cover below, when we get to the Quantal Response Equilibrium. The real point is that any game can be more complicated than it seems. But we have enough complexities on our hands; let's ignore this for now.

It's somewhere around here that the attempt to connect game theory and textual criticism was made. Game theory helps us to determine optimal strategies. Could it not help us to determine the optimal strategy for a scribe who wished to preserve the text as well as possible?

We'll get back to that, but first we have to enter a very big caution. Not all games have such a simple Nash Equilibrium. Let's change the rules. Instead of odds and evens, with equal payouts, we'll say that each player puts out a dime or a quarter, and if the two coins match, A gets both coins; if they don't match, the payout goes to B. This sounds like a fair game; if the players put out their coins at random, then one round in four will result in two quarters being played (50 cent win for A), two rounds in four will result in one quarter and one dime (35 cent payout to B), and one round in four will result in two dimes (20 cent payout to A). Since 50+20=35+35=70, if both players play equal and random strategies, the game gives an even payout to both players.

But should both players play equal numbers of dimes and quarters at random? We know they should play at random (that is, that each should determine randomly which coin to play on any given turn); if one player doesn't pick randomly, then the other player should observe it and react accordingly (e.g. if A plays quarters in a non-random way, B should play his dime according to the same pattern to increase his odds of winning). But playing randomly does not imply playing each strategy the same number of times.

Now the formulas we listed above come into play. Our payoff matrix for this game is:

                  B plays dime   B plays quarter
A plays dime           20              -35
A plays quarter       -35               50

So, from the formula above, the value of the game is (20*50 - (-35*-35))/(20+50 -(-35) -(-35)) = (1000-1225)/(140) = -225/140 = -45/28, or about -1.6. In other words, if both players play their optimal strategies, the payoff to B averages about 1.6 cents per game. The game looks fair, but in fact is slightly biased toward B. You can, if you wish, work out the correct strategy for B, and try it on someone.
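Working out the strategies is a matter of plugging the same numbers into the ratio formulas, taken here as A1:A2 = (d - c):(a - b) and B1:B2 = (d - b):(a - c), the mixes that leave the opponent with nothing to gain by switching. A minimal sketch; it turns out that both players should play the dime 17 times in 28 (about 61%), and the final check confirms that A then earns the same from either coin.

```python
from fractions import Fraction

# Payoffs to A (cents); rows = A plays dime/quarter, columns = B plays dime/quarter.
a, b = Fraction(20), Fraction(-35)
c, d = Fraction(-35), Fraction(50)

denom = a + d - b - c                 # 140
value = (a * d - b * c) / denom       # -45/28: about 1.6 cents per game to B
p_dime_a = (d - c) / denom            # A's share of dimes
p_dime_b = (d - b) / denom            # B's share of dimes

print(value, float(value))
print("A plays dime", p_dime_a, "of the time; B plays dime", p_dime_b)

# Sanity check: against B's mix, A gets the same average from either coin,
# so A cannot improve on the value by deviating.
a_dime = a * p_dime_b + b * (1 - p_dime_b)
a_quarter = c * p_dime_b + d * (1 - p_dime_b)
assert a_dime == a_quarter == value
```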

And there is another problem: Human reactions. Here, we'll take an actual real-world game: Lawn tennis. Tennis is one of the few sports with multiple configurations (men's singles, women's singles, men's doubles, women's doubles, mixed doubles). This has interesting implications for the least common of the forms, mixed doubles. Although it is by no means always true that the male player is better than the female, it is usually true in tennis leagues, including professional tennis. (This is because players will usually get promoted to a higher league if they're too good for the competition. So the best men play with the best women, and the best men are better.) So a rule of thumb evolved in the sport, saying "hit to the woman."

It can, in fact, be shown by game theory that this rule is wrong. Imagine an actual tennis game, as seen from the top, with the male players shown as M (for man or monster, as you prefer) and the female as W (for woman or weaker, again as you prefer).

+-+-------------+-+
| |             | |
| |             | |
| +------+------+ |
| |      |      | |
| |  M   |   W  | |
| |      |      | |
+-+------+------+-+
| |      |      | |
| |  W   |   M  | |
| |      |      | |
| +------+------+ |
| |             | |
| |             | |
+-+-------------+-+

Now at any given time player A has two possible strategies, "play to the man" or "play to the woman." However, player B also has two strategies: "stay" or "cross." To cross means for the man to switch over to the woman's side and try to intercept the ball hit to her. (In the real world, the woman can do this, too, and it may well work -- the mixed doubles rule is that the man loses the mixed doubles match, while the woman can win it -- but that's a complication we don't really need.)

We'll say that, if A hits the ball to the woman, he wins a point, but if he hits to the man (including a man who has crossed over to intercept), he loses. This is oversimplified, but it's the idea behind the strategy, so we can use it as a starting point. That means that our results matrix is as follows:

                        B stays   B crosses
A hits to the man          -1          1
A hits to the woman         1         -1

Obviously we're basically back in the odds-and-evens case: The optimal strategy is to hit 50% of the balls to M and 50% to W. The tennis guideline to "hit to the woman" doesn't work. If you hit more than 50% of the balls to the woman, the man will cross every time, but if you hit less than 50% to the woman, you're hitting too many to the man.

But -- and this is a very big but -- the above analysis assumes that both teams are mathematically and psychologically capable of playing their optimal strategies. When dealing with actual humans, as opposed to computers, this is rarely the case. Even if a person wants to play the optimal strategy, and knows what it is, a tennis player out on the court probably can't actually randomly choose whether to cross or stay. And this ignores psychology. As Rapoport-Strategy says on pages 74-75, "The assumption of 'rationality' of the other is inherent in the theory of the zero-sum game.... On the other hand, if the other is assumed 'rational' but is not, the minimax strategy may fail to take advantage of the other's 'irrationality.' But the irrationality can be determined only by means of an effective descriptive theory.... Experimental investigations of behavior in zero-sum games have established some interesting findings. For the most part, the minimax solution is beyond the knowledge of subjects ignorant of game theory.... In some cases, it has been demonstrated that when plays of the same game are repeated, the subject's behavior is more consistently explained by a stochastic learning theory rather than by game theory."

To put this in less technical language, most people remember failures better than successes. (Davis, p. 71, notes that "people who feel they have won something generally try to conserve their winnings by avoiding risks. In an identical situation, the same people who perceive that they have just lost something will take risks they considered unacceptable before, to make themselves whole.") If a player crosses, and gets "burned" for it, it's likely that he will back off and cross less frequently. In the real world, in other words, you don't have to hit 50% of the shots to the man to keep him pinned on his own side. Binmore says on p. 22 that "Game theory escapes the apparent infinite regression... by appealing to the idea of a Nash equilibrium." But even if the players know there is a Nash equilibrium, that doesn't mean they are capable of applying the knowledge.

So how many do you have to hit to the man? This is the whole trick and the whole problem. As early as 1960, the Nobel-winning game theorist Thomas C. Schelling was concerned with this issue, but could not reach useful conclusions (Rapoport-Strategy, p. 113). Len Fisher (in Rock, Paper, Scissors: Game Theory in Everyday Life, Basic Books, 2008, p. 79), referring back to Schelling's work, mentions the "Schelling Point," which, in Schelling's description, is a "focal point for each person's expectation of what the other expects him to expect to be expected to do." (And economists wonder why people think economics is confusing!)

More recently, Thomas Palfrey refers to the actual optimal strategy for dealing with a particular opponent as the "quantal response equilibrium." (Personally, I call it the "doublethink equilibrium." It's where you land after both players finish second-guessing themselves.)

The problem of double-thinking was recognized quite early in the history of game theory by John Maynard Keynes, who offered the quite sexist example of contemporary beauty contests, where the goal was not to pick the most beautiful woman but the woman whom the largest number of others would declare to be beautiful. Imagine the chaos that results if all the many competitors in such a contest are trying to guess what the others will do!

This should be sufficient reason to show why, to the misfortune of those who bet on these things, there is no universal quantal response equilibrium. In the tennis case above, there are some doubles players who like to cross; you will have to hit to the man a lot to pin them down. Others don't like to cross; only a few balls hit their way will keep them where they belong. (The technical term for this is "confirmation bias," also known as "seeing what you want to see" -- a phenomenon by no means confined to tennis players. Indeed, one might sometimes wonder if textual critics might, just possibly, occasionally be slightly tempted to this particular error.) Against a particular opponent, there is a quantal response equilibrium. But there is no general QRE, even in the case where there is a Nash Equilibrium.

We can perhaps make this clearer by examining another game, known as "Ultimatum." In this game, there are two players and an experimenter. The experimenter puts up a "bank" -- say, $100. Player A is then to offer Player B some fraction of the bank as a gift. If B accepts the gift, then B gets the gift and A gets whatever is left over. If B does not accept the gift, then the experimenter keeps the cash; A and B get nothing. Also, for the game to be fully effective, A and B get only one shot; once they finish their game, the experimenter has to bring in another pair of players.

This game is interesting, because, theoretically, B should take any offer he receives. There is no second chance; if he turns down an offer of, say, $1, he gets nothing. But it is likely that B will turn down an offer of $1. Probably $5 also. Quite possibly $10. Which puts an interesting pressure on A: Although theoretically B should take what he gets, A needs to offer up enough to gain B's interest. How much is that? An interesting question -- but the answer is pure psychology, not mathematics.

Or take this psychological game, described by Rapoport-Strategy, p. 88, supposedly based on a true story of the Pacific War during World War II. The rule then, for bomber crews, was that they had to fly thirty missions before being retired. Unfortunately, the odds of surviving thirty missions were calculated as only one in four. The authorities did come up with a way to improve those odds: They calculated that, if they loaded the planes with only half a load of fuel, replacing the weight with bombs, it would allow them to drop just as many bombs as under the other scenario while having only half as many crews fly. The problem, of course, is that the crew would run out of fuel and crash after dropping the bombs. So the proposal was to draw straws: Half the crews would fly and drop their bombs and crash (and presumably die, since the Japanese didn't take prisoners). The other half of the crews would be sent home without ever flying.

Theoretically, this was a good deal for everyone: The damage done to Japan was the same, and the number of bomber crew killed was reduced. It would save fuel, too, if anyone cared. But no one was interested.

(Note: I don't believe this story for a minute. It's a backward version of the kamikaze story. But it says something about game psychology: The Japanese were willing to fly kamikazes. The Americans weren't, even though their bomber crews had only slightly better odds than the suicide bombers. However, though this story is probably false, it has been shown that people do think this way. There is a recent study -- unfortunately, I only heard about it on the news and cannot cite it -- which offered people a choice between killing one person and killing five. If they did not have to personally act to kill the one, they were willing to go along. But they had a very hard time pulling the trigger. This is in fact an old dilemma; Rapoport-Strategy, p. 89, describes the case where a mother had to choose which one of her sons to kill; if she did not kill one, both would die. Often the mother is unable to choose.)

What's more, even a game which should have an equilibrium can "evolve" -- as one player starts to understand the other's strategy, the first player can modify his own, causing the strategic calculation to change. This can happen even in a game which on its face should have a stable equilibrium (Binmore, p. 16).

Another game, described by John Allen Paulos (A Mathematician Plays the Stock Market, pp. 54-55), shows even more the extent to which psychology comes into play. Professor Martin Shubik would go into his classes and auction off a dollar bill. The highest bidder would earn the bill -- but the second-highest bidder was also required to pay off on his bid. This had truly interesting effects: There was a reward ($1) for winning. There was no penalty for being third or lower. But the #2 player had to pay a fee, with no reward at all. As a result, players strove intensely not to be #2. Better to pay a little more and be #1 and get the dollar back! So Shubik was able to auction his dollar for prices in the range of $4. Even the winner lost, but he lost less than the #2 player.

In such a game, since the total cost of the dollar is the amount paid by both the #1 and #2 player, one should never see a bid of over 51 cents. Indeed, it's probably wise not to bid at all. But once one is in the game, what was irrational behavior when the game started becomes theoretically rational, except that the cycle never ends. And this, too, is psychology.
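The trap can be sketched in a few lines. This is a minimal illustration, not Shubik's own analysis: it assumes two bidders who reason only at the margin (when outbid, raise by one cent whenever winning at the new price beats forfeiting the bid already made) and who stop only when a budget runs out. The function name and the budgets are hypothetical, and all amounts are in cents.

```python
def dollar_auction(budget_a, budget_b, prize=100, step=1):
    """Simulate a two-bidder Dollar Auction (all amounts in cents).

    Hypothetical model: the top two bids are both paid, and only the top
    bidder gets the prize. When outbid, a bidder compares quitting
    (forfeiting his current bid) with raising by `step` (winning the prize
    at the new price) and raises whenever that is better and affordable.
    """
    budgets = {"A": budget_a, "B": budget_b}
    bids = {"A": 0, "B": 0}
    leader, trailer = None, "A"              # A opens the bidding
    while True:
        top = bids[leader] if leader else 0
        new_bid = top + step
        if new_bid > budgets[trailer]:
            break                            # cannot afford to go higher
        payoff_if_quit = -bids[trailer]      # loses what he has already bid
        payoff_if_raise = prize - new_bid    # payoff if this raise ends up winning
        if payoff_if_raise <= payoff_if_quit:
            break
        bids[trailer] = new_bid
        leader, trailer = trailer, ("B" if trailer == "A" else "A")
    return leader, bids


winner, bids = dollar_auction(400, 400)
print(winner, bids)  # B {'A': 399, 'B': 400}
```

Because the two bids are only a cent apart, raising always looks better than quitting, so the bidding stops only at the budget: with $4 on each side the dollar sells for $4.00 and the loser pays $3.99, in line with the roughly $4 prices mentioned above.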

(Note: This sort of trap didn't really originate with Shubik. Consider Through the Looking Glass. In the chapter "Wool and Water," Alice is told she can buy one egg for fivepence plus a farthing, or two eggs for twopence -- but if she buys two, she must eat both. Also, there is a sort of auction, the "Vickrey auction," which sounds similar although it is in fact different: All bidders in an auction submit sealed bids. The competitor submitting the highest bid wins the auction -- but pays the amount submitted by the second-highest bidder. Thus the goal is the interesting one of trying to be the high bidder while attempting to make sure that you are willing to pay whatever your closest competitor bids! And there is a biological analogy -- if two males squabble over a female, both pay the price in time and energy of the contest, but only one gets to mate.)
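The Vickrey rule itself is simple enough to state as code. A minimal sketch, assuming bids are held in a dictionary mapping bidder names to amounts and that ties go to whichever tied bidder was listed first (the text does not say how ties are handled); the bidder names and amounts are made up.

```python
def vickrey_auction(bids):
    """Sealed-bid second-price (Vickrey) auction.

    `bids` maps each bidder's name to his sealed bid. The highest bidder
    wins but pays the second-highest bid. Ties go to the tied bidder who
    appears first in `bids` (an assumption; Python's sort is stable).
    """
    if len(bids) < 2:
        raise ValueError("a second-price auction needs at least two bids")
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1]          # the runner-up's bid sets the price
    return winner, price


print(vickrey_auction({"Alice": 120, "Bob": 95, "Carol": 110}))
# ('Alice', 110)
```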

In addition to the Dollar Auction in which only the top two bidders have to pay, Binmore, p. 114, mentions a variant, the "all-pay" auction, in which every bidder is required to pay what he has bid. This is hardly attractive to bidders, who will usually sit it out -- but he notes a real-world analogy: Corrupt politicians or judges may be bribed by all parties, and may accept the bribes, but will only act on one of the bribes (presumably the largest one).

We might add that, in recent years, there has been a good bit of research about the Dollar Auction. There are two circumstances under which, theoretically, it is reasonable to bid on the dollar -- if you are allowed to bid first. Both are based on each player being rational and each player having a budget. If the two budgets are equal, then the first bidder should bid the fractional part of his budget -- e.g. 66 cents if the budget is $1.66; 34 cents if the budget is $8.34, etc. If the second bidder responds, then the first bidder will immediately go to the full amount of the mutual budget, because that's where all dollar auctions will eventually end up anyway. Because he has bid, it's worthwhile for him to go all-out to win the auction. The second bidder has no such incentive; his only options are to lose or to spend more than a dollar to get a dollar. So a rational second bidder will give in and let the first bidder have it for the cost of the initial bid. The other scenario is if both have budgets and the budgets differ: In that case, the bidder with the higher budget bids one cent. Having the larger budget, he can afford to outbid the other guy, and it's the same scenario as above: The second bidder knows he will lose, so he might as well give in without bidding. In the real world, of course, it's very rare to know exactly what the other guy can afford, so such situations rarely arise. Lacking perfect information, the Dollar Auction is a sucker game. That's usually the key to these games: Information. To get the best result, you need to know what the other guy intends to do. The trick is to find the right strategy if you don't know the other guy's plan.
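The two scenarios above can be written down as a small decision rule. This is only a transcription of the rule as stated here, under the same assumptions (rational players, a one-dollar prize, bids in whole cents, each bidder knowing both budgets exactly); it is not the full published analysis, and the function name is hypothetical. The text gives no advice to a first bidder with the smaller budget, so the sketch returns `None` there; for equal whole-dollar budgets the stated rule yields 0.

```python
def opening_bid(budget_first, budget_second):
    """Opening bid, in cents, for whoever bids first in a one-dollar auction.

    Transcribes the rule stated in the text; assumes rational bidders,
    whole-cent bids, and that each side knows both budgets exactly.
    """
    if budget_first == budget_second:
        # Equal budgets: bid the cents part of the budget (e.g. 166 -> 66).
        # For a whole-dollar budget this gives 0; the text does not cover that case.
        return budget_first % 100
    if budget_first > budget_second:
        # Larger budget: one cent suffices, since the rival cannot outlast you.
        return 1
    # Smaller budget: the text describes no winning opening, so stay out.
    return None


print(opening_bid(166, 166))  # 66
print(opening_bid(834, 834))  # 34
print(opening_bid(500, 300))  # 1
```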

The Dollar Auction is not the only auction game where the winner can regret winning. Binmore, p. 115, notes the interesting case of oil leases, where each bidder makes a survey in advance to try to determine the amount of oil in the field -- but the surveys are little more than educated guesses. The bidders probably get a list of estimates which varies widely, and they make their bids accordingly. The winning bidder will probably realize, once he wins, that the estimate he was working from was probably the most optimistic -- and hence likely to be too high. So by winning, he knows he has won a concession that probably isn't worth what he bid for it!
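The oil-lease effect (usually called the "winner's curse") shows up in a quick simulation. This is a sketch under loudly simplified assumptions: every bidder's survey is the true value plus independent, normally distributed error, and every bidder simply bids his own estimate. The true value, the number of bidders, and the size of the error are arbitrary illustrative numbers.

```python
import random
import statistics


def winners_curse(true_value=100.0, bidders=5, noise=20.0, trials=10_000, seed=1):
    """Average estimate held by the winner of a common-value auction.

    Assumed model: each bidder's survey = true value + Gaussian error,
    each bids his own estimate, and the highest estimate wins.
    """
    rng = random.Random(seed)
    winning_estimates = []
    for _ in range(trials):
        estimates = [rng.gauss(true_value, noise) for _ in range(bidders)]
        winning_estimates.append(max(estimates))
    return statistics.mean(winning_estimates)


print(round(winners_curse(), 1))             # well above the true value of 100
print(round(winners_curse(bidders=1), 1))    # a lone bidder is right on average
```

Each individual survey is unbiased, yet the winning survey is systematically too high, and the more bidders there are, the larger the overestimate tends to be.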

DIGRESSION: I just read a biology book which relates the Nash Equilibrium to animal behavior -- what are called "Evolutionarily Stable Strategies," though evolution plays no necessary part in them: They are actually strategies which maintain stable populations. The examples cited had to do with courtship displays, and parasitism, and such. The fact that the two notions were developed independently leads to a certain confusion. Obviously the Nash Equilibrium is a theoretical concept, while the evolutionarily stable strategy (ESS) is regarded as "real world." Then, too, the biologists' ESSs are simply equilibria determined mostly by trial and error using rather weak game theory principles -- although Davis, p. 140, observes that there is a precise definition of an evolutionarily stable strategy relative to another strategy. Often the ESS is found by simulation rather than direct calculation. There is, to be sure, nothing wrong with that, except that the simulation can settle on an equilibrium other than the Nash Equilibrium -- a Nash Equilibrium is deliberately chosen, which the biological states aren't. So sometimes they go a little off-track.
More to the point, an ESS is genetically determined, and an ESS can be a mixed strategy (the classic example of this is considered to be "hawk" and "dove" mating behavior -- hawks fight hard for mates, and get more mates but also die younger because they fight so much; doves don't get as many mates per year but survive to breed another day. So hawks and doves both survive). Because the strategy is mixed, and because genes get shuffled in every generation, the number of individuals of each type can get somewhat off-balance. Game theory can be used to determine optimal behavior strategies, to be sure -- but there are other long-term stable solutions which also come up in nature despite not representing true Nash Equilibria. I haven't noticed this much in the game theory books. But many sets of conditions have multiple equilibria: One is the optimal equilibrium, but if the parties are trying to find it by trial and error, they may hit an alternate equilibrium point -- locally stable while not the ideal strategy. Alternately, because of perturbations of one sort or another, an equilibrium situation can also cycle around the Nash equilibrium. This is particularly true when the opponents are separate species, meaning that DNA cannot cross. If there is only one species involved, the odds of a Nash Equilibrium are highest, since the genes can settle down to de facto cooperation. With multiple species, it is easy to settle into a model known as "predator-prey," which goes back to differential equations and predates most aspects of game theory. To understand predator-prey, think, say, foxes and hares. There is some stable ratio of populations -- say, 20 hares for each fox. If the number of foxes gets above this ratio for any reason, they will eat too many hares, causing the hare population to crash. With little food left, the fox population then crashes. The hares, freed of predation by foxes, suddenly become free to breed madly, and their population goes up again.
Whereupon the fox population starts to climb. In a predator-prey model, you get constant oscillation, such as shown in the graph -- in this case, the foxes are going through their cycle with something of a lag behind the hares. It's an equilibrium of a different sort. This too can be stable, as long as there is no outside disturbance, though there is a certain tendency for the oscillation to damp toward the Nash Equilibrium. But, because there are usually outside disturbances -- a bad crop of carrots, people hunting the foxes -- many predator-prey scenarios do not damp down. It probably needs to be kept in mind that these situations can arise as easily as pure equilibrium situations, even though they generally fall outside the range of pure game theory.
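The fox-and-hare cycle is usually modelled with the Lotka-Volterra equations, the classic predator-prey differential equations (the text does not name them). A minimal sketch, integrating them numerically with the fourth-order Runge-Kutta method; the rate constants are arbitrary choices that put the equilibrium at 20 hares per fox, as in the example, and populations are treated as continuous quantities.

```python
def lotka_volterra(h0, f0, alpha=1.0, beta=1.0, delta=0.05, gamma=1.0,
                   dt=0.01, steps=3000):
    """Integrate the Lotka-Volterra equations with fourth-order Runge-Kutta.

    dH/dt = alpha*H - beta*H*F   (hares breed, and are eaten)
    dF/dt = delta*H*F - gamma*F  (foxes grow by eating hares, and die off)
    Equilibrium: H = gamma/delta, F = alpha/beta (20 hares, 1 fox here).
    """
    def deriv(h, f):
        return alpha * h - beta * h * f, delta * h * f - gamma * f

    hares, foxes = [h0], [f0]
    h, f = h0, f0
    for _ in range(steps):
        k1h, k1f = deriv(h, f)
        k2h, k2f = deriv(h + dt * k1h / 2, f + dt * k1f / 2)
        k3h, k3f = deriv(h + dt * k2h / 2, f + dt * k2f / 2)
        k4h, k4f = deriv(h + dt * k3h, f + dt * k3f)
        h += dt * (k1h + 2 * k2h + 2 * k3h + k4h) / 6
        f += dt * (k1f + 2 * k2f + 2 * k3f + k4f) / 6
        hares.append(h)
        foxes.append(f)
    return hares, foxes


def first_peak(series):
    """Index of the first local maximum in a list of numbers."""
    for i in range(1, len(series) - 1):
        if series[i - 1] < series[i] >= series[i + 1]:
            return i
    return None


hares, foxes = lotka_volterra(h0=20.0, f0=0.5)
print("hares peak at step", first_peak(hares))
print("foxes peak at step", first_peak(foxes))
```

Starting with hares at their equilibrium level and foxes below theirs, the hare population peaks first and the fox population peaks later, which is the lag described above.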
The predator-prey scenario of cycling populations has many other real-world analogies, for example in genetic polymorphisms (the tendency for certain traits to exist in multiple forms, such as A-B-O blood types or blue versus brown eyes; see the article on evolution and genetics). Take A-B-O blood, for example. Blood types A, B, and AB confer resistance to cholera, but vulnerability to malaria; type O confers resistance to malaria but vulnerability to cholera. Suppose we consider a simplified situation where the only blood types are A and O. Then comes a cholera outbreak. The population of type O blood is decimated; A is suddenly dominant -- and, with few type O individuals to support it, the cholera fades out with no one to infect. But there are many type A targets available for malaria to attack. Suddenly the population pressure is on type A, and type O is free to expand again. It can become dominant -- and the situation will again reverse, with type A being valuable and type O undesirable. This is typically the way polymorphisms work: Any particular allele is valued because it is rare, and will tend to increase until it ceases to be rare. In the long run, you end up with a mixed population of some sort.
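What the hawk/dove mix and the blood-type example share is that a type does best when it is rare, so the population settles at a mixture. A minimal sketch of that, using the standard textbook hawk-dove payoffs rather than anything given in the text: contestants compete for a prize worth `value`; two hawks fight and each expects `(value - cost) / 2`; a hawk facing a dove takes the whole prize; two doves share it. The discrete replicator update, the payoff numbers, and the step size are all illustrative assumptions.

```python
def hawk_share_after(p, value=2.0, cost=4.0, rate=0.1, generations=500):
    """Follow the share of hawks in a population under replicator dynamics.

    Assumed textbook hawk-dove payoffs:
      hawk vs hawk: (value - cost) / 2   hawk vs dove: value
      dove vs hawk: 0                    dove vs dove: value / 2
    A type's share grows when its average payoff beats the other type's.
    """
    for _ in range(generations):
        hawk_payoff = p * (value - cost) / 2 + (1 - p) * value
        dove_payoff = (1 - p) * value / 2
        p += rate * p * (1 - p) * (hawk_payoff - dove_payoff)
    return p


# When a fight costs more than the prize is worth, hawks settle at value/cost.
print(round(hawk_share_after(0.1), 3))  # 0.5
print(round(hawk_share_after(0.9), 3))  # 0.5
```

Whether hawks start rare or common, their share drifts to value/cost (here one half): the rare type always has the advantage, so neither takes over.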
This discussion could be much extended. Even if you ignore polymorphisms and seek an equilibrium, biologists and mathematicians can't agree on whether the ESS or the Nash Equilibrium is the more fundamental concept. I would argue for the Nash Equilibrium, because it's a concept that can apply anywhere (e.g. it has been applied to economics and even international politics). On the other hand, the fact that one can have an ESS which is not a Nash Equilibrium, merely an equilibrium in a particular situation, gives it a certain scope not found in the more restricted Nash concept. And it generally deals with much larger populations, rather than two parties with two strategies.
It should also be recalled that, in biology, these strategies are only short-term stable. In the long term (which may be only a few generations), evolution will change the equation -- somehow. The hare might evolve to be faster, so it's easier to outrun foxes. The fox might evolve better smell or eyesight, so as to more easily spot hares. This change will force a new equilibrium (unless one species goes extinct). If the hare runs faster, so must the fox. If the fox sees better, the hare needs better disguise. This is called the "Red Queen's race" -- everybody evolving as fast as they possibly can just to stay in the same equilibrium, just as the Red Queen in Through the Looking Glass had to run as fast as she could to stay in the same place. It is, ultimately, an arms race with no winners; everybody has to get more and more specialized, and devote more and more energy to the specialization, without gaining any real advantage. But the species that doesn't evolve will go extinct, because the competition is evolving. Ideally, of course, there would be a way to just sit still and halt the race -- but nature doesn't allow different species to negotiate.... It is one of the great tragedies of humanity that we've evolved a competitive attitude in response to this ("I don't have to run faster than a jaguar to avoid getting killed by a jaguar; I just have to run faster than you"). We don't need to be so competitive any more; we've surpassed all possible predators. But, as I write this, Israelis and members of Hezbollah are trying to show whose genes are better in Lebanon, and who cares about the civilians who aren't members of either tribe's gene pool?

Let's see, where was I before I interrupted myself? Ah, yes, having information about what your opponent's strategy is likely to be. Speaking of knowing what the other guy intends to do, that takes us to the most famous game in all of game theory, the "Prisoner's Dilemma." There are a zillion variations on this -- it has been pointed out that it is, in a certain sense, a "live-fire" version of the Golden Rule. (Although, as Davis points out on p. 118, under the Golden Rule, there is an underlying assumption that both you and your neighbour are part of a single community -- which makes it a different game.) Dawkins, p. 203, declares that "As a biologist, I agree with Axelrod and Hamilton that many wild animals and plants are engaged in ceaseless games of Prisoner's Dilemma, played out in evolutionary time." It's so well-known that most books don't even describe where it came from, although Davis, p. 109, attributes the original version to one A. W. Tucker.

What follows is typical of the way I've encountered the game, with a fairly standard set of rules.

Two members of a criminal gang are taken into custody for some offence -- say, passing counterfeit money. The police can't prove that they did the counterfeiting; only that they passed the bills. Not really a crime if they are innocent of creating the forged currency. The police need someone to talk. So they separate the two and make each one an offer: Implicate the other guy, and you get a light sentence. Don't talk, and risk a heavy sentence.

A typical situation would be this: If neither guy talks, they both get four years. If both talk, they both get six years in prison. If one talks and the other doesn't, the one who talks gets a two year term and the one who kept his mouth shut gets ten years in prison.

Now, obviously, if they were working together, the best thing to do is for both to keep their mouths shut. If they do, both get off lightly.

But this is post-Patriot Act America, where they don't just shine the lights in your eyes but potentially send you to Guantanamo and let you rot without even getting visits from your relatives. A scary thought. And the two can't talk together. Do you really want to risk being carted off for years -- maybe forever -- on the chance that the other guy might keep his mouth shut?

Technically, if you are playing Prisoner's Dilemma only once, as in the actual prison case outlined, the optimal strategy is to condemn the other guy. If you talk, you get either two years (if he keeps quiet) or six (if he talks too), an average of four years in prison. If you keep your mouth shut, you get either four years or ten, an average of seven years of imprisonment. And whichever choice he makes, talking leaves you with the shorter sentence.
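The arithmetic can be checked mechanically. A short sketch using the four sentences listed above; the averages assume (as averaging implies) that the other prisoner is equally likely to talk or stay silent, and the dictionary and function names are illustrative.

```python
# Years in prison for (my choice, his choice); lower is better.
YEARS = {
    ("talk", "talk"): 6,
    ("talk", "silent"): 2,
    ("silent", "talk"): 10,
    ("silent", "silent"): 4,
}
CHOICES = ("talk", "silent")


def average_years(mine):
    """Average sentence if the other prisoner is equally likely to do either."""
    return sum(YEARS[(mine, his)] for his in CHOICES) / len(CHOICES)


def best_response(his):
    """The choice that minimizes my sentence, given the other prisoner's choice."""
    return min(CHOICES, key=lambda mine: YEARS[(mine, his)])


for mine in CHOICES:
    print(mine, average_years(mine))
# talk 4.0
# silent 7.0
print(best_response("talk"), best_response("silent"))
# talk talk
```

The second line of output is the heart of the dilemma: talking is the best response whatever the other prisoner does.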

This is really, really stupid in a broader sense: Simply by refusing to cooperate, you are exposing both yourself and your colleague to a greater punishment. But, without communication, it's your best choice: The Nash Equilibrium for one-shot Prisoner's Dilemma is to have both players betray each other.

This is the "story" version of Prisoner's Dilemma. You can also treat it simply as a card game. It's a very dull game, but you can do it. You need two players, four cards, and a bank. Each player has a card reading "Cooperate" and a card reading "Defect." Each picks a card; they display them at the same time. If one player chooses "Cooperate" and the other chooses "Defect," the cooperator gets payoff A, the defector payoff B. If each plays "Cooperate," they get payoff C. If both defect, each gets payoff D. The payoffs A, B, C, and D are your choice, except that it must be true that B > C > D > A. (If the payoffs are in any other order, the game is not a Prisoner's Dilemma but something else.)
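The card version lends itself to a brute-force check for equilibria. A minimal sketch, assuming payoffs are winnings (bigger is better, unlike prison years) and using made-up values that satisfy B > C > D > A; a cell counts as a (pure-strategy) Nash Equilibrium if neither player could do better by switching only his own card.

```python
from itertools import product

MOVES = ("Cooperate", "Defect")


def payoff_table(A, B, C, D):
    """Payoffs (row player, column player) for the card-game version."""
    return {
        ("Cooperate", "Cooperate"): (C, C),
        ("Cooperate", "Defect"): (A, B),
        ("Defect", "Cooperate"): (B, A),
        ("Defect", "Defect"): (D, D),
    }


def is_prisoners_dilemma(A, B, C, D):
    return B > C > D > A


def pure_nash_equilibria(table):
    """Cells where neither player gains by switching his own card."""
    equilibria = []
    for row, col in product(MOVES, repeat=2):
        mine, theirs = table[(row, col)]
        row_ok = all(table[(r, col)][0] <= mine for r in MOVES)
        col_ok = all(table[(row, c)][1] <= theirs for c in MOVES)
        if row_ok and col_ok:
            equilibria.append((row, col))
    return equilibria


A, B, C, D = 0, 5, 3, 1  # hypothetical values satisfying B > C > D > A
print(is_prisoners_dilemma(A, B, C, D))                # True
print(pure_nash_equilibria(payoff_table(A, B, C, D)))  # [('Defect', 'Defect')]
```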

Now suppose you're one player. Should you choose cooperate or defect? Mutual cooperation beats mutual defection (C > D), so that's what you would choose if you knew your opponent would do the same. But you don't know what he is going to do. If you choose cooperate and he chooses defect, you're you-know-whated. If you choose defect, at the very least, you won't get the sucker payoff and won't come in last. So your best choice is to defect. Once again we see that the Nash Equilibrium of this game (dull as it is when played with cards) is to have both players betray each other.

Which mostly shows why game theory hasn't caused the world economy to suddenly work perfectly: It's too cruel. This problem has caused many attempts to explain away the dilemma. Indeed, for years people thought that someone would find a way to get people to play the reasonable strategy (nobody talks) rather than the optimal strategy (both talk). Davis, p. 113, notes that most of the discussion of Prisoner's Dilemma has not denied the result but has tried to justify overturning the result. The failure to find a justification for cooperation has produced diverse reactions: One claim was that the only way the universe can operate is if it's "every individual for himself" -- John von Neumann indeed interpreted the results to say that the United States should have started World War III, to get the Soviets before they could attack the west. At the far extreme is Binmore, who (p. 19) calls Prisoner's Dilemma a "toy game" and says that it cannot model the real world because if it were actually correct, then social cooperation could not have evolved.

It's not really relevant to textual criticism, but it seems nearly certain that both views are wrong. Biologists have shown that it can often benefit an individual's genes to help one's relatives -- in other words, to contribute to the social group. It's not one prisoner against the world, it's one group against the world. The trick is to define the group -- which is where wars start. (The parable of the Good Samaritan in fact starts with a version of this question: "who is my neighbour?") Conservatives generally define their group very restrictively (opposing immigration and welfare, e.g.), liberals much more loosely (sometimes including the whole human race, even if it means giving welfare to people who won't do any work). In fact it is a worthwhile question to ask what is the optimal amount of altruism required to produce the greatest social good. But, somehow, every politician I've ever heard has already decided that he or she knows the right answer and isn't interested in actual data....

But that's going rather beyond the immediate purview of game theory. If we go back to the game itself, the bottom line is, there is no way, in Prisoner's Dilemma, to induce cooperation, unless one rings in an additional factor such as collective interest or conscience. (This requires, note, that we "love our neighbours as ourselves": It works only if helping them profit is something we value. But this is actually to change the rules of the game.) The closest one can come, in Prisoner's Dilemma, is if the game is played repeatedly: If done with, say, a payoff instead of punishment, players may gradually learn to cooperate. This leads to the famous strategy of "tit for tat" -- in an effort to get the other guy to cooperate, you defect in response to his defection, and cooperate in response to his cooperation (obviously doing it one round later).
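Tit for tat is easy to express as a rule that looks only at the opponent's last move. A minimal sketch of repeated play, assuming ten rounds and hypothetical point payoffs (3 each for mutual cooperation, 1 each for mutual defection, 5 and 0 when one defects on a cooperator); the helper names are made up for illustration.

```python
def tit_for_tat(my_history, their_history):
    """Cooperate first; afterwards copy the opponent's previous move."""
    return "C" if not their_history else their_history[-1]


def always_defect(my_history, their_history):
    return "D"


def always_cooperate(my_history, their_history):
    return "C"


# Hypothetical payoffs in points, ordered B > C > D > A as the game requires.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}


def play(strategy_a, strategy_b, rounds=10):
    """Play repeated Prisoner's Dilemma; return both totals and both move records."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        gain_a, gain_b = PAYOFF[(move_a, move_b)]
        score_a += gain_a
        score_b += gain_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b, "".join(hist_a), "".join(hist_b)


print(play(tit_for_tat, always_cooperate))  # (30, 30, 'CCCCCCCCCC', 'CCCCCCCCCC')
print(play(tit_for_tat, always_defect))     # (9, 14, 'CDDDDDDDDD', 'DDDDDDDDDD')
```

Against an unconditional cooperator it cooperates throughout; against an unconditional defector it is exploited only in the first round before retaliating.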

Binmore, p. 20, makes an important point here: Don't get caught up in the description of two prisoners under interrogation. That is not the game. It's a story. The actual game is simply a set of two strategies for each of two players, with four possible outcomes and payoffs in the order described above.

Before we proceed, we should note that the motivations for repeated Prisoner's Dilemma are very different from a one-shot game. If you play Prisoner's Dilemma repeatedly, you are looking for (so to speak) an "investment strategy" -- the best payoff if you play repeatedly. In such a case, successes and failures may balance out. Not if you play only once -- there, you may well play the strategy that has the fewest bad effects if you lose.

In effect, playing Prisoner's Dilemma repeatedly creates a whole new game. Where one-round Prisoner's Dilemma has only a single decision -- cooperate or defect -- multi-round has a multi-part strategy: You decide what to do on the first round (when you have no information on what the other guy does), and then, in every round after that, once you have gained information, you decide what to do based on the other guy's previous moves. And Binmore observes that, in repeated Prisoner's Dilemma, the Nash Equilibrium shifts to a strategy of cooperation. But, to repeat, this is a different game.

This is a very good demonstration of how adding only slightly to the complexity of the game can add dramatically to the complexity of the strategies. One-shot Prisoner's Dilemma has only four outcomes: CC, CD, DC, DD. But the multi-part game above has 64 combinations of strategies even when each player looks back only one turn, because each strategy must specify a first move, a reply to cooperation, and a reply to defection. For player A, the strategies are as follows:
Cooperate on first turn; after that, cooperate whether B cooperated or defected on the previous turn
Cooperate on first turn; after that, cooperate if B cooperated on the previous turn, defect if B defected
Cooperate on first turn; after that, defect if B cooperated on the previous turn, cooperate if B defected
Cooperate on first turn; after that, defect whether B cooperated or defected on the previous turn
Defect on first turn; after that, cooperate whether B cooperated or defected on the previous turn
Defect on first turn; after that, cooperate if B cooperated on the previous turn, defect if B defected
Defect on first turn; after that, defect if B cooperated on the previous turn, cooperate if B defected
Defect on first turn; after that, defect whether B cooperated or defected on the previous turn

Since A and B both have 8 strategies, that gives us 64 possible pairings. And if they take into account two previous turns, then the number increases still more. The strategy with the highest payoff remains to have both cooperate -- but that doesn't give us a winner, merely a higher total productivity.
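The count of eight strategies and 64 pairings can be confirmed by listing them. A minimal sketch, assuming that a strategy consists of a first move, a reply to cooperation, and a reply to defection, each looking back only one turn:

```python
from itertools import product

MOVES = ("cooperate", "defect")

# One strategy = (first move, reply if B cooperated last turn, reply if B defected).
strategies = list(product(MOVES, repeat=3))
# Every way of matching one of A's strategies against one of B's.
pairings = list(product(strategies, repeat=2))

print(len(strategies))  # 8
print(len(pairings))    # 64
print(("cooperate", "cooperate", "defect") in strategies)  # True (tit for tat)
```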

Rapoport-Strategy, pp. 56-57, makes another point, about the game where the prisoners plan in advance to cooperate: That any sort of communication or cooperation works properly only if subject to enforceable agreement. That is, someone needs to make sure that the players do what they say they will do. If no one does, observe that you have simply moved the problem back one level of abstraction: It's no longer a question of whether they cooperate or defect, but a question of whether they do what they say they will do or lie about it. Davis, p. 106, describes a game in which players who communicate can make an agreement to maximize their reward, but in which it is not possible to predict what that agreement will be; it depends in effect on how they play a secondary game -- negotiation.

In any case, this too is a change in the rules of the game. Remember, in our original version, the prisoners could not communicate at all. And, to put this in the context of textual criticism, how do you enforce an agreement between a dead scribe and a living critic? They can't even communicate, which is a key to agreements!

(This, incidentally, leads us to a whole new class of game theory and economics issues: The "agent problem." There are many classes of game in which there are two players and an intermediary. An example offered by Rapoport-Strategy is the case of someone who is seeking to sell a house, and another person seeking to buy. Suppose the seller is willing to sell for, say, $200,000, and the buyer is willing to buy a house like the seller's for as much as $250,000. The logical thing would be to sell the house for, say, $225,000. That gives the seller more than he demanded, and the buyer a lower price than he was willing to pay; both are happy. Indeed, any solution between $200,000 and $250,000 leaves both satisfied. But they cannot sell to each other. Between them stands a realtor -- and if the realtor doesn't like the $225,000 offer, it won't get made. The agent controls the transaction, even though it is the buyer and the seller who handle most of the money. Problems of this type are extremely common; agents are often the experts in a particular field -- investment fund managers are a typical example. In some cases, the agent facilitates an agreement. But in others, the agent can distort the agreement badly.)

(This also reminds us of the problem of "rational expectations." We got at this above with the tennis example: What people do versus what they ought to do. Much of economics is based on the hypothesis that people pursue the rational course -- that is, the one that is most likely to bring them the highest payoff. But, of course, people's behavior is not always rational. Advertising exists primarily to cause irrational behavior, and individual likes and dislikes can cause people to pursue a course which is officially irrational -- and, in any case, most of the time most of us do not know enough to choose the rational course. Hence we employ agents. And hence the agent problem.)

(As a further digression, the above is another example of how the Nash equilibrium comes about: It's the point that maximizes satisfaction. Define the seller's satisfaction as the amount he gets above his minimum $200,000. For simplicity, let's write that as 200, not $200,000. If his satisfaction is given as x, where x is the number of thousands of dollars above 200, then the buyer's satisfaction is given as 50-x. We take the product of these -- x(50-x) -- and seek to maximize this. Since x(50-x) = 625-(x-25)², the product is largest at x=25, so that our intuitive guess -- that the ideal selling price was $225,000 -- was correct. It should be noted, however, that this situation is far from guaranteed. We hit agreement at the halfway point in this case because we described a "seller's market" and a situation where both players were simply counting dollars. Not all bargaining situations are of this sort. Consider for instance the "buyer's market." In that situation, the buyer wants the best deal possible, but the seller may well have a strong irrational urge to get as close to the asking price as possible. The lower the price, the more firmly the seller resists. Suppose that we reverse our numbers: The seller listing the home for $250,000, and the buyer making an initial offer of $200,000. If both had the same psychological makeup with regard to money, they would settle on $225,000, as above. But, since the seller really wants something close to his list price, we're likely to end up at a figure closer to $240,000. Exactly where depends, of course, on the makeup of the individuals. Maybe we can express it mathematically, maybe not. That's what makes it tricky....)

All the problems here do lead to an interesting phenomenon known as Nash Bargaining. In a non-zero-sum game, there is a simple mathematical way to determine the "right" answer for two parties. It lies in determining the point at which the product of the two players' utility functions is at maximum. Of course, determining utility functions is tricky -- one of the reasons why economics struggles so much. It is easier to envision this purely in terms of money, and assume both parties value money equally. Take, say, the Ultimatum game above, where the two have to split $100. If one player ends up with x dollars, then the other gets 100-x dollars. So we want to choose x so that x(100-x) has the maximum value. It isn't hard to show that this is at x=$50, that is, each player gets $50. So the product of their utilities is 50 × 50 = 2500. If they split $40/$60, the product would be 40 × 60 = 2400, so this is a less fair split. And so forth.
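The Nash Bargaining rule (maximize the product of the two utilities) can be applied by simple search. A minimal sketch, assuming whole-number splits and, as the text does, utilities equal to money; the function names are hypothetical. The same function covers the $100 Ultimatum split and the house sale, measured in thousands of dollars above the seller's $200,000 minimum.

```python
def nash_bargain(total, utility_a, utility_b):
    """Return the whole-number share x for player A (B gets total - x)
    that maximizes utility_a(x) * utility_b(total - x)."""
    return max(range(total + 1), key=lambda x: utility_a(x) * utility_b(total - x))


def money(x):
    # Assumption, as in the text: both players value money equally.
    return x


print(nash_bargain(100, money, money))  # 50  ($100 Ultimatum split)
print(nash_bargain(50, money, money))   # 25  (house: $25,000 above $200,000)
```

Searching over whole amounts keeps the sketch free of calculus; for utilities that are not simple products of money, the same search still works where a closed-form answer may not exist.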

The above answer is intuitively obvious. But there are examples which aren't so easy. For example, suppose a husband and wife have a combined income of $2000 per week. Should they each just take $1000 and spend it as they wish? Not necessarily. Let's say, for instance, that the husband has to travel a long distance to work each day; this costs him $25 per week. The wife spends only $5 on work-related travel per week, but she needs a larger wardrobe and has to spend $100 for clothing. She also buys the week's food, and that costs another $100. The man, they have concluded, must pay the mortgage, which works out to $200 per week. The man wants cable television -- $20 per week -- but the wife does not particularly care. The wife earns 20% more per week than the husband, so it is accepted that her share of the available spending money should be rather more than his, although not necessarily 20% more, because of the cable TV argument. So if x is the number of dollars (out of $2000) given to the husband, and 2000-x is the amount given to the wife, what is the optimal x? Not nearly as obvious as in the even-split case, is it? Here is how we would set this up. The husband's actual cash to spend is x-$25-$200, or x-$225. The wife's is ($2000-x)-$100-$100-$5, or $1795-x. However, in light of the above, the "value" of a dollar is 1.2 times as much to the husband as to the wife, but he has to subtract $20 from his post-division amount for cable TV. So the man's utility function is 1.2(x-$225)-$20, which simplifies down to 1.2x-$290. The woman's is $1795-x. So we want to find the value of x which gives us the maximum value for (1.2x-290)(1795-x), or 2154x-1.2x²-520550+290x, or -1.2x²+2444x-520550. x must, of course, be between 0 and 2000. The maximum falls at x = 2444/(2 × 1.2), about 1018.33; rounding to the nearest dollar, the man gets $1018 per week, the woman $982.
This seems rather peculiar -- we said that they agreed that the wife should get more -- but remember that the man pays slightly more of the family bills, plus money is worth more to him. So his extra spending money is of greater value. The oddity is not in the result, it's in the utility function. Which is why Nash bargaining sounds easy in concept but rarely works out so easily.
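The husband-and-wife arithmetic is easy to get wrong by hand, so here is a check by brute force over every whole-dollar split. The two utility functions are exactly the ones set up above; only the code structure is an addition.

```python
def husband_utility(x):
    # Spending money after $25 travel and $200 mortgage, each dollar worth 1.2,
    # less $20 for cable TV: 1.2(x - 225) - 20 = 1.2x - 290.
    return 1.2 * (x - 225) - 20


def wife_utility(x):
    # Her share is 2000 - x, less $100 clothing, $100 food, $5 travel.
    return (2000 - x) - 100 - 100 - 5


# Nash bargaining: pick the whole-dollar split with the largest product.
best = max(range(0, 2001), key=lambda x: husband_utility(x) * wife_utility(x))
print(best, 2000 - best)  # 1018 982

# Calculus gives the same point: the vertex of -1.2x^2 + 2444x - 520550.
print(round(2444 / (2 * 1.2), 2))  # 1018.33
```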

(If the above seems too complicated to understand, Binmore, pp. 146-147, has a simpler version involving a husband and wife and how they divide housework. The husband thinks that one hour of housework a day is sufficient. The wife thinks two hours per day are required. How much housework does each do to share the job "fairly"? It turns out that it is not an even split; rather, the husband does only half an hour a day; the wife does an hour and a half. This is perhaps more intuitive than the preceding: The husband does exactly half of what he feels needs to be done. The wife does everything beyond that which she thinks needs to be done. It is not an even split. From her standpoint, it probably is not equitable. But at least the husband is doing some of the work. So both are better off than if they just fought about it.)

Problems in utility and psychology help explain why playing a game like Prisoner's Dilemma repeatedly doesn't work perfectly in real life. (And, yes, it does happen in real life -- Davis, p. 110, gives the example of two companies who have the option of engaging in an advertising war. If both advertise lightly, they split the market. If one advertises heavily and the other lightly, the advertiser wins big. But if they both advertise heavily -- they still split the same market, but their advertising costs are higher and their profits lower. Since they have to decide advertising budgets year by year, this is precisely an iterated version of Prisoner's Dilemma. The optimal strategy is to have both advertise lightly, year after year; the equilibrium strategy is to advertise heavily.)

In theory, after a few rounds, players should always cooperate -- and in fact that's what happens with true rational players: Computer programs. Robert Axelrod once held a series of Prisoner's Dilemma "tournaments," with various programmers submitting strategies. "Tit for tat" (devised, incidentally, by Rapoport) was the simplest strategy -- but it also was the most successful, earning the highest score when playing against the other opponents.

It didn't always win, though -- in fact, it could never outscore any given opponent head-to-head; it was in effect playing for a tie. But ties are a good thing in this contest -- the highest scores consistently came from "nice" strategies (that is, those that opened by cooperating; Davis, p. 147). On the other hand, there were certain strategies which, though they didn't really beat "tit for tat," dramatically lowered its score. (Indeed, "tit for tat" had the worst score of any strategy when competing against an opponent who simply made its decisions at random; Davis, p. 148.) When Axelrod created an "evolutionary" phase of the contest, eliminating the weakest strategies, he found that "Tit for tat" was the likeliest to survive -- but that five others among the 63 strategies were also still around when he reached the finish line, although they had not captured as large a share of the available "survival slots" (cf. Binmore, p. 81).

What's more, if you knew the actual strategies of your opponents, you could write a strategy to beat them. In Axelrod's first competition, "Tit for tat" was the clear winner -- but Axelrod showed that a particular strategy which was even "nicer" than "Tit for tat" would have won that tournament had it been entered.

(The contests evolved a particular and useful vocabulary, with terms such as "nice," "forgiving," and "envious." A "nice" strategy started out by cooperating; this compared with a "nasty" strategy which defected on the first turn. A strategy could also be "forgiving" or "unforgiving" -- a forgiving strategy would put up with a certain amount of defecting. An "envious" strategy was one which wanted to win. "Tit for tat" was non-envious; it just wanted to secure the highest total payout. The envious strategies would rather go down in flames than let someone win a particular round of the tournament. If they went down with their opponents, well, at least the opponent didn't win.) In the initial competition, "Tit for tat" won because it was nice, forgiving, and non-envious. A rule that was nicer or more forgiving could potentially have done even better.
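A toy round robin makes the "playing for a tie" point concrete. This is not a reconstruction of Axelrod's tournaments: the four entrants, the 200-round match length, and the payoff values (3 for mutual cooperation, 1 for mutual defection, 5 and 0 when one player defects on the other, the figures usually quoted for Axelrod's contests) are assumptions for illustration. In this small field, tit for tat never outscores the opponent in front of it, yet finishes first on total points.

```python
from itertools import combinations_with_replacement

# Assumed payoffs, (first player's points, second player's points), indexed by
# (first player's move, second player's move).
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}


def tit_for_tat(mine, theirs):
    # Cooperate on the first move, then copy the opponent's previous move.
    return theirs[-1] if theirs else "C"


def tit_for_two_tats(mine, theirs):
    # Defect only after two consecutive defections by the opponent.
    return "D" if theirs[-2:] == ["D", "D"] else "C"


def always_defect(mine, theirs):
    return "D"


def always_cooperate(mine, theirs):
    return "C"


def play(strat_a, strat_b, rounds=200):
    """Play one iterated match and return both total scores."""
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        move_a = strat_a(hist_a, hist_b)
        move_b = strat_b(hist_b, hist_a)
        points_a, points_b = PAYOFF[(move_a, move_b)]
        score_a += points_a
        score_b += points_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


strategies = [tit_for_tat, tit_for_two_tats, always_defect, always_cooperate]

# Head to head, tit for tat at best ties: it never outscores its opponent.
for opponent in strategies:
    mine, theirs = play(tit_for_tat, opponent)
    print(f"tit_for_tat vs {opponent.__name__}: {mine} to {theirs}")

# Round robin in which each strategy also meets a copy of itself (counted once).
totals = {s.__name__: 0 for s in strategies}
for a, b in combinations_with_replacement(strategies, 2):
    score_a, score_b = play(a, b)
    totals[a.__name__] += score_a
    if a is not b:
        totals[b.__name__] += score_b
# -> [('tit_for_tat', 1999), ('tit_for_two_tats', 1998), ('always_cooperate', 1800), ('always_defect', 1612)]
print(sorted(totals.items(), key=lambda item: -item[1]))
```

Change the field and the ranking changes with it, which is the point made above about knowing your opponents' strategies.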

But then came round two. Everyone had seen how well "Tit for tat" had done, and either upped their niceness or tried to beat "Tit for tat." They failed -- though we note with interest that it was still possible to create a strategy that would have beaten all opponents in the field. But it wasn't the same strategy as the first time. Axelrod's "Tit for two tats," which would have won Round One, wouldn't even have come close in Round Two; the niceness which would have beaten all those nasty strategies in the first round went down to defeat against the nicer strategies of round two: It was too nice.

And humans often don't react rationally anyway -- they're likely to be too envious. In another early "field test," described in William Poundstone's Prisoner's Dilemma (Anchor, 1992, pp. 106-116), experimenters played 100 rounds of Prisoner's Dilemma between a mathematician and a non-mathematician. (Well, properly, a guy who had been studying game theory and one who hadn't.) The non-mathematician never did really learn how to cooperate, and defected much more than the mathematician, and in an irrational way: He neither played the equilibrium strategy of always defecting nor the common-sense strategy of always cooperating. He complained repeatedly that the mathematician wouldn't "share." The mathematician complained that the other fellow wouldn't learn. The outcome of the test depended less on strategy than on psychology. Davis, pp. 51-52, reports many other instances where subjects in studies showed little understanding of the actions of their opponents and pursued non-optimal strategies, and notes on p. 126 that the usual result of tests of Prisoner's Dilemma was for players to become more and more harsh over time (although outside communication between the players somewhat lessened this).

Davis goes on to look at experiments with games where there was no advantage for defecting -- and found that, even there, defection was common. It appears, from what I can tell, that the players involved (and, presumably, most if not all of humanity) prefer to be poor themselves as long as they can assure that the guy next door is poorer. Payoffs, if they are measured at all, are measured not by absolute wealth but by being wealthier than the other guy. And if that means harming the other guy -- well, it's no skin off the player's nose. (I would add that this behavior has been clearly verified in chimpanzees.) (As a secondary observation, I can't help but wonder if anyone tried to correlate the political affiliations of the experimental subjects with their willingness to play Beggar My Neighbour. Davis, p. 155, mentions a test which suggests that those who are the most liberal are somewhat more capable of cooperation than those who are most conservative. But this research seems to have been done casually; I doubt it is sufficient to draw strong conclusions.)

This sort of problem applies in almost all simple two-person games: People often don't seek optimal solutions. (See the Appendix for additional information on the other games.)

Which brings this back to textual criticism: Game theory is a system for finding optimal strategies for winning in the context of a particular set of rules -- a rule being, e.g., that a coin shows heads 50% of the time and that one of two players wins when two coins match. Game theory has proved that two-person zero-sum games with fixed rules and a finite number of possible moves do have optimal (possibly mixed) solutions. But what are the rules for textual criticism? You could state them as a series of canons -- e.g., perhaps, "prefer the shorter reading" might be given a "value" of 1, while "prefer the less Christological reading" might be given a value of 3. In such a case, you could create a system for mechanically choosing a text. And the New Testament is of finite length, so there are only so many "moves" possible. In that case, there would, theoretically, be an "optimal strategy" for recovering the original text.
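Once the canons and their weights are fixed, choosing a reading really is mechanical. A minimal sketch: the weights 1 and 3 are the illustrative values from the paragraph above, while the field names, the `score` and `choose` helpers, and the two variants are hypothetical. All of the difficulty lies in agreeing on `CANON_WEIGHTS`, which the code simply takes as given.

```python
# Hypothetical canon weights, using the example values from the text.
CANON_WEIGHTS = {
    "shorter": 1,              # "prefer the shorter reading"
    "less_christological": 3,  # "prefer the less Christological reading"
}


def score(reading: dict) -> int:
    """Sum the weights of every canon the reading satisfies."""
    return sum(w for canon, w in CANON_WEIGHTS.items() if reading.get(canon))


def choose(readings: list[dict]) -> dict:
    """Mechanically pick the highest-scoring reading (the first one wins ties)."""
    return max(readings, key=score)


# Two invented variants at a single place of variation.
variants = [
    {"text": "reading A", "shorter": True, "less_christological": False},
    {"text": "reading B", "shorter": False, "less_christological": True},
]
print(choose(variants)["text"])  # reading B (score 3 beats score 1)
```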

But how do you get people to agree on the rules?

Game theory is helpless here. This isn't really a game. The scribes have done whatever they have done, by their own unknown rules. The modern textual critic isn't playing against them; he is simply trying to figure them out.

It is possible, at least in theory, to define a scribe's goal. For example, it's widely assumed that a scribe's goal is to assure retaining every original word while including the minimum possible amount of extraneous material. This is, properly, not game theory at all but a field called "utility theory," but the two are close enough that they are covered in the same textbooks; utility theory is a topic underlying game theory. Utility theory serves to assign relative values to things not measured on a single scale. For example, a car buyer might have to choose between a faster (or slower) car, a more or less fuel-efficient car, a more or less reliable car, and a more or less expensive car. You can't measure speed in dollars, nor cost in miles/kilometers per hour; there is no direct way to combine these unrelated statistics into one value. Utility theory allows a combined calculation of "what it's worth to you."

In an ideal world, there is a way to measure utility. This goes all the way back to when von Neumann and Morgenstern were creating game theory. They couldn't find a proper measure of payoffs. Von Neumann proposed the notion of best outcome, worst outcome, and lottery: The best possible outcome in a game was worth (say) 100, and the worst was worth 0 to the player who earned it. To determine the value of any other outcome to the player, you offered lottery tickets with certain probabilities of winning. For example, would you trade outcome x for a lottery ticket which gave you a 20% chance of the optimal outcome (and otherwise the worst)? If you are exactly indifferent between the two, then the value of x is 20 "utiles." If you would not trade it for a ticket with a 20% chance of the optimal outcome, but would trade it for a ticket with a 30% chance, then its value lies somewhere between 20 and 30 utiles, and finer steps narrow it down. (The preceding is paraphrased from Binmore, p. 8.)
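The lottery questions amount to a search for the indifference point, which bisection finds quickly. A minimal sketch: `elicit_utility` and the simulated respondent (who privately values the outcome at 37 utiles) are hypothetical, and a real respondent's answers would of course be noisier than a function's.

```python
def elicit_utility(prefers_ticket, best=100.0, worst=0.0, tol=0.5):
    """Estimate an outcome's utility by von Neumann-style lottery questions.

    prefers_ticket(p) answers: "Would you trade the outcome for a ticket giving
    probability p of the best outcome (and otherwise the worst)?"
    Bisection narrows p until the answer flips, i.e. near indifference.
    """
    low, high = 0.0, 1.0
    while (high - low) * (best - worst) > tol:
        p = (low + high) / 2
        if prefers_ticket(p):
            high = p  # ticket already good enough: indifference point is lower
        else:
            low = p   # ticket not good enough: indifference point is higher
    return worst + (low + high) / 2 * (best - worst)


def respondent(p):
    # Hypothetical player who privately values the outcome at 37 utiles.
    return 100 * p >= 37


print(round(elicit_utility(respondent)))  # 37
```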

Unfortunately, this has two problems. One is that not everyone agrees on utility. The other is that some people are natural gamblers and some prefer a sure thing. So the lottery ticket analogy may not in fact measure inherent utility. I bring this up because it shows that all utility equations are personal. I don't drive fast, so I don't care about a fast car, but I do care about good gas mileage. But you may be trying to use your car to attract girls (or teenage boys, if you're a Catholic priest). Utility for cars varies.
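How personal weights change the answer can be shown with the car example. This sketch assumes an additive weighted sum, a common simplification that is not the lottery method described above; the two cars, their 0-10 ratings, and both buyers' weights are invented.

```python
# Invented 0-10 ratings for two hypothetical cars on incommensurable attributes.
CARS = {
    "roadster": {"speed": 9, "economy": 3, "reliability": 5, "cheapness": 4},
    "hatchback": {"speed": 4, "economy": 9, "reliability": 7, "cheapness": 8},
}


def utility(car: dict, weights: dict) -> float:
    """Additive utility: a personal weighted sum of the attribute ratings."""
    return sum(weights[attr] * rating for attr, rating in car.items())


def favourite(weights: dict) -> str:
    """Return the car with the highest utility for this buyer's weights."""
    return max(CARS, key=lambda name: utility(CARS[name], weights))


slow_driver = {"speed": 0.0, "economy": 0.5, "reliability": 0.3, "cheapness": 0.2}
speed_fan = {"speed": 0.8, "economy": 0.0, "reliability": 0.1, "cheapness": 0.1}
print(favourite(slow_driver), favourite(speed_fan))  # hatchback roadster
```

Same cars, same ratings, opposite choices: the disagreement lives entirely in the weights.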

Davis, pp. 64-65, describes half a dozen requirements for a utility function to work:
1. Everything is comparable. You have to be able to decide whether you like one thing better than another, whether it be two models of car or petting a kitten versus watching a sunset. For any two items, either one must be more valuable than the other or they must have the same value.
2. Preference and indifference are transitive. That is, all values must be in order. If you like Chaucer better than Shakespeare, and Shakespeare better than Jonson, then you must also like Chaucer better than Jonson.
3. A player is indifferent when equivalent prizes are substituted in a lottery. Since the whole notion of value is built on lotteries and lottery tickets, equivalent prizes must imply equivalent values of the lottery.
4. A player will always gamble if the odds are good enough. This should be true in theory, but of course there are all sorts of practical "a bird in the hand" objections. (There may also be religious objections if we call the contest a "lottery," but remember that we could just as well call it an "investment.")
5. The more likely the preferred prize, the better the lottery. This, at least, poses few problems.
6. Players are indifferent to gambling. What this really means is that players don't care about the means by which the lottery is conducted -- they're as willing (or unwilling) to risk all on the throw of the dice, or on which horse runs faster, as on the workings of the stock market or next year's weather in crop-growing regions. This runs into the same problems as #4.
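Requirement 2 is the easiest of these to test mechanically. A minimal sketch covering strict preferences only (indifference is ignored, a simplifying assumption); the helper name `intransitive_triples` is hypothetical. A cycle of preferences, like the one in rock, paper, scissors, fails the test.

```python
from itertools import permutations


def intransitive_triples(prefers):
    """Return (a, b, c) triples where a > b and b > c but not a > c.

    `prefers` is a set of (better, worse) pairs describing strict preferences.
    """
    items = {x for pair in prefers for x in pair}
    return [(a, b, c) for a, b, c in permutations(items, 3)
            if (a, b) in prefers and (b, c) in prefers and (a, c) not in prefers]


consistent = {("Chaucer", "Shakespeare"), ("Shakespeare", "Jonson"), ("Chaucer", "Jonson")}
cyclic = {("Chaucer", "Shakespeare"), ("Shakespeare", "Jonson"), ("Jonson", "Chaucer")}
print(intransitive_triples(consistent))   # []
print(len(intransitive_triples(cyclic)))  # 3
```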

And utility for scribes varies, too. We can't know what method the scribe might use to achieve maximum utility. A good method for achieving the above goal might be for the scribe, when in doubt about a reading, to consult three reputable copies and retain any reading found in any of the three. But while it's a good strategy, we have no idea if our scribe employed it. (What is more, it has been observed that, in gambling situations, gamblers tend to wager more as the time spent at the poker table or racetrack increases; Davis, p. 70. So even if we knew the scribe's rules at the start of a day's work, who can say about the end?)

Rapoport-Strategy, p. 75, explains why this is a problem: "Here we are in the realm of the non-zero-sum game. [In the case of textual criticism, this is because the critic and scribe have no interaction at all. The scribe -- who is long dead! -- doesn't gain or lose by the critic's action, and the critic has no particular investment in the scribe's behavior.] It is our contention that in this context no definition of rationality can be given which remains intuitively satisfactory in all contexts. One cannot, therefore, speak of a normative theory in this context unless one invokes specific extra-game-theoretical considerations.... A normative theory of decision which claims to be 'realistic,' i.e. purports to derive its prescripts exclusively from 'objective reality,' is likely to lead to delusion."

Rapoport-Strategy, pp. 86-87, also gives an example of why this is so difficult. It would seem as if saving human lives is a "pure good," and so the goal should always be to maximize the number of lives saved. And yet, he points out the example of the highways -- traffic accidents, after all, are the leading cause of death in some demographic groups. The number of highway deaths could certainly be reduced. And yet, society does not act, either because the changes would limit individual freedom (seat belt rules, stricter speed limits, stricter drunk driving laws) or because they cost too much (safer highway designs). Thus saving lives is not treated as a pure good; it is simply a factor to consider, like the amount to spend in a household budget. Rapoport does not add examples such as universal health care or gun control or violent crime -- but all are instances where the value of human lives is weighed against other factors, and a compromise is reached somehow. Then he notes (p. 88) how society often reacts strenuously when a handful of miners are trapped in a mine. Thus even if we somehow define the value of a life in one context, the value is different in another context!

This is what is known as the "criterion-trouble." Williams, pp. 22-24, writes, "What is the criterion in terms of which the outcome of the game is judged? Or should be judged? ... Generally speaking, criterion-trouble is the problem of what to measure and how to base behavior on the measurements. Game theory has nothing to say on the first topic.... Now the viewpoint of game theory is that [The first player] wishes to act in such a manner that the least number he can win is as great as possible, irrespective of what [the other player] does.... [The second player's] comparable desire is to make the greatest number of valuables he must relinquish as small as possible, irrespective of [the first player's] action. This philosophy, if held by the players, is sufficient to specify their sources of strategy.... The above argument is the central one in Game Theory. There is a way to play every two-person game that will satisfy this criterion. However... it is not the only possible criterion; for example, by attributing to the enemy various degrees of ignorance or stupidity, one could devise many others."
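Williams's criterion can be computed directly for a zero-sum payoff table: the first player maximizes his worst case (maximin), the second minimizes his worst loss (minimax). A minimal sketch restricted to pure strategies; the 3x3 game is invented. When the two numbers agree, the game has a "saddle point"; when they don't (as in matching pennies), the criterion can only be met with mixed, randomized strategies.

```python
def maximin(matrix):
    """Row player's pure strategy whose worst-case payoff is largest."""
    worst = [min(row) for row in matrix]
    best_row = max(range(len(matrix)), key=lambda i: worst[i])
    return best_row, worst[best_row]


def minimax(matrix):
    """Column player's pure strategy whose worst-case loss is smallest."""
    columns = list(zip(*matrix))
    worst = [max(col) for col in columns]
    best_col = min(range(len(columns)), key=lambda j: worst[j])
    return best_col, worst[best_col]


# Hypothetical zero-sum game: entries are what the row player wins.
game = [[3, 1, 4],
        [5, 2, 6],
        [0, -1, 7]]
print(maximin(game), minimax(game))  # (1, 2) (1, 2): a saddle point worth 2

# Matching pennies: the pure-strategy guarantees (-1 and +1) do not meet.
pennies = [[1, -1], [-1, 1]]
print(maximin(pennies), minimax(pennies))  # (0, -1) (0, 1)
```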

(Incidentally, this is also why economics remains an inexact field. The mathematics of economics -- largely built on game theory -- is elegant and powerful. But it has a criterion problem: Essentially, it tries to reduce everything to money. But people do not agree on the true value of different things, so there is no way to assign relative values to all things in such a way that individual people will all want to pay the particular assigned values. We would do a little better if we used energy rather than money as the value equivalent, and brought in biology to calculate needs as well as values -- but in the end there is still the question of "how much is a sunset worth?")

As Rapoport-Strategy sums it up (p. 91), "The strategists' expertness comes out to best advantage in the calculation of tangibles. These people are usually at home with physics, logistics, and ballistics.... The problem of assigning utilities to outcomes is not within their competence or concern." He goes on to warn that this problem forces them to oversimplify psychological factors -- and what is a decision about a particular variant reading if not psychological?

Even if we could solve the criterion problem for a particular scribe, we aren't dealing with just one scribe. We're dealing with the thousands who produced our extant manuscripts, and the tens of thousands more who produced their lost ancestors, not all of whom will have followed the same strategies.

And even if we could find rules which covered all scribes, each scribe would be facing the task of copying a particular manuscript with a particular set of errors. This is getting out of the area of game theory; it strikes me as verging on the area of mathematics known as linear programming (cf. Luce/Raiffa, pp. 17-18) -- although this is a field much less important now than in the past; these days, you just have a computer run approximations.

And even if we can figure out an individual scribe's exact goal, it still won't give us an exact knowledge of how he worked -- because, of course, the scribe is a human being. As game theory advances, it is paying more and more attention to the fact that even those players who know their exact strategy will sometimes make mistakes in implementing it. Binmore, p. 55, gives the example of a person working with a computer program who presses the wrong key; he informally refers to this class of errors as "typos." But it can happen in copying, too; we can't expect scribes to be as accurate as computers even if we know what they are trying to do!

This illustrates the problem we have with applying statistical probability to readings, and hence of applying game or utility theory to textual criticism. If textual critics truly accepted the same rules (i.e. the list and weight of the Canons of Criticism), chances are that we wouldn't need an optimal strategy much; we'd have achieved near consensus anyway. Theoretically, we could model the actions of a particular scribe (though this is more a matter of modeling theory than game theory), but again, we don't know the scribe's rules.

And, it should be noted, second-guessing can be singularly ineffective. If you think you know the scribe's strategy in detail, but you don't, chances are that your guesses will be worse than guesses based on a simplified strategy. We can illustrate this with a very simple game -- a variation of one suggested by John Allen Paulos. Suppose you have a spinner or other random number generator that produces random results of "black" or "white" (it could be yes/no or heads/tails or anything else; I just wanted something different). But it's adjustable -- instead of giving 50% black and 50% white, you can set it to give anything from 50% to 90% black. Suppose you set it at 75%, and set people to guessing when it will come up black. Most people, experience shows, will follow a strategy of randomly guessing black 75% of the time (as best they can guess) and white 25% of the time. If they do this, they will correctly guess the colour five-eighths of the time (62.5%): they are right when spinner and guess are both black (0.75 × 0.75 = 0.5625) or both white (0.25 × 0.25 = 0.0625), and 0.5625 + 0.0625 = 0.625. Note that, if they just guessed black every time, they would guess right 75% of the time. It's easy to show that, no matter what the percentage of black or white, you get better results by always guessing the more popular shade. For example, if the spinner is set to two-thirds black, guessing two-thirds black and one-third white will result in correct guesses five-ninths of the time (56%); guessing all black will give the correct answer two-thirds (67%) of the time. Proportional guessing comes closer to the always-the-same-shade strategy as you approach the extremes of 50% and 100%; at those values, it does exactly as well. But proportional guessing is never more accurate than always guessing the more popular shade. Never. Trying to construct something (e.g. a text) based on an imperfect system of probabilities will almost always spell trouble.
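The spinner comparison reduces to two formulas: proportional guessing is right with probability p² + (1 − p)², always guessing black with probability p. A minimal sketch; the function names and the seeded simulation (a sanity check, not a proof) are ours.

```python
import random


def matching_accuracy(p: float) -> float:
    """Hit rate when guessing black with probability p against a spinner that
    shows black with probability p: right on black/black or white/white."""
    return p * p + (1 - p) * (1 - p)


def always_black_accuracy(p: float) -> float:
    """Hit rate when always guessing black: right exactly when black comes up."""
    return p


for p in (0.5, 2 / 3, 0.75, 0.9, 1.0):
    print(f"p={p:.3f}  proportional={matching_accuracy(p):.3f}  "
          f"always black={always_black_accuracy(p):.3f}")

# Quick simulation at p = 0.75 (seeded so it is repeatable); it should land near 0.625.
rng = random.Random(1)
trials = 100_000
hits = sum((rng.random() < 0.75) == (rng.random() < 0.75) for _ in range(trials))
print(f"simulated proportional accuracy at p=0.75: {hits / trials:.3f}")
```

The gap is p − [p² + (1 − p)²] = (2p − 1)(1 − p), which is never negative for p between 1/2 and 1 and vanishes only at the two extremes.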

This is not to say that we couldn't produce computer-generated texts; I'd like to see it myself, simply because algorithms are repeatable and people are not. But I don't think game theory has the tools to help in that quest.

Addendum. I don't know if the above has scared anyone away from game theory. I hope not, in one sense, since it's an interesting field; I just don't think it has any application to textual criticism. But it's a field with its own terminology, and -- as often happens in the sciences and math -- that terminology can be rather confusing, simply because it sounds like ordinary English, but isn't really. For example, a word commonly encountered in game theory is "comparable." In colloquial English, "comparable" means "roughly equal in value." In game theory, "comparable" means simply "capable of being compared." So, for example, the odds, in a dice game, of rolling a 1 are one in six; the odds of rolling any other number (2, 3, 4, 5, or 6) are five in six. You're five times as likely to roll a not-one as a one. In a non-game-theory sense, the odds of rolling a one are not even close to those of rolling a not-one. But in a game theory context, they are comparable, because you can compare the odds.

Similarly, "risky" in ordinary English means "having a high probability of an undesired outcome." In game theory, "risky" means simply that there is some danger, no matter how slight. Take, for example, a game where you draw a card from a deck of 52. If the card is the ace of spades, you lose a dollar. Any other card, you gain a dollar. Risky? Not in the ordinary sense; you have over a 98% chance of winning (51 chances in 52). But, in game theory, this is a "risky" game, because there is a chance, although a small one, that you will lose. (There is at least one other meaning of "risk," used in epidemiology and toxicology studies, where we find the definition risk = hazard × exposure. Evidently using the word "risk" without defining it is risky!)
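The card game's numbers can be checked with exact fractions. A minimal sketch: it computes the chance of winning, the expected gain per draw, and the game-theoretic "risky" test, which asks only whether a loss is possible at all.

```python
from fractions import Fraction

# The card game from the text: lose $1 on the ace of spades, win $1 otherwise.
p_lose = Fraction(1, 52)
p_win = 1 - p_lose
expected_value = p_win * 1 + p_lose * (-1)

print(f"chance of winning: {float(p_win):.2%}")                # chance of winning: 98.08%
print(f"expected gain per play: ${float(expected_value):.4f}")  # expected gain per play: $0.9615
print("risky in the game-theory sense:", p_lose > 0)            # risky in the game-theory sense: True
```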

Such a precise definition can produce secondary definitions. For example, having a precise definition of "risk" allows precise definitions of terms such as "risk-averse." A person is considered "risk-neutral" if he considers every dollar to have equal value -- that is, if he'll fight just as hard to get a $10,000 raise from $100,000 to $110,000 as he would to get a $10,000 raise from $20,000 to $30,000. A person who considers the raise from $20,000 to $30,000 to be worth more utiles is "risk-averse" (Binmore, pp. 8-9).
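The two raises can be compared under two utility-of-income functions. A minimal sketch: the straight line stands for the risk-neutral person; the logarithm is an assumed example of a concave (risk-averse) utility, not one Binmore prescribes. His definition only requires that the lower raise be worth more utiles.

```python
import math


def gain(utility, start, raise_amount=10_000):
    """Utiles gained from a raise, under a given utility-of-income function."""
    return utility(start + raise_amount) - utility(start)


def linear(income):
    # Risk-neutral: every dollar is worth the same.
    return income


def logarithmic(income):
    # One common concave choice: each extra dollar is worth a little less.
    return math.log(income)


for name, u in (("risk-neutral", linear), ("risk-averse", logarithmic)):
    ratio = gain(u, 20_000) / gain(u, 100_000)
    print(f"{name}: the 20k->30k raise is worth {ratio:.2f} times the 100k->110k raise")
# risk-neutral: ... 1.00 times ...; risk-averse: ... 4.25 times ...
```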

You can also get a precise definition of something like "cheap talk." "Cheap talk" is what one does in the absence of sending a meaningful signal (a meaningful signal being something that demonstrates an actual investment, e.g. raising a bet in poker or, perhaps, speeding up in Chicken; Binmore, p. 100. A raise in poker or an acceleration in Chicken may be a bluff, but it is still a commitment to risk). The goal of every bargainer is to make cheap talk appear to be a meaningful signal -- in other words, to cheat. This is why advertisers always push the limits, and why politicians are blowhards: They're trying to make their talk seem less cheap.

Even the word "strategy" has a specific meaning in game theory. In its common context, it usually means a loose plan for one particular set of circumstances -- e.g. "march around the enemy's left flank and strike at his rear." In game theory terms, this is not a strategy -- it is at most a part of a strategy. It does not cover the case where the enemy retreats, or has a strong flank guard, or attacks before the outflanking maneuver is completed. Williams, The Compleat Strategyst, p. 16, defines the term thus: A strategy is a plan so complete that it cannot be upset by enemy action or Nature. In other words, a set of actions is a strategy only if it includes a response for every action the enemy might take. This is of course impossible in an actual military context, but it is possible in the world of "games" where only certain actions are possible. But it is important to keep this definition in mind when one encounters the word "strategy." It includes a wide range of moves, not just the action you take in response to a particular enemy action.

It should also be kept in mind that a strategy can be really, really stupid. In Prisoner's Dilemma, for instance, "Always Cooperate" is a strategy, but it's a really stupid one against an opponent who will defect. In chess, a valid strategy would be, "among pieces permitted to move, take the piece farthest to the right and forward, and move it by the smallest possible move right and forward, with preference to forward." This strategy is simple suicide, but it is a strategy.
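Williams's "complete plan" test can be made concrete for a Prisoner's Dilemma player who remembers only the opponent's last move. Restricting attention to such memory-one plans is our simplification (in the full iterated game a strategy must answer every possible history), and the table names are hypothetical. Note that "Always Cooperate" passes the test: being a strategy says nothing about being a good one.

```python
# A memory-one Prisoner's Dilemma plan, represented (as an assumption for
# illustration) by a table from the opponent's previous move to our reply.
# None stands for "no previous move", i.e. the opening round.
SITUATIONS = (None, "C", "D")

always_cooperate = {None: "C", "C": "C", "D": "C"}
tit_for_tat = {None: "C", "C": "C", "D": "D"}
partial_plan = {"C": "C"}  # says nothing about the opening or a defection


def is_strategy(table: dict) -> bool:
    """A plan counts as a strategy only if it answers every possible situation."""
    return all(table.get(s) in ("C", "D") for s in SITUATIONS)


for name, table in (("always cooperate", always_cooperate),
                    ("tit for tat", tit_for_tat),
                    ("partial plan", partial_plan)):
    print(f"{name}: {'strategy' if is_strategy(table) else 'not a strategy'}")
```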

(We might add incidentally that games are often divided into two classes based on the number of strategies possible. Some games have infinitely many strategies -- or, at least, no matter how many strategies you list, you can always come up with another one; a battle probably fits this description. Other games have a finite number of strategies; tic-tac-toe would be an example of this. For obvious reasons, much more work has been done on the "finite games" than the "infinite games," and much of what has been done on infinite games is based on trying to simplify them down to finite games. Given that much economic modelling consists of trying to turn the infinite game of the economy into something simpler, it will be evident that this doesn't always work too well.)

An even more interesting instance of a non-intuitive definition is the word "player." In game theory jargon, a player must be a participant who has a choice of strategies. Rapoport-Strategy, p. 34, illustrates how counter-intuitive this is: When a human being plays a slot machine, the slot machine is a player but the human isn't. The slot machine has a series of strategies (jackpot, small payout, no payout), but the person, having pulled the lever, has no strategy; he's just stuck awaiting the result. He is an observer. Rapoport-Person, p. 21, adds that the player in solitaire isn't a player either, at least in the technical sense, because there is no other player. The player does not have any reason to respond to another player.

You could argue that there is a higher-level game, between the house and the potential player, in which the house has to set the payout odds in the slot machine and the potential player has to decide whether to use the slot machine in the face of those odds. This is indeed a valid two-player game. But it is not the same game! In the specific context of "playing the slots," as opposed to "setting the slots," the slot machine is the only player.

Rapoport-Strategy, pp. 39-40, also points out that, to a game theorist, "complexity" can be a strange term. He offers as examples chess and matching pennies. In terms of rules, chess is very complex. Matching pennies is not -- each player puts out a coin, and the winner depends on whether they both put out heads or tails, or if the two put out different faces. But in game theory terms, chess is a "game of perfect information," whereas you don't know what the other person will do in matching pennies. Therefore matching pennies is more complex -- it can be shown that chess has a best strategy for both white and black (we don't know what it is, and it is doubtless extremely involved, but it is absolutely fixed). Matching pennies has no such strategy -- the best strategy if you cannot "read" your opponent's strategy is to make random moves, though you may come up with a better response if you can determine your opponent's behavior. In game theory terms, a complex game is not one with complex rules but with complex strategic choices.
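The value of random play in matching pennies shows up in the expected payoff. A minimal sketch assuming stakes of +1 to the "matcher" on a match and -1 on a mismatch; the function name and the opponent's 70% bias toward heads are invented for illustration. Playing 50/50 guarantees an even game whatever the opponent does; exploiting a known bias does better, but only while the bias lasts.

```python
def matcher_expected_payoff(p_heads: float, q_heads: float) -> float:
    """Expected payoff to the matcher (+1 on a match, -1 on a mismatch) when the
    matcher shows heads with probability p_heads and the opponent q_heads."""
    p_match = p_heads * q_heads + (1 - p_heads) * (1 - q_heads)
    return p_match - (1 - p_match)


# Random 50/50 play breaks even (expected payoff 0) against any opponent...
for q in (0.0, 0.3, 1.0):
    print(f"opponent heads {q:.0%}: {matcher_expected_payoff(0.5, q):+.2f}")

# ...while always showing heads against an opponent known to favour heads wins on average.
print(f"{matcher_expected_payoff(1.0, 0.7):.2f}")  # 0.40
```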

A good way to judge the complexity of a game is to look at the ranking of the outcomes. Can they be ranked? To show what we mean by ranking, look at the game rock, paper, scissors. Rock beats scissors, so rock > scissors. Paper beats rock, so paper > rock. But scissors beats paper, so scissors > paper.

You thus cannot say that rock or paper or scissors is the #1 outcome. There is no preferred outcome. Any such game will be complex, and requires a mixed strategy.
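The missing "#1 outcome" can be checked directly: look for a choice that nothing else beats. A minimal sketch; the helper name `undefeated` is hypothetical, and the medal ranking is an invented transitive contrast.

```python
BEATS = {("rock", "scissors"), ("paper", "rock"), ("scissors", "paper")}
CHOICES = ("rock", "paper", "scissors")


def undefeated(choices, beats):
    """Choices that no other choice beats: the candidates for a '#1' ranking."""
    return [c for c in choices if not any((other, c) in beats for other in choices)]


print(undefeated(CHOICES, BEATS))  # []

# A transitive ranking, by contrast, has a clear top.
medals = {("gold", "silver"), ("silver", "bronze"), ("gold", "bronze")}
print(undefeated(("gold", "silver", "bronze"), medals))  # ['gold']
```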

Let's give the last word on game theory and its applications to Rapoport-Strategy. The key is that there are places where game theory isn't very relevant: "Unfortunately, decision theory has been cast in another role, namely, that of a prop for rationalizing decisions arrived at by processes far from rational.... [In] this role decision theory can become a source of dangerous fixations and delusions."

Appendix I: The 2x2 Games

You may not have noticed it, but several of the examples I used above are effectively the same game. For example, the "odds and evens" game above, and the tennis game, have the same payoff matrix and the same optimal strategy. Having learned the strategy for one, you've learned the strategy for all of them.

Indeed, from a theoretical perspective, the payoffs don't even have to be the same. If you just have a so-called "2x2 game" (one with two players and two options for each player), and payoffs a, b, c, and d (as in one of our formulae above), it can be shown that the same general strategy applies for every two-player two-strategy game so long as a, b, c, and d have the same ordering. (That is, as long as the same outcome, say b, is considered "best," and the same outcome next-best, etc.)

It can be shown (don't ask me how) that there are exactly 78 so-called 2x2 games. (They were catalogued in 1966 by Melvin J. Guyer and Anatol Rapoport.) Of these 78 games, 24 are symmetric -- that is, both players have equal payouts. Odds and Evens is such a game. These games can be characterized solely by the perceived value of the outcomes -- e.g. a>b>c>d, a>b>d>c, a>c>b>d, etc., through d>c>b>a.

A different way to characterize these is in terms of cooperation and defection, as in Prisoner's Dilemma. In that case, instead of a, b, c, d, the four payoffs are for the outcomes CC, CD, DC, and DD (the first letter being the player's own move, the second the opponent's).

It turns out that, of the 24 possible symmetric 2x2 games, fully 20 are in some sense degenerate -- either CC>CD, DC>DD, or DD is the worst choice for all players. There is no interest in such games; if you play them again and again, the players will always do the same thing.

(Digression: At least, that's the theory. There is a game that David P. Barasch calls "Cooperate, Stupid!" It is a degenerate game; the payoffs (Player 1's listed first, then Player 2's) are as follows:

Player 1 \ Player 2 | cooperates | defects
cooperates | 4, 4 | 1, 3
defects | 3, 1 | 0, 0

Even though, in this game, the player always gets more for cooperating than defecting, studies have shown that people who play the game may defect as much as 50% of the time because they want to win over their opponent. And you wonder why people fight wars? Even when the system is set up so that they can only win by cooperating, they still don't want to do it.)

That leaves the four cases which are not degenerate. These are familiar enough that each one has a name and a "story." The four:

DC>DD>CC>CD: "Deadlock."
DC>CC>DD>CD: "Prisoner's Dilemma"
DC>CC>CD>DD: "Chicken"
CC>DC>DD>CD: "Stag Hunt"
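Since only the ordering matters, naming one of these games is just a matter of sorting its payoffs. A minimal sketch assuming strict orderings (no ties) and one player's payoffs keyed by outcome, own move first; the helper `name_game` is hypothetical. The second example uses Player 1's payoffs from the "Cooperate, Stupid!" table, which is not one of the four.

```python
NAMED_GAMES = {
    ("DC", "DD", "CC", "CD"): "Deadlock",
    ("DC", "CC", "DD", "CD"): "Prisoner's Dilemma",
    ("DC", "CC", "CD", "DD"): "Chicken",
    ("CC", "DC", "DD", "CD"): "Stag Hunt",
}


def name_game(payoffs: dict) -> str:
    """Name a symmetric 2x2 game from one player's payoffs for CC, CD, DC, DD.

    Only the ranking of the four payoffs is used, so any numbers with the
    same ordering give the same answer (ties are assumed not to occur).
    """
    ranking = tuple(sorted(payoffs, key=payoffs.get, reverse=True))
    return NAMED_GAMES.get(ranking, "not one of the four named games")


print(name_game({"CC": 3, "CD": 0, "DC": 5, "DD": 1}))  # Prisoner's Dilemma
print(name_game({"CC": 4, "CD": 1, "DC": 3, "DD": 0}))  # not one of the four named games
```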

The names derive from real-world analogies. You've met Prisoner's Dilemma. "Deadlock" is so called because its analogy is to, say, an arms race and arms limitation treaties. Both parties say, on paper, they want to disarm. But neither wants to be disarmed even if the other disarms. So (looking back to the days of the Cold War), for the Americans, their preferred outcome is to have the Soviets disarm while the Americans keep their weapons (DC: the Americans defect, the Soviets cooperate). The next best choice is for both to retain their weapons (DD): At least the Americans still have their weapons -- and, since they do, they don't have to worry about the Soviets cheating. The third-best choice is for both to disarm (CC): At least neither side has an armaments advantage (and there is probably a peace dividend). If you could trust the Soviets, this might be a good choice -- but the fear in that case was that the Americans would disarm and the Soviets wouldn't (CD). That would leave the Americans helpless. (It is the fear of the CD scenario that causes the Americans to prefer DD, where both are still armed, to CC, where the Americans know they are disarmed but aren't sure about the Soviets.)

The obvious outcome of deadlock is that neither side disarms. And, lo and behold, that's exactly what happened for half a century: It took forty years even to get both sides to reduce their number of weapons, and they kept them at levels high enough to destroy each other many times over even after the U.S.S.R. collapsed.

You may have seen "Chicken," too. The canonical version has two cars driving straight toward each other, as if to collide, with the loser being the one who swerves first. In Chicken, the most desired outcome for a particular player is that the other guy swerves, then that both swerve, then that you swerve; last choice is that you both stay the course and end up dead. One notes that there is no truly optimal strategy for this game, though there are interesting considerations of metastrategy. Rapoport-Strategy, p. 116, notes that the goal in Chicken is to announce your strategy -- and, somehow, to get your opponent to believe it (in other words, to accept that you are serious about being willing to die rather than give in).

The dangerous problem about Chicken is that it encourages insanebehavior. The player more willing to die is also the one more likely to win!Mathematically, however, it is interesting to analyze in real world contextsbecause it turns out to be very difficult to assess one's strategy if one doesnot know the other player's payoff for winning the game. Oddly enough, theanalysis is easier if neither player's payoff is known! (Binmore, pp. 96-98).What's more, the risk of disaster often increases if someone reveals one player'sexact preferences (Binmore, p. 99). Thus the interest in Chicken is less in theactual outcome (likely to be disastrous) than in the way people decide whetherto, or when to, back down.

"Stag Hunt" is probably the most interesting of the games after Prisoner's Dilemma. It has a number of analogies (e.g. Poundstone, p. 218, mentions a bet between two students to come to school with really strange haircuts), but the original goes back to a tale from Rousseau (Binmore, p. 68). The original version involves a pair of cave men. Their goal is to hunt a stag. But catching stags is difficult -- the animal can outrun a human, so the only way to kill one is to have one person chase it while another waits and kills it as it flees. And both hunters have alternatives: Rather than wait around and chase the stag, they can defect and chase a rabbit. If both hunt the stag, they get the highest payoff. If one defects to hunt a rabbit, the defector gets some meat, while the cooperator gets nothing. If both defect, both get rabbits and neither can boast of being the only one to get meat. So the highest reward is for cooperating; the next-highest reward goes to the defector when only one defects, next is when both defect, and dead last is the reward to the cooperator when the other player defects.

Stag Hunt is fascinating because it has two equilibria when played repeatedly: The players can both cooperate or both defect. Players who cooperate regularly can expect continued cooperation, and hence the highest payoff. But once one establishes a reputation for defecting, it becomes extremely difficult to re-establish cooperation. Rousseau's solution was to suggest, "If you would have the general will be accomplished, bring all the particular wills into conformity with it" (Binmore, p. 68). This is pretty close to impossible, of course, but one can sometimes adjust the rules of the game (via a method known as Nash Bargaining) to get close. I can't help but think that American politics has reduced itself to a stag hunt where both parties are convinced the other is always defecting. Compromise should be possible ("We'll help you reduce the number of abortions if you'll help us control global warming") -- but it is very hard to convert from an equilibrium of defection to one of cooperation because, as Binmore says on p. 69, you can't rely on someone in the game who says "trust me"; it is always in the interest of one party to induce the other to cooperate. The only way to assure that they do it is an external enforcer -- and the political parties don't have any such (except the voters, who have of course shown that they do not enforce such agreements).
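To make the preference orderings above concrete, here is a small Python sketch (not taken from any of the cited books) that gives each outcome an ordinal payoff from 4 (best) to 1 (worst) and looks for the pure-strategy Nash equilibria of a single round: move pairs from which neither player can do better by switching alone. The numbers are illustrative rankings only, and the Prisoner's Dilemma entry uses the usual ordering of that game.

```python
from itertools import product

# Ordinal payoffs to a player: 4 = best outcome, 1 = worst (illustrative rankings).
# Keys are (my move, other's move); "C" = cooperate, "D" = defect.
# In Chicken, "C" means swerve and "D" means stay the course.
GAMES = {
    "Prisoner's Dilemma": {("D", "C"): 4, ("C", "C"): 3, ("D", "D"): 2, ("C", "D"): 1},
    "Deadlock": {("D", "C"): 4, ("D", "D"): 3, ("C", "C"): 2, ("C", "D"): 1},
    "Chicken": {("D", "C"): 4, ("C", "C"): 3, ("C", "D"): 2, ("D", "D"): 1},
    "Stag Hunt": {("C", "C"): 4, ("D", "C"): 3, ("D", "D"): 2, ("C", "D"): 1},
}

MOVES = ("C", "D")


def pure_equilibria(payoff):
    """Return the move pairs (row move + column move) where neither player gains by switching alone.

    The games are symmetric, so the column player's payoff for (r, c) is payoff[(c, r)].
    """
    found = []
    for r, c in product(MOVES, MOVES):
        row_ok = all(payoff[(r, c)] >= payoff[(alt, c)] for alt in MOVES)
        col_ok = all(payoff[(c, r)] >= payoff[(alt, r)] for alt in MOVES)
        if row_ok and col_ok:
            found.append(r + c)
    return found


for name, table in GAMES.items():
    print(f"{name}: {pure_equilibria(table)}")
```

Run as written, it reports one equilibrium (DD) for Prisoner's Dilemma and Deadlock, two (CD and DC, one driver swerving) for Chicken, and two (CC and DD) for Stag Hunt, the two stable states described above. Mixed strategies and repeated play are not modeled.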

The non-symmetrical games (where the objectives or rewards for the two parties differ) are too diverse to catalog. One example of the type is known as "Bully." Poundstone (p. 221) calls it a combination of Chicken and Deadlock, in which one player is playing the "Chicken" strategy while the other plays "Deadlock" strategy. In a real-world scenario, if two nations are considering war with each other, it's a case where one player wants war, period, while the other wants peace but is afraid to back down. Bully has real-world analogies -- consider, e.g., the behavior of the Habsburg Empire and Serbia before World War I. Or Saddam Hussein before the (second) Iraq-American war. Or Spain before the Spanish-American War. The situation between Poland and Germany before World War II wasn't quite the same, but it was close.

Not all games of Bully result in wars; World War I had been preceded by a series of games of Bully in which the bully backed down and peace was preserved. But whereas Prisoner's Dilemma and Stag Hunt, when played repeatedly and with good strategy, tend to result in cooperation, the long-term result of Bully tends to be increased tension, more bullying incidents, and, eventually, the actual war.

Incidentally, Poundstone points out that there is a Biblical game of Bully: Solomon and the two prostitutes (1 Kings 3). When Solomon faces the two women and one child, and threatens to cut the child in two, the woman who agrees to cut the child in half is playing Bully strategy. What's more, in theory she wins. If it weren't for Solomon's second judgment, changing the rules of the game, she would have had what she wanted.

Binmore, pp. 104-105, having noted that Solomon's questioning did not in fact assure that the correct mother was given the baby, suggests a scheme of questions which could have assured the correct outcome. This is, however, a good example of a problem in utility. Binmore's goal is to assure that the correct mother gets the baby. But I would suggest that this is not the higher-utility outcome. What we want is for the child to have a good mother. And that Solomon achieved. Solomon didn't have to be sure that he knew which woman was the mother of the child: by giving the baby to the more humane prostitute, he assured that the baby wouldn't be brought up by a bully.

Note that, like Prisoner's Dilemma, it is possible to play other 2x2 games repeatedly. Unlike Prisoner's Dilemma, these need not involve playing the same strategy repeatedly. (Yet another thing to make life complicated in trying to guess what a scribe might have done!) Rapoport-Strategy, pp. 64-65, gives an interesting example in which a husband and wife wish to go on a vacation. They have the choice, together or separately, of going camping or going to a resort. The husband prefers camping, the wife prefers the resort -- but they both prefer going together to going alone. Obviously there are four possible outcomes: They go camping together (big payoff for husband, small payoff for wife), they both go to a resort (small payoff for husband, big payoff for wife), the man goes camping and the wife goes to a resort (small deficit for both), or the man goes to a resort and the wife goes camping (big deficit for both). The third choice is silly, and the fourth extremely so (though it sometimes happens in practice, if both insist on being "noble" or "making a sacrifice") -- but the likely best answer is to go sometimes to the resort and sometimes camping, with the correct ratio being determined by the relative values the two partners place on the two outcomes.

This still produces some amazing complications, however. Rapoport-Strategy presents various methods of "arbitration" in the above scenario, and while all would result in a mixture of camping and resort trips, the ratio of the one to the other varies somewhat. In other words, the problem cannot be considered truly solved.

Appendix II: Multi-Player Games

You may have noticed that most of the games we have examined are two-player games -- there are two sides in Prisoner's Dilemma; only two players really matter to the Dollar Auction; the tennis example featured two teams. This is because the mathematics of multi-player games is much trickier.

To demonstrate this point, consider the game of "Rock, Paper, Scissors," often used to let two people decide who gets a dirty job. Each of the two players has three options: Rock (usually shown with a hand in a fist), Paper (the hand held out flat like a sheet of paper) or Scissors (two fingers held out from the fist). The rule is that "paper covers [beats] rock, scissors cuts [beats] paper, rock breaks [beats] scissors."

Since Rock, Paper, Scissors is played by two players, there is always a winner as long as the two players pick different items (if they choose the same item, they play again). And, because each choice is as likely to win as to lose, the proper strategy is to choose one of the three randomly.

But now try to generalize this to three players. We now have three possible classes of outcome: All three players make the same choice (say, Rock). In this case, the contest is obviously indecisive. Or two may choose one strategy and the third another (say, two choose Rock and the third chooses Paper). Paper beats Rock, but which of the two Rock players is eliminated? (Or do you simply eliminate the player who picked the odd choice? If so, you might as well play odds and evens and forget the rock, paper, scissors; odds and evens would assure a result.) Or what if one chooses Rock, one Paper, one Scissors?

In none of the three cases does Rock, Paper, Scissors produce a decisive result. Of course, the obvious suggestion is to broaden the list of possibilities -- say, Rock, Paper, Scissors, Hammer. This assures that at least one possibility will go unchosen, so that, if the three players pick different items, there will always be a top choice and a bottom choice.

But with four choices, the odds of our three players picking three distinct choices are less than even -- there are 64 possible outcomes (player 1 has any of four choices, as does player 2, and also player 3), but only 24 of these (37.5%) are distinct (player 1 has four choices, player 2 has three, player 3 only two). If you have to keep playing until you get a decisive result, monotony may result. You can, perhaps, add an additional rule ("if two players pick the same result, the player with the other result loses"). But then how to generalize to the case of four players? You now have five options (let's just call them A, B, C, D, E, where B beats A, C beats B, D beats C, E beats D, and A beats E). Now you have even more possible classes of outcomes:
1. All four players choose different options. This is decisive.
2. Three players choose three different options; the fourth player chooses one of the first three. In this case, the three options chosen may not even form a continuous string (the players might choose BDDE, BCCE, BBDE, or BDEE, as against the continuous BCDD, BCCD, or BBCD). All of these cases will require some sort of tiebreak.
3. Two players choose one option, and two players a second. These may be consecutive (AABB) or non-consecutive (AACC). Here again you need tiebreak rules.
4. Three players choose one option and the fourth player a second. These again may or may not be consecutive (AAAB or AAAC).
5. All four players choose the same option.
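The counts above, and the classes in this list, can be checked by brute force. The following Python sketch (the function name outcome_classes is ours, for illustration) enumerates every possible round and groups it by its pattern of ties; it does not decide who wins, and it ignores whether the chosen options are consecutive.

```python
from collections import Counter
from itertools import product


def outcome_classes(players, options):
    """Count every possible round by its pattern of ties.

    A pattern such as (2, 1, 1) means two players picked the same option and
    the other two picked two further, different options.
    """
    classes = Counter()
    for choices in product(range(options), repeat=players):
        pattern = tuple(sorted(Counter(choices).values(), reverse=True))
        classes[pattern] += 1
    return classes


print(outcome_classes(3, 3))  # classic three-player Rock, Paper, Scissors
print(outcome_classes(3, 4))  # three players, four options
print(outcome_classes(4, 5))  # four players, five options (A to E)
```

For four players and five options, the patterns (1, 1, 1, 1), (2, 1, 1), (2, 2), (3, 1) and (4,) correspond to cases 1 to 5 in the list, with 120, 360, 60, 80 and 5 rounds respectively out of 625. Only about 19% of rounds land in the decisive case 1.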

It is possible in this case to write tiebreak rules which will be decisive in most cases (that is, not requiring players to play again), or which will at minimum reduce the game to a simple Rock, Paper, Scissors playoff between two players. But the vital advantage of Rock, Paper, Scissors is its simplicity: the rules are the equivalent of two sentences long. Four-player Rock/Paper/Scissors/Hammer/Blowtorch (or whatever it's called) will require several paragraphs of instruction, and most people will be forever looking back to review the rules.

And it still doesn't generalize to five players!

Plus the strategy is no longer simple. Once you have to decide how to resolve two-way and three-way ties, it is perfectly possible that the tiebreak rules may change the strategy from simple random choices to something else.

There is another problem with multi-player games: collusion. Take our rock/paper/scissors case with three people. Depending on the tiebreak rule involved, two of them may be able to always force the third player to lose, or at least force him to take inordinate numbers of losses, by picking their own strategies properly. Von Neumann addressed this in part by converting a three-party game with collusion into a two-party game with different rules (making the colluding parties into one party). But this still ignores the question of whether they should collude....

The case of colluding players is not directly analogous to the problem of multi-player games, but it shows the nature of the problem. Indeed, von Neumann's approach to multi-player games was somewhat like ours: To create them as a complex game which resolved down to individual games.

Perhaps a good example of the effects of multi-player games is to re-examine the four 2x2 games above in the light of multiple players. "Deadlock" hardly changes at all; since it takes only one player refusing to disarm, all the others will be even more afraid to do so. "Chicken" gains an added dimension: The winner is the last to swerve, but the first to swerve will doubtless be considered the worst loser. So the pressure is ratcheted up -- one would expect more accidents. In "Stag Hunt," if you need more cooperators to win the big prize, the temptation to defect will be higher, since it takes just one defector to blow the whole thing.

"Stag Hunt" can at least be converted to a more interesting game -- suppose you have five players and it takes three to catch a stag. Now coalitions become a very important part of the game -- and if there are two strong coalitions and one relatively free agent, then the coalition which buys that free agent will win. This version of "Stag Hunt" gets to be unpleasantly like World War I. "Prisoner's Dilemma" suffers the same way: The greater the number of players, the greater the ability of a single defector to bring down any attempt at cooperation. In essence, multi-player Prisoner's Dilemma is the same as the two-player version, but with the payoffs dramatically shifted and with new computations of risk. It is a much trickier game -- especially if played iteratively and with no knowledge of who is betraying you.

To be sure, the von Neumann method of converting multi-player games to two-player games can sometimes work if all the players in fact have similar and conjoined strategies. Binmore, p. 25, describes the "Good Samaritan" game. In this, there is a Victim and a group of passers-by. The passers-by want the Victim to be helped -- in Binmore's example, all passers-by earn ten utils if even one passer-by helps the Victim. They earn nothing if no one helps. But helping is an inconvenience, so a helper earns only nine utils, instead of the ten utils he earns if someone else helps.

Note what this means: If no one else helps, your best choice is to help. But if anyone else is going to help, your best choice is not to help.

So what action has the best average payoff? It is, clearly, a mixed strategy, of helping some fraction of the time. We note that your payoff for helping every time is 9 utils. So whatever strategy you adopt must have a payoff equal to or greater than that. For example, if there are two players and you each respond 90% of the time, then the probability that both of you respond is 81%, with a value to you of 9 utils; the probability that you respond and the other passer-by doesn't is 9%, with a value to you of 9 utils; the probability that the other guy responds and you don't is 9%, with a value to you of 10 utils; and there is a 1% chance that neither of you responds, with a value of 0 utils. Adding that up, we have a payoff of (.81*9)+(.09*9)+(.09*10)+(.01*0)=9 -- exactly the same payoff as if you responded every time, but with 10% less effort. (This is known as being indifferent to the outcome, which admittedly is a rather unfortunate term in the context of the game.)

Suppose we responded 95% of the time. Then our payoff becomes (.95²*9)+(.95*.05*9)+(.95*.05*10)+(.05²*0)=9.025. This turns out to be the maximum possible reward, provided both passers-by use the same strategy.

That's for the case of n=2 (or, alternately, n=1 plus you). You can equally well solve for n=3, or n=4, or n=5. For a three-player game, for instance, the maximum payoff comes with around 82% of passers-by responding, which has a payoff of 9.12 (assuming I did my algebra correctly; note that you will need to use the binomial theorem to calculate this). You can find a similar solution for any n. Obviously the probability p of having to help goes down as the number of players n goes up.
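The arithmetic above is easy to check by machine. This Python sketch assumes, as the examples do, that every passer-by helps with the same probability p, and it uses Binmore's values of 9 utils for helping and 10 for having someone else help. The best p is found by a simple grid search rather than by algebra.

```python
def expected_payoff(p, n, help_value=9.0, helped_value=10.0):
    """Expected utils for one passer-by when each of n passers-by helps with probability p.

    If you help, you get help_value. If you don't, you get helped_value only when
    at least one of the other n - 1 passers-by helps, and 0 otherwise.
    """
    nobody_else_helps = (1 - p) ** (n - 1)
    return p * help_value + (1 - p) * helped_value * (1 - nobody_else_helps)


def best_common_p(n, steps=100_000):
    """Grid-search the shared help probability p that maximizes expected_payoff."""
    best = max(range(steps + 1), key=lambda i: expected_payoff(i / steps, n))
    p = best / steps
    return p, expected_payoff(p, n)


print(expected_payoff(0.90, 2))  # about 9.0, the same as always helping
print(expected_payoff(0.95, 2))  # about 9.025
for n in (2, 3, 4, 5):
    p, value = best_common_p(n)
    print(f"n={n}: help {p:.1%} of the time, expected payoff {value:.3f}")
```

Under these assumptions the search lands on 95% for n=2 and about 82% (payoff about 9.12) for n=3, in line with the figures above, and the best p keeps falling as n grows.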

Appendix III: Differential Games

If you look at the information above, every instance is of either a discrete one-time game or of an iterative game. That is, you either have one decision to make, or you make a series of decisions but all of similar nature.

There are two reasons why I presented the matter this way. First, it's easier (a big plus), and second, if game theory has any application to textual criticism, it is to discrete games. You make decisions about particular readings one at a time. You may do this iteratively -- "I chose the Alexandrian reading there, so I'll choose it here also" -- but each decision is individual and separate.

This is also how most economic decisions are made -- "I'll buy this stock" or "I'll support this investment project."

But not all decisions are made this way. Isaacs, p. 3, mentions several classes of activities in which each player's actions are continuously varying, such as a missile trying to hit an aircraft. The aircraft is continuously trying to avoid being hit (while performing some other task); the missile is continuously trying to hit the aircraft. So each is constantly adjusting what it is doing. This is a differential game -- a game in which you do not so much make a decision as try to produce a rule which can be continuously applied.

Differential games involve much heavier mathematics, including a lot of theory of functions. I will not attempt to explain it here. But you should probably be aware that there is such a field.

Appendix IV: Bibliography

For those who want a full list of the books cited here, the ones I recall citing are:

The Golden Ratio (The Golden Mean, The Section)

The Golden Ratio, sometimes called the Golden Mean or φ, is one of those "special numbers." There are various definitions. For example, it can be shown that

φ = (1 + √5)/2.

Alternately, φ can be defined as the ratio a/b, where a and b are chosen to meet the condition

$$\frac{a}{b} = \frac{a + b}{a}$$

This turns out to be an irrational number (that is, an infinite non-repeating decimal), but the approximate value is 1.618034.

So why does this matter? Well, this turns out to be a very useful number -- and though many of the uses were not known to the ancients (e.g. they would not have known that it was the limit of the ratio of terms in the Fibonacci sequence), they did know of its use in "sectioning" lines. Euclid refers to "the section" (the Greek name for this concept of proportional division) at several points in The Elements. And while Greek artists may not have known about the mathematical significance of "the section," they assuredly used it, because another trait of the Golden Ratio is that it seems to be aesthetically pleasing.
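A quick numerical check of these definitions: the Python sketch below computes φ from the closed form, confirms the a/b = (a + b)/a property (taking b = 1, so that a = φ), and shows the ratios of successive Fibonacci numbers closing in on it.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # the closed form (1 + sqrt 5)/2


def fibonacci_ratios(count):
    """Return the ratios F(k+1)/F(k) of successive Fibonacci numbers 1, 1, 2, 3, 5, ..."""
    a, b = 1, 1
    ratios = []
    for _ in range(count):
        a, b = b, a + b
        ratios.append(b / a)
    return ratios


print(round(PHI, 6))                        # 1.618034
print(math.isclose(PHI, (PHI + 1) / PHI))   # True: a/b = (a + b)/a with a = PHI, b = 1
print(fibonacci_ratios(10)[-1])             # 144/89, already close to PHI
```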

This means that the Golden Ratio is very common, for instance, in the layout of pages. Most modern books have pages with a ratio of length to width that approximates the golden ratio. And so, we note, did older books -- including the first major book printed in Europe, the Gutenberg Bible. To see what I mean, consider this general layout of an open codex:

+---------+---------+  ^
|         |         |  |
|         |         |  h
|         |         |  e
|         |         |  i
|         |         |  g
|         |         |  h
|         |         |  t
|         |         |  |
+---------+---------+  v
          <--width-->

It may not be evident on your screen (much depends on the way your screen draws fonts), but most pages will be laid out so that either height/width is equal to φ, or twice the width (i.e. the width of two facing pages) divided by the height is equal to φ.
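Note that a single page and the open spread cannot both be golden at once. If a page is φ times as tall as it is wide, the two-page spread is 2/φ (about 1.236) times as wide as it is tall, and vice versa. A small Python sketch of that arithmetic, using hypothetical page dimensions:

```python
import math

PHI = (1 + math.sqrt(5)) / 2


def page_and_spread_ratios(page_height, page_width):
    """Return (height/width of one page, width/height of the open two-page spread)."""
    return page_height / page_width, 2 * page_width / page_height


# Hypothetical layout 1: each page is golden (height = PHI * width).
print(page_and_spread_ratios(PHI, 1.0))   # about (1.618, 1.236)
# Hypothetical layout 2: the open spread is golden (2 * width = PHI * height).
print(page_and_spread_ratios(2.0, PHI))   # about (1.236, 1.618)
```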

The other use of the Golden Ratio may be in individual artwork. The British Library publication The Gospels of Tsar Ivan Alexander (by Ekaterina Dimitrova), p. 35, claims that the single most important illustration in this Bulgarian manuscript, the portrait of the Tsar and his family, is laid out based on the Golden Ratio. I can't say I'm entirely convinced; the claim is based on a sort of redrawing of the painting, and none of the other illustrations seem to be in this ratio (most are much wider than they are tall). But it might be something to look for in other illustrated manuscripts.

As an aside, the logarithm of the Golden Mean is known to mathematicians as λ, which is closely related to the famous Fibonacci Sequence.

Curve Fitting, Least Squares, and Correlation

Experimental data is never perfect. It never quite conforms to the rules. If you go out and measure a quantity -- almost any quantity found in nature -- and then plot it on a graph, you will find that there is no way to plot a straight line through all the points. Somewhere along the way, something introduced an error. (In the case of manuscripts, the error probably comes from mixture or scribal inattentiveness, unlike physics where the fault is usually in the experimental equipment or the experimenter, but the point is that it's there.)

That doesn't mean that there is no rule to how the points fall on the graph, though. The rule will usually be there; it's just hidden under the imperfections of the data. The trick is to find the rule when it doesn't jump out at you.

That's where curve fitting comes in. Curve fitting is the process of finding the best equation of a certain type to fit your collected data.

At first glance that may not sound like something that has much to do with textual criticism. But it does, trust me. Because curve fitting, in its most general forms, can interpret almost any kind of data.

Let's take a real-world example. For the sake of discussion, let's try correlating the Byzantine content of a manuscript against its age.

The following table shows the Byzantine content and age of a number of well-known manuscripts for the Gospels. (These figures are real, based on a sample of 990 readings which I use to calculate various statistics. The reason that none of these manuscripts is more than 90% Byzantine is that there are a number of variants where the Byzantine text never achieved a fixed reading.)


We can graph this data as follows:

Scatter Chart of Byzantine Percents

At first glance it may appear that there is no rule to the distribution of the points. But if you look again, you will see that, on the whole, the later the manuscript is, the more Byzantine it is. We can establish a rule -- not a hard-and-fast rule, but a rule.

The line we have drawn shows the sort of formula we want to work out. Since it is a straight line, we know that it is of the form

Byzantine % = a(century) + b

But how do we fix the constants a (the slope) and b (the intercept)?

The goal is to minimize the total distance between the points and the line. You might think you could do this by hand, by measuring the distance between each point and the line and looking for the a and b which make the total smallest. A reasonable idea, but it won't work. The best line is difficult to impossible to determine this way, and some such measures are also a bad "fit" on theoretical grounds. (Don't worry; I won't justify that statement. Suffice it to say that the "minimax" solution, which minimizes the largest single distance, gives inordinate weight to erroneous data points.)

That being the case, mathematicians turn to what is called least squares distance. (Hence the words "least squares" in our title.) Without going into details, the idea is that, instead of minimizing the distances between the points and the line, you minimize the sum of the squares of the vertical distances from the points to the line (or, equivalently, the square root of that sum).

Rather than beat this dog any harder, I hereby give you the formulae by which one can calculate a and b. In these formulae, n is the number of data points (in our case, 31) and the pairs $(x_1, y_1), \ldots, (x_n, y_n)$ are our data points.

$$a = \frac{n(x_1y_1 + x_2y_2 + \cdots + x_ny_n) - (x_1 + x_2 + \cdots + x_n)(y_1 + y_2 + \cdots + y_n)}{n(x_1^2 + x_2^2 + \cdots + x_n^2) - (x_1 + x_2 + \cdots + x_n)^2}$$

$$b = \frac{(x_1^2 + x_2^2 + \cdots + x_n^2)(y_1 + y_2 + \cdots + y_n) - (x_1y_1 + x_2y_2 + \cdots + x_ny_n)(x_1 + x_2 + \cdots + x_n)}{n(x_1^2 + x_2^2 + \cdots + x_n^2) - (x_1 + x_2 + \cdots + x_n)^2}$$

In the shorthand known as "sigma notation," this becomes

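$$a = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}, \qquad b = \frac{\sum x^2 \sum y - \sum xy \sum x}{n\sum x^2 - \left(\sum x\right)^2}$$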

If we go ahead and grind these numbers through our spreadsheet (or whatever tool you use; there are plenty of good data analysis programs out there that do this automatically, though even that is hardly necessary, since Excel has the LINEST() function for this), we come up with (to three significant figures)

a = 4.85
b = 29.4
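For readers who would rather see the formulas as code, here is a minimal Python sketch of the sigma-notation versions. The data in the usage lines are made-up numbers for illustration, not the manuscript figures used in the text, so they will not reproduce a = 4.85 and b = 29.4.

```python
def least_squares_line(xs, ys):
    """Fit y = a*x + b by least squares, using the sigma-notation formulas."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of matching length")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    a = (n * sum_xy - sum_x * sum_y) / denom
    b = (sum_x2 * sum_y - sum_xy * sum_x) / denom
    return a, b


# Hypothetical (century, Byzantine %) pairs: NOT the author's data set.
centuries = [4, 4, 5, 6, 9, 10, 11, 12, 13, 14]
byz_pct = [38, 45, 50, 52, 70, 72, 80, 78, 85, 88]
a, b = least_squares_line(centuries, byz_pct)
print(f"slope a = {a:.2f} points per century, intercept b = {b:.1f}")
```

If NumPy is available, numpy.polyfit(centuries, byz_pct, 1) should give the same slope and intercept (highest power first).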

Now we must interpret this data. What are a and b?

The answer is, a is the average rate of Byzantine corruption and b is the fraction of the original text which was Byzantine. That is, if our model holds (and I do not say it will), the original text agreed with the Byzantine text at 29.4% of my points of variation. In the centuries following its writing, the Byzantine share of readings went up an average of 4.85 percentage points per century. Thus, at the end of the first century we could expect an "average" text to be 29.4+(1)(4.85)=34.25% Byzantine. After five centuries, this would rise to 29.4+(5)(4.85)=53.65% Byzantine. Had this pattern held, by the fifteenth century we could expect the "average" manuscript to be purely Byzantine (and, indeed, by then the purely Byzantine Kr text-type was dominant).
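The predictions in the previous paragraph come straight from the fitted line. This Python sketch plugs the text's values a = 4.85 and b = 29.4 into the formula and also asks where the line reaches 100%.

```python
A_SLOPE = 4.85      # Byzantine percentage points gained per century (from the fit)
B_INTERCEPT = 29.4  # Byzantine percentage of the "original" text (from the fit)


def predicted_byzantine_pct(century):
    """Predicted Byzantine % for an 'average' manuscript, if the linear model held."""
    return A_SLOPE * century + B_INTERCEPT


for c in (1, 5, 10, 15):
    print(c, round(predicted_byzantine_pct(c), 2))

# Century at which the straight line reaches 100%.
print(round((100 - B_INTERCEPT) / A_SLOPE, 2))
```

The line reaches 100% at about century 14.6, in keeping with the fifteenth-century estimate. Beyond that it predicts more than 100% Byzantine, which is impossible: a reminder that a straight line can only be an approximation here.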

It is possible -- in fact, it is technically fairly easy -- to construct curve-fitting equations for almost any sort of formula. That is, instead of fitting a line, there are methods for fitting a parabola, or hyperbola, or any other sort of formula; the only real requirement is that you have more data points than you have parameters whose value you want to determine. However, the basis of this process is matrix algebra and calculus, so we will leave matters there. You can find the relevant formulae in any good numerical analysis book. (I lifted this material from Richard L. Burden, J. Douglas Faires, and Albert C. Reynolds's Numerical Analysis, Second edition, 1981.) Most such books will give you the general formula for fitting to a polynomial of arbitrary degree, as well as the information for setting up a system for dealing with other functions such as exponentials and logs. In the latter case, however, it is often easier to transform the equation (e.g. by taking logs of both sides) so that it becomes a polynomial.
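As an example of the transformation trick, an exponential curve y = Ce^(kx) becomes a straight line once you take logs: ln y = ln C + kx. The Python sketch below (an illustration, not taken from the cited textbook) fits that line with the same least-squares formulas and converts back. It assumes every y is positive, and it minimizes squared errors in ln y rather than in y, so the answer can differ slightly from a direct nonlinear fit.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = C * exp(k * x) by fitting the straight line ln y = ln C + k * x.

    Every y must be positive. This minimizes squared errors in ln y, not in y.
    """
    log_ys = [math.log(y) for y in ys]
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(log_ys)
    sum_xy = sum(x * y for x, y in zip(xs, log_ys))
    sum_x2 = sum(x * x for x in xs)
    denom = n * sum_x2 - sum_x ** 2
    k = (n * sum_xy - sum_x * sum_y) / denom             # slope of the log line
    ln_c = (sum_x2 * sum_y - sum_xy * sum_x) / denom     # intercept of the log line
    return math.exp(ln_c), k


# Hypothetical data generated from y = 2 * exp(0.5 * x), so the fit should recover C = 2, k = 0.5.
xs = [0, 1, 2, 3, 4]
ys = [2 * math.exp(0.5 * x) for x in xs]
print(fit_exponential(xs, ys))
```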

There is also a strong warning here: Correlation is not causality. That is, the fact that two things follow similar patterns does not mean that they are related. John Allen Paulos reports an interesting example. According to A Mathematician Plays the Stock Market, p. 29, an economist once set out to correlate stock prices to a variety of other factors. What did he find? He found that the factor which best correlated with the stock market was -- butter production in Bangladesh.

Coincidence, obviously. A model must be tested. If two things correspond over a certain amount of data, you really need to see what they predict for other data, then test them on that other data to see if the predictions hold true.

Mean, Median, and Mode

What is the "typical" value in a list? This can be a tricky question.

An example I once saw was a small company (I've updated this a bit for inflation). The boss made $200,000 a year, his vice-president made $100,000 a year, his five clerks made $30,000 a year, and his six assemblers made $10,000 a year. What is the typical salary? You might say "take the average." This works out to $39,230.76 per employee per year. But if you look, only two employees make that much or more. The other eleven make far less than that. The average is not a good measure of what you will make if you work for the company.

Statisticians have defined several measures to determine "typical values." The simplest of these are the "arithmetic mean," the "median," and the "mode."

The arithmetic mean is what most people call the "average." It is defined by taking all the values, adding them up, and then dividing by the number of items. So, in the example above, the arithmetic mean is calculated by

$$\frac{1 \times \$200{,}000 + 1 \times \$100{,}000 + 5 \times \$30{,}000 + 6 \times \$10{,}000}{13} = \frac{\$510{,}000}{13}$$

giving us the average value already mentioned of $39,230.76 per employee.

The median is calculated by putting the entire list in order and finding the middle value. Here that would be

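$10,000, $10,000, $10,000, $10,000, $10,000, $10,000, $30,000, $30,000, $30,000, $30,000, $30,000, $100,000, $200,000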

There are thirteen values here, so the middle one is the seventh, which we see is $30,000. The median, therefore, is $30,000. If there had been an even number of values, the median would be found by taking the middle two and computing their arithmetic mean.

The mode is the most common value. Since six of the thirteenemployees earn $10,000, this is the mode.

In many cases, the median or the mode is more "typical" than is the arithmetic mean. Unfortunately, while the arithmetic mean is easy to calculate, the median and mode are normally calculated by sorting the values. Sorting is, by computer standards, a slow process. Thus median and mode are not as convenient for computer calculations, and you don't see them quoted as often. But their usefulness should not be forgotten.
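Python's standard statistics module computes all three measures directly. Applied to the small company above:

```python
import statistics

# The small company from the example: 1 boss, 1 vice-president, 5 clerks, 6 assemblers.
salaries = [200_000] + [100_000] + [30_000] * 5 + [10_000] * 6
mean_salary = statistics.mean(salaries)

print(len(salaries))                            # 13 employees
print(round(mean_salary, 2))                    # 39230.77
print(statistics.median(salaries))              # 30000, the 7th of 13 sorted values
print(statistics.mode(salaries))                # 10000, the most common value
print(sum(s >= mean_salary for s in salaries))  # 2 employees earn the mean or more
```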

Let's take an example with legitimate value to textual critics. The table below shows the relationships of two dozen manuscripts to the manuscript 614 over a range of about 150 readings in the Catholic Epistles. Each rate of agreement (for simplicity) has been rounded to the nearest 5%. I have already sorted the values for you.

2412100% 249260% 04950%

There are 24 manuscripts surveyed here. The sum of these agreements is 1375. The mean rate of agreement, therefore, is 57.3%. To put that another way, in this sample, the "average" rate of agreement with 614 is 57.3%. Looking at the other two statistics, the median is the mean of the twelfth and thirteenth data points, or 52.5%. The mode is 50%, which occurs seven times. Thus we see that mean, median, and mode can differ significantly, even when dealing with manuscripts.

A footnote about the arithmetic mean: We should give the technical definition here. (There is a reason; I hope it will become clear.) If $d_1, d_2, d_3, \ldots, d_n$ is a set of n data points, then the arithmetic mean is formally defined as

$$\frac{d_1 + d_2 + d_3 + \cdots + d_n}{n}$$

This is called the "arithmetic mean" because you just add things up to figure it out. But there are a lot of other types of mean. One which has value in computing distance is what I learned to call the "root mean square mean." (Some have, I believe, called it the "geometric mean," but that term has other specialized uses.)

$$\left(\frac{d_1^2 + d_2^2 + d_3^2 + \cdots + d_n^2}{n}\right)^{1/2}$$

You probably won't care about this unless you get into probability distributions, but it's important to know that the "mean" can have different meanings in different contexts.
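The two definitions side by side, as a short Python sketch (the data set is a toy example):

```python
import math


def arithmetic_mean(values):
    """(d1 + d2 + ... + dn) / n"""
    return sum(values) / len(values)


def root_mean_square(values):
    """((d1^2 + d2^2 + ... + dn^2) / n) ** (1/2)"""
    return math.sqrt(sum(v * v for v in values) / len(values))


data = [1, 7]  # a tiny illustrative data set
print(arithmetic_mean(data))   # 4.0
print(root_mean_square(data))  # 5.0
```

The root mean square is never smaller than the arithmetic mean: its square equals the square of the mean plus the (population) variance, which here is 16 + 9 = 25.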

There are also "weighted means." A "weighted mean" is one in which data points are not given equal value. A useful example of this (if slightly improper, as it is not a true mean) might be determining the "average agreement" between manuscripts. Normally you would simply take the total number of agreements and divide by the number of variants. (This gives a percent agreement, but it is also a mean, with the observation that the only possible values are 1=agree and 0=disagree.) But variants fall into various classes -- for example, Fee ("On the Types, Classification, and Presentation of Textual Variation," reprinted in Eldon J. Epp & Gordon D. Fee, Studies in the Theory and Method of New Testament Textual Criticism) admits three basic classes of meaningful variant -- Add/Omit, Substitution, Word Order (p. 64). One might decide, perhaps, that Add/Omit is the most important sort of variant and Word Order the least important. So you might weight agreements in these categories -- giving, say, an Add/Omit variant 1.1 times the value of a Substitution variant, and a Word Order variant only .9 times the value of a Substitution variant. (That is, if we arbitrarily assign a Substitution variant a "weight" of 1, then an Add/Omit variant has a weight of 1.1, and a Word Order variant has a weight of .9.)

Let us give a somewhat arbitrary example from Luke 18:1-7, where we will compare the readings of A, B, and D. Only readings supported by three or more major witnesses in the Nestle apparatus will be considered. (Hey, you try to find a good example of this.) Our readings are:

Using unweighted averages we find that A agrees with B 2/5=40%; A agrees with D 4/5=80%; B agrees with D 1/5=20%. If we weight these according to the system above, however, we get

Agreement of A, B = (1.1*0 + 1.1*1 + .9*0 + 1*0 + 1*1)/5 = 2.1/5 = .42
Agreement of A, D = (1.1*1 + 1.1*0 + .9*1 + 1*1 + 1*1)/5 = 4.0/5 = .80
Agreement of B, D = (1.1*0 + 1.1*0 + .9*0 + 1*0 + 1*1)/5 = 1.0/5 = .20

Whatever that means. We're simply discussing mechanisms here. The point is, different sorts of means can give different values....
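The weighted figures above can be reproduced in a few lines of Python. The agreement patterns (1 = agree, 0 = disagree) and weights are read off the three calculations just given. The option to divide by the total weight (5.1) instead of by 5 is an addition for comparison, showing what a true weighted mean would give.

```python
# Weights per reading, in the order used in the calculations above
# (Add/Omit 1.1, Word Order 0.9, Substitution 1.0).
WEIGHTS = [1.1, 1.1, 0.9, 1.0, 1.0]

# 1 = the two manuscripts agree at that reading, 0 = they disagree.
AGREEMENTS = {
    ("A", "B"): [0, 1, 0, 0, 1],
    ("A", "D"): [1, 0, 1, 1, 1],
    ("B", "D"): [0, 0, 0, 0, 1],
}


def weighted_agreement(agree, weights, divide_by_total_weight=False):
    """Sum of weight * agreement, divided by the number of readings (as in the text),
    or by the total weight if divide_by_total_weight is True (a true weighted mean)."""
    score = sum(w * a for w, a in zip(weights, agree))
    return score / (sum(weights) if divide_by_total_weight else len(agree))


for pair, agree in AGREEMENTS.items():
    print(pair,
          round(weighted_agreement(agree, WEIGHTS), 2),
          round(weighted_agreement(agree, WEIGHTS, divide_by_total_weight=True), 3))
```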

p-Hacking

This isn't really a mathematical term; it's a scientific fault: Creating significant outcomes that aren't really there. That is, hunting through data for a correlation until you find one that meets the criterion for statistical significance.

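To explain: a p-value measures how likely it is that a result at least as striking as the one observed would have turned up by chance alone, if there were no underlying reason for it. A small p-value suggests that the result is "meaningful" -- that is, that there is an underlying reason for the observed statistic -- rather than that it just happened by chance.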

But this only works when you are testing a hypothesis you formulated in advance, before evaluating the data. Otherwise, you can funnel through your data until you find some interesting relationship and call it a correlation.

The problem is, correlations happen by chance. Does the number of sunny days in a week correlate to the number of stripes on a zebra's left front leg? Sure -- if you pick the right week and the right zebra. It doesn't mean it's true in general. That's why you need to have the hypothesis first.

What's more, the value of p depends on the particular data in hand. There is a rule of thumb that p=0.05 is statistically significant -- that is, that a result counts as significant if pure chance would produce one at least that strong only 5% of the time (one time in twenty). But keep in mind what this means: First, a 5% chance is not zero; test twenty hypotheses that are all false, and on average one of them will pass. Second, what about a result with p=0.06? Chance would produce it only 6% of the time, which still makes chance a fairly unlikely explanation! A p of 0.05, or even 0.005, does not guarantee meaning, and a p of 0.055, or even 0.5, does not guarantee lack of meaning. It's a rule of thumb only: every result must be confirmed, and results with a p that doesn't attain the .05 standard may still be meaningful and worth investigating. When a result is reported, scientists then set out to replicate the finding, and as it is replicated, the combined evidence yields a p-value closer and closer to 0. (Or it doesn't, because the original was a false positive.)
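
To see why hunting through data produces "significant" results, here is a minimal Python simulation. It is a sketch under stated assumptions: the "witnesses" are hypothetical, each one's agreement at each of 100 readings is a pure coin flip, and significance is judged with an exact two-sided binomial test at p < .05. The function names are made up for illustration. Since none of these pairs is related, every "significant" pair is a false positive; with 100 readings the discreteness of the test keeps the rate somewhat under 5%, but it never drops to zero.

```python
import math
import random


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial test: the probability, if chance alone is at
    work, of a count at least as far from the expected n*p as the observed k."""
    expected = n * p
    gap = abs(k - expected)
    total = 0.0
    for j in range(n + 1):
        if abs(j - expected) >= gap - 1e-9:  # small tolerance for float rounding
            total += math.comb(n, j) * p**j * (1 - p) ** (n - j)
    return min(total, 1.0)


def false_positive_rate(comparisons=2000, readings=100, alpha=0.05, seed=1):
    """Compare many pairs of hypothetical witnesses whose agreements are pure
    coin flips; return the fraction a p < alpha test calls significant."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(comparisons):
        agreements = sum(rng.random() < 0.5 for _ in range(readings))
        if binomial_two_sided_p(agreements, readings) < alpha:
            hits += 1
    return hits / comparisons


rate = false_positive_rate()
print(f"{rate:.1%} of pure-chance comparisons look 'significant'")
```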

There is actually a great deal of debate over whether the .05 (5%) criterion for significance is meaningful. It is an arbitrary number. What's more, it makes most sense for experiments you can repeat indefinitely (and in particle physics, where that is possible, the standard demanded is actually far stricter), not for manuscript studies, where you have a finite number of readings!

Let's take a New Testament example. Suppose you took, say, chapter 24 of Luke, and examined A, B, and D, and made a list of all readings where the three do not agree. Suppose you hypothesized that A and B are related, and you found that, in your sample readings (however many there are), A and B agree against D 60% of the time. B and D agree against A 15% of the time. A and D agree against B 15% of the time. And all three disagree 10% of the time. Without having details of the sample, we can't say exactly what the p-value is, but it would very likely be low enough for significance.

So does that mean that the hypothesis is correct and A and B are related? No, it patently does not. What it means is that Luke 24 was rewritten in D, and A and B look related because they haven't been rewritten! A p-value does not tell you if a hypothesis is correct; it merely tells you how unlikely it is that something arose by chance.

p-hacking as such is rare in textual criticism, simply because textual critics are too allergic to mathematics to know how to apply a p-test. But parallels are all over the place -- as, for instance, in the Colwell-Tune 70% Definition of a Text-Type. Colwell and Tune basically produced a p-hacked definition of a text-type, and others since have p-hacked to get revised Colwell-Tune definitions. And it hasn't worked.

We also see flagrant misunderstanding of the notion of statistical significance. For example, Gerald J. Donker, The Text of the Apostolos in Athanasius of Alexandria, on p. 219 shows a normal distribution and then comes forth with "Any level of dissimilarity less than -2SE [standard deviations below the mean] indicates that there is a statistically significant agreement (=significant similarity) between the respective witnesses." But this is flatly not true. If the measure of the manuscripts is indeed less than -2SE, it means only that a similarity that strong would rarely arise by chance (roughly the 5% level of the usual rule of thumb); it does not mean that the similarity is certainly meaningful. If Donker were taking a sample, that result would be enough to let him publish and let others take other samples. But Donker isn't taking a sample; he's comparing texts. Their agreement is what it is, and it must then be explained. Statistical significance is not meaningful when one is taking the entire population, and in any case statistical significance is a measure of probability, not certainty.

This is not to argue against using statistics to compare manuscripts. It isn't even arguing against carefully selected statistics. It is arguing that the statistics must be chosen in advance and evaluated with intelligence, then recalculated as we learn more. Given that the data set in textual criticism is inherently limited (huge, but limited), p-hacking simply won't lead to proper manuscript evaluation.

Probability

Probability is one of the most immense topics in mathematics, used by all sorts of businesses to predict future events. It is the basis of the insurance business. It is what makes most forms of forecasting possible.

It is much too big to fit under a subheading of an article on mathematics.

But it is a subject where non-mathematicians make many brutal errors, so I will make a few points.

Probability measures the likelihood of an event. The probability of an event is measured from zero to one (or, if expressed as a percentage, from 0% to 100%). An event with a zero probability cannot happen; an event with a probability of one is certain. So if an event has a probability of .1, it means that, on average, it will take place one time in ten.

Example: Full moons take place (roughly) every 28 days. Therefore the chance of a full moon on any given night is one in 28, or .0357, or 3.57%.

It is worth noting that the probabilities of all possible outcomes of an event will always add up to one. If e is an event and p() is its probability function, it therefore follows that p(e) + p(not e) = 1. In the example of the full moon, p(full moon) = .0357. Therefore p(not full moon) = 1-.0357, or .9643. That is, on any random night there is a 3.57% chance of a full moon and a 96.43% chance that the moon will not be full. (Of course, this is slightly simplified, because we are assuming that full moons take place at random. Also, full moons actually take place about every 29.5 days. But the ideas are right.)

The simplest case of probability is that of a coin flip. We know that, if we flip an "honest" coin, the probability of getting a head is .5 and the probability of getting a tail is .5.

What, then, are the odds of getting two heads in a row?

I'll give you a hint: It's not .5+.5=1. Nor is it .5-.5=0. Nor is it .5.

In fact, the probability of a complex event (an event composed of a sequence of independent events) happening is the product of the probabilities of the simple events. So the probability of getting two heads in a row is .5 times .5=.25. If more than two events are involved, just keep multiplying. For example, the probability of three heads in a row is .5 times .5 times .5 = .125.

Next, suppose we want to calculate the probability that, in two throws, we throw one head and one tail. This can happen in either of two ways: head-then-tail or tail-then-head. The odds of head-then-tail are .5 times .5=.25; the odds of tail-then-head are also .5 times .5=.25. Since the two ways cannot both happen at once, we add these up and find that the odds of one head and one tail are .5.

(At this point I should add a word of caution: the fact that the odds of throwing a head and a tail are .5 does not mean that, if you throw two coins twice, you will get a head and a tail once and only once. It means that, if you throw two coins many, many times, the number of times you get a head and a tail will be very close to half the number of times. But if you only throw a few coins, anything can happen. To calculate the odds of any particular set of results, you need to study distributions such as the binomial distribution that governs coin tosses and die rolls.)

The events you calculate need not be the same. Suppose you toss a coin and roll a die. The probability of getting a head is .5. The probability of rolling a 1 is one in 6, or .16667. So, if you toss a coin and roll a die, the probability of throwing a head and rolling a 1 is .5 times .16667, or .08333. The odds of throwing a head and rolling any number other than a 1 are .5 times (1-.16667), or .41667. And so forth.
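
The multiplication and addition rules above can be checked by brute force: list every combined outcome, multiply the probabilities of independent events, and add up the outcomes of interest. A minimal Python sketch using exact fractions; the helper name `joint` is illustrative. The last line applies the binomial distribution to the word of caution above: the chance of getting "one head, one tail" exactly once in two throws of two coins is only 1/2.

```python
from fractions import Fraction
from itertools import product
from math import comb

COIN = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
DIE = {face: Fraction(1, 6) for face in range(1, 7)}


def joint(*distributions):
    """Distribution of several independent events taken together: each
    combined outcome's probability is the product of the separate ones."""
    result = {}
    for combo in product(*(d.items() for d in distributions)):
        outcome = tuple(o for o, _ in combo)
        prob = Fraction(1)
        for _, p in combo:
            prob *= p
        result[outcome] = prob
    return result


two_coins = joint(COIN, COIN)
p_two_heads = two_coins[("H", "H")]                               # 1/4
p_head_and_tail = two_coins[("H", "T")] + two_coins[("T", "H")]   # 1/2
p_three_heads = joint(COIN, COIN, COIN)[("H", "H", "H")]          # 1/8

coin_die = joint(COIN, DIE)
p_head_and_1 = coin_die[("H", 1)]                                 # 1/12
p_head_not_1 = sum(p for (c, d), p in coin_die.items() if c == "H" and d != 1)  # 5/12

# Binomial distribution: "one head, one tail" exactly once in two throws
p_exactly_once = comb(2, 1) * p_head_and_tail * (1 - p_head_and_tail)  # 1/2
```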

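We can apply this to manuscripts in several ways. Here's an instance from the gospels. Suppose, for example, that we have determined that the probability that, at a randomly-chosen reading, manuscript L is Byzantine is .55, or 55%. Suppose that we know that manuscript 579 is 63% Byzantine. We can then calculate the odds that, for any given reading,

L and 579 are both Byzantine: .55 times .63 = .3465
L is Byzantine and 579 is not: .55 times .37 = .2035
579 is Byzantine and L is not: .45 times .63 = .2835
Neither L nor 579 is Byzantine: .45 times .37 = .1665
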
Note that the probabilities of the outcomes add up to unity: .3465+.2035+.2835+.1665=1.
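
The four-way breakdown for L and 579 follows the same product rule. A sketch, assuming (as the text does) that each manuscript's Byzantine readings are scattered independently of the other's; `reading_outcomes` is a hypothetical helper name.

```python
def reading_outcomes(p_first, p_second):
    """Chance of each combination at a random reading, assuming the two
    manuscripts take Byzantine readings independently of each other."""
    return {
        ("Byzantine", "Byzantine"): p_first * p_second,
        ("Byzantine", "non-Byzantine"): p_first * (1 - p_second),
        ("non-Byzantine", "Byzantine"): (1 - p_first) * p_second,
        ("non-Byzantine", "non-Byzantine"): (1 - p_first) * (1 - p_second),
    }


outcomes = reading_outcomes(0.55, 0.63)  # L is 55% Byzantine, 579 is 63%
for (l_text, ms579_text), prob in outcomes.items():
    print(f"L {l_text}, 579 {ms579_text}: {prob:.4f}")
print(f"total: {sum(outcomes.values()):.4f}")
```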

The other application for this is to determine how often mixed manuscripts agree, and what the basis for their agreement was. Let's take the case of L and 579 again. Suppose, for the sake of the argument, that they had ancestors which were identical. Then suppose that L suffered a 55% Byzantine overlay, and 579 had a 63% Byzantine mixture.

Does this mean that they agree all the time except for the 8% of extra "Byzantine-ness" in 579? Hardly!

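Assume the Byzantine mixture is scattered through both manuscripts at random. Then we can use the results given above to learn that

L and 579 agree (both Byzantine): .3465
L and 579 agree (both non-Byzantine): .1665
L and 579 disagree (one Byzantine, one not): .2035 + .2835 = .4870

Thus L and 579 agree at only .3465+.1665=.513=51.3% of all points of variation.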
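
The L/579 calculation generalizes: if two manuscripts descended from the same ancestor take Byzantine readings independently at rates a and b, they agree at a fraction ab + (1-a)(1-b) of the readings where the ancestor was non-Byzantine. A minimal sketch, with `agreement_rate` a hypothetical helper name; it also prints a grid of agreement rates for 0% to 100% Byzantine mixture in steps of 10%.

```python
def agreement_rate(p_first, p_second):
    """Expected agreement of two manuscripts whose common ancestor had a
    non-Byzantine reading, after independent, randomly scattered Byzantine
    mixture: they agree if both took the Byzantine reading or both kept
    the old one."""
    return p_first * p_second + (1 - p_first) * (1 - p_second)


print(f"L and 579: {agreement_rate(0.55, 0.63):.1%}")

levels = [i / 10 for i in range(11)]  # 0%, 10%, ..., 100% Byzantine
print("      " + "".join(f"{b:>6.0%}" for b in levels))
for a in levels:
    print(f"{a:>6.0%}" + "".join(f"{agreement_rate(a, b):>6.0%}" for b in levels))
```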

This simple calculation should forever put to rest the theory that closely related manuscripts will always have close rates of agreement! Notice that L and 579 have only two constituent elements (that is, both contain a mixture of two text-types: Byzantine and Alexandrian). But the effect of mixture is to lower their rate of agreement to a rather pitiful 51%. (This fact must be kept in mind when discussing the "Cæsarean" text. The fact that the "Cæsarean" manuscripts do not have high rates of agreement means nothing, since all of them are heavily mixed. The question is, how often do they agree when they are not Byzantine?)

To save scholars some effort, the table below shows how often two mixed manuscripts will agree for various degrees of Byzantine corruption. To use the table, just determine how Byzantine the two manuscripts are, then find those percents in the table and read off the resulting rate of agreement.

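Rows: Byzantine percentage of the first manuscript. Columns: Byzantine percentage of the second. Entries: expected rate of agreement.

          0%   10%   20%   30%   40%   50%   60%   70%   80%   90%  100%
    0%  100%   90%   80%   70%   60%   50%   40%   30%   20%   10%    0%
   10%   90%   82%   74%   66%   58%   50%   42%   34%   26%   18%   10%
   20%   80%   74%   68%   62%   56%   50%   44%   38%   32%   26%   20%
   30%   70%   66%   62%   58%   54%   50%   46%   42%   38%   34%   30%
   40%   60%   58%   56%   54%   52%   50%   48%   46%   44%   42%   40%
   50%   50%   50%   50%   50%   50%   50%   50%   50%   50%   50%   50%
   60%   40%   42%   44%   46%   48%   50%   52%   54%   56%   58%   60%
   70%   30%   34%   38%   42%   46%   50%   54%   58%   62%   66%   70%
   80%   20%   26%   32%   38%   44%   50%   56%   62%   68%   74%   80%
   90%   10%   18%   26%   34%   42%   50%   58%   66%   74%   82%   90%
  100%    0%   10%   20%   30%   40%   50%   60%   70%   80%   90%  100%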

It should be noted, of course, that these results only apply at points where the ancestors of the two manuscripts agreed and where that reading differs from the Byzantine text.

That, in fact, points out the whole value of probability theory for textual critics. From this data, we can determine if the individual strands of two mixed manuscripts are related. Overall agreements don't tell us anything. But agreements in special readings are meaningful. It is the profiles of readings -- especially non-Byzantine readings -- which must be examined: Do manuscripts agree in their non-Byzantine readings? Do they have a significant fraction of the non-Byzantine readings of a particular type, without large numbers of readings of other types? And do they have a high enough rate of such readings to be statistically significant?

Arithmetic, Exponential, and Geometric Progressions

In recent years, the rise of the Byzantine-priority movement has led to an explosion in the arguments about "normal" propagation -- most of which is mathematically very weak. Often the arguments are pure fallacy.

"Normal" is in fact a meaningless term when referring to sequences (in this case, reproductive processes). There are many sorts of growth curves, often with real-world significance -- but each applies in only limited circumstances. And most are influenced by outside factors such as "predator-prey" scenarios.

Sequences

The two most common sorts of sequences are arithmetic and geometric. Examples of these two sequences, as well as two others (Fibonacci and power sequences, described below) are shown at right. In the graph, the constant in the arithmetic sequence is 1, starting at 0; the constant in the geometric sequence is 2, starting at 1; the exponent in the power sequence is 2. Note that we show three graphs, over the range 0-5, 0-10, 0-20, to show how the sequences start, and how some of them grow much more rapidly than others.

The arithmetic sequence is probably the best-known type of sequence; it's just a simple counting pattern, such as 1, 2, 3, 4, 5... (this is the one shown in the graph) or 2, 4, 6, 8, 10.... As a general rule, if $a_1$, $a_2$, $a_3$, etc. are the terms of an arithmetic sequence, the formula for a given term will be of this form:

$$a_{n+1} = a_n + d$$

so that

$$a_n = d \cdot n + a_0$$

where $d$ is a constant and $a_0$ is the starting point of the sequence.

In the case of the integers 1, 2, 3, 4, 5, for instance, $d=1$ and $a_0=0$. In the case of the even numbers 2, 4, 6, 8, 10..., $d=2$ and $a_0=0$.

Observe that $d$ and $a_0$ don't have to be whole numbers. They could be .5, or 6/7, or even 2π. (The latter, for instance, would give the total distance you have walked after each lap as you walk around and around a circle of radius 1.)

In a text-critical analogy, an arithmetic progression approximates the total output of a scriptorium. If it produces two manuscripts a month, for instance, then after one month you have two manuscripts; after two months, you have four; after three months, six, etc.

Note that we carefully refer to the above as a sequence. This is by contrast to a series, which refers to the values of the sums of terms of a sequence. (And yes, a series is a sequence, and so can be summed into another series....) The distinction may seem minor, but it has importance in calculus and numerical analysis, where irrational numbers (such as most values of sines and cosines, and the constant e) are approximated using series. (Both sequences and series can sometimes be lumped under the term "progression.")

But series have another significance. Well-known rules will often let us calculate the values of a series by simple formulae. For example, for an arithmetic sequence, it can be shown that the sum $s$ of the first $n+1$ terms $a_0, a_1, a_2, a_3, \ldots, a_n$ is

$$s = \frac{(n+1)(a_0 + a_n)}{2}$$

Which, for the simplest case of 0, 1, 2, 3, 4, 5, etc. (where $a_0 = 0$ and $a_n = n$), simplifies down to

$$s = \frac{n(n+1)}{2}$$

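A minimal sketch of the arithmetic-sequence formulas, with a brute-force check of the sum formula; the function names are illustrative, and the scriptorium figures (two manuscripts a month, starting from none) are the hypothetical ones from the text.

```python
def arithmetic_term(n, d, a0):
    """n-th term of an arithmetic sequence: a_n = d*n + a_0."""
    return d * n + a0


def arithmetic_sum(n, d, a0):
    """Sum of the terms a_0 through a_n, via (n+1)(a_0 + a_n)/2."""
    return (n + 1) * (a0 + arithmetic_term(n, d, a0)) / 2


# Hypothetical scriptorium: two manuscripts a month, starting from none
output = [arithmetic_term(month, 2, 0) for month in range(4)]
print(output)

# The closed form agrees with adding the terms one by one: 0 + 1 + ... + 10
print(arithmetic_sum(10, 1, 0), sum(arithmetic_term(k, 1, 0) for k in range(11)))
```
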
A geometric sequence is similar to an arithmetic sequence in that it involves a constant sort of increase -- but the increase is multiplicative rather than additive. That is, each term in the sequence is a multiple of the one before. Thus the basic definition of $g_{n+1}$ takes the form

$$g_{n+1} = c \cdot g_n$$

So the general formula is given by

$$g_n = g_0 \cdot c^n$$

(where $c$ is the constant multiple. $c^n$ is, of course, $c$ raised to the $n$th power, i.e. $c$ multiplied by itself $n$ times).

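It is often stated that geometric sequences grow very quickly. This is not inherently true. There are in fact seven cases (taking $g_0$ to be positive):

$c < -1$: the terms alternate in sign and grow ever larger in size.
$c = -1$: the terms simply alternate between $g_0$ and $-g_0$.
$-1 < c < 0$: the terms alternate in sign and shrink toward zero.
$c = 0$: every term after the first is zero.
$0 < c < 1$: the terms shrink steadily toward zero.
$c = 1$: every term equals $g_0$; the sequence never changes.
$c > 1$: the terms grow without limit.
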
The last case is usually what we mean by a geometric sequence. Such a sequence may start slowly, if $c$ is barely greater than one, but it always starts climbing eventually. And it can climb very quickly if $c$ is large. Take the case of $c=2$. If we start with an initial value of 1, then our terms become 1, 2, 4, 8, 16, 32, 64, 128... (you've probably seen those numbers before). After five generations, you're only at 32, but ten generations takes you to 1024, fifteen generations gets you to over 32,000, twenty generations takes you past one million, and it just keeps climbing.

And this too has a real-world analogy. Several, in fact. If, for instance, you start with two people (call them "Adam" and "Eve" if you wish), and assume that every couple has four offspring then dies, then you get exactly the above sequence except that the first term is 2 rather than 1: 2 (Adam and Eve), 4 (their children), 8 (their grandchildren), etc. (Incidentally, the human race has now reached this level: The population is doubling roughly every 40 years -- and that's slower than the doubling every 35 years or so of the mid-twentieth century.)

The text-critical analogy would be a scriptorium which, every ten years (say), copies every book in its library. If it starts with one book, at the end of ten years, it will have two. After twenty years (two copying generations), it will have four. After thirty years, it will have eight. Forty years brings the total to sixteen. Fifty years ups the total to 32, and maybe it's time to hire a larger staff of scribes. After a hundred years, they'll be around a thousand volumes; after 200 years, over a million volumes; and if they started in the fifth century and were still at it today, we'd be looking at converting the entire planet into raw materials for their library. That is how geometric sequences grow.
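
The doubling scriptorium is easy to tabulate. A minimal sketch using $g_n = g_0 \cdot c^n$ with $c = 2$ and $g_0 = 1$ (one copying generation per decade, as in the hypothetical example); `geometric_term` is an illustrative name.

```python
def geometric_term(n, c, g0):
    """n-th term of a geometric sequence: g_0 * c**n."""
    return g0 * c**n


# Hypothetical scriptorium copying its whole library every ten years
for years in (10, 20, 50, 100, 200):
    generations = years // 10
    print(f"after {years} years: {geometric_term(generations, 2, 1):,} books")
```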

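The sum of the terms $g_0$ through $g_n$ of a geometric sequence is given by

$$s = g_0 \cdot \frac{c^{n+1} - 1}{c - 1}$$

(where, obviously, $c$ is not equal to 1; if $c = 1$, every term is $g_0$ and the sum is simply $(n+1) \cdot g_0$).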

We should note that there is a more general form of a geometric sequence, and the difference in results can be significant. This version has a second constant parameter, this time in the exponent:

$$g_n = g_0 \cdot c^{d \cdot n}$$

If $d$ is small (but positive), the sequence grows more slowly; if $d$ is negative, the sequence gradually goes toward 0. For example, the sequence

$$g_n = 1 \cdot 2^{-1 \cdot n}$$

has the values

1, .5, .25, .125, ...

and the sum of the sequence, if you add up all the terms, is 2.
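
The closed form for a geometric sum can be checked the same way. This sketch assumes the standard formula for the terms $g_0$ through $g_n$, namely $g_0 (c^{n+1} - 1)/(c - 1)$ for $c \neq 1$, and treats $c = 1$ separately; it also shows the partial sums of 1, .5, .25, ... approaching 2.

```python
def geometric_sum(n, c, g0):
    """Sum of the terms g_0 .. g_n of g_k = g0 * c**k."""
    if c == 1:
        return g0 * (n + 1)  # every term equals g0
    return g0 * (c ** (n + 1) - 1) / (c - 1)


brute = sum(2**k for k in range(11))  # 1 + 2 + 4 + ... + 1024
closed = geometric_sum(10, 2, 1)
partial = geometric_sum(50, 0.5, 1)   # 1 + .5 + .25 + ... (51 terms)
print(brute, closed, partial)
```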

An exponential sequence is a sort of odd and special relative of a geometric sequence. It requires a parameter, $x$. In that case, the terms $e_n$ are defined by the formula

$$e_n = \frac{x^n}{n!}$$

where $n!$ is the factorial, i.e. $n \cdot (n-1) \cdot (n-2) \cdots 3 \cdot 2 \cdot 1$ (with $0! = 1$ by convention).

So if we take the case of $x=2$, for instance, we find
[$e_0 = 2^0/0! = 1/1 = 1$]
$e_1 = 2^1/1! = 2/1 = 2$
$e_2 = 2^2/2! = 4/2 = 2$
$e_3 = 2^3/3! = 8/6 = 1.3333...$
$e_4 = 2^4/4! = 16/24 = .6666...$
$e_5 = 2^5/5! = 32/120 = .2666...$

This sequence by itself isn't much use; its real value is the associated series, which becomes the exponential function $e^x$. But let's not get too deep into that....
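
A short sketch of the exponential sequence for $x = 2$, and of the series it feeds: summing enough terms reproduces $e^2$ as computed by Python's `math.exp`. The cutoff of 30 terms is an arbitrary choice, far more than enough for $x = 2$.

```python
import math


def exponential_terms(x, count):
    """First `count` terms x**n / n! of the exponential sequence."""
    return [x**n / math.factorial(n) for n in range(count)]


terms = exponential_terms(2, 6)                # e_0 through e_5
series_total = sum(exponential_terms(2, 30))   # partial sum of the series
print(terms)
print(series_total, math.exp(2))
```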

We should note that not all sequences follow any of the above patterns -- remember, a sequence is just a list of numbers, although it probably isn't very meaningful unless we can find a pattern underlying it. But there are many possible patterns. Take, for instance, the famous Fibonacci sequence 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144.... This sequence is defined by the formula

$$a_{n+1} = a_n + a_{n-1}$$

It will be observed that these numbers don't follow any of the above patterns precisely. And yet, they have real-world significance (e.g. branches of plants follow Fibonacci-like patterns), and the sequence was discovered in connection with a population-like problem such as we are discussing here: Fibonacci wanted to know the reproductive rate of rabbits, allowing that they needed time to mature. If you start with a pair of infant rabbits, they need one month (in his model) to reach sexual maturity. So the initial population was 1. After a month, it's also 1. After another month, the rabbits have had a pair of offspring, so the population is now 2. Of these 2, one is the original pair, which is sexually mature; the other is the immature pair. So the sexually mature pair has another pair of offspring, but the young pair doesn't. Now you have three pairs. In another month, you have two sexually mature pairs, and each has a pair of offspring, for a total of five. Etc.

This too could have a manuscript analogy. Suppose -- not unreasonably -- that a scriptorium insists that only "good" copies are worthy of reproduction. And suppose that the definition of "good" is in fact old. Suppose that the scriptorium has a regular policy of renewing manuscripts, and creating new manuscripts only by renewal. And suppose a manuscript becomes "old" on its thirtieth birthday.

The scriptorium was founded with one manuscript. Thirty years later, it's still new, and isn't copied. After another thirty years, it has been copied, and that's two. Thirty years later, it's copied again, and that's three. Etc. This precise process isn't really likely -- but it's a warning that we can't blithely assume manuscripts propagate in any particular manner.
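
Fibonacci's rabbit model can be simulated directly by tracking mature and immature pairs separately; the scriptorium version is the same process with "old" and "new" manuscripts in place of mature and immature pairs. A minimal sketch, with `rabbit_pairs` an illustrative name.

```python
def rabbit_pairs(months):
    """Fibonacci's rabbit model: a newborn pair matures in one month, and
    every mature pair then produces one new pair each month. Returns the
    total number of pairs at each month, starting from one newborn pair."""
    immature, mature = 1, 0
    totals = []
    for _ in range(months):
        totals.append(immature + mature)
        # each mature pair produces a newborn pair;
        # last month's immature pairs become mature
        immature, mature = mature, mature + immature
    return totals


print(rabbit_pairs(12))
```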

And believe it or not, the geometric sequence is by no means the fastest-growing sequence one can construct using quite basic math. Consider this function:

$$h_n = n^n$$

The terms of that sequence (starting from $h_0$) are
$0^0 = 1$ (by convention), $1^1 = 1$, $2^2 = 4$, $3^3 = 27$, $4^4 = 256$, $5^5 = 3125$....

It can be shown that this sequence will eventually overtake any geometric sequence, no matter how large the constant multiplier in the geometric sequence. The graph shows this point. Observe that, even for $n=4$, it dwarfs the geometric sequence we used above, $g_n = 2^n$. It would take somewhat longer to pass a geometric sequence with a higher constant, but it will always overtake a geometric sequence eventually, when $n$ is sufficiently larger than the constant ratio of the geometric sequence (for $g_n = c^n$, as soon as $n$ exceeds $c$).
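
How soon does $n^n$ pass a geometric sequence? For $g_n = c^n$, since $n^n > c^n$ exactly when $n > c$, the crossing comes as soon as $n$ exceeds $c$; a larger starting value $g_0$ only delays it. A small sketch (the helper name is illustrative) that finds the first such $n$ by direct comparison, using Python's exact integers:

```python
def first_overtake(c, g0=1):
    """Smallest n >= 1 at which n**n exceeds the geometric term g0 * c**n."""
    n = 1
    while n**n <= g0 * c**n:
        n += 1
    return n


for c in (2, 10, 100):
    print(c, first_overtake(c))
print("c=2, g0=1000:", first_overtake(2, 1000))
```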

These sequences may all seem rather abstract, despite the attempts to link the results to textual criticism. But this discussion has real-world significance. A major plank of the Byzantine Priority position is that numbers of manuscripts mean something. The idea is, more or less, that the number of manuscripts grows geometrically, and that the preponderance of Byzantine manuscripts shows that they were the (largest) basic population.

Observe that this is based on an unfounded assumption. We don't know the actual nature of the reproduction of manuscripts. But this model, from the numbers, looks false. (And if you are going to propose a model, it has to fit the numbers.) The simplest model of what we actually have does not make the Byzantine the original text. Rather, it appears that the Alexandrian is the original text, but that it had a growth curve with a very small (perhaps even negative) multiplier on the exponent. The Byzantine text started later but with a much larger multiplier.

Is that what actually happened? Probably not. The Fallacy of Number cuts both ways: It doesn't prove that the Byzantine text is early or late or anything else. But this is a warning to those who try to make more of their models than they are actually worth. In fact, no model proves anything unless it has predictive power -- the ability to yield some data not included in the original model. Given the very elementary nature of the data about numbers of manuscripts, it seems unlikely that we can produce a predictive model. But any model must at least fit the data!

One more point: We alluded to exponential or geometric decay, but we didn't do much with it. However, this is something of great physical significance, which might have textual significance too. Exponential decay occurs when a population has a growth parameter that is less than one. We gave the formula above:

$$g_n = g_0 \cdot c^{d \cdot n}$$

for $0 < c < 1$ (with $d > 0$).

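More specifically, if the number of generations is $n$, the initial population is $k$, and the growth rate is $d$ (so that $d$ here plays the role of $c$ above, with the multiplier in the exponent equal to 1), then the population after $n$ generations is

$$g_n = k \cdot d^n$$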

A typical example of this is a single-elimination sports tournament. In this case, the decay rate is one-half, and the starting population is the number of teams (usually 256, or 128, or 64, or 32, or 16). If we start with 128, then $g_0$ is given by

$$g_0 = 128 \cdot (.5^0) = 128$$

After one generation, we have

$$g_1 = 128 \cdot (.5^1) = 64$$

And so forth:

$g_2 = 128 \cdot (.5^2) = 32$
$g_3 = 128 \cdot (.5^3) = 16$
$g_4 = 128 \cdot (.5^4) = 8$
$g_5 = 128 \cdot (.5^5) = 4$
$g_6 = 128 \cdot (.5^6) = 2$
$g_7 = 128 \cdot (.5^7) = 1$

In other words, after seven rounds, you have eliminated all but one team, and declare a champion.
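
The tournament is a case of halving per generation, so the number of rounds is just the number of halvings needed to get down to one team. A sketch that assumes the field is a power of two (byes are ignored); `rounds_to_champion` is an illustrative name.

```python
def rounds_to_champion(teams):
    """Rounds in a single-elimination tournament: each round halves the
    field (g_n = k * 0.5**n) until one team is left. Assumes `teams` is a
    power of two."""
    rounds = 0
    while teams > 1:
        teams //= 2
        rounds += 1
    return rounds


for teams in (16, 32, 64, 128, 256):
    print(teams, rounds_to_champion(teams))
```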

Instead of expressing this in generations, we can also express it in terms of time. The most basic physical example of this is the half-life of radioactive isotopes. The general formula for this is given by

$$N = N_0 e^{-\gamma t}$$

where $N$ is the number of atoms of the isotope at the time $t$, $N_0$ is the original sample (i.e. when $t=0$), $e$ is the well-known constant, and $\gamma$ is what is known as the "decay constant" -- approximately the fraction of the sample which decays in a unit time period (the approximation is good when that fraction is small).

Usually, of course, we don't express the lifetime of isotopes in terms of decay constants but in terms of half-lives. A half-life is the time it takes for half the remaining sample to decay -- in terms of the above formula, the time $t$ at which $N = N_0/2$.

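From this we can show that the half-life is related to the decay constant: setting $N_0/2 = N_0 e^{-\gamma t}$ gives $\ln(.5) = -\gamma t$, so

$$\text{half-life} = \frac{-\ln(.5)}{\gamma} = \frac{\ln 2}{\gamma}$$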

So if the half-life of our isotope is given as h, then the formula for decay becomes

$$N = N_0 e^{\ln(.5)\, t/h}$$

Example: Let's say we start with 4000 atoms of an isotope (a very, very small sample, too small to see, but I'd rather not deal with all the zeroes we'd have if we did a real sample of an isotope). Suppose the half-life is 10 years. Then the formula above would become:

$$N = 4000 \cdot e^{\ln(.5)\, t/10}$$

So if we choose $t=10$, for instance, we find that $N=2000$.
At $t=20$, we have $N=1000$.
At $t=30$, $N=500$.

At t=100, we're down to about 4 atoms; after 120 years, we're down to about one atom, and there is no predicting when that last one will go away.

Of course, you could work that out just by counting half-lives. But the nice thing about the decay formula is that you can also figure out how many atoms there are after 5 years (2828), or 25 years (707), or 75 years (22).
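
The decay formula in code, assuming (as in the example) 4000 atoms and a 10-year half-life. The decay constant is computed from the half-life as $\gamma = \ln 2 / h$, so this is the same formula as $N = N_0 e^{-\gamma t}$; the function names are illustrative.

```python
import math


def decay_constant(half_life):
    """gamma = -ln(0.5) / h = ln(2) / h."""
    return math.log(2) / half_life


def remaining(n0, half_life, t):
    """Number of atoms (or anything with a steady die-off) left at time t."""
    return n0 * math.exp(-decay_constant(half_life) * t)


for t in (5, 10, 20, 25, 30, 75, 100, 120):
    print(t, round(remaining(4000, 10, t)))
```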

And while this formula is for radioactive decay, it also applies to anything with a steady die-off rate. I seem to recall reading, somewhere, of an attempt to estimate the half-life of manuscripts. This is, of course, a very attractive idea -- if we could do it, it would theoretically allow us to estimate the number of manuscripts of a given century based on the number of surviving manuscripts (note that the above formula can be run both ways: It can give us the number of atoms/manuscripts fifty or a hundred or a thousand years ago).

In a very limited way, the idea might be useful: A papyrus manuscript can only survive a certain amount of use, so we could estimate the rate at which manuscripts would reach the end of their useful life by some sort of formula. But this would apply only to papyri, and only to papyri during the period when they are being used. Unfortunately, it seems unlikely that such a model could actually predict past manuscript numbers.

For more on this concept, see the section on Carbon Dating in the article on Chemistry.

Rigour, Rigorous Methods

Speaking informally (dare I say "without rigour?") rigour is the mathematical term for "doing it right." To be rigorous, a proof or demonstration must spell out all its assumptions and definitions, must state its goal, and must proceed in an orderly way to that goal. All steps must be exactly defined and conform to the rules of logic (plus whatever other axioms are used in the system).

The opposite of a rigorous argument is the infamous "hand-waving" proof, in which the mathematician waves his or her hand at the blackboard and says, "From here it is obvious that...."

It should be noted that rigour is not necessarily difficult; the following proof is absolutely rigorous but trivially simple:

To Prove: That $(a-b)(a+b) = a^2 - b^2$

PROOF:

$$\begin{aligned}
(a-b)(a+b) &= a(a+b) - b(a+b) && \text{Distributing} \\
&= a^2 + ab - ba - b^2 && \text{Distributing} \\
&= a^2 - b^2 && \text{Adding}
\end{aligned}$$

Q.E.D.
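
The same identity can be checked by machine, which is rigour taken to its limit. A minimal sketch, assuming Lean 4 with the Mathlib library installed; the `ring` tactic performs the distributing and adding steps automatically, so it confirms the result rather than reproducing each line of the proof above.

```lean
import Mathlib.Tactic

-- The difference-of-squares identity over the integers.
-- `ring` proves it by normalizing both sides as polynomials.
example (a b : ℤ) : (a - b) * (a + b) = a ^ 2 - b ^ 2 := by
  ring
```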

It should be noted that rigour is required for results to be considered mathematically correct. It is not enough to do a lot of work! It may strike textual critics as absurd to say that the immense and systematic labours of a Zuntz or a Wisse are not rigorous, while the rather slapdash efforts of Streeter are -- but it is in fact the case. Streeter worked from a precise definition of a "Cæsarean" reading: A reading found in at least two "Cæsarean" witnesses and not found in the Textus Receptus. Streeter's definition is poor, even circular, but at least it is a definition -- and he stuck with it. Wisse and Zuntz were more thorough, more accurate, and more true-to-life -- but they are not rigorous, and their results therefore cannot be regarded as firm.

Let us take the Claremont Profile Method as an example. A portion of the method is rigorous: Wisse's set of readings is clearly defined. However, Wisse's groups are not defined. Nowhere does he say, e.g., "A group consists of a set of at least three manuscripts with the following characteristics: All three cast similar profiles (with no more than one difference per chapter), with at least six differences from Kx, and at least three of these differences not shared by any other group." (This probably is not Wisse's definition. It may not be any good. But at least it is rigorous.)

Mathematical and statistical rigour is necessary to produce accurate results. Better, mathematically, to use wrong definitions and use them consistently than to use imprecise definitions properly. Until this standard is achieved, all results of textual criticism which are based on actual data (e.g. classification of manuscripts into text-types) will remain subject to attack and interpretation.

The worst problem, at present, seems to be with definitions. We don't have precise definitions of many important terms of the discipline -- including even such crucial things as the Text-Type.

In constructing a definition, the best place to start is often with necessary and sufficient conditions. A necessary condition is one which has to be true for a rule or definition to apply (for example, for it to be raining, it is necessary that it be cloudy. Therefore clouds are a necessary condition for rain). Note that a necessary condition may be true without assuring a result -- just as it may be cloudy without there being rain.

A sufficient condition ensures that a rule or definition applies (for example, if it is raining, we know it is cloudy. So rain is a sufficient condition for clouds). Observe that a particular sufficient condition need not be fulfilled for an event to take place -- as, e.g., rain is just one of several sufficient conditions for clouds.

For a particular thing to be true, all necessary conditions must be fulfilled, and usually at least one sufficient condition must also be true. (We say "usually" because sometimes we will not have a complete list of sufficient conditions.) A comprehensive definition will generally have to include both. (This does not mean that we have to determine all necessary and sufficient conditions to work on a particular problem; indeed, we may need to propose incomplete or imperfect definitions to test them. But we generally are not done until we have both.)

Let's take an example. Colwell's "quantitative method" is often understood to state that two manuscripts belong to the same text-type if they agree in 70% of test readings. But this is demonstrably not an adequate definition. It may be that the 70% rule is a necessary condition (though even this is subject to debate, because of the problem of mixed manuscripts). But the 70% rule is not a sufficient condition. This is proved by the Byzantine text. Manuscripts of this type generally agree in the 90% range. A manuscript which agrees with the Byzantine text in only 70% of the cases is a poor Byzantine manuscript indeed. It may, in fact, agree with some other text-type more often than the Byzantine text. (For example, 1881 agrees with the Byzantine text some 70-75% of the time in Paul. But it agrees with 1739, a non-Byzantine manuscript, about 80% of the time.) So the sufficient condition for being a member of the Byzantine text is not 70% agreement with the Byzantine witnesses but 90% agreement.

As a footnote, we should note that the mere existence of rigour does not make a conclusion correct. A rigorous proof is only as accurate as its premises. Let us demonstrate this by assuming that 1=0. If so, we can construct the following "proof":

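To Prove: That $2+2=5$

PROOF:

$$\begin{aligned}
2+2 &= 4 && \text{[Previously known]} \\
\text{So}\quad 2+2 &= 4+0 && \text{[since } x = x+0 \text{ for any } x\text{]} \\
&= 4+1 && \text{[since } 1=0\text{]} \\
&= 5 && \text{[by addition]}
\end{aligned}$$

Q.E.D.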

But it should be noted that, while a rigorous demonstration is only as good as its premises, a non-rigorous demonstration is not even that good. Thus the need for rigour -- but also for testing of hypotheses. (This is where Streeter's method, which was rigorous, failed: He did not sufficiently examine his premises to see if they made sense in the real world.)

Sampling and Profiles

Sampling is one of the basic techniques in science. Its purpose is to allow intelligent approximations of information when there is no way that all the information can be gathered. For example, one can use sampling to count the bacteria in a lake. To count every bacterium in a large body of water is generally impractical, so one takes a small amount of liquid, measures the bacteria in that, and generalizes to the whole body of water.

Sampling is a vast field, used in subjects from medicine to political polling. There is no possible way for us to cover it all here. Instead we will cover an area which has been shown to be of interest to many textual critics: The relationship between manuscripts. Anything not relevant to that goal will be set aside.

Most textual critics are interested in manuscript relationships, and most will concede that the clearest way to measure relationship is numerically. Unfortunately, this is an almost impossible task. To calculate the relationship between manuscripts directly requires that each manuscript be collated against all others. It is easy to show that this cannot be done. The number of collation operations required to cross-compare $n$ manuscripts increases on the order of $n^2$ (the exact formula is $(n^2-n)\div 2$). So to collate two manuscripts takes only one operation, but to cross-collate three requires three steps. Four manuscripts call for six steps; five manuscripts require ten steps. To cross-collate one hundred manuscripts would require 4950 operations; to cover six hundred manuscripts of the Catholic Epistles requires 179,700 collations. To compare all 2500 Gospel manuscripts requires a total of 3,123,750 operations. All involving some tens of thousands of points of variation.
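The count above is simply the number of unordered pairs that can be formed from n manuscripts. As a quick check, here is a minimal Python sketch (the function name `collations_needed` is ours, chosen for illustration) that reproduces the figures just quoted:

```python
def collations_needed(n: int) -> int:
    """Pairwise collations needed to compare each of n manuscripts with every
    other one exactly once: (n**2 - n) / 2, the number of unordered pairs."""
    return (n * n - n) // 2


for n in (2, 3, 4, 5, 100, 600, 2500):
    print(n, collations_needed(n))
```

Python's built-in `math.comb(n, 2)` gives the same numbers.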

It can't be done. Not even with today's computer technology. The only hope is some sort of sampling method -- or what textual scholars often call "profiling."

The question is, how big must a profile be? (There is a secondary question, how should a profile be selected? but we will defer that.) Textual scholars have given all sorts of answers. The smallest I have seen was given by Larry Richards (The Classification of the Greek Manuscripts of the Johannine Epistles, Scholars Press, 1977, page 189), who claimed that he could identify a manuscript of the Johannine Epistles as Alexandrian on the basis of five readings! (It is trivially easy to disprove this; the thoroughly Alexandrian minuscules 33 and 81 share only two and three of these readings, respectively.)

Other scholars have claimed that one must study every reading. One is tempted to wonder if they are trying to ensure their continued employment, as what they ask is neither possible nor necessary.

A key point is that the accuracy of a sample depends solely on the size of the sample, not on the size of the population from which the sample is taken. (Assuming an unbiased sample, anyway.) In other words, what matters is how many tests you make, not what percentage of the population you test. As John Allen Paulos puts it (A Mathematician Reads the Newspaper, p. 137), "[W]hat's critical about a random sample is its absolute size, not its percentage of the population. Although it may seem counterintuitive, a random sample of 500 people taken from the entire U. S. population of 260 million is generally far more predictive of its population (has a smaller margin of error) than a random sample of 50 taken from a population of 2,600."
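Paulos's point can be checked with the standard textbook margin of error for a proportion, including the finite population correction that accounts for sampling without replacement. This is a sketch under our own assumptions, not Paulos's calculation: a 95% confidence level (z = 1.96), the worst-case proportion p = 0.5, and a simple random sample. The helper name `margin_of_error` is hypothetical.

```python
import math


def margin_of_error(sample_size: int, population_size: int,
                    p: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion estimated from a
    simple random sample drawn without replacement.

    Normal approximation with the finite population correction
    sqrt((N - n) / (N - 1)); p = 0.5 is the worst case.
    """
    standard_error = math.sqrt(p * (1 - p) / sample_size)
    fpc = math.sqrt((population_size - sample_size) / (population_size - 1))
    return z * standard_error * fpc


# Paulos's comparison: 500 out of 260 million versus 50 out of 2,600.
print(round(margin_of_error(500, 260_000_000), 3))  # 0.044
print(round(margin_of_error(50, 2_600), 3))         # 0.137
```

Under these assumptions the large national sample comes out at roughly plus or minus 4 percentage points and the small local one at roughly plus or minus 14, even though the second sample covers almost 2% of its population. The correction factor only matters when the sample is a sizable fraction of the whole.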

What follows examines how big one's sample ought to be. For this, we pull a trick. Let us say that, whatever our sample of readings, we will assign the value one to a reading when the two manuscripts we are examining agree. If the two manuscripts disagree, we assign the value zero.

The advantage of this trick is that it makes the Mean value of our sample equal to the agreement rate of the manuscripts. (And don't say "So what?" This means that we can use the well-established techniques of sampling, which help us determine the mean, to determine the agreement rate of the manuscripts as well.)
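In code, the trick amounts to turning each test reading into a 1 (agree) or 0 (disagree) and averaging. A minimal sketch, assuming each manuscript's profile is a list of reading labels at the same test passages; the manuscripts and labels are invented for illustration:

```python
def agreement_rate(readings_a, readings_b):
    """Proportion of test readings at which two manuscripts agree.

    Each reading is scored 1 if the manuscripts agree and 0 if they differ;
    the mean of those scores is the agreement rate.
    """
    if len(readings_a) != len(readings_b) or not readings_a:
        raise ValueError("need the same, non-empty set of test readings for both")
    scores = [1 if a == b else 0 for a, b in zip(readings_a, readings_b)]
    return sum(scores) / len(scores)


# Invented profiles: the reading each manuscript has at ten test passages.
ms_x = ["a", "b", "a", "a", "c", "b", "a", "a", "b", "a"]
ms_y = ["a", "b", "b", "a", "c", "a", "a", "a", "b", "b"]
print(agreement_rate(ms_x, ms_y))   # 0.7
```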

Our next step, unfortunately, requires a leap of faith. Two of them, in fact, though they are both reasonable. (I have to put this part in. Even though most of us -- including me -- hardly know what I'm talking about, I must point out that we are on rather muddy mathematical ground here.) We have to assume that the Central Limit Theorem applies to manuscript readings (this basically requires that variants are independent -- a rather iffy assumption, but one we can hardly avoid) and that the distribution of manuscripts is not too pathological (probably true, although someone should try to verify it someday). If these assumptions are true, then we can start to set sample sizes. (If the assumptions are not true, then we almost certainly need larger sample sizes. So we'd better hope they are true.)

Not knowing the characteristics of the manuscripts, we assume that they are fairly typical and say that, if we take a sample of 35-50 readings, there is roughly a 90% chance that the sample mean (i.e. the rate of agreement in our sample) is within 5% of the actual mean of the whole comparison. That is, for these two manuscripts, if you take 50 readings, there is a 90% chance that the rate of agreement of these two manuscripts in the sample will be within 5% of their rate of agreement everywhere.

But before you say, "Hey, that's pretty easy; I can live with 50 readings," realize that this is the accuracy of one comparison. If you take a sample of fifty and do two comparisons, the percent that both are within 5% falls to 81% (.9 times .9 equals .81). Bring the number to ten comparisons (quite a small number, really), and you're down to a 35% chance that they will all be that accurate. Given that a 5% error for any manuscript can mean a major change in its classification, the fifty-reading sample is just too small.
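The arithmetic above is just a product of probabilities, which treats the separate comparisons as independent (itself a simplifying assumption). A short sketch, with a hypothetical helper name; the last line applies the same rule to the Münster figures discussed below (94% per comparison, about 60 comparisons):

```python
def prob_all_accurate(p_single: float, comparisons: int) -> float:
    """Chance that every one of `comparisons` independent comparisons is
    within tolerance, given a per-comparison chance of p_single."""
    return p_single ** comparisons


for k in (1, 2, 10):
    print(k, round(prob_all_accurate(0.9, k), 3))   # 0.9, 0.81, 0.349

# Chance that at least one of about 60 comparisons is off, at 94% each.
print(round(1 - prob_all_accurate(0.94, 60), 3))    # 0.976
```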

Unfortunately, the increase in sample accuracy goes roughly as the square root of the increase in sample size. (That is, doubling your sample size will increase your accuracy by less than 50%.) Eventually taking additional data ceases to be particularly useful; you can't add enough data to significantly improve your accuracy.

Based on our assumptions, additional data loses most of its value at about 500 data points (sample readings in the profile). At this point our accuracy on any given comparison is on the order of 96%.
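Under these assumptions, the chance that a single comparison lands within 5 percentage points of the true agreement rate can be estimated with the usual normal approximation for a sample proportion. This is our sketch, not the author's calculation; the function name and the two illustrative agreement rates (0.5 and 0.8) are our choices. For a true agreement rate of 50% (the worst case) it gives roughly 52% at 50 readings, 89% at 250, and 97% at 500; the chances are better when the manuscripts agree more often, and the gain from each added reading shrinks as the square root rule predicts.

```python
import math


def prob_within(n: int, p: float, tolerance: float = 0.05) -> float:
    """Approximate chance that the agreement rate in a random sample of n
    test readings falls within +/- tolerance of the true rate p.

    Normal approximation to the binomial; assumes independent readings.
    """
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    sigma = math.sqrt(p * (1 - p) / n)   # standard error of the sample proportion
    z = tolerance / sigma
    return math.erf(z / math.sqrt(2))    # same as 2 * Phi(z) - 1


for n in (50, 250, 500, 1000):
    print(n, round(prob_within(n, 0.5), 3), round(prob_within(n, 0.8), 3))
```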

Several observations are in order, however.

First, even though I have described 500 as the maximum useful value, in practice it is closer to the minimum useful value for a sample base in a particular corpus. The first reason is that you may wish to take subsamples. (That is, if you take 500 samples for the gospels as a whole, that leaves you with only 125 or so for each gospel -- too few to be truly reliable. Or you might want to take characteristically Alexandrian readings; this again calls for a subset of your set.) Also, you should increase the sample size somewhat to account for bias in the readings chosen (e.g. it's probably easier to take a lot of readings from a handful of chapters -- as in the Claremont Profile Method -- than to take, say, a dozen from every chapter of every book. This means that your sample is not truly random).

Second, remember the size of the population you are sampling. 500 readings in the Gospels isn't many. But it approximates the entire base of readings in the Catholics. Where the reading base is small, you can cut back the sample size somewhat.

The key word is "somewhat." Paulos's warning is meaningful. 10% of significant variants is probably adequate in the Gospels, where there are many, many variants. That won't work in the Catholics. If, in those books, you regard, say, 400 points of variation as significant, you obviously can't take 500 samples. But you can't cut back to 40 test readings, because that's too small a sample to be statistically meaningful, and it's too small a fraction of the total to test the whole "spectrum" of readings.

On this basis, I suggest the following sample sizes if they can be collected:

To those who think this is too large a sample, I point out the example of political polling: It is a rare poll that samples fewer than about a thousand people.

To those who still think the sample is too large, I can only say: work the math. For the Münster "thousand readings" information, for instance, there are about 250 variants studied for Paul. That means about a 94% chance that any given comparison is accurate to within 5%. However, since their analysis shows the top 60 or so relatives for each manuscript, there is a 97% chance that at least one of those numbers is off by more than 5%. As a measure of which manuscripts are purely Byzantine it's probably (almost) adequate, as long as you don't care about block-mixed manuscripts and don't try to look at individual books, but it is not sufficient to determine complete kinship.

An additional point coming out of this is that you simply can't determine relationships in very small sections -- say, 2 John or 3 John. If you have only a dozen test readings, they aren't entirely meaningful even if you test every variant in the book. If a manuscript is mixed, it's perfectly possible that every reading of your short book could -- purely by chance -- incline to the Alexandrian or Byzantine text. Results in these short books really need to be assessed in the light of the longer books around them.

Statisticians note that there are two basic sorts of errors in assessing data, which they prosaically call "Type I" and "Type II." A Type I error consists of not accepting a true hypothesis, while a Type II error consists of accepting a false hypothesis. The two errors are, theoretically, equally severe, but different errors have different effects. In the context of textual criticism and assessing manuscripts, the Type II error is clearly the more dangerous. If a manuscript is falsely included in a text grouping, it will distort the readings of that group (as when Streeter shoved many Byzantine groups into the "Cæsarean" text). Failing to include a manuscript, particularly a weak manuscript, in a grouping may blur the boundaries of a grouping a little, but it will not distort the group. Thus it is better, in textual criticism, to admit uncertainty than to make errors.

At this point we should return to the matter of selecting a sample. There are two ways to go about this: The "random sample" and the "targeted sample." A random sample is when you grab people off the street, or open a critical apparatus blindly and point to readings. A targeted sample is when you pick people, or variants, who meet specific criteria.

The two samples have different advantages. A targeted sample allows you to get accurate results with fewer tests -- but only if you know the nature of the population you are sampling. For example, if you believe that 80% of the people of the U.S. are Republicans, and 20% are Democrats, and create a targeted sample which is 80% Republican and 20% Democratic, the results from that sample aren't likely to be at all accurate (since the American population, as of when this is written, is almost evenly divided between Democrats, Republicans, and those who prefer neither party). A random survey, by contrast, does not depend on such assumptions; since its makeup will probably reflect the actual numbers, its results will more accurately reflect the actual situation.

The problem is, a good random sample needs to be large -- much larger than a targeted sample. This is why political pollsters, almost without exception, choose targeted samples.

But political pollsters have an advantage we do not have: They have data about their populations. Census figures let them determine how many people belong to each age group, income category, etc. We have no such figures. We do not know what fraction of variants are Byzantine versus Western and Alexandrian, or Alexandrian versus Western and Byzantine, or any other alignment. This means we cannot take a reliable target sample. (This is the chief defect of Aland's "Thousand Readings" as well as of Hutton's "Triple Readings": We have no way of knowing if these variants are in any way representative. Indeed, in Hutton's case, there is good reason to believe that they are not.) Until we have more data than we now have, we must follow one of two methods: Random sampling, or complete sampling of randomly selected sections. Or, perhaps, a combination of the two -- detailed sampling at key points to give us a complete picture in that area, and then a few readings between those sections to give us a hint of where block-mixed manuscripts change type. The Thousand Readings might serve adequately as these "picket" readings -- though even here, one wonders at their approach. In Paul, at least, they have too many "Western"-only readings. Our preference would surely be for readings where the Byzantine text goes against everything else, as almost all block-mixed manuscripts are Byzantine-and-something-else mixes, and we could determine the something else from the sections where we do detailed examination.
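The two sampling schemes recommended here are easy to automate. The sketch below assumes, hypothetically, that the apparatus has already been reduced to a list of variation-unit labels (for a simple random sample) or to a dictionary mapping chapter numbers to their units (for complete sampling of randomly chosen sections); the function names and data are illustrative only.

```python
import random


def random_profile(variation_units, size, seed=None):
    """Simple random sample of test readings, drawn without replacement."""
    rng = random.Random(seed)
    return rng.sample(variation_units, size)


def section_sample(units_by_chapter, chapters_wanted, seed=None):
    """Choose whole chapters at random and keep every variation unit in them."""
    rng = random.Random(seed)
    chosen = sorted(rng.sample(sorted(units_by_chapter), chapters_wanted))
    return [unit for chapter in chosen for unit in units_by_chapter[chapter]]


# Hypothetical apparatus: chapter number -> labels of its variation units.
apparatus = {
    1: ["1:3", "1:7", "1:12"],
    2: ["2:1", "2:9"],
    3: ["3:4", "3:5", "3:16", "3:20"],
    4: ["4:2"],
}
all_units = [u for ch in sorted(apparatus) for u in apparatus[ch]]

print(random_profile(all_units, 4, seed=42))
print(section_sample(apparatus, 2, seed=42))
```

Passing a fixed seed makes the selection reproducible, so that others can check exactly which readings were used.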

Saturation

"Saturation" is a word used in all sorts of fields, sometimes for amazingly dissimilar concepts, but it has a specific use in science (and related mathematics) which is highly relevant to textual criticism. It refers to a situation in which meaningful data is overwhelmed by an excess of non-meaningful data. As some would put it, the "signal" is overwhelmed by the "noise."

An example of where this can be significant comes from biology, in the study of so-called "junk DNA." (A term sometimes used rather loosely for non-coding DNA, but I am referring specifically to DNA which has no function at all.) Junk DNA, since it does not contain any useful information, is free to mutate, and the evidence indicates that it mutates at a relatively constant rate. So, for relatively closely related creatures, it is possible to determine just how closely related they are by looking at the rate of agreement in their junk DNA.

However, because junk DNA just keeps mutating, over time, you get changes to DNA that has already been changed, and changes on top of changes, and changes that cause the DNA to revert to its original state, and on and on. Eventually you reach a point where there have been so many changes that too little of the original DNA is left for a comparison to be meaningful: Many of the agreements between the two DNA sets are coincidental. This point is the saturation point. It's often difficult to know just what this point is, but there can be no real doubt that it exists.
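The text gives no formula for saturation, but the standard Jukes-Cantor model of DNA substitution (our choice of illustration, not the author's) shows the effect clearly. It assumes four equally frequent bases and equal rates for all substitutions; d is the expected number of substitutions per site separating the two sequences. Expected agreement drops quickly at first (roughly 91% at d = 0.1 and 45% at d = 1), then barely moves: roughly 30% at d = 2 and 25% by d = 5. It levels off at 25%, the rate at which unrelated sequences agree purely by chance, rather than falling to zero.

```python
import math


def expected_identity(d: float) -> float:
    """Expected fraction of identical sites between two sequences separated by
    d expected substitutions per site, under the Jukes-Cantor model."""
    return 0.25 + 0.75 * math.exp(-4.0 * d / 3.0)


for d in (0.0, 0.1, 0.5, 1.0, 2.0, 5.0):
    print(d, round(expected_identity(d), 3))
```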

This concept is an important one to textual critics concerned with just which variants are meaningful. The general rule is to say that orthographic variants are not meaningful, but larger variants are. This is probably acceptable as a rule of thumb, but it is an oversimplification of the concept of saturation. A scribe has a certain tendency to copy what is before him even if it does not conform to his own orthographic rules. It's just that the tendency is less than in the case of "meaningful" variants. W. L. Richards, The Classification of the Greek Manuscripts of the Johannine Epistles, went to a great deal of work to show that variants like ν-movable and itacisms were not meaningful for grouping manuscripts, but his methodology, which was always mathematically shaky, simply ignored saturation. The high likelihood is that, for closely-related manuscripts, such variants are meaningful; they simply lose value in dealing with less-related manuscripts because of saturation. In creating loose groups of manuscripts, such as Richards was doing, orthographic variants should be ignored. But we should probably at least examine them when doing stemmatic studies of closely-related manuscripts such as Kr.

Significant Digits

You have doubtless heard of "repeating fractions" and "irrational numbers" -- numbers which, when written out as decimals, go on forever. For example, one-third as a decimal is written .3333333..., while four-elevenths is .36363636.... Both of these are repeating fractions. Irrational numbers are those numbers like π and e and √2 which have decimals which continue forever without showing a pattern. Speaking theoretically, any physical quantity will have an infinite decimal -- though the repeating digit may be zero, in which case we ignore it.

But that doesn't mean we can determine all those infinite digits!

When dealing with real, measurable quantities, such as manuscript kinship, you cannot achieve infinite accuracy. You just don't have enough data. Depending on how you do things, you may have a dozen, or a hundred, or a thousand points of comparison. But even a thousand points of comparison only allows you to carry results to three significant digits.

A significant digit is the portion of a number which means something. You start counting from the left. For example, say you calculate the agreement between two manuscripts to be 68.12345%. The first and most significant digit here is 6. The next most significant digit is 8. And so forth. So if you have enough data to carry two significant digits (this requires on the order of one hundred data points), you would express your number as 68%. If you had enough data for three significant digits, the number would be 68.1%. And so forth.
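Rounding to a chosen number of significant digits is a one-liner once the position of the leading digit is known. A minimal sketch (the helper name `round_sig` is ours), using the 68.12345% example:

```python
import math


def round_sig(x: float, digits: int) -> float:
    """Round x to the given number of significant digits."""
    if x == 0:
        return 0.0
    leading = math.floor(math.log10(abs(x)))   # power of ten of the first digit
    return round(x, digits - 1 - leading)


print(round_sig(68.12345, 2))  # 68.0
print(round_sig(68.12345, 3))  # 68.1
```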

See also Accuracy and Precision.

Standard Deviation and Variance

Any time you study an experimental distribution (that is, a collection of measurements of some phenomenon), you will notice that it "spreads out" or "scatters" a little bit. You won't get the same output value for every input value; you probably won't even get the same output value for the same input value if you make repeated trials.

This "spread" can be measured. The basic measure of "spread" is the variance or its square root, the standard deviation. (Technically, the variance is the "second moment about the mean," and is denoted $\mu_2$; the standard deviation is $\sigma$. But we won't talk much about moments; that's really a physics term, and doesn't have any meaning for manuscripts.) Whatever you call them, the larger these numbers, the more "spread out" the population is.

Assume you have a set of $n$ data points, $d_1, d_2, d_3, \ldots, d_n$. Let the arithmetic mean of this set be $m$. Then the variance can be computed by either of two formulae,

$$\sigma^2 = \frac{(d_1-m)^2 + (d_2-m)^2 + \cdots + (d_n-m)^2}{n}$$

$$\sigma^2 = \frac{n(d_1^2 + d_2^2 + \cdots + d_n^2) - (d_1 + d_2 + \cdots + d_n)^2}{n^2}$$

To get the standard deviation, just take the square root of either of the above numbers.
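The two formulae give the same number; the second is the traditional shortcut for hand calculation. A minimal Python sketch with invented data (the function names are ours) confirms that they agree:

```python
import math


def variance_deviations(data):
    """Population variance: the mean of the squared deviations from the mean."""
    n = len(data)
    m = sum(data) / n
    return sum((d - m) ** 2 for d in data) / n


def variance_shortcut(data):
    """The same quantity from running sums: (n * sum(d^2) - (sum d)^2) / n^2."""
    n = len(data)
    return (n * sum(d * d for d in data) - sum(data) ** 2) / n ** 2


data = [2, 4, 4, 4, 5, 5, 7, 9]                              # invented values
print(variance_deviations(data), variance_shortcut(data))   # 4.0 4.0
print(math.sqrt(variance_deviations(data)))                 # standard deviation: 2.0
```

On a computer the first form is usually preferred, because the subtraction in the second can lose precision when the values are large and the spread is small.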

The standard deviation takes work to understand. Whether a particular value for $\sigma$ is "large" or "small" depends very much on the scale of the sample. Also, the standard deviation should not be misused. It is often said that, for any sample, two-thirds of the values fall within one standard deviation of the mean, and 95% fall within two. This is simply not true. It is only true in the case of special distributions, most notably what is called a "normal distribution" -- that is, one that has the well-known "bell curve" shape.

A "bell curve" looks something like this:

[Figure: Normal Curve]

Notice that this bell curve is symmetrical and spreads out smoothly on both sides of the mean. (For more on this topic, see the section on Binomials and the Binomial Distribution.)

Not so with most of the distributions we will see. As an example, let's take the same distribution (agreements with 614 in the Catholics) that we used in the section on the mean above. If we graph this one, it looks as follows:

O |
c |
c |
u |                 *
r |                 *
e |                 *
n |                 *
c |                 * * *
e |               * * * * *     *
s |         *   * * * * * *     * *     *
%   1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 1
    0 5 0 5 0 5 0 5 0 5 0 5 0 5 0 5 0 5 0

This distribution isn't even vaguely normal (note that the mode is at 50%, but the majority of values are larger than this, with very few manuscripts having agreements significantly below 50%), but we can still compute the standard deviation. In the section on the mean we determined the average to be 57.3. If we therefore plug these values into the first formula for the variance, we get


Doing the math gives us the variance of 5648.96÷24=235.37 (your number may vary slightly, depending on roundoff). The standard deviation is the square root of this, or 15.3.

Math being what it is, there is actually another "standard deviation" you may find mentioned. This is the standard deviation for a sample of a population (as opposed to the standard deviation for an entire population). It is actually an estimate -- a guess at what the standard deviation would be if you had the entire population rather than a sample. Since this is rather abstract, I won't get into it here; suffice it to say that it is calculated by taking the square root of the sample variance, derived from modified forms of the equations above:

$$s^2 = \frac{(d_1-m)^2 + (d_2-m)^2 + \cdots + (d_n-m)^2}{n-1}$$

$$s^2 = \frac{n(d_1^2 + d_2^2 + \cdots + d_n^2) - (d_1 + d_2 + \cdots + d_n)^2}{n(n-1)}$$

It should be evident that this sample standard deviation is always slightly larger than the population standard deviation.
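Python's standard `statistics` module implements both versions: `pstdev` divides by n (the population formula) and `stdev` divides by n - 1 (the sample formula). A short sketch using invented data:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]            # invented values, for illustration only

population_sd = statistics.pstdev(data)     # square root of (sum of squared deviations) / n
sample_sd = statistics.stdev(data)          # square root of (sum of squared deviations) / (n - 1)

print(population_sd, sample_sd)
# The ratio of the two variances is n / (n - 1), here 8 / 7.
print(sample_sd ** 2 / population_sd ** 2)
```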

How much does all this matter? Let's take a real-world example -- not one related to textual criticism, this time, lest I be accused of cooking things (since I will have to cook my next example). This one refers to the heights of men and women ages 20-29 in the United States (as measured by the 2000 Statistical Abstract of the United States). The raw data is as follows:

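| Height (cm / feet and inches) | Men % | Women % | Men Total | Women Total |
|---|---|---|---|---|
| under 140 (under 4'8") | 0 | 0.6 | 0 | 0.6 |
| 140-145 (4'8"-4'10") | 0 | 0.6 | 0 | 1.2 |
| 145-150 (4'10"-5'0") | | | | |
| 150-155 (5'0"-5'2") | 0.4 | 15.8 | 0.5 | 21.8 |
| 155-160 (5'2"-5'4") | 2.9 | 27.1 | 3.4 | 48.9 |
| 160-165 (5'4"-5'6") | 8.3 | 25.1 | 11.7 | 74.0 |
| 165-170 (5'6"-5'8") | 20.3 | 18.4 | 32 | 92.4 |
| 170-175 (5'8"-5'10") | | | | |
| 175-180 (5'10"-6'0") | 22.5 | 1.4 | 81.2 | 100 |
| 180-185 (6'0"-6'2") | 13.5 | 0 | 94.7 | 100 |
| Over 185 | 5.3 | 0 | 100 | 100 |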

The first column gives the height range. The second gives the total percent of the population of men in this height range. The third gives the percent of the women. The fourth gives the total percentage of men no taller than the height in the first column; the fifth is the total women no taller than the listed height.

The median height for men is just about 174 centimeters; for women, 160 cm. Not really that far apart, as we will see if we graph the data (I will actually use a little more data than I presented above):

[Figure: Height Graph]

On the whole, the two graphs (reddish for women, blue for men) are quite similar: Same general shape, with the peaks slightly separate but only slightly so -- separated by less than 10%.

But this general similarity conceals some real differences. If you see someone 168 cm. tall, for instance (the approximate point at which the two curves cross), you cannot guess, based on height, whether the person is male or female; it might be a woman of just more than average height, or a man of just less than average. But suppose you see someone 185 cm. tall (a hair over 6'2"). About five percent of men are that tall; effectively no women are that tall. Again, if you see a person who is 148 cm. (4'11"), and you know the person is an adult, you can be effectively sure that the person is female.

This is an important and underappreciated point. So is the effect of the standard deviation. If two populations have the same mean, but one has a larger standard deviation than the other, a value which is statistically significant in one sample may not be in another sample.

Why does this matter? It very much affects manuscript relationships. If it were possible to take a particular manuscript and chart its rates of agreement, the result would almost certainly be a graph something like one of those shown below:

O |
c |                                 *
c |                                 *
u |                                 *
r |                                 *
e |                                 *
n |                                **
c |                               ***
e |                             ******
s |                      **************
-------------------------------------------
%   1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 1
    0 5 0 5 0 5 0 5 0 5 0 5 0 5 0 5 0 5 0
                                        0

O |
c |
c |
u |
r |                         *
e |                        **
n |                        **
c |                       ****
e |                      ******** *
s |                   ***************
-------------------------------------------
%   1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 1
    0 5 0 5 0 5 0 5 0 5 0 5 0 5 0 5 0 5 0
                                        0

O |
c |
c |
u |
r |
e |                     **
n |                    ****
c |                    ******
e |                   *********
s |              *    ********* * *
-------------------------------------------
%   1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 1
    0 5 0 5 0 5 0 5 0 5 0 5 0 5 0 5 0 5 0
                                        0

The first of these is a Byzantine manuscript of some sort -- the large majority of manuscripts agree with it 80% of the time or more, and a large fraction agree 90% of the time or more. The second is Alexandrian -- a much flatter curve (one might almost call it "mushy"), with a smaller peak at a much lower rate of agreements. The third, which is even more mushy, is a wild, error-prone text, perhaps "Western." Its peak is about as high as the Alexandrian peak, but the spread is even greater.

Now several points should be obvious. One is that different manuscripts have different rates of agreement. If a manuscript agrees 85% with the first manuscript, it is not a close relative at all; you need a 90% agreement to be close. On the other hand, if a manuscript agrees 85% with manuscript 2, it probably is a relative, and if it agrees 85% with manuscript 3, it's probably a close relative.

So far, so good; the above is obvious (which doesn't mean that people pay any attention, as is proved by the fact that the Colwell 70% criterion still gets quoted). But there is another point, and that's the part about the standard deviation. The mean agreement for manuscript 1 is about 85%; the standard deviation is about 7%. So a manuscript that agrees with our first manuscript 8% more often than the average (i.e. 93% of the time) is a very close relative.

But compare manuscript 3. The average is about 62%. But this much-more-spread distribution has a standard deviation around 15%. A manuscript which agrees with #3 8% more often than the average (i.e. 70%) is still in the middle of the big clump of manuscripts. In assessing whether an agreement is significant, one must take spread (standard deviation) into account.
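One way to make the comparison concrete is the standard score (z-score): the distance from the mean measured in standard deviations. A minimal sketch using the approximate figures just given (the helper name is ours):

```python
def standard_score(agreement: float, mean: float, sd: float) -> float:
    """Number of standard deviations by which an agreement rate exceeds the mean."""
    return (agreement - mean) / sd


# Approximate figures from the text, in percent.
print(round(standard_score(93, 85, 7), 2))    # manuscript 1: 1.14
print(round(standard_score(70, 62, 15), 2))   # manuscript 3: 0.53
```

By this measure the 93% agreement with manuscript 1 stands more than twice as far out from the crowd as the 70% agreement with manuscript 3, although both are 8 points above their means.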

Statistical and Absolute Processes

Technically, the distinction we discuss here is scientific rather than mathematical. But it also appears to be a source of great confusion among textual critics, and so I decided to include it.

To speak informally, a statistical process is one which "tends to be true," while an absolute process is one which is always true. Both, it should be noted, are proved statistically (by showing that the rule is true for many, many examples) -- but a single counterexample does not prove a statistical theory wrong, while it does prove an absolute theory wrong.

For examples, we must turn to the sciences. Gravity, for instance, is an absolute process: The force of gravitational attraction is always given by $F = G m_1 m_2 / r^2$ (apart from the minor modifications of General Relativity, anyway). If a single counterexample can be verified, that is the end of universal gravitation.

But most thermodynamic and biological processes are statistical. For example, if you place hot air and cold air in contact, they will normally mix and produce air with an intermediate temperature. However, this is a statistical process, and if you performed the experiment trillions of trillions of times, you might find an instance where, for a few brief moments, the hot air would get hotter and the cold colder. This one minor exception does not disprove the rule. Similarly, human children are roughly half male and half female. This rule is not disproved just because one particular couple has seven girls and no boys.
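How unusual is the seven-girl family? A minimal sketch with the binomial formula, assuming (as a simplification) that each birth is independently a girl with probability exactly one-half; the number of families is hypothetical:

```python
from math import comb


def prob_exactly(k: int, n: int, p: float = 0.5) -> float:
    """Binomial probability of exactly k 'successes' in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


p_seven_girls = prob_exactly(7, 7)     # every one of seven births a girl
families = 10_000                      # hypothetical number of seven-child families
print(p_seven_girls)                   # 0.0078125, i.e. 1/128
print(p_seven_girls * families)        # expected number of such families
```

Among ten thousand such families we would expect about 78 with seven girls, so meeting one of them says nothing against the half-and-half rule.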

One must be very careful to distinguish between these two sorts of processes. The rules for the two are very different. We have already noted what is perhaps the key difference: For an absolute process, a single counterexample disproves the rule. For a statistical process, one must have a statistically significant number of counterexamples. (What constitutes a "statistically significant sample" is, unfortunately, a very complex matter which we cannot delve into here.)

The processes of textual criticism are, almost without exception, statistical processes. A scribe may or may not copy a reading correctly. A manuscript may be written locally or imported. It may or may not be corrected from a different exemplar. In other words, there are no absolute rules. Some have thought, e.g., to dismiss the existence of the Alexandrian text because a handful of papyri have been found in Egypt with non-Alexandrian texts. This is false logic, as the copying and preservation of manuscripts is a statistical process. The clear majority of Egyptian papyri are Alexandrian. Therefore it is proper to speak of an Alexandrian text, and assume that it was dominant in Egypt. All we have shown is that its reign was not "absolute."

The same is true of manuscripts themselves. Manuscripts can be and are mixed. The presence of one or two "Western" readings does not make a manuscript non-Alexandrian; what makes it non-Alexandrian is a clear lack of Alexandrian readings. By the same argument, the fact that characteristically Byzantine readings exist before the fourth century does not mean that the Byzantine text as a whole exists at that date. (Of course, the fact that the Byzantine text cannot be verified until the fifth century does not mean that the text is not older, either.)

Only by a clear knowledge of what is statistical and what is absolute are we in a position to make generalizations -- about text-types, about manuscripts, about the evolution of the text.

Statistical Significance

Loosely speaking, a term for "it means something." That is, something that is statistically significant is something we believe is "real," not just the result of coincidence or noisy data. Note that something can appear to be significant without actually being so, or it can appear insignificant when in fact there is a correlation. Unfortunately, this fact, and the term itself, is often misunderstood in textual criticism; for some background on this, see the entry on p-Hacking.

Tree Theory

A branch of mathematics devoted to the construction of linkages between items -- said linkages being called "trees" because, when sketched, these linkages look like trees.

The significance of tree theory for textual critics is that, using tree theory, one can construct all possible linkages for a set of items. In other words, given n manuscripts, tree theory allows you to construct all possible stemma for these manuscripts.

Trees are customarily broken up into three basic classes: Free trees, Rooted trees, and Labelled trees. Loosely speaking, a free tree is one in which all items are identical (or, at least, need not be distinguished); rooted trees are trees in which one item is distinct from the others, and labelled trees are trees in which all items are distinct.

The distinction between tree types is important. A stemma of manuscripts is a labelled tree (this follows from the fact that each manuscript has a particular relationship with all the others; to say, for instance, that Dabs is copied from Dp is patently not the same as to say that Dp is copied from Dabs!), and for any given n, the number of labelled trees with n elements is always greater than or equal to the number of rooted trees, which is greater than or equal to the number of free trees. (For real-world trees, with more than two items, the number of labelled trees is always strictly greater than the others.)

The following demonstrates this point for n=4. We show all free and labelled trees for this case. For the free trees, the items being linked are shown as stars (*); the linkages are lines. For the labelled trees, we assign letters, W, X, Y, Z.

Free Trees for n=4 (Total=2)

*
|
*     *   *
|      \ /
*       *
|       |
*       *

Labelled Trees for n=4 (Total=16)

W     W     W     W     W     W     X     X
|     |     |     |     |     |     |     |
X     X     Y     Y     Z     Z     W     Y
|     |     |     |     |     |     |     |
Y     Z     X     Z     X     Y     Y     W
|     |     |     |     |     |     |     |
Z     Y     Z     X     Y     X     Z     Z

Y     Y     Y     Y
|     |     |     |
W     W     Z     X     X   Y     W   Y     W   X     W   X
|     |     |     |     |  /      |  /      |  /      |  /
X     Z     W     W     | /       | /       | /       | /
|     |     |     |     |/        |/        |/        |/
Z     X     X     Z     W---Z     X---Z     Y---Z     Z---Y
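The count of 16 is an instance of Cayley's formula: there are $n^{n-2}$ labelled trees on n items. One way to list them all is through Prüfer sequences, which correspond one-to-one with labelled trees. The Python sketch below is our illustration (the function names are ours); it enumerates the trees for W, X, Y, Z and sorts them into the two free-tree shapes, the path and the star.

```python
from collections import Counter
from itertools import product


def prufer_to_edges(seq, labels):
    """Decode a Prufer sequence (indices into labels) into a list of edges."""
    n = len(labels)
    degree = [1] * n
    for i in seq:
        degree[i] += 1
    edges = []
    for i in seq:
        # Attach the lowest-numbered current leaf to the next node in the sequence.
        leaf = min(j for j in range(n) if degree[j] == 1)
        edges.append((labels[leaf], labels[i]))
        degree[leaf] -= 1
        degree[i] -= 1
    # Exactly two nodes of degree 1 remain; join them.
    u, v = [j for j in range(n) if degree[j] == 1]
    edges.append((labels[u], labels[v]))
    return edges


def all_labelled_trees(labels):
    """Every labelled tree on the given labels (requires len(labels) >= 2)."""
    n = len(labels)
    return [prufer_to_edges(seq, labels) for seq in product(range(n), repeat=n - 2)]


def shape(edges):
    """For four items: a 'star' has a node with three links, otherwise a 'path'."""
    degrees = Counter(node for edge in edges for node in edge)
    return "star" if max(degrees.values()) == 3 else "path"


trees = all_labelled_trees(["W", "X", "Y", "Z"])
print(len(trees))                           # 16 = 4 ** (4 - 2)
print(Counter(shape(t) for t in trees))     # 12 paths, 4 stars
```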

We should note that the above is only one way to express these trees. For example, the first labelled tree, W--X--Y--Z, can also be written as

W---X     W   Y     W---X     W   Z
   /      |  /|         |     |   |
  /       | / |         |     |   |
 /        |/  |         |     |   |
Y---Z     X   Z     Z---Y     X---Y

Perhaps more important, from the standpoint of stemmatics, is the fact that the following are equivalent:

B   C      C   D    B   D    B   C
|  /       |  /     |  /     |  /
| /        | /      | /      | /
|/         |/       |/       |/
A---D      A        A        A
           |        |        |
           |        |        |
           |        |        |
           B        C        D

And there are other ways of drawing this. These are all topologically equivalent. Without getting too fancy here, to say that two trees are topologically equivalent is to say that you can twist either one into the other without breaking or adding any links. Or, to put it another way, while all the stemma shown above could represent different manuscript traditions, they are one and the same tree. To use the trees to create stemma, one must differentiate the possible forms of the tree.

This point must be remembered, because the above trees do not have a true starting point (a root). The links between points have no direction, and any one could be the ancestor. For example, both of the following stemma are equivalent to the simple tree A--B--C--D--E:

   B           C
  / \         / \
 /   \       /   \
A     C     B     D
      |     |     |
      D     A     E
      |
      E

Thus the number of possible stemma for a given n is larger than the number of labelled trees. Fortunately, if one assumes that only one manuscript is the archetype, then the rest of the tree sorts itself out once you designate that manuscript. (Think of it like water flowing downstream: The direction of each link must be away from the archetype.) So the number of possible stemma for a given n is just n times the number of possible trees.

Obviously this number gets large very quickly. Tree theory has no practical use in dealing with the whole Biblical tradition, or even with a whole text-type. Its value lies in elucidating small families of manuscripts (Biblical or non-Biblical). Crucially, it lets you examine all possible stemma. Until this is done, you cannot be certain that your stemma is correct, because you cannot be sure that an alternate stemma does not explain the facts as well as the one you propose. (Even the stemma produced by computer algorithms are not guaranteed to be the best; these programs try many, many stemma and look for the best of those they have tried. They don't automatically find the best stemma; they find the best of those they have examined, which is usually close to the absolute best.)
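The "water flowing downstream" rule is mechanical enough to sketch in code. The Python below is a minimal illustration, not a method taken from the stemmatic literature: it assumes a tree is given as a list of undirected links and that exactly one manuscript is the archetype, and the function name `orient` is our own. Applied to the simple tree A--B--C--D--E, it reproduces the two stemma drawn above (archetype B and archetype C).

```python
from collections import deque

def orient(links, archetype):
    """Direct every link of an undirected tree away from the archetype.

    Returns a dict mapping each manuscript to its exemplar (None for the
    archetype), i.e. one possible stemma for this tree.
    """
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)
    exemplar = {archetype: None}
    queue = deque([archetype])
    while queue:  # breadth-first walk "downstream" from the archetype
        node = queue.popleft()
        for nxt in neighbours[node]:
            if nxt not in exemplar:
                exemplar[nxt] = node  # nxt is copied from node
                queue.append(nxt)
    return exemplar

chain = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
for archetype in ("B", "C"):
    print(archetype, orient(chain, archetype))
```

Trying each of the five manuscripts in turn as archetype gives five different stemma from the one tree, which is where the factor of n comes from.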

There is a theorem, Cayley's Theorem, which allows us to determine the number of spanning trees (topologically equivalent potential stemma). This can be used to determine whether tree theory is helpful. The formula says that the number of spanning trees s for a set of n items is given by n raised to the power n minus two, that is, $s = n^{n-2}$. So, for example, when n=4, the number of spanning trees is $4^2$, or 16 (just as we saw above). For n=5, the number of trees is $5^3$, or 125. For n=6, this is $6^4$, or 1296. Obviously examining all trees for n much larger than 6 is impractical by hand. For the number of Biblical manuscripts, it's pretty impractical even by computer, which is why we tend to simplify the problem by reducing the nearly-alike Byzantine manuscripts to a sample. (It might prove possible to do it by computer, if we had some method for eliminating trees. Say we had eight manuscripts, A, B, C, D, E, F, G, H. If we could add rules -- e.g. that B, C, D, and G are later than A, E, F, and H, that C is not descended from D, F, G, or H, that E and F are sisters -- we might be able to reduce the number of stemma to some reasonable value.)
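Cayley's count can be checked by brute force for small n. One standard way to list every labelled tree is through Prüfer sequences: each sequence of n-2 labels (repeats allowed) decodes to exactly one tree, and there are $n^{n-2}$ such sequences. The sketch below assumes the items are numbered 0 to n-1, and the function names are our own. It also splits the 16 trees for n=4 into the 12 chains and 4 stars drawn earlier, and multiplies by n to count the possible stemma when any item may be the archetype.

```python
import heapq
from itertools import product

def prufer_to_tree(seq, n):
    """Decode a Prufer sequence (length n-2, entries 0..n-1) into a labelled tree.

    The tree is a frozenset of links, each link a frozenset {u, v}, so that
    two drawings of the same tree compare equal.
    """
    degree = [1] * n
    for x in seq:
        degree[x] += 1
    leaves = [v for v in range(n) if degree[v] == 1]
    heapq.heapify(leaves)
    links = []
    for x in seq:
        leaf = heapq.heappop(leaves)      # smallest current leaf
        links.append(frozenset((leaf, x)))
        degree[x] -= 1
        if degree[x] == 1:                # x has just become a leaf
            heapq.heappush(leaves, x)
    links.append(frozenset((heapq.heappop(leaves), heapq.heappop(leaves))))
    return frozenset(links)

def all_labelled_trees(n):
    """Every labelled tree on n items, one per Prufer sequence."""
    return {prufer_to_tree(seq, n) for seq in product(range(n), repeat=n - 2)}

for n in range(3, 7):
    trees = all_labelled_trees(n)
    # columns: n, trees found, n**(n-2), stemma with any item as archetype
    print(n, len(trees), n ** (n - 2), n * len(trees))

def max_degree(tree):
    """Largest number of links meeting at any one item."""
    counts = {}
    for link in tree:
        for v in link:
            counts[v] = counts.get(v, 0) + 1
    return max(counts.values())

# For n = 4, a star has one item linked to all three others; every other tree is a chain.
four = all_labelled_trees(4)
stars = sum(1 for t in four if max_degree(t) == 3)
print("chains:", len(four) - stars, "stars:", stars)
```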

The weakness with using tree theory for stemmatics is one found in most genealogical and stemmatic methods: It ignores mixture. That is, a tree stemma generally assumes that every manuscript has only one ancestor, and that the manuscript is a direct copy, except for scribal errors, of this ancestor. This is, of course, demonstrably not the case. Many manuscripts can be considered to have multiple ancestors, with readings derived from exemplars of different types. We can actually see this in action for Dabs, where the "Western" text of D/06 has been mixed with the Byzantine readings supplied by the correctors of D. This gives us a rather complex stemma for the "Western" uncials in Paul. Let α be the common ancestor of these uncials, η be the common ancestor of F and G, and K be the Byzantine texts used to correct D. Then the sketch-stemma, or basic tree, for these manuscripts is

      α
     / \
    /   \
   η     D     K
  / \     \   /
 /   \     \ /
F     G    Dabs

But observe the key point: Although this is a tree of the form

F
 \
  \
G--η--α--D--Dabs--K

we observe that the tree has two root points -- that is, two places where the lines have different directions: at α and at Dabs. And it will be obvious that, for each additional root point we allow, we multiply the number of possible stemma by n-p (where n is the number of points and p is the number of possible root points).

For a related theory, see Cladistics.

Another point relevant to mixture and stemma is not really relevant to tree theory, since tree theory does not address mixture as such, but it's probably worth noting here. The point is that any tree which contains a closed loop -- that is, any tree by which you can trace a path from one node of the tree back to itself without re-crossing a link between nodes -- contains mixture. So, for instance, take this tree:

                W
               / \
              /   \
             /     \
            X       Y
             \     /
              \   /
               \ /
                Z

Observe that, if we start from W, we can go around a loop W>Y>Z>X>W or W>X>Z>Y>W. The information here does not allow us to know which is the ancestral manuscript, but we know that one of the four must be a mixture derived from two of the others. Presumably the stemma is something like this:

  Archetype
       |
 ---------------
 |              |
 V              W
               / \
              /   \
             /     \
            X       Y
             \     /
              \   /
               \ /
                Z
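Spotting a closed loop is also easy to automate. The following is a minimal sketch, assuming the stemma is supplied as undirected links between manuscripts; it uses a union-find structure, which is our choice of technique and not something the discussion above prescribes. It reports the first link that closes a loop. As noted above, that only tells us mixture is present; it cannot tell us which of the four manuscripts is the mixed one.

```python
def find_loop_link(links):
    """Return the first link that closes a loop, or None if there is no loop.

    Any loop in a stemma means some manuscript draws on two exemplars,
    i.e. mixture; this does not say which manuscript is the mixed one.
    """
    parent = {}

    def root(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b in links:
        ra, rb = root(a), root(b)
        if ra == rb:          # a and b are already connected by another path
            return (a, b)
        parent[ra] = rb       # merge the two groups
    return None

diamond = [("W", "X"), ("W", "Y"), ("X", "Z"), ("Y", "Z")]
print(find_loop_link(diamond))       # the W-X-Z-Y loop is closed by the last link
print(find_loop_link(diamond[:3]))   # without it, the links form an ordinary tree
```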

Appendix: Assessments of Mathematical Treatments of Textual Criticism

This section attempts to examine various mathematical arguments about textual criticism. No attempt is made to examine various statistical reports such as those of Richards. Rather, this reviews articles covering mathematical methodology. The length of the review, to some extent, corresponds to the significance of the article. Much of what follows is scathing. I don't like that, but any textual critic who wishes to claim to be using mathematics must endeavor to use it correctly!

E. C. Colwell & Ernest W. Tune: "Method in Establishing Quantitative Relationships Between Text-Types of New Testament Manuscripts"

This is one of the classic essays in textual criticism, widely quoted -- and widely misunderstood. Colwell and Tune themselves admit that their examination -- which is tentative -- only suggests their famous definition:

This suggests that the quantitative definitions of a text-type is a group of manuscripts that agree more than 70 per cent of the time and is separated by a gap of about 10 per cent from its neighbors.

(The quote is from p. 59 in the reprint in Colwell, Studies in Methodology)

This definition has never been rigorously tested, but let's ignore that and assume its truth. Where does this leave us?

It leaves us with a problem, is where it leaves us. The problem is sampling. The sample we choose will affect the results we find. This point is ignored by Colwell and Tune -- and has been ignored by their followers. (The fault is more that of the followers than of Colwell. Colwell's work was exploratory. The work of the followers resembles that of the mapmakers who drew sea monsters on their maps west of Europe because one ship sailed west and never came back.)

Let's take an example. Suppose we have a manuscript which we find -- after comprehensive examination -- agrees with the Alexandrian text in 72% of, say, 5000 readings. This makes it, by the definition, Alexandrian. But let's assume that these Alexandrian readings are scattered more or less randomly -- that is, in any reading, there is a 72% chance that it will be Alexandrian. It doesn't get more uniform than that!

Now let's break this up into samples of 50 readings -- about the size of a chapter in the Epistles. Mathematically, this makes our life very simple: To be Alexandrian 70% of the time in the sample, we need to have exactly 35 Alexandrian readings. If we have 36 Alexandrian readings, the result is 72% Alexandrian; if we have 34, we are at 68%, etc. This means that we can estimate the chances of these results using the binomial distribution.

Let's calculate the probabilities for getting samples with 25 to 50 Alexandrian readings. The first column shows how many Alexandrian readings we find. The second is the percentage of readings which are Alexandrian. The third shows the probability of the sample containing exactly that many Alexandrian readings. The final column shows the probability of the sample showing at least that many Alexandrian readings.

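[Table: Alexandrian readings in the sample, percentage Alexandrian, probability of this result, probability of at least this result; the figures are not preserved]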
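The probabilities discussed here can be recomputed directly from the binomial distribution. This Python sketch assumes, as in the example, 50 independent readings per sample, each Alexandrian with probability 0.72; it prints the four columns described above for 25 to 50 Alexandrian readings, then the two summary figures used in the next paragraph. The helper name `binom_pmf` is our own.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k Alexandrian readings out of n."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 50, 0.72   # sample size and the manuscript's overall Alexandrian rate
print(" Alex   %Alex   P(exactly)   P(at least)")
for k in range(25, n + 1):
    at_least = sum(binom_pmf(j, n, p) for j in range(k, n + 1))
    print(f"{k:5d}  {100 * k // n:5d}%  {binom_pmf(k, n, p):11.6f}  {at_least:12.6f}")

# Samples with 34 or fewer Alexandrian readings fall below the 70% line.
fails = sum(binom_pmf(j, n, p) for j in range(0, 35))
# A 68% manuscript "passes" a sample when it has 35 or more Alexandrian readings.
passes_68 = sum(binom_pmf(j, n, 0.68) for j in range(35, n + 1))
print(f"72% manuscript, sample below 70%: {fails:.3f}")
print(f"68% manuscript, sample at 70% or more: {passes_68:.3f}")
```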

Note what this means: In our manuscript, which by definition is Alexandrian, we can expect 31.2% of our samples to fail the Colwell criterion for the Alexandrian text -- that is, in the sample of 50 readings, they have 34 or fewer. It could similarly be shown that a manuscript falling short of the Alexandrian criterion (say, one that was 68% Alexandrian) would come up as an Alexandrian manuscript in about 45% of tested sections.

Another point: In any of those sections which proves non-Alexandrian, there is almost exactly a 50% chance that either the first reading or the last, possibly both, will be non-Alexandrian. If we moved our sample by one reading, there is a 72% chance that the added reading would be Alexandrian, and our sample would become Alexandrian. Should our assessment of a manuscript depend on the exact location of a chapter division?

This is not a nitpick; it is a fundamental flaw in the Colwell approach. Colwell has not given us any measure of variance. Properly, he should have provided a standard deviation, allowing us to calculate the odds that a manuscript was in fact a member of a text-type, even when it does not show as one. Colwell was unable to do this; he didn't have enough data to calculate a standard deviation. Instead, he offered the 10% gap. This is better than nothing -- in a sample with no mixed manuscripts, the gap is a sufficient condition. But because mixed manuscripts do exist (and, indeed, nearly every Alexandrian manuscript in fact has some mixed readings), the gap is not and cannot be a sufficient condition. Colwell's definition, at best, lacks rigour.

The objection may be raised that, if we can't examine the text in small pieces, we can't detect block mixture. This is not true. The table above shows that the probability of getting a sample which is, say, only 50% Alexandrian, or less, is virtually nil (for a manuscript which is 72% Alexandrian overall). There is an appreciable chance (in excess of 4%) of getting a sample no more than 60% Alexandrian -- but the odds of getting two in a row no more than 60% Alexandrian are very slight. If you get a sample which is, say, 40% Alexandrian, or three in a row which are 60% Alexandrian, you have block mixture. The point is just that, if you have one sample which is 72% Alexandrian, and another which is 68% Alexandrian, that is not evidence of a change in text type. That will be within the standard deviation for almost any real world distribution.
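The block-mixture figures can be checked the same way. This sketch keeps the assumptions of the example (50 independent readings per sample, a 72% Alexandrian manuscript) and additionally assumes that successive samples are independent of each other, so that the chance of two low samples in a row is simply the square of the chance of one. The helper name `prob_at_most` is our own.

```python
from math import comb

def prob_at_most(k, n, p):
    """Probability of k or fewer Alexandrian readings in n independent readings."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))

n, p = 50, 0.72
half_or_less = prob_at_most(25, n, p)    # sample no more than 50% Alexandrian
sixty_or_less = prob_at_most(30, n, p)   # sample no more than 60% Alexandrian
two_in_a_row = sixty_or_less ** 2        # assumes the two samples are independent
print(f"{half_or_less:.5f} {sixty_or_less:.4f} {two_in_a_row:.5f}")
```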

The Colwell definition doesn't cover everything -- for example, two Byzantine manuscripts will usually agree at least 90% of the time, not 70%. But even in cases where it might seem to apply, one must allow for the nature of the sample. Textual critics who have used the Colwell definition have consistently failed to do so.

Let's take a real-world example, Larry W. Hurtado's Text-Critical Methodology and the Pre-Caesarean Text: Codex W in the Gospel of Mark. Take two manuscripts which everyone agrees are of the same text-type: ℵ and B. The following list shows, chapter by chapter, their rate of agreement (we might note that Hurtado prints more significant digits than his data can possibly support; I round off to the nearest actual value):

[Table: Chapter, Agreement % (ℵ with B in Mark); the figures are not preserved]

The mean of these rates of agreement is 79%. The median is 80%. The standard deviation is 3.97.

This is a vital fact which Hurtado completely ignores. His section on "The Method Used" (pp. 10-12) does not even mention standard deviations. It talks about "gaps" -- but of course the witnesses were chosen to be pure representatives of text-types. There are no mixed manuscripts (except family 13), so Hurtado can't tell us anything about gaps (or, rather, their demonstrable lack; see W. L. Richards, The Classification of the Greek Manuscripts of the Johannine Epistles) in mixed manuscripts. The point is, if we assume a normal distribution, it follows that roughly two-thirds of samples will fall within one standard deviation of the mean, and over nine-tenths will fall within two standard deviations of the mean. If we assume this standard deviation of almost 4 is no smaller than typical, that means that, for any two manuscripts in the fifteen sections Hurtado tests, only about ten chapters will be within an eight-percentage-point span around the mean, and only about fourteen will be within a sixteen point span. This simple mathematical fact invalidates nearly every one of Hurtado's conclusions (as opposed to the kinships he presupposed and confirmed); at all points, he is operating within the margin of error. It is, of course, possible that variant readings do not follow a normal distribution; we shouldn't assume that fact without proof. But Hurtado cannot ignore this fact; he must present distribution data!
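The "roughly two-thirds" and "over nine-tenths" figures come from the normal distribution, and the arithmetic for fifteen sections is short enough to show. This sketch assumes, as the paragraph does, that the rates of agreement are normally distributed, with the standard deviation of 3.97 found for ℵ and B; the helper name `share_within` is our own.

```python
from math import erf, sqrt

def share_within(k):
    """Fraction of a normal distribution lying within k standard deviations of its mean."""
    return erf(k / sqrt(2))

sections, sd = 15, 3.97
for k in (1, 2):
    share = share_within(k)
    print(f"within +/-{k * sd:.2f} points: {share:.3f} of samples, "
          f"about {share * sections:.1f} of {sections} sections")
```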

"The Implications of Statistical Probability for the History of the Text"

When Wilbur N. Pickering published The Identity of the New Testament Text, he included as Appendix C an item, "The Implications of Statistical Probability for the History of the Text" -- an attempt to demonstrate that the Majority Text is most likely on mathematical grounds to be original. This is an argument propounded by Zane C. Hodges, allegedly buttressed by mathematics supplied by his brother David M. Hodges. We will see many instances, however, where Zane Hodges has directly contradicted the comments of David.

This mathematical excursus is sometimes held up as a model by proponents of the Byzantine text. It is therefore incumbent upon mathematicians -- and, more to the point, scientists -- to point out the fundamental flaws in the model.

The flaws begin at the very beginning, when Hodges asserts

Provided that good manuscripts and bad manuscripts will be copied an equal number of times, and that the probability of introducing a bad reading into a copy made from a good manuscript is equal to the probability of reinserting a good reading into a copy made from a bad manuscript, the correct reading would predominate in any generation of manuscripts. The degree to which the good reading would predominate depends on the probability of introducing the error.

This is all true -- and completely meaningless. First, it is an argument based on individual readings, not manuscripts as a whole. In other words, it ignores the demonstrable fact of text-types. Second, there is no evidence whatsoever that "good manuscripts and bad manuscripts will be copied an equal number of times." This point, if it is to be accepted at all, must be demonstrated. (In fact, the little evidence we have is against it. Only one extant manuscript is known to have been copied more than once -- that one manuscript being the Codex Claromontanus [D/06], which a Byzantine Prioritist would surely not claim is a good manuscript. Plus, if all manuscripts just kept on being copied and copied and copied, how does one explain the extinction of the Diatessaron or the fact that so many classical manuscripts are copied from clearly-bad exemplars?) Finally, it assumes in effect that all errors are primitive and from there the result of mixture. In other words, the whole model offered by Hodges is based on what he wants to have happened. This is a blatant instance of Assuming the Solution.

Hodges proceeds,

The probability that we shall reproduce a good reading from a good manuscript is expressed as p and the probability that we shall introduce an erroneous reading into a good manuscript is q. The sum of p and q is 1.

This, we might note, makes no classification of errors. Some errors, such as homoioteleuton or assimilation of parallels, are common and could occur independently. Others (e.g. substituting Lebbaeus for Thaddaeus or vice versa) are highly unlikely to happen independently. Thus, p and q will have different values for different types of readings. You might, perhaps, come up with a "typical" value for p -- but it is by no means assured (in fact, it's unlikely) that using the same p for all calculations will give you the same results as using appropriate values of p for the assorted variants.

In other words, Hodges offers us a bunch of assumptions, and zero data, and expects this garbage in to produce something other than garbage out.

It's at this point that Hodges actually launches into his demonstration, unleashing a machine gun bombardment of deceptive symbols on his unsuspecting readers. The explanation which follows is extraordinarily unclear, and would not be accepted by any math professor I've ever had, but it boils down to an iterative explanation: The number of good manuscripts ($G_n$) in any generation n, and the number of bad manuscripts ($B_n$), depend on the number of good manuscripts in the previous generation ($G_{n-1}$), the number of bad manuscripts in the previous generation ($B_{n-1}$), the rate of manuscript reproduction (k, i.e. a constant, though there is no reason to think that it is constant), and the rate of error reproduction defined above (p and q, or, as it would be better denoted, p and 1-p).

There is only one problem with this stage of the demonstration, but it is fatal. Again, Hodges is treating all manuscripts as if composed of a single reading. If the Majority Text theory were a theory of the Majority Reading, this would be permissible (if rather silly). But the Majority Text theory is a theory of a text -- in other words, that there is a text-type consisting of manuscripts with the correct readings.
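For readers who want to see the iteration itself, here is a minimal sketch of the good/bad copying recurrence as summarized above, applied to one reading. The details are assumptions on our part, not Hodges's own formulation: every manuscript is copied k times, a copy of a good manuscript keeps the good reading with probability p, and a copy of a bad manuscript restores it with probability q = 1 - p. The function name is our own.

```python
def good_share_by_generation(p, k, generations):
    """Iterate the good/bad copying recurrence for a single reading.

    Assumed rules: every manuscript is copied k times; a copy of a good
    manuscript keeps the good reading with probability p; a copy of a bad
    manuscript restores the good reading with probability q = 1 - p.
    Returns the share of good copies in each generation.
    """
    q = 1 - p
    good, bad = 1.0, 0.0          # generation 0: the autograph
    shares = [1.0]
    for _ in range(generations):
        good, bad = k * (p * good + q * bad), k * (q * good + p * bad)
        shares.append(good / (good + bad))
    return shares

print([round(s, 4) for s in good_share_by_generation(p=0.9, k=2, generations=8)])
```

Under these assumptions the good share in generation n works out to $\frac{1}{2} + \frac{1}{2}(p-q)^n$: the good reading stays in the majority, but the recurrence only ever describes one reading at a time. Nothing in it describes a manuscript as a whole, which is exactly the objection just raised.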

We can demonstrate the fallacy of the Good/Bad Manuscript argument easily enough. Let's take a very high value for the preservation/introduction of good readings: 99%. In other words, no matter how the reading arose in a particular manuscript, there is a 99% chance that it will be the original reading. Suppose we say that we will take 500 test readings (a very small number, in this context). What are the chances of getting a "Good" manuscript (i.e. one with all good readings)? This is a simple binomial; this is given by the formula p(m,n) as defined in the binomial section, with m=500, n=500, and p(good reading)=.99. This is surprisingly easy to calculate, since when n=m, the binomial coefficient reduces to 1, as does the term involving 1-p(o) (since it is raised to the power 0, and any number raised to the power 0 equals 1). So the probability of 500 good readings, with a 99% accuracy rate, is simply $.99^{500} \approx .0066$. In other words, .66%. Somehow I doubt this is the figure Hodges was hoping for.

This is actually surprisingly high. Given that there are thousands of manuscripts out there, there probably would be a good manuscript. (Though we need to cut the accuracy only to 98% to make the odds of a good manuscript very slight -- .004%.) But what about the odds of a bad manuscript? A bad manuscript might be one with 50 bad readings out of 500. Now note that, by reference to most current definitions, this is actually a Majority Text manuscript, just not a very pure one. So what are the odds of a manuscript with 50 (or more) bad readings?

I can't answer that. My calculator can't handle numbers small enough to do the intermediate calculations. But we can approximate. Looking at the terms of the binomial distribution, p(450,500) consists of a factorial term of the form $\frac{500 \cdot 499 \cdot 498 \cdots 453 \cdot 452 \cdot 451}{1 \cdot 2 \cdot 3 \cdots 48 \cdot 49 \cdot 50}$, multiplied by $.99^{450}$, multiplied by $.01^{50}$. I set up a spreadsheet to calculate this number. It comes out to (assuming I did this all correctly) $2.5 \times 10^{-33}$. That is, .0000000000000000000000000000000025. Every other probability (for 51 errors, 52 errors, etc.) will be smaller. We're looking at a total well below $10^{-31}$. So the odds of a Family π manuscript are infinitesimal. What are the odds of a manuscript such as B?
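A computer does not share the calculator's limits. The Python below is a sketch of the same binomial arithmetic, assuming (as above) 500 independent test readings per manuscript; the helper name `p_errors` is our own. It recomputes the chance of a wholly "good" manuscript at 99% and 98% accuracy, then the chance of exactly 50 errors at 99% accuracy, and the total for 50 or more.

```python
from math import comb

n = 500  # test readings per manuscript

def p_errors(bad, accuracy):
    """Binomial probability of exactly `bad` erroneous readings out of n."""
    good = n - bad
    return comb(n, good) * accuracy**good * (1 - accuracy) ** bad

# A "good" manuscript has no errors at all: comb(n, n) = 1 and (1 - p)**0 = 1.
for accuracy in (0.99, 0.98):
    print(f"accuracy {accuracy}: P(all {n} good) = {p_errors(0, accuracy):.2e}")

# A "bad" manuscript at 99% accuracy: exactly 50 errors, then 50 or more.
exactly_50 = p_errors(50, 0.99)
fifty_or_more = sum(p_errors(b, 0.99) for b in range(50, n + 1))
print(f"P(exactly 50 errors) = {exactly_50:.1e}; P(50 or more) = {fifty_or_more:.1e}")
```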

You can, of course, fiddle with the ratios -- the probability of error. But this demonstration should be enough to show the point: If you set the probabilities high enough to get good manuscripts, you cannot get bad. Similarly, if you set the probabilities low enough to get bad manuscripts, you cannot get good! If all errors are independent, every manuscript in existence will be mixed.

Now note: The above is just as much a piece of legerdemain as what Hodges did. It is not a recalculation of his results. It's reached by a different method. But it does demonstrate why you cannot generalize from a single reading to a whole manuscript! You might get there by induction (one reading, two readings, three readings...), but Hodges did not use an induction.

If you want another demonstration of this sort, see the section on Fallacies. This demonstrates, unequivocally, that the Hodges model cannot explain the early age of either the Alexandrian or the "Western" texts of the gospels.

Readers, take note: The demonstration by Hodges has already been shown to be completely irrelevant. A good mathematician, presented with these facts, would have stopped and said, "OK, this is a bunch of garbage." It will tell you something about Hodges that he did not.

Having divorced his demonstration from any hint of reality, Hodges proceeds to circle Robin Hood's Barn in pursuit of good copies. He wastes two paragraphs of algebra to prove that, if good readings predominate, you will get good readings, and if bad readings predominate, you will get bad readings. This so-called proof is a tautology; he is restating his assumptions in different form.

After this, much too late, Hodges introduces the binomial distribution. But he applies it to manuscripts, not readings. Once again, he is making an invalid leap from the particular to the general. The numbers he quotes are not relevant (and even he admits that they are just an example).

At this point, a very strange thing occurs: Hodges actually has to admit the truth as supplied by his brother: "In practice, however, random comparisons probably did not occur.... As a result, there would be branches of texts which would be corrupt because the majority of texts available to the scribe would contain the error." In other words, David Hodges accepts -- even posits -- the existence of text-types. But nowhere does the model admit this possibility. Instead, Zane C. Hodges proceeds to dismiss the problem: "In short, then, our theoretical problem sets up conditions for reproducing an error which are somewhat too favorable to reproducing the error." This is pure, simple, and complete hand-waving. Hodges offers no evidence to support his contention, no mathematical basis, no logic, and no discussion of probabilities. It could be as he says. But there is no reason to think it is as he says.

And at about this point, David Hodges adds his own comment, agreeing with the above: "This discussion [describing the probability of a good reading surviving] applies to an individual reading and should not be construed as a statement of probability that copied manuscripts will be free of error." In other words, David Hodges told Zane Hodges the truth -- and Zane Hodges did not accept the rebuttal.

Zane Hodges proceeds to weaken his hand further, by saying nothing more than, It's true because I say it is true: "I have been insisting for quite some time that the real crux of the textual problem is how we explain the overwhelming preponderance of the Majority text in the extant tradition." This is not a problem in a scientific sense. Reality wins over theory. The Majority Text exists, granted. This means that an explanation for it exists. But this explanation must be proved, not posited. Hodges has not proved anything, even though the final statement of his demonstration is that "[I]t is the essence of the scientific process to prefer hypotheses which explain the available facts to those which do not!" This statement, however, is not correct. "God did it" explains everything -- but it is not a scientific hypothesis; it resists proof and is not a model. The essence of the scientific process is to prefer hypotheses which are testable. The Hodges model is not actually a model; it is not testable.

Hodges admits as much, when he starts answering "objections." He states,

1. Since all manuscripts are not copied an even [read: equal] number of times, mathematical demonstrations like those above are invalid.
But this is to misunderstand the purpose of such demonstrations. Of course [this] is an "idealized" situation which does not represent what actually took place. Instead, it simply shows that all things being equal statistical probability favors the perpetuation in every generations of the original majority status of the authentic reading.

The only problems with this are that, first, Hodges has shown no such thing; second, that he cannot generalize from his ideal situation without telling how to generalize and why it is justified; and third, that even if true, the fact that the majority reading will generally be correct does not mean that it is always correct -- he hasn't reduced the need for criticism; he's just proved that the text is basically sound. (Which no serious critic has disputed; TC textbooks always state, somewhere near the beginning, that much the largest part of the New Testament text is accepted by all.)

The special pleading continues in the next "objection:"

2. The majority text can be explained as the outcome of a "process...." Yet, to my knowledge, no one has offered a detailed explanation of exactly what the process was, when it began, or how -- once begun -- it achieved the result claimed for it.

This is a pure irrelevance. An explanation is not needed to accept a fact. It is a matter of record that science cannot explain all the phenomena of the universe. This does not mean that the phenomena do not exist.

The fact is, no one has ever explained how any text-type arose. Hodges has no more explained the Majority text than have his opponents -- and he has not offered an explanation for the Alexandrian text, either. A good explanation for the Byzantine text is available (and, indeed, is necessary even under the Hodges "majority readings tend to be preserved" proposal!): That the Byzantine text is the local text of Byzantium, and it is relatively coherent because it is a text widely accepted, and standardized, by a single political unit, with the observation that this standardization occurred late. (Even within the Byzantine text, variation is more common among early manuscripts -- compare A with N with E, for instance -- than the late!) This objection by Hodges is at once irrelevant and unscientific.

So what exactly has Hodges done, other than make enough assumptions to prove that black is white had that been his objective? He has presented a theory as to how the present situation (Byzantine manuscripts in the majority) might have arisen. But there is another noteworthy defect in this theory: It does not in any way interact with the data. Nowhere in this process do we plug in any actual numbers -- of Byzantine manuscripts, of original readings, of rates of error, of anything. The Hodges theory is not a model; it's merely a bunch of assertions. It's mathematics in the abstract, not reality.

For a theory to have any meaning, it must meet at least three qualifications:
1. It must explain the observed data
2. It must predict something not yet observed
3. This prediction must be testable. A valid theory must be capable of disproof. (Proof, in statistical cases such as this, is not possible.)

The Hodges "model" fails on all three counts. It doesn't explain anything, because it does not interact with the data. It does not predict anything, because it has no hard numbers. And since it offers no predictions, there is nothing in it that can be tested.

Let me give another analogy to our historical problem, which I got from Daniel Dennett. Think of the survival of manuscripts as a tournament -- like a tennis tournament or a chess tournament. In the first round, you have a lot of tennis players, who play each other, and one wins and goes on to the next round, while the other is out. You repeat this process until only one is left. In our "manuscript tournament," we eliminate a certain number of manuscripts in each round.

But here's the trick. In tennis, or chess, or World Cup Football playoffs, you play the same sport (tennis or chess or football) in each round. Suppose, instead, that the rules change: In the first round, you play tennis. Then chess in the second round. Then football in each round after that.

Who will win? In a case like that, it's almost a coin flip. The best chess player is likely to be eliminated in the tennis round. The best tennis player could well go down in the chess round. And the best football players would likely be eliminated by the tennis or chess phases. The early years of Christianity were chaotic. Thus the "survival pressures" may have changed -- probably did change -- over the years.

Note: This does not mean the theory of Majority Text originality is wrong. The Majority Text, for all the above proves or disproves, could be original. The fact is just that the Hodges "proof" is a farce (even Maurice Robinson, a supporter of the Majority Text, has called it "smoke and mirrors"). On objective, analytical grounds, we should simply ignore the Hodges argument; it's completely irrelevant. It's truly unfortunate that Hodges offered this piece of voodoo mathematics -- speaking as a scientist, it's very difficult to accept theories supported by such crackpot reasoning. (It's on the order of accepting that the moon is a sphere because it's made of green cheese, and green cheese is usually sold in balls. The moon, in fact, is a sphere, or nearly -- but doesn't the green cheese argument make you cringe at the whole thought?) Hodges should have stayed away from things he does not understand.

L. Kalevi Loimaranta: "The Gospel of Matthew: Is a Shorter Text preferable to a Longer One? A Statistical Approach"

Published in Jacob Neusner, ed., Approaches to Ancient Judaism, Volume X

This is, at first glance, a fairly limited study, intended to examine the canon of criticism, "Prefer the Shorter Reading," and secondarily to examine how this affects our assessment of text-types. In one sense, it is mathematically flawless; there are no evident errors, and the methods are reasonably sophisticated. Unfortunately, its mathematical reach exceeds its grasp -- Loimaranta offers some very interesting data, and uses this to reach conclusions which have nothing to do with said data.

Loimaranta starts by examining the history of the reading lectio brevior potior. This preface to the article is not subject to mathematical argument, though it is a little over-general; Loimaranta largely ignores all the restrictions the best scholars put on the use of this canon.

The real examination of the matter begins in section 1, Statistics on Additions and Omissions. Here, Loimaranta states, "The canon lectio brevior potior is tantamount to the statement that additions are more common than omissions" (p. 172). This is the weak point in Loimaranta's whole argument. It is an extreme overgeneralization. Without question, omissions are more common in individual manuscripts than are additions. But many such omissions would be subject to correction, as they make nonsense. The question is not, are additions more common than omissions (they are not), but are additions more commonly preserved? This is the matter Loimaranta must address. It is perfectly reasonable to assume, for instance, that the process of manuscript compilation is one of alternately building up and wearing down: periodically, a series of manuscripts would be compared, and the longer readings preserved, after which the individual manuscripts decayed (see the article on Destruction and Reconstruction). Simply showing that manuscripts tend to lose information is not meaningful when dealing with text-types. The result may generalize -- but this, without evidence, is no more than an assumption.

Loimaranta starts the discussion of the statistical method to be used with a curious statement: "The increasing number of MSS investigated also raises the number of variant readings, and the relation between the frequencies of additions and omissions is less dependent on the chosen baseline, the hypothetical original text" (p. 173). This statement is curious because no reason is given for it. The first part, that more manuscripts yield more variants, is obviously true. The rest is not at all obvious. In general, it is true that increasing a sample size will make it more representative of the population it is sampling. But it is not self-evident that this applies here -- my personal feeling is that it does not. Certainly the point needs to be demonstrated. Loimaranta is not adding variants; he is adding manuscripts. And manuscripts may have particular "trends," not representative of the whole body of tradition. This matters all the more because the data themselves may not be representative.

Loimaranta's source certainly gives us reason to wonder about its propriety as a sample; on p. 173 we learn, "As the text for our study we have chosen chapters 2-4, 13, and 27 in the Gospel of Matthew.... For the Gospel of Matthew we have an extensive and easy-to-use apparatus in the edition of Legg. All variants in Legg's apparatus supported by at least one Greek MS, including the lectionaries, were taken as variant readings." This is disturbing on many counts. First, the sample is small. Second, the apparatus of Legg is not regarded as particularly good. Third, Legg uses a rather biased selection of witnesses -- the Byzantine text is under-represented. This means that Loimaranta is not working from a random or a representative selection. The use of singular readings and lectionaries is also peculiar. It is generally conceded that most important variants were in existence by the fourth century, and it is a rare scholar who will adopt singular readings no matter what their source. Thus any data from these samples will not reflect the reality of textual history. The results for late manuscripts have meaning only if scribal practices were the same throughout (they probably were not; many late manuscripts were copied in scriptoria by trained monks, a situation which did not apply when the early manuscripts were created), or if errors do not propagate (and if errors do not propagate, then the study loses all point).

Loimaranta proceeds to classify readings as additions (AD), omissions (OM; these two to be grouped as ADOM), substitutions (SB), and transpositions (TR). Loimaranta admits that there can be "problems" in distinguishing these classes of variants. This may be more of a problem than Loimaranta admits. It is likely -- indeed, based on my own studies it appears certain -- that some manuscript variants of the SB and TR varieties derive from omissions which were later restored; it is also likely that some ADOM variants derive from places where a corrector noted a substitution or transposition, and a later scribe instead removed the words marked for alteration. Thus Loimaranta's study, confined to AD/OM variants, seemingly omits many actual ADOM variants where a correction was attempted.

On page 174, Loimaranta gives us a tabulation of ADOM variants in the studied chapters. Loimaranta also analyses these variants by comparing them against three edited texts: the Westcott/Hort text, the UBS text, and the Hodges/Farstad text. (Loimaranta never gives a clear reason for using these "baseline" texts. The use of a "baseline" is almost certain to induce biases.) This tabulation of variants reveals, unsurprisingly, that the Hort text is most likely to adopt the shorter reading in these cases, and the HF edition is most likely to adopt the longer. But what does this mean? Loimaranta concludes simply that WH is a short text and HF is long (p. 175). Surely this could be established much more certainly, and with less effort, by simply counting words! I am much more interested in something Loimaranta does not think worthy of comment: even in the "long" HF text, nearly 40% of ADOM variants point to a longer reading than that adopted by HF. And the oh-so-short Hort text adopts the longer reading about 45% of the time. The difference between WH and HF represents only about 10% of the possible variants. There isn't much basis for decision here. Not that it really matters -- we aren't interested in the nature of particular editions, but in the nature of text-types.

Loimaranta proceeds from there to something much more interesting: a table of the words most commonly added or omitted. This is genuinely valuable information, and worth preserving. Roughly half of ADOM variants involve one of twelve single words -- mostly articles, pronouns, and conjunctions. These are, of course, the most common words, but they are also short and frequently dispensable. This may be Loimaranta's most useful actual finding: that variations involving these words constitute a notably higher fraction of ADOM variants than they constitute of the New Testament text (in excess of 50% of variants, but only about 40% of words, and these words will also be involved in other variants. It appears that variants involving these words are nearly twice as common as they "should" be). What's more, the list does not include some very common words, such as εν and εις. This isn't really surprising, but it is important: there is a strong tendency to make changes in such small words. And Loimaranta is probably right: when a scribe is trying to reproduce his text correctly, the tendency will be to omit them. (Though this will not be universal; a particular scribe might, for instance, always introduce a quote with οτι, and so tend to add such a word unconsciously. And, again, this only applies to syntactically neutral words. You cannot account, e.g., for the addition/omission of the final "Amen" in the Pauline Epistles this way!)

Loimaranta, happily, recognizes these problems:

In the MSS of Matthew there are to be found numerous omissions of small words, omissions for which it is needless to search for causes other than the scribe's negligence. The same words can equally well be added by a scribe to make the text smoother. The two alternatives seem to be statistically indistinguishable.

(p. 176). Although this directly contradicts the statement (p. 172) that we can reach conclusions about preferring the shorter reading "statistically -- and only statistically," it is still a useful result. Loimaranta has found a class of variants where the standard rule "prefer the shorter reading" is not relevant. But this largely affirms the rule as stated by scholars such as Griesbach.

Loimaranta proceeds to analyse longer variants of the add/omit sort, examining units of three words or more. The crucial point here is an analysis of the type of variant: is it a possible haplography (homoioteleuton or homoioarcton)? Loimaranta collectively calls these HOM variants. Loimaranta has 366 variants of three or more words -- a smaller sample than we would like, but at least indicative. Loimaranta muddies the water by insisting on comparing these against the UBS text to see if the readings are additions or omissions; this step should have been left out. The key point is, what fraction of the variants are HOM variants, potentially caused by haplography? The answer is, quite a few: of the 366, 44 involve repetitions of a single letter, 79 involve repetitions of between two and five letters, and 77 involve repetitions of six or more letters. On the other hand, this means that 166 of the variants, or 45%, involve no repeated letters at all. 57% involve repetitions of no more than one letter. Only 21% involve repetitions of six or more letters.
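The percentages above follow directly from Loimaranta's counts. The following is a quick arithmetic check, nothing more; the variable names are illustrative labels, not Loimaranta's.

```python
# Loimaranta's counts for ADOM variants of three or more words,
# classified by the length of the repeated letter sequence (possible haplography).
total = 366
single_letter = 44   # repetition of a single letter
two_to_five = 79     # repetition of two to five letters
six_or_more = 77     # repetition of six or more letters

# Whatever is left over involves no repeated letters at all.
no_repetition = total - (single_letter + two_to_five + six_or_more)

def pct(count):
    """Share of all 366 variants, as a percentage."""
    return 100 * count / total

print(no_repetition)                               # 166
print(round(pct(no_repetition)))                   # 45
print(round(pct(no_repetition + single_letter)))   # 57
print(round(pct(six_or_more)))                     # 21
```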

From this, Loimaranta makes an unbelievable leap (p. 177):

We have further made shorter statistical studies, not presented here, from other books of the New Testament and with other baselines, the result being the same throughout: Omissions are as common as or more common than additions. Our investigation thus confirms that:
The canon lectio brevior potior is definitely erroneous.

It's nice to know that Loimaranta has studied more data. That's the only good news. It would be most helpful if these other data were presented. The rest is very bad. Loimaranta still has not given us any tool for generalizing from manuscripts to text-types. And Loimaranta has already conceded that the conclusions of the study do not apply in more than half the cases studied (the addition/omission of short words). The result on HOM variants cuts off another half of the remaining cases, since no one ever claimed that lectio brevior applied in cases of haplography.

To summarize what has happened so far: Loimaranta has given us some useful data. We now know that lectio brevior probably should not apply in cases of single, dispensable words. It of course does not apply in cases of homoioteleuton. But we have not been given a whit of data to apply in cases of longer variants not involving repeated letters. And this is where the canon lectio brevior is usually applied. Loimaranta has confirmed what we already believed -- and then gone on to make a blanket statement with absolutely no support. Remember, the whole work so far has simply counted omissions -- it has in no case analysed the nature of those omissions. Loimaranta's argument is circular: Hort is short, so Hort is bad. Hort is bad, so short readings are bad.

Let's illustrate with an example. It is well-known that the Alexandrian text is short, and that, of all the Alexandrian witnesses, B is the shortest. It is not uncommon to find that B has a short reading not found in the other Alexandrian witnesses. If this omission is of a single unneeded word, the tendency might be to say that this is the "Alexandrian" reading. Loimaranta has shown that this is probably wrong. But if the Alexandrian text as a whole has a short reading, and the Byzantine text (say) has a longer one, Loimaranta has done absolutely nothing to help us with this reading. Lectio brevior has never been proved; it's a postulate adopted by certain scholars (it's almost impossible to prove a canon of criticism -- a fact most scholars don't deign to notice). Loimaranta has not given us any real reason to reject this postulate.

Loimaranta then proceeds to try to put this theory to the test, attempting to estimate the "true length" of the Gospel of Matthew (p. 177). This is a rather curious idea; to this point, Loimaranta has never given us an actual calculation of what fraction of add/omit variants should in fact be settled in favour of the longer reading. Loimaranta gives the impression that estimating the length is like using a political poll to sample popular opinion. But this analogy does not hold. In the case of the poll, we know the exact list of choices (prefer the Democrat, prefer the Republican, undecided, etc.) and the exact population. For Matthew, we know none of these things. This quest may well be misguided -- but, fortunately, it gives us much more information about the data Loimaranta was using. On page 178, we discover that, of the 545 ADOM variants in the test chapters of Matthew, 261 are singular readings! This is extraordinary -- 48% of the variants tested are singular. But it is a characteristic of singular readings that they are singular. They have not been perpetuated. Does it follow that these readings belong in the study?

Loimaranta attempts to dispose of this point by relegating it to an appendix, claiming the need for a "more profound statistical analysis" (p. 178). This "more profound analysis" proceeds by asking, "Are the relative frequencies of different types of variants, ADs, OMs, SBs, and TRs, independent of the number of supporting MSS?" (p. 182). Here the typesetter appears to have betrayed Loimaranta, using an ℵ instead of a χ. But it hardly matters. The questions requiring answers are, what is Loimaranta trying to prove? And is the proof successful? The answer to the first question is never made clear. It appears that the claim is that, if the number of variants of each type is independent of the number of witnesses supporting each (that is, loosely speaking, if the proportion of, e.g., ADOMs is the same among variants with only one supporter as among variants with many), then singular readings must be just like any other reading. I see no reason to accept this argument, and Loimaranta offers none. It's possible -- but possibility is not proof. And Loimaranta seems to go to great lengths to make it difficult to verify the claim of independence. For example, on page 184, Loimaranta claims of the data set summarized in table A2, "The chi-square value of 4.43 is below the number of df, 8-2=6 and the table is homogeneous." Loimaranta does not even give us percentages of variants to show said homogeneity, and presents the data in a way which, on its face, makes it impossible to apply a chi-squared test (though presumably the actual mathematical test lumped AD and OM variants, allowing the calculation to be performed). This sort of approach always makes me feel as if the author is hiding something. I assume that Loimaranta's numbers are formally accurate. I cannot bring myself to believe they actually mean anything. Even if the variables are independent, how does it follow that singular readings are representative?
It's also worth noting that variables can be independent as a whole and not independent in an individual case (that is, the variables could be independent for the whole data set, ranging from one to many supporters, but not independent for the difference between one and two supporters).
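Readers who want to test such a claim for themselves can use Pearson's chi-square test on a contingency table, sketched here in plain Python. It is an illustration, not Loimaranta's calculation. Table A2 is not reproduced here, so the small tables are hypothetical, and the tail-probability shortcut works only for an even number of degrees of freedom. For an r-by-c table the usual degrees of freedom are (r - 1)(c - 1). Taking Loimaranta's reported statistic of 4.43 and 6 degrees of freedom at face value gives an upper-tail probability of roughly 0.62, so the test detects no heterogeneity. Failing to detect a difference in the table as a whole is still not the same as showing that singular readings are representative, or that every sub-comparison is homogeneous.

```python
from math import exp

def chi_square_statistic(table):
    """Pearson chi-square statistic for a table of observed counts
    (e.g. rows = number of supporting MSS, columns = variant types).
    Assumes every row and column total is positive."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if variant type were independent of support.
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    return stat

def degrees_of_freedom(table):
    """(rows - 1) * (columns - 1) for an r x c contingency table."""
    return (len(table) - 1) * (len(table[0]) - 1)

def chi_square_tail(x, df):
    """P(X >= x) for a chi-square variable with an even number of df,
    via the closed form exp(-x/2) * sum over k < df/2 of (x/2)**k / k!."""
    if df % 2 != 0:
        raise ValueError("this shortcut needs an even number of df")
    half = x / 2
    term, total = 1.0, 0.0
    for k in range(df // 2):
        total += term
        term *= half / (k + 1)
    return exp(-half) * total

# Loimaranta's reported figures, taken at face value.
print(round(chi_square_tail(4.43, 6), 2))  # 0.62

# Hypothetical table: every row has the same proportions, so the
# statistic is exactly zero.
proportional = [[10, 20, 30], [20, 40, 60]]
print(chi_square_statistic(proportional))  # 0.0
print(degrees_of_freedom(proportional))    # 2
```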

And, again, Loimaranta does not seem to have considered the fact that Legg's witnesses are not a representative sample. Byzantine witnesses are badly under-represented. This might prejudice the nature of the results. Loimaranta does not address this point in any way.

On page 178, Loimaranta starts for the first time to reveal what seems to be a bias. Loimaranta examines the WH, UBS, and HF texts and declares, e.g., of UBS, "The Editorial Committee of UBS has corrected the omissions in the text of W/H only in part." This is fundamentally silly. Are we to determine the length of the text, and then select variants to add up to that length? The textual commentary on the UBS edition shows clearly that the shorter reading was not one of their primary criteria. They chose the variants they thought best. One may well disagree with their methods and their results -- but at least they examined the actual variants.

Loimaranta proceeds to this conclusion (p. 179):

The Alexandrian MSS ℵ and B, and with them the texts of W/H and UBS, are characterized by a great number of omissions of all lengths. The great majority of these omissions are obviously caused by scribes' negligence. The considerably longer Byzantine text also seems to be too short.

Once again, Loimaranta refuses to acknowledge the difference between scribal errors and readings of text-types. Nor do we have any reason to think there is anything wrong with those short texts, except that they are short. Again and again, Loimaranta has just counted ADOMs.

And if the final sentence is correct, it would seem to imply that the only way to actually reconstruct the original text is by Conjectural Emendation. Is this really what Loimaranta wants?

This brings us back to another point: chronology, the process by which all of this occurs. Loimaranta does not make any attempt to date the errors he examines.

But time and dates are very important in this context. Logically, if omissions are occurring all the time, the short readings Loimaranta so dislikes should constantly be multiplying. Late Byzantine manuscripts should have more of them than early ones. Yet the shortest manuscripts are, in fact, the earliest, P75 and B. Loimaranta's model must account for this fact -- and it doesn't. It doesn't even admit that the problem exists. If there is a mechanism for maintaining long texts -- and there must be, or every late manuscript would be far worse than the early ones -- then Loimaranta must explain why it didn't operate in the era before our earliest manuscripts. As it stands, Loimaranta acts as if there is no such thing as history -- all manuscripts were created from nothing in their exact present state.

A good supplement to Loimaranta's study would be an examination of the rate at which scribes create shorter readings. Take a series of manuscripts copied from each other -- e.g., Dp and Dabs, 205 and 205abs. Or just look at a close group such as the manuscripts written by George Hermonymos. For that matter, a good deal could be learned by comparing P75 and B. (Interestingly, of these two, P75 seems more likely to omit short words than B, and its text does not seem to be longer.) How common are omissions in these manuscripts? How many go uncorrected? This would give Loimaranta some actual data on uncorrected omissions.

Loimaranta's enthusiasm for the longer reading knows few bounds. Having decided to prefer the longer text against all comers, the author proceeds to use this as a club to beat other canons of criticism. On p. 180, we are told that omissions can produce harder readings and that "consequently the rule lectio difficilior potior is, at least for ADOMs, false." In the next paragraph, we are told that harmonizing readings should be preferred to disharmonious readings!

From there, Loimaranta abandons the mathematical arguments and starts rebuilding textual criticism (in very brief form -- the whole discussion is only about a page long). I will not discuss this portion of the work, as it is not mathematically based. I'm sure you can guess my personal conclusions.

Although Loimaranta seems to aim straight at the Alexandrian text, and Hort, it's worth noting that all text-types suffer at the hands of this logic. The Byzantine text is sometimes short, as is the "Western," and there are longer readings not really characteristic of any text-type. A canon "prefer the longer reading" does not mean any particular text-type is correct. It just means that we need a new approach.

The fundamental problem with this study can be summed up in two words: Too Broad. Had Loimaranta been content to study places where the rule lectio brevior did not apply, this could have been a truly valuable study. But Loimaranta not only throws the baby out with the bathwater, but denies that the poor little tyke existed in the first place. Loimaranta claims that lectio brevior must go. The correct statement is, lectio brevior at best applies only in certain cases, not involving haplography or common dispensable words. Beyond that, I would argue that there are at least certain cases where lectio brevior still applies: Christological titles, for instance, or liturgical insertions such as the final Amen. Most if not all of these would doubtless fall under other heads, allowing us to "retire" lectio brevior. But that does not make the canon wrong; it just means it is of limited application. Loimaranta's broader conclusions, for remaking the entire text, are simply too much -- and will probably be unsatisfactory to all comers, since they argue for a text not found in any manuscript or text-type, and which probably can only be reconstructed by pure guesswork. Loimaranta's mathematics, unlike most of the other results offered by textual critics, seems to be largely correct. But mathematics, to be useful, must be not only correct but applicable. Loimaranta never demonstrates the applicability of the math.

G. P. Farthing: "Using Probability Theory as a Key to Unlock Textual History"

Published in D. G. K. Taylor, ed., Studies in the Early Text of the Gospels and Acts (Texts and Studies, 1999).

This is an article with relatively limited scope: it concerns itself with attempts to find manuscript kinship. Nor does it bring any particular presuppositions to the table. That's the good news.

Farthing starts out with an extensive discussion of the nature of manuscript stemmata. Farthing examines and, in a limited way, classifies possible stemmata. This is perfectly reasonable, though it adds little to our knowledge and has a certain air of unreality about it -- not many manuscripts have such close stemmatic connections.

Having done this, Farthing gets down to his point: that there are many possible stemmata to explain how two manuscripts are related, but that one may be able to show that one is more probable than another. And he offers a method to do it.

With the basic proposition -- that one tree might be more probable than another -- it is nearly impossible to argue. (See, for instance, the discussion on Cladistics.) It's the next step -- determining the probabilities -- where Farthing stumbles.

On page 103 of the printing in Taylor, we find this astonishing statement:

If there are N elements and a probability p of each element being changed (and thus a probability of 1-p of each element not being changed) then:
N x p elements will be changed in copying the new manuscript and
N x (1 - p) elements will not be changed.

This is pure bunk, and shows that Farthing does not understand the simplest elements of probability theory.

Even if we allow that the text can be broken up into independent copyable elements (a thesis for which Farthing offers no evidence, and which strikes me as most improbable), we certainly cannot assume that the probability of variation is the same for every element. But even if we could assume that, Farthing is still wrong. This is probability theory. There are no fixed answers. You cannot say how many readings will be correct and how many will be wrong, or how many changed and how many unchanged. You can only assign a likelihood. (Ironically, only one page before this statement, Farthing more or less explains this.) It is true that the expected (average) number of changes is N*p, and that, when N*p is a whole number, it is also the single most likely value and the median. So what? This is like saying that, because a man spends one-fourth of his time at work, two-thirds at home, and one-twelfth elsewhere, the best place to find him is somewhere on the road between home and work. Yes, that's his "average" location -- but he may never have been there in his life!
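In symbols, take the simplest version of the assumptions Farthing's wording suggests: N independent elements, each changed with the same probability p, both assumptions questioned above. The number of changed elements X then follows a binomial distribution. N·p is its expected value, not a fixed outcome, and any count from 0 to N can occur:

```latex
\[
P(X = k) = \binom{N}{k} p^{k} (1 - p)^{N - k}, \qquad k = 0, 1, \ldots, N
\]
\[
E[X] = Np, \qquad \operatorname{Var}(X) = Np(1 - p)
\]
```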

Let's take a simple example, with N=8 and p=.25 (there is, of course, no instance of a manuscript with such a high probability of error, but we want a value which lets us see the results easily). Farthing's write-up seems to imply a binomial distribution. He says that the result in this case will be two changed readings (8 times .25 is equal to 2). Cranking the math:

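Number of changes | Probability of this many changes | Probability of at least this many changes
0 | 0.1001 | 1.0000
1 | 0.2670 | 0.8999
2 | 0.3115 | 0.6329
3 | 0.2076 | 0.3215
4 | 0.0865 | 0.1138
5 | 0.0231 | 0.0273
6 | 0.0038 | 0.0042
7 | 0.0004 | 0.0004
8 | <0.0001 | <0.0001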

Thus we see that, contra Farthing, not only is it not certain that the number of changes is N*p, but the probability is less than one-third that it will be N*p. And the larger the value of N, the lower the likelihood of exactly N*p changes (though the likelihood actually increases that the proportion of changed elements will be close to p).
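The figures in the table can be checked with a few lines of Python. The check assumes, as Farthing's write-up seems to, N independent elements, each changed with the same probability p. The function names are illustrative. The last two lines check both halves of the remark above. Exactly N*p changes becomes less likely as N grows from 8 to 80, while the chance that the proportion of changed elements falls within 0.05 of p goes up.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k changes among n independent elements,
    each changed with the same probability p (binomial distribution)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def change_table(n, p):
    """Rows of (k, P(exactly k changes), P(at least k changes))."""
    pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
    return [(k, pmf[k], sum(pmf[k:])) for k in range(n + 1)]

def prob_proportion_near(n, p, tolerance):
    """Probability that the proportion of changed elements, k/n,
    lies within `tolerance` of p."""
    return sum(binomial_pmf(k, n, p) for k in range(n + 1)
               if abs(k / n - p) <= tolerance)

# Reproduce the table for N = 8, p = 0.25.
for k, exact, at_least in change_table(8, 0.25):
    print(f"{k}  {exact:.4f}  {at_least:.4f}")

# Exactly N*p changes: less likely for N = 80 than for N = 8 ...
print(binomial_pmf(20, 80, 0.25) < binomial_pmf(2, 8, 0.25))  # True
# ... but a proportion within 0.05 of p: more likely for N = 80.
print(prob_proportion_near(80, 0.25, 0.05) > prob_proportion_near(8, 0.25, 0.05))  # True
```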

It's really impossible to proceed in analysing Farthing. Make the mathematics right, and maybe he's onto something. But what can you do when the mathematics isn't sound? There is no way to assess the results. It's sad; probability could be quite helpful in assessing stemmata. But Farthing hasn't yet demonstrated a method.

Cameron Boyd-Taylor, Peter C. Austin, and Andrey Feuerverger: "The Assessment of Manuscript Affiliation Within a Probabilistic Framework: A Study of Alfred Rahlfs's Core Manuscript Groupings for the Greek Psalter"

Published in Robert J. V. Hiebert, Claude E. Cox, and Peter J. Gentry, editors,The Old Greek Psalter: Studies in Honour of Albert Pietersma.

Here again, an attempt to use probability theory to assess manuscript kinship. And, as with Farthing, flaws.

At least the mathematics is better. Not great, but better. The expression is incredibly bad -- there is an appendix, "The Likelihood Function," which attempts to explain what they're doing. This is utterly bollixed -- at first glance, I thought it was pure nonsense. Then I realized that one of the symbols they wrote is in fact meant to be a Π -- they're using "pi notation" for repeated products, but they didn't know what it was! Similarly, they expressed things using power notation without using powers.
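For reference, pi notation is to products what sigma notation is to sums. In a likelihood built from independent readings, the individual probabilities are multiplied, and a repeated factor collapses into a power. The illustration below is generic, and its symbols are not the authors'. Here a_i is 1 if two witnesses agree at unit i and 0 if not, q is a hypothetical agreement probability, and k is the number of agreements:

```latex
\[
\prod_{i=1}^{n} x_i = x_1 \cdot x_2 \cdots x_n, \qquad \prod_{i=1}^{n} q = q^{n}
\]
\[
L(q) = \prod_{i=1}^{n} q^{a_i} (1 - q)^{1 - a_i} = q^{k} (1 - q)^{n - k},
\qquad k = \sum_{i=1}^{n} a_i
\]
```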

Bottom line is that they're playing with a tool they don't understand.

It shows. They take a sample of readings and try to calculate the likelihood of relatedness. It's basically just binomials. At first glance, this appears to be correctly done.

Unfortunately, that doesn't make the result meaningful. The authors deliberately chose variation units with more than two readings. But this means the sample is biased. That doesn't mean that they can't get data this way, but it means they need a larger sample (and, I would suggest, more witnesses).

And it appears the authors do not understand what their probabilities mean. The probability of a particular outcome based on probabilities of individual events is not the same as the probability that a certain pattern of results comes from certain probabilities of relationships. (I'm not sure I said that any better than they did, but the point is that they're using a forward reasoning tool for backward reasoning.) I would be much more interested in a result that shows degree of relationship than this calculation. That doesn't mean it's wrong. But I don't trust it.
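Bayes' theorem is one standard way to make the forward/backward distinction concrete. This is an illustration, not the authors' model. It assumes n independent variation units, hypothetical agreement rates for "related" and "unrelated" witnesses, and a prior probability of relationship that must come from somewhere outside the agreement counts. The likelihoods (forward reasoning) are the same in every run; the answer to the backward question moves with the prior.

```python
from math import comb

def likelihood(k, n, q):
    """Forward reasoning: P(k agreements in n units | agreement rate q).
    Assumes independent variation units (a binomial model)."""
    return comb(n, k) * q**k * (1 - q) ** (n - k)

def posterior_related(k, n, q_related, q_unrelated, prior_related):
    """Backward reasoning via Bayes' theorem: P(related | k agreements).
    Requires a prior, which the agreement counts alone cannot supply."""
    weighted_rel = likelihood(k, n, q_related) * prior_related
    weighted_unrel = likelihood(k, n, q_unrelated) * (1 - prior_related)
    return weighted_rel / (weighted_rel + weighted_unrel)

# Hypothetical figures: 14 agreements in 20 units; "related" witnesses
# agree 80% of the time, "unrelated" ones 60% of the time.
k, n = 14, 20
for prior in (0.1, 0.5, 0.9):
    print(prior, round(posterior_related(k, n, 0.8, 0.6, prior), 3))
```

With these made-up numbers, 14 agreements in 20 is slightly more probable under the "unrelated" rate, so every posterior ends up a little below its prior. The prior, not the data, dominates the answer.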

And their method of determining significance is simply wrong. They use the famous p-value to determine if they've found something. But they don't understand what a p-value is. What's more, the authors treat p=.05 as if this were a magic number for significance. It is not; see above under p-hacking.

Thus, ultimately, this examination is flawed both in its idea (probabilities don't determine the past) and in its measure of statistical significance (showing that the null hypothesis is unlikely to explain something does not mean that you've found the explanation).