The world is so unpredictable. Things happen suddenly, unexpectedly. We want to feel we are in control of our own existence. In some ways we are, in some ways we’re not. We are ruled by the forces of chance and coincidence.
attr. Paul Auster (Criminal Minds, s.11 ep.2)
It is vital, fundamental even, to pose the research question correctly. Even if tests of statistical significance are not ultimately going to be used, there is an inherent philosophical value in posing the question as a Popperian hypothesis. The following exercise, on the probability of the assertion at the centre of Sutton’s claim, that the idea of natural selection was conveyed via a network of naturalists, is the perfect case in point.

*Hey, come back! The stats are no more difficult than adding fractions*.


When Sutton cannot disprove his tree network, he concludes it existed, but all he has achieved is to jump between stepping stones set within the complicated mire of historical detail. Only by making a huge presumption, that reappearing instances of a phrase must be connected in their source, can Sutton move from one to the next. But, and here is the crux: in doing so, he has presumed causality. Like presuming a dead parrot is really just, “Well, he’s…he’s, ah…probably pining for the fjords”.

Establishing causality is the whole purpose of science. How did this thing happen? So, Sutton patently is not using science as he claims, because he is uninterested in how those phrases ended up in the publications where they appear. In the absence of suitable analysis to provide him with proof, all he is doing is making paper chains out of coincidences.

How Many Coincidences Sum to the Point Where They Render the Defence of Multiple Coincidence Totally Implausible?

The answer to this question is: none. Coincidences are, by definition, chance phenomena: of no causal interest, and insignificant as quantified by scientific measures. Science rejects coincidental events, just as it does supernatural causation, because there is no fathomable causal agent, so there is nothing to investigate. “How did it happen?” “Weeeell, there was this loud, …”


“… and, what wasn’t there before, now was”. That is the whole purpose of statistics, to test whether an event occurred simply through chance, or as a result of a treatment effect, where “treatment” is a measurable process that happened to an entity, in contrast to that entity existing “untreated”.

Apologies for this obvious bit; just making sure everybody’s with us…


The likelihood that something happened through a chain of causation is expressed in terms of probability. Probability values run from zero to one or, equivalently, 0% to 100%. You’ll remember that the side a coin lands on is a chance event, with an equal probability of 1/2 = 0.5 = 50% for either side. Although science is always revising what counts as the threshold for being confident of having identified the correct causal agent, historically it has been 95%, rising to 99% and sometimes more: that is how much confidence science needs before accepting information as knowledge.

This is why chance events cannot be predicted scientifically. Which side up will this coin land? I do not know, I have no expectation, it is impossible to say; I can only guess, knowing that I have, roughly, a 1 in 2 chance of getting it correct[1]. What will the weather be like tomorrow? Ah, well, I’ve got masses of information about today, and about what the day after days like today has been in the past, so I can make a more informed estimate of the outcome. There! That’s science: assimilate data from careful and accurate observation, then construct a model (even saying there’s a 50:50 chance of a certain outcome is still a model: a binomial model). What this means for Sutton’s question about “How many coincidences…” can be looked at in two ways: probabilistically and metaphysically.
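The 50:50 coin model above can be checked empirically. Here is a minimal sketch (my illustration, not from the original): simulate tosses and watch the observed frequency settle towards the model’s 0.5.

```python
import random

random.seed(42)  # fixed seed so the illustration is repeatable

def estimate_heads_probability(n_tosses):
    """Toss a fair coin n_tosses times; return the observed fraction of heads."""
    heads = sum(random.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

# Small samples wander; large samples settle near the model's 0.5.
for n in (10, 100, 10_000):
    print(n, estimate_heads_probability(n))
```

Even this toy run illustrates the point made in the footnote below: a handful of tosses can stray a long way from half and half; thousands of tosses cannot.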

    Now, old lady — you have one last chance. Confess the heinous sin of heresy, reject the works of the ungodly — *two* last chances. And you shall be free — *three* last chances. You have three last chances, the nature of which I have divulged in my previous utterance.
    Cardinal Ximinez
Even at our most tolerant level of acceptance, the room for an outcome occurring by chance alone is 1.00 – 0.95 = 0.05 = 5%. So, the literal answer to Sutton’s unscientific question, “How Many Coincidences Sum to the Point Where They Render the Defence of Multiple Coincidence Totally Implausible?” is 95 / 5 = 19. But this assumes that the entity whose existence is in question is the product of all of those events, no fewer, no more, and that all 19 coexist simultaneously, having come about entirely independently of each other.

In terms of Sutton’s “information contamination pathways” transferring Matthew’s ideas to Darwin, this means that, for that estimated likelihood to hold, the pathways cannot overlap by sharing individual naturalists. If they do, as they do in the schema he proposes, then for each overlap the likelihoods of the overlapping pathways will in some way be pooled, increasing their likelihood overall.

“WHAT!”, I hear you cry, “You’ve just argued that there’s more chance the pathways existed”. Well, no. It’s a counterintuitive logic, but remember, we’re calculating the probability of the pathways arising by chance alone. We can derive the probability of one independent pathway occurring by accident directly from the probability of it not occurring by accident, i.e., by subtracting the latter value from one, as above: 1.00 – 0.95 = 0.05. But when pathways become interdependent, we suddenly don’t know what number to subtract the new, pooled value from, because the probability will be redistributed unequally across all the interconnected pathways. The deciding factor then is the influence of each node where the pathways join.

Think of this like a network of waterways: if the resistance to flow is lower in one direction, because one of the channels leads downhill from that point, then the water will flow that way. Similarly, if one naturalist, shared across pathways, is more likely to communicate information about Matthew’s evolution mechanism to one particular naturalist than to any other they are in contact with, then the information will be passed in that direction, towards its destination, because that route offers the least resistance to the flow.

Apologies, again, for this basic but necessary revision of how chance is calculated. If the explanation above wasn’t clear enough, this section should help. Mathematically, coincidence is dealt with as the multiplication of event probabilities. Two sixes rolled on a die, one after the other, is 1/6 x 1/6 = 1/36. Allowing any repeated number in succession introduces alternatives, which improves the chances of some such event occurring. Thus, the additive probability of getting a sequence of two rolls of the same number, two 1’s, OR two 2’s, OR two 3’s, etc., each with the same 1 in 36 chance of occurring, is the sum of their individual probabilities, 1/36 + 1/36 + 1/36…, etc., giving 6/36, which is also 1/6.
The point being made here is that, without alternatives, the likelihood of a series of events happening gets very unlikely, very quickly. For example, a run of six 6’s will happen, on average, only once in every 46,656 attempts. When alternative outcomes exist, the chances that one of them will occur improve. Obviously, for that to be the case, each one of those versions must be shown to be possible. Scientific uncertainty isn’t the shy mumblings of a wiry-haired boffin, but this structuring of the estimates of chance, and the setting of stringent levels of acceptance before being confident about having identified the cause of a phenomenon.
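The dice arithmetic above can be verified exactly, rather than approximately, using Python’s `fractions` module (my illustration, not part of the original argument):

```python
from fractions import Fraction

ONE_FACE = Fraction(1, 6)  # chance of a named face on a fair die

# Multiplicative rule: two sixes in a row.
two_sixes = ONE_FACE * ONE_FACE          # 1/36
# Additive rule: any double (1-1 OR 2-2 OR ... OR 6-6).
any_double = 6 * two_sixes               # 6/36, which reduces to 1/6
# Without alternatives, runs get unlikely fast: six sixes in a row.
six_sixes = ONE_FACE ** 6

print(two_sixes, any_double, six_sixes)  # 1/36 1/6 1/46656
```

Exact rational arithmetic avoids any floating-point fuzz, which is why `fractions` suits this kind of sanity check.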
Sutton’s proposition is mundane at best. He artificially improves his chances by proposing that any one of a number of potentialities will prove him correct, but does not then demonstrate that those outcomes have any validity beyond his coincidental occurrences of words and phrases.

By mooting, “How Many Coincidences Sum to the Point Where They Render the Defence of Multiple Coincidence Totally Implausible?”, Sutton is suggesting that there is a certain threshold at which the accumulated potential of coincidental events translates directly into reality. He probably has in mind an accumulation of events, such as 1/36 + 1/36 + 1/36…, etc., but this is wrong on two counts: philosophically and probabilistically. First, it is not possible, according to the consistent effects of our Universe’s physical laws, for random events to suddenly transform into caused outcomes as a density-dependent response. Instead, each of Sutton’s ‘tree network’ connections must be established independently, always starting with the hypothetical proposition that it does not exist. If this is not the null-hypothesis starting point, and you start by assuming a connection does exist, how are you going to know whether you failed to disprove it because there is no evidence to find, or because you did not look in the correct places? (Remember the green and blue balls? Same thing; some natural and artificial patterns are too alike to distinguish which is which.)

Presuming something will occur just because it could is a logical fallacy called appeal to possibility, also known as, appeal to probability. This brings us to the second difficulty in Sutton’s cumulative approach, the probabilities. The numbers are stacked against you, so it’s going to take some serious pleading to persuade anyone it could happen, because travelling along the branches of the tree defines a sequence of events: event 1 AND event 2 AND event 3 AND so on. In probabilistic terms, the chance of tossing a coin and getting tails is 1 of 2 outcomes (tail / tail or head = 1/2). Toss the coin again, and you now have a sequence of 2-toss outcomes, where every permutation is a potential outcome. To get tails both times, there is only 1 way in 4 that can happen (tail / tail or head AND tail / tail or head = 1/2 x 1/2 = 1/4). Again, looking for three tails in a row, we now have eight potential outcomes, but only one of them is the three tails option.
Ⓗ or Ⓣ= 1/2
ⒽⒽ or ⒽⓉ or ⓉⒽ or ⓉⓉ = 1/4
ⒽⒽⒽ or ⒽⒽⓉ or ⒽⓉⓉ or ⓉⓉⓉ or ⓉⒽⒽ or ⓉⓉⒽ or ⒽⓉⒽ or ⓉⒽⓉ = 1/8
The sum looks like this,
1/2 x 1/2 x 1/2 = 1/8
However, predicting at least two tails out of the three throws (ⓉⓉⒽ, ⓉⒽⓉ, ⒽⓉⓉ or ⓉⓉⓉ) succeeds in four of the eight outcomes,
1/8 + 1/8 + 1/8 + 1/8 = 4/8 = 1/2
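The coin-sequence counts can be confirmed by brute-force enumeration. This sketch (an illustration, using H and T as outcome labels) lists all eight three-toss sequences and counts the relevant events:

```python
from itertools import product

# All eight equally likely three-toss sequences.
outcomes = list(product("HT", repeat=3))

three_tails = sum(1 for o in outcomes if o == ("T", "T", "T"))
exactly_two = sum(1 for o in outcomes if o.count("T") == 2)
at_least_two = sum(1 for o in outcomes if o.count("T") >= 2)

print(three_tails, "/", len(outcomes))   # 1 / 8
print(exactly_two, "/", len(outcomes))   # 3 / 8
print(at_least_two, "/", len(outcomes))  # 4 / 8
```

Note the distinction the enumeration makes plain: *exactly* two tails occurs in three of the eight sequences (3/8), while *at least* two tails occurs in four (4/8 = 1/2).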
Now, Sutton asserts that it would take only one of those network pathways to transport Matthew’s ideas to Darwin for contamination to occur. This means that the entire network must sum to one, or less. Because there are seven points of potential contact with Darwin, each path could be considered to carry 1/7 of the overall potential (ignoring the issue about overlap mentioned before), but that gives unequal weighting to paths containing different numbers of naturalists. Alternatively, each path section (showing the transfer of information) between two nodes (each node occupied by a naturalist) can be considered to have equal weighting, which would be 1/26.
[Figure: the proposed network of naturalists, with path sections between nodes]
Multiplying along each branch, and adding across branches, gives a mean probability of information travelling the length of any path (the weighted average of 6 direct communications and 16 indirect communications via at least one other agent) of 1.16% (about 1 in 86). Put another way, if everyone in the network were actively trying to get a message to the next person along, in the direction of Darwin, then for every 100 attempts only 1 would be successful. However, given there are only 26 people in total, that number of attempts (each person repeatedly attempting on four separate occasions!) rather places it beyond reason.
The alternative way to estimate the total probability of Sutton’s schema working is to sum across the individual branches, as described above. This gives 0.254; that is, a 25% chance of the information getting through, even with all the people in place and a constant availability of information. It sounds counterintuitive, expressed like this, but the reality is that information is not always available on demand to coincide with the imagined meetings between these people. The calculation is therefore a very generous overestimate, but it has to be carried out like this because Sutton is unable to establish any individual connection as more likely than any other. Even so, traversal of the network by the Matthew information is still nowhere near a realistic outcome, unless you have another three parallel universes, all pushing information towards either the same Darwin, or four Darwins with a shared consciousness.
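The two figures above (the ~1.16% mean and the 0.254 sum) can be reproduced under a toy reconstruction of the network: assume 6 direct one-link paths to Darwin and 16 indirect paths of exactly two links each, with every link carrying the equal 1/26 weighting. These structural assumptions are mine, inferred from the quoted figures, not a reading of the actual diagram.

```python
from fractions import Fraction

LINK = Fraction(1, 26)  # equal weight per path section (assumed)

direct = [LINK] * 6            # one-link paths straight to Darwin
indirect = [LINK * LINK] * 16  # two-link paths via one intermediary
paths = direct + indirect      # multiply along each branch...

total = sum(paths)             # ...then add across branches
mean = total / len(paths)      # weighted average over all 22 paths

print(float(total))  # ~0.254, the 25% figure
print(float(mean))   # ~0.0116, about 1 in 86
```

That both quoted numbers fall out of this simple structure suggests it is close to the calculation being described, but treat it as a sketch, not a reconstruction of the source diagram.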
Suffice it to say, it’s not looking very likely that Sutton’s idea has a chance in Hell of being true. For a start, to be confident it happened we would want that percentage up in the high nineties. To be even remotely interested in pursuing further study, it should top 70%. And remember, this is the likelihood of information passing the length of a branch only if ALL BRANCHES ARE OPERATING simultaneously. That last point might be a little difficult to imagine, but try thinking of it as each branch having the potential for information transfer because all of the nodes are peopled with occupants willing to pass that information on. Therefore, if any single node is found not to meet Sutton’s assumptions, then, obviously, the probabilities start to decrease dramatically, and his chances of being right with them.
The problem Sutton has, with the statistics probably proving him wrong here, is that all he did was chuck as many random, independent elements into the cooking pot as possible, give them a stir, and hope to come up with some gourmet dish. By doing what he did, the best possible outcome is a brainless mush.

[1] Over a low number of coin tosses, the outcomes observed may well favour one face; only when the number of replicates is large enough does the histogram of outcomes (the probability density function) converge on a normal distribution. So the fairer question to ask is: how many heads and tails will there be across thousands of replicated sets of coin tosses? The answer would be, to several decimal places, effectively half and half.
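The footnote’s point can be sketched numerically: pool the heads count over thousands of replicated sets of tosses and the overall fraction sits hard against one half (illustrative code; the set sizes are arbitrary choices of mine).

```python
import random

random.seed(1)  # repeatable illustration

def heads_fraction(set_size=100, n_sets=10_000):
    """Overall fraction of heads across n_sets replicated sets of tosses."""
    total_heads = sum(
        sum(random.random() < 0.5 for _ in range(set_size))
        for _ in range(n_sets)
    )
    return total_heads / (set_size * n_sets)

print(round(heads_fraction(), 3))  # effectively half and half
```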
[2] For those interested, the algebraic expression relating two outcomes, p and q, is p = (1 − q). Repeating an event with this probability accumulates data which, when plotted as a histogram (column chart), eventually recreates the familiar bell-shaped curve of the normal distribution (a.k.a. the Gaussian curve, named after the all-round genius Carl Friedrich Gauss), where the left and right tails are evenly distributed about the mean (arithmetic average) at the peak of the curve.
What this Central Limit Theorem is saying is that, in large enough samples, the most likely value for the property being measured will tend to be near the mean. In those larger samples, the mean value will also be about the same as the most frequent value (i.e., the mode). That is not always the case: before enough measurements have been recorded, the sample will be too small to make an accurate estimate of the mean; the bell appears too squashed flat to identify the middle point accurately, and the most frequent value might be a tie between two or more candidates. If the sample is too small, the curve is also more likely to be off-centre, i.e., one tail will be longer than the other, again introducing inaccuracies into the estimation. These probability distributions are models (abstracted representations of the underlying data) which can then be used to predict the chance of a certain event occurring in the future, under the same conditions.
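As a sketch of the convergence just described (parameter choices are mine, purely for illustration): tally the number of heads in many replicated sets of 20 tosses and compare the mean of the resulting histogram with its mode.

```python
import random
from collections import Counter

random.seed(7)  # repeatable illustration

def heads_counts(set_size, n_sets):
    """Number of heads observed in each of n_sets replicated sets of tosses."""
    return [sum(random.random() < 0.5 for _ in range(set_size))
            for _ in range(n_sets)]

counts = heads_counts(set_size=20, n_sets=50_000)
hist = Counter(counts)  # the histogram of outcomes

mean = sum(counts) / len(counts)
mode = hist.most_common(1)[0][0]

print(round(mean, 2), mode)  # both settle near set_size / 2 = 10
```

With only a handful of sets, the mean and mode routinely disagree, which is the "squashed bell" problem described above; with tens of thousands, they coincide.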


Because they are models generated from the accumulation of individual events, the likelihoods of which were originally expressed in terms of p and q, the model can also be expressed in those same terms,