Matching Cr2 segments at 219m using Q-match on GEDmatch | Precision=7 | Precision=3
Number of overlaps (all segments show a common small region at P=7; most at P=3) | 12 | 22 or 24
Number of people in overlaps | 10 | 11
Missing people from our sample of 12 (10 Baruch Lousada + Randy's parents) | JG,MD | MD
Coincident segment boundaries - 5 types arise in this exercise | |
Type 1. 3-person in 2 segments (one common person in 2 matching segments) | 3 | 2
Type 2. 4-person in 3 segments (one common person in 3 matching segments) | 0 | 1
Type 3. 4-person in 2 segments (no person in common in 2 matching segments) | 0 | 1
Type 4. 4-person in 3 segments (both people in a match match a different other person in 2 further matching segments) | 0 | 1
Type 5. 5-person in 3 segments (a person matches 2 other people, none of the 3 being common with 2 people in a further matching segment) | 0 | 1
'Precision' is an adjustment to the Q variable used in GEDmatch's statistics for determining what is a match. Precision=7 is the default setting. Much of the matching occurs over short segments, and at P=7 much of this fine detail is rejected in calling probable matches. Statistical considerations in calling matches include which gaps in the SNP sequence are tolerated, and also where the segment boundaries are placed (i.e. whether they stop at the last common SNP or extend to the next). These considerations are matters of judgement for GEDmatch in formulating its statistical routines. GEDmatch gave no advice on the topic of coincident overlapping match boundaries, perhaps because the definition of a match boundary is somewhat arbitrary, as indicated. For our purposes below, all that matters is that GEDmatch applies its criteria consistently.
Below we can see more matches at P=3, and the roughly doubled number of matches brings a dramatic increase in the number and variety of coincident segment boundaries. In this context it might be argued that Types 1 & 2 are not hugely significant: if IBS matches occur, a coincident segment boundary will arise under the same conditions as for an IBD match, because where the common person has the shortest segment, that segment will generate coincident boundaries in the 2 (Type 1) or 3 (Type 2) segment matches. The Type 4 match shown in this case comprises a Type 3 match plus two Type 1 matches. The Type 5 match shown in this case comprises a Type 3 match plus one Type 1 match.
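The grouping behind these types can be sketched in a few lines of Python. This is only an illustration of the idea, not GEDmatch's method: the people (A to G) and positions are invented toy values, and `coincident_boundaries` and `describe` are hypothetical helper names. Each matching segment is listed with its two boundaries; any position touched by two or more segments is reported with the number of people involved and any person common to all of them.

```python
from collections import defaultdict

# Toy data only: (person_a, person_b, start, end).
# The names and positions are invented for illustration, not taken from GEDmatch.
MATCHES = [
    ("A", "B", 100, 500),
    ("A", "C", 100, 400),  # shares boundary 100 with A/B; A is common to both
    ("D", "E", 250, 900),
    ("F", "G", 300, 900),  # shares boundary 900 with D/E; no person in common
]


def coincident_boundaries(matches):
    """Map each boundary position to the pairs whose segments start or end there."""
    by_boundary = defaultdict(list)
    for a, b, start, end in matches:
        pair = frozenset((a, b))
        by_boundary[start].append(pair)
        by_boundary[end].append(pair)
    # Keep only positions shared by two or more matching segments.
    return {pos: pairs for pos, pairs in by_boundary.items() if len(pairs) > 1}


def describe(pairs):
    """Return (number of people, number of segments, people common to every segment)."""
    people = set().union(*pairs)
    common = set.intersection(*(set(p) for p in pairs))
    return len(people), len(pairs), sorted(common)


if __name__ == "__main__":
    for pos, pairs in sorted(coincident_boundaries(MATCHES).items()):
        n_people, n_segments, common = describe(pairs)
        print(pos, f"{n_people}-person in {n_segments} segments, common: {common or 'none'}")
```

In the toy data, position 100 gives the Type 1 pattern (3 people in 2 segments with one common person) and position 900 gives the Type 3 pattern (4 people in 2 segments with no person in common). The sketch compares positions exactly, so boundaries that differ by even a few base pairs are treated as distinct.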
But a moment's thought shows how unusual Type 3 is, for it involves a coincident boundary shared by 4 separate people, and it occurs in 3 cases (father/Jeremy/JG/Jeannine at 220037666, JG/Jeannine/mother/Allan at 218423580 and Jeannine/Ed/Julian/Jeremy at 217856142). As our 3cM segment length spans around 200 SNPs, there are about 1000 3cM segments in the genome, and there are about 30 segments in each match, there will be less than a 1/7000 chance of a single coincidence of this sort.
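The arithmetic behind this estimate is not spelled out. One reading that gives a figure of the same order, offered here only as an assumption, is that a boundary could fall at any of roughly 200 x 1000 SNP positions, and that each match offers about 30 segments whose boundary might land on a given position. The function name `coincidence_odds` is hypothetical.

```python
def coincidence_odds(snps_per_segment=200, segments_in_genome=1000, segments_per_match=30):
    """Return N such that the chance of one coincident boundary is about 1 in N.

    Assumption (not stated in the text): boundaries are equally likely to fall
    at any SNP position, and each segment in a match gives one opportunity
    to land on a given position.
    """
    candidate_positions = snps_per_segment * segments_in_genome  # 200,000 positions
    return candidate_positions / segments_per_match


if __name__ == "__main__":
    print(f"about 1 in {round(coincidence_odds()):,}")  # about 1 in 6,667
```

On this reading the chance is about 1 in 6,667, in line with the order of magnitude quoted above. Requiring the coincidence in three separate places, as observed here, would make it rarer still.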
Such is the recombinatorial avidity of the genome that we are left with some minor mysteries. Thus, while the J and Ed segments appear to reach 217856142 in the Type 5 match, the lower bound of the J/Ed match only reaches 217856192. It would appear that GEDmatch only declares a match under certain conditions, e.g. a small number of missing SNPs, and that here the J and Ed DNA probably have some mini-segments which mismatch, leading to an excess number of missing SNPs which don't show up in other matches. Thus, the J/Ed match does not quite reach 217856142. Of course, this 217856142 boundary also shows that the 218782054 boundary is not a true lower boundary set by Jeannine. Presumably the recombinatorial avidity of the genome has again created mini-segments which cause GEDmatch to declare or not declare matches depending on how many SNPs are missing. This is why it is not possible to prove IBD rigorously from the matches alone, and why we are lucky to have the rare 4-person segment boundaries at 220037666, 218423580 and 217856142.
Diagrams distilling the above may be found here.
The match details are shown below, using the following colour key:
Precision=7 | Precision=3 | |