What is the best DNA test?

The old saying goes, “If you don’t know where you’re going, it doesn’t matter how you get there.” The same is true for DNA testing. A better question than “What is the best DNA test?” is “What are my goals for doing a DNA test?” There are a lot of testing companies out there, but I want to focus on the Big 4: Ancestry, FamilyTreeDNA, 23&me and MyHeritage.

The generic answer that is usually given is to test at Ancestry, then do a transfer (copy your DNA file) over to FamilyTreeDNA, MyHeritage and Gedmatch. If you were an adoptee searching for a birth family (or vice versa), this would be the best way to maximize your exposure; basically, you’re leaving breadcrumbs around for people to find you. I would say in this case, if you can swing it, also test at 23&me to fully cover all your bases. At 23&me you can opt for just the ancestry test, which is cheaper than the ancestry and health test.

Are you interested in ethnicity testing? I wrote a post here about how ethnicity estimates are calculated. I don’t know if there is one company that is better than another for a specific ethnicity, although here are some good posts to read for Native American DNA (which may not even show up in your DNA, even if it is in your family tree) and African DNA. You may also be interested to know which testing companies are available in which countries, because if you’re an American looking for your German ancestors it’s probably helpful to know that Ancestry doesn’t currently offer tests in Germany.

If you have questions about your paternal line (father’s father’s father, etc.), Y DNA testing can be helpful if you have a suitable candidate to test. For men, this can be themselves. For women, a brother, father, father’s brother, or father’s brother’s son will carry the Y DNA of their paternal line. At the moment, FamilyTreeDNA is the only one of the big companies that does this kind of testing and matching, although 23&me does give haplogroups. You can start with a Y-37 test and later upgrade without having to submit another sample.

Is there ever a reason for doing an mtDNA test? The Y chromosome mutates more often than mtDNA does, which makes mtDNA better at going back thousands of years to find our ancient origins, but less useful for figuring out matches in a genealogical timeframe. The only uses I can think of are comparing haplogroups (if you are comparing two women and want to see whether they descend from the same ancestral female) or looking at your haplogroup as a marker of a particular ethnicity. For example, haplogroups A, B, C, D and X are the main ones found among Native Americans, so having one of them may point to your mother’s mother’s mother (etc.) having been of Native origin. Full mtDNA tests are done at FamilyTreeDNA and, again, 23&me will give your haplogroup.

Finally, if you are interested in doing a DNA test for health reasons (although I’m sceptical of the value in that), currently 23&me is the only one of the big companies to offer that service.

If you’ve already done a test and are wondering who else to get tested, the same question applies: what is your goal? The generic answer is to a) make sure you have someone on both your maternal and paternal sides to help you sort through your matches, and b) test the eldest generations first. This means your testing priority is grandparents, great-aunts/uncles, parents, aunts/uncles, then cousins (further than first cousins, if you know of any). I don’t think there’s any benefit to testing your children, since the DNA they got from you is only about half of your own, and if you’re interested in their other parent’s side you can always test the other parent or their relatives in the same priority order.
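The reasoning behind “eldest first” can be shown with the textbook average: each parent-to-child step passes on, on average, half of a person’s autosomal DNA. This minimal sketch only computes those averages (real inheritance varies around them), and the function name is made up for illustration.

```python
def expected_fraction_from_ancestor(generations_below: int) -> float:
    """Average share of one ancestor's autosomal DNA carried by a descendant
    `generations_below` generations down (1 = child, 2 = grandchild, ...)."""
    # Each parent-to-child transmission passes on, on average, half.
    return 0.5 ** generations_below

# Who carries more of a great-grandparent's DNA, on average?
for label, gens in [("grandparent (their child)", 1),
                    ("parent (their grandchild)", 2),
                    ("you (their great-grandchild)", 3),
                    ("your child", 4)]:
    print(f"{label}: {expected_fraction_from_ancestor(gens):.2%}")
```

Each step down the tree halves the average, and everything your child inherited from you is DNA you already carry, which is why testing upward pays off more than testing downward.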


Don’t touch the settings!

Here is a parable, and like all parables, the point is not if it really happened but the lesson we can learn from it. Once upon a time, someone asked Picasso why he could get away with making art the way he did, and other people could not. Picasso grabbed a piece of paper and a pencil and sketched the most beautiful and perfect picture of a horse. He told the person that once they could make things like that, they were free to break the rules and make art however they wanted.

This is also true in the genealogy world. There are a lot of rules about how to go about doing good genealogy research, especially in the genetic genealogy world. If you want to be successful when you bend or break these rules, you have to know why those rules exist in the first place. I’m going to be talking more specifically in this post about why the Gedmatch settings are the way they are, and why it’s probably not a good idea to change them.

The first setting that people often fool with is the minimum segment size. The default is 7 cM.

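I’m hoping everyone knows that a centimorgan (cM) is a unit used to measure the length of a DNA segment. Strictly speaking, it measures genetic distance: 1 cM corresponds to roughly a 1% chance of the segment being split by recombination in a single generation. So why is 7 the default, and why is it a bad idea to lower it?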

I talked about identical by descent and identical by chance in my post DNA in a nutshell. I also provided a link to a really good chart that shows the odds that a segment is there by chance or by descent. Once you get below 7 cM, it’s only 50/50 that you share a segment with someone because you’re related to them, and not because you randomly happen to share similar DNA. It’s like comparing Anna Karenina and Hop on Pop and deciding that since they both have the word “the” in them, they must somehow be related. As someone in one of the genetic genealogy Facebook groups said, “lower the settings too much and you’ll match a banana.”
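GEDmatch applies its thresholds internally, so the following is only a minimal sketch of what a minimum-segment filter does. The `Segment` fields, the `filter_segments` helper and the segments themselves are all hypothetical, and treating the threshold as “at or above 7 cM” is an assumption.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    chromosome: int
    start: int      # start position (base pairs)
    end: int        # end position (base pairs)
    cm: float       # length in centimorgans
    snps: int       # number of SNPs the segment spans

def filter_segments(segments, min_cm=7.0):
    """Keep only segments at or above the minimum length in cM."""
    return [s for s in segments if s.cm >= min_cm]

# Made-up one-to-one comparison results.
shared = [
    Segment(2, 10_000_000, 25_000_000, 18.4, 3200),
    Segment(6, 50_000_000, 53_000_000, 5.1, 600),
    Segment(11, 70_000_000, 74_000_000, 7.3, 1100),
]

kept = filter_segments(shared)
print([s.chromosome for s in kept])            # [2, 11]
print(round(sum(s.cm for s in kept), 1))       # 25.7
```

Dropping `min_cm` to 5 lets the 5.1 cM segment on chromosome 6 through, and at that size it is about as likely to be there by chance as by descent.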

Can this rule be broken? Yes, but the results should still be used with caution. Sometimes people are trying to force a connection that just isn’t there. Perhaps they are worried for reasons covered in my post Is this normal? I’ll repeat (as many times as needed): once you get past second cousins, it is possible not to share DNA with a cousin. It doesn’t mean anything is out of the ordinary. I explain why in my post We all have two family trees. If you really, really want to find a matching segment with a cousin because they are on your brick wall line, you are more than welcome to try, but the burden of proof still rests on you to show that the segment actually came from the ancestor you share, and not because you randomly happened to have the same DNA in that spot (the odds are more in favour of randomness). You might treat this segment as a real match and assume that whoever matches you there must be from the line you share. You could invest a lot of time and energy pursuing this match, only to find out that you’ve been on a wild goose chase. But hey, I pursue matches that are likely to be outside of the genealogical time frame, so I guess we’re all allowed to be masochists in our own way.

The other settings (there are actually a few of them) are for SNP counts. I don’t know why anyone would want to change these, since very few people actually know what SNPs are (most people pronounce it like ‘snips’). Here’s what I know about SNPs. DNA testing does not test all our DNA. That kind of testing (although available) is still pretty expensive, and it’s unknown whether doing so would provide us with better results. In my post Are DNA tests accurate? I wrote about how any two people’s DNA is 99.9% identical (about 60% of our genes even have a counterpart in a banana, so I wasn’t kidding about lowering the settings too much). What a DNA test does is look at specific points where the DNA is likely to be different. These points are called SNPs (single nucleotide polymorphisms). Some parts of our DNA have more of these points than other parts do. A segment that spans many SNPs has more places where a chance match would break down, so a short segment in a SNP-rich region is more trustworthy than its cM length alone suggests. Therefore, if you know which parts are SNP-rich, you could consider lowering the cM value while raising the SNP value to compensate.
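Why would more SNPs make up for fewer cM? A toy model helps, under loudly stated assumptions: each SNP has two alleles, one at frequency q; genotypes follow Hardy-Weinberg proportions; and SNPs are independent (real ones are not, and this is not how GEDmatch computes anything). Two unrelated people look “half-identical” at a SNP unless one is homozygous for one allele and the other homozygous for the other. That mismatch has probability 2q²(1-q)², so the chance of a long coincidental run shrinks geometrically with the number of SNPs. Both function names are hypothetical.

```python
def p_half_identical(q: float) -> float:
    """Chance two unrelated people share at least one allele at a two-allele SNP,
    assuming Hardy-Weinberg genotype frequencies and allele frequency q."""
    # They fail to match only if they are opposite homozygotes: 2 * q^2 * (1-q)^2
    return 1 - 2 * q**2 * (1 - q) ** 2

def p_chance_run(n_snps: int, q: float = 0.5) -> float:
    """Toy chance that n independent SNPs are all half-identical by coincidence."""
    return p_half_identical(q) ** n_snps

for n in (100, 300, 500):
    print(n, f"{p_chance_run(n):.1e}")
```

Real matching tools also tolerate some genotyping errors, which this sketch ignores. Even so, it shows the trade-off: the more SNPs a segment spans, the harder it is to produce by chance.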

Or maybe, just maybe, you could consider leaving the settings where they are.

Are DNA tests accurate?

Thanks to talk shows, a lot of junk science, pseudoscience and general misunderstanding of how science works are proliferating. Wine is good for you, wine will kill you, coffee is bad for you, coffee is good for you, fat is bad, except sugar is worse than fat, and on and on and on…

What’s someone to make of all this? In general, I think it leads people to think that science is just another opinion and not based in truth. I’m not saying that science doesn’t have its problems (I highly recommend the book “Inferior” by Angela Saini), but the terrible way in which science is being communicated to the public is causing a lot of problems. I’m not here to talk about how vaccine-preventable diseases are on the rise, however; I’m here to talk about the science of DNA tests, because I have a strong reaction when someone tells me they heard that DNA tests aren’t accurate.

On the other hand, I have just as strong a reaction when people ask why their DNA ethnicity estimates are wrong.

On the one hand, I want to defend the science behind them, but on the other, I want to explain the weakness of the science behind them. You see my problem?

So let’s take a look at how ethnicity estimates are done and why yours might be wrong, and hopefully by the end we’ll have a greater appreciation for the science involved.

The first thing we have to understand is that DNA is made up of only 4 letters. Imagine if our alphabet contained only four letters and someone had to write different books using only those letters. A lot of books would share words, but we still wouldn’t have a problem distinguishing between “Hop on Pop” and “Anna Karenina.” In reality, the way these four letters are combined codes for the proteins that make up every living thing: plants, animals, and people (yes, your food has DNA in it. Not just GMO food, all food). I think most people know that we are very closely genetically related to chimps, but did you know that about 60% of our genes have a counterpart in a banana? Our DNA is 99.9% identical to everyone else’s in the world, so the science behind figuring out how that 0.1% differs from person to person is pretty mind-boggling, when you think about it. It’s as if a random word were changed on random pages of “Anna Karenina.”
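To get a feel for how small 0.1% is, here is a toy illustration with random four-letter sequences. The sequences are generated, not real genomes, and the helper name is made up. Two “people” differ at just 10 of 10,000 positions, and those differing positions play the role SNPs play in a real test.

```python
import random

BASES = "ACGT"

def percent_identical(a: str, b: str) -> float:
    """Fraction of aligned positions where two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    same = sum(1 for x, y in zip(a, b) if x == y)
    return same / len(a)

rng = random.Random(42)          # fixed seed so the toy example is repeatable
person_a = "".join(rng.choice(BASES) for _ in range(10_000))

# Copy the sequence and change 10 distinct positions to a different letter.
letters = list(person_a)
for pos in rng.sample(range(len(letters)), 10):
    letters[pos] = rng.choice([b for b in BASES if b != letters[pos]])
person_b = "".join(letters)

print(f"{percent_identical(person_a, person_b):.1%}")   # 99.9%
differences = [i for i, (x, y) in enumerate(zip(person_a, person_b)) if x != y]
print(len(differences))                                  # 10
```

A DNA test skips the 99.9% that matches and reads only the kinds of positions collected in `differences`.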

So how do we distinguish the 0.1% of DNA that is different between people and make categories based on those distinctions? The first thing that has to be done for ethnicity estimates is to define an ethnicity. Here’s where it gets tricky. How do you define an ethnicity? For example, what does it mean when we say someone is English, or German, or Swedish? When you look at the Earth from space, you don’t see borders, not to mention that countries and their borders have been changing since their creation. My great-grandmother came from Galicia, which was part of the Austro-Hungarian Empire. Currently the town she was from is part of Poland. Who’s to say her DNA is significantly different from another person’s in a town just over the border in the Ukraine? Further, people have always been on the move. I may know where my ancestors were 100 years ago, or even 400 years ago, but it’s possible that my DNA came from a time frame beyond even that.

Let’s say we are able to decide what defines a particular ethnicity. The next step is to create what’s called a reference population. Basically, a group of people is put together whose family is documented to have lived in the area for a really long time. The assumption is that the DNA that these people share is in some way similar and is also distinguishable from people whose families have lived in other areas for a very long time. Keep in mind that this group of people is different from testing company to testing company. This is why every testing company’s ethnicity estimate will vary, even with the same person’s DNA.

Alright, now we have a whole lot of DNA from all over the world to compare your DNA to. This is not the DNA of people who USED to live in the area your family is from; it is the DNA of people who currently live there. We’re assuming that, as long as we check that these people’s families have lived there long enough, their DNA is effectively the same as that of the people who lived there before.

If you’re like me, you will fall into more than one ethnicity category. Another member of a Facebook group I’m a part of described it as a recipe. Basically, you’re a mixture of several different kinds of soup. The testing company is going to try to separate out your carrot soup from your noodle soup from your borscht and compare it to their recipe for each kind of soup (keeping in mind that their recipe is slightly different from another testing company’s recipe). It’s about as complicated as unbaking a cake. Did this carrot come from the carrot soup or the noodle soup? We’re not really sure where this part of the soup came from, but it looks kind of similar to a vichyssoise so we’ll call it that.
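The “unbaking” step can be sketched with a toy version of a supervised mixture estimate. There are two reference populations with known allele frequencies (the “recipes”) and a person’s genotype counts at each SNP. A grid search then picks the share of population A that makes those genotypes most likely. All frequencies and genotypes are invented, SNPs are assumed independent, the function names are made up, and this is not any testing company’s actual method.

```python
import math

def log_likelihood(w, genotypes, freq_a, freq_b):
    """Log-likelihood of genotypes (0, 1 or 2 copies of an allele at each SNP)
    if a fraction w of the person's ancestry is from population A."""
    total = 0.0
    for g, fa, fb in zip(genotypes, freq_a, freq_b):
        p = w * fa + (1 - w) * fb          # allele frequency of the mixed "soup"
        p = min(max(p, 1e-9), 1 - 1e-9)    # avoid log(0)
        total += g * math.log(p) + (2 - g) * math.log(1 - p)
    return total

def estimate_mixture(genotypes, freq_a, freq_b, steps=100):
    """Grid-search the share of population A (0 to 1) with the highest likelihood."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda w: log_likelihood(w, genotypes, freq_a, freq_b))

# Invented reference "recipes": allele frequencies at four SNPs.
soup_a = [0.9, 0.9, 0.1, 0.1]
soup_b = [0.1, 0.1, 0.9, 0.9]

print(estimate_mixture([2, 2, 0, 0], soup_a, soup_b))   # 1.0
print(estimate_mixture([1, 1, 1, 1], soup_a, soup_b))   # 0.5
```

Swap in a different set of reference frequencies and the same genotypes give a different answer. That is the toy version of why each company’s estimate for the same DNA varies.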

Your ethnicity estimate can be inaccurate, yes, because it is based on several assumptions:

1) that a particular ethnicity can be defined

2) that a particular reference population can be created based on this ethnicity

3) that the reference population has DNA that is similar to other people from that ethnicity and different from people not of that ethnicity

4) that the DNA of people whose families have lived in that region for a long time is the same as the DNA of the people who used to live in that region

5) that your DNA can be decomposed and assigned to various ethnicities based on how similar it is to the reference population’s DNA

Those are a lot of assumptions. I’m hoping you can see how impressive it is that there’s an entire science based on this, and that this science is getting better and more accurate. But…it’s not soup yet.

What’s the takeaway from this? For one, you should adjust your expectations as to what ethnicity estimates are capable of telling you and to what degree of accuracy. For two, ignore small percentages unless you have a documented paper trail to back them up. And lastly, DNA is about a lot more than ethnicity estimates. If you did a DNA test just for that then you are missing out on all the fun the rest of us are having connecting with new cousins and trying to figure out which segment of DNA came from which ancestor.

We all have two family trees

I read somewhere that 50% of people do a DNA test for the ethnicity estimate, which probably explains why 50% of the questions in the Facebook genetic genealogy groups I’m a part of start with “I did a DNA test, so why does it say I’m X when I’m not X?” Or conversely, “Why doesn’t it show any X ethnicity when I was told I had X ethnicity?” (with X almost always being Native American).

The questions that are not about ethnicity are usually along the lines of “is this normal?” Most people don’t know that up to 10% of third cousins and 90% of fifth cousins don’t share any DNA. This post is going to explore why we don’t see ethnicities we think we should see, and why we don’t share DNA with all our more distant cousins. I’ll talk more about how ethnicity estimates work in another post.

Did you know you actually have two family trees? Genealogists use the term “paper tree” to refer to ancestors you can trace back using traditional genealogical methods, such as birth, death, marriage and census records. The term “genetic tree” refers to ancestors you inherited DNA from. Obviously, there’s a lot of overlap there, but your genetic tree is only a small subset of your paper tree. The number of ancestors doubles with every generation you go back, while the amount of DNA you inherit stays fixed (half from each parent), so there’s just not enough room to carry a piece of every single ancestor’s DNA. Further, even if you inherited a piece of DNA from a particular ancestor, it is possible that the segment is too small to be useful. If you read my post “DNA in a Nutshell,” you will understand that once a segment gets too small, it’s entirely possible that it matches another person who has that segment totally by chance, and not because you share a common ancestor. I imagine that when comparisons are made for ethnicity estimates, the same idea holds true. Tiny amounts of DNA that match a particular company’s reference population for a particular ethnicity are likely not counted.
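Some back-of-the-envelope arithmetic shows how quickly the room runs out. This minimal sketch assumes a rough round total of 7,000 cM for both copies of your autosomes (an illustrative figure, not a company’s number). It counts ancestor slots (pedigree collapse would reduce the number of distinct people) and divides the DNA evenly, which real inheritance does not do. The function names are made up.

```python
TOTAL_CM = 7000  # rough assumed length of both copies of the autosomes, in cM

def ancestors_in_generation(n: int) -> int:
    """Number of ancestor slots n generations back (1 = parents)."""
    return 2 ** n

def average_cm_per_ancestor(n: int, total_cm: float = TOTAL_CM) -> float:
    """Average amount of your DNA (cM) from any single ancestor n generations back."""
    return total_cm / ancestors_in_generation(n)

for n in (1, 3, 5, 8, 10):
    print(f"gen {n:>2}: {ancestors_in_generation(n):>5} ancestors, "
          f"~{average_cm_per_ancestor(n):,.1f} cM each on average")
```

By ten generations back, the average per ancestor is already below GEDmatch’s 7 cM default minimum segment size. DNA is handed down in a limited number of chunks rather than evenly, so in practice many ancestors at that depth contribute nothing at all.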

Obviously, if your genetic tree contradicts your paper tree, there’s an issue that needs to be resolved. For example, if you don’t share DNA with a close cousin, or you don’t share DNA in the expected range, someone’s paper tree is going to have to be updated. But absence of evidence is not evidence of absence. Not sharing DNA with a more distant cousin or not having a particular ethnicity show up in your estimate does not mean that your paper tree is incorrect. If it’s DNA proof you’re looking for, finding a relative who matches you as they should and also matches a particular cousin and/or has the particular ethnicity show up will do. A great-aunt or uncle works well here if there isn’t a grandparent to test, and an aunt or uncle if there isn’t a parent. People who share only part of their ancestry with you (like a cousin) are not really good candidates, because they might have inherited those segments from an ancestor you don’t share, but it’s better than nothing if there’s no one else to test. If the DNA is on a straight maternal or paternal line, mtDNA or Y DNA testing will assign you a haplogroup. You can use this information to see whether your haplogroup is consistent with the haplogroups associated with a particular ethnicity.

To sum up, your genetic tree is only a small part of your paper tree, so while it’s disappointing to not match with certain cousins or to not have a particular ethnicity show up in your estimate, it doesn’t mean there’s something incorrect in your paper tree. Maybe Native American DNA is just not that into you.

DNA in a Nutshell, Part 2

I watched this amazing video from the RootsTech 2018 conference, and it made me feel that my post DNA in a Nutshell was incomplete. In that introductory post I talked about how we inherit DNA and provided a link that has some great pictures. Even if you don’t want to read the article, looking at the pictures can really give you a good idea of how we get the DNA that we get.

One thing people understand about DNA is that if we share it, we’re related. Many people might not realize that if you don’t share DNA with someone, it doesn’t mean you aren’t related. According to the ISOGG wiki, up to 10% of 3rd cousins (people who share great-great-grandparents) and up to half of 4th cousins (people who share great-great-great-grandparents) do not share any common DNA. As the DNA gets mixed up and handed down, it’s possible you didn’t inherit any of the DNA a particular ancestor had, but a cousin did. It’s also possible that you both have DNA from a particular ancestor, but that it is not the same DNA in the same spot. The algorithms that the companies run don’t know that your segment on chromosome 2 and your third cousin’s segment on chromosome 6 came from the same great-great-grandparent. You and your third cousin may be related, but if you don’t carry the same segments, the computer will never know it. This is important to know when it comes to understanding the intermediate topic of triangulation.

The use of the term “cousin” by companies is a very liberal one. When I see the term “cousin” used for matches, I like to think of it as a generational level rather than a reference to a specific relationship. Since it is an estimate based on the amount of DNA you share, and the amount of DNA people in a particular relationship share can vary, a “1st cousin” can refer to an actual first cousin, a great-aunt/uncle, or even a first cousin once or twice removed. The further you get past close family (parents, siblings, aunts/uncles, nieces/nephews), the harder it becomes to pin down the actual relationship you share with a given DNA match.
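One way to see why the labels blur is the standard expected-sharing calculation. For collateral relatives, count the generations from each person up to the common ancestor(s) (m and n). The expected fraction shared is then 2 × (1/2)^(m+n) for a full relationship (an ancestral couple) and 1 × (1/2)^(m+n) for a half relationship (one shared ancestor). This sketch only gives averages; the 7,000 cM total used to convert them is a rough assumption, and the function and table names are made up.

```python
from fractions import Fraction

TOTAL_CM = 7000  # rough assumed total, used only to turn fractions into cM

def expected_share(gens_from_me: int, gens_from_them: int, full: bool = True) -> Fraction:
    """Expected fraction of DNA shared with a collateral relative, given the number
    of generations from each of you up to the common ancestor(s). A full relationship
    descends from an ancestral couple (two common ancestors); a half one from one."""
    common_ancestors = 2 if full else 1
    return common_ancestors * Fraction(1, 2) ** (gens_from_me + gens_from_them)

relationships = {
    "aunt/uncle": (2, 1, True),
    "half sibling": (1, 1, False),
    "first cousin": (2, 2, True),
    "great-aunt/uncle": (3, 1, True),
    "half first cousin": (2, 2, False),
    "first cousin once removed": (2, 3, True),
    "second cousin": (3, 3, True),
}

for name, (m, n, full) in relationships.items():
    share = expected_share(m, n, full)
    print(f"{name:<26} {str(share):>5}  ~{float(share) * TOTAL_CM:,.0f} cM")
```

First cousins and great-aunts/uncles have exactly the same average (1/8). So do half first cousins and first cousins once removed (1/16). A prediction based only on the amount of shared DNA has no way to tell them apart.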

Besides the difficulty that the amount of DNA for a given relationship varies over a wide range, as mentioned in the previous post, half-relationships (where you have only one common ancestor rather than a common ancestral couple) can make a relationship look more distant than it really is. Endogamy (where you might be related to someone in more than one way) and pedigree collapse (where you have intermarriage in your family tree) can make relationships look closer when the common ancestor is actually much further back than suggested. And since every company calculates differently what counts towards the total amount of DNA you share with someone, different companies may give different results, and one may conclude that you aren’t related to someone while another company says you are.

All this is to say that you shouldn’t worry if you are new to all this and everything isn’t as precise and exact as you expected. If you don’t share DNA with a third cousin who has also tested their DNA, it doesn’t necessarily mean something is amiss. If your testing company says your great-aunt is your 1st cousin, it’s not because your great-aunt is actually your 1st cousin. If you find a DNA match and your testing company says this person is your second cousin, but you can’t see the overlap in both of your trees, you may have to dig further back. Let go of the expectation of certainty and embrace the variety.

SOLVED: The Case of the Unknown Bigamist

In a previous post I talked about E’s husband, the unknown bigamist, and I am pleased to announce that this bigamist is now known. This presents a unique opportunity for me to share how to find unknown grandparents using DNA, especially when traditional genealogy methods (like locating a marriage certificate) have not been successful.

E’s grandchild D shared their DNA results with me, along with a possible last name. Since D has only done DNA with Ancestry and does not have a subscription to the family tree/records side, they cannot see the family trees of their matches. If this is the case for you, you should sign up for a Free Registered Guest Account. This should let you see the pedigrees for matches with trees, although you still will not have the option to see the full family tree.

Naturally, the first thing I did was look through the available family trees for the last name D gave me. I didn’t find any matches with the last name, but I did find a similar one so I took that last name to a newspaper search site (a subscription that came with my Ancestry subscription) and gave it a go. If the man was a bigamist, there had to be a news article about him, right?

THERE WAS.

Not only did the newspaper article list this man, C, as a bigamist, it also gave his alias, the surname that D had given me. This had to be the guy, right?

Here’s an important tip: follow the evidence. In all cases of unknown parentage, it’s important to gather as much evidence as possible and make sure you have a rock-solid case before drawing conclusions. It would be really easy for me to say at this point that this man is D’s grandfather, but the newspaper article is just the beginning. It’s unfortunate that shows like CSI have exaggerated the conclusions one can draw from evidence, but at least they emphasize following the evidence rather than what we want the conclusion to be. So, what did the evidence say about D’s relationship to C?

The next thing to figure out was how D and their matches were related. How did C fit into the picture? I went back to the family trees. Two of the matches listed C as their grandfather. Since they had different grandmothers from D, however, they would be half first cousins to D. I entered the amount of DNA each match shared with D into this tool by DNApainter and, lo and behold, it was consistent with half cousins. Interestingly, Ancestry had put these matches in the category of second cousins. That means you cannot rely on what any testing company tells you about how you are related to someone. The testing companies base their estimate of your relationship to someone on the amount of shared cM, not on any knowledge of how you are actually related. Your best bet when trying to figure out your relationship to a DNA match is to use the DNApainter tool.

I wanted more evidence. I found another match who also had C in their tree, but as a great-grandfather. That would make them a half first cousin once removed to D. Again, I put the total cM in the tool, but this time it did not come back in the correct range. I looked at the tree again. C shares the same name as his father. The match’s great-grandfather was not C, but C’s father. That meant D and this match were second cousins, and the amount of DNA D shares with this match is consistent with second cousins. Phew!

There is now a lot of evidence to support the fact that C is the Unknown Bigamist, but I still don’t feel like I can say I am 100% certain that this is D’s grandfather, as they would on CSI. Finding other half, second and third cousins would continue to add to the body of evidence; however, if any evidence came to light that disproved the idea, I would have to take it into consideration. Also, if this were an episode of a TV show, the case would be closed and the theme music would start playing. The real investigation is only starting. Who was this man? How did he meet E? Why did he try to pull off more than one marriage? We have found one answer, and with it comes a lot more questions. Hopefully by contacting his newfound half-cousins D can sort some of this out.

Is this normal?

One of the interesting results of DNA testing is that family secrets that would otherwise have been taken to someone’s grave are being brought to light. Some people have discovered that who they thought was their biological parent was not, others have discovered half-siblings they never knew about. Occasionally there are discoveries further up the line where grandparents or great-grandparents are not lining up with DNA as they should. In the genealogy world, we call these NPEs. Although the acronym originally stood for non-paternity event, sometimes people use it a little more generally to mean Not Parent Expected. I don’t have any statistics to back me up, but I think these events are pretty rare. Naturally, though, these things have got people a little paranoid. I was very relieved to discover a maternal uncle and a paternal aunt as my top matches when I got my DNA results from Ancestry. Some people get DNA results and then they wonder, is this normal? Obviously no one wants to open up a big can of worms in their family where it isn’t warranted, so here I’d like to discuss what is normal and what is not normal. I still highly recommend taking your questions to a genetic genealogy group on Facebook before opening the can of worms, though. But at least you’ll save yourself the trip by ruling out these completely normal things.

When we talk about DNA results we are talking about two things: ethnicity estimates and DNA matches. I have a lot of disdain for ethnicity estimates, so I’m going to tell you right off the bat that if your “is this normal” question pertains to your ethnicity estimate, my answer will be yes. Someone once said that these estimates are good dinner party talk but they’re really not good for much else, and I agree. The first thing to know about ethnicity estimates is that they are just estimates. If you look at the earth from space you are not going to see the artificial lines we have created to mark boundaries between countries. Further, people are always moving around. I am told my Scottish ancestors mingled with Vikings, so when I get “Scandinavia” as an estimate despite having no Scandinavian ancestry, I understand this to refer to that. Small percentages are especially suspect. The second thing to know is that each company will have different estimates because of the different reference populations they use when calculating said estimates. Finally, as each company refines their algorithms and adds more data to their calculations, these are likely to change. So if you’re wondering, “Why do I have Z ethnicity when my family tree doesn’t show any Z ethnicity?”, it’s definitely not because you have an NPE in your tree or even because you had an ancestor with Z ethnicity that you haven’t discovered yet.

The second part of DNA results is your matches. If you have a known match, i.e., you know how you are related to this person, and are wondering if the amount of DNA you share with someone is too low or too high, the first thing to do is name your relationship. You can see my post What is a Cousin for more information or just use this handy cousin calculator. Once you know your relationship to someone, you need to use this DNApainter tool to see what the range is. Even between parent and child there is a small range of acceptable shared cMs. If the amount of DNA you share with this person falls within that range, then there’s no problem, everything is normal. Here’s an assignment: go see what the acceptable range for a third cousin (3C) is. Do you see that it is completely normal for third cousins NOT to share any DNA at all? In fact, you will see that once you get past second cousins (2C), it is possible to be related to someone and not share any DNA. No need to get into a tizzy because you don’t share DNA with a second cousin once removed (2C1R). It doesn’t mean your parents/grandparents/great-grandparents aren’t your parents/grandparents/great-grandparents. It just means you and this particular cousin didn’t inherit any of the same DNA that would tell the company that you are cousins.

If you have a match that you don’t know, and the amount of shared cM is really high, you might also be concerned that there is something out of the ordinary. But keep in mind that just as it is possible to share no DNA with a second cousin once removed, it is possible to share up to 316 cM. You may think, I know all my close cousins, is this cousin the result of an NPE of a relative of mine? You may know all your close cousins, but do you know all your second cousins and their descendants? Don’t assume an NPE until you can verify that this person doesn’t fit into your family tree as they should. When you hear hooves, think horses, not zebras.

What SHOULD you worry about? Not sharing any DNA with someone who is a second cousin or closer. Sharing less DNA than the range says you should. Sharing more DNA than the range says you should. Sharing over 500 cM with a complete stranger when you are certain of everyone in your tree up to the second-cousin level. If you verified your relationship through the cousin calculator (and let me tell you, watching a group of people try to figure out how someone is related to someone else tells me most people don’t understand cousins) and then checked the DNApainter tool and something seems amiss, then by all means bring it to the attention of a group that deals with these kinds of things on Facebook. Otherwise you’re going to end up with this post in response to your query.
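If it helps to see the checklist as rules, here is a minimal sketch. The function `red_flags` and its parameters are hypothetical. You still have to look up the expected range for your relationship yourself (for example, in the DNApainter tool). The only numbers built in are the ones in this post: the 0 to 316 cM range for a second cousin once removed and the 500 cM stranger threshold.

```python
from typing import List, Optional, Tuple

def red_flags(shared_cm: float,
              expected_range: Optional[Tuple[float, float]] = None,
              second_cousin_or_closer: bool = False,
              known_relative: bool = True,
              tree_certain_to_2c: bool = False) -> List[str]:
    """Return the reasons (if any) a match deserves a closer look.
    expected_range is the (low, high) shared cM range you looked up
    for the relationship; it is not built in."""
    flags = []
    if known_relative and second_cousin_or_closer and shared_cm == 0:
        flags.append("no shared DNA with a second cousin or closer")
    if known_relative and expected_range is not None:
        low, high = expected_range
        if shared_cm < low:
            flags.append("less DNA than the range allows")
        elif shared_cm > high:
            flags.append("more DNA than the range allows")
    if not known_relative and shared_cm > 500 and tree_certain_to_2c:
        flags.append("over 500 cM with a stranger despite a solid tree")
    return flags

# A 2C1R sharing nothing is within the 0-316 cM range quoted above: no flags.
print(red_flags(0, expected_range=(0, 316)))                        # []
print(red_flags(0, second_cousin_or_closer=True))
print(red_flags(650, known_relative=False, tree_certain_to_2c=True))
```

An empty list means everything is normal: horses, not zebras.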