Thanks to talk shows, a lot of junk science, pseudoscience, and general misunderstanding of how science works are proliferating. Wine is good for you, wine will kill you; coffee is bad for you, coffee is good for you; fat is bad, except sugar is worse than fat, and on and on and on…
What’s someone to make of all this? In general, I think it leads people to believe that science is just another opinion and not based in truth. I’m not saying that science doesn’t have its problems (I highly recommend the book “Inferior” by Angela Saini), but the terrible way in which science is being communicated to the public is causing a lot of trouble. I’m not here to talk about how vaccine-preventable diseases are on the rise, though. I’m here to talk about the science of DNA tests, because this is me when someone tells me they’ve heard that DNA tests aren’t accurate:
On the other hand, this is me when people are asking why their DNA ethnicity estimates are wrong:
On the one hand, I want to defend the science behind them; on the other, I want to explain that science’s weaknesses. You see my problem?
So let’s take a look at how ethnicity estimates are done and why yours might be wrong, and hopefully by the end we’ll have a greater appreciation for the science involved.
The first thing we have to understand is that DNA is written with only four letters (A, C, G, and T). Imagine if our alphabet contained only four letters and someone had to write different books using just those letters. A lot of books would share words, but we’d still have no problem telling “Hop on Pop” from “Anna Karenina.” In reality, the way these four letters combine encodes the proteins that build every living thing: plants, animals, and people (yes, your food has DNA in it. Not just GMO food, all food). I think most people know that we are very closely related genetically to chimps, but did you know that roughly 60% of our genes have a recognizable counterpart in a banana? Your DNA is about 99.9% identical to that of everyone else in the world, so the science of figuring out how that remaining 0.1% differs from person to person is pretty mind-boggling when you think about it. It’s as if a random word had been changed on random pages of “Anna Karenina.”
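For the number-minded, here’s the 99.9% idea as plain arithmetic. This is a toy sketch using two made-up 1,000-letter strings that are already lined up letter for letter; real genomes are about 3 billion letters long, and real comparisons have to line sequences up first. Even so, 0.1% of 3 billion is still around 3 million letters, which is plenty to work with. The function name `percent_identity` is just an illustrative name, not something from a genetics library.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions where two equal-length, pre-aligned DNA strings match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must already be aligned to the same length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


# Two made-up 1,000-letter sequences that differ at exactly one position.
person_1 = "ACGT" * 250
person_2 = person_1[:500] + "T" + person_1[501:]  # person_1[500] is "A"
print(percent_identity(person_1, person_2))  # prints 99.9
```

One changed letter in a thousand is the “random word on a random page” from the analogy above.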
So how do we distinguish the 0.1% of DNA that differs between people and make categories based on those differences? The first thing that has to be done in order to produce an ethnicity estimate is to define an ethnicity. Here’s where it gets tricky. How do you define an ethnicity? For example, what does it mean when we say someone is English, or German, or Swedish? When you look at the Earth from space, you don’t see borders, not to mention that countries and their borders have been changing ever since they were created. My great-grandmother came from Galicia, which was part of the Austro-Hungarian Empire. Today the town she was from is part of Poland. Who’s to say her DNA is significantly different from that of a person in a town just over the border in Ukraine? Further, people have always been on the move. I may know where my ancestors were 100 years ago, or even 400 years ago, but parts of my DNA may trace back to places my ancestors lived even earlier than that.
Let’s say we are able to decide what defines a particular ethnicity. The next step is to create what’s called a reference population. Basically, the company assembles a group of people whose families are documented to have lived in the area for a really long time. The assumption is that these people’s DNA is similar in some way, and also distinguishable from the DNA of people whose families have lived in other areas for a very long time. Keep in mind that these reference groups differ from testing company to testing company. This is a big reason why each testing company’s ethnicity estimate will vary, even for the same person’s DNA.
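To see how two companies can disagree before they ever look at your DNA, here’s a toy sketch of one way a reference population might be boiled down into numbers: for each variable position, the share of copies in the panel that carry a particular letter (an allele frequency). All genotypes, region names and company names below are made up, and real panels are far larger in both people and positions; the only point is that a different set of volunteers for the same region gives different numbers.

```python
from typing import Dict, List


def allele_frequencies(genotypes: List[List[int]]) -> List[float]:
    """Per-position frequency of the counted letter across a panel.

    Each inner list is one (made-up) person: at each position, how many of
    their two copies (0, 1 or 2) carry the letter we are counting.
    """
    n_people = len(genotypes)
    n_positions = len(genotypes[0])
    return [
        sum(person[j] for person in genotypes) / (2 * n_people)
        for j in range(n_positions)
    ]


# Hypothetical panels for the same two regions, built from different volunteers.
company_x_panels: Dict[str, List[List[int]]] = {
    "Region A": [[2, 0, 1, 2], [1, 0, 2, 2], [2, 1, 1, 1]],
    "Region B": [[0, 2, 1, 0], [0, 1, 0, 0], [1, 2, 0, 1]],
}
company_y_panels: Dict[str, List[List[int]]] = {
    "Region A": [[2, 1, 1, 2], [2, 0, 2, 1]],
    "Region B": [[0, 2, 0, 0], [1, 1, 1, 0]],
}

for company, panels in [("Company X", company_x_panels), ("Company Y", company_y_panels)]:
    for region, genotypes in panels.items():
        freqs = allele_frequencies(genotypes)
        print(company, region, [round(f, 2) for f in freqs])
```

For Region A, Company X’s panel gives 0.83 at the first position while Company Y’s gives 1.0, so anything built on top of these numbers will differ too.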
Alright, now we have a whole lot of DNA from all over the world to compare your DNA to. But this is not the DNA of people who USED to live in the area your family is from; it’s the DNA of people who live there now. We’re assuming that, as long as we check that these people’s families have lived there long enough, their DNA is essentially the same as that of the people who lived there before.
If you’re like me, you will fall into more than one ethnicity category. A member of a Facebook group I’m part of described it as a recipe. Basically, you’re a mixture of several different kinds of soup. The testing company is going to try to separate your carrot soup from your noodle soup from your borscht and compare each one to its own recipe for that kind of soup (keeping in mind that its recipe is slightly different from another testing company’s). It’s about as complicated as unbaking a cake. Did this carrot come from the carrot soup or the noodle soup? We’re not really sure where this part of the soup came from, but it looks kind of like a vichyssoise, so we’ll call it that.
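The “unbaking” step can be sketched in code, too. Below is a toy version of one textbook approach: hold each reference panel’s allele frequencies fixed and run an expectation-maximization (EM) loop to find the mixture proportions that best explain one person’s genotypes, similar in spirit to the supervised mode of research tools such as ADMIXTURE. The panels, frequencies and genotypes are all made up, the function name `estimate_mixture` is hypothetical, and this is not presented as any testing company’s actual method; commercial estimates use far more data and extra modeling.

```python
from typing import List


def estimate_mixture(genotype: List[int], ref_freqs: List[List[float]],
                     n_iter: int = 200) -> List[float]:
    """Toy EM estimate of mixture proportions, holding reference frequencies fixed.

    genotype[j]     -- copies (0, 1 or 2) of the counted letter at position j
    ref_freqs[k][j] -- frequency of that letter at position j in panel k;
                       assumed strictly between 0 and 1 to avoid dividing by zero
    """
    n_pops = len(ref_freqs)
    n_pos = len(genotype)
    q = [1.0 / n_pops] * n_pops  # start from an even split
    for _ in range(n_iter):
        new_q = [0.0] * n_pops
        for j, g in enumerate(genotype):
            # Expected frequency at position j if you were a q-weighted blend of panels.
            p = sum(q[k] * ref_freqs[k][j] for k in range(n_pops))
            for k in range(n_pops):
                # Credit each panel with its share of your g counted copies
                # and of your (2 - g) other copies at this position.
                new_q[k] += g * q[k] * ref_freqs[k][j] / p
                new_q[k] += (2 - g) * q[k] * (1 - ref_freqs[k][j]) / (1 - p)
        # Each person has 2 copies per position, so divide by 2 * n_pos.
        q = [x / (2 * n_pos) for x in new_q]
    return q


# Two hypothetical "recipes" (panels) over six positions.
panels = [
    [0.9, 0.1, 0.8, 0.2, 0.9, 0.1],  # Panel A
    [0.1, 0.9, 0.2, 0.8, 0.1, 0.9],  # Panel B
]
mostly_a = [2, 0, 2, 0, 2, 0]
print([round(x, 3) for x in estimate_mixture(mostly_a, panels)])
```

With only six positions, every single genotype swings the answer, which is one toy illustration of why small percentages deserve some skepticism.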
Your ethnicity estimate can be inaccurate, yes, because it is based on several assumptions:
1) that a particular ethnicity can be defined
2) that a particular reference population can be created based on this ethnicity
3) that the reference population has DNA that is similar to that of other people from that ethnicity and different from that of people not of that ethnicity
4) that the DNA of people whose families have lived in a region for a long time is the same as the DNA of the people who used to live in that region
5) that your DNA can be decomposed and assigned to various ethnicities based on how similar it is to the reference populations’ DNA
Those are a lot of assumptions. I’m hoping you can see how impressive it is that there’s an entire science based on this, and that this science is getting better and more accurate. But…it’s not soup yet.
What’s the takeaway from this? For one, you should adjust your expectations about what ethnicity estimates can tell you and how accurately. For two, ignore small percentages unless you have a documented paper trail to back them up. And lastly, DNA is about a lot more than ethnicity estimates. If you did a DNA test just for that, then you’re missing out on all the fun the rest of us are having connecting with new cousins and trying to figure out which segment of DNA came from which ancestor.