The Science Behind Testing Your Biological Age

Living longer, disease-free, and with a higher quality of life. These are the ultimate, yet daunting, goals for health enthusiasts across the world, many of whom are now harnessing the power of knowing, understanding, and manipulating their biological age to reach and measure their wellness success.

Thanks to revolutionary advancements in scientific research on how our bodies use or silence the instructions of life housed in our DNA, as well as how those instructions are impacted by external factors, industry pioneers like TruDiagnostic are now offering people a never-seen-before look into their body’s aging process.

To put it simply, biological aging is the process of cells gradually losing their function. As we get older, losses of grip strength, balance, and memory are noticeable expressions of that cellular decline. Chronological aging, on the other hand, is simply turning another year older on your birthday. For better or worse, for most people the two are not the same.

This difference between calendar age and cellular age, as leading researchers have painstakingly discovered over the last 100+ years, has been definitively linked to overall health and risk of disease. That is why people looking to optimize and maintain their health are recalibrating where their wellness journey begins: not at the gym or in the kitchen, but at TruDiagnostic’s biological age laboratory.

Not all biological age tests are created equal.

Biological age testing has become much more prevalent over the past few years. This is due to the longevity movement, which has helped teach physicians and patients that biological aging is the number one risk factor for almost every chronic disease and is closely related to quality of life as we age.

However, as many testing companies rush to fill this void, it is important to know how to evaluate these tests to know which is best to implement and what it can tell us. Below, we have explained the concepts which are most relevant for choosing an epigenetic age test. We have also explained how we met these criteria.

1. Are the algorithms the company is using published?

We only use published algorithms. This is important because published algorithms have been validated as measuring what they claim to measure. Without published, peer-reviewed data, we often say it is like going to a fortune teller: you can choose to believe the results, but it is better to have evidence that the test measures what it says it does.

In addition, published data helps you learn more. As more research is done on these algorithms, we find more connections to health, and interventional trials show which therapies, treatments, and lifestyle modifications are most helpful for changing these algorithms and improving our health.

At TruDiagnostic, our algorithms are published. We also have publications that compare our algorithms with others described in the literature, so you can see how ours measure up.

Additionally, we try to break down how to improve these scores using published data. You can see a link to our interventional trials sheet here. It lists all of the trials that have been shown to improve these markers.

So, how do I decide which one is the best?

Generally, with biological age testing, there are a few metrics you should look for when evaluating an algorithm. The first is the ICC value, which is how the precision of a test is measured. ICC stands for intraclass correlation, a statistic that describes how closely measurements within the same group agree with one another.

For instance, if we take blood from an individual and split it into five samples which are each measured separately, we would expect that all of the results are the exact same.  Unfortunately, with lab testing this isn’t always the case.  Sometimes these results can vary despite coming from the exact same sample.  

For precise tests, this variation is small, which usually corresponds to an ICC value of 0.9 or greater. An ICC value of 1 would be perfect agreement.
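To make the ICC concrete, here is a minimal Python sketch of one common variant, the one-way random-effects ICC(1,1), computed from replicate measurements of the same samples. The article does not say which ICC variant was used for any particular clock, so treat this as an illustration; the function name and the replicate values are hypothetical.

```python
def icc_oneway(samples):
    """One-way random-effects ICC(1,1) from replicate measurements.

    samples: list of lists; each inner list holds k replicate results for
    one sample (for example, one blood draw split into k aliquots).
    """
    n = len(samples)
    k = len(samples[0])
    if n < 2 or k < 2 or any(len(s) != k for s in samples):
        raise ValueError("need >= 2 samples, each with the same k >= 2 replicates")
    grand_mean = sum(sum(s) for s in samples) / (n * k)
    means = [sum(s) / k for s in samples]
    # Between-sample mean square: how much the samples differ from each other
    ms_between = k * sum((m - grand_mean) ** 2 for m in means) / (n - 1)
    # Within-sample mean square: how much replicates of the same sample disagree
    ms_within = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical biological-age results (years): 4 people, 3 aliquots each
replicates = [
    [40.1, 40.3, 39.9],
    [52.0, 51.6, 52.3],
    [61.2, 61.0, 61.5],
    [45.5, 45.9, 45.4],
]
print(round(icc_oneway(replicates), 3))
```

Because the aliquots in this made-up example disagree by only a few tenths of a year while the people differ by many years, the result lands close to 1; if aliquots disagreed by several years, it would fall well below 0.9.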

Thus, you should certainly look at an algorithm’s ICC value, especially because precision has been a problem for earlier clocks, as we will explain later.

The other values to consider when evaluating this testing are an algorithm’s hazard ratios for disease. A hazard ratio compares the hazard rate (the rate at which an event such as death or disease occurs) in one group with the rate in a reference group, or, for a continuous measure like a clock, the change in hazard per unit increase. This is how we tell how strongly accelerated aging on these algorithms is connected to disease.

For instance, if someone is 5 years older biologically, how do we know their risk of death? How would we know their increased or decreased risk of cardiovascular disease or dementia? We answer these questions with a hazard ratio. A hazard ratio of 1 means no difference in risk; the further the number is above 1, the more likely the event is to occur, and values below 1 mean lower risk.

You can see the table below as an example. For every standard deviation increase in DunedinPACE, you would see a 64% increase (hazard ratio of 1.64) in the risk of death, whereas 1 standard deviation of the 2013 Horvath algorithm represents only a 2% increase. As you can see, DunedinPACE outperforms every algorithm except GrimAge, and that exception is expected: this validation dataset was the same data used to train GrimAge, so GrimAge’s result would be artificially elevated.
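The arithmetic behind those numbers can be sketched in a few lines of Python. The 1.64 and 1.02 hazard ratios come from the text; rescaling them to other differences assumes a Cox proportional hazards model with the clock entered linearly, which is the usual setup for per-standard-deviation hazard ratios but is an assumption here. The function names are hypothetical.

```python
import math


def percent_change_in_hazard(hazard_ratio):
    """Percent change in hazard implied by a hazard ratio (1.64 -> about +64%)."""
    return (hazard_ratio - 1.0) * 100.0


def hazard_ratio_for_sd_difference(hr_per_sd, n_sd):
    """Rescale a per-standard-deviation hazard ratio to a difference of n_sd SDs.

    Assumes a Cox proportional hazards model in which the clock enters
    linearly, so log(HR) scales linearly with the size of the difference.
    """
    return math.exp(n_sd * math.log(hr_per_sd))


# Per-SD hazard ratios for death quoted in the text
for name, hr in [("DunedinPACE", 1.64), ("Horvath 2013", 1.02)]:
    print(name,
          f"+{percent_change_in_hazard(hr):.0f}% per 1 SD,",
          f"HR {hazard_ratio_for_sd_difference(hr, 2):.2f} for 2 SD")
```

Under that assumption, hazard ratios multiply rather than add: two standard deviations of DunedinPACE correspond to 1.64 × 1.64, not to a 128% increase.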

How does picking the right test help me improve my quality of life?

Living longer and living disease-free are major focuses of biological age tests. However, we also know that quality of life matters. So ask yourself how the algorithm you are using is connected to quality of life. You can see below some images that compare published algorithms. Once again, DunedinPACE shows large associations with multiple measures of quality of life!

2. How precise is the algorithm? Are they using Principal Component Analysis?

As we mentioned earlier, the ICC value is a measure of precision and one you should certainly look for. This matters because some of the early clocks trained on chronological age have big problems with precision, as you can see in the graph below.

Some of the original clocks had more than 3.9 years of error. So if your result changes by less than that between tests, how can you be sure it reflects real biological change and not testing error?

Some algorithms, like DunedinPACE, have ICC values above 0.97 without any type of correction. This is great! However, many of the other clocks can still be improved with principal component (PC) correction.

These correction methods were created by Morgan Levine’s lab while she was at Yale, and they increased the precision of the older algorithms significantly. We were the first to add principal component correction, following joint publications with Yale and Cornell here.
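As we understand the published PC clock approach, the idea is to summarize many correlated CpGs into principal components and train the clock on those components instead of on individual CpGs, so random noise at any one CpG carries less weight. The minimal sketch below shows only the core step for a single component, found by power iteration on a toy matrix; the real method uses many components, with loadings learned once from training data, and a penalized regression on top. All function names and values are hypothetical.

```python
import math


def center_columns(rows):
    """Subtract each CpG's (column's) mean so PCA works on deviations."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows], means


def first_principal_component(centered, iterations=100):
    """Leading eigenvector of X^T X, found by power iteration."""
    n_cols = len(centered[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols  # arbitrary unit starting vector
    for _ in range(iterations):
        scores = [sum(x * w for x, w in zip(row, v)) for row in centered]  # X v
        w = [sum(s * row[j] for s, row in zip(scores, centered))           # X^T (X v)
             for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def pc_scores(rows, means, component):
    """Project each sample onto the component: one summary score per sample."""
    return [sum((x - m) * c for x, m, c in zip(row, means, component)) for row in rows]


# Hypothetical beta values: 4 samples (rows) x 3 correlated CpGs (columns)
betas = [
    [0.10, 0.20, 0.15],
    [0.20, 0.41, 0.30],
    [0.30, 0.60, 0.46],
    [0.40, 0.79, 0.61],
]
centered, means = center_columns(betas)
loading = first_principal_component(centered)
print(pc_scores(betas, means, loading))
```

The resulting score draws on all three CpGs at once, which is the intuition for why a clock built on components is less sensitive to measurement noise at any single site.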

This is an incredibly important measurement because it answers questions like “how frequently should I test?” If a test has low precision, you can’t test frequently, because the testing error might be larger than the change in your biological age.
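One common way to turn precision into a retesting interval is the minimal detectable change (MDC): the smallest difference between two tests that is unlikely to be measurement noise. The sketch below uses the standard formulas SEM = SD × √(1 − ICC) and MDC95 = 1.96 × √2 × SEM. The 5-year spread between people and the rate of change are hypothetical, chosen only to show how strongly ICC drives the answer; this is not how any particular clock’s retest guidance was derived.

```python
import math


def minimal_detectable_change(sd_between_people, icc, z=1.96):
    """95% minimal detectable change for a test with the given ICC.

    SEM = SD * sqrt(1 - ICC); MDC = z * sqrt(2) * SEM. The sqrt(2) term
    reflects that both the first and the second test carry error.
    """
    sem = sd_between_people * math.sqrt(1.0 - icc)
    return z * math.sqrt(2.0) * sem


def months_until_detectable(sd_between_people, icc, true_change_per_year):
    """Months before a real change at the given yearly rate exceeds the MDC."""
    mdc = minimal_detectable_change(sd_between_people, icc)
    return 12.0 * mdc / true_change_per_year


# Hypothetical clock: results spread by SD = 5 years across people, and an
# intervention expected to shift the result by 1 year per calendar year
for icc in (0.80, 0.90, 0.99):
    print(icc, round(minimal_detectable_change(5.0, icc), 2),
          round(months_until_detectable(5.0, icc, 1.0), 1))
```

Because the error term shrinks with the square root of (1 − ICC), moving from an ICC of 0.9 to 0.99 cuts the detectable change, and the sensible retesting interval, by roughly a factor of three.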

3. How was the algorithm trained? First generation algorithms (trained to predict chronological age) are not as helpful as those trained to predict biological markers (2nd and 3rd generation algorithms). 

To understand why first generation clocks are not the optimal type of clock, we have to explore the idea of “phenotypic variation”. Why do some people who are chronologically 50 years old look 30, and vice versa?

This difference isn’t captured in their chronological age, it is captured in the biochemistry of their bodies.

The first clocks, created by Dr. Horvath (Horvath, 2013) and Dr. Hannum (Hannum et al., 2013), were a huge breakthrough in aging research and science. At the time, there were many reasons this was exciting, but mainly, the predictive capability of the clocks was amazing. We all know that age is the biggest risk factor for almost every chronic disease and for death.

It was immediately clear that these clocks were much better than chronological age at telling us how a patient was aging. The first clocks were trained to predict the chronological age of the patient it tested. This is the definition of a first generation clock (Bergsma and Rogaeva 2020).
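For readers curious about what “trained to predict chronological age” means mechanically: first generation clocks such as Horvath’s and Hannum’s are penalized linear regressions (elastic net), so a prediction is an intercept plus a weighted sum of methylation values at selected CpGs. The sketch below shows only that prediction step, with made-up CpG IDs and weights; Horvath’s clock additionally applies a transformation to age that is omitted here.

```python
def predict_clock_age(betas, weights, intercept):
    """Linear epigenetic clock: intercept + sum(weight * beta) over clock CpGs.

    betas:   methylation beta values (0 to 1) for one sample, keyed by CpG ID.
    weights: the clock's coefficient for each CpG it uses.
    A CpG missing from the sample raises KeyError instead of being skipped.
    """
    return intercept + sum(w * betas[cpg] for cpg, w in weights.items())


def age_difference(clock_age, chronological_age):
    """Simplest 'acceleration' measure: clock age minus calendar age.

    Many studies instead use the residual from regressing clock age on
    chronological age across a population.
    """
    return clock_age - chronological_age


# Hypothetical two-CpG clock and one hypothetical 50-year-old sample
weights = {"cpg_A": 30.0, "cpg_B": -20.0}
sample = {"cpg_A": 0.6, "cpg_B": 0.3, "cpg_C": 0.9}  # cpg_C is not used by this clock
clock_age = predict_clock_age(sample, weights, intercept=40.0)
print(clock_age, age_difference(clock_age, 50))
```

Because the training target is the calendar age itself, the weights reward CpGs that track the birthday count, which is exactly the limitation the next paragraphs describe.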

The problem with first generation clocks is that we don’t necessarily care about the chronological age of a patient. Rather, we really care about the biochemistry of aging. So, how can we detect that better? 

The answer is to train algorithms on these DNA methylation patterns to predict better measurements of aging than chronological age. This is how the second generation clocks were created. The three most popular second generation clocks are PhenoAge (Levine et al., 2018), which was trained on 10 blood measurements; GrimAge (Lu et al., 2019), which was trained to predict 12 protein measurements and time until death; and the Telomere Length Clock (Lu et al., 2019), which was trained to predict telomere length.

These second generation clocks were much better. How do we know? Accelerated aging scores were even more predictive of negative health outcomes, and decelerated aging scores were even more predictive of positive health outcomes (Bergsma and Rogaeva 2020).

Beyond this, the second generation clocks were also more strongly associated with diseases (Bergsma and Rogaeva 2020). Even so, there was still room for improvement, because these second generation clocks were built from samples of many different people, each measured at a different point in life. To get the best aging signal, it would be better to follow the same individuals at several time points across their own lives.

That’s exactly what DunedinPACE did. Unlike previous clocks, the Dunedin Pace of Aging (DunedinPACE) was not trained on chronological age. It is the first clock trained entirely on phenotypes of aging measured in the same people across their lifespan, all the way from age 3 to age 51. This helps because we aren’t picking up “noise” in our measurements. Because the Dunedin participants were all born around the same time, following them keeps environmental exposures that differ between generations from being built into the clock. For example, 50 years ago many people were exposed to more lead through leaded gasoline, fewer antibiotics, and less microplastic. If we don’t control for when people lived, our algorithm might pick up markers associated with these exposures rather than measuring aging alone.

Generally, the more biologically informed an algorithm is, the better it is at capturing the signal of biological aging and reducing other confounding factors. We are the only commercial company offering 2nd or 3rd generation algorithms. 

4. Does the algorithm respond to interventions which we know to beneficially affect the biology of aging?

These clocks are the best ways to predict age related outcomes. However, we still don’t know exactly why we see these patterns in our DNA.

To make sure this is a reliable and useful measurement, we also need to confirm that these clocks respond to things we already know benefit biology. A 2020 article by Jamie Justice, PhD, of Wake Forest outlines the following criteria for an aging biomarker.

As you can see, at the time of publication in 2020, none of the clocks had fulfilled the last criterion. However, this has changed: DunedinPACE has now satisfied all of the criteria.

One of the cohorts used to demonstrate this consisted of middle-aged, non-obese adults enrolled in the CALERIE trial. This trial tested the effects of caloric restriction, an intervention that has improved biological aging in a variety of studies, over a period of two years.

As you can see in the image below, just as expected, DunedinPACE showed a decrease in the rate of aging in the group that restricted calories by approximately 11% over 2 years.
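DunedinPACE is reported as years of biological aging per calendar year, so a value of 1.0 means one biological year per chronological year. The small sketch below shows how to read a change in pace, assuming the pace stays constant over the period; the pace values are hypothetical and are not the CALERIE results.

```python
def biological_years_accrued(pace, calendar_years):
    """Biological aging accumulated at a constant DunedinPACE-style pace."""
    return pace * calendar_years


def percent_slower(pace_before, pace_after):
    """Relative slowing of the pace of aging, in percent."""
    return 100.0 * (pace_before - pace_after) / pace_before


# Hypothetical paces over a 2-year period, chosen only for illustration
baseline, follow_up = 1.00, 0.97
print(biological_years_accrued(baseline, 2), biological_years_accrued(follow_up, 2))
print(round(percent_slower(baseline, follow_up), 1))
```

In this made-up example, a 3% slower pace means about 1.94 biological years accrued over 2 calendar years instead of 2.0.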

However, the importance of this goes beyond validating DunedinPACE. This data also shows that the first generation algorithms actually went up with caloric restriction. This shouldn’t happen, because all other types of phenotypic aging markers in the study improved. It shows that first generation algorithms don’t always respond correctly to interventions!

You can see more about our testing in the CALERIE trial here. We have also included an editorial below on why DunedinPACE is the best measurement.

5. What tissue is being used for testing? Is the tissue the same as used in the algorithm design?

While epigenetics is extremely exciting as a biomarker, it can be difficult to work with because every cell type has a different epigenetic signature. For instance, if we measured brain tissue with biological age algorithms, we would get lower ages than if we tested blood, and if we tested breast tissue, we would get higher ages than blood. Because the methylation signature differs across tissues, testing a different tissue can shift algorithm results in ways that are not accurate.

That is why at TruDiagnostic, we only use blood. Blood has established controls such as cell deconvolution methods, which tell us which cell types are present and in what proportions, so that cell composition can be controlled for.
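The general idea behind reference-based deconvolution is to treat a blood sample’s methylation as a mix of known cell-type profiles and solve for the mixing proportions. The sketch below is a simplified stand-in, not TruDiagnostic’s 12-cell method: it fits ordinary least squares and then clips and renormalizes, whereas published approaches such as Houseman’s use a properly constrained fit. The three cell types and every reference value are made up.

```python
def solve_linear(a, b):
    """Solve a small square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def estimate_cell_fractions(bulk, reference):
    """Least-squares fit of bulk methylation as a mix of reference cell profiles.

    bulk:      beta value per CpG for the blood sample.
    reference: one row per CpG, one column per cell type (reference betas).
    """
    n_types = len(reference[0])
    # Normal equations: (R^T R) p = R^T bulk
    rtr = [[sum(row[i] * row[j] for row in reference) for j in range(n_types)]
           for i in range(n_types)]
    rtb = [sum(row[i] * y for row, y in zip(reference, bulk)) for i in range(n_types)]
    p = solve_linear(rtr, rtb)
    # Crude stand-in for a constrained fit: no negative fractions, sum to 1
    p = [max(0.0, x) for x in p]
    total = sum(p)
    return [x / total for x in p]


# Hypothetical reference betas: rows are CpGs, columns are cell types A, B, C
reference = [
    [0.9, 0.1, 0.5],
    [0.2, 0.8, 0.3],
    [0.1, 0.2, 0.9],
    [0.7, 0.6, 0.1],
]
true_mix = [0.6, 0.3, 0.1]
bulk = [sum(r * p for r, p in zip(row, true_mix)) for row in reference]
print(estimate_cell_fractions(bulk, reference))
```

With a noise-free synthetic mixture like this one, the fit recovers the proportions used to build it; real samples add measurement noise and need many more CpGs and cell types.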

This is also why you want to ask how an algorithm was trained. If an algorithm was trained in blood, you wouldn’t want to measure it in saliva, because saliva can include epithelial cells that are not found in blood.

We use advanced 12-cell immune deconvolution methods at TruDiagnostic. We are also the only company with saliva deconvolution methods. You can see the white paper on this here.

Additionally, we use blood because it is what most researchers work with. As new algorithms are created at universities, we can run them on our data, because all of our data is also generated in blood.

Also, we know that some behaviors, like smoking, can have a drastic impact on DNA methylation in saliva, and that blood does not have the same issues.

6. Does the algorithm have controls for the different cell types which make up a sample?

We control for immune cell subsets with immune deconvolution methods. This is important for precision, because variation in immune cell composition needs to be taken into account. It has recently become a point of emphasis across the scientific community, as discussed by Eric Verdin of the Buck Institute at the following link.

We are both expecting to publish papers on this in January, which will show that controlling for immune cell subsets with high-resolution methods improves the precision of these algorithms and strengthens their relationship to disease prediction.

Controlling for cell types is incredibly important to make sure you are seeing real aging signals and not just a shift in the mix of cell types in your sample.
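One simple way to “control for” cell composition is to regress the clock result on estimated cell fractions across a group of samples and keep the residual, the part of the result the cell mix does not explain. The sketch below does this for a single hypothetical covariate (say, a neutrophil fraction) with ordinary least squares. Real pipelines adjust for many cell types at once and may use different models; this is not a description of TruDiagnostic’s method, and all values are made up.

```python
def residualize(scores, covariate):
    """Remove the linear effect of one covariate from a list of scores.

    Fits scores ~ a + b * covariate by ordinary least squares and returns
    each score minus its fitted value.
    """
    n = len(scores)
    mean_x = sum(covariate) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in covariate)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(covariate, scores))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(covariate, scores)]


# Hypothetical age-acceleration values (years) and neutrophil fractions
acceleration = [2.1, -0.5, 3.4, 0.8, -1.2]
neutrophils = [0.62, 0.50, 0.70, 0.55, 0.48]
adjusted = residualize(acceleration, neutrophils)
print([round(r, 2) for r in adjusted])
```

After adjustment, the remaining values are uncorrelated with the neutrophil fraction, so any signal left over cannot simply be a proxy for that cell type.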

7. How many CpGs are being tested (also, will my data provide more insight in the future)?

When we test methylation, we look at sites called CpGs, which can be methylated. There are over 28 million CpG sites in the human genome. This is great because it gives us the ability to create really precise algorithms. However, most tests don’t measure all of them because of cost.

At TruDiagnostic, we test approximately 1 million CpGs. We also use the same testing infrastructure as almost all clinical researchers, because it provides the largest scale at the lowest cost. The more data we collect, the more we are able to report back to you. It also allows us to collaborate with researchers to find more insights into the aging process.

If we could only measure 100,000 to 200,000 CpGs, it would significantly limit the insights we could provide from this data. For instance, when more precise algorithms were created, we implemented them in our population immediately to provide the most accurate results. Most others could not do this, because those algorithms required approximately 70,000 additional CpGs.
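Whether a platform can run a given algorithm comes down to whether it measures every CpG the algorithm needs. A minimal coverage check is sketched below; the CpG IDs are made up, and in practice the required list would come from the algorithm’s published coefficients.

```python
def cpg_coverage(required_cpgs, measured_cpgs):
    """Fraction of an algorithm's CpGs that a platform measures, plus the gaps."""
    required = set(required_cpgs)
    if not required:
        raise ValueError("algorithm lists no CpGs")
    missing = sorted(required - set(measured_cpgs))
    return (len(required) - len(missing)) / len(required), missing


# Hypothetical CpG IDs for an algorithm and for a smaller test panel
algorithm_cpgs = ["cpg_0001", "cpg_0002", "cpg_0003", "cpg_0004"]
panel_cpgs = {"cpg_0001", "cpg_0003", "cpg_0004", "cpg_0999"}
print(cpg_coverage(algorithm_cpgs, panel_cpgs))  # (0.75, ['cpg_0002'])
```

Anything below 1.0 means the algorithm cannot be computed exactly as published on that platform.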

8. What analysis/reports are offered with your testing? And again, have they been published?

As we all know, health goes beyond aging. The methylation patterns in your DNA can tell us much more than just how you are aging, for instance about the health of your heart, your brain, and your lungs. So, ask whether a testing company is also reading these patterns.

With TruDiagnostic, we report on several other aspects of your health. We have reports on your likelihood of losing weight with caloric restriction and on methylation patterns related to alcohol and smoking, and, through a physician, we can also report on disease risks such as diabetes and obesity.

Beyond that, our aging-specific algorithms are also more robust than those of other platforms. We don’t just report biological age; we also report immune age, rate of aging, and telomere length. For any test, make sure you are getting the most insight out of the data available!

9. Where is your sample being tested? What are the quality assurance and quality control protocols?  What preprocessing method is being used?

One other question to ask epigenetic testing providers is how they run their tests. We own a CLIA lab and handle sample processing entirely ourselves, without any third party. In addition, over our 2.5 years in business, we have built one of the largest epigenetic databases in the world.

We also have our own in-house bioinformatics team, which uses validated preprocessing and batch control methods to track individual longitudinal change and make our algorithms as accurate and precise as possible.

Ready to discover your biological age with TruDiagnostic? Click here to start.

Already tested? Register now to join Rejuvenation Olympics – a free age reversal competition!
