# The DNA methylation profile of beef cattle is influenced by additive genetics and age

### Population resource and data collection

All animal procedures implemented in this study were approved by the University of Nebraska-Lincoln Animal Care and Use Committee. All methods are performed in accordance with relevant guidelines and regulations. Methods are reported in the manuscript according to the recommendations in the ARRIVE manual.

The blood obtained from 136 cows was centrifuged at 2500 xg for 10 minutes at room temperature. The buffy coat was collected and stored at -80 ° C. All samples were collected from October 2 to November 27, 2018 at the Eastern Nebraska Research and Extension Center at the University of Nebraska-Lincoln. Cows were selected to provide variation in age and ranged from 217 to 3,192 days (0.6 to 8.7 years) old at the time of sampling. The animals came from a mixed population consisting of purebred Angus and composites of different proportions of Angus, Simmental and Red Angus.

All animals were genotyped with medium density Illumina BovineSNP50 (~ 50 K SNPs) BeadChip (Illumina, San Diego, CA, USA). Genotype filtering included removal of non-autosomal SNPs, SNPs with less allele frequency <0.02, and SNPs with Hardy-Weinberg equilibrium P-value> 10−5.

### DNA methylation

DNA was extracted from buffy coats using the DNeasy Blood and Tissue Kit (Qiagen, Cat. No. 69506) and subsequently bisulfite-converted using the EZ DNA Methylation Kit (ZymoResearch, Irvine, CA, USA). DNAm data were obtained from bisulfite-treated samples using the mammalian array (HorvathMammalMethylChip40)31. The DNAm level for each site was calculated as methylation β-value (β-value = intensity of the methylated allele / intensity of the non-methylated allele + intensity of the methylated allele + 100). The addition of 100 was used to stabilize β values ​​when both the intensities of the methylated and non-methylated alleles were small32. SeSAMe pipeline33 was used to generate normalized β values ​​and for quality control. The β value has severe heteroskedasticity outside the intermediate methylation range; thus, a logit transformation of the β-values ​​(M-values) was used to approximate homoskedasticity. M-values ​​of 0 correspond to 50% of the methylation, and positive and negative values ​​correspond to greater and less than 50% methylation level, respectively. M values ​​were used to quantify DNAm level by region (i.e., promoter, 5 ‘and 3’ UTR, exonic, intronic, and intergenic) and location related to CpG islands (i.e., inside or outside). DNA methylation load was calculated as the sum of all DNAm levels (M-value).

DNAm status for each site was determined by the distribution of the methylation β values. For example, β values ​​below, within and above 2 standard deviations were classified as unmethylated (0), intermediate methylated (1) and methylated (2), respectively. DNAm status was used for prediction purposes.

### Statistical analyzes

#### Genetic parameters for DNAm level and DNAm load

Genetic parameters (ie, additive genetic and residual variances) for DNAm level (M values) and DNAm load were estimated using the following animal model fitted in a Bayesian genomic best linear unbiased prediction (GBLUP) framework.

$${\ varvec {y}} = {\ varvec {X}} {\ varvec {b}} + {\ varvec {Z}} {\ varvec {u}} + {\ varvec {e}}$$

where \ ({\ varvec {y}} \) corresponds to the vector of phenotypes (DNAm level or DNAm load); \ ({\ varvec {X}} \) corresponds to the design matrix that connects the fixed effects with the phenotypes; \ ({\ varvec {b}} \) corresponds to the vector of fixed effects, including the point of intersection, the linear and quadratic (DNAm only) covariates of age and covariates for the proportion of each breed; \ ({\ varvec {Z}} \) corresponds to the incidence matrix linking the random animal effect with the phenotypes; \ ({\ varvec {u}} \) corresponds to the vector of random animal effects, where \ ({\ varvec {u}} \) ~ N (0, G \ ({\ sigma} _ {u} ^ {2} \)), where G corresponds to the genomic ratio matrix constructed according to VanRaden’s first method34 and \ ({\ sigma} _ {u} ^ {2} \) corresponds to the additive genetic variance; and \ ({\ varvec {e}} \) corresponds to the vector of random residual effects associated with the phenotype where \ ({\ varvec {e}} \) ~ N (0, I \ ({\ sigma} _ {e} ^ {2} \)), where you correspond to the identity matrix and \ ({\ sigma} _ {e} ^ {2} \) corresponds to the remaining variance. The h2 was obtained as the ratio of additive genetic variance divided by phenotypic variance (additive genetic variance + residual variance).

Gibbs sampling was used to test a posterior parameter distribution with a chain length of 20,000 iterations, burning of 2,000 samples and a thinning interval of 100. Analyzes were performed using EVXR package35 and R-software.

#### Age effect on DNAm load and age prediction

The effect of age on the DNAm load was estimated by adjusting an exponential regression of the DNAm load on the age (years) of the animals. DNAm status was included as a variable to predict the age of animals using five Bayesian regression models: BRR29Bayes A29BayesB29BayesCπ36and Bayesian LASSO (BLASSO)37as follows:

$${\ varvec {y}} = {\ varvec {X}} {\ varvec {b}} + \ sum_ {i = 1} ^ {k} {{\ varvec {m}}} _ {{\ varvec {i}}} {\ boldsymbol {\ alpha}} _ {{\ varvec {i}}} {{\ varvec {\ updelta}}} _ {{\ varvec {i}}} + {\ varvec {e} }$$

where \ ({\ varvec {X}} \) and \ ({\ varvec {e}} \) has been previously described; \ ({\ varvec {y}} \) corresponds to the vector of age this year; \ ({\ varvec {b}} \) corresponds to the vector of fixed effects, including the point of intersection and the linear covariates of each race; \ ({{\ varvec {m}}} _ {{\ varvec {i}}} \) is the vector for the DNAm status of the site I (coded as 0, 1 and 2); \ ({\ boldsymbol {\ alpha}} _ {{\ varvec {i}}} \) i is the effect of DNAm status of the site I for each of k websites; \ ({{\ varvec {\ updelta}}} _ {{\ varvec {i}}} \) is a DNAm status site indicator I was included (\ ({{\ varvec {\ updelta}}} _ {{\ varvec {i}}} \) = 1) or excluded (\ ({{\ varvec {\ updelta}}} _ {{\ varvec {i}}} \) = 0) from the model for a given iteration of gibbs sampling (BayesRR and BayesA, \ ({{\ varvec {\ updelta}}} _ {{\ varvec {i}}} \) = 1). In BayesRR and BayesCπ, the effect of DNAm status is assumed to follow a normal distribution. In BayesA and BayesB, the effect of DNAm status is assumed to follow a tdistribution with site-specific variances. In Bayes Lasso, the effect of DNAm status is assumed to follow a double-exponential distribution.

Gibbs sampling was used to test a posterior parameter distribution with a chain length of 20,000 iterations, burning of 2,000 samples and a thinning interval of 100. Bootstrapping (n= 400) were used to evaluate the performance of the models with 102 and 34 people, respectively, as training and validation populations. The performance of the models was evaluated based on the relationship between true and predicted age, mean square error, and slope of the regression of predicted age to true age. Analyzes were performed in BGLR package35 and R-software.