Evaluation of imputed genomic data in discrete traits using random forest and Bayesian threshold methods

ABSTRACT. The objectives of this study were (1) to quantify imputation accuracy and assess the factors affecting it, and (2) to evaluate the accuracy of threshold BayesA (TBA), Bayesian threshold LASSO (BTL) and random forest (RF) algorithms for the analysis of discrete traits. Genomic data were simulated for 27 chromosomes to reflect variation in heritability (h2 = 0.30 and 0.10), number of QTL (81 and 810), number of SNPs (10 K and 50 K) and linkage disequilibrium (LD; low and high). To mimic real conditions, markers were randomly masked at a 90% missing rate in each scenario, and the hidden markers were then imputed using the FImpute software. With imputed genotypes, the accuracy of RF spanned a wider range (0.164 to 0.512) than that of TBA (0.283 to 0.469) and BTL (0.272 to 0.504). Compared with the original genotypes, using imputed genotypes decreased the accuracy of genomic prediction by about 0.0273 on average (range 0.024 to 0.036). Relative to the Bayesian threshold methods, the accuracy of RF improved more rapidly as marker density increased. Although BTL and TBA were more accurate at the different levels of LD and heritability, the increase in accuracy was greater for RF. Overall, the best method for genomic prediction depends on the genomic architecture of the population.
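The masking-and-imputation evaluation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's pipeline: genotypes are coded 0/1/2, 90% of the SNPs are masked independently for each animal, and imputation accuracy is measured as the concordance rate (share of masked calls recovered exactly). FImpute is a standalone program and is not reproduced here; `impute_mode` is a hypothetical placeholder that fills each missing call with the most frequent observed genotype at that SNP, so the accuracy it produces says nothing about FImpute's performance. All function names are hypothetical.

```python
import random
from collections import Counter


def mask_genotypes(genotypes, missing_rate, seed=0):
    """Hide round(missing_rate * n_snp) randomly chosen SNPs per animal.

    Assumption: masking is done independently for each animal.
    Returns the masked matrix (None = missing) and the (animal, snp) positions hidden.
    """
    rng = random.Random(seed)
    n_snp = len(genotypes[0])
    n_hide = round(missing_rate * n_snp)
    masked, hidden = [], []
    for i, row in enumerate(genotypes):
        new_row = list(row)
        for j in rng.sample(range(n_snp), n_hide):
            new_row[j] = None
            hidden.append((i, j))
        masked.append(new_row)
    return masked, hidden


def impute_mode(masked):
    """Hypothetical stand-in for FImpute: fill gaps with the per-SNP modal genotype."""
    n_snp = len(masked[0])
    modes = []
    for j in range(n_snp):
        observed = [row[j] for row in masked if row[j] is not None]
        # Fall back to genotype 0 if a SNP was masked in every animal.
        modes.append(Counter(observed).most_common(1)[0][0] if observed else 0)
    return [[modes[j] if g is None else g for j, g in enumerate(row)] for row in masked]


def concordance(true, imputed, hidden):
    """Proportion of masked genotype calls that were imputed correctly."""
    if not hidden:
        raise ValueError("no masked genotypes to evaluate")
    correct = sum(true[i][j] == imputed[i][j] for i, j in hidden)
    return correct / len(hidden)


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy data: 200 animals x 100 unlinked SNPs under Hardy-Weinberg proportions.
    freqs = [rng.uniform(0.05, 0.5) for _ in range(100)]
    true = [[(rng.random() < p) + (rng.random() < p) for p in freqs] for _ in range(200)]
    masked, hidden = mask_genotypes(true, missing_rate=0.9, seed=1)
    imputed = impute_mode(masked)
    print(f"masked calls: {len(hidden)}, concordance: {concordance(true, imputed, hidden):.3f}")
```

Because the toy SNPs are simulated without linkage, the modal fill can only exploit allele frequencies; methods such as FImpute gain their accuracy from LD and shared haplotypes, which is one reason LD level and marker density matter in the scenarios above.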