Wavelet-domain elastic net for clustering of genome strains

Abstract. We propose to evaluate genome similarity by combining the discrete non-decimated wavelet transform (NDWT) with the elastic net. Wavelets represent a signal at multiple levels of detail: decomposing the signal exposes hidden components, and each level captures a different characteristic. The main feature of the elastic net is that it groups correlated variables, even when the number of predictors exceeds the number of observations. Applied to the clustering analysis of Mycobacterium tuberculosis genome strains, the combination of these two methodologies proved very effective, identifying clusters at each level of decomposition.
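
To make the idea of a level-by-level, non-decimated decomposition concrete, the following is a minimal sketch of a Haar NDWT with periodic boundaries. It is an illustration only, not the implementation used in this work: the Haar filter, the 1/2 normalization, the function names, and the GC-indicator encoding of a DNA string are all assumptions chosen for brevity. The elastic net step is not shown.

```python
def gc_indicator(sequence):
    """Hypothetical numeric encoding: 1.0 for G or C, 0.0 otherwise."""
    return [1.0 if base in "GC" else 0.0 for base in sequence.upper()]


def haar_ndwt(signal, levels):
    """Non-decimated Haar transform with periodic boundaries.

    Returns (details, approx). details[j] holds the detail coefficients
    of level j + 1; every coefficient list has the same length as the
    input because no downsampling is performed.
    """
    n = len(signal)
    approx = [float(v) for v in signal]
    details = []
    for j in range(levels):
        step = 2 ** j  # filter taps are spread apart instead of decimating
        shifted = [approx[(i + step) % n] for i in range(n)]
        details.append([(a - b) / 2.0 for a, b in zip(approx, shifted)])
        approx = [(a + b) / 2.0 for a, b in zip(approx, shifted)]
    return details, approx


def inverse_haar_ndwt(details, approx):
    """Rebuild the signal: each coarser approximation plus its detail
    gives back the next finer approximation."""
    signal = list(approx)
    for detail in reversed(details):
        signal = [a + d for a, d in zip(signal, detail)]
    return signal


if __name__ == "__main__":
    x = gc_indicator("ATGCGCGATATTAGCC")  # toy sequence, not real data
    details, approx = haar_ndwt(x, levels=3)
    for level, coeffs in enumerate(details, start=1):
        print(level, len(coeffs))
```

Because each level yields a coefficient vector as long as the original signal, the per-level coefficients of several sequences can be aligned position by position and passed, one level at a time, to a regularized model such as the elastic net.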