The landscape of gene-CDS-haplotype diversity in rice (Oryza sativa L.): properties, population organization, footprints of domestication and breeding, and implications in genetic improvement
Fan Zhang, Chunchao Wang, Min Li, Yanru Cui, Yingyao Shi, Zhichao Wu, Zhiqiang Hu, Wensheng Wang, Jianlong Xu, Zhikang Li
Molecular Plant, IF: 12.084
Published: 10 February 2021
Polymorphisms within gene coding regions represent the most important part of the overall genetic diversity in rice. We characterized the gene-CDS-haplotype (gcHap) diversity of 45,963 rice genes in 3,010 rice accessions. With an average of 226 ± 390 gcHaps per gene in rice populations, all rice genes could be classified into three main categories: 12,865 conserved genes, 10,254 subspecific differentiating genes, and 22,844 remaining genes. We found that 39,218 rice genes carry a total of >255,179 major gcHaps of potential functional importance. Most (87.5%) of the detected gcHaps were subspecies- or population-specific. The inferred proto-ancestors of local landrace populations, reconstructed from conserved predominant (ancient) gcHaps, correlated strongly with wild rice accessions of the same geographic origins, supporting a multi-origin domestication model of O. sativa. Past breeding efforts resulted in generally increased gcHap diversity in modern varieties and in significant frequency shifts of the predominant gcHaps of 14,266 genes, resulting from independent selection in the two subspecies. Low frequencies of ‘favorable’ gcHaps at most known yield-related genes in modern varieties suggest a huge potential for rice improvement by mining and pyramiding ‘favorable’ gcHaps. The gcHap data were shown to have greater power than SNPs in detecting causal genes affecting complex traits. The rice gcHap diversity dataset generated in this study will greatly facilitate basic rice research and improvement in the future.
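To make the gcHap idea more concrete, here is a minimal Python sketch. It assumes that a gcHap is simply the combination of alleles that an accession carries across the variant sites in one gene's coding sequence, so accessions with identical allele strings share a gcHap. The input format, the function names `gc_haplotypes` and `group_specific`, the toy genotypes, and the group labels are all hypothetical illustrations. This is not the authors' pipeline, and it ignores missing and heterozygous calls, which a real analysis of 3,010 accessions would have to handle.

```python
from collections import Counter


def gc_haplotypes(genotypes):
    """Group accessions into haplotypes for one gene.

    Assumed (hypothetical) input: dict mapping accession ID -> tuple of
    alleles at the gene's CDS variant sites. Identical tuples = same gcHap.
    Returns (accession -> haplotype label, haplotype label -> frequency).
    """
    counts = Counter(genotypes.values())
    # Label haplotypes H1, H2, ... by descending count; ties broken by allele string
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    labels = {hap: f"H{i + 1}" for i, (hap, _) in enumerate(ordered)}
    n = len(genotypes)
    freqs = {labels[hap]: c / n for hap, c in ordered}
    assignment = {acc: labels[hap] for acc, hap in genotypes.items()}
    return assignment, freqs


def group_specific(assignment, groups):
    """Return haplotypes observed in exactly one group (e.g. one subspecies)."""
    seen = {}
    for acc, hap in assignment.items():
        seen.setdefault(hap, set()).add(groups[acc])
    return {hap: next(iter(g)) for hap, g in seen.items() if len(g) == 1}


# Toy data: five hypothetical accessions genotyped at three CDS variant sites
genotypes = {
    "acc1": ("A", "G", "T"),
    "acc2": ("A", "G", "T"),
    "acc3": ("A", "C", "T"),
    "acc4": ("G", "C", "T"),
    "acc5": ("A", "G", "T"),
}
groups = {"acc1": "indica", "acc2": "japonica", "acc3": "indica",
          "acc4": "japonica", "acc5": "indica"}

assignment, freqs = gc_haplotypes(genotypes)
specific = group_specific(assignment, groups)
print(freqs)     # {'H1': 0.6, 'H2': 0.2, 'H3': 0.2}
print(specific)  # {'H2': 'indica', 'H3': 'japonica'}
```

On this toy input, H1 is carried by accessions of both groups, whereas H2 and H3 each occur in only one group. That is a simplified analogue of the subspecies-specific gcHaps counted in the abstract, not a reproduction of the paper's criteria.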