Mapping of an incomplete dominant gene controlling multifoliolate leaf by BSA-Seq in soybean (Glycine max L.)

ZHANG Zhi-Hao1,2, WANG Jun1, LIU Zhang-Xiong2,*, QIU Li-Juan1,2,*

  1 School of Agriculture, Yangtze University, Jingzhou 434025, Hubei, China
  2 Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
  • Received: 2020-03-24 Accepted: 2020-08-19 Online: 2020-12-12 Published: 2020-09-02
The leaves of cultivated soybean (Glycine max L.) generally consist of three leaflets, but some varieties and mutants produce a high frequency of compound leaves with 4-7 leaflets, termed multifoliolate leaves. Compound leaf formation enhances a plant's ability to adapt to the external environment, so studying the genes underlying multifoliolate leaves may contribute to improving soybean yield and agronomic traits. In this study, a multifoliolate leaf mutant, Zhonghuang 622, with 4-9 leaflets per compound leaf, was identified from a mutant library of the soybean cultivar Zhongpin 661. The compound leaf phenotypes of the F2 and F2:3 populations from a cross between Zhongpin 661 and Zhonghuang 622 were investigated in Beijing and Hainan, respectively. Analysis of the phenotypic data from the F2 and F2:3 populations showed that the multifoliolate leaf trait is controlled by an incompletely dominant gene. BSA-Seq was used for gene mapping. Two bulks, one of normal trifoliate and one of multifoliolate F2 individuals, were constructed and sequenced to an average depth of 32.75×, covering 99.22% of the reference genome. Association analysis of the bulk sequencing data by the Euclidean distance (ED) method located two regions on chromosome 11, with a total length of 5.29 Mb and a total of 1103 genes. Association analysis by the SNP-index method identified three regions on chromosome 11 at a confidence level of 0.99, with a total length of 3.42 Mb and a total of 701 genes. The regions detected by both methods contained 690 genes, six of which were associated with SNPs between the two parents. These results lay a foundation for map-based cloning of genes related to compound leaf development.

Key words: soybean, mutant, BSA-Seq

Fig. 1

BSA-Seq data analysis process

Table 1

Number of individuals with different genotypes and phenotypes in the F2 population

Phenotype | lf a lf a | lf a lf b | lf b lf b | Expected ratio | χ2 | χ2 critical value (P = 0.05, 0.01)
I | 54 | 2 | 0 | | |
II | 0 | 99 | 0 | | |
III | 0 | 4 | 80 | | |
Total | 54 | 105 | 80 | 1:2:1 | 9.18 | 5.99, 9.27
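
The χ2 value in Table 1 can be reproduced from the genotype totals with a goodness-of-fit calculation against the 1:2:1 ratio. The following is a minimal sketch rather than the authors' script; the function name `chi_square_gof` is hypothetical, and the critical values are simply those listed in the table.

```python
def chi_square_gof(observed, ratio):
    """Chi-square goodness-of-fit statistic for observed counts vs. an expected ratio."""
    total = sum(observed)
    ratio_sum = sum(ratio)
    # Expected count per class under the hypothesised segregation ratio
    expected = [total * r / ratio_sum for r in ratio]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, expected

# Genotype totals from Table 1 (three genotype classes of the F2 population)
observed = [54, 105, 80]
stat, expected = chi_square_gof(observed, [1, 2, 1])

# Critical values as listed in Table 1 (P = 0.05 and P = 0.01)
crit_05, crit_01 = 5.99, 9.27

print(f"expected counts: {expected}")
print(f"chi-square: {stat:.2f}")
print(f"exceeds 0.05 threshold: {stat > crit_05}")
print(f"exceeds 0.01 threshold: {stat > crit_01}")
```

With these counts the statistic rounds to 9.18, which lies above the 0.05 critical value and below the 0.01 critical value given in the table.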

Fig. 2

Identification of candidate intervals for the multifoliolate leaf gene using two association methods. A: ED association analysis. The abscissa is the chromosome position and the ordinate is the fifth power of the Euclidean distance (ED) value after fitting; the black line is the fitted fifth power of ED, and the dashed line is the significance threshold for association. B: SNP-index association analysis. The abscissa is the chromosome position; the black line is the fitted ΔSNP-index value, and the red, blue, and green lines are the threshold lines at confidence levels of 0.99, 0.95, and 0.90, respectively. Both association methods place the regions associated with the multifoliolate leaf trait at the end of chromosome 11.
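
To illustrate the quantity plotted in panel A, the sketch below computes a per-site Euclidean distance between two bulks using the commonly used definition: the square root of the summed squared differences in A, C, G, and T allele frequencies. It then raises the result to the fifth power. This is an assumption about the general form of the statistic, not the authors' pipeline; the read counts are hypothetical, and the fitting (smoothing) step mentioned in the caption is not included.

```python
import math

BASES = "ACGT"

def allele_freqs(counts):
    """Convert per-base read counts (dict) into frequencies ordered as A, C, G, T."""
    total = sum(counts.get(b, 0) for b in BASES)
    if total == 0:
        raise ValueError("site has no reads")
    return [counts.get(b, 0) / total for b in BASES]

def euclidean_distance(bulk1, bulk2):
    """ED between two bulks at one site, from their allele frequencies."""
    f1 = allele_freqs(bulk1)
    f2 = allele_freqs(bulk2)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))

# Hypothetical read counts at one site: the bulks are fixed for different bases
normal_bulk = {"A": 30}
multi_bulk = {"T": 30}

ed = euclidean_distance(normal_bulk, multi_bulk)
ed5 = ed ** 5  # raising to a power damps small, noisy differences

print(f"ED = {ed:.3f}, ED^5 = {ed5:.3f}")
```

A site where both bulks have identical allele frequencies gives an ED of 0, while a site at which the bulks are fixed for different bases gives the maximum value, the square root of 2.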

Table 2

Information of associated regions detected by different methods

Association analysis method | Chromosome | Start of associated region | End of associated region | Associated region size (Mb) | Gene number in the associated region
Euclidean distance (ED) | Chr. 11 | 0 | 4,150,000 | 4.15 | 896
Euclidean distance (ED) | Chr. 11 | 5,570,000 | 6,710,000 | 1.14 | 207
Euclidean distance (ED) | Total | | | 5.29 | 1103
SNP-index | Chr. 11 | 0 | 250,000 | 0.25 | 44
SNP-index | Chr. 11 | 1,510,000 | 3,480,000 | 1.97 | 439
SNP-index | Chr. 11 | 5,570,000 | 6,770,000 | 1.20 | 218
SNP-index | Total | | | 3.42 | 701
Intersection of two methods | Chr. 11 | 0 | 250,000 | 0.25 | 44
Intersection of two methods | Chr. 11 | 1,510,000 | 3,480,000 | 1.97 | 439
Intersection of two methods | Chr. 11 | 5,570,000 | 6,710,000 | 1.14 | 207
Intersection of two methods | Total | | | 3.36 | 690
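
The intersection rows of Table 2 follow directly from overlapping the ED and SNP-index intervals on chromosome 11. The sketch below shows one way to compute that overlap, using the coordinates from the table; the function name `intersect` is hypothetical, and the code is an illustration rather than the tool used in the study.

```python
def intersect(a, b):
    """Intersect two sorted, non-overlapping lists of (start, end) intervals."""
    shared = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            shared.append((start, end))
        # Advance whichever interval ends first
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return shared

# Associated regions on Chr. 11, taken from Table 2
ed_regions = [(0, 4_150_000), (5_570_000, 6_710_000)]
snp_index_regions = [(0, 250_000), (1_510_000, 3_480_000), (5_570_000, 6_770_000)]

shared = intersect(ed_regions, snp_index_regions)
total_mb = sum(end - start for start, end in shared) / 1_000_000

for start, end in shared:
    print(f"Chr. 11\t{start:,}\t{end:,}\t{(end - start) / 1_000_000:.2f} Mb")
print(f"Total: {total_mb:.2f} Mb")
```

The three shared intervals and the 3.36 Mb total match the intersection rows in the table. Gene counts are not derived here because they depend on the genome annotation.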

Fig. 3

Distribution of SNPs and associated signals on chromosomes between samples. From outside to inside: the first circle represents chromosome coordinates, the second circle represents gene distribution, the third circle represents SNP density distribution, the fourth circle represents ED value distribution, and the fifth circle represents ΔSNP-index value distribution.

Table 3

Sequence and information of the primers

Primer ID | Forward primer (5'-3') | Reverse primer (5'-3') | Product size (bp) | Number of detection sites

Table 4

Identification of some SNP loci in the interval

Read values are listed for the two parents (Zhongpin 661, Zhonghuang 622) and the two bulks (mixed pools).

SNP locus | Reference base | Altered base | Zhongpin 661 | Zhonghuang 622 | Normal leaf bulk | Multifoliolate leaf bulk | SNP quality evaluation | Appraisal result
Chr.11 1738094 | C | T | 14,7 | 8,0 | 32,0 | 42,0 | Low | False
Chr.11 1738120 | C | T | 14,5 | 8,0 | 33,0 | 39,0 | Low | False
Chr.11 1738157 | G | A | 10,0 | 9,1 | 32,5 | 38,4 | Low | False
Chr.11 1738175 | A | T | 9,0 | 9,1 | 34,9 | 39,6 | Low | False
Chr.11 1738511 | A | T | 20,0 | 9,1 | 37,4 | 39,10 | Low | False
Chr.11 1947868 | A | T | 12,0 | 0,7 | 27,0 | 0,48 | High | True
Chr.11 1964348 | C | T | 9,0 | 5,1 | 42,5 | 41,7 | Low | False
Chr.11 1964460 | T | C | 8,0 | 9,5 | 37,28 | 44,28 | Low | False
Chr.11 1964710 | T | A | 9,0 | 6,2 | 36,8 | 38,8 | Low | False
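
The read values in Table 4 can be related to the SNP-index statistic with a short calculation. The sketch below assumes that each pair is (reference-base reads, altered-base reads), which the table does not state explicitly. It also assumes the usual definitions: SNP-index is the fraction of reads carrying the altered base, and ΔSNP-index is the multifoliolate bulk value minus the normal bulk value. The function names are hypothetical.

```python
def snp_index(ref_reads, alt_reads):
    """Fraction of reads carrying the altered base; None if the site has no reads."""
    depth = ref_reads + alt_reads
    if depth == 0:
        return None
    return alt_reads / depth

def delta_snp_index(normal_bulk, multi_bulk):
    """Difference in SNP-index between the multifoliolate and normal bulks."""
    return snp_index(*multi_bulk) - snp_index(*normal_bulk)

# (ref, alt) pairs for two loci from Table 4, under the assumption stated above
loci = {
    "Chr.11 1738094": {"normal": (32, 0), "multi": (42, 0)},
    "Chr.11 1947868": {"normal": (27, 0), "multi": (0, 48)},
}

deltas = {
    name: delta_snp_index(reads["normal"], reads["multi"])
    for name, reads in loci.items()
}

for name, delta in deltas.items():
    print(f"{name}: delta SNP-index = {delta:.2f}")
```

Under this reading, locus 1947868 shows complete separation between the two bulks, consistent with its being the only locus in the table appraised as true, while locus 1738094 shows no difference between the bulks.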

Table 5

Number of SNP types that occurred between parents in the interval

SNP locus | Base in Zhongpin 661 | Base in Zhonghuang 622 | Mutation type | Gene ID | Annotated function
1947868 | A | T | Nonsynonymous SNV | Glyma.11G027100 | Homeobox protein knotted-1-like-7
2489820 | G | A | Synonymous SNV | Glyma.11G034100 | Leucine-tRNA ligase/Leucyl-tRNA synthetase
2954231 | G | A | Nonsynonymous SNV | Glyma.11G040200 | Regulator of nonsense transcripts 1 (UPF1, RENT1)
3156378 | G | A | Nonsynonymous SNV | Glyma.11G043100 | AT-hook motif nuclear localized protein 21-related
3334413 | C | A | Upstream of gene | Glyma.11G045200 | Nuclear matrix constituent protein 1-like protein-related
6288434 | C | T | Intron | Glyma.11G083800 | Acyl-activating enzyme 1, peroxisomal-related

Fig. 4

Expression profile of six candidate genes. Colors in the squares represent the expression level of the candidate genes: blue is the lowest, white is intermediate, and red is the highest.

