Acta Agronomica Sinica ›› 2025, Vol. 51 ›› Issue (8): 2128-2138. doi: 10.3724/SP.J.1006.2025.44199

• CROP GENETICS & BREEDING·GERMPLASM RESOURCES·MOLECULAR GENETICS •

Genome-wide association study of yield components using a 40K SNP array and identification of a stable locus for boll weight in upland cotton (Gossypium hirsutum L.)

LI Yi-Qian², XU Shou-Zhen¹, LIU Ping¹, MA Qi¹, XIE Bin¹, CHEN Hong¹,*

  ¹ Cotton Research Institute, Xinjiang Academy of Agricultural and Reclamation Science, Shihezi 832000, Xinjiang, China
  ² College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310058, Zhejiang, China
  • Received: 2024-12-03 Accepted: 2025-04-27 Online: 2025-08-12 Published: 2025-05-14
  • Contact: *E-mail: xjchenh990122@163.com
  • Supported by:
    Science and Technology Innovation 2023-Major Project (2023ZD0404106)

Abstract:

Cotton yield is primarily determined by key yield components, including boll number per plant, boll weight, and lint percentage. Understanding the genetic basis of these traits is essential for advancing molecular breeding strategies. In this study, a natural population of 612 upland cotton (Gossypium hirsutum L.) accessions was genotyped using a 40K SNP array based on liquid-phase probe hybridization technology. Phenotypic data for boll number per plant, boll weight, lint percentage, and seed cotton yield were collected across five different environments. A genome-wide association study (GWAS) identified six significant loci: two associated with boll number per plant (on chromosomes A03 and A05), one with boll weight (on chromosome A07), one with lint percentage (on chromosome D01), and two with seed cotton yield (on chromosomes A05 and D07). Notably, a stable QTL located between 89.01 and 90.45 Mb on chromosome A07 was consistently associated with boll weight across all five environments (P = 5.3646 × 10⁻⁸). Haplotype analysis of this region revealed two major haplotypes, with accessions carrying the favorable haplotype exhibiting a significant increase in boll weight of 0.64 g. By integrating whole-genome resequencing and transcriptome data, seven candidate genes were identified within this region, and a key SNP variant was pinpointed for potential use in molecular marker development. These findings enhance our understanding of the genetic architecture of cotton yield traits and offer valuable molecular resources for high-yield cotton breeding programs.

Key words: upland cotton, yield components, GWAS, boll weight, high-yield breeding

Fig. S1

Frequency distribution histogram of four yield-related traits in five environments. BN: boll number; BW: boll weight; LP: lint percentage; Yield: seed cotton yield; 2019SHZ, 2019KRL, 2020SHZ, 2020KRL, and 2021SHZ represent five natural environments, namely Shihezi in 2019, Korla in 2019, Shihezi in 2020, Korla in 2020, and Shihezi in 2021, respectively.

Table 1

Statistics of yield-related traits across different environments and heritability calculation

Trait               Environment  Min.   Max.   Mean   SD    CV (%)  H²
BN (bolls plant⁻¹)  2019SHZ      3.45   13.65  6.94   1.63  23.44   0.50
BN (bolls plant⁻¹)  2019KRL      3.95   14.10  7.12   1.59  22.35
BN (bolls plant⁻¹)  2020SHZ      2.75   9.35   4.86   0.93  19.17
BN (bolls plant⁻¹)  2020KRL      4.00   10.58  6.31   0.94  14.95
BN (bolls plant⁻¹)  2021SHZ      3.00   13.30  6.86   1.71  24.92
BN (bolls plant⁻¹)  Mean         4.06   10.53  6.42   0.90  14.09
BW (g)              2019SHZ      3.78   7.63   5.60   0.49  8.66    0.83
BW (g)              2019KRL      3.82   7.90   5.74   0.58  10.13
BW (g)              2020SHZ      4.17   7.82   5.67   0.46  8.14
BW (g)              2020KRL      3.90   8.23   5.69   0.60  10.63
BW (g)              2021SHZ      2.58   10.35  5.73   0.68  11.90
BW (g)              Mean         4.51   7.53   5.68   0.41  7.20
LP (%)              2019SHZ      28.58  49.57  40.81  3.02  7.39    0.94
LP (%)              2019KRL      33.89  51.22  43.65  2.72  6.23
LP (%)              2020SHZ      34.72  50.29  44.08  2.43  5.54
LP (%)              2020KRL      31.94  53.59  42.71  2.61  6.12
LP (%)              2021SHZ      27.91  56.97  42.38  2.95  6.97
LP (%)              Mean         34.44  50.24  42.67  2.39  5.60
Yield (kg)          2019SHZ      0.84   3.14   1.61   0.36  22.56   0.51
Yield (kg)          2019KRL      0.77   3.17   1.69   0.40  23.71
Yield (kg)          2020SHZ      0.62   2.12   1.18   0.21  17.92
Yield (kg)          2020KRL      1.14   4.75   3.13   0.43  13.65
Yield (kg)          2021SHZ      1.32   4.82   2.69   0.31  11.32
Yield (kg)          Mean         1.44   2.76   2.06   0.20  9.86

SD: standard deviation; CV: coefficient of variation; H²: broad-sense heritability, reported once per trait.
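The CV column in Table 1 is the standard deviation expressed as a percentage of the mean. The sketch below recomputes it for the boll weight rows as a quick consistency check. The dictionary and function names are illustrative and not from the paper. Because the table reports rounded means and SDs, the recomputed values are expected to differ slightly from the printed CVs.

```python
# (mean, SD, reported CV %) for boll weight (g), copied from Table 1.
BW_STATS = {
    "2019SHZ": (5.60, 0.49, 8.66),
    "2019KRL": (5.74, 0.58, 10.13),
    "2020SHZ": (5.67, 0.46, 8.14),
    "2020KRL": (5.69, 0.60, 10.63),
    "2021SHZ": (5.73, 0.68, 11.90),
}


def coefficient_of_variation(mean: float, sd: float) -> float:
    """Return the coefficient of variation in percent: 100 * SD / mean."""
    if mean == 0:
        raise ValueError("CV is undefined for a zero mean")
    return 100.0 * sd / mean


# Recompute CV per environment from the rounded mean and SD.
recomputed_cv = {
    env: coefficient_of_variation(mean, sd)
    for env, (mean, sd, _reported) in BW_STATS.items()
}

for env, cv in recomputed_cv.items():
    reported = BW_STATS[env][2]
    print(f"{env}: recomputed CV = {cv:.2f}%, reported CV = {reported:.2f}%")
```

The same function applies unchanged to the BN, LP, and Yield rows.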

Fig. 1

Correlation analysis and frequency distribution histograms of four yield-related traits. The diagonal boxes show frequency distribution histograms of trait means across five environments. Upper triangles show correlation coefficients and significance levels. Lower triangles show scatter plots between traits. BN: bolls per plant; BW: boll weight; LP: lint percentage; Yield: seed cotton yield. *, **, *** indicate significant correlations at the 0.05, 0.01, and 0.001 probability levels, respectively.

Table 2

Summary of QTLs identified by GWAS

Trait               Chr.  Peak SNP position (bp)  QTL region (bp)      P-value   Environment
BN (bolls plant⁻¹)  A03   108732262               107926223-108967905  4.28E-06  BN-2019SHZ
BN (bolls plant⁻¹)  A05   4400157                 4049988-4980793      2.59E-05  BN-2019SHZ
BW (g)              A07   89462845                89012271-90448020    5.36E-08  BW-2020SHZ
BW (g)              A07   88885938                89012271-90448020    1.08E-05  BW-2020KRL
BW (g)              A07   90441327                89012271-90448020    1.95E-05  BW-2021SHZ
BW (g)              A07   88885938                89012271-90448020    2.46E-06  BW-BLUP
BW (g)              A07   88885938                89012271-90448020    1.37E-06  BW-Mean
LP (%)              D01   8257066                 7871209-9168549      6.39E-06  LP-2020KRL
Yield (kg)          D07   15784739                15694741-16158454    8.63E-07  Yield-2020SHZ
Yield (kg)          A05   16130212                15449114-16318936    2.53E-05  Yield-BLUP
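The QTL intervals in Table 2 are given in base pairs, while the abstract quotes the A07 interval in megabases. The following sketch converts each interval to Mb and checks that the first peak SNP listed for each locus lies inside its interval. The tuple layout and function names are assumptions made for illustration; only the coordinates come from Table 2.

```python
# (trait, chromosome, peak SNP position, interval start, interval end), in bp.
# One row per locus, using the first peak SNP listed for it in Table 2.
QTL_ROWS = [
    ("BN", "A03", 108732262, 107926223, 108967905),
    ("BN", "A05", 4400157, 4049988, 4980793),
    ("BW", "A07", 89462845, 89012271, 90448020),
    ("LP", "D01", 8257066, 7871209, 9168549),
    ("Yield", "D07", 15784739, 15694741, 16158454),
    ("Yield", "A05", 16130212, 15449114, 16318936),
]


def interval_length_mb(start_bp: int, end_bp: int) -> float:
    """Length of a genomic interval in megabases."""
    return (end_bp - start_bp) / 1_000_000


def peak_in_interval(peak_bp: int, start_bp: int, end_bp: int) -> bool:
    """True if the peak SNP lies within the interval (inclusive)."""
    return start_bp <= peak_bp <= end_bp


for trait, chrom, peak, start, end in QTL_ROWS:
    length = interval_length_mb(start, end)
    inside = peak_in_interval(peak, start, end)
    print(f"{trait} {chrom}: {start / 1e6:.2f}-{end / 1e6:.2f} Mb "
          f"({length:.2f} Mb), peak inside interval: {inside}")
```

For the A07 boll weight locus this reproduces the 89.01 to 90.45 Mb range quoted in the abstract.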

Fig. 2

Manhattan plots of genome-wide association analysis for yield-related traits. (a): bolls per plant in 2019 Shihezi; (b): lint percentage in 2020 Korla; (c): boll weight in 2020 Korla; (d): boll weight in 2020 Shihezi; (e): boll weight in 2021 Shihezi; (f): mean boll weight; (g): BLUP value of boll weight; (h): seed cotton yield in 2020 Shihezi; (i): BLUP value of seed cotton yield. Abbreviations are the same as those given in Table 1.
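Manhattan plots conventionally display each SNP as -log10(P) on the vertical axis, so smaller P-values appear as taller peaks. As a rough illustration, the sketch below applies that transform to the peak P-values listed in Table 2. The list layout and variable names are illustrative, and the significance threshold used in the study is not stated on this page, so none is assumed here.

```python
import math

# (environment label, peak P-value) pairs copied from Table 2.
PEAK_P_VALUES = [
    ("BN-2019SHZ (A03)", 4.28e-06),
    ("BN-2019SHZ (A05)", 2.59e-05),
    ("BW-2020SHZ", 5.36e-08),
    ("BW-2020KRL", 1.08e-05),
    ("BW-2021SHZ", 1.95e-05),
    ("BW-BLUP", 2.46e-06),
    ("BW-Mean", 1.37e-06),
    ("LP-2020KRL", 6.39e-06),
    ("Yield-2020SHZ", 8.63e-07),
    ("Yield-BLUP", 2.53e-05),
]

# Transform each P-value to the -log10 scale used on a Manhattan plot axis.
neg_log10_p = {label: -math.log10(p) for label, p in PEAK_P_VALUES}

# The tallest peak corresponds to the smallest P-value.
strongest = max(neg_log10_p, key=neg_log10_p.get)

for label, score in sorted(neg_log10_p.items(), key=lambda kv: -kv[1]):
    print(f"{label}: -log10(P) = {score:.2f}")
print("Strongest signal:", strongest)
```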

Fig. 3

Haplotype analysis and candidate gene identification for the boll weight QTL on Chromosome A07. (a): linkage disequilibrium (LD) plot of the boll weight QTL region on Chromosome A07; (b): haplotype analysis of the boll weight QTL (t-test, ***, P < 0.001); (c): expression profiles of genes containing nonsynonymous variants in the QTL interval. Gene IDs are shown on the left, and the heatmap displays FPKM values in ovules and fibers at different developmental stages. dpa: days post anthesis.

Table S1

Summary of nonsynonymous variants information within the candidate interval of boll weight QTL on Chromosome A07

Chr. Start End Reference Alternate Gene ID
A07 89194219 89194219 A T GH_A07G2179
A07 89234256 89234256 C T GH_A07G2180
A07 89234408 89234408 G C GH_A07G2180
A07 89238319 89238319 G A GH_A07G2181
A07 89238478 89238478 G C GH_A07G2181
A07 89238508 89238508 A G GH_A07G2181
A07 89238519 89238519 C A GH_A07G2181
A07 89244445 89244445 G C GH_A07G2182
A07 89244493 89244493 G A GH_A07G2182
A07 89244494 89244494 C T GH_A07G2182
A07 89244830 89244830 A G GH_A07G2182
A07 89305519 89305519 G A GH_A07G2184
A07 89505855 89505855 T G GH_A07G2189
A07 89507731 89507731 G A GH_A07G2190
A07 89519739 89519739 G T GH_A07G2191
A07 89598121 89598121 T G GH_A07G2192
A07 89598233 89598233 T A GH_A07G2192
A07 89844356 89844356 G A GH_A07G2193
A07 89844389 89844389 C A GH_A07G2193
A07 89844432 89844432 C T GH_A07G2193
A07 89844467 89844467 T C GH_A07G2193
A07 89844501 89844501 C T GH_A07G2193
A07 89844551 89844551 G A GH_A07G2193
A07 89844599 89844599 G T GH_A07G2193
A07 89844600 89844600 A G GH_A07G2193
A07 89844641 89844641 G A GH_A07G2193
A07 89844860 89844860 G A GH_A07G2193
A07 89844980 89844980 G T GH_A07G2193
A07 89845057 89845057 T A GH_A07G2193
A07 89845101 89845101 G C GH_A07G2193
A07 89845146 89845146 A G GH_A07G2193
A07 89850561 89850561 A G GH_A07G2193
A07 89996775 89996775 G A GH_A07G2197
A07 89996897 89996897 C T GH_A07G2197
A07 89996929 89996929 A G GH_A07G2197
A07 89997083 89997083 G A GH_A07G2197
A07 90005923 90005923 A T GH_A07G2198
A07 90057893 90057893 T C GH_A07G2199
A07 90058035 90058035 T G GH_A07G2199
A07 90058102 90058102 T A GH_A07G2199
A07 90058125 90058125 A G GH_A07G2199
A07 90058134 90058134 G T GH_A07G2199
A07 90058140 90058140 C T GH_A07G2199
A07 90058175 90058175 A T GH_A07G2199
A07 90058178 90058178 C A GH_A07G2199
A07 90058232 90058232 A T GH_A07G2199
A07 90058292 90058292 T G GH_A07G2199
A07 90058320 90058320 C A GH_A07G2199
A07 90058380 90058380 C G GH_A07G2199
A07 90058385 90058385 G C GH_A07G2199
A07 90058520 90058520 T C GH_A07G2199
A07 90058524 90058524 G C GH_A07G2199
A07 90058548 90058548 C A GH_A07G2199
A07 90058565 90058565 A C GH_A07G2199
A07 90058602 90058602 T C GH_A07G2199
A07 90058673 90058673 C G GH_A07G2199
A07 90058683 90058683 G C GH_A07G2199
A07 90058719 90058719 A G GH_A07G2199
A07 90095575 90095575 G A GH_A07G2200
A07 90095636 90095636 G C GH_A07G2200
A07 90128297 90128297 G A GH_A07G2201
A07 90160058 90160058 C A GH_A07G2203
A07 90169457 90169457 G C GH_A07G2205
A07 90170915 90170915 C A GH_A07G2206
A07 90176489 90176489 T C GH_A07G2206
A07 90176534 90176534 G A GH_A07G2206
A07 90176573 90176573 T C GH_A07G2206
A07 90176808 90176808 G C GH_A07G2206
A07 90177633 90177633 C T GH_A07G2206
A07 90177788 90177788 A C GH_A07G2206
A07 90178114 90178114 A C GH_A07G2206
A07 90178237 90178237 T A GH_A07G2206
A07 90178390 90178390 C G GH_A07G2206
A07 90328539 90328539 A C GH_A07G2208
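Table S1 lists one nonsynonymous variant per row, so the number of variants per gene can be tallied by parsing the whitespace-delimited rows. The sketch below does this for the first seven rows only, to keep the example short. The function name and the choice of `collections.Counter` are illustrative and are not taken from the paper.

```python
from collections import Counter

# First seven data rows of Table S1, copied verbatim (a subset for brevity).
TABLE_S1_SAMPLE = """\
A07 89194219 89194219 A T GH_A07G2179
A07 89234256 89234256 C T GH_A07G2180
A07 89234408 89234408 G C GH_A07G2180
A07 89238319 89238319 G A GH_A07G2181
A07 89238478 89238478 G C GH_A07G2181
A07 89238508 89238508 A G GH_A07G2181
A07 89238519 89238519 C A GH_A07G2181
"""


def count_variants_per_gene(table_text: str) -> Counter:
    """Count rows per gene ID in whitespace-delimited Table S1 text."""
    counts: Counter = Counter()
    for line in table_text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or malformed lines
        _chrom, start, end, _ref, _alt, gene_id = fields
        # Every row in Table S1 is a single-nucleotide variant (start == end).
        if int(start) != int(end):
            raise ValueError(f"unexpected multi-base variant: {line}")
        counts[gene_id] += 1
    return counts


variant_counts = count_variants_per_gene(TABLE_S1_SAMPLE)
for gene_id, n in variant_counts.items():
    print(f"{gene_id}: {n} nonsynonymous variant(s)")
```

Passing the full table text instead of the sample would give the per-gene counts for the whole candidate interval.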

Table 3

Annotations of proteins encoded by candidate genes associated with boll weight on Chromosome A07

Gene ID      Protein annotation
GH_A07G2180  Transmembrane protein
GH_A07G2181  Transmembrane protein
GH_A07G2182  beta-1,2-N-acetylglucosaminyl transferase
GH_A07G2184  SNF1-related protein kinase
GH_A07G2189  FASCICLIN-like arabinogalactan 2
GH_A07G2201  RING/U-box superfamily protein
GH_A07G2203  CONSTANS-like 9