Acta Agronomica Sinica ›› 2007, Vol. 33 ›› Issue (01): 70-76.

• Research Papers •

A Maximum Likelihood-Based Dynamic Clustering Method and Its Application

XIAO Jing, HU Zhi-Qiu, WANG Xue-Feng, XU Chen-Wu*

  1. Jiangsu Provincial Key Laboratory of Crop Genetics and Physiology, Yangzhou University, Yangzhou 225009, Jiangsu, China
  • Received: 2005-11-23 Published: 2007-01-12 Published online: 2007-01-12
  • Contact: XU Chen-Wu

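Summary:

By combining conventional dynamic clustering with discriminant analysis, we derive a maximum likelihood-based dynamic clustering method. The method estimates the cluster parameters by maximum likelihood, implemented via the EM algorithm, and assigns each individual to a cluster according to the corresponding Bayesian posterior probability. Simulation studies show that the method generally gives unbiased estimates of the cluster parameters and can also identify the optimal number of clusters. It is more robust than centroid-based dynamic clustering and dynamic clustering by the minimum within-group sum of squares. In addition, raising the discrimination criterion lowers the misclassification rate. The feasibility of the method was verified with Fisher's Iris data, and the method was successfully applied to identify the major-gene genotypes of individuals in a rice F2 population.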
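The method summarized above can be illustrated with a minimal sketch. It assumes univariate normal clusters, a quantile-based initialisation, and a fixed number of EM iterations; none of these details are given on this page, and the function names (`fit_mixture`, `posteriors`, `classify`, `bic`) are hypothetical. It is not the authors' implementation, only one plausible reading of "EM estimation plus posterior-probability classification with an adjustable criterion".

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def posteriors(data, weights, means, variances):
    """Bayesian posterior probability of each cluster for each observation."""
    out = []
    for x in data:
        dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        total = sum(dens) or 1e-300  # guard against numerical underflow
        out.append([d / total for d in dens])
    return out


def fit_mixture(data, k, n_iter=200):
    """EM estimates of a k-component normal mixture (illustrative only)."""
    xs = sorted(data)
    n = len(xs)
    # Assumed initialisation: means at evenly spaced sample quantiles,
    # a common variance equal to the overall sample variance.
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    grand_mean = sum(xs) / n
    grand_var = sum((x - grand_mean) ** 2 for x in xs) / n
    variances = [grand_var] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        post = posteriors(data, weights, means, variances)  # E-step
        for j in range(k):  # M-step: posterior-weighted updates
            nj = max(sum(p[j] for p in post), 1e-12)
            weights[j] = nj / n
            means[j] = sum(p[j] * x for p, x in zip(post, data)) / nj
            var_j = sum(p[j] * (x - means[j]) ** 2 for p, x in zip(post, data)) / nj
            variances[j] = max(var_j, 1e-6)  # floor avoids a collapsed cluster
    return weights, means, variances


def classify(post, threshold=0.9):
    """Assign each observation to its most probable cluster.

    Observations whose largest posterior is below `threshold` are left
    unclassified (None), so a stricter threshold trades misclassified
    individuals for unclassified ones.
    """
    labels = []
    for p in post:
        best = max(range(len(p)), key=lambda j: p[j])
        labels.append(best if p[best] >= threshold else None)
    return labels


def bic(data, weights, means, variances):
    """Bayesian information criterion; smaller values are preferred."""
    loglik = 0.0
    for x in data:
        dens = sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))
        loglik += math.log(max(dens, 1e-300))
    n_params = 3 * len(weights) - 1  # k means, k variances, k - 1 free weights
    return -2.0 * loglik + n_params * math.log(len(data))


if __name__ == "__main__":
    # Made-up values for illustration; not data from the paper.
    values = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9, 3.0]
    w, m, v = fit_mixture(values, k=2)
    print(classify(posteriors(values, w, m, v), threshold=0.9))
    print(bic(values, w, m, v))
```

In this sketch the number of clusters would be chosen by fitting several values of `k` and keeping the one with the smallest `bic`, and the discrimination criterion corresponds to the `threshold` argument of `classify`.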


Abstract:

Cluster analysis aims to determine the intrinsic grouping in a set of unlabeled data. A cluster is a collection of objects that are similar to one another and dissimilar to the objects belonging to other clusters. However, current clustering techniques do not address all requirements adequately. For instance, handling high-dimensional data and large data sets can be problematic because of time complexity. The effectiveness of distance-based clustering methods depends on the definition of distance; if no obvious distance measure exists, one must be defined, which is not always easy, especially in multi-dimensional spaces. In addition, choosing the optimal number of clusters is difficult in practice. Thus, choosing the correct number of clusters and the best clustering method remains an open question. To address these problems, we introduce a maximum likelihood-based dynamic clustering method that combines conventional dynamic clustering with discriminant analysis. The cluster parameters were estimated by maximum likelihood, implemented via the expectation-maximization (EM) algorithm, and the objects were classified by their Bayesian posterior probabilities. This classification approach can increase the posterior confidence of the classified individuals. Simulation studies showed that the proposed method not only gave unbiased estimates of the cluster parameters but also identified the optimal number of clusters by the Bayesian information criterion (BIC). Compared with the K-means method and the minimum within-group sum of squares (MinSSw) method, the proposed method was more robust and had almost the same clustering accuracy. Moreover, the misclassification rate (MR) could be reduced by raising the discrimination criterion, although this also increased the unclassified rate (UR). Thus, the user can choose a compromise discrimination criterion that keeps both MR and UR low. The results indicated that the proposed method had a significant advantage in clustering accuracy over the K-means and MinSSw methods. The method was illustrated with plant height and number of tillers in an F2 population of the rice cross Duonieai×Zhonghua 11. The results listed in Table 6 indicated that the genetic difference in these two traits in this cross involves only one pleiotropic major gene. The additive and dominance effects of the major gene were estimated as -24.57 cm and 57.12 cm for plant height, and 23.01 and -25.89 for number of tillers, respectively. The major gene shows overdominance for plant height and near-complete dominance for number of tillers.
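The dominance statements at the end of the abstract can be checked from the reported estimates. The short sketch below assumes the conventional single-locus reading, in which the degree of dominance is the ratio of the dominance effect d to the additive effect a, with |d/a| above 1 indicating overdominance and |d/a| close to 1 indicating complete dominance. The helper name `dominance_ratio` is hypothetical; the numbers are those quoted in the abstract.

```python
def dominance_ratio(additive, dominance):
    """Degree of dominance d/a for a single major gene."""
    if additive == 0:
        raise ValueError("additive effect must be non-zero")
    return dominance / additive


# (additive effect, dominance effect) as reported in the abstract
estimates = {
    "plant height (cm)": (-24.57, 57.12),
    "number of tillers": (23.01, -25.89),
}

for trait, (a, d) in estimates.items():
    print(f"{trait}: |d/a| = {abs(dominance_ratio(a, d)):.2f}")
```

The ratio is well above 1 for plant height and only slightly above 1 for number of tillers, which agrees with the abstract's description of overdominance and near-complete dominance.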

Key words: Cluster analysis, Posterior probability, Bayesian information criterion, Discriminant analysis
