Statistical Genetics Approach for Functional Difference Identification of Allelic Variations and Its Application

Cite this article

HU Wen-Ming, KAN Hai-Hua, WANG Wei, XU Chen-Wu. Statistical Genetics Approach for Functional Difference Identification of Allelic Variations and Its Application. Acta Agronomica Sinica, 2014, 40(1): 72-79

Permissions

This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.


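HU Wen-Ming**, KAN Hai-Hua**, WANG Wei, XU Chen-Wu*

Jiangsu Provincial Key Laboratory of Crop Genetics and Physiology / Key Laboratory of Plant Functional Genomics of Ministry of Education / Yangzhou University, Yangzhou 225009, China

* Corresponding author: XU Chen-Wu, E-mail: qtls@yzu.edu.cn, Tel: 0514-87979358

** Contributed equally to this work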

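Received: 2013-07-25

Funding: This study was supported by the National Basic Research Program of China (973 Program) (2011CB100100), the National Natural Science Foundation of China (31171187), and the "Qing Lan Project" Science and Technology Innovation Team Program of Jiangsu higher education institutions.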



Abstract

Allelic variations are ubiquitous in organisms and play important roles in regulating gene expression. This study examined how three factors influence association analysis of candidate genes: the number of varieties (A), the average polymorphism information content (B), and the total contribution of candidate genes (C). The empirical Bayes (E-Bayes) method was used to measure their effects on the statistical power for candidate genes, on the accuracy and precision of the estimated genetic effects, and on the frequency of false positives. The results were as follows. (1) As factors A, B and C increased, the statistical power and the accuracy and precision of the estimated genetic effects all improved, while the frequency of false positives decreased. (2) Factor B had a significant influence on the statistical power for candidate genes. When factor B was at a high level, the average statistical power could still reach 80% even when factors A and C were both at low levels. When factor B was at a medium level, more varieties were needed for the average statistical power to exceed 80%. When factor B was at a low level, the statistical power stayed below 50% at all three levels of factor C, even with 100 varieties. (3) Factor B had a significant impact on the accuracy and precision of the estimated effects of candidate genes, and both improved as factor B increased. (4) Factor B also had a significant effect on the frequency of false positives. In a real-data analysis in rice, the model detected four candidate genes significantly associated with pasting temperature (PT). Polymorphism information content is therefore the primary factor in detecting functional differences among alleles. A larger number of varieties and a higher contribution rate also strongly influence the statistical power and the accuracy and precision of the estimated effects.

Keywords: Allelic variation; Oversaturated model; Variable selection; E-Bayes


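A gene is a DNA sequence located on a chromosome and carries an organism's genetic information. Genes that encode RNA or protein are called structural genes, and genes that do not are called regulatory genes. Sequence variation in either kind of gene can affect an organism's phenotype. Alleles are different forms of a gene that occupy the same position on homologous chromosomes and control the same trait, and each allele has its own specific product and phenotype. Functional differences between alleles were first discovered in the phenomenon of chromosomal imprinting[1]. Such differences are widespread in plants, including maize[2], strawberry[3,4], cotton[5], wheat[6,7] and rice[8]. The main tasks now are to link more allelic differences to the phenotypic polymorphisms they cause, to mine superior alleles and allele combinations, and to build interaction networks among genes. This work would give biologists and geneticists a deeper understanding of the genetic basis of phenotypic polymorphism.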

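The studies above characterize allelic differences qualitatively, but they do not examine how the joint effects of alleles at many loci across the genome influence a particular phenotype. Whole-genome sequencing has matured and its cost keeps falling. It is now possible to detect allelic sequence variation at many loci in many varieties at once. Statistical methods can then be used to analyze how allelic functional differences affect the phenotype, that is, to build a genetic model linking the target trait to the candidate genes.

The analytical approaches fall into two categories. The first is QTL mapping based on linkage analysis, usually in segregating populations derived from biparental crosses. QTL mapping methods include marker regression (MR)[9], interval mapping (IM)[10], composite interval mapping (CIM)[11], inclusive composite interval mapping (ICIM)[12] and multiple interval mapping (MIM)[13]. The second is association analysis based on linkage disequilibrium, which uses natural populations. Besides molecular markers, its model also includes population-structure and/or kinship covariates.

Most QTL mapping and association analysis software scans the genome point by point. It tests the association of each marker (main effect) or marker combination (interaction effect) with the phenotype using a given test statistic. An alternative is to fit one linear model that relates the phenotype to all markers at once. The number of independent variables in such a model far exceeds the sample size, so the model is oversaturated.

Four methods are commonly used to estimate the parameters of oversaturated models:

(1) The maximum penalized likelihood method (PENAL)[14]. The joint prior distribution of all parameters serves as a penalty term, which is combined with the likelihood function to form a penalized likelihood. QTL effects are estimated by maximizing this penalized likelihood.

(2) The least absolute shrinkage and selection operator (LASSO)[15]. This is least squares with a constraint on the regression coefficients. Ordinary least squares (OLS) estimates the coefficients by minimizing the residual sum of squares. OLS cannot estimate the model parameters when the number of independent variables far exceeds the sample size. LASSO requires the sum of the absolute values of the coefficients to stay below a given constant, which shrinks the coefficients; the OLS parameters are then estimated on this basis.

(3) Stochastic search variable selection (SSVS). SSVS was first developed for linear regression models and later extended to more complex models, such as generalized linear models[16] and log-linear models[17]. In 2003, Yi et al.[18] improved SSVS so that it can map multiple QTL.

(4) Stepwise regression. This is a common variable selection method in multiple linear regression. Independent variables enter the regression equation one at a time, in decreasing order of the significance of their effect on the dependent variable. Variants include forward selection, backward elimination, and stepwise selection, in which entry and removal alternate.

In addition, in 2007 Xu[19] proposed an empirical Bayes method (E-Bayes) that does not rely on Markov chain Monte Carlo (MCMC) sampling. It was applied successfully to the main and epistatic effects of agronomic traits in a barley DH population[20].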

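Several measures effectively raise the detection power of linkage analysis[21,22]: increasing population size, reducing phenotypic error, developing near-isogenic lines and chromosome segment substitution lines, and increasing marker density where appropriate. This study applies the E-Bayes approach to functional differences among alleles in natural populations. We examine three factors: the number of varieties, the contribution of candidate genes, and the polymorphism information content of alleles. For each, we measure its effect on the statistical power for candidate genes, on the accuracy and precision of parameter estimates, and on the frequency of false positives in association analysis. We also explore the conditions under which the approach applies, so as to provide technical support for mining superior genes from germplasm resources.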

1 Principles and Methods

1.1 Variable coding

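Let the number of candidate genes be g, and let the i-th candidate gene carry a_i+1 (a_i ≥ 0) alleles. Variation within this gene then has a_i degrees of freedom, which corresponds to a_i independent variables in a linear model. The g loci therefore have M = Σ_{i=1}^{g} a_i independent variables in total. The coding of the independent variables is given in Table 1.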

Table 1 Full-rank coding of the a_i+1 alleles of candidate gene i into a_i independent variables
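The body of Table 1 is not reproduced here. The sketch below shows one way to map a gene's a_i+1 alleles to a_i indicator variables. It assumes treatment (dummy) coding, a common full-rank scheme that may differ from the one in Table 1: one allele is chosen as the all-zero reference. The function name `full_rank_coding` is hypothetical.

```python
from typing import Dict, List, Sequence


def full_rank_coding(alleles: Sequence[str], reference: str) -> Dict[str, List[int]]:
    """Map each of the a_i + 1 alleles to a vector of a_i indicator variables.

    Assumed scheme (treatment coding): the reference allele is all zeros and
    every other allele has a single 1 in its own column, so the a_i columns
    are linearly independent (full rank).
    """
    if len(set(alleles)) != len(alleles) or reference not in alleles:
        raise ValueError("alleles must be unique and include the reference allele")
    others = [a for a in alleles if a != reference]
    coding = {reference: [0] * len(others)}
    for col, allele in enumerate(others):
        row = [0] * len(others)
        row[col] = 1          # this allele is "switched on" in its own column
        coding[allele] = row
    return coding
```

Take gene B with alleles B1, B2 and B3, and use B3 as the reference. This coding gives B1 → [1, 0], B2 → [0, 1] and B3 → [0, 0], that is, two independent variables for three alleles.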

1.2 Statistical model and variable coding

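Let y_j denote the mean phenotypic value of the n_j individuals of the j-th variety (j = 1, 2, …, n) in the population. Then y_j follows the statistical genetic model

$$
y_j = b_0 + \sum_{m=1}^{M} b_m x_{mj} + \sum_{k=1}^{M-1}\sum_{l=k+1}^{M} b_{kl}\, x_{kj} x_{lj} - \sum_{i=1}^{g}\sum_{c=1}^{a_i-1}\sum_{d=c+1}^{a_i} b_{cd}\, x_{cj} x_{dj} + e_j \qquad (1)
$$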

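The terms of model (1) are defined as follows:

- b₀ is the population mean.
- x_mj is the coded value of the j-th variety for the m-th variable (m = 1, 2, …, M), and x_kj and x_lj are defined in the same way.
- x_cj is the code of the j-th variety for the c-th independent variable (c = 1, 2, …, a_i) within the i-th candidate gene, and x_dj is defined in the same way.
- b_m is the main effect of the m-th variable.
- b_kl is the interaction effect between the k-th and l-th independent variables.
- b_cd is the interaction effect between the c-th and d-th of the a_i independent variables within the same candidate gene.

Interactions between independent variables within the same locus do not exist, so these terms must be removed from model (1). For example, suppose there are two candidate genes: A, with alleles A₁ and A₂, and B, with alleles B₁, B₂ and B₃. Gene A then has one independent variable, coded x₁, and gene B has two, coded x₂ and x₃. Model (1) can then be written as

$$
y_j = b_0 + b_1 x_{1j} + b_2 x_{2j} + b_3 x_{3j} + b_{12}\, x_{1j} x_{2j} + b_{13}\, x_{1j} x_{3j} + e_j
$$

The within-locus interaction b₂₃x₂ⱼx₃ⱼ has been removed. Here e_j is the residual error, assumed to be normally distributed with mean 0 and variance σ²/n_j.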
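The sketch below assembles the columns of model (1): all main-effect columns, then the product of every pair of variables that lie on different loci. Within-locus pairs are skipped. It assumes each gene's coded variables are already available as an n × a_i array. The helper `design_matrix` is hypothetical, and the intercept b₀ is left out and would be handled separately.

```python
import itertools

import numpy as np


def design_matrix(locus_blocks):
    """Assemble the columns of model (1), excluding the intercept b0.

    locus_blocks: one array per candidate gene i, shape (n, a_i), holding that
    gene's coded variables. Returns (X, labels): the M main-effect columns
    first, then x_k * x_l for every pair k < l lying on different loci.
    """
    mains, names, locus_of = [], [], []
    for i, block in enumerate(locus_blocks):
        block = np.asarray(block, dtype=float)
        for c in range(block.shape[1]):
            mains.append(block[:, c])
            names.append(f"x{len(mains)}")   # x1, x2, ... numbered across all loci
            locus_of.append(i)
    cols, labels = list(mains), list(names)
    for k, l in itertools.combinations(range(len(mains)), 2):
        if locus_of[k] != locus_of[l]:       # within-locus products (b_cd terms) are dropped
            cols.append(mains[k] * mains[l])
            labels.append(f"{names[k]}*{names[l]}")
    return np.column_stack(cols), labels
```

The A/B example above yields the columns x1, x2, x3, x1*x2 and x1*x3. With 50 biallelic loci, the same function yields 50 + 1225 = 1275 columns.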

1.3 Statistical analysis methods

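Model (1) is usually oversaturated and must be analyzed with methods designed for such models. This paper mainly follows the E-Bayes approach proposed by Xu[19] to estimate the parameters. E-Bayes does not rely on MCMC sampling, yet it can estimate the parameters of oversaturated models. The procedure has four steps. First, all variance components are obtained by maximum likelihood estimation before the Bayesian analysis. Second, these variance components are used as parameters of the prior distributions of the regression coefficients. Third, the posterior distributions of the regression coefficients are obtained by best linear unbiased prediction (BLUP). Finally, the parameter estimates are obtained from the properties of these posterior distributions.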
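The two-stage idea can be sketched in Python: estimate the variance components first, then plug them in to obtain posterior means of the coefficients. This is a simplified EM-type empirical Bayes under stated assumptions, not Xu's published algorithm. Each coefficient b_k gets a N(0, s2_k) prior. All varieties are assumed to have the same n_j, so σ²/n_j is absorbed into a single residual variance. The variance components are updated by EM rather than by Xu's marginal-likelihood search. All function and variable names are illustrative.

```python
import numpy as np


def _posterior(X, r, s2, sigma2):
    """Posterior mean and variance of b given the variance components (BLUP step)."""
    n = X.shape[0]
    V = (X * s2) @ X.T + sigma2 * np.eye(n)   # V = X D X' + sigma2 I, only n x n
    VinvX = np.linalg.solve(V, X)             # V^{-1} X
    mu = s2 * (VinvX.T @ r)                   # E(b | y) = D X' V^{-1} r
    var = s2 - s2**2 * np.einsum("ij,ij->j", X, VinvX)  # diag of Var(b | y)
    return mu, np.maximum(var, 0.0)


def ebayes_em(X, y, n_iter=300, floor=1e-10):
    """EM-type empirical Bayes sketch for y = b0 + X b + e.

    Assumed prior: b_k ~ N(0, s2_k) independently; e ~ N(0, sigma2 * I).
    Returns the intercept, posterior means and posterior variances of b.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    b0, s2, sigma2 = float(y.mean()), np.ones(p), float(y.var())
    for _ in range(n_iter):
        r = y - b0
        mu, var = _posterior(X, r, s2, sigma2)
        # EM update of sigma2 = E||r - X b||^2 / n, using the identity
        # tr(X Var(b|y) X') = sigma2 * sum_k (1 - var_k / s2_k).
        resid = r - X @ mu
        gamma = 1.0 - var / s2
        sigma2 = max(float(resid @ resid + sigma2 * gamma.sum()) / n, floor)
        # EM update of each prior variance: E(b_k^2 | y).
        s2 = np.maximum(mu**2 + var, floor)
        b0 = float(np.mean(y - X @ mu))       # intercept treated as a fixed effect
    mu, var = _posterior(X, y - b0, s2, sigma2)
    return b0, mu, var
```

The sketch works with the n × n matrix V rather than a p × p one, which keeps the cost low when the number of effects (1275 in this study) far exceeds the number of varieties. Coefficients whose prior variance is driven toward zero are shrunk to essentially nothing. How significance is then declared follows [19] and is not reproduced here.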

1.4 Simulation settings

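Suppose a natural population contains a number of varieties. For simplicity, and without loss of generality, assume that 50 candidate loci are distributed across the genome. Each locus has only two alleles and therefore only one independent variable. This gives 50×(50+1)/2 = 1275 possible effects: 50 main effects and 50×(50−1)/2 = 1225 interaction effects. Eight genetic effects were placed at random among the 1275 possible effects:

- 4 main effects;
- 1 interaction between two main-effect loci;
- 1 interaction between two loci without main effects;
- 2 interactions between a main-effect locus and a locus without a main effect.

The sizes and positions of the effects are given in Table 2.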
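The sketch below draws one simulated replicate with this structure of 8 effects. It takes the frequency of allele u (factor B) and the total contribution (factor C) as parameters. Several details are assumptions rather than the paper's settings:

- Varieties are treated as homozygous and coded 1 for allele u (frequency p_u) and 0 for allele v.
- All 8 effects have size 1 before scaling, because Table 2's values are not reproduced here.
- The total contribution is taken as Var(G)/(Var(G)+σ_e²) on the variety-mean scale.

The name `simulate_replicate` is hypothetical.

```python
import numpy as np


def simulate_replicate(n, p_u, contribution, n_loci=50, seed=None):
    """Draw one replicate: genotypes X (n x n_loci), phenotypes y, effect terms.

    Assumptions: homozygous varieties coded 1 (allele u, frequency p_u) or 0;
    all 8 effects have size 1 before scaling; contribution is
    Var(G) / (Var(G) + sigma_e2) on the variety-mean scale.
    """
    if not 0.0 < contribution <= 1.0:
        raise ValueError("contribution must lie in (0, 1]")
    rng = np.random.default_rng(seed)
    X = (rng.random((n, n_loci)) < p_u).astype(float)
    loci = [int(i) for i in rng.permutation(n_loci)]
    main, other = loci[:4], loci[4:]
    terms = [(m,) for m in main]                           # 4 main effects
    terms.append((main[0], main[1]))                       # main x main
    terms.append((other[0], other[1]))                     # non-main x non-main
    terms += [(main[2], other[2]), (main[3], other[3])]    # main x non-main
    G = sum(np.prod(X[:, list(t)], axis=1) for t in terms)  # genotypic values
    sigma_e2 = G.var() * (1.0 - contribution) / contribution
    y = G + rng.normal(0.0, np.sqrt(sigma_e2), size=n)
    return X, y, terms, sigma_e2
```

Each call to `simulate_replicate(n, p_u, C, seed=r)` gives one of the 100 replicates of a treatment. With a fixed seed, the draws are reproducible.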

Table 2 Contribution of each candidate gene

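Three experimental factors were set. Factor A is the number of varieties, with four levels: A1 = 30, A2 = 50, A3 = 70 and A4 = 100. Twenty plants were examined per variety. Factor B is the average polymorphism information content (PIC), which measures the extent of allelic variation at candidate loci among the tested varieties: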

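$$
\mathrm{PIC}_l = 1 - \sum_{u=1}^{k} p_{lu}^2 - \sum_{u=1}^{k-1}\sum_{v=u+1}^{k} 2\,p_{lu}^2\,p_{lv}^2 \qquad (2)
$$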

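In equation (2), PIC_l is the PIC value of the l-th locus, p_lu and p_lv are the frequencies of alleles u and v at that locus, and k is the total number of alleles at the locus. In this study, k = 2 for every candidate gene. The frequency of allele u was set at five levels (0.5, 0.6, 0.7, 0.8 and 0.9), and the corresponding frequencies of allele v were 0.5, 0.4, 0.3, 0.2 and 0.1. PIC values were calculated with the PIC_Calc0.6 software, giving five levels: B1 = 0.1638, B2 = 0.2688, B3 = 0.3318, B4 = 0.3648 and B5 = 0.3750. These correspond to allele-u frequencies of 0.9, 0.8, 0.7, 0.6 and 0.5, respectively. Factor C is the total contribution of the candidate genes, with three levels: C1 = 30%, C2 = 50% and C3 = 70%. The full experiment therefore comprised 4×5×3 = 60 treatments, each simulated 100 times. The formula for the contribution is given in reference [19].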
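Equation (2) can be checked with a few lines of Python. The function name `pic` is illustrative. For two alleles with allele-u frequencies 0.9, 0.8, 0.7, 0.6 and 0.5, the formula gives 0.1638, 0.2688, 0.3318, 0.3648 and 0.375.

```python
from typing import Sequence


def pic(freqs: Sequence[float]) -> float:
    """Polymorphism information content of one locus, as in equation (2)."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    k = len(freqs)
    homozygosity = sum(p * p for p in freqs)            # sum_u p_u^2
    pair_term = sum(2 * freqs[u] ** 2 * freqs[v] ** 2   # sum_{u<v} 2 p_u^2 p_v^2
                    for u in range(k) for v in range(u + 1, k))
    return 1.0 - homozygosity - pair_term


for p_u in (0.9, 0.8, 0.7, 0.6, 0.5):
    print(p_u, round(pic([p_u, 1.0 - p_u]), 4))
```

For a biallelic locus, PIC peaks at 0.375 when the two alleles are equally frequent. This is why B5 is the highest level.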

1.5 Evaluation criteria

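(1) Statistical power for candidate genes: the proportion of the 100 replicate samples in which the candidate gene was detected. (2) Accuracy and precision of the estimated effects of candidate genes. Both are computed from the effect estimates in the replicate samples where the gene was significant: accuracy is measured by the mean of these estimates, and precision by their standard deviation. (3) Frequency of false-positive candidate genes: the total number of false-positive candidate genes across the 100 replicate samples.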
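The three criteria can be computed from replicate-level results. The sketch assumes that each replicate provides a significance flag and an estimate for every candidate effect. The function name and input layout are hypothetical. The sample standard deviation (ddof = 1) is also an assumption, since the paper does not say which version is used.

```python
import numpy as np


def evaluate(detected, estimates, true_effects):
    """Power, accuracy, precision and false-positive count over replicates.

    detected:     bool array (R, E), True where effect e was significant in replicate r.
    estimates:    float array (R, E) of the corresponding effect estimates.
    true_effects: column indices of the effects that were actually simulated.
    """
    detected = np.asarray(detected, dtype=bool)
    estimates = np.asarray(estimates, dtype=float)
    true_effects = list(true_effects)
    power = detected[:, true_effects].mean(axis=0)        # share of replicates detecting each effect
    accuracy, precision = [], []
    for e in true_effects:
        hits = estimates[detected[:, e], e]               # estimates from significant replicates only
        accuracy.append(hits.mean() if hits.size else np.nan)
        precision.append(hits.std(ddof=1) if hits.size > 1 else np.nan)
    spurious = np.ones(detected.shape[1], dtype=bool)
    spurious[true_effects] = False
    false_positives = int(detected[:, spurious].sum())    # total over all replicates
    return power, np.array(accuracy), np.array(precision), false_positives
```

Multiplying `power` by 100 gives the percentage reported in the figures when R = 100.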

2 Results and Analysis

2.1 Statistical power for candidate genes

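Fig. 1 shows the following results.

(1) The statistical power for candidate genes increases steadily as the number of tested varieties, the PIC and the total contribution of candidate genes increase. The factors differ in strength: factor A has the largest effect, followed by factor B and then factor C. Whatever the level of factor B, power rises rapidly as factor A increases. When factor A is low, the levels of factor B differ little. When factor A is at its highest level, the levels of factor B differ clearly, especially B1, B2 and B3.

(2) When B is high (e.g., B5 = 0.375), the average statistical power can reach 80% even with relatively few varieties (e.g., 70) and a low total contribution (e.g., 30%). Increasing the number of varieties to 100 then raises the power above 90%.

(3) When B is at a medium level (e.g., B3 = 0.3318), 100 varieties are needed for the average statistical power to exceed 80%.

(4) When B is small (e.g., B1 = 0.1638), the statistical power stays below 50% at all three contribution levels, even with 100 varieties.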


Fig. 1 Effects of the number of varieties, PIC and total contribution rate on statistical power. In a-c, ○, △, +, × and ◇ represent B1, B2, B3, B4 and B5, respectively; in d-k, ○, △, + and × represent A1-A4, respectively.

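These results suggest that varieties should be chosen first to give a sufficient number. Allelic variation within each candidate gene should then be as large as possible to achieve good detection. With few varieties and little allelic variation, a candidate gene is hard to detect even if it has a genetic effect on the phenotype. This is especially true when the vast majority of varieties carry the same allele. The simulation also shows that, as long as the PIC is not too small, about 70 varieties are usually enough to detect candidate genes of moderate contribution with high probability.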

2.2 Accuracy and precision of estimated effects of candidate genes

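Fig. 2 shows that effect estimates for candidate genes become more accurate and precise as the total contribution, the number of varieties and the PIC increase. PIC has the greatest influence on accuracy and precision, followed by the number of varieties, while the total contribution has a comparatively small influence. For example, at a total contribution of 30%, the accuracy and precision for candidate gene 10 improve markedly and steadily as PIC increases, for a given number of varieties. The same trend holds at a total contribution of 70%. However, the means and standard errors of the estimates differ little between these two contribution levels.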


Fig. 2 Means and standard deviations of the estimated effect of each candidate gene under levels C1-C3. ○, △, +, × and ◇ indicate estimates under PIC levels B1-B5; vertical bars represent the standard deviations of the corresponding estimates.

2.3 Frequency of false-positive candidate genes

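Under the simulation settings of this study, false-positive candidate genes appeared only at the two lowest PIC levels, B1 and B2. They occurred markedly more often at B1 than at B2 (Table 3). PIC is therefore the main factor affecting the frequency of false-positive candidate genes.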

Table 3 Frequency of spurious (false-positive) candidate genes

3 Real data analysis

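The real data are taken from reference [8] and cover 118 rice varieties. Eighteen starch-synthesis-related genes expressed in rice endosperm were chosen as target genes. Forty-three molecular markers were designed from genome sequencing of all quality-related genes in 13 varieties representative for quality traits, including Nipponbare, 9311 and Guichao 2. All 118 varieties were genotyped, and the target traits were the RVA viscosity properties. Only the pasting temperature (PT) data are analyzed here, as an example. The E-Bayes method detected four markers significantly associated with rice pasting temperature (Fig. 3): AGPlar-1, SSII3-2, Pul-1 and SSII3-3. Pul-1 and SSII3-3 had both main effects and an interaction effect. Their main effects were −2.12 and 1.11, respectively, and their interaction effect was −1.26. AGPlar-1 and SSII3-2 had only an interaction effect, of size 1.10.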

Fig. 3 Candidate genes controlling rice pasting temperature (PT) and their effects

4 Discussion

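Genetic variation is widespread in all organisms and plays an important role in regulating gene expression. A variant in a non-coding region can strengthen or weaken the binding of that site to binding proteins, and so promote or suppress the expression of downstream genes. A variant in a coding region can affect the phenotype directly, through the functional protein encoded by the variant allele. It can also act indirectly, by encoding a protein factor that promotes or inhibits the trait. Advances in molecular biology and genome sequencing now make it easy to sequence relevant genes and obtain large amounts of sequence variation. Linking these variant sites to phenotypes has become a focus of current genetic research.

In this study, the ratio of possible effects to varieties was at most 42.5 (1275/30) and at least 12.75 (1275/100). Oversaturated models like these, with far more possible effects than varieties, are generally handled by variable selection and shrinkage estimation. The E-Bayes method proposed by Xu[19] combines maximum likelihood estimation (MLE) with BLUP, and its main goal is to estimate the regression coefficients rather than the variances. E-Bayes also estimates the posterior means of all regression coefficients simultaneously, so each coefficient is estimated independently. This improves estimation efficiency for small samples. We evaluated E-Bayes by simulation and obtained good results.

For comparison, we also considered several other methods:

- TASSEL association analysis software can analyze only the main effects of loci, not the interactions between them. Its power for the four main-effect candidate genes was 94%, 93%, 67% and 57%, respectively.
- Stepwise regression is the most direct approach to model selection and converges relatively quickly. It detected all eight effects set in this treatment with fairly high power, but its effect estimates were slightly less precise than those of E-Bayes.
- LASSO resembles Bayesian methods but, like E-Bayes, does not rely on MCMC sampling, which greatly shortens computing time. In the LASSO simulations all eight preset effects were detected, but false positives were frequent and the effect estimates had low accuracy and precision.
- PENAL also avoids MCMC sampling and follows a maximum likelihood approach instead, so it runs quickly. Its detection performance for this treatment was poor: it detected only the four candidate genes with larger effects, with low power and poor accuracy and precision. This is possibly because the number of model variables far exceeded the sample size.

Whether a candidate gene (or an interaction between genes) is detected, and how often, depends on its contribution: the larger the contribution, the higher the detection probability. For example, at a total contribution of 70% with 30 varieties, none of the four effects with smaller contributions was detected. In contrast, candidate genes 10 and 20, which had the largest contributions, were detected at all five PIC levels. The model constructed here has a high detection rate. When average allelic polymorphism is not too small, only 70 varieties are needed to detect genes of moderate contribution. Allelic polymorphism is therefore the main factor in statistical genetic analysis of functional differences among alleles. A larger number of varieties and a higher contribution also strongly influence the statistical power for candidate genes and the accuracy and precision of the estimated effects.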

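The model proposed here suits natural populations with distant kinship and no obvious population structure. It has clear advantages for association analysis: its detection rate is high, and it can analyze interactions between loci. Because it analyzes all loci simultaneously, random error is reduced and the association analysis is more accurate and precise. Common QTL mapping and association analysis methods, by contrast, usually test loci one at a time. The error term is then confounded with the effects of many minor polygenes and differs from test to test, which lowers accuracy and precision. The model-selection approach presented here screens variables across all loci genome-wide, which makes it of considerable practical value. Future work will refine the model to remove the influence of population structure and kinship on the results. This would make it applicable to any natural population and further broaden its scope.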

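5 Conclusions

The polymorphism information content of alleles is the main factor in statistical genetic analysis of functional differences among alleles. A larger number of varieties and a higher contribution also strongly influence the statistical power for candidate genes and the accuracy and precision of the estimated effects.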

The authors have declared that no competing interests exist.


References

[1]	Galton F. Regression towards mediocrity in hereditary stature. J Anthropol Inst Great Brit Ireland, 1886, 15: 246-263
[2]	Guo M, Yang S, Rupe M, Hu B, Bickel D R, Arthur L, Smith O. Genomewide allele-specific expression analysis using Massively Parallel Signature Sequencing (MPSS™) reveals cis- and trans-effects on gene expression in maize hybrid meristem tissue. Plant Mol Biol, 2008, 66: 551-563
[3]	Schaart J G, Mehli L, Schouten H J. Quantification of allele-specific expression of a gene encoding strawberry polygalacturonase-inhibiting protein (PGIP) using pyrosequencing. Plant J, 2005, 41: 493-500
[4]	Yoon M Y, Moe K T, Kim D Y, Rho I R, Kim S, Kim K T, Won M K, Chung J W, Park Y J. Genetic diversity and population structure analysis of strawberry (Fragaria × ananassa Duch.) using SSR markers. Electr J Biotechnol, 2012, 15(2): 5
[5]	Adams K L, Cronn R, Percifield R, Wendel J F. Genes duplicated by polyploidy show unequal contributions to the transcriptome and organ-specific reciprocal silencing. Proc Natl Acad Sci USA, 2003, 100: 4649-4654
[6]	Kolev S, Ganeva G, Christov N, Belchev I, Kostov K, Tsenov N, Rachovska G, Landjeva S, Ivanov M, Abu-Mhadi N, Todorovska E. Allele variation in loci for adaptive response and plant height and its effect on grain yield in wheat. Biotechnol Biotechnol Equip, 2010, 24: 1807-1813
[7]	Kolev S, Vassilev D, Kostov K, Todorovska E. Allele variation in loci for adaptive response in Bulgarian wheat cultivars and landraces and its effect on heading date. Plant Genet Resour Char Util, 2011, 9: 251-255
[8]	Xie H L. The Foundation of Molecular Markers Correlated with Rice Starch and Preliminary Detection of Its Genetic Network. MS Thesis of Yangzhou University, 2007 (in Chinese with English abstract)
[9]	Soller M, Brody T. On the power of experimental designs for the detection of linkage between marker loci and quantitative loci in crosses between inbred lines. Theor Appl Genet, 1976, 47: 35-39
[10]	Lander E S, Botstein D. Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics, 1989, 121: 185-199
[11]	Zeng Z B. Precision mapping of quantitative trait loci. Genetics, 1994, 136: 1457-1468
[12]	Li H H, Ye G Y, Wang J K. A modified algorithm for the improvement of composite interval mapping. Genetics, 2007, 175: 361-374
[13]	Zeng Z B, Kao C, Basten C J. Estimating the genetic architecture of quantitative traits. Genet Res, 2000, 74: 279-289
[14]	Zhang Y M, Xu S. A penalized maximum likelihood method for estimating epistatic effects of QTL. Heredity, 2005, 95: 96-104
[15]	Cohen R A. Introducing the glmselect procedure for model selection. Statist & Data Anal, 31: 207-231
[16]	Robin M, David D. Two-level stochastic search variable selection in GLMs with missing predictors. Int J Biostat, 2010, 6(1): 33
[17]	Ntzoufras I, Forster J J, Dellaportas P. Stochastic search variable selection for log-linear models. J Stat Comput Sim, 2000, 68: 23-37
[18]	Yi N, George V, Allison D B. Stochastic search variable selection for identifying multiple quantitative trait loci. Genetics, 2003, 164: 1129-1138
[19]	Xu S. An empirical Bayes method for estimating epistatic effects of quantitative trait loci. Biometrics, 2007, 63: 513-521
[20]	Xu S, Jia Z. Genomewide analysis of epistatic effects for quantitative traits in barley. Genetics, 2007, 175: 1955-1963
[21]	Li H H, Hearne S, Bänziger M, Li Z, Wang J. Statistical properties of QTL linkage mapping in biparental genetic populations. Heredity, 2010, 105: 257-267
[22]	Li H H, Zhang L Y, Wang J K. The analysis and solution of some common questions in quantitative traits QTL mapping. Acta Agron Sin, 2010, 36: 918-931 (in Chinese with English abstract)
