Title: | Multi-Parent Population QTL Analysis |
---|---|
Description: | Analysis of experimental multi-parent populations to detect regions of the genome (called quantitative trait loci, QTLs) influencing phenotypic traits measured in unique and multiple environments. The population must be composed of crosses between a set of at least three parents (e.g. factorial design, 'diallel', or nested association mapping). The functions cover data processing, QTL detection, and results visualization. The implemented methodology is described in Garin, Wimmer, Mezmouk, Malosetti and van Eeuwijk (2017) <doi:10.1007/s00122-017-2923-3>, in Garin, Malosetti and van Eeuwijk (2020) <doi: 10.1007/s00122-020-03621-0>, and in Garin, Diallo, Tekete, Thera, ..., and Rami (2024) <doi: 10.1093/genetics/iyae003>. |
Authors: | Vincent Garin [aut, cre] , Valentin Wimmer [aut], Dietrich Borchardt [ctb, dtc], Fred van Eeuwijk [ctb, ths], Marcos Malosetti [ctb, ths] |
Maintainer: | Vincent Garin <[email protected]> |
License: | GPL-3 |
Version: | 1.5.0 |
Built: | 2024-11-06 04:58:07 UTC |
Source: | https://github.com/vincentgarin/mppr |
This function combines all raw data sources in a single data object of class
mppData
.
create.mppData( geno.off = NULL, geno.par = NULL, map = NULL, pheno = NULL, cross.ind = NULL, par.per.cross = NULL )
create.mppData( geno.off = NULL, geno.par = NULL, map = NULL, pheno = NULL, cross.ind = NULL, par.per.cross = NULL )
geno.off |
|
geno.par |
|
map |
Three columns |
pheno |
|
cross.ind |
|
par.per.cross |
Three columns |
a list
of class mppData
which contains the following elements
geno.off |
|
geno.par |
|
pheno |
|
map |
|
cross.ind |
Cross indicator. |
par.per.cross |
|
The list
also contain other arguments that will be filled later in
the data processing.
Vincent Garin
data(USNAM_geno) data(USNAM_map) data(USNAM_pheno) geno.off <- USNAM_geno[7:506, ] geno.par <- USNAM_geno[1:6, ] map <- USNAM_map pheno <- USNAM_pheno cross.ind <- substr(rownames(pheno), 1, 4) par.per.cross <- cbind(unique(cross.ind), rep("B73", 5), rownames(geno.par)[2:6]) mppData <- create.mppData(geno.off = geno.off, geno.par = geno.par, map = map, pheno = pheno, cross.ind = cross.ind, par.per.cross = par.per.cross)
data(USNAM_geno) data(USNAM_map) data(USNAM_pheno) geno.off <- USNAM_geno[7:506, ] geno.par <- USNAM_geno[1:6, ] map <- USNAM_map pheno <- USNAM_pheno cross.ind <- substr(rownames(pheno), 1, 4) par.per.cross <- cbind(unique(cross.ind), rep("B73", 5), rownames(geno.par)[2:6]) mppData <- create.mppData(geno.off = geno.off, geno.par = geno.par, map = map, pheno = pheno, cross.ind = cross.ind, par.per.cross = par.per.cross)
Partition the genotype indices into training and validation sets for cross-validation (CV).
CV_partition(cross.ind, k = 5)
CV_partition(cross.ind, k = 5)
cross.ind |
|
k |
|
The genotype indices are randomly assigned within cross to k subsets (folds). Then each subset is used once as validation set, the remaining data go in the training set.
Return:
fold |
|
Vincent Garin
data(mppData) part.cv <- CV_partition(cross.ind = mppData$cross.ind, k = 5) part.cv[[1]]$train.set part.cv[[1]]$val.set
data(mppData) part.cv <- CV_partition(cross.ind = mppData$cross.ind, k = 5) part.cv[[1]]$train.set part.cv[[1]]$val.set
Determine the connected parts of a MPP design using the method of Weeks and Williams (1964) and the package igraph.
design_connectivity(par_per_cross, plot_des = TRUE, output_loc = NULL)
design_connectivity(par_per_cross, plot_des = TRUE, output_loc = NULL)
par_per_cross |
Three columns |
plot_des |
|
output_loc |
Path where the plot of the design will be saved if the argument is given. Default = NULL. |
Return a list with each element representing one connected part of the design and the list of parents contained in this part.
If plot_des = TRUE
and output_loc
has been specified. A plot
of the graph (con_plot.pdf) will be saved at the specified location.
Vincent Garin
Weeks, D. L., & Williams, D. R. (1964). A note on the determination of connectedness in an N-way cross classification. Technometrics, 6(3), 319-324.
data(mppData) par_per_cross <- mppData$par.per.cross con.part <- design_connectivity(par_per_cross)
data(mppData) par_per_cross <- mppData$par.per.cross con.part <- design_connectivity(par_per_cross)
Determine the effect of environmental covariates (EC) on the mean of a trait across environments and the time range where this effect is the strongest. The procedure was originally proposed by Li et al. (2018).
EC_effect( trait_env_mean, crop_duration, EC_list, type, min_win = 20, sel_criteria = "global", plot = TRUE, plot_dir = NULL, p_title = "EC_plot", env_nm = NULL )
EC_effect( trait_env_mean, crop_duration, EC_list, type, min_win = 20, sel_criteria = "global", plot = TRUE, plot_dir = NULL, p_title = "EC_plot", env_nm = NULL )
trait_env_mean |
vector of trait mean over environment. |
crop_duration |
numerical value indicating the crop duration. |
EC_list |
list EC parameter matrix. one per environment. The order of the environment must be the same as the one of the trait mean. |
type |
character string vector indicating the type of statistic that correspond to the EC. Either the cumulated sum ('sum') or the average ('mean'). |
min_win |
Numerical value indicating the minimum size of the range between start and end day when the EC values are measured. Default = 20 |
sel_criteria |
Character specifying the selection criteria. Default = 'global' |
plot |
Logical value indicating if a plot of the EC effects over time should be returned. Default = FALSE, |
plot_dir |
Directory where the plot should be returned. Default = NULL |
p_title |
Title of the plot. Default = 'EC_plot' |
env_nm |
Optional vector of environment name. Default = NULL. |
Return:
data.frame
that contains the following elements for each EC (line).
The first line is the value of the EC in the different environments:
Start and end date of the optimal window
R2 of correlation between trait and EC
Direction of the correlation
Average R2 value over all tested windows
EC value in the different environments for the optimal time window
Vincent Garin
Li, X., Guo, T., Mu, Q., Li, X., & Yu, J. (2018). Genomic and environmental determinants and their interplay underlying phenotypic plasticity. Proceedings of the National Academy of Sciences, 115(26), 6679-6684.
mppData
objectsThe function first converts genotype data into ABH format. Then it calculates within cross identical by descent (IBD) probabilities.
IBD.mppData( mppData, het.miss.par = TRUE, subcross.ind = NULL, par.per.subcross = NULL, type, F.gen = NULL, BC.gen = NULL, type.mating = NULL, error.prob = 1e-04, map.function = "haldane" )
IBD.mppData( mppData, het.miss.par = TRUE, subcross.ind = NULL, par.per.subcross = NULL, type, F.gen = NULL, BC.gen = NULL, type.mating = NULL, error.prob = 1e-04, map.function = "haldane" )
mppData |
An object of class |
het.miss.par |
|
subcross.ind |
Optional |
par.per.subcross |
Optional three columns |
type |
|
F.gen |
|
BC.gen |
|
type.mating |
|
error.prob |
|
map.function |
|
The function first transforms genotype data into within cross ABH format. The function takes the parents of the different cross as reference and assigns the following scores: "A" if the offspring score is equivalent to parent 1; "B" if it is equivalent to parent 2; "H" if it is heterozygous. The function attributes NA for missing when: 1) the offspring score is missing; 2) the two parents have the same score; or 3) when at least one parental score is missing.
If a parent score is heterozygous or missing (het.miss.par = TRUE
),
the assignment rules are the following. If the two parents are
heterozygous or one parent is heterozygous and the other missing, the
offspring get NA since the parental origin can not be inferred with certainty.
If one parent is heterozygous or missing and the second parent is
homozygous, the function looks at offspring segregating pattern to
infer which allele was transmitted by the heterozygous parent. If this is
possible we consider the heteroxzygous parent as homozygous for the
transmitted allele and use this allele score for ABH assignment.
The ABH assignment can be performed using sub-cross structure providing
information about sub-cross in arguments subcross.ind
and
par.per.subcross
.
Then the function calculates the IBD probabilities using read.cross()
and calc.genoprob()
functions from the R/qtl package
(Broman et al. 2009).
The type of population must be specified in argument type. Different
population types are possible: F-type ('F'), back-cross ('BC'), backcross
followed by selfing ('BCsFt'), double haploid ('DH'), and recombinant imbred
lines ('RIL'). The number of F and BC generations can be specified using
F.gen
and BC.gen
. The argument type.mating
specifies if
F and RIL populations were obtained by selfing or by sibling mating.
DH and RIL populations are read as back-cross by R/qtl. For these two population types, heterozygous scores will be treated as missing values.
an increased mppData
object containing the the same elements
as the mppData
object provided as argument and the
following new elements:
geno.IBD |
A R/qtl |
n.zigo |
|
type |
|
Vincent Garin
Broman KW, Wu H, Sen S, Churchill GA (2003) R/qtl: QTL mapping in experimental crosses. Bioinformatics 19:889-890.
Broman, K. W., & Sen, S. (2009). A Guide to QTL Mapping with R/qtl (Vol. 46). New York: Springer.
create.mppData
, QC.mppData
,
IBS.mppData
data(mppData_init) mppData <- QC.mppData(mppData_init) mppData <- IBS.mppData(mppData = mppData) mppData <- IBD.mppData(mppData = mppData, het.miss.par = TRUE, type = 'RIL', type.mating = 'selfing')
data(mppData_init) mppData <- QC.mppData(mppData_init) mppData <- IBS.mppData(mppData = mppData) mppData <- IBD.mppData(mppData = mppData, het.miss.par = TRUE, type = 'RIL', type.mating = 'selfing')
mppData
objectsTransform the genotype marker matrix of a mppData
object into
Identical by state (IBS) 0, 1, 2 format. The IBS score represent the number
of copies of the minor allele.
IBS.mppData(mppData)
IBS.mppData(mppData)
mppData |
An object of class |
an increased mppData
object containing the the same elements
as the mppData
object provided as argument and the
following new elements:
geno.IBS |
Marker |
allele.ref |
|
Vincent Garin
data(mppData_init) mppData <- QC.mppData(mppData_init) mppData <- IBS.mppData(mppData = mppData)
data(mppData_init) mppData <- QC.mppData(mppData_init) mppData <- IBS.mppData(mppData = mppData)
Build a single position QTL incidences matrix.
inc_mat_QTL(x, mppData, Q.eff, order.MAF = FALSE, ref_par = NULL)
inc_mat_QTL(x, mppData, Q.eff, order.MAF = FALSE, ref_par = NULL)
x |
|
mppData |
An object of class |
Q.eff |
|
order.MAF |
|
ref_par |
Optional |
Return:
QTL.mat |
QTL incidence matrix. For the cross-specific model, it represents the difference between the number of allele from parent 2 or B and parent 1 or A divided by two. For parental (ancestral) model it represents the expected number of parental (ancestral) allele copies. For the bi-allelic model, it represents the number of copies of the least frequent allele. |
Vincent Garin
data(mppData) QTLmatCr <- inc_mat_QTL(x = 2, mppData = mppData, Q.eff = "cr") QTLmatPar <- inc_mat_QTL(x = 2, mppData = mppData, Q.eff = "par") QTLmatAnc <- inc_mat_QTL(x = 2, mppData = mppData, Q.eff = "anc") QTLmatBi <- inc_mat_QTL(x = 2, mppData = mppData, Q.eff = "biall")
data(mppData) QTLmatCr <- inc_mat_QTL(x = 2, mppData = mppData, Q.eff = "cr") QTLmatPar <- inc_mat_QTL(x = 2, mppData = mppData, Q.eff = "par") QTLmatAnc <- inc_mat_QTL(x = 2, mppData = mppData, Q.eff = "anc") QTLmatBi <- inc_mat_QTL(x = 2, mppData = mppData, Q.eff = "biall")
Performs a backward elimination using a list of given QTLs positions. The
positions with a p-value above the significance level alpha
, are
successively removed.
mpp_back_elim(mppData, trait = 1, QTL = NULL, Q.eff = "cr", alpha = 0.05)
mpp_back_elim(mppData, trait = 1, QTL = NULL, Q.eff = "cr", alpha = 0.05)
mppData |
An object of class |
trait |
|
QTL |
Object of class |
Q.eff |
|
alpha |
|
The function starts with all QTL positions in the model and test the inclusion
of each position as the last in the model. If all position p-values are below
alpha
the procedure stop. If not the position with the highest p-value
is remove and the procedure continue until there is no more unsignificant
position.
Return:
QTL |
|
Vincent Garin
data(mppData) SIM <- mpp_SIM(mppData) QTL <- QTL_select(SIM) QTL.sel <- mpp_back_elim(mppData = mppData, QTL = QTL)
data(mppData) SIM <- mpp_SIM(mppData) QTL <- QTL_select(SIM) QTL.sel <- mpp_back_elim(mppData = mppData, QTL = QTL)
Compute QTL models along the genome using cofactors representing other genetic positions for control.
mpp_CIM( mppData, trait = 1, Q.eff = "cr", cofactors = NULL, window = 20, plot.gen.eff = FALSE, n.cores = 1 )
mpp_CIM( mppData, trait = 1, Q.eff = "cr", cofactors = NULL, window = 20, plot.gen.eff = FALSE, n.cores = 1 )
mppData |
An object of class |
trait |
|
Q.eff |
|
cofactors |
Object of class |
window |
|
plot.gen.eff |
|
n.cores |
|
For more details about the different models, see documentation of the
function mpp_SIM
. The function returns a -log10(p-value) QTL
profile.
Return:
CIM |
|
Vincent Garin
# Cross-specific effect model ############################# data(mppData) SIM <- mpp_SIM(mppData = mppData, Q.eff = "cr") cofactors <- QTL_select(Qprof = SIM, threshold = 3, window = 20) CIM <- mpp_CIM(mppData = mppData, Q.eff = "cr", cofactors = cofactors, window = 20, plot.gen.eff = TRUE) plot(x = CIM) plot(x = CIM, gen.eff = TRUE, mppData = mppData, Q.eff = "cr") # Bi-allelic model ################## cofactors <- mppData$map[c(15, 63), 1] CIM <- mpp_CIM(mppData = mppData, Q.eff = "biall", cofactors = cofactors, window = 20) plot(x = CIM, type = "h")
# Cross-specific effect model ############################# data(mppData) SIM <- mpp_SIM(mppData = mppData, Q.eff = "cr") cofactors <- QTL_select(Qprof = SIM, threshold = 3, window = 20) CIM <- mpp_CIM(mppData = mppData, Q.eff = "cr", cofactors = cofactors, window = 20, plot.gen.eff = TRUE) plot(x = CIM) plot(x = CIM, gen.eff = TRUE, mppData = mppData, Q.eff = "cr") # Bi-allelic model ################## cofactors <- mppData$map[c(15, 63), 1] CIM <- mpp_CIM(mppData = mppData, Q.eff = "biall", cofactors = cofactors, window = 20) plot(x = CIM, type = "h")
Evaluation of MPP QTL detection procedure by cross-validation (CV).
mpp_CV( pop.name = "MPP_CV", trait.name = "trait1", mppData, trait = 1, her = 1, Rep = 10, k = 5, Q.eff = "cr", thre.cof = 3, win.cof = 50, N.cim = 1, window = 20, thre.QTL = 3, win.QTL = 20, backward = TRUE, alpha.bk = 0.05, n.cores = 1, verbose = TRUE, output.loc )
mpp_CV( pop.name = "MPP_CV", trait.name = "trait1", mppData, trait = 1, her = 1, Rep = 10, k = 5, Q.eff = "cr", thre.cof = 3, win.cof = 50, N.cim = 1, window = 20, thre.QTL = 3, win.QTL = 20, backward = TRUE, alpha.bk = 0.05, n.cores = 1, verbose = TRUE, output.loc )
pop.name |
|
trait.name |
|
mppData |
An object of class |
trait |
|
her |
|
Rep |
|
k |
|
Q.eff |
|
thre.cof |
|
win.cof |
|
N.cim |
|
window |
|
thre.QTL |
|
win.QTL |
|
backward |
|
alpha.bk |
|
n.cores |
|
verbose |
|
output.loc |
Path where a folder will be created to save the results. |
For details on the MPP QTL detection models see mpp_SIM
documentation. The CV scheme is adapted from Utz et al. (2000) to the MPP
context. A single CV run works like that:
Generation of a k-fold partition of the data. The partition is done within crosses. Each cross is divided into k subsets. Then for the kth repetition, the kth subset is used as validation set, the rest goes into the training set.
For the kth repetition, utilization of the training set for cofactor
selection and multi-QTL model determination (mpp_SIM
and
mpp_CIM
). If backward = TRUE
, the final list of QTLs is
tested simultaneously using a backward elimination
(mpp_back_elim
).
Use the list of detected QTLs in the training set to calculate
the proportion of genetic variance explained by all detected QTLs in the
training set (p.ts = R2.ts/h2). Where R2.ts is the adjusted
R squared and h2 is the average within cross heritability (her
). By
default, her = 1, which mean that
For each single QTL effect, difference partial R squared are also
calculated. Difference R squared are computed by doing the difference between
a model with all QTLs and a model without the ith position. For details about R
squared computation and adjustment look at QTL_R2
.
Use the estimates of the QTL effects in the training set (B.ts) to
predict the phenotypic values of the validation set. y.pred.vs = X.vs*B.ts.
Computes the predicted R squared in the validation set using the squared
Pearson correlation coefficient between the real values (y.vs) and the
predicted values (y.pred.vs). R2.vs = cor(y.ts,y.pred.ts)^2. Then
the predicted genetic variance in the validation set (p.vs) is equal to
p.vs = R2.vs/h2. For heritability correction, the user can provide a single
value for the average within cross heritability or a vector specifying each
within cross heritability. By default, her = 1
, which means that the
results represent the proportion of phenotypic variance explained (predicted)
in the training (validation) sets.
The predicted R squared is computed per cross and then averaged at the population level (p.ts). Both results are returned. Partial QTL predicted R squared are also calculated using the difference between the predicted R squared using all QTL and the predicted R squared without QTL i. The bias between p.ts and p.vs is calculated as bias = 1 - (p.vs/p.ts).
List
containing the following results items:
CV_res |
|
p.vs.cr |
|
QTL |
|
QTL.profiles |
|
The results elements return as R object are also saved as text
files at the specified output location (output.loc
). A transparency
plot of the CV results (plot.pdf) is also saved.
Vincent Garin
Utz, H. F., Melchinger, A. E., & Schon, C. C. (2000). Bias and sampling error of the estimated proportion of genotypic variance explained by quantitative trait loci determined from experimental data in maize using cross validation and validation with independent samples. Genetics, 154(4), 1839-1849.
mpp_back_elim
,
mpp_CIM
,
mpp_perm
,
mpp_SIM
,
QTL_R2
## Not run: data(mppData) # Specify a location where your results will be saved my.loc <- tempdir() CV <- mpp_CV(pop.name = "USNAM", trait.name = "ULA", mppData = mppData, her = .4, Rep = 1, k = 3, verbose = FALSE, output.loc = my.loc) ## End(Not run)
## Not run: data(mppData) # Specify a location where your results will be saved my.loc <- tempdir() CV <- mpp_CV(pop.name = "USNAM", trait.name = "ULA", mppData = mppData, her = .4, Rep = 1, k = 3, verbose = FALSE, output.loc = my.loc) ## End(Not run)
Multi-parent population QTL analysis model using a forward regression.
mpp_forward( pop.name = "MPP", trait.name = "trait1", mppData, trait = 1, Q.eff = "cr", ref.par = NULL, sum_zero = FALSE, threshold = 4, window = 30, backward = TRUE, alpha.bk = 0.05, plot.Qprof = FALSE, plot.gen.eff = FALSE, CI = FALSE, drop = 1.5, text.size = 18, n.cores = 1, verbose = TRUE, output.loc )
mpp_forward( pop.name = "MPP", trait.name = "trait1", mppData, trait = 1, Q.eff = "cr", ref.par = NULL, sum_zero = FALSE, threshold = 4, window = 30, backward = TRUE, alpha.bk = 0.05, plot.Qprof = FALSE, plot.gen.eff = FALSE, CI = FALSE, drop = 1.5, text.size = 18, n.cores = 1, verbose = TRUE, output.loc )
pop.name |
|
trait.name |
|
mppData |
An object of class |
trait |
|
Q.eff |
|
ref.par |
Optional |
sum_zero |
Optional |
threshold |
|
window |
|
backward |
|
alpha.bk |
|
plot.Qprof |
|
plot.gen.eff |
|
CI |
|
drop |
|
text.size |
|
n.cores |
|
verbose |
|
output.loc |
Path where a folder will be created to save the results. By default the function uses the current working directory. |
The function run a full MPP QTL detection using models with different possible
assumptions concerning the number of alleles at the QTL position. For more details about
the different models, see documentation of the function mpp_SIM
.
The procedure is the following:
Forward regression to determine the a multi-QTL model. The function
selects successively QTL positions with -log10(pval) above the threshold.
Those positions are added as cofactors for following detection run.
The procedure stop when no more position has a -log10(pval) above the
threshold (QTL_forward
).
If backward = TRUE
, backward elimination on the final
list of detected QTLs.
Estimation of the QTL genetic effects and R squared statistics
(QTL_gen_effects
and QTL_R2
).
If CI = TRUE
, confidence interval calculation
based on a CIM- (composite interval mapping removing all cofactors on the
scanned chromosome) of the last run of the forward regression.
If plot.Qprof = TRUE
, plot of the last run of the forward
regression using plot.QTLprof
.
If plot.gen.eff = TRUE
, plot of the genetic effect distribution
along the genome of the last run of the forward regression using
plot.QTLprof
.
Return:
List containing the following items:
n.QTL |
Number of detected QTLs. |
QTL |
|
R2 |
|
QTL.effects |
|
QTL.CI |
If |
Some output files are also saved at the specified location
(output.loc
):
A QTL report (QTL_REPORT.txt) with: 1) the number of detected QTLs;
2) the global R squared statistics; 3) for each QTL, position information
(plus confidence interval if CI = TRUE
) and estimated QTL genetic
effects per cross or parents (for details see QTL_gen_effects
).
The list of QTL (QTL.txt).
The QTL R squared statistics (QTL_R2.txt) (for details see
QTL_R2
).
If CI = TRUE
, the QTL confidence intervals (QTL_CI.txt).
General results of the QTL detection process: number of QTLs and global adjusted and non-adjusted R squared statistics (QTL_genResults.txt).
If plot.Qprof = TRUE
, the plot of the last regression run
(QTL_profile.pdf). If plot.gen.eff = TRUE
, plot of the genetic
effects per cross or parents (gen_eff.pdf) with dashed lines representing
the QTL positions. For more details see plot.QTLprof
Vincent Garin
mpp_SIM
, plot.QTLprof
,
QTL_gen_effects
, QTL_forward
, QTL_R2
## Not run: data(mppData) # Specify a location where your results will be saved my.loc <- "C:/.../..." # Cross-specific model USNAM_cr <- mpp_forward(pop.name = "USNAM", trait.name = "ULA", mppData = mppData, plot.gen.eff = TRUE, plot.Qprof = TRUE, CI = TRUE, output.loc = my.loc) ## End(Not run)
## Not run: data(mppData) # Specify a location where your results will be saved my.loc <- "C:/.../..." # Cross-specific model USNAM_cr <- mpp_forward(pop.name = "USNAM", trait.name = "ULA", mppData = mppData, plot.gen.eff = TRUE, plot.Qprof = TRUE, CI = TRUE, output.loc = my.loc) ## End(Not run)
Determination of an empirical null distribution of the QTL significance threshold for a MPP QTL analysis using permutation test (Churchill and Doerge, 1994).
mpp_perm( mppData, trait = 1, Q.eff = "cr", N = 1000, q.val = 0.95, verbose = TRUE, n.cores = 1 )
mpp_perm( mppData, trait = 1, Q.eff = "cr", N = 1000, q.val = 0.95, verbose = TRUE, n.cores = 1 )
mppData |
An object of class |
trait |
|
Q.eff |
|
N |
Number of permutations. Default = 1000. |
q.val |
Single |
verbose |
|
n.cores |
|
Performs N permutations of the trait data and
computes each time a genome-wide QTL profile. For every run, it stores the
highest -log10(p-val). These values can be used to build a null distribution
for the QTL significance threshold. Quantile values can be determined from
the previous distribution. For more details about the different possible
models and their assumptions see mpp_SIM
documentation.
Return:
List
with the following object:
max.pval |
Vector of the highest genome-wide -log10(p-values). |
q.val |
Quantile values from the QTL significance threshold null distribution. |
seed |
|
Vincent Garin
Churchill, G. A., & Doerge, R. W. (1994). Empirical threshold values for quantitative trait mapping. Genetics, 138(3), 963-971.
data(mppData) Perm <- mpp_perm(mppData = mppData, Q.eff = "cr", N = 5)
data(mppData) Perm <- mpp_perm(mppData = mppData, Q.eff = "cr", N = 5)
Multi-parent population QTL analysis.
mpp_proc( pop.name = "MPP", trait.name = "trait1", mppData, trait = 1, Q.eff = "cr", plot.gen.eff = FALSE, thre.cof = 3, win.cof = 50, N.cim = 1, window = 20, thre.QTL = 3, win.QTL = 20, backward = TRUE, alpha.bk = 0.05, ref.par = NULL, sum_zero = FALSE, CI = FALSE, drop = 1.5, text.size = 18, n.cores = 1, verbose = TRUE, output.loc )
mpp_proc( pop.name = "MPP", trait.name = "trait1", mppData, trait = 1, Q.eff = "cr", plot.gen.eff = FALSE, thre.cof = 3, win.cof = 50, N.cim = 1, window = 20, thre.QTL = 3, win.QTL = 20, backward = TRUE, alpha.bk = 0.05, ref.par = NULL, sum_zero = FALSE, CI = FALSE, drop = 1.5, text.size = 18, n.cores = 1, verbose = TRUE, output.loc )
pop.name |
|
trait.name |
|
mppData |
An object of class |
trait |
|
Q.eff |
|
plot.gen.eff |
|
thre.cof |
|
win.cof |
|
N.cim |
|
window |
|
thre.QTL |
|
win.QTL |
|
backward |
|
alpha.bk |
|
ref.par |
Optional |
sum_zero |
Optional |
CI |
|
drop |
|
text.size |
|
n.cores |
|
verbose |
|
output.loc |
Path where a folder will be created to save the results. |
The function run a full MPP QTL detection using models with different possible
assumptions concerning the number of alleles at the QTL position. For more
details about the different models, see documentation of the function
mpp_SIM
. The procedure is the following:
Simple interval mapping (SIM) to select cofactor
(mpp_SIM
).
Composite interval mapping (CIM) with selected cofactors
(mpp_CIM
).
Optional backward elimination on the list of QTL
candidates (backward = TRUE
) (mpp_back_elim
).
Computation of the QTL genetic effects (QTL_gen_effects
)
and proportion of the phenotypic variation explained by the QTLs (R squared)
(QTL_R2
).
Optional QTL confidence interval computation from a CIM- profile
(excluding cofactors on the scanned chromosome) (argument CI=TRUE
).
Return:
List containing the following items:
n.QTL |
Number of detected QTLs. |
cofactors |
|
QTL |
|
R2 |
|
QTL.effects |
|
QTL.CI |
If |
Some output files are also saved at the specified location
(output.loc
):
A QTL report (QTL_REPORT.txt) with: 1) the number of detected QTLs;
2) the global R squared statistics; 3) for each QTL, position information
(plus confidence interval if CI = TRUE
) and estimated QTL genetic
effects per cross or parents (for details see QTL_gen_effects
).
The SIM and CIM results in a text file (SIM.txt, CIM.txt).
The list of cofactors (cofactors.txt).
The list of QTL (QTL.txt).
The QTL R squared statistics (QTL_R2.txt) (for details see
QTL_R2
).
If CI = TRUE
, the QTL confidence intervals (QTL_CI.txt).
General results of the QTL detection process: number of QTLs and global adjusted and non-adjusted R squared statistics (QTL_genResults.txt).
The plot of the CIM profile (QTL_profile.pdf) with dotted vertical
lines representing the cofactors positions. If plot.gen.eff = TRUE
,
plot of the genetic effects per cross or parents (gen_eff.pdf) with dashed
lines representing the QTL positions. For more details see
plot.QTLprof
Vincent Garin
mpp_back_elim
,
mpp_CIM
,
mpp_perm
,
mpp_SIM
,
plot.QTLprof
,
QTL_gen_effects
,
QTL_R2
data(mppData) # Specify a location where your results will be saved my.loc <- tempdir() # Cross-specific model USNAM_cr <- mpp_proc(pop.name = "USNAM", trait.name = "ULA", mppData = mppData, plot.gen.eff = TRUE, CI = TRUE, verbose = FALSE, output.loc = my.loc)
data(mppData) # Specify a location where your results will be saved my.loc <- tempdir() # Cross-specific model USNAM_cr <- mpp_proc(pop.name = "USNAM", trait.name = "ULA", mppData = mppData, plot.gen.eff = TRUE, CI = TRUE, verbose = FALSE, output.loc = my.loc)
Computes single QTL models along the genome using different models.
mpp_SIM(mppData, trait = 1, Q.eff = "cr", plot.gen.eff = FALSE, n.cores = 1)
mpp_SIM(mppData, trait = 1, Q.eff = "cr", plot.gen.eff = FALSE, n.cores = 1)
mppData |
An object of class |
trait |
|
Q.eff |
|
plot.gen.eff |
|
n.cores |
|
The implemented models vary according to the number of alleles assumed at the QTL position and their origin. Four assumptions for the QTL effect are possible.
Concerning the type of QTL effect, the first option is a cross-specific QTL
effects model (Q.eff = "cr"
). In this model, the QTL effects are
assumed to be nested within cross which leads to the estimation of one
parameter per cross. The cross-specific model corresponds to the
disconnected model described in Blanc et al. 2006.
A second possibility is the parental model (Q.eff = "par"
). The
parental model assumes one QTL effect (allele) per parent that are independent
from the genetic background. This means that QTL coming form parent i has the
same effect in all crosses where this parent is used. This model is supposed
to produce better estimates of the QTL due to larger sample size when parents
are shared between crosses.
In a connected MPP (design_connectivity
), if np - 1 < nc, where
np is the number of parents and nc the number of crosses, the parental model
should be more powerful than the cross-specific model because it estimate
a reduced number of QTL parameters. This gain in power will be only true if
the assumption of constant parental effect through crosses holds. Calculated
with HRT assumption, the parental model corresponds to the connected model
presented in Blanc et al. (2006).
The third type of model is the ancestral model (Q.eff = "anc"
). This
model tries to use genetic relatedness that could exist between parents.
Indeed, the parental model assumes that parent are independent which is not
the case. Using genetic relatedness between the parents, it is possible group
these parents into a reduced number of ancestral cluster. Parents belonging
to the same ancestral group are assumed to transmit the same allele
(Jansen et al. 2003; Leroux et al. 2014). The ancestral model estimate
therefore one QTL effect
per ancestral class. Once again, the theoretical expectation is a gain of
QTL detection power by the reduction of the number of parameters to estimate.
The HRT ancestral model correspond to the linkage desequilibrium
linkage analysis (LDLA) models used by Bardol et al. (2013) or
Giraud et al. (2014).
The final possibility is the bi-allelic model (Q.eff = "biall"
).
Bi-allelic genetic predictor are a single vector with value 0, 1 or 2
corresponding to the number of allele copy of the least frequent SNP allele.
Relatedness between lines is therefore defined via identical by state (IBS)
measurement. This model corresponds to models used for association mapping.
For example, it is similar to model B in Wurschum et al. (2012) or
association mapping model in Liu et al. (2012).
Return:
SIM |
|
Vincent Garin
Bardol, N., Ventelon, M., Mangin, B., Jasson, S., Loywick, V., Couton, F., ... & Moreau, L. (2013). Combined linkage and linkage disequilibrium QTL mapping in multiple families of maize (Zea mays L.) line crosses highlights complementarities between models based on parental haplotype and single locus polymorphism. Theoretical and applied genetics, 126(11), 2717-2736.
Blanc, G., Charcosset, A., Mangin, B., Gallais, A., & Moreau, L. (2006). Connected populations for detecting quantitative trait loci and testing for epistasis: an application in maize. Theoretical and Applied Genetics, 113(2), 206-224.
Giraud, H., Lehermeier, C., Bauer, E., Falque, M., Segura, V., Bauland, C., ... & Moreau, L. (2014). Linkage Disequilibrium with Linkage Analysis of Multiline Crosses Reveals Different Multiallelic QTL for Hybrid Performance in the Flint and Dent Heterotic Groups of Maize. Genetics, 198(4), 1717-1734.
Jansen, R. C., Jannink, J. L., & Beavis, W. D. (2003). Mapping quantitative trait loci in plant breeding populations. Crop Science, 43(3), 829-834.
Leroux, D., Rahmani, A., Jasson, S., Ventelon, M., Louis, F., Moreau, L., & Mangin, B. (2014). Clusthaplo: a plug-in for MCQTL to enhance QTL detection using ancestral alleles in multi-cross design. Theoretical and Applied Genetics, 127(4), 921-933.
Liu, W., Reif, J. C., Ranc, N., Della Porta, G., & Wurschum, T. (2012). Comparison of biometrical approaches for QTL detection in multiple segregating families. Theoretical and Applied Genetics, 125(5), 987-998.
Meuwissen T and Luo, Z. (1992). Computing inbreeding coefficients in large populations. Genetics Selection Evolution, 24(4), 305-313.
Wurschum, T., Liu, W., Gowda, M., Maurer, H. P., Fischer, S., Schechert, A., & Reif, J. C. (2012). Comparison of biometrical models for joint linkage association mapping. Heredity, 108(3), 332-340.
# Cross-specific model ###################### data(mppData) SIM <- mpp_SIM(mppData = mppData, Q.eff = "cr", plot.gen.eff = TRUE) plot(x = SIM) plot(x = SIM, gen.eff = TRUE, mppData = mppData, Q.eff = "cr") # Bi-allelic model ################## SIM <- mpp_SIM(mppData = mppData, Q.eff = "biall") plot(x = SIM, type = "h")
# Cross-specific model ###################### data(mppData) SIM <- mpp_SIM(mppData = mppData, Q.eff = "cr", plot.gen.eff = TRUE) plot(x = SIM) plot(x = SIM, gen.eff = TRUE, mppData = mppData, Q.eff = "cr") # Bi-allelic model ################## SIM <- mpp_SIM(mppData = mppData, Q.eff = "biall") plot(x = SIM, type = "h")
mppData
object
Complete mppData
object made from a sample data of the maize US nested
association mapping (NAM) population. This mppData object went through all
the steps of the data processing: create.mppData
,
QC.mppData
, IBS.mppData
, IBD.mppData
, parent_cluster.mppData
. The mppData contain all the data
necessary for the QTL analysis procedures.
data(mppData)
data(mppData)
mppData
The complete mppData
object is a list
containing the following
elements:
geno.IBS: IBS genotype marker matrix.
geno.IBD: R/qtl cross.object containing the genotype within cross IBD probabilities.
geno.id: List of genotypes.
allele.ref: Matrix containing for each marker the most and least frequent marker scores and the two heterozygous scores.
geno.par: Parents marker matrix.
geno.par.clu: Parent marker data used to cluster the parents.
par.clu: Parent clustering results.
mono.anc: Positions for which the ancestral clustering was monomorphic.
pheno: Phenotypic trait matrix.
map: Genetic map corresponding to the geno (IBS, IBD, par) arguments.
haplo.map: Genetic map corresponding to geno.par.clu.
cross.ind: Vector indicating to which cross each genotype belongs.
par.per.cross: Matrix with for each cross the parent 1 and 2.
parents: Vector of parents.
type: Type of population.
n.cr: Number of crosses.
n.par: Number of parents.
n.anc: Average number of ancestral group along the genome.
n.zigo: Number of possible allelic computations 2 (AA/BB) or 3 (AA/AB/BB).
rem.mk: Removed markers in the data processing.
rem.gen: Removed genotypes in the data processing.
status: Indicates the level of progression in the data processing.
create.mppData
, QC.mppData
,
IBS.mppData
, IBD.mppData
,
parent_cluster.mppData
data(mppData)
data(mppData)
Add the new phenotypic values contained in 'pheno' to a mppData
object.
mppData_add_pheno(mppData, pheno)
mppData_add_pheno(mppData, pheno)
mppData |
An object of class |
pheno |
|
Return:
mppData |
New |
Vincent Garin
mppData_mdf_pheno
, subset.mppData
,
data(mppData) pheno_new <- data.frame(geno.id = mppData$geno.id, ph1 = rnorm(498)) mppData <- mppData_add_pheno(mppData = mppData, pheno = pheno_new)
data(mppData) pheno_new <- data.frame(geno.id = mppData$geno.id, ph1 = rnorm(498)) mppData <- mppData_add_pheno(mppData = mppData, pheno = pheno_new)
mppData
object
Example mppData
object representing a subset from the maize EU-NAM Flint
population (Bauer et al. 2013, Lehermeier et al. 2014, Giraud et al. 2014).
data(mppData_GE)
data(mppData_GE)
mppData
Sample data from the maize EU-NAM Flint population. The genotype data were obtained from http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE50558. The genotypes come from the five following crosses: (UH007 x D152), (UH007 x F03802), (UH007 x F2), (UH007 x F283), and (UH007 x DK105). We selected 100 markers randomly spread on chromosomes five and six. The genetic map was downloaded here http://maizegdb.org/data_center/reference?id=9024747.
The phenotypic data represent the within environment adjusted means for dry matter yield (DMY) calculated at La Coruna (CIAM), at Roggenstein (TUM), at Einbeck (KWS), and at Ploudaniel (INRA_P). The raw plot data were obtained here: http://www.genetics.org/content/198/1/3/suppl/DC1.
Bauer, E., Falque, M., Walter, H., Bauland, C., Camisan, C., Campo, L., ... & Altmann, T. (2013). Intraspecific variation of recombination rate in maize. Genome biology, 14(9), R103.
Giraud, H., Lehermeier, C., Bauer, E., Falque, M., Segura, V., Bauland, C., ... & Schipprack, W. (2014). Linkage disequilibrium with linkage analysis of multiline crosses reveals different multiallelic QTL for hybrid performance in the flint and dent heterotic groups of maize. Genetics, 198(4), 1717-1734.
Lehermeier, C., Krämer, N., Bauer, E., Bauland, C., Camisan, C., Campo, L., ... & Moreau, L. (2014). Usefulness of multiparental populations of maize (Zea mays L.) for genome-based prediction. Genetics, 198(1), 3-16.
data(mppData_GE)
data(mppData_GE)
mppData
object with raw data
mppData
object with raw genotypic and phenotypic data of a sample of the maize US nested association mapping (NAM) population. Different operations
of quality control and data processing still need to be performed before
the QTL detection analysis.
data(mppData_init)
data(mppData_init)
mppData
see examples of the create.mppData
.
data(mppData_init)
data(mppData_init)
Modify the phenotypic values of a mppData
object.
mppData_mdf_pheno(mppData, pheno)
mppData_mdf_pheno(mppData, pheno)
mppData |
An object of class |
pheno |
Two columns |
Return:
mppData |
New |
Vincent Garin
mppData_add_pheno
, subset.mppData
,
data(mppData) pheno_new <- data.frame(geno.id = mppData$geno.id, ULA = rnorm(498)) mppData <- mppData_mdf_pheno(mppData = mppData, pheno = pheno_new)
data(mppData) pheno_new <- data.frame(geno.id = mppData$geno.id, ULA = rnorm(498)) mppData <- mppData_mdf_pheno(mppData = mppData, pheno = pheno_new)
Computes multi-QTL models with cofactors along the genome using an approximate
mixed model computation. An initial variance covariance (VCOV) structure is
calculated using function from the nlme
package. Then, this information
is used to estimate the QTL global and within parental effect significance using a
Wald test.
mppGE_CIM( mppData, trait, VCOV = "UN", VCOV_data = "unique", cofactors = NULL, cof_red = FALSE, cof_pval_sign = 0.1, window = 20, ref_par = NULL, n.cores = 1, maxIter = 100, msMaxIter = 100 )
mppGE_CIM( mppData, trait, VCOV = "UN", VCOV_data = "unique", cofactors = NULL, cof_red = FALSE, cof_pval_sign = 0.1, window = 20, ref_par = NULL, n.cores = 1, maxIter = 100, msMaxIter = 100 )
mppData |
An object of class |
trait |
|
VCOV |
VCOV |
VCOV_data |
|
cofactors |
Object of class |
cof_red |
|
cof_pval_sign |
|
window |
|
ref_par |
Optional |
n.cores |
|
maxIter |
maximum number of iterations for the lme optimization algorithm. Default = 100. |
msMaxIter |
maximum number of iterations for the optimization step inside the lme optimization. Default = 100. |
The estimated model is the following:
For further details see the vignette.
It is possible to calculate one initial VCOV using a null model with all
the cofactors (VCOV_data = "unique"
) or one VCOV per combination of
cofactors (VCOV_data = "minus_cof"
). In the later case, the cofactor
that fall witin a distance of window
on the left and right of a QTL
position is removed for the calculation of the initial VCOV. Therefore,
N_cof + 1 VCOV are calculated.
Return:
CIM |
|
Vincent Garin
Pinheiro J, Bates D, DebRoy S, Sarkar D, R Core Team (2021). nlme: Linear and Nonlinear Mixed Effects Models_. R package version 3.1-152, <URL: https://CRAN.R-project.org/package=nlme>.
data(mppData_GE) cofactors <- mppData_GE$map$mk.names[c(35, 61)] CIM <- mppGE_CIM(mppData = mppData_GE, trait = c('DMY_CIAM', 'DMY_TUM'), cofactors = cofactors, window = 20) Qpos <- QTL_select(CIM) plot(CIM) plot_allele_eff_GE(mppData = mppData_GE, nEnv = 2, EnvNames = c('CIAM', 'TUM'), Qprof = CIM, Q.eff = 'par', QTL = Qpos, text.size = 14)
data(mppData_GE) cofactors <- mppData_GE$map$mk.names[c(35, 61)] CIM <- mppGE_CIM(mppData = mppData_GE, trait = c('DMY_CIAM', 'DMY_TUM'), cofactors = cofactors, window = 20) Qpos <- QTL_select(CIM) plot(CIM) plot_allele_eff_GE(mppData = mppData_GE, nEnv = 2, EnvNames = c('CIAM', 'TUM'), Qprof = CIM, Q.eff = 'par', QTL = Qpos, text.size = 14)
QTL detection in MPP characterized in multiple environments.
mppGE_proc( pop.name = "MPP", trait.name = "trait1", mppData, trait, EnvNames = NULL, VCOV = "UN", ref_par = NULL, VCOV_data = "unique", SIM_only = FALSE, thre.cof = 4, win.cof = 50, cof_red = FALSE, cof_pval_sign = 0.1, window = 20, thre.QTL = 4, win.QTL = 20, text.size = 18, n.cores = 1, maxIter = 100, msMaxIter = 100, verbose = TRUE, output.loc = NULL )
mppGE_proc( pop.name = "MPP", trait.name = "trait1", mppData, trait, EnvNames = NULL, VCOV = "UN", ref_par = NULL, VCOV_data = "unique", SIM_only = FALSE, thre.cof = 4, win.cof = 50, cof_red = FALSE, cof_pval_sign = 0.1, window = 20, thre.QTL = 4, win.QTL = 20, text.size = 18, n.cores = 1, maxIter = 100, msMaxIter = 100, verbose = TRUE, output.loc = NULL )
pop.name |
|
trait.name |
|
mppData |
An object of class |
trait |
|
EnvNames |
|
VCOV |
VCOV |
ref_par |
Optional |
VCOV_data |
|
SIM_only |
|
thre.cof |
|
win.cof |
|
cof_red |
|
cof_pval_sign |
|
window |
|
thre.QTL |
|
win.QTL |
|
text.size |
|
n.cores |
|
maxIter |
maximum number of iterations for the lme optimization algorithm. Default = 100. |
msMaxIter |
maximum number of iterations for the optimization step inside the lme optimization. Default = 100. |
verbose |
|
output.loc |
Path where a folder will be created to save the results. Default = NULL. |
The procedure is the following:
Simple interval mapping (SIM) to select cofactors
(mppGE_SIM
).
Composite interval mapping (CIM) with selected cofactors
(mppGE_CIM
).
Estimation of QTLs additive allelic effect
(QTL_effect_GE
).
Estimation of the global QTLs effect R squared and individual QTL effect
R squared (QTL_R2_GE
).
Return:
List containing the following items:
n.QTL |
Number of detected QTLs. |
cofactors |
|
QTL |
|
Q_eff |
|
R2 |
|
Some output files are also saved at the specified location
(output.loc
):
The SIM and CIM results in a RData file (SIM.RData, CIM.RData).
The list of cofactors (cofactors.RData).
The list of QTL (QTLs.RData).
The list of QTL allelic effects (QTL_effects.RData).
The QTL R squared statistics (QTL_R2.RData)
The number of detected QTLs and adjusted R2 (Glb_res.RData)
The plot of the CIM profile (QTL_profile.pdf) with dotted vertical
lines representing the cofactors positions and the
plot of the genetic effects per cross or parents obtained with
plot_allele_eff_GE
(gen_eff.pdf) with dashed
lines representing the QTL positions.
Vincent Garin
mppGE_CIM
,
mppGE_SIM
,
QTL_effect_GE
,
QTL_R2_GE
## Not run: data(mppData_GE) MPP_GE_QTL <- mppGE_proc(pop.name = 'EUNAM', trait.name = 'DMY', mppData = mppData_GE, trait = c('DMY_CIAM', 'DMY_TUM'), n.cores = 1, output.loc = tempdir()) ## End(Not run)
## Not run: data(mppData_GE) MPP_GE_QTL <- mppGE_proc(pop.name = 'EUNAM', trait.name = 'DMY', mppData = mppData_GE, trait = c('DMY_CIAM', 'DMY_TUM'), n.cores = 1, output.loc = tempdir()) ## End(Not run)
Computes single QTL models along the genome using an approximate mixed model
computation. An initial variance covariance (VCOV) structure is calculated
using function from the nlme
package. Then, this information is used
to estimate the QTL global and within parental effect significance using a
Wald test.
mppGE_SIM( mppData, trait, VCOV = "UN", ref_par = NULL, n.cores = 1, maxIter = 100, msMaxIter = 100 )
mppGE_SIM( mppData, trait, VCOV = "UN", ref_par = NULL, n.cores = 1, maxIter = 100, msMaxIter = 100 )
mppData |
An object of class |
trait |
|
VCOV |
VCOV |
ref_par |
Optional |
n.cores |
|
maxIter |
maximum number of iterations for the lme optimization algorithm. Default = 100. |
msMaxIter |
maximum number of iterations for the optimization step inside the lme optimization. Default = 100. |
The estimated model is the following:
For further details see the vignette.
Return:
SIM |
|
Vincent Garin
Pinheiro J, Bates D, DebRoy S, Sarkar D, R Core Team (2021). nlme: Linear and Nonlinear Mixed Effects Models_. R package version 3.1-152, <URL: https://CRAN.R-project.org/package=nlme>.
data(mppData_GE) SIM <- mppGE_SIM(mppData = mppData_GE, trait = c('DMY_CIAM', 'DMY_TUM')) Qpos <- QTL_select(Qprof = SIM, threshold = 3, window = 50) plot(x = SIM, QTL = Qpos) plot_allele_eff_GE(mppData = mppData_GE, nEnv = 2, EnvNames = c('CIAM', 'TUM'), Qprof = SIM, Q.eff = 'par', QTL = Qpos, text.size = 14)
data(mppData_GE) SIM <- mppGE_SIM(mppData = mppData_GE, trait = c('DMY_CIAM', 'DMY_TUM')) Qpos <- QTL_select(Qprof = SIM, threshold = 3, window = 50) plot(x = SIM, QTL = Qpos) plot_allele_eff_GE(mppData = mppData_GE, nEnv = 2, EnvNames = c('CIAM', 'TUM'), Qprof = SIM, Q.eff = 'par', QTL = Qpos, text.size = 14)
Compute a multi-QTL model with a list of QTL candidates (QTL
) and return
the decomposed QTL genetic effects per cross or per parents. The list of QTL
can be of different types (cross-specific, parental, ancestral or bi-allelic).
The type of QTL effects are specified in the vector Q.eff
.
MQE_gen_effects(mppData = NULL, trait = 1, QTL = NULL, Q.eff, ref.par = NULL)
MQE_gen_effects(mppData = NULL, trait = 1, QTL = NULL, Q.eff, ref.par = NULL)
mppData |
An object of class |
trait |
|
QTL |
Vector of |
Q.eff |
|
ref.par |
Optional |
This function computes for each QTL position the genetic effects of the
cross, parental, ancestral or SNP allele components. For the cross-specific
model (Q.eff = "cr"
), the genetics effects represent the substitution
effect of an single allele from the parent 2 (or B) with respect to an allele
coming from the parent 1 or A. All effects are given in absolute value with
the parent that cary the positive allele.
For the parental and the ancestral model (Q.eff = "par" or "anc"
), the
reference allele is defined per interconneted part. The most frequent
parental (ancestral) allele is set as reference. Effects of the other alleles
are estimated as deviation with respect to the reference. For more details
about reference definition see QTL_gen_effects
and
design_connectivity
.
For the bi-allelic model (Q.eff = "biall"
), the genetic effects
represent the effects of a single allele copy of the least frequent allele.
Return:
results |
|
Vincent Garin
MQE_proc
, mpp_SIM
,
QTL_gen_effects
, design_connectivity
data(mppData) SIM <- mpp_SIM(mppData = mppData) QTL <- QTL_select(SIM) QTL.eff <- MQE_gen_effects(mppData = mppData, QTL = QTL[, 1], Q.eff = c("anc", "par", "biall"))
data(mppData) SIM <- mpp_SIM(mppData = mppData) QTL <- QTL_select(SIM) QTL.eff <- MQE_gen_effects(mppData = mppData, QTL = QTL[, 1], Q.eff = c("anc", "par", "biall"))
Build multi-QTL effects (MQE) models in which different QTL effects (cross-specific, parental, ancestral or bi-allelic) can be assumed at different loci.
MQE_proc( pop.name = "MPP_MQE", trait.name = "trait1", mppData = NULL, trait = 1, Q.eff, ref.par = NULL, threshold = 4, window = 30, backward = TRUE, alpha.bk = 0.05, plot.MQE = FALSE, CI = FALSE, drop = 1.5, n.cores = 1, verbose = TRUE, output.loc )
MQE_proc( pop.name = "MPP_MQE", trait.name = "trait1", mppData = NULL, trait = 1, Q.eff, ref.par = NULL, threshold = 4, window = 30, backward = TRUE, alpha.bk = 0.05, plot.MQE = FALSE, CI = FALSE, drop = 1.5, n.cores = 1, verbose = TRUE, output.loc )
pop.name |
|
trait.name |
|
mppData |
An object of class |
trait |
|
Q.eff |
|
ref.par |
Optional |
threshold |
|
window |
|
backward |
|
alpha.bk |
|
plot.MQE |
|
CI |
|
drop |
|
n.cores |
|
verbose |
|
output.loc |
Path where a folder will be created to save the results. |
The possible QTL effect that the user wants to allow must be
specified in Q.eff
. The procedure is the following:
Forward regression to determine a MQE model with different possible assumptions for the QTL effect at different loci. The function use.
Optional backward elimination (backward = TRUE
) on the final
list of detected QTLs.
Estimation of the QTL genetic effects and R squared statistics.
If plot.MQE = TRUE
, plot of the last CIM run of the
forward regression.
If CI = TRUE
, confidence interval calculation based on a
CIM- (CIM without cofactor on the selected chromosome) of the last run of the
forward regression.
Return:
List
containing the following items:
n.QTL |
Number of detected QTLs. |
QTL |
|
R2 |
|
QTL.effects |
|
QTL.CI |
If |
Some output files are also saved at the location specified
(output.loc
):
A QTL report (QTL_REPORT.txt) with: 1) the number of detected QTLs; 2) the global R squared statistics; 3) for each QTL, position information and estimated QTL genetic effect per cross or parents.
The list of QTLs (QTL.txt).
The QTL R squared statistics (QTL_R2.txt) (for details see
QTL_R2
).
General results of the QTL detection process: Number of QTL and global adjusted and non-adjusted R squared statistics. (QTL_genResults.txt).
if plot.MQE = TRUE
, a plot of the last QTL detection run profile
(plot_MQE.pdf).
If CI = TRUE
, the QTL confidence intervals (QTL_CI.txt).
Vincent Garin
## Not run: data(mppData) # Specify a location where your results will be saved my.loc <- tempdir() MQE <- MQE_proc(pop.name = "USNAM", trait.name = "ULA", mppData = mppData, Q.eff = c("par", "biall"), verbose = FALSE, output.loc = my.loc) ## End(Not run)
## Not run: data(mppData) # Specify a location where your results will be saved my.loc <- tempdir() MQE <- MQE_proc(pop.name = "USNAM", trait.name = "ULA", mppData = mppData, Q.eff = c("par", "biall"), verbose = FALSE, output.loc = my.loc) ## End(Not run)
Example of parental clustering object.
data(par_clu)
data(par_clu)
The parent clustering matrix specifies at each genome position the results of a parent clustering into ancestral groups. The matrix rows represent the position and the columns correspond to each parent. For example, if we have at the ith row (1, 2, 3, 2, 1), this means that parents 1 and 5 are in the same group, that 2 and 4 are in another one and that the third parent was assigned to any group.
data(par_clu)
data(par_clu)
mppData
objectsIntegrate the parent clustering information to the mppData object. The parent clustering is necessary to compute the ancestral model. If the parent clustering step is skipped, the ancestral model can not be used but the other models (cross-specific, parental, and bi-allelic) can still be computed.
parent_cluster.mppData(mppData, par.clu = NULL)
parent_cluster.mppData(mppData, par.clu = NULL)
mppData |
An object of class |
par.clu |
|
At a single marker position, two parents can be grouped into a similar
ancestral classes if we assume that they receive there allele from a common
ancestor. The parent clustering information (par.clu
) describe parental
relatedness and which parent belong to which ancestral group. For example,
at marker i, we could have five parents (pA, pB, pC, pD, pE) and the following
clustering information (1, 2, 1, 2, 3). This means that pA and pC received
their allele from the same ancestor (A1). pB and pD also have a shared
ancestor (A2) who is different from (A1). And pE was not included in any
group and can be seen as an independent ancestral group (A3).
The parent clustering information is provided via par.clu
. It is an
interger matrix
with markers in row and parents in columns.
At a particular marker position, parents with the same value are assumed to
inherit from the same ancestor. for more details, see par_clu
.
The marker positions that are considered as monomorphic given the parent clustering information are set back to one allele per parent to still allow the computation of the QTL allelic effect at those positions later.
The parent clustering can be performed using the R package 'clusthaplo' that can be found there: https://cran.r-project.org/src/contrib/Archive/clusthaplo/. The 'clusthaplo' option is not integrated in this version of mppR. However, a version of mppR with function calling clusthaplo can be found on github https://github.com/vincentgarin/mppR (branch master).
An increased mppData
object containing the the same elements
as the mppData
object provided as argument and the
following new elements:
par.clu |
|
n.anc |
Average number of ancestral clusters along the genome. |
mono.anc |
Positions for which the ancestral clustering was monomorphic. |
Vincent Garin
create.mppData
, QC.mppData
,
IBS.mppData
, IBD.mppData
, par_clu
data(mppData_init) data(par_clu) mppData <- QC.mppData(mppData_init) mppData <- IBS.mppData(mppData = mppData) mppData <- IBD.mppData(mppData = mppData, type = 'RIL', type.mating = 'selfing') mppData <- parent_cluster.mppData(mppData = mppData, par.clu = par_clu)
data(mppData_init) data(par_clu) mppData <- QC.mppData(mppData_init) mppData <- IBS.mppData(mppData = mppData) mppData <- IBD.mppData(mppData = mppData, type = 'RIL', type.mating = 'selfing') mppData <- parent_cluster.mppData(mppData = mppData, par.clu = par_clu)
Plot of the genome wide significance of the QTL allelic effects in multiple environments.
plot_allele_eff_GE( mppData, nEnv, EnvNames, Qprof, Q.eff = "par", QTL = NULL, ref_par = NULL, main = "QTL genetic effects plot", text.size = 18 )
plot_allele_eff_GE( mppData, nEnv, EnvNames, Qprof, Q.eff = "par", QTL = NULL, ref_par = NULL, main = "QTL genetic effects plot", text.size = 18 )
mppData |
An object of class |
nEnv |
|
EnvNames |
|
Qprof |
|
Q.eff |
one of "cr", "par", "anc" or "biall". For the moment only "par" is available. |
QTL |
Optional argument. Object of class |
ref_par |
|
main |
Title of the graph. Default = "QTL genetic effects plot". |
text.size |
|
Vincent Garin
data(mppData_GE) SIM <- mppGE_SIM(mppData = mppData_GE, trait = c('DMY_CIAM', 'DMY_TUM')) Qpos <- QTL_select(Qprof = SIM, threshold = 3, window = 50) plot_allele_eff_GE(mppData = mppData_GE, nEnv = 2, EnvNames = c('CIAM', 'TUM'), Qprof = SIM, Q.eff = 'par', QTL = Qpos, text.size = 14)
data(mppData_GE) SIM <- mppGE_SIM(mppData = mppData_GE, trait = c('DMY_CIAM', 'DMY_TUM')) Qpos <- QTL_select(Qprof = SIM, threshold = 3, window = 50) plot_allele_eff_GE(mppData = mppData_GE, nEnv = 2, EnvNames = c('CIAM', 'TUM'), Qprof = SIM, Q.eff = 'par', QTL = Qpos, text.size = 14)
Plot allowing the visualization of the QTL parental allelic effect variation given an environmental covariate (EC). The function plot the sensitivity curve of the parent allelic effects.
plot_QxEC( Qeff, EC, env_id = NULL, QTL, sign_thre = 0.05, EC_id = "EC", trait_id = "trait", main = "QTLxEC", col_vec = NULL, text_size = 14 )
plot_QxEC( Qeff, EC, env_id = NULL, QTL, sign_thre = 0.05, EC_id = "EC", trait_id = "trait", main = "QTLxEC", col_vec = NULL, text_size = 14 )
Qeff |
output from the function |
EC |
|
env_id |
|
QTL |
|
sign_thre |
|
EC_id |
|
trait_id |
|
main |
|
col_vec |
|
text_size |
|
QTLxEC sensitivity plot
Vincent Garin
## Not run: data(mppData_GE) Qpos <- c("PZE.105068880", "PZE.106098900") EC <- matrix(c(180, 310, 240, 280), 4, 1) rownames(EC) <- c('CIAM', 'TUM', 'INRA', 'KWS') colnames(EC) <- 'cum_rain' Qeff <- QTL_effect_main_QxEC(mppData = mppData_GE, trait = c('DMY_CIAM', 'DMY_TUM', 'DMY_INRA_P', 'DMY_KWS'), env_id = c('CIAM', 'TUM', 'INRA', 'KWS'), QTL = Qpos, EC = EC) pl <- plot_QxEC(Qeff, EC = EC, env_id = c('CIAM', 'TUM', 'INRA', 'KWS'), QTL = 2, EC_id = 'cum rain', trait_id = 'DMY') ## End(Not run)
## Not run: data(mppData_GE) Qpos <- c("PZE.105068880", "PZE.106098900") EC <- matrix(c(180, 310, 240, 280), 4, 1) rownames(EC) <- c('CIAM', 'TUM', 'INRA', 'KWS') colnames(EC) <- 'cum_rain' Qeff <- QTL_effect_main_QxEC(mppData = mppData_GE, trait = c('DMY_CIAM', 'DMY_TUM', 'DMY_INRA_P', 'DMY_KWS'), env_id = c('CIAM', 'TUM', 'INRA', 'KWS'), QTL = Qpos, EC = EC) pl <- plot_QxEC(Qeff, EC = EC, env_id = c('CIAM', 'TUM', 'INRA', 'KWS'), QTL = 2, EC_id = 'cum rain', trait_id = 'DMY') ## End(Not run)
Plots the -log10(p-val) profile of a QTL analysis or a genome-wide genetic effect plot using package ggplot2.
## S3 method for class 'QTLprof' plot( x, gen.eff = FALSE, mppData, Q.eff, QTL = NULL, type = "l", main = "QTL profile", threshold = 3, text.size = 18, ... )
## S3 method for class 'QTLprof' plot( x, gen.eff = FALSE, mppData, Q.eff, QTL = NULL, type = "l", main = "QTL profile", threshold = 3, text.size = 18, ... )
x |
Object of class |
gen.eff |
|
mppData |
An object of class |
Q.eff |
|
QTL |
Optional argument. List of QTL positions. Object of class
|
type |
|
main |
Title of the graph. Default = "QTL profile". |
threshold |
|
text.size |
|
... |
Ignored. |
The user can plot regular QTL profiles (gen.eff = FALSE
) with
-log10(p-val) plotted against genetic position or genome-wide genetic
effects plots (gen.eff = TRUE
). To plot the genome-wide genetic
effects, the SIM and CIM QTL profile must have been computed with
plot.gen.eff = TRUE
.
The genome-wide genetic effects plots is a visualisation of the significance
of the QTL effect per cross or per parents along the genome. For a
cross-specific QTL profile (Q.eff = "cr"
): Blue color means
that the allele coming from parent A(1) increases the phenotypic value and
parent B(2) decreases it and red that parent A(1) decreases the trait and
parent B(2) increases it.
For a parental (Q.eff = "par"
) or an ancestral model
(Q.eff = "anc"
), the results are given per parents. The significance
of the effect must be interpreted as a deviation with respect to the
reference of each connected part. The reference allele is always defined as
the most frequent one. Blue (Red) colour means a signicative negative
(positive) effect with respect to the reference of the connected part.
The reference parental allele can change at each position according to the segregation rate. The parent are plotted from the top to the bottom according to the number of time their allele is set as reference. Therefore interpretation of the genetic effect plot should be done with caution. In that case, the plot should be taken as a rough indication of the signal distribution.
The colour intensity increase with the significance of the effect (p-val). The p-val are transformed into a color code (z). If p-val c [0.00001; 0.05]: z = -log10(p-val). If p-val < 0.00001: z=6. This scale allows to plot only the significant effects (p-val <= 0.05) and prevent the color scale to be determine by highly significant values (p-val < 0.00001). The colours red (positive) and blue (negative) correspond to the sign of the QTL effect.
For both type of plot, the user can pass a list of cofactors or QTL position
to the argument QTL
. These positions will be drawn on the graph using
dotted lines.
Vincent Garin
data(mppData) SIM <- mpp_SIM(mppData = mppData) QTL <- QTL_select(SIM) plot(x = SIM, QTL = QTL) SIM <- mpp_SIM(mppData = mppData, Q.eff = "cr", plot.gen.eff = TRUE) QTL <- QTL_select(SIM) plot(x = SIM, gen.eff = TRUE, mppData = mppData, Q.eff = "cr", QTL = QTL)
data(mppData) SIM <- mpp_SIM(mppData = mppData) QTL <- QTL_select(SIM) plot(x = SIM, QTL = QTL) SIM <- mpp_SIM(mppData = mppData, Q.eff = "cr", plot.gen.eff = TRUE) QTL <- QTL_select(SIM) plot(x = SIM, gen.eff = TRUE, mppData = mppData, Q.eff = "cr", QTL = QTL)
Print summary.mppData object
## S3 method for class 'summary.mppData' print(x, ...)
## S3 method for class 'summary.mppData' print(x, ...)
x |
object of class |
... |
Ignored. |
data(mppData) sum.mppData <- summary(mppData) print(sum.mppData)
data(mppData) sum.mppData <- summary(mppData) print(sum.mppData)
Print summary.QeffRes object
## S3 method for class 'summary.QeffRes' print(x, ...)
## S3 method for class 'summary.QeffRes' print(x, ...)
x |
object of class |
... |
Ignored. |
data(mppData) SIM <- mpp_SIM(mppData) QTL <- QTL_select(SIM) QTL.effects <- QTL_gen_effects(mppData = mppData, QTL = QTL, Q.eff = "cr") sum.QeffRes <- summary(QTL.effects) print(sum.QeffRes)
data(mppData) SIM <- mpp_SIM(mppData) QTL <- QTL_select(SIM) QTL.effects <- QTL_gen_effects(mppData = mppData, QTL = QTL, Q.eff = "cr") sum.QeffRes <- summary(QTL.effects) print(sum.QeffRes)
Print summary.QR2Res object
## S3 method for class 'summary.QR2Res' print(x, ...)
## S3 method for class 'summary.QR2Res' print(x, ...)
x |
object of class |
... |
Ignored. |
data(mppData) SIM <- mpp_SIM(mppData) QTL <- QTL_select(SIM) Q_R2 <- QTL_R2(mppData = mppData, QTL = QTL, Q.eff = "cr") sum.QR2Res <- summary(Q_R2) print(sum.QR2Res)
data(mppData) SIM <- mpp_SIM(mppData) QTL <- QTL_select(SIM) Q_R2 <- QTL_R2(mppData = mppData, QTL = QTL, Q.eff = "cr") sum.QR2Res <- summary(Q_R2) print(sum.QR2Res)
mppData
objectsPerform different operations of quality control (QC) on the marker data of an
mppData
object.
QC.mppData( mppData, mk.miss = 0.1, gen.miss = 0.25, n.lim = 15, MAF.pop.lim = 0.05, MAF.cr.lim = NULL, MAF.cr.miss = TRUE, MAF.cr.lim2 = NULL, verbose = TRUE, n.cores = 1 )
QC.mppData( mppData, mk.miss = 0.1, gen.miss = 0.25, n.lim = 15, MAF.pop.lim = 0.05, MAF.cr.lim = NULL, MAF.cr.miss = TRUE, MAF.cr.lim2 = NULL, verbose = TRUE, n.cores = 1 )
mppData |
An object of class |
mk.miss |
|
gen.miss |
|
n.lim |
|
MAF.pop.lim |
|
MAF.cr.lim |
|
MAF.cr.miss |
|
MAF.cr.lim2 |
|
verbose |
|
n.cores |
|
The different operations of the quality control are the following:
Remove markers with more than two alleles.
Remove markers that are monomorphic or fully missing in the parents.
Remove markers with a missing rate higher than mk.miss
.
Remove genotypes with more missing markers than gen.miss
.
Remove crosses with less than n.lim
genotypes.
Keep only the most polymorphic marker when multiple markers map at the same position.
Check marker minor allele frequency (MAF). Different strategy can be used to control marker MAF:
A) A first possibility is to filter marker based on MAF at the whole population
level using MAF.pop.lim
, and/or on MAF within crosses using
MAF.cr.lim
.
The user can give the its own vector of critical values for MAF within cross
using MAF.cr.lim
. By default, the within cross MAF values are defined
by the following function of the cross-size n.ci: MAF(n.ci) = 0.5 if n.ci c
[0, 10] and MAF(n.ci) = (4.5/n.ci) + 0.05 if n.ci > 10. This means that up
to 10 genotypes, the critical within cross MAF is set to 50
decreases when the number of genotype increases until 5
If the within cross MAF is below the limit in at least one cross, then marker
scores of the problematic cross are either put as missing
(MAF.cr.miss = TRUE
) or the whole marker is discarded
(MAF.cr.miss = FALSE
). By default, MAF.cr.miss = TRUE
which
allows to include a larger number of markers and to cover a wider genetic
diversity.
B) An alternative is to select only markers that segregate in at least
on cross at the MAF.cr.lim2
rate.
a filtered mppData
object containing the the same elements
as create.mppData
after filtering. It contains also the
following new elements:
geno.id |
|
ped.mat |
Four columns |
geno.par.clu |
Parent marker matrix without monomorphic or completely missing markers. |
haplo.map |
Genetic map corresponding to the list of marker of the
|
parents |
List of parents. |
n.cr |
Number of crosses. |
n.par |
Number of parents. |
rem.mk |
Vector of markers that have been removed. |
rem.geno |
Vector of genotypes that have been removed. |
Vincent Garin
data(mppData_init) mppData <- QC.mppData(mppData = mppData_init, n.lim = 15, MAF.pop.lim = 0.05, MAF.cr.miss = TRUE, mk.miss = 0.1, gen.miss = 0.25, verbose = TRUE)
data(mppData_init) mppData <- QC.mppData(mppData = mppData_init, n.lim = 15, MAF.pop.lim = 0.05, MAF.cr.miss = TRUE, mk.miss = 0.1, gen.miss = 0.25, verbose = TRUE)
Estimate the QTL parental allelic effects within environment. The estimation
is performed using an exact mixed model with function from R package
nlme
. The significance of the allele effect is assessed using a
Wald test.
QTL_effect_GE( mppData, trait, VCOV = "UN", ref_par = NULL, QTL = NULL, maxIter = 100, msMaxIter = 100 )
QTL_effect_GE( mppData, trait, VCOV = "UN", ref_par = NULL, QTL = NULL, maxIter = 100, msMaxIter = 100 )
mppData |
An object of class |
trait |
|
VCOV |
VCOV |
ref_par |
Optional |
QTL |
Object of class |
maxIter |
maximum number of iterations for the lme optimization algorithm. Default = 100. |
msMaxIter |
maximum number of iterations for the optimization step inside the lme optimization. Default = 100. |
The estimated model is the following:
For further details see the vignette.
Return:
Qeff |
|
Vincent Garin
Pinheiro J, Bates D, DebRoy S, Sarkar D, R Core Team (2021). nlme: Linear and Nonlinear Mixed Effects Models_. R package version 3.1-152, <URL: https://CRAN.R-project.org/package=nlme>.
data(mppData_GE) Qpos <- c("PZE.105068880", "PZE.106098900") Qeff <- QTL_effect_GE(mppData = mppData_GE, trait = c('DMY_CIAM', 'DMY_TUM'), QTL = Qpos) Qeff
data(mppData_GE) Qpos <- c("PZE.105068880", "PZE.106098900") Qeff <- QTL_effect_GE(mppData = mppData_GE, trait = c('DMY_CIAM', 'DMY_TUM'), QTL = Qpos) Qeff
The function estimate a QTL model where each parental QTL allelic effect is decomposed into a main effect and a QTL by environment effect (QEI). It allows the user to determine which parental allelic effects have a significant interaction with the environment.
QTL_effect_main_QEI( mppData, trait, env_id = NULL, ref_env = NULL, ref_par = NULL, VCOV = "UN", QTL = NULL, maxIter = 100, msMaxIter = 100 )
QTL_effect_main_QEI( mppData, trait, env_id = NULL, ref_env = NULL, ref_par = NULL, VCOV = "UN", QTL = NULL, maxIter = 100, msMaxIter = 100 )
mppData |
An object of class |
trait |
|
env_id |
|
ref_env |
Optional |
ref_par |
Optional |
VCOV |
VCOV |
QTL |
Object of class |
maxIter |
maximum number of iterations for the lme optimization algorithm. Default = 100. |
msMaxIter |
maximum number of iterations for the optimization step inside the lme optimization. Default = 100. |
The function estimate the following model
where the QTL effect is decomposed into that represent the
main parental allelic effect across environments and
which is
the QEI effect. allelic effects must be interpreted as deviation with respect
to the reference parent ('ref_par') in the reference environment ('ref_env').
By default the reference parent is the one with the highest allelic frequency
(e.g. central parent in a NAM population).
The estimation is performed using an exact mixed model with function from R
package nlme
. The significance of the allele effect is assessed using a
Wald test.
Return:
List
with one data.frame
per QTL that contains the following
elements:
To be filled
To be filled
Significance of the parent main effect expressed as the -log10(p-val)
Significance of the parent QTLxE effect expressed as the -log10(p-val)
Vincent Garin
Pinheiro J, Bates D, DebRoy S, Sarkar D, R Core Team (2021). nlme: Linear and Nonlinear Mixed Effects Models_. R package version 3.1-152, <URL: https://CRAN.R-project.org/package=nlme>.
## Not run: data(mppData_GE) Qpos <- c("PZE.105068880", "PZE.106098900") Qeff <- QTL_effect_main_QEI(mppData = mppData_GE, trait = c('DMY_CIAM', 'DMY_TUM', 'DMY_INRA_P', 'DMY_KWS'), env_id = c('CIAM', 'TUM', 'INRA', 'KWS'), QTL = Qpos) Qeff ## End(Not run)
## Not run: data(mppData_GE) Qpos <- c("PZE.105068880", "PZE.106098900") Qeff <- QTL_effect_main_QEI(mppData = mppData_GE, trait = c('DMY_CIAM', 'DMY_TUM', 'DMY_INRA_P', 'DMY_KWS'), env_id = c('CIAM', 'TUM', 'INRA', 'KWS'), QTL = Qpos) Qeff ## End(Not run)
After estimating which parental allelic effects have a significant interaction with the environment (QEI), the function extends the model for the allelic effect with a significant QEI to characterize this interaction in terms of sensitivity to (a) specific environmental covariate(s).
QTL_effect_main_QxEC( mppData, trait, env_id = NULL, ref_env = NULL, ref_par = NULL, VCOV = "UN", QTL = NULL, thre_QTL = 2, EC, Qmain_QEI = NULL, maxIter = 100, msMaxIter = 100 )
QTL_effect_main_QxEC( mppData, trait, env_id = NULL, ref_env = NULL, ref_par = NULL, VCOV = "UN", QTL = NULL, thre_QTL = 2, EC, Qmain_QEI = NULL, maxIter = 100, msMaxIter = 100 )
mppData |
An object of class |
trait |
|
env_id |
|
ref_env |
Optional |
ref_par |
Optional |
VCOV |
VCOV |
QTL |
Object of class |
thre_QTL |
|
EC |
|
Qmain_QEI |
results from |
maxIter |
maximum number of iterations for the lme optimization algorithm. Default = 100. |
msMaxIter |
maximum number of iterations for the optimization step inside the lme optimization. Default = 100. |
The function first estimate the parental QTL allele main and QTLxE effect
using the function QTL_effect_main_QEI
. Optionally the output
of QTL_effect_main_QEI
can be passed through the 'Qmain_QEI'
argument. The function consider that a parental QTL allele significantly
interacts with the environment if its QTLxE term is significant at the
'thre_QTL' level. Thre_QTL is expressed in terms of -log10(p-val).
For example, for p-val = 0.01, thre_QTL = -log10(p-val) = 2. Given this
information, the effect of the parental QTL allele with a significant QEI
are extended like that where
represents the EC value in environment j associated with the
sensitivity term
. The
determines the rate of change of
the parental QTL allelic additive effect given an extra unit of EC. Finally,
is a residual effect. The fitted model becomes:
The estimation is performed using an exact mixed model with function from R
package nlme
. The significance of is assessed using a
Wald test.
Return:
List
with one data.frame
per QTL that contains the following
elements:
QTL parent allele main effect expressed as deviation with respect to the reference parent
QTL parent allele effect in environment j expressed as deviation with respect to the reference parent
Significance of the parent main effect expressed as the -log10(p-val)
Significance of the parent QTLxE effect expressed as the -log10(p-val)
Vincent Garin
Pinheiro J, Bates D, DebRoy S, Sarkar D, R Core Team (2021). nlme: Linear and Nonlinear Mixed Effects Models_. R package version 3.1-152, <URL: https://CRAN.R-project.org/package=nlme>.
## Not run: data(mppData_GE) Qpos <- c("PZE.105068880", "PZE.106098900") EC <- matrix(c(180, 310, 240, 280), 4, 1) rownames(EC) <- c('CIAM', 'TUM', 'INRA', 'KWS') colnames(EC) <- 'cum_rain' Qeff <- QTL_effect_main_QxEC(mppData = mppData_GE, trait = c('DMY_CIAM', 'DMY_TUM', 'DMY_INRA_P', 'DMY_KWS'), env_id = c('CIAM', 'TUM', 'INRA', 'KWS'), QTL = Qpos, EC = EC) Qeff ## End(Not run)
## Not run: data(mppData_GE) Qpos <- c("PZE.105068880", "PZE.106098900") EC <- matrix(c(180, 310, 240, 280), 4, 1) rownames(EC) <- c('CIAM', 'TUM', 'INRA', 'KWS') colnames(EC) <- 'cum_rain' Qeff <- QTL_effect_main_QxEC(mppData = mppData_GE, trait = c('DMY_CIAM', 'DMY_TUM', 'DMY_INRA_P', 'DMY_KWS'), env_id = c('CIAM', 'TUM', 'INRA', 'KWS'), QTL = Qpos, EC = EC) Qeff ## End(Not run)
Determines a multi-QTL model using a forward regression.
QTL_forward( mppData = NULL, trait = 1, Q.eff, threshold = 4, window = 30, n.cores = 1, verbose = TRUE )
QTL_forward( mppData = NULL, trait = 1, Q.eff, threshold = 4, window = 30, n.cores = 1, verbose = TRUE )
mppData |
An object of class |
trait |
|
Q.eff |
|
threshold |
|
window |
|
n.cores |
|
verbose |
|
Forward regression to determine the a multi-QTL model. The function selects successively QTL positions with -log10(pval) above the threshold. Those positions are added as cofactors for following detection run. The procedure stop when no more position has a -log10(pval) above the threshold.
Return:
QTL |
|
Vincent Garin
data(mppData) QTL <- QTL_forward(mppData = mppData, Q.eff = "par")
data(mppData) QTL <- QTL_forward(mppData = mppData, Q.eff = "par")
Computes a multi-QTL model with a list of QTL candidates (QTL
) and
return the decomposed QTL effects per cross or per parents.
QTL_gen_effects( mppData, trait = 1, QTL = NULL, Q.eff = "cr", ref.par = NULL, sum_zero = FALSE )
QTL_gen_effects( mppData, trait = 1, QTL = NULL, Q.eff = "cr", ref.par = NULL, sum_zero = FALSE )
mppData |
An object of class |
trait |
|
QTL |
Object of class |
Q.eff |
|
ref.par |
Optional |
sum_zero |
Optional |
This function computes for each QTL position the genetic effects of the
cross, parental, ancestral or SNP allele components. For the cross-specific
model (Q.eff = "cr"
), the genetics effects represent the substitution
effect of an single allele from the parent 2 (or B) with respect to an allele
coming from the parent 1 or A. All effects are given in absolute value with
the parent that carries the positive allele.
For the parental and the ancestral model (Q.eff = "par" or "anc"
), it
is possible to estimate maximum n-1 parental or ancestral alleles per
interconnected part of the design. For these two models, one
parental (ancestral) allele is set as reference per interconnected part of the
design. Effects of the other alleles are estimated as deviation with respect
to the reference. Connected parts of the design can be determined using Weeks
and Williams (1964) method (design_connectivity
). By default,
the reference allele is the most frequent one. The user can also specify a
parental allele that will be used as reference using the argument
ref.par
. This option is only available if the MPP design is composed
of a unique connected part.
For the parental and ancestral model it is also possible to estimate the QTL
effects using a sum to zero constraint sum_zero = TRUE
. In that case,
the effects of the different parental (ancestral) allele will represent the
deviation with respect to the average trait value.
For the bi-allelic model (Q.eff = "biall"
), the genetic effects
represent the effects of a single allele copy of the least frequent allele.
Return:
Object of class QeffRes
containing the following elements:
Qeff |
|
tab.Qeff |
|
Vincent Garin
Weeks, D. L., & Williams, D. R. (1964). A note on the determination of connectedness in an N-way cross classification. Technometrics, 6(3), 319-324.
data(mppData) # QTL candidates SIM <- mpp_SIM(mppData) QTL <- QTL_select(SIM) # Cross-specific model QTL.effects <- QTL_gen_effects(mppData = mppData, QTL = QTL, Q.eff = "cr") summary(QTL.effects) # Parental model QTL.effects <- QTL_gen_effects(mppData = mppData, QTL = QTL, Q.eff = "par") summary(QTL.effects) # Ancestral model QTL.effects <- QTL_gen_effects(mppData = mppData, QTL = QTL, Q.eff = "anc") summary(QTL.effects) # Bi-allelic model QTL.effects <- QTL_gen_effects(mppData = mppData, QTL = QTL, Q.eff = "biall") summary(QTL.effects)
data(mppData) # QTL candidates SIM <- mpp_SIM(mppData) QTL <- QTL_select(SIM) # Cross-specific model QTL.effects <- QTL_gen_effects(mppData = mppData, QTL = QTL, Q.eff = "cr") summary(QTL.effects) # Parental model QTL.effects <- QTL_gen_effects(mppData = mppData, QTL = QTL, Q.eff = "par") summary(QTL.effects) # Ancestral model QTL.effects <- QTL_gen_effects(mppData = mppData, QTL = QTL, Q.eff = "anc") summary(QTL.effects) # Bi-allelic model QTL.effects <- QTL_gen_effects(mppData = mppData, QTL = QTL, Q.eff = "biall") summary(QTL.effects)
Compute predicted R squared in a validation set using QTLs detected in a
training set. These values are corrected by the heritability her
.
QTL_pred_R2( mppData.ts, mppData.vs, trait = 1, Q.eff = "cr", QTL = NULL, her = 1 )
QTL_pred_R2( mppData.ts, mppData.vs, trait = 1, Q.eff = "cr", QTL = NULL, her = 1 )
mppData.ts |
An object of class |
mppData.vs |
An object of class |
trait |
|
Q.eff |
|
QTL |
Object of class |
her |
|
Compute QTLs predicted R squared in a validation set (mppData.vs
).
These QTLs have been previously detected in a training set
(mppData.ts
). The global R squared (R2 = cor(y.ts,y.pred.ts)^2) is
obtained using the Pearson squared correlation between the observed trait
values in the validation set (y.vs) and predicted values using estimated QTL
effects in the training set (y.pred.vs = X.vs * B.ts).
After that the values are corrected by the general or within cross
heritability her
. By default her = 1
which means that the
R squared represent the proportion of explained phenotypic variance. The
values are returned per cross (R2.cr
) or averaged at the population
level (glb.R2
).
Partial R squared statistics are also calculated for each individual position. The partial R squared are computed by making the difference between the global R squared and the R squared computed without the ith position.
Return:
List
containing the following objects:
glb.R2 |
Global predicted R squared corrected for the heritability of all QTL terms. Doing the average of the within cross predicted R squared (R2.cr) |
R2.cr |
Within cross predicted R squared corrected for the heritability |
part.R2.diff |
Vector of predicted partial R squared corrected for the heritability doing the difference between the full model and a model minus the ith QTL. |
Vincent Garin
data(mppData) folds <- CV_partition(cross.ind = mppData$cross.ind, k = 5) mppData.ts <- subset(x = mppData, gen.list = folds[[1]]$train.set) mppData.vs <- subset(x = mppData, gen.list = folds[[1]]$val.set) SIM <- mpp_SIM(mppData = mppData) QTL <- QTL_select(SIM) QTL_pred_R2(mppData.ts = mppData.ts, mppData.vs = mppData.vs, QTL = QTL)
data(mppData) folds <- CV_partition(cross.ind = mppData$cross.ind, k = 5) mppData.ts <- subset(x = mppData, gen.list = folds[[1]]$train.set) mppData.vs <- subset(x = mppData, gen.list = folds[[1]]$val.set) SIM <- mpp_SIM(mppData = mppData) QTL <- QTL_select(SIM) QTL_pred_R2(mppData.ts = mppData.ts, mppData.vs = mppData.vs, QTL = QTL)
Computes the global and partial (adjusted) R squared of a list of QTLs using a linear model.
QTL_R2(mppData, trait = 1, QTL = NULL, Q.eff = "cr", glb.only = FALSE)
QTL_R2(mppData, trait = 1, QTL = NULL, Q.eff = "cr", glb.only = FALSE)
mppData |
An object of class |
trait |
|
QTL |
Object of class |
Q.eff |
|
glb.only |
|
The function computes R squared statistics using a linear model. The extra variance explained by a full model containing the QTL terms with respect to a reduced model containing only the cross intercept terms and uses the ratio between the residual sum of square of these two models: R2 = 1-(RSS(f))/(RSS(r)).
Partial R squared for each individual QTL position can also be calculated. Two types of partial R squared are returned. The first one uses the difference between the R squared obtained with all QTL positions and the R squared obtain with all position minus the ith one (difference R squared). The second method used only the ith QTL position in the model (single R squared).
For both global and partial R squared, it is possible to obtained adjusted measurements taking the number of degrees of freedom into consideration using an adaptation of the formula given by Utz et al. (2000): R.adj = R-(z/(N-z-n.cr))*(1-R) where z is the total number of estimated components of the genetic effect. N is the total number of phenotypic information, and n.cr is the number of intercept (cross) terms.
Return:
object of class QR2Res
containing the following objects:
glb.R2 |
Global R squared of all QTL terms. |
glb.adj.R2 |
Global adjusted R squared of all QTL terms. |
part.R2.diff |
Vector of partial R squared doing the difference between the full model and a model minus the ith QTL. |
part.adj.R2.diff |
Vector of partial adjusted R squared doing the difference between the full model and a model minus the ith QTL. |
part.R2.sg |
Vector of partial R squared using only the ith QTL. |
part.adj.R2.sg |
Vector of partial adjusted R squared using only the ith QTL. |
Vincent Garin
Utz, H. F., Melchinger, A. E., & Schon, C. C. (2000). Bias and sampling error of the estimated proportion of genotypic variance explained by quantitative trait loci determined from experimental data in maize using cross validation and validation with independent samples. Genetics, 154(4), 1839-1849.
data(mppData) SIM <- mpp_SIM(mppData) QTL <- QTL_select(Qprof = SIM, threshold = 3, window = 20) Q_R2 <- QTL_R2(mppData = mppData, QTL = QTL, Q.eff = "cr") summary(Q_R2)
data(mppData) SIM <- mpp_SIM(mppData) QTL <- QTL_select(Qprof = SIM, threshold = 3, window = 20) Q_R2 <- QTL_R2(mppData = mppData, QTL = QTL, Q.eff = "cr") summary(Q_R2)
Compute global and partial R2 statistics for MPP GxE QTL using a linear model. The global R2 is the contribution of all QTL positions while the partial R2 is the specific contribution of an individual QTL position.
QTL_R2_GE(mppData, trait, QTL = NULL, glb.only = FALSE)
QTL_R2_GE(mppData, trait, QTL = NULL, glb.only = FALSE)
mppData |
An object of class |
trait |
|
QTL |
Object of class |
glb.only |
|
Return:
List
containing the global unadjusted R2, the global adjusted R2,
the partial unadjusted R2, and the partial adjusted R2.
Vincent Garin
data(mppData_GE) Qpos <- c("PZE.105068880", "PZE.106098900") R2 <- QTL_R2_GE(mppData = mppData_GE, trait = c('DMY_CIAM', 'DMY_TUM'), QTL = Qpos)
data(mppData_GE) Qpos <- c("PZE.105068880", "PZE.106098900") R2 <- QTL_R2_GE(mppData = mppData_GE, trait = c('DMY_CIAM', 'DMY_TUM'), QTL = Qpos)
Selection of QTL candidate positions.
QTL_select(Qprof, threshold = 3, window = 50, verbose = TRUE)
QTL_select(Qprof, threshold = 3, window = 50, verbose = TRUE)
Qprof |
Object of class |
threshold |
|
window |
|
verbose |
|
The function select QTL positions that are above the given
threshold
per chromosome. Once a position has been selected, and
exclusion window
is set around that position. Positions falling into
that region will not be candidate anymore. The search continue until there
is no more candidate position above the threshold
.
Return:
QTL |
|
This function is a modification of the QTL.reduce function coming from the Biometris pipepline.
RAP (R Analytical Pipeline) (V0.9.1) May 2011
Authors: Paul Eilers (1), Gerrit Gort (1), Sabine Schnabel (1), Lucia Gutierrez(1, 2), Marcos Malosetti(1), Joost van Heerwaarden, and Fred van Eeuwijk(1)
(1) Wageningen University and Research Center, Netherlands (2) Facultad de Agronomia, UDELAR, Uruguay
data(mppData) SIM <- mpp_SIM(mppData) QTL <- QTL_select(Qprof = SIM, threshold = 3)
data(mppData) SIM <- mpp_SIM(mppData) QTL <- QTL_select(Qprof = SIM, threshold = 3)
mppData
objectPull out a specified set of markers and/or genotypes from a mppData
object.
## S3 method for class 'mppData' subset(x, mk.list = NULL, gen.list = NULL, ...)
## S3 method for class 'mppData' subset(x, mk.list = NULL, gen.list = NULL, ...)
x |
An object of class |
mk.list |
Optional |
gen.list |
Optional |
... |
Ignored. |
Return:
The mppData object but with only the specified subset of data.
Vincent Garin
### Marker subset data(mppData) # Random selection of markers mk.list <- sample(mppData$map[, 1], 50) mppData_sub <- subset(x = mppData, mk.list = mk.list) # Selection of chromosome 1 marker mk.list <- (mppData$map[, 2] == 1) mppData_sub <- subset(x = mppData, mk.list = mk.list) ### Genotype subset # Random selection of genotypes gen.list <- sample(mppData$geno.id, 200) mppData_sub <- subset(x = mppData, gen.list = gen.list) # Selection of genotype from cross 2 and 5 crosses <- unique(mppData$cross.ind) gen.list <- mppData$geno.id[mppData$cross.ind %in% crosses[c(2, 5)]] mppData_sub <- subset(x = mppData, gen.list = gen.list) ### Marker and genotype subset mk.list <- sample(mppData$map[, 1], 50) gen.list <- sample(mppData$geno.id, 200) mppData_sub <- subset(x = mppData, mk.list = mk.list, gen.list = gen.list)
### Marker subset data(mppData) # Random selection of markers mk.list <- sample(mppData$map[, 1], 50) mppData_sub <- subset(x = mppData, mk.list = mk.list) # Selection of chromosome 1 marker mk.list <- (mppData$map[, 2] == 1) mppData_sub <- subset(x = mppData, mk.list = mk.list) ### Genotype subset # Random selection of genotypes gen.list <- sample(mppData$geno.id, 200) mppData_sub <- subset(x = mppData, gen.list = gen.list) # Selection of genotype from cross 2 and 5 crosses <- unique(mppData$cross.ind) gen.list <- mppData$geno.id[mppData$cross.ind %in% crosses[c(2, 5)]] mppData_sub <- subset(x = mppData, gen.list = gen.list) ### Marker and genotype subset mk.list <- sample(mppData$map[, 1], 50) gen.list <- sample(mppData$geno.id, 200) mppData_sub <- subset(x = mppData, mk.list = mk.list, gen.list = gen.list)
mppData
objectsummary
for object of class mppData
.
## S3 method for class 'mppData' summary(object, ...)
## S3 method for class 'mppData' summary(object, ...)
object |
An object of class |
... |
Ignored. |
data(mppData) summary(mppData)
data(mppData) summary(mppData)
QeffRes
objectsummary
for object of class QeffRes
.
## S3 method for class 'QeffRes' summary(object, QTL = NULL, ...)
## S3 method for class 'QeffRes' summary(object, QTL = NULL, ...)
object |
An object of class |
QTL |
|
... |
Ignored. |
data(mppData) SIM <- mpp_SIM(mppData) QTL <- QTL_select(SIM) QTL.effects <- QTL_gen_effects(mppData = mppData, QTL = QTL, Q.eff = "cr") summary(QTL.effects)
data(mppData) SIM <- mpp_SIM(mppData) QTL <- QTL_select(SIM) QTL.effects <- QTL_gen_effects(mppData = mppData, QTL = QTL, Q.eff = "cr") summary(QTL.effects)
QR2Res
objectsummary
for object of class QR2Res
.
## S3 method for class 'QR2Res' summary(object, ...)
## S3 method for class 'QR2Res' summary(object, ...)
object |
An object of class |
... |
Ignored. |
data(mppData) SIM <- mpp_SIM(mppData) QTL <- QTL_select(SIM) Q_R2 <- QTL_R2(mppData = mppData, QTL = QTL, Q.eff = "cr") summary(Q_R2)
data(mppData) SIM <- mpp_SIM(mppData) QTL <- QTL_select(SIM) Q_R2 <- QTL_R2(mppData = mppData, QTL = QTL, Q.eff = "cr") summary(Q_R2)
Selection of markers and genotypes from the maize US nested association mapping (NAM) population (McMullen et al., 2009).
data(USNAM_geno)
data(USNAM_geno)
data.frame
Sample of the marker matrix of the US-NAM population. The selection correspond
to 102 markers coming from the two first
chromosomes present in USNAM_map
and the 506 genotypes. These
genotypes correspond to the selected phenotypic values in
USNAM_pheno
. The selected genotypes come from
the following crosses: (B73 x CML103), (B73 x CML322), (B73 x CML52),
(B73 x Hp301), (B73 x M37W). The data of the 6 parental lines are
also included. The data are available on www.panzea.org.
McMullen, M. D., Kresovich, S., Villeda, H. S., Bradbury, P., Li, H., Sun, Q., ... & Buckler, E. S. (2009). Genetic properties of the maize nested association mapping population. Science, 325(5941), 737-740.
data(USNAM_geno)
data(USNAM_geno)
Reduced map of the maize US nested association mapping (NAM) population (McMullen et al., 2009).
data(USNAM_map)
data(USNAM_map)
data.frame
Selection of 102 markers from the two first chromosomes of the Maize US-NAM population (McCullen et al., 2009). The data are available on www.panzea.org.
McMullen, M. D., Kresovich, S., Villeda, H. S., Bradbury, P., Li, H., Sun, Q., ... & Buckler, E. S. (2009). Genetic properties of the maize nested association mapping population. Science, 325(5941), 737-740.
data(USNAM_map)
data(USNAM_map)
Reduced phenotype data from the Maize US nested association mapping (NAM) population (McMullen et al., 2009).
data(USNAM_pheno)
data(USNAM_pheno)
data.frame
Upper leaf angle (ULA) trait values with genotypes identifiers as rownames. These genotypes correspond to the 500 offspring genotypes of the marker matrix USNAM_geno
. The data are available on www.panzea.org.
McMullen, M. D., Kresovich, S., Villeda, H. S., Bradbury, P., Li, H., Sun, Q., ... & Buckler, E. S. (2009). Genetic properties of the maize nested association mapping population. Science, 325(5941), 737-740.
data(USNAM_pheno)
data(USNAM_pheno)