“For such a model there is no need to ask the question ‘Is the model true?’. If ‘truth’ is to be the ‘whole truth’ the answer must be ‘No’. The only question of interest is ‘Is the model illuminating and useful?’”

George Box. 1976, Journal of the American Statistical Association


Concepts

  • Data dimensionality (p>n)

  • Rationale

  • Statistical models (BLUP, Bayes and others)

  • Accuracy

  • Examples in animal and plant breeding

  • Cross Validation

  • Mixed models

  • Pedigree and Relationship Matrices

  • Genetic gain

Slides

  1. Introduction to genomics in breeding

  2. Genomic Selection concepts

  3. GS models and implementation

  4. Genomic BLUP (part 1)

  5. Genomic BLUP (part 2)

Papers for discussion (by Patricio Munoz)

Next Tuesday (October 31st), we will discuss two of the papers Patricio sent last week (numbers 2 and 3 in the list below). They provide a nice review of the factors affecting GS, as well as some real applications. Please bring to class 4-5 bullet points with questions and comments about the papers.

  1. Prediction of Total Genetic Value Using Genome-Wide Dense Marker Maps

  2. Genomic selection in dairy cattle: Progress and challenges

  3. Genomic selection in forest tree breeding

R Script (by Marcio Resende)

Below is the script discussed in class. It demonstrates how linkage disequilibrium (LD) affects the estimates of marker effects.

set.seed(1)
Nmarkers = 3100
NQTLs = 10
Nind=2000

#Defining random QTL positions
QTLs = matrix(sample(c(1:Nmarkers), NQTLs, replace = FALSE))

#Choosing one QTL randomly to demonstrate the effect of LD and multiple markers
randomQTLS = sample(QTLs, 1, replace = FALSE)

#Simulating the genotypic calls and extracting the QTL's genotype for all the population
genotypes = matrix(sample(c(0,1,2),(Nmarkers * Nind), replace = TRUE),ncol = Nmarkers, dimnames = list(c(1: Nind),c(1:Nmarkers)))
QTLsGenotypes = genotypes[,QTLs]

#Simulating additive values, assuming that each allele copy has an effect of 10 (arbitrary choice)
doses = matrix(apply(QTLsGenotypes,1,sum))
geneticEffect = doses * 10

#Simulating the environmental and phenotypic effects
EnvEffects = rnorm(Nind,mean(geneticEffect),sd(geneticEffect))
phenoValues = geneticEffect+EnvEffects

#Heritability
var(geneticEffect)/var(phenoValues)

###########################
SolutionEffect = matrix(0,(Nmarkers + 1),3)

# Loop to create 3 scenarios: 1, 3 and 12 identical genotypic calls starting at the marker "randomQTLS"
for (j in c(1,3,12)){
  Geno = genotypes
  # Copy the QTL genotype into the following j-1 markers (capped at the last marker)
  Geno[,seq(from=randomQTLS, to=min(randomQTLS+(j-1), Nmarkers))] = Geno[,randomQTLS]

  #Defining lambda (shrinkage) from the BLUP equation
  lambda = as.numeric((var(EnvEffects)/var(geneticEffect))*Nmarkers)
  Z = Geno
  id = diag(ncol(genotypes))
  X = matrix(1,nrow(genotypes),ncol=1)

  ## Left Hand Side
  LHSblup = rbind(cbind(crossprod(X), crossprod(X, Z)), cbind(crossprod(Z, X), (crossprod(Z)+id*lambda)))

  ## Right Hand Side
  RHSblup = rbind(crossprod(X, phenoValues), crossprod(Z, phenoValues))

  #Solving the equation: first row is the intercept, the remaining rows are marker effects
  GS = solve(LHSblup, RHSblup)
  SolutionEffect[,which(c(1,3,12)==j)] = GS
}

#Plotting the marker effects with j=1
plot(x = c(1:Nmarkers),SolutionEffect[-1,1])
#Plotting the position of the QTLS (green cross)
points(x = QTLs, y = rep(max(SolutionEffect[-1,1]),nrow(QTLs)),pch=3,col = "green")
#Plotting the marker effects with j=3
points(x = c(1:Nmarkers),SolutionEffect[-1,2],col = "red")
#Plotting the position of the QTL being tested (Orange square)
points(x = randomQTLS, y = max(SolutionEffect[-1,1])+0.05,pch=15,col = "orange")
#Plotting the marker effects with j=12
points(x = c(1:Nmarkers),SolutionEffect[-1,3],col = "blue")
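
The same idea can be checked numerically on a much smaller problem. The sketch below is not part of the class script: the sizes, the value of lambda and the helper ridge_effects are assumptions chosen for illustration. It copies one QTL genotype into 1, 3 and 12 adjacent markers and prints the ridge estimates for that block. Under these assumptions, identical markers receive identical estimates, and each estimate shrinks as the number of copies grows, because the QTL signal is shared among them.

```r
# Small-scale sketch (hypothetical sizes and lambda, not the class settings)
set.seed(2)
n <- 300; p <- 50; qtl <- 10
Z <- matrix(sample(0:2, n * p, replace = TRUE), nrow = n)
y <- 10 * Z[, qtl] + rnorm(n, sd = 5)   # a single QTL with effect 10 per allele copy
lambda <- 50                            # arbitrary shrinkage value for illustration

# Ridge solution on centred data; centring takes the place of the intercept
ridge_effects <- function(Z, y, lambda) {
  Zc <- scale(Z, center = TRUE, scale = FALSE)
  yc <- y - mean(y)
  as.vector(solve(crossprod(Zc) + diag(lambda, ncol(Zc)), crossprod(Zc, yc)))
}

block_effects <- lapply(c(1, 3, 12), function(j) {
  G <- Z
  block <- qtl:(qtl + j - 1)
  G[, block] <- Z[, qtl]                # j identical copies of the QTL genotype
  ridge_effects(G, y, lambda)[block]    # keep only the estimates inside the block
})
names(block_effects) <- c("j=1", "j=3", "j=12")

print(lapply(block_effects, round, 2))  # per-marker estimates in each scenario
print(sapply(block_effects, sum))       # total effect assigned to the block
```

Comparing the per-marker values with the block totals shows why a single marker estimate can look small even when the region as a whole captures most of the QTL effect.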

Hands-on

Genomic Selection (by Ivone de Bem Oliveira)

This hands-on session presents the basic concepts underlying the use of statistical genetic methods in genomic selection (GS). The data set presented during the class is available in the BGLR package.
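
As a starting point, the data can be inspected directly from R. The sketch below assumes that the class data set is the wheat data shipped with BGLR (an assumption based on the Crossa et al. 2010 reference and the environment numbering used in the project) and that the BGLR package is installed. It does nothing if the package is missing.

```r
# Sketch: inspect the wheat data from BGLR (assumed to be the class data set)
if (requireNamespace("BGLR", quietly = TRUE)) {
  data(wheat, package = "BGLR")   # loads wheat.X, wheat.Y, wheat.A and wheat.sets
  print(dim(wheat.X))             # lines x markers
  print(dim(wheat.Y))             # lines x environments
  print(colnames(wheat.Y))        # environment IDs used as column headers
}
```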

Genomic Selection project

  • Deadline: November 10th.

Contents

Introduction

An introduction to genomic selection, 300 to 600 words (2 pts)

Analyses

  1. Material used: Describe the population and markers used. Include comments about the power and resolution of the analysis (see Pérez and De Los Campos 2014; Crossa et al. 2010). (2.0 pts)

  2. Methods and Results: For the GS project, each student is assigned the same environment that was used for the GWAS project. Perform the analysis with the rrBLUP package (Endelman, 2011). Use graphics to present the results and describe them. Remember that you are assigned one of the three environments not used in the demonstration class. Your environment is identified by the column position, not by the header ID of the column: Column 1 = Environment 1, Column 2 = Environment 2, but Column 3 = Environment 4. (2.0 pts)
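
One possible shape for the analysis is sketched below. It is only an illustration under stated assumptions: the markers and phenotypes are simulated stand-ins for the course data, and the helper cv_rrblup and the choice of five folds are hypothetical. The only rrBLUP call used is mixed.solve, which returns the intercept (beta) and the marker effects (u). Predictive ability is measured as the correlation between predicted and observed phenotypes in each held-out fold.

```r
# Sketch: k-fold cross-validation of RR-BLUP with rrBLUP::mixed.solve
# Simulated data; replace M and y with your marker matrix and your environment column.
set.seed(42)
n <- 200; p <- 500
M <- matrix(sample(c(-1, 0, 1), n * p, replace = TRUE), nrow = n)  # markers coded -1/0/1
beta_true <- c(rnorm(20), rep(0, p - 20))                          # 20 causal markers
g <- as.vector(M %*% beta_true)
y <- g + rnorm(n, sd = sd(g))                                      # heritability near 0.5

cv_rrblup <- function(y, M, k = 5) {
  folds <- sample(rep(seq_len(k), length.out = length(y)))  # random fold labels
  acc <- numeric(k)
  for (f in seq_len(k)) {
    test <- folds == f
    # Fit marker effects on the training lines only
    fit <- rrBLUP::mixed.solve(y = y[!test], Z = M[!test, , drop = FALSE])
    # Predict held-out lines: intercept plus sum of marker effects
    pred <- as.vector(fit$beta) + as.vector(M[test, , drop = FALSE] %*% fit$u)
    acc[f] <- cor(pred, y[test])
  }
  acc
}

if (requireNamespace("rrBLUP", quietly = TRUE)) {
  acc <- cv_rrblup(y, M)
  print(round(acc, 2))        # predictive ability per fold
  print(round(mean(acc), 2))  # average over folds
}
```

A boxplot or bar chart of the per-fold correlations is one simple way to present these results graphically.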

Discussion and Conclusion

Interpret how GS and these results could be used in a plant breeding scenario. Based on the literature, discuss how GS differs from other approaches (e.g., QTL and GWAS analyses). As a starting point, you can use the paper that published the analysis of these data (Crossa et al. 2010).

You can use this section to expand your knowledge and to discuss advantages and disadvantages. In particular, compare the GS results with those obtained from GWAS on the same data, and discuss methods to improve the analysis and the factors that matter in a GS analysis. 100 to 300 words (4 pts)

References

(Use at least 5 references)

Crossa, J., de los Campos, G., Pérez, P., Gianola, D., Burgueño, J., Araus, J. L., … & Arief, V. (2010). Prediction of genetic values of quantitative traits in plant breeding using pedigree and molecular markers. Genetics, 186(2), 713-724.

Pérez, P., & de los Campos, G. (2014). Genome-wide regression and prediction with the BGLR statistical package. Genetics, 198(2), 483-495.

Endelman, J. B. (2011). Ridge regression and other kernels for genomic selection with R package rrBLUP. The Plant Genome, 4, 250-255.

Please send the project to Ivone (idebem.oliveira@ufl.edu).