Last updated: 2021-05-10

Checks: 7 passed, 0 failed

Knit directory: PredictOutbredCrossVar/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

The command set.seed(20191123) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
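As a small illustration of what the seed buys you, the sketch below resets the same seed before two random draws and confirms that they match. Only the seed value comes from this page; the variable names and the choice of `sample()` are illustrative assumptions.

```r
# Reset the seed reported above, then draw a random sample.
set.seed(20191123)
first_draw <- sample(1:100, 5)

# Resetting the same seed restarts the random number stream,
# so the second draw repeats the first.
set.seed(20191123)
second_draw <- sample(1:100, 5)

identical(first_draw, second_draw)  # TRUE: same seed, same draws
```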

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 545ab6c. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    analysis/.DS_Store
    Ignored:    output/.DS_Store

Untracked files:
    Untracked:  Icon
    Untracked:  PredictOutbredCrossVarMS_ResponseToReviews_R1.gdoc
    Untracked:  analysis/DavisResearchSeminar.Rmd
    Untracked:  analysis/accuracies.png
    Untracked:  analysis/docs/
    Untracked:  analysis/exploratorypreds.png
    Untracked:  analysis/images/
    Untracked:  manuscript/
    Untracked:  output/crossPredictions/
    Untracked:  output/gblups_DirectionalDom_parentwise_crossVal_folds.rds
    Untracked:  output/gblups_geneticgroups.rds
    Untracked:  output/gblups_parentwise_crossVal_folds.rds
    Untracked:  output/mtMarkerEffects/

Unstaged changes:
    Modified:   analysis/NGCleadersCall.Rmd
    Modified:   data/blups_forawcdata.rds
    Modified:   data/genmap_awc_May2020.rds
    Modified:   data/parentwise_crossVal_folds.rds
    Modified:   data/ped_awc.rds
    Modified:   data/selection_index_weights_4traits.rds
    Modified:   output/CrossesToPredict_top100stdSI_and_209originalParents.rds
    Modified:   output/accuraciesMeans.rds
    Modified:   output/accuraciesUC.rds
    Modified:   output/accuraciesVars.rds
    Modified:   output/crossRealizations/realizedCrossMeans.rds
    Modified:   output/crossRealizations/realizedCrossMeans_BLUPs.rds
    Modified:   output/crossRealizations/realizedCrossMetrics.rds
    Modified:   output/crossRealizations/realizedCrossVars.rds
    Modified:   output/crossRealizations/realizedCrossVars_BLUPs.rds
    Modified:   output/crossRealizations/realized_cross_means_and_covs_traits.rds
    Modified:   output/crossRealizations/realized_cross_means_and_vars_selindices.rds
    Modified:   output/ddEffects.rds
    Modified:   output/gebvs_ModelA_GroupAll_stdSI.rds
    Modified:   output/obsVSpredMeans.rds
    Modified:   output/obsVSpredUC.rds
    Modified:   output/obsVSpredVars.rds
    Modified:   output/pmv_DirectionalDom_varcomps_geneticgroups.rds
    Modified:   output/pmv_varcomps_geneticgroups.rds
    Modified:   output/pmv_varcomps_geneticgroups_tidy_includingSIvars.rds
    Modified:   output/propHomozygous.rds
    Modified:   output/top100stdSI.rds

Staged changes:
    Modified:   README.md

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/index.Rmd) and HTML (docs/index.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
Rmd 545ab6c wolfemd 2021-05-10 Final polishes for continuity with revised manuscript re-submission.
html 3100a7d wolfemd 2021-03-24 Build site.
Rmd 45e6b20 wolfemd 2021-03-24 Added the Appendix as a standalone HTML page. Include links on the Index page and Navbar (_site.yml).
html be1e9fc wolfemd 2021-03-24 Build site.
Rmd 5018bd0 wolfemd 2021-03-24 Start workflowr project.
html 4de1330 wolfemd 2021-02-01 Build site.
Rmd 883b1d4 wolfemd 2021-02-01 Update the syntax highlighting and code-block formatting throughout for
Rmd 6a10c30 wolfemd 2021-01-04 Submission and GitHub ready version.
html 6a10c30 wolfemd 2021-01-04 Submission and GitHub ready version.

PREPRINT

This repository and website document all analyses, summaries, tables and figures associated with the following PREPRINT: Genomic mating in outbred species: predicting cross usefulness with additive and total genetic covariance matrices!

Abstract

Diverse crops are both outbred and clonally propagated. Breeders typically use truncation selection of parents and invest significant time, land and money evaluating the progeny of crosses to find exceptional genotypes. We developed and tested genomic mate selection criteria suitable for organisms of arbitrary homozygosity level where the full-sibling progeny are of direct interest as future parents and/or cultivars. We extended cross variance and covariance variance prediction to include dominance effects and predicted the multivariate selection index genetic variance of crosses based on haplotypes of proposed parents, marker effects and recombination frequencies. We combined the predicted mean and variance into usefulness criteria for parent and variety development. We present an empirical study of cassava (Manihot esculenta), a staple tropical root crop. We assessed the potential to predict the multivariate genetic distribution (means, variances and trait covariances) of 462 cassava families in terms of additive and total value using cross-validation. Most variance (89%) and covariance (70%) prediction accuracy estimates were greater than zero. The usefulness of crosses was accurately predicted with good correspondence between the predicted and the actual mean performance of family members breeders selected for advancement as new parents and candidate varieties. We also used a directional dominance model to quantify significant inbreeding depression for most traits. We predicted 47,083 possible crosses of 306 parents and contrasted them to those previously tested to show how mate selection can reveal new potential within the germplasm. We enable breeders to consider the potential of crosses to produce future parents (progeny with top breeding values) and varieties (progeny with top own performance).

Manuscript

  1. Results: numeric summaries for the text
  2. Figures: main text figures
  3. Supplementary Tables
  4. Supplementary Figures: referenced in both main text and appendix
  5. Appendix: additional comparisons that did not quite belong in the primary article

See also drop-down navigation menu “Manuscript”.

Data availability and reproducibility

The R package workflowr was used to document this study reproducibly.

Much of the supporting data and output from the analyses documented here are too large for GitHub.

The full repository will be mirrored at ftp://ftp.cassavabase.org/manuscripts/Wolfe_et_al_2021 or, until publication, here.

Analyses

  1. Preprocess data files: Initial processing and subsetting of all input files.
  2. Fit multi-trait Bayesian ridge regressions (MtBRRs): We fit the following MtBRR models: (1) additive-only (A), (2) additive plus dominance (AD or ClassicAD) and (3) additive plus directional dominance (DirDom or DirDomAD). First, we define parent-wise cross-validation folds. Next, we fit MtBRRs for both the parent-wise cross-validation and each of the genetic groups in the pedigree. Finally, using MtBRR marker-effects, we compute genomic BLUPs (i.e. GEBV and GETGV) for later use.
  3. Predict cross means, variances and co-variances: Using training-set marker effects obtained in the previous step, we predict (co)variances and means for each cross according to the cross-validation scheme.
  4. Compute observed cross means, variances and covariances: This step creates the validation data against which predictions of cross means and (co)variances are correlated. We compute the empirical means and variances for each validation family in two ways. First, we use GEBV and GETGV of actual offspring based on test-set marker effects. Second, and for comparison only, we use phenotypic BLUPs of the offspring, where available. Lastly, we measure the realized selection intensity for each cross and compute the mean GEBV / GETGV of selected family members as validation data for predicted cross “usefulness criteria”.
  5. Compute prediction accuracy: Format predicted and observed values for each cross specified by the cross-validation scheme.
  6. Exploratory analysis: predict the means and variances of previously untested crosses: Using marker-effects obtained from fitting All data, predict all possible crosses between (1) the 100 clones with top rank on the “StdSI” and (2) the 209 clones already used as parents.
  7. Compute posterior mean (co)variance parameters from MtBRRs: For comparison and context, compute genetic variances and covariances accounting for LD (Lehermeier M2) for each of the genetic groups (“All”, “GG”, “TMS13”, “TMS14”, “TMS15”).
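
To make the link between steps 3 to 5 concrete, here is a minimal sketch of how a predicted cross mean and variance can be combined into a usefulness criterion, and how prediction accuracy can then be expressed as a correlation with observed values. It is not the code used in this study. The helper names (`sel_intensity`, `usefulness`), the fixed selected proportion `p`, the normality assumption behind the intensity formula, and all numbers are illustrative assumptions. The analyses here instead measure a realized selection intensity for each cross.

```r
# Standardized selection intensity when the top fraction `p` of a
# normally distributed family is selected (assumption: normality).
sel_intensity <- function(p) {
  stopifnot(all(p > 0 & p < 1))
  dnorm(qnorm(1 - p)) / p
}

# Usefulness criterion: predicted family mean plus the expected
# superiority of the selected fraction (intensity * genetic SD).
usefulness <- function(pred_mean, pred_var, p = 0.05) {
  pred_mean + sel_intensity(p) * sqrt(pred_var)
}

# Hypothetical predictions for four crosses (made-up numbers).
crosses <- data.frame(
  cross     = c("P1xP2", "P1xP3", "P2xP3", "P3xP4"),
  pred_mean = c(1.0, 1.2, 0.8, 1.1),
  pred_var  = c(0.40, 0.10, 0.90, 0.25)
)
crosses$pred_uc <- usefulness(crosses$pred_mean, crosses$pred_var, p = 0.05)

# Hypothetical observed values: mean of the selected family members.
crosses$obs_uc <- c(2.1, 1.9, 2.6, 2.0)

# Accuracy expressed as the correlation of predicted with observed values.
accuracy <- cor(crosses$pred_uc, crosses$obs_uc)
```

In this toy data the cross with the highest predicted mean (P1xP3) does not have the highest usefulness, because its predicted variance is small. That trade-off is the reason for predicting variances as well as means.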

Supporting R package predCrossVar

In addition, we combined many of the core (and useful) support functions for predicting crosses into an R package predCrossVar, which is available on GitHub and can be installed with, e.g.:

devtools::install_github("wolfemd/predCrossVar", ref = 'master') 

The functions in predCrossVar are used throughout.

Supporting functions code/

The analyses in the html / Rmd files referenced above often source R scripts in the code/ sub-folder. These are wrapper functions around the packaged core functions in predCrossVar, to do the specific analyses for this paper.
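
As a rough sketch of that pattern, the helper below sources every R script found in a folder. The function name `source_dir` and its arguments are hypothetical and are not part of predCrossVar or of this repository. The individual analysis files may instead source specific scripts by name.

```r
# Hypothetical helper: source all .R scripts in a directory into `envir`.
source_dir <- function(dir = "code", envir = globalenv()) {
  scripts <- list.files(dir, pattern = "\\.[Rr]$", full.names = TRUE)
  for (script in scripts) {
    # `local = envir` evaluates each script in the chosen environment.
    source(script, local = envir)
  }
  invisible(scripts)  # return the sourced file paths without printing them
}

# Example use from the project root (only runs if the folder exists):
# if (dir.exists("code")) source_dir("code")
```

Passing a fresh environment (for example `envir = new.env()`) keeps the wrapper functions out of the global environment, in line with the empty-environment check reported at the top of this page.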


sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] workflowr_1.6.2

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.6        whisker_0.4       knitr_1.32        magrittr_2.0.1   
 [5] R6_2.5.0          rlang_0.4.10      fansi_0.4.2       stringr_1.4.0    
 [9] tools_4.0.3       xfun_0.22         utf8_1.2.1        git2r_0.28.0     
[13] jquerylib_0.1.3   htmltools_0.5.1.1 ellipsis_0.3.1    rprojroot_2.0.2  
[17] yaml_2.2.1        digest_0.6.27     tibble_3.1.1      lifecycle_1.0.0  
[21] crayon_1.4.1      later_1.1.0.1     sass_0.3.1        vctrs_0.3.7      
[25] promises_1.2.0.1  fs_1.5.0          glue_1.4.2        evaluate_0.14    
[29] rmarkdown_2.7     stringi_1.5.3     bslib_0.2.4       compiler_4.0.3   
[33] pillar_1.6.0      jsonlite_1.7.2    httpuv_1.5.5      pkgconfig_2.0.3