v <- voom(d, design, plot = T)
vfit <- lmFit(v, design)
vfit_c <- contrasts.fit(vfit, contrasts=contrast.matrix)
# Empirical Bayes moderation
vfit_eb <- eBayes(vfit_c, 0.01) # estimating that 1% of genes are differentially expressed
x <- topTable(vfit_eb, number=Inf, coef=z, sort.by="none")
# z is the contrast of interest
# No restrictions are placed on the output of topTable (number=Inf)
df <- as.data.frame(x[,c(1:6)]) # to extract important columnsThe empirical Bayes function of limma isn’t always desired, but the useful output tables are still wanted
Situation
You are preparing differential gene expression calculations with the R/Bioconductor limma package (including RNA-sequencing extensions such as limma-voom). While limma offers empirical Bayes moderation of t-statistics (‘borrowing’ variance estimates across genes), you have a large number of individual samples or have other reasons to avoid this procedure.
Problem
Omitting the eBayes step in the limma workflow means that t-statistics are not calculated, and the useful topTable function of the package to summarize differential expression calculations does not function.
For example (assuming that you have prepared a DGEList object d, a model formula design and a contrasts matrix contrasts.matrix), a typical workflow including empirical Bayes moderation would be as follows:
Solution
An alternative workflow which does not use empirical Bayes moderation and calculates t-statistics separately would be as follows:
v <- voom(d, design, plot = T)
vfit <- lmFit(v, design)
vfit_c <- contrasts.fit(vfit, contrasts=contrast.matrix)
vfit_c$t <- vfit_c$coef/vfit_c$stdev.unscaled/vfit_c$sigma
vfit_c$p.value <- 2 * pt(-abs(vfit_c$t), df = vfit_c$df.residual)
vfit_c$lods <- vfit_eb$lods
# log-odds of differential expression:
# this is not technically correct in this instance but permits the subsequent topTable function to work
x <- topTable(vfit_c, number=Inf, coef=z, sort.by="none") # z is the contrast of interest
df <- as.data.frame(x[,c(1:5)]) # this drops the B value (lods) from the output as it is not correctNote the last line of the second workflow, which drops the ‘dummy’ B value needed for topTable to work. This is a limitation of this workaround. I suspect the ‘dummy’ B value could be almost anything, but for convenience I extracted it from the initial, empirical Bayes-moderated vfit_eb object. Typically I am interested in parallel calculations with and without the empirical Bayes step.
Credit where credit is due
I discovered this workaround in Google and/or Stack Overflow searches in 2023 but did not carefully note its source. It may well have been suggested by one of the limma package authors.
I have placed this same material in a gist
Citation
@online{matkovich2024,
author = {Matkovich, SJ},
title = {Using Limma Without Empirical {Bayes}},
date = {2024-09-07},
url = {https://sjmatkovich.github.io/posts/2024-09-07_limma_without_empirical_Bayes/},
langid = {en}
}