A few tentative steps beyond the R environment I know well

R was the first language in which I developed a level of proficiency and continues to be the major language I use for biological data science. The Bioconductor environment was the selling point for me.

I’ve taken seriously the recommendation of several experts to become highly competent in one programming language before moving on to others, since knowing a little about a large number of languages only goes so far.

Nonetheless, I’ve wanted to develop some ability in Python for a couple of reasons:
* One of the three major environments for single-cell / single-nucleus analysis is Python-based (the scverse), in contrast to the R-based Seurat and Bioconductor SingleCellExperiment approaches
* Google has a mini machine-learning course that uses Python for its exercises. I’d like to take advantage of a quick refresher course such as this to update my learning from the ‘Practical Machine Learning’ module of the Johns Hopkins Data Science Specialization on Coursera, which I took a few years ago.

Python

Just to get my feet wet with the essential syntax and structures, I found this introductory series on Kaggle to be very helpful: https://www.kaggle.com/learn/python

I’m going to supplement this with a couple of recommended prerequisites for Google’s course: NumPy and pandas.

Nextflow

Nextflow? Why Nextflow?

My workplace is migrating from an on-premises compute environment for genomic and genetic analyses. Most of the pre-defined omics workflows are from the nf-core project. While it’s relatively unlikely that I’ll be developing new pipelines or substantially modifying current ones, I had the opportunity to take two days of live training based on Seqera’s online material, so why not?

One of the comments made by the trainers was that Nextflow should be considered for individual workflows, not just for analyses requiring compute-cluster scale. The advantages lie in reproducibility, and in not having to re-run time-consuming steps of a workflow that have not changed (in contrast to notebooks / Quarto documents, where this is harder to achieve).
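To make the idea of skipping unchanged steps concrete for myself, here is a minimal Python sketch of caching keyed on a step's inputs. It is only an illustration of the general principle, not how Nextflow is implemented; the names `cached_step` and `slow_normalise` are hypothetical, and I'm assuming results are small enough to store as JSON.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def cached_step(cache_dir, name, func, inputs):
    """Run func(**inputs) unless a result for this step name and inputs is already stored.

    Returns (result, was_cached). Hypothetical helper for illustration only.
    """
    # The cache key depends only on the step name and its inputs, so changing
    # either one forces a re-run. Changes to the function body are NOT detected
    # in this simplified sketch.
    payload = json.dumps({"step": name, "inputs": inputs}, sort_keys=True)
    key = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    result_file = Path(cache_dir) / f"{key}.json"

    if result_file.exists():
        return json.loads(result_file.read_text()), True

    result = func(**inputs)
    result_file.write_text(json.dumps(result))
    return result, False


def slow_normalise(counts, scale):
    """Stand-in for a time-consuming step: scale counts to a fixed total."""
    total = sum(counts)
    return [scale * c / total for c in counts]


cache_dir = tempfile.mkdtemp()

# First run: nothing is cached yet, so the step is computed.
first, first_cached = cached_step(
    cache_dir, "normalise", slow_normalise, {"counts": [1, 3, 4], "scale": 8}
)
# Second run with identical inputs: the stored result is reused.
second, second_cached = cached_step(
    cache_dir, "normalise", slow_normalise, {"counts": [1, 3, 4], "scale": 8}
)
# Changing an input gives a different key, so the step runs again.
third, third_cached = cached_step(
    cache_dir, "normalise", slow_normalise, {"counts": [1, 3, 4], "scale": 16}
)
```

In a notebook, the equivalent would be remembering by hand which cells are safe to skip; here the decision is made from the inputs alone. A real workflow tool would also need to account for changes to the code of a step, which this sketch deliberately ignores.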

How might this compare to e.g. the R {targets} package?

Citation

BibTeX citation:

@online{matkovich2024,
  author = {Matkovich, SJ},
  title = {Exploring Other Languages},
  date = {2024-11-27},
  url = {https://sjmatkovich.github.io/posts/2024-11-27_exploring_other_languages/},
  langid = {en}
}

For attribution, please cite this work as:

Matkovich, SJ. 2024. “Exploring Other Languages.” November 27. https://sjmatkovich.github.io/posts/2024-11-27_exploring_other_languages/.