Here I will place various components which I hope will be useful for people interested in Conservation Genomics. The aim is to include both raw data processing, including quality control, mapping, variant calling etc., and downstream population genetic inferences, such as population structure, phylogenetics, genetic diversity, gene flow, introgression, etc.
For now, the following sections are available:
Most of the work will be done on the command line. If you're not familiar with this, there are good (free!) introductory courses on DataCamp and Codecademy. You can also find a good overview of the most common commands here. There are tons of good resources out there, and ChatGPT can be helpful, especially when it comes to fixing syntax.
To access a Unix/Linux shell environment, you can use the native terminal if you're a Mac/Linux user. If you're on Windows, please install MobaXterm.
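To get a feel for the shell, here are a few of the most common commands in action (the directory and file names below are just examples):

```shell
# Create a working directory and move into it
mkdir -p congen_tutorial
cd congen_tutorial

# Create an empty file and list the directory contents in detail
touch samples.txt
ls -l

# Print the path of the directory you are currently in
pwd

# Count the number of lines in a file
wc -l samples.txt
```

Running each command on its own and checking the result is a good way to learn; `man <command>` (e.g. `man ls`) shows the full documentation for any command.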
For the downstream population genetics, we'll mostly be using R. For R, too, there are good (free!) introductory courses on DataCamp and Codecademy. There are many good resources out there, and here as well, ChatGPT can be really helpful.
Depending on the computer infrastructure you are using, you may have to run your analyses using a job submission script. When you log in to the cluster, you're on a login node, where you can prepare your analyses, write your code, transfer data to and from the cluster, etc. The actual analyses should be run on the compute nodes, which you cannot access directly. Instead, you submit your job to a scheduler through a job submission script. An example of an HPC structure is below:
You can create a small shell script with information about the job submission and the actual code you'd like to run. You can do that in a simple text editor like Notepad (don't use Word; word processors add invisible formatting which can break your script!), or directly on the cluster using nano. Start the shell script with:
#!/bin/bash
#$ -pe smp 24          # request 24 slots in the "smp" parallel environment
#$ -o output_log.txt   # write standard output to this file
#$ -e error_log.txt    # write error messages to this file
#$ -N jobname          # name of the job (shown in qstat)
#$ -cwd                # run the job from the current working directory
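As a minimal sketch, a complete submission script combines a header like the one above with the command(s) you want to run. The resource request, job name, and FastQC input files below are placeholders; adjust them to your own data and cluster:

```shell
#!/bin/bash
#$ -pe smp 4           # request 4 slots (placeholder; match your command's needs)
#$ -o output_log.txt   # standard output log
#$ -e error_log.txt    # error log
#$ -N fastqc_run       # job name (placeholder)
#$ -cwd                # run from the current working directory

# The actual analysis: here, quality control with FastQC.
# The file names are placeholders for your own sequencing reads.
fastqc sample_R1.fastq.gz sample_R2.fastq.gz
```

Everything after the `#$` header lines is an ordinary shell script, so you can chain several commands in one job.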
If you save this file as a .sh file, you can submit it to the scheduler like this:
qsub your_script.sh
You can track the status of your jobs by doing:
qstat
If your job does not appear in the list, it has finished (or ended with an error :grimacing:). Otherwise, the status column shows what it is currently doing:
| Code | Meaning |
|---|---|
| r | Running |
| qw | Queued and waiting |
| Eqw | Error in queue waiting |
| hqw | Held in queue waiting |
| t | Transferring |
| d | Deleting |
| s | Suspended |
If you want to delete a job, because it got stuck or because you realized you made a mistake in the script, use:
qdel <jobID>
Note: The exact structure of your job submission script depends on your HPC and its job scheduler (the examples above use Sun Grid Engine-style syntax).
Everything is still under construction, but feel free to reach out with feedback!