1.bash_basics
bash: How to set up multiple jobs as a pipeline
1. Creating a script
You may want to create a directory to hold your scripts:
mkdir ~/scripts
and add this directory to the PATH variable in .bashrc
or .bash_profile.
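As a minimal sketch of that step, the line below prepends the scripts directory to PATH. It assumes the ~/scripts directory created above; placed in ~/.bashrc (or ~/.bash_profile) it takes effect in every new shell.

```shell
# Add ~/scripts to the command search path (put this line in ~/.bashrc).
export PATH="$HOME/scripts:$PATH"
```

After opening a new shell, scripts stored in ~/scripts can be started by name alone, provided they are executable.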
To avoid confusion, bash script names often end in ".sh".
Create an example bash script:
vi ~/scripts/myBash1.sh
#!/bin/bash
clear
echo "This is information provided by mysystem.sh. Program starts now."
echo "Hello, $USER"
echo "Today's date is $(date), this is week $(date +"%V")."
echo "This is $(uname -s) running on a $(uname -m) processor."
echo "This is the uptime information:"
uptime
echo "I'm creating two variables"
# The third comma-separated field of uptime usually holds the user count.
USERS=$(uptime | cut -d "," -f 3)
VALUE="4"
echo "There are$USERS logged in to this computer."
echo "This is the number: $VALUE"
(Tip: if you use vim, you can turn on syntax highlighting by typing ":syntax enable". Add "syntax enable" to your .vimrc
file to make it permanent.)
2. Running a script
The script should have execute permissions for the correct owners in order to be runnable.
chmod u+x ~/scripts/myBash1.sh
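To run the script, enter one of the following:
~/scripts/myBash1.sh
bash ~/scripts/myBash1.sh
bash -x ~/scripts/myBash1.sh
The -x option makes bash print each command as it is executed, which helps with debugging.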
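The difference between the two ways of running a script can be tried on a throwaway file. This is a sketch only: hello.sh and the temporary directory are hypothetical, and it assumes bash is installed at /bin/bash and that the temporary directory allows execution.

```shell
# Create a small script in a temporary directory (hypothetical name hello.sh).
tmpdir=$(mktemp -d)
cat > "$tmpdir/hello.sh" <<'EOF'
#!/bin/bash
echo "Hello from a script"
EOF

# Passing the file to bash works even without execute permission.
out_bash=$(bash "$tmpdir/hello.sh")

# Running the file directly by its path requires execute permission.
chmod u+x "$tmpdir/hello.sh"
out_direct=$("$tmpdir/hello.sh")

echo "$out_bash"
echo "$out_direct"
rm -rf "$tmpdir"
```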
3. Bash basics
3.1 Variables (page 299)
To set a variable in the shell, use
VARNAME="value"
Setting and exporting is usually done in one step:
export VARNAME="value"
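The effect of export is easiest to see from a child process. In the sketch below, LOCAL_ONLY and SHARED are hypothetical names; only the exported variable is visible inside the child shell started with bash -c.

```shell
# A plain shell variable: visible in this shell only.
LOCAL_ONLY="stays here"
# An exported variable: also passed on to child processes.
export SHARED="visible to children"

# Single quotes keep the text literal here, so the child shell does the expansion.
child_local=$(bash -c 'echo "${LOCAL_ONLY:-unset}"')
child_shared=$(bash -c 'echo "${SHARED:-unset}"')

echo "child sees LOCAL_ONLY: $child_local"
echo "child sees SHARED: $child_shared"
```

Note that there must be no spaces around the "=" sign in an assignment.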
3.2 Quoting characters (page 327)
Escape characters:
Single quotes:
Double quotes:
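A short sketch of the three quoting mechanisms, using a hypothetical variable NAME: a backslash removes the special meaning of the next character, single quotes keep every character literal, and double quotes still allow variable expansion and command substitution.

```shell
NAME="world"              # hypothetical variable used for illustration

escaped="Cost: \$5"       # backslash: the dollar sign stays literal
single='Hello, $NAME'     # single quotes: nothing is expanded
double="Hello, $NAME"     # double quotes: $NAME is expanded

echo "$escaped"
echo "$single"
echo "$double"
```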
3.3 Shell expansion (page 325)
Brace expansion {}:
Variable expansion $:
Command substitution:
echo $(date)
echo `date`
Arithmetic expansion:
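The four kinds of expansion listed above can be compared side by side. This is an illustrative sketch; PREFIX, UNSET_VAR_EXAMPLE and the file names are made up, and no files need to exist for brace expansion to work.

```shell
# Brace expansion: generates strings from a pattern.
braces=$(echo sample_{1..3}.txt)

# Variable expansion, including a default for an unset variable.
PREFIX="summary"                                # hypothetical variable
with_suffix="${PREFIX}.sta"
with_default="${UNSET_VAR_EXAMPLE:-fallback}"

# Command substitution: capture the output of a command.
upper=$(echo "abc" | tr 'a-z' 'A-Z')

# Arithmetic expansion: integer arithmetic only.
sum=$(( 3 + 4 * 2 ))

echo "$braces"
echo "$with_suffix $with_default"
echo "$upper"
echo "$sum"
```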
3.4 Regular expressions (page 346)
Operator   Effect
.          Matches any single character.
?          The preceding item is optional and will be matched, at most, once.
*          The preceding item will be matched zero or more times.
+          The preceding item will be matched one or more times.
{N}        The preceding item is matched exactly N times.
{N,}       The preceding item is matched N or more times.
{N,M}      The preceding item is matched at least N times, but not more than M times.
-          Represents the range if it's not first or last in a list or the ending point of a range in a list.
^          Matches the empty string at the beginning of a line; also represents the characters not in the range of a list.
$          Matches the empty string at the end of a line.
\b         Matches the empty string at the edge of a word.
\B         Matches the empty string provided it's not at the edge of a word.
\<         Matches the empty string at the beginning of a word.
\>         Matches the empty string at the end of a word.
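A few of these operators in action with grep -E (extended regular expressions). The sample lines are invented for illustration.

```shell
# Hypothetical sample lines to match against.
lines='sample1
sample22
control
sample'

# "+" with the anchors ^ and $: "sample" followed by one or more digits.
with_digits=$(printf '%s\n' "$lines" | grep -E '^sample[0-9]+$')

# "{N}": exactly two digits after "sample".
two_digits=$(printf '%s\n' "$lines" | grep -E '^sample[0-9]{2}$')

# "?": the digit is optional, so "sample" and "sample1" both match.
optional=$(printf '%s\n' "$lines" | grep -E '^sample[0-9]?$')

# "^" as the first character of a list negates it: lines not starting with "s".
not_s=$(printf '%s\n' "$lines" | grep -E '^[^s]')

echo "$with_digits"
echo "$two_digits"
echo "$optional"
echo "$not_s"
```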
3.5 grep, awk, sed, pipe(|), cut, sort, uniq, join, cat, paste (Week 1)
3.6 Conditional statements (page 379)
general:
Primary               Meaning
[ -a FILE ]           True if FILE exists.
[ -b FILE ]           True if FILE exists and is a block-special file.
[ -c FILE ]           True if FILE exists and is a character-special file.
[ -d FILE ]           True if FILE exists and is a directory.
[ -e FILE ]           True if FILE exists.
[ -f FILE ]           True if FILE exists and is a regular file.
[ -g FILE ]           True if FILE exists and its SGID bit is set.
[ -h FILE ]           True if FILE exists and is a symbolic link.
[ -k FILE ]           True if FILE exists and its sticky bit is set.
[ -p FILE ]           True if FILE exists and is a named pipe (FIFO).
[ -r FILE ]           True if FILE exists and is readable.
[ -s FILE ]           True if FILE exists and has a size greater than zero.
[ -t FD ]             True if file descriptor FD is open and refers to a terminal.
[ -u FILE ]           True if FILE exists and its SUID (set user ID) bit is set.
[ -w FILE ]           True if FILE exists and is writable.
[ -x FILE ]           True if FILE exists and is executable.
[ -O FILE ]           True if FILE exists and is owned by the effective user ID.
[ -G FILE ]           True if FILE exists and is owned by the effective group ID.
[ -L FILE ]           True if FILE exists and is a symbolic link.
[ -N FILE ]           True if FILE exists and has been modified since it was last read.
[ -S FILE ]           True if FILE exists and is a socket.
[ FILE1 -nt FILE2 ]   True if FILE1 has been changed more recently than FILE2, or if FILE1 exists and FILE2 does not.
[ FILE1 -ot FILE2 ]   True if FILE1 is older than FILE2, or if FILE2 exists and FILE1 does not.
[ FILE1 -ef FILE2 ]   True if FILE1 and FILE2 refer to the same device and inode numbers.
Note that the spaces after "[" and before "]" are required.
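These primaries are typically used as the test of an if statement (if TEST; then COMMANDS; fi). The sketch below defines a hypothetical helper, describe, that classifies a path with -d, -s, -f and -e, and tries it on files created in a temporary directory.

```shell
# Set up a few paths to test (all names are made up).
tmpdir=$(mktemp -d)
touch "$tmpdir/empty.txt"
echo "data" > "$tmpdir/full.txt"

# Classify a path using file test primaries.
describe() {
    if [ -d "$1" ]; then
        echo "directory"
    elif [ -s "$1" ]; then
        echo "non-empty file"
    elif [ -f "$1" ]; then
        echo "empty file"
    elif [ ! -e "$1" ]; then
        echo "missing"
    else
        echo "other"
    fi
}

r_dir=$(describe "$tmpdir")
r_full=$(describe "$tmpdir/full.txt")
r_empty=$(describe "$tmpdir/empty.txt")
r_missing=$(describe "$tmpdir/no_such_file")
echo "$r_dir / $r_full / $r_empty / $r_missing"
rm -rf "$tmpdir"
```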
for loop:
for NAME [in LIST ]; do COMMANDS; done
while loop:
while CONTROL-COMMAND; do CONSEQUENT-COMMANDS; done
until loop:
until TEST-COMMAND; do CONSEQUENT-COMMANDS; done
break and continue
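The three loop forms, together with break and continue, can be sketched on simple counters. The variable names are illustrative only.

```shell
# for: add up the odd numbers in a list, skipping even ones with continue.
odd_sum=0
for n in 1 2 3 4 5 6; do
    if (( n % 2 == 0 )); then
        continue
    fi
    odd_sum=$(( odd_sum + n ))
done

# while: count upwards and leave the loop early with break.
count=0
while true; do
    count=$(( count + 1 ))
    if [ "$count" -ge 3 ]; then
        break
    fi
done

# until: keeps running as long as the test command fails.
countdown=3
until [ "$countdown" -eq 0 ]; do
    countdown=$(( countdown - 1 ))
done

echo "odd_sum=$odd_sum count=$count countdown=$countdown"
```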
3.7 Functions
FUNCTION () { COMMANDS; }
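A function receives its arguments as $1, $2 and so on, and can hand back a result either through its output or through its exit status. Both styles are sketched below with two hypothetical functions, greet and is_even.

```shell
# Returns a result by printing it; the caller captures it with $(...).
greet() {
    local name="$1"
    echo "Hello, ${name}"
}

# Returns a result through the exit status: 0 (true) if the number is even.
is_even() {
    (( $1 % 2 == 0 ))
}

msg=$(greet "bash")
if is_even 4; then
    parity="even"
else
    parity="odd"
fi

echo "$msg"
echo "4 is $parity"
```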
4. Example
Suppose you have five sequencing samples and want to check the mapping quality of each one from its output log. Each log (Log.final.out in the run command below) contains tab-separated lines labeled "Number of input reads", "Uniquely mapped reads number" and "Number of reads mapped to multiple loci".
Based on this information, write a bash script that extracts the number of input reads, the number of uniquely mapped reads and the number of multi-mapped reads, and generates a summary file. The uniquely mapped ratio is usually used to measure mapping quality. Samples that do not pass the cutoff should be labeled.
#!/usr/bin/bash
set -o nounset
set -o errexit

echo "OPTIND starts at $OPTIND"
while getopts ":i:o:n:p:" optname; do
    case $optname in
        i) input="$OPTARG";;
        o) outputDir="$OPTARG";;
        n) cutoff="$OPTARG";;
        p) prefix="$OPTARG";;
        # ":" must come before "\?", and "?" must be escaped,
        # because an unescaped "?" pattern matches any single character.
        :) echo "No argument value for option $OPTARG";;
        \?) echo "Usage: $(basename "$0") -i input -o outputDir -n cutoff -p prefix";;
    esac
    echo "OPTIND is now $OPTIND"
done

# Initialize variables
if [ $# -eq 8 ]; then
    outputDir="${outputDir%/}"    # strip a trailing slash
    echo "The input file is $(basename "$input")"
    echo "The output directory is ${outputDir}"
    echo "The cutoff is ${cutoff}"
    echo "The prefix for the output file is ${prefix}"

    # Get values from the input file
    totalN=$(grep 'Number of input reads' "$input" | cut -f 2)
    uniqN=$(grep 'Uniquely mapped reads number' "$input" | cut -f 2)
    ratio=$(bc <<< "scale=4; $uniqN/$totalN")
    multiN=$(grep 'Number of reads mapped to multiple loci' "$input" | cut -f 2)

    # bc prints 1 if the comparison is true and 0 otherwise
    if (( $(echo "$ratio > $cutoff" | bc -l) )); then
        echo "The mapping result passes the cutoff"
        status="pass"
    else
        echo "The mapping result does NOT pass the cutoff"
        status="fail"
    fi
    echo -e "${totalN}\t${uniqN}\t${multiN}" | awk -v status="$status" 'BEGIN{FS=OFS="\t"}{print $1,$2,$3,$2/$1,$2+$3,($2+$3)/$1,status}' >> "$outputDir/$prefix.sta"

    echo "Job finished!"
fi
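The option parsing can be tried in isolation. The sketch below wraps a cut-down getopts loop in a hypothetical function, parse_opts, that accepts only -i and -n. With a leading colon in the option string, getopts reports a missing value as ":" and an unknown option as "?". Because an unescaped "?" in a case pattern matches any single character, the ":" branch is listed first and the literal question mark is written as \?.

```shell
# Hypothetical, reduced option parser for experimenting with getopts.
parse_opts() {
    local OPTIND=1 opt input="" cutoff=""
    while getopts ":i:n:" opt; do
        case $opt in
            i) input="$OPTARG" ;;
            n) cutoff="$OPTARG" ;;
            :) echo "missing value for -$OPTARG"; return 1 ;;
            \?) echo "unknown option -$OPTARG"; return 1 ;;
        esac
    done
    echo "input=$input cutoff=$cutoff"
}

ok=$(parse_opts -i Log.final.out -n 0.9)
missing=$(parse_opts -i)
unknown=$(parse_opts -x)

echo "$ok"
echo "$missing"
echo "$unknown"
```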
Run the script:
for i in `ls 01.input/`;
do echo $i;
bash bin/sta.sh -i 01.input/$i/Log.final.out -o 02.output/ -n 0.9 -p summary;
done
Homework
level 1: type the code on your computer and make sure you understand what each command does.
download link: Week_2_files/bash_example.zip
level 2: try to write a bash script that checks the md5 of the files in a folder and outputs the names of any truncated files.
download link: Week_2_files/homework/checkMD5.zip
level 3: try to write a bash script of your own.
Reference
https://www.tldp.org/LDP/Bash-Beginners-Guide/html/Bash-Beginners-Guide.html