1.bash_basics

bash: How to set up multiple jobs as a pipeline

1. Creating a script

You may want to create a directory to hold your scripts:

mkdir ~/scripts

and add this directory to your PATH in .bashrc or .bash_profile.

To avoid confusion, bash script names often end in ".sh".

Create an example bash script:

vi ~/scripts/myBash1.sh

#!/bin/bash

clear
echo "This is information provided by myBash1.sh. Program starts now."

echo "Hello, $USER"

echo "Today's date is $(date), this is week $(date +"%V")."
echo "This is $(uname -s) running on a $(uname -m) processor."

echo "This is the uptime information:"
uptime

echo "I'm creating two variables"
USERS=$(uptime | cut -d "," -f 3)
VALUE="4"
echo "There are$USERS logged in to this computer."
echo "This is the number: $VALUE"

(Tip: if you use vim, you can turn on syntax highlighting by typing ":syntax enable". Add "syntax enable" to your .vimrc file to make the setting permanent.)

2. Running a script

The script needs execute permission for its owner before it can be run directly:

chmod u+x ~/scripts/myBash1.sh

To run the script, type ~/scripts/myBash1.sh, bash ~/scripts/myBash1.sh, or bash -x ~/scripts/myBash1.sh. The -x option prints each command before it is executed, which helps with debugging.
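The create, chmod, and run steps can be tried end to end in a scratch directory. This is only a sketch: the temporary directory and hello.sh are hypothetical stand-ins for ~/scripts and myBash1.sh, so nothing in your home directory is touched.

```shell
#!/bin/bash
# Hypothetical demo: hello.sh and the temporary directory stand in for
# myBash1.sh and ~/scripts.
demo_dir=$(mktemp -d)

# Write a two-line script into the scratch directory.
cat > "$demo_dir/hello.sh" <<'EOF'
#!/bin/bash
echo "Hello from hello.sh"
EOF

# Give the owner execute permission.
chmod u+x "$demo_dir/hello.sh"

# 1) Run it by path (needs the execute bit and the #! line).
out_direct=$("$demo_dir/hello.sh")

# 2) Run it through bash (works even without the execute bit).
out_bash=$(bash "$demo_dir/hello.sh")

# 3) Put the directory on PATH, then run the script by name only.
PATH="$demo_dir:$PATH"
out_path=$(hello.sh)

echo "$out_direct"
echo "$out_bash"
echo "$out_path"

# Clean up the scratch directory.
rm -rf "$demo_dir"
```

All three calls should print the same line. The third form is what adding ~/scripts to PATH in .bashrc gives you.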

3. Bash basics

3.1 Variables (page 299)

To set a variable in the shell, use

VARNAME="value"

Setting and exporting is usually done in one step:

export VARNAME="value"
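The difference between a plain variable and an exported one shows up when a child process is started. A small sketch, with made-up variable names (LOCAL_ONLY and SHARED are not part of the course material):

```shell
#!/bin/bash
# Note: no spaces are allowed around "=" in an assignment.

# A plain shell variable: visible in this shell only.
LOCAL_ONLY="not exported"

# An exported variable: copied into the environment of child processes.
export SHARED="exported"

# Ask a child bash process what it can see. Single quotes keep the
# parent shell from expanding the variables before the child runs.
child_local=$(bash -c 'echo "${LOCAL_ONLY:-unset}"')
child_shared=$(bash -c 'echo "${SHARED:-unset}"')

echo "child sees LOCAL_ONLY as: $child_local"
echo "child sees SHARED as: $child_shared"
```

This is why settings that other programs need, such as PATH, are exported.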

3.2 Quoting characters (page 327)

  • Escape characters:

  • Single quotes:

  • Double quotes:
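The three quoting mechanisms listed above can be compared side by side. A minimal sketch, using a hypothetical variable called name:

```shell
#!/bin/bash
name="world"

# Escape character: the backslash makes the next character literal.
escaped=$(echo \$name)

# Single quotes: every character inside is literal, including $.
single=$(echo '$name')

# Double quotes: $ (and backticks, backslash) keep their special meaning.
double=$(echo "Hello, $name")

# Quoting also controls word splitting on whitespace.
spaced="two   spaces"
unquoted=$(echo $spaced)     # split into two words, rejoined with one space
quoted=$(echo "$spaced")     # kept as one word, inner spaces preserved

echo "$escaped | $single | $double | $unquoted | $quoted"
```

As a rule of thumb, put variable references in double quotes unless you specifically want the value to be split into words.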

3.3 Shell expansion (page 325)

  • Brace expansion {}:

  • Variable expansion $:

  • Command substitution:

echo $(date)
echo `date`
echo date

  • Arithmetic expansion:
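Each of the four expansions in this list can be tried on its own. The following sketch uses invented file and variable names (sample_A.txt, reads.fastq.gz, UNSET_DEMO_VAR) purely for illustration:

```shell
#!/bin/bash
# Brace expansion: generates strings; the files do not need to exist.
braces=$(echo sample_{A,B,C}.txt)
range=$(echo {1..5})

# Variable (parameter) expansion.
file="reads.fastq.gz"
base="${file%.gz}"                      # remove the suffix ".gz"
ext="${file##*.}"                       # remove everything up to the last "."
fallback="${UNSET_DEMO_VAR:-default}"   # use "default" if unset or empty

# Command substitution: replaced by the output of the command.
count=$(printf 'a\nb\nc\n' | grep -c .)

# Arithmetic expansion: integer arithmetic only.
sum=$(( 3 + 4 * 2 ))
quot=$(( 7 / 2 ))

echo "$braces"
echo "$range"
echo "$base $ext $fallback $count $sum $quot"
```

Note that 7 / 2 yields 3, because arithmetic expansion discards the fractional part. For decimal results a tool such as bc or awk is needed.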

3.4 Regular expressions (page 346)

Operator   Effect
.          Matches any single character.
?          The preceding item is optional and will be matched at most once.
*          The preceding item will be matched zero or more times.
+          The preceding item will be matched one or more times.
{N}        The preceding item is matched exactly N times.
{N,}       The preceding item is matched N or more times.
{N,M}      The preceding item is matched at least N times, but not more than M times.
-          Represents the range if it's not first or last in a list or the ending point of a range in a list.
^          Matches the empty string at the beginning of a line; also represents the characters not in the range of a list.
$          Matches the empty string at the end of a line.
\b         Matches the empty string at the edge of a word.
\B         Matches the empty string provided it's not at the edge of a word.
\<         Matches the empty string at the beginning of a word.
\>         Matches the empty string at the end of a word.
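A few of these operators can be checked quickly with grep -E (extended regular expressions). The word list here is made up for the demonstration:

```shell
#!/bin/bash
# Hypothetical word list, one entry per line.
words=$(printf '%s\n' color colour colouur cat concat "cat nap" aaa)

# .  any single character
dot=$(echo "$words" | grep -E '^c.t$')

# ?  the preceding item is optional
opt=$(echo "$words" | grep -E '^colou?r$')

# +  the preceding item one or more times
plus=$(echo "$words" | grep -E '^colou+r$')

# {N}  the preceding item exactly N times
exact=$(echo "$words" | grep -E '^a{3}$')

# ^  anchor the match at the beginning of the line
start=$(echo "$words" | grep -E '^cat')

# \< and \>  match only "cat" as a whole word, so "concat" is excluded
word=$(echo "$words" | grep -E '\<cat\>')

echo "$dot"; echo "$opt"; echo "$plus"; echo "$exact"; echo "$start"; echo "$word"
```

The single quotes around each pattern matter: they stop the shell from interpreting characters such as $, ? and * before grep sees them.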

3.5 grep, awk, sed, pipe(|), cut, sort, uniq, join, cat, paste (Week 1)

3.6 Conditional statements (page 379)

  • general:

Primary               Meaning
[ -a FILE ]           True if FILE exists.
[ -b FILE ]           True if FILE exists and is a block-special file.
[ -c FILE ]           True if FILE exists and is a character-special file.
[ -d FILE ]           True if FILE exists and is a directory.
[ -e FILE ]           True if FILE exists.
[ -f FILE ]           True if FILE exists and is a regular file.
[ -g FILE ]           True if FILE exists and its SGID bit is set.
[ -h FILE ]           True if FILE exists and is a symbolic link.
[ -k FILE ]           True if FILE exists and its sticky bit is set.
[ -p FILE ]           True if FILE exists and is a named pipe (FIFO).
[ -r FILE ]           True if FILE exists and is readable.
[ -s FILE ]           True if FILE exists and has a size greater than zero.
[ -t FD ]             True if file descriptor FD is open and refers to a terminal.
[ -u FILE ]           True if FILE exists and its SUID (set user ID) bit is set.
[ -w FILE ]           True if FILE exists and is writable.
[ -x FILE ]           True if FILE exists and is executable.
[ -O FILE ]           True if FILE exists and is owned by the effective user ID.
[ -G FILE ]           True if FILE exists and is owned by the effective group ID.
[ -L FILE ]           True if FILE exists and is a symbolic link.
[ -N FILE ]           True if FILE exists and has been modified since it was last read.
[ -S FILE ]           True if FILE exists and is a socket.
[ FILE1 -nt FILE2 ]   True if FILE1 has been changed more recently than FILE2, or if FILE1 exists and FILE2 does not.
[ FILE1 -ot FILE2 ]   True if FILE1 is older than FILE2, or if FILE2 exists and FILE1 does not.
[ FILE1 -ef FILE2 ]   True if FILE1 and FILE2 refer to the same device and inode numbers.

Note that the spaces after "[" and before "]" are required.
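These primaries are normally used inside an if/elif/else statement. A minimal sketch, where describe is a hypothetical helper and the files live in a throwaway directory:

```shell
#!/bin/bash
# Scratch directory with one empty and one non-empty file.
tmp_dir=$(mktemp -d)
touch "$tmp_dir/empty.txt"
echo "data" > "$tmp_dir/full.txt"

# Hypothetical helper: classify a path using the file test primaries.
describe() {
    if [ -d "$1" ]; then
        echo "directory"
    elif [ -s "$1" ]; then
        echo "non-empty file"
    elif [ -f "$1" ]; then
        echo "empty file"
    elif [ ! -e "$1" ]; then
        echo "missing"
    else
        echo "other"
    fi
}

r_dir=$(describe "$tmp_dir")
r_full=$(describe "$tmp_dir/full.txt")
r_empty=$(describe "$tmp_dir/empty.txt")
r_missing=$(describe "$tmp_dir/nope.txt")

echo "$r_dir, $r_full, $r_empty, $r_missing"

# Clean up the scratch directory.
rm -rf "$tmp_dir"
```

The order of the tests matters here: -d is checked first because a directory also has a size greater than zero and would otherwise satisfy -s.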

  • for loop:

for NAME [in LIST ]; do COMMANDS; done

  • while loop:

while CONTROL-COMMAND; do CONSEQUENT-COMMANDS; done

  • until loop:

until TEST-COMMAND; do CONSEQUENT-COMMANDS; done

  • break and continue
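The loop forms above, together with break and continue, can be illustrated with a short sketch. All variable names and values are made up:

```shell
#!/bin/bash
# for: iterate over a list of words.
for_out=""
for sample in A B C; do
    for_out="${for_out}${sample} "
done

# while: repeat as long as the control command succeeds.
i=1
while_sum=0
while [ "$i" -le 5 ]; do
    while_sum=$(( while_sum + i ))
    i=$(( i + 1 ))
done

# until: repeat as long as the test command fails.
n=3
until_out=""
until [ "$n" -eq 0 ]; do
    until_out="${until_out}${n} "
    n=$(( n - 1 ))
done

# continue skips to the next iteration; break leaves the loop.
picked=""
for k in 1 2 3 4 5 6 7 8; do
    if (( k % 2 == 1 )); then continue; fi   # skip odd numbers
    if (( k > 6 )); then break; fi           # stop once k exceeds 6
    picked="${picked}${k} "
done

echo "for: $for_out"
echo "while: $while_sum"
echo "until: $until_out"
echo "break/continue: $picked"
```

while and until differ only in the sense of the test: the while loop stops when its command fails, the until loop stops when its command succeeds.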

3.7 Functions

FUNCTION () { COMMANDS; }
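A function receives its arguments as $1, $2, and so on, and reports success or failure through its return status. A small sketch, with a hypothetical function called ratio_percent:

```shell
#!/bin/bash
# Hypothetical function: print PART as an integer percentage of TOTAL.
ratio_percent() {
    # "local" keeps these variables from leaking into the caller.
    local part=$1 total=$2
    if [ "$total" -eq 0 ]; then
        echo "total must not be zero" >&2
        return 1          # non-zero status signals failure
    fi
    echo $(( 100 * part / total ))
}

# Capture the printed result with command substitution.
pct=$(ratio_percent 45 50)

# Use the return status directly in a condition.
if ratio_percent 1 0 2>/dev/null; then
    zero_ok="yes"
else
    zero_ok="no"
fi

echo "pct=$pct zero_ok=$zero_ok"
```

Printing the result and capturing it with $(...) is the usual way to get a value out of a function, since return can only carry a small integer status.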

4. Example

Assume you have sequencing data for 5 samples and want to check the mapping quality of each sample from its output log (Log.final.out). Each log contains, among other lines, "Number of input reads", "Uniquely mapped reads number" and "Number of reads mapped to multiple loci", each followed by its value in the second tab-separated field.

Based on this information, write a bash script that extracts the number of input reads, the number of uniquely mapped reads and the number of multi-mapped reads, and generates a summary file. The uniquely mapped ratio is usually used to measure mapping quality. Samples that do not pass the cutoff should be labeled.

#!/usr/bin/bash

set -o nounset
set -o errexit

echo "\$OPTIND starts at $OPTIND"

while getopts ":i:o:n:p:" optname; do
    case $optname in
        i) input="$OPTARG";;
        o) outputDir="$OPTARG";;
        n) cutoff="$OPTARG";;
        p) prefix="$OPTARG";;
        # ":" must come before "?": an unquoted ? matches any single character
        :) echo "No argument value for option $OPTARG";;
        \?) echo "Usage: $(basename "$0") -i input -o outputDir -n cutoff -p prefix";;
    esac
    echo "\$OPTIND is now $OPTIND"
    echo $
done

# Initialize variables
if [ $# -eq 8 ]; then
    outputDir="${outputDir%/}"
    echo "The input file is $(basename "${input}")"
    echo "The output directory is ${outputDir}"
    echo "The cutoff is ${cutoff}"
    echo "The prefix for the output file is ${prefix}"

    # Get values from the input file
    totalN=$(grep 'Number of input reads' "${input}" | cut -f 2)
    uniqN=$(grep 'Uniquely mapped reads number' "${input}" | cut -f 2)
    ratio=$(bc <<< "scale=4; $uniqN/$totalN")
    multiN=$(grep 'Number of reads mapped to multiple loci' "${input}" | cut -f 2)

    # Columns: total, unique, multi, unique ratio, mapped, mapped ratio, label
    if (( $(echo "$ratio > $cutoff" | bc -l) )); then
        echo "The mapping result passes the cutoff"
        echo -e "${totalN}\t${uniqN}\t${multiN}" | awk 'BEGIN{FS=OFS="\t"}{print $1,$2,$3,$2/$1,$2+$3,($2+$3)/$1,"pass"}' >> "$outputDir/$prefix.sta"
    else
        echo "The mapping result does NOT pass the cutoff"
        echo -e "${totalN}\t${uniqN}\t${multiN}" | awk 'BEGIN{FS=OFS="\t"}{print $1,$2,$3,$2/$1,$2+$3,($2+$3)/$1,"fail"}' >> "$outputDir/$prefix.sta"
    fi

    echo "Job finished!"
fi
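The option handling in this script relies on getopts. Its behavior is easier to see in isolation, so here is a reduced sketch with only two options. The function name parse_args and the messages are made up for illustration. The leading ":" in the option string switches getopts to silent mode, in which a missing value is reported as ":" and an unknown option as "?".

```shell
#!/bin/bash
# Hypothetical wrapper around getopts with two options, -i and -n.
parse_args() {
    # Reset OPTIND so the function can be called more than once.
    local OPTIND=1 opt
    input=""
    cutoff=""
    while getopts ":i:n:" opt; do
        case $opt in
            i) input="$OPTARG" ;;
            n) cutoff="$OPTARG" ;;
            # Silent mode: OPTARG holds the offending option letter.
            :)  echo "missing value for -$OPTARG"; return 1 ;;
            \?) echo "unknown option -$OPTARG"; return 1 ;;
        esac
    done
    echo "input=$input cutoff=$cutoff"
}

ok=$(parse_args -i Log.final.out -n 0.9)
missing=$(parse_args -i) || true      # -i given without a value
unknown=$(parse_args -z) || true      # -z is not in the option string

echo "$ok"
echo "$missing"
echo "$unknown"
```

The "?" pattern is written as \? because an unquoted ? in a case statement matches any single character, which would also swallow the ":" case.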

Run the script:

for i in `ls 01.input/`; do
    echo $i
    bash bin/sta.sh -i 01.input/$i/Log.final.out -o 02.output/ -n 0.9 -p summary
done
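To see what such a loop produces without real data, the whole flow can be dry-run on invented input. Everything in this sketch is an assumption made for illustration: the sample names, the read counts, and the simplified two-column log layout. It also swaps in two alternatives: a glob (*/) instead of ls, which keeps directory names with spaces intact, and awk instead of bc for the decimal comparison.

```shell
#!/bin/bash
# Everything here is made up for illustration: the sample names, the read
# counts, and the two-column tab-separated log layout.
root=$(mktemp -d)
mkdir -p "$root/01.input/sampleA" "$root/01.input/sampleB" "$root/02.output"
printf 'Number of input reads\t1000\nUniquely mapped reads number\t950\n' \
    > "$root/01.input/sampleA/Log.final.out"
printf 'Number of input reads\t1000\nUniquely mapped reads number\t700\n' \
    > "$root/01.input/sampleB/Log.final.out"

cutoff=0.9
summary="$root/02.output/summary.sta"

# The glob */ expands to every sample directory, in sorted order.
for dir in "$root"/01.input/*/; do
    sample=$(basename "$dir")
    log="${dir}Log.final.out"

    # Pick the value from the second tab-separated field.
    totalN=$(grep 'Number of input reads' "$log" | cut -f 2)
    uniqN=$(grep 'Uniquely mapped reads number' "$log" | cut -f 2)

    # awk does the floating-point comparison here (bc would also work).
    label=$(awk -v u="$uniqN" -v t="$totalN" -v c="$cutoff" \
        'BEGIN { if (u / t > c) print "pass"; else print "fail" }')

    printf '%s\t%s\t%s\t%s\n' "$sample" "$totalN" "$uniqN" "$label" >> "$summary"
done

result=$(cat "$summary")
echo "$result"

# Clean up the scratch directory.
rm -rf "$root"
```

With these invented numbers, sampleA (950 of 1000 reads uniquely mapped) is labeled pass and sampleB (700 of 1000) is labeled fail.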

Homework

Level 1: type the code on your computer and make sure you understand what each command does.

download link: Week_2_files/bash_example.zip

Level 2: write a bash script that checks the md5 checksums of the files in a folder and outputs the names of any truncated files.

download link: Week_2_files/homework/checkMD5.zip

Level 3: write a bash script of your own.

Reference

https://www.tldp.org/LDP/Bash-Beginners-Guide/html/Bash-Beginners-Guide.html
