# 1.bash\_basics

bash: **How to set up multiple jobs as a pipeline**

### 1. Creating a script

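You may want to create a directory to hold your scripts:

`mkdir ~/scripts`

Then add this directory to your `PATH` in `.bashrc` or `.bash_profile` (for example, `export PATH="$HOME/scripts:$PATH"`), so that the scripts can be run from any directory.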

To avoid confusion, bash script names often end in "**.sh**".

Create an example bash script:

`vi ~/scripts/myBash1.sh`

    #!/bin/bash

    clear
    echo "This is information provided by mysystem.sh. Program starts now."

    echo "Hello, $USER"

    echo "Today's date is `date`, this is week `date +"%V"`."
    echo "This is `uname -s` running on a `uname -m` processor."

    echo "This is the uptime information:"
    uptime

    echo "I'm creating two variables"
    USERS=`uptime | cut -d "," -f 3`
    VALUE="4"
    echo "There are$USERS have used this computer."
    echo "This is the number: $VALUE"

(Tip: if you use vim, you may want to turn on syntax highlighting by typing "**:syntax enable**" in vim. Add `syntax enable` to your `.vimrc` file to make the setting permanent.)

### 2. Running a script

The script should have execute permissions for the correct owners in order to be runnable.

`chmod u+x ~/scripts/myBash1.sh`

Type `~/scripts/myBash1.sh` or `bash ~/scripts/myBash1.sh` to run the script. `bash -x ~/scripts/myBash1.sh` runs it in debug mode, printing each command before it is executed.

```
> ~/scripts/myBash1.sh
This is information provided by mysystem.sh. Program starts now.

Hello, liyang

Today's date is Wed Mar 21 22:07:26 CST 2018, this is week 12.
This is Darwin running on a x86_64 processor.

This is the uptime information:
22:07 up 30 days, 12:42, 14 users, load averages: 1.41 1.67 1.74

I'm creating two variables
There are 14 users have used this computer.
This is the number: 4
```
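
The permission and invocation steps above can be tried safely in a throwaway directory. The following is a minimal sketch; the script name `hello.sh` and the temporary directory are illustrative assumptions, not part of the course files.

```shell
# Work in a throwaway directory so nothing in ~/scripts is touched
workdir=$(mktemp -d)
script="$workdir/hello.sh"        # hypothetical script name

# Write a two-line script
printf '%s\n' '#!/bin/bash' 'echo "Hello from $0"' > "$script"

# Without the execute bit, the script can still be passed to bash explicitly
bash "$script"

# Grant execute permission to the owner, then run the script directly
chmod u+x "$script"
"$script"

# Debug mode: each command is printed (prefixed with +) before it runs
bash -x "$script"
```

Running the file directly relies on the `#!/bin/bash` line and the execute bit, while `bash file` needs neither.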

### 3. Bash basics

#### 3.1 Variables (page 299)

To set a variable in the shell, use

`VARNAME="value"`

Setting and exporting is usually done in one step:

`export VARNAME="value"`
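
The difference between setting and exporting a variable shows up when a child process is started. A minimal sketch, using a hypothetical variable named `greeting`:

```shell
# A plain shell variable is visible only in the current shell.
# Note: no spaces are allowed around the = sign.
greeting="hello"
bash -c 'echo "child sees: [${greeting:-}]"'    # child sees: []

# An exported variable is copied into the environment of child processes
export greeting
bash -c 'echo "child sees: [${greeting:-}]"'    # child sees: [hello]
```

This is why settings that other programs must see, such as `PATH`, are exported.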

#### 3.2 Quoting characters (page 327)

* **Escape characters:**

```
echo $date
echo \$date
```

* **Single quotes:**

```bash
echo '$date'
```

* **Double quotes:**

```
echo "$date"
echo "date"
echo "I said: \"Hello World!\""
```
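
To see what each quoting style does, it helps to give `date` a value first. The sketch below assumes a variable that simply happens to be called `date`; the values are illustrative.

```shell
date="20180321"                   # a variable that happens to be called "date"

echo $date                        # 20180321  (the variable is expanded)
echo \$date                       # $date     (the backslash escapes the $)
echo '$date'                      # $date     (single quotes keep everything literal)
echo "$date"                      # 20180321  (double quotes still expand variables)
echo "I said: \"Hello World!\""   # I said: "Hello World!"

# Double quotes also protect whitespace that an unquoted expansion collapses
spaced="a   b"
echo $spaced                      # a b
echo "$spaced"                    # a   b
```

As a rule of thumb, quoting variable expansions with double quotes avoids surprises with spaces in file names.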

#### 3.3 Shell expansion (page 325)

* **Brace expansion {}:**

```
echo sp{el,il,al}l
```

* **Variable expansion $:**

```
echo $SHELL
echo ${FRANKY:=Franky}
```

* **Command substitution:**

      echo $(date)
      echo `date`
      echo date

* **Arithmetic expansion:**

```
echo $((365*24))
echo $[365*24]
```
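
The four kinds of expansion listed above can be combined in one short session. This is a sketch with made-up names (`sample1.txt`, `outdir`, `kernel`, `hours`); the suffix-removal form `${var%/}` is an extra variable-expansion feature not shown in the list.

```shell
# Brace expansion: one word per alternative (a bash feature, not plain sh)
echo sample{1,2,3}.txt            # sample1.txt sample2.txt sample3.txt

# Variable expansion with a default: assigns the default if the variable
# is unset or empty
unset FRANKY
echo "${FRANKY:=Franky}"          # Franky
echo "$FRANKY"                    # Franky  (the assignment persists)

# Variable expansion with suffix removal: strip one trailing slash
outdir="02.output/"
echo "${outdir%/}"                # 02.output

# Command substitution: capture the output of a command in a variable
kernel=$(uname -s)
echo "Running on $kernel"

# Arithmetic expansion
hours=$((365 * 24))
echo "$hours"                     # 8760
```

The `$(...)` form nests more cleanly than backticks, and `$((...))` is the form of arithmetic expansion that current bash documentation recommends over `$[...]`.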

#### 3.4 Regular expressions (page 346)

| Operator | Effect                                                                                                          |
| -------- | --------------------------------------------------------------------------------------------------------------- |
| .        | Matches any single character.                                                                                   |
| ?        | The preceding item is optional and will be matched, at most, once.                                              |
| \*       | The preceding item will be matched zero or more times.                                                          |
| +        | The preceding item will be matched one or more times.                                                           |
| {N}      | The preceding item is matched exactly N times.                                                                  |
| {N,}     | The preceding item is matched N or more times.                                                                  |
| {N,M}    | The preceding item is matched at least N times, but not more than M times.                                      |
| -        | Represents the range if it's not first or last in a list or the ending point of a range in a list.              |
| ^        | Matches the empty string at the beginning of a line; also represents the characters not in the range of a list. |
| $        | Matches the empty string at the end of a line.                                                                  |
| \b       | Matches the empty string at the edge of a word.                                                                 |
| \B       | Matches the empty string provided it's not at the edge of a word.                                               |
| \\<      | Matches the empty string at the beginning of a word.                                                            |
| \\>      | Matches the empty string at the end of a word.                                                                  |
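
A few of these operators can be tried with `grep -E` (extended regular expressions). The sample names below are invented for illustration.

```shell
# Hypothetical sample names, one per line
names=$(printf '%s\n' sample1 sample12 ctrl_3 Sample2 sample)

# ^ and $ anchor the match to the start and end of the line;
# [0-9]+ requires one or more digits
printf '%s\n' "$names" | grep -E '^sample[0-9]+$'
# sample1
# sample12

# {2} requires exactly two repetitions of the preceding item
printf '%s\n' "$names" | grep -E '^sample[0-9]{2}$'
# sample12

# ? makes the preceding item optional; [Ss] is a list of allowed characters
printf '%s\n' "$names" | grep -E '^[Ss]ample[0-9]?$'
# sample1
# Sample2
# sample
```

Single quotes around the pattern keep the shell from interpreting `$`, `*` and `?` before `grep` sees them.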

#### 3.5 grep, awk, sed, pipe(|), cut, sort, uniq, join, cat, paste (Week 1)
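
As a brief reminder of how these tools chain together with pipes, here is a small sketch on an invented two-column table (sample name, mapping status). The data and variable names are assumptions for illustration only.

```shell
# Hypothetical tab-separated table: sample name, then status
table=$(printf '%s\t%s\n' s1 pass s2 fail s3 pass s4 pass)

# cut keeps column 2, sort groups equal values, uniq -c counts each group
printf '%s\n' "$table" | cut -f 2 | sort | uniq -c

# awk can filter and count in a single step
n_pass=$(printf '%s\n' "$table" | awk -F'\t' '$2 == "pass" { n++ } END { print n + 0 }')
echo "passed: $n_pass"            # passed: 3
```

Each command reads the output of the previous one, so a pipeline can be built up and checked one stage at a time.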

#### 3.6 Conditional statements (page 379)

* **general:** `if TEST-COMMANDS; then CONSEQUENT-COMMANDS; fi`, where the test is often one of the primaries below (the spaces inside the brackets are required):

| Primary              | Meaning                                                                                         |
| -------------------- | ----------------------------------------------------------------------------------------------- |
| \[ -a FILE ]         | True if FILE exists.                                                                            |
| \[ -b FILE ]         | True if FILE exists and is a block-special file.                                                |
| \[ -c FILE ]         | True if FILE exists and is a character-special file.                                            |
| \[ -d FILE ]         | True if FILE exists and is a directory.                                                         |
| \[ -e FILE ]         | True if FILE exists.                                                                            |
| \[ -f FILE ]         | True if FILE exists and is a regular file.                                                      |
| \[ -g FILE ]         | True if FILE exists and its SGID bit is set.                                                    |
| \[ -h FILE ]         | True if FILE exists and is a symbolic link.                                                     |
| \[ -k FILE ]         | True if FILE exists and its sticky bit is set.                                                  |
| \[ -p FILE ]         | True if FILE exists and is a named pipe (FIFO).                                                 |
| \[ -r FILE ]         | True if FILE exists and is readable.                                                            |
| \[ -s FILE ]         | True if FILE exists and has a size greater than zero.                                           |
| \[ -t FD ]           | True if file descriptor FD is open and refers to a terminal.                                    |
| \[ -u FILE ]         | True if FILE exists and its SUID (set user ID) bit is set.                                      |
| \[ -w FILE ]         | True if FILE exists and is writable.                                                            |
| \[ -x FILE ]         | True if FILE exists and is executable.                                                          |
| \[ -O FILE ]         | True if FILE exists and is owned by the effective user ID.                                      |
| \[ -G FILE ]         | True if FILE exists and is owned by the effective group ID.                                     |
| \[ -L FILE ]         | True if FILE exists and is a symbolic link.                                                     |
| \[ -N FILE ]         | True if FILE exists and has been modified since it was last read.                               |
| \[ -S FILE ]         | True if FILE exists and is a socket.                                                            |
| \[ FILE1 -nt FILE2 ] | True if FILE1 has been changed more recently than FILE2, or if FILE1 exists and FILE2 does not. |
| \[ FILE1 -ot FILE2 ] | True if FILE1 is older than FILE2, or if FILE2 exists and FILE1 does not.                       |
| \[ FILE1 -ef FILE2 ] | True if FILE1 and FILE2 refer to the same device and inode numbers.                             |
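
A few of these primaries in an `if` / `elif` / `else` statement might look as follows. This is a sketch on throwaway files; the file names are invented, and the spaces after `[` and before `]` are required.

```shell
workdir=$(mktemp -d)                  # throwaway directory
touch "$workdir/empty.txt"            # zero-length file
echo "ACGT" > "$workdir/reads.txt"    # non-empty file

target="$workdir/reads.txt"
if [ -d "$target" ]; then
    kind="directory"
elif [ -s "$target" ]; then
    kind="non-empty file"
elif [ -f "$target" ]; then
    kind="empty file"
else
    kind="missing"
fi
echo "reads.txt: $kind"               # reads.txt: non-empty file

# ! negates a test
if [ ! -e "$workdir/missing.txt" ]; then
    echo "missing.txt does not exist"
fi

# && combines two tests: a regular file that has no content
if [ -f "$workdir/empty.txt" ] && [ ! -s "$workdir/empty.txt" ]; then
    echo "empty.txt exists but is empty"
fi

rm -r "$workdir"
```

Checking `-s` before running an analysis step is a cheap way to catch empty or missing input files early.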

* **for loop:**

`for NAME [in LIST ]; do COMMANDS; done`

* **while loop:**

`while CONTROL-COMMAND; do CONSEQUENT-COMMANDS; done`

* **until loop:**

`until TEST-COMMAND; do CONSEQUENT-COMMANDS; done`

* **break** and **continue**
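
The loop forms above can be sketched with small counters. The sample names and counter variables are illustrative.

```shell
# for: iterate over a list of words
for sample in sample1 sample2 sample3; do
    echo "processing $sample"
done

# while: repeat as long as the control command succeeds
i=1
while [ "$i" -le 3 ]; do
    echo "while pass $i"
    i=$((i + 1))
done

# until: repeat as long as the test command fails
j=3
until [ "$j" -eq 0 ]; do
    echo "countdown $j"
    j=$((j - 1))
done

# continue skips to the next iteration; break leaves the loop entirely
for n in 1 2 3 4 5; do
    if [ "$n" -eq 2 ]; then continue; fi
    if [ "$n" -eq 4 ]; then break; fi
    echo "n=$n"
done
# prints n=1 and n=3
```

`while` and `until` differ only in whether the loop runs on success or on failure of the test command.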

#### 3.7 Functions

`FUNCTION () { COMMANDS; }`
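
As an illustration, here are two small functions. The names `mapped_ratio` and `passes_cutoff` are hypothetical, and `awk` is used for the floating-point arithmetic because the shell itself only does integer math.

```shell
# Arguments arrive as $1, $2, ... inside the function
mapped_ratio () {
    local uniq="$1" total="$2"        # local keeps the names inside the function
    awk -v u="$uniq" -v t="$total" 'BEGIN { printf "%.4f\n", u / t }'
}

# A function's exit status can be used directly in an if statement:
# this one succeeds (status 0) when the ratio is above the cutoff
passes_cutoff () {
    awk -v r="$1" -v c="$2" 'BEGIN { exit !(r > c) }'
}

ratio=$(mapped_ratio 17395896 19372132)
echo "$ratio"                         # 0.8980

if passes_cutoff "$ratio" 0.9; then
    echo "pass"
else
    echo "fail"                       # this branch runs: 0.8980 is not above 0.9
fi
```

Output is returned by printing it and capturing it with `$(...)`, while success or failure is signalled through the exit status.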

### 4. Example

Suppose you have sequencing data for 5 samples and want to check the mapping quality of each sample based on its output log. The directory layout and one output log look like this:

```
> ll
drwxr-xr-x@ 7 liyang staff 238 Mar 22 01:08 01.input
drwxr-xr-x@ 3 liyang staff 102 Mar 22 01:24 02.output
-rw-r--r-- 1 liyang staff 145 Mar 22 01:27 README
drwxr-xr-x 3 liyang staff 102 Mar 22 01:26 bin

> cd 01.input
> ls
sample1 sample2 sample3 sample4 sample5

> cd sample1

> cat Log.final.out
Started job on | Dec 22 13:14:21
Started mapping on | Dec 22 14:10:01
Finished on | Dec 22 14:49:53
Mapping speed, Million of reads per hour | 29.16

Number of input reads | 19372132
Average input read length | 51
UNIQUE READS:
Uniquely mapped reads number | 17395896
Uniquely mapped reads % | 89.80%
Average mapped length | 50.93
Number of splices: Total | 5038
Number of splices: Annotated (sjdb) | 0
Number of splices: GT/AG | 3677
Number of splices: GC/AG | 158
Number of splices: AT/AC | 8
Number of splices: Non-canonical | 1195
Mismatch rate per base, % | 0.25%
Deletion rate per base | 0.00%
Deletion average length | 1.48
Insertion rate per base | 0.00%
Insertion average length | 1.36
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 1080034
% of reads mapped to multiple loci | 5.58%
Number of reads mapped to too many loci | 189582
% of reads mapped to too many loci | 0.98%
UNMAPPED READS:
% of reads unmapped: too many mismatches | 0.00%
% of reads unmapped: too short | 2.54%
% of reads unmapped: other | 1.10%
```

Based on this information, write a bash script that extracts the ***number of input reads***, the ***uniquely mapped reads number*** and the ***multi-mapped reads number***, and generates a summary file. The ***uniquely mapped ratio*** (uniquely mapped reads divided by input reads) is usually used to measure mapping quality. Samples that do not pass the cutoff should be labeled.
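
The extraction step can be tried in isolation before writing the full script. The sketch below builds a mock log from three of the lines shown above. It assumes that a tab follows the `|` separator in the real file, which is what lets `cut -f 2` pick out the value. `awk` does the division here; `bc` would work equally well.

```shell
# Mock log. ASSUMPTION: in the real file a tab follows the "|" separator.
log=$(mktemp)
printf '%s |\t%s\n' \
    'Number of input reads' 19372132 \
    'Uniquely mapped reads number' 17395896 \
    'Number of reads mapped to multiple loci' 1080034 > "$log"

# grep selects the line, cut -f 2 keeps the field after the first tab
totalN=$(grep 'Number of input reads' "$log" | cut -f 2)
uniqN=$(grep 'Uniquely mapped reads number' "$log" | cut -f 2)
multiN=$(grep 'Number of reads mapped to multiple loci' "$log" | cut -f 2)

# Uniquely mapped ratio, rounded to four decimals
ratio=$(awk -v u="$uniqN" -v t="$totalN" 'BEGIN { printf "%.4f", u / t }')

echo "total=$totalN uniq=$uniqN multi=$multiN ratio=$ratio"
# total=19372132 uniq=17395896 multi=1080034 ratio=0.8980

rm -f "$log"
```

If `cut -f 2` returns the whole line instead of the number, the file is not tab-separated and a different delimiter (for example `cut -d '|' -f 2`) is needed.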

    #!/usr/bin/bash

    # Stop on unset variables and on any command that fails
    set -o nounset
    set -o errexit

    # echo "$OPTIND start at $OPTIND"

    # Parse the command-line options
    while getopts ":i:o:n:p:" optname; do
        case $optname in
            i) input="$OPTARG";;
            o) outputDir="$OPTARG";;
            n) cutoff="$OPTARG";;
            p) prefix="$OPTARG";;
            # \? must be escaped: a bare ? matches any character, including ":"
            \?) echo "Usage: `basename "$0"` -i input -o outputDir -n cutoff -p prefix";;
            :) echo "No argument value for option $OPTARG";;
        esac
        # echo "$OPTIND is now $OPTIND"
        # echo $
    done

    # Initialize variables
    if [ $# -eq 8 ]; then
        outputDir="${outputDir%/}"    # strip a trailing slash
        echo "The input file is `basename "${input}"`"
        echo "The output directory is ${outputDir}"
        echo "The cutoff is ${cutoff}"
        echo "The prefix for the output file is ${prefix}"

        # get values from input file
        totalN=`grep 'Number of input reads' "${input}" | cut -f 2`
        uniqN=`grep 'Uniquely mapped reads number' "${input}" | cut -f 2`
        ratio=`bc <<< "scale=4; $uniqN/$totalN"`
        multiN=`grep 'Number of reads mapped to multiple loci' "${input}" | cut -f 2`

        # Columns: total, unique, multi, unique ratio, mapped, mapped ratio, label
        if (( $(echo "$ratio > $cutoff" | bc -l) )); then
            echo "The mapping result passes the cutoff"
            echo -e "${totalN}\t${uniqN}\t${multiN}" | awk 'BEGIN{FS=OFS="\t"}{print $1,$2,$3,$2/$1,$2+$3,($2+$3)/$1,"pass"}' >> "$outputDir/$prefix.sta"
        else
            echo "The mapping result does NOT pass the cutoff"
            echo -e "${totalN}\t${uniqN}\t${multiN}" | awk 'BEGIN{FS=OFS="\t"}{print $1,$2,$3,$2/$1,$2+$3,($2+$3)/$1,"fail"}' >> "$outputDir/$prefix.sta"
        fi

        echo "Job finished!"
    fi
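
The option parsing is the least obvious part of the script, so here is a reduced sketch of the same `getopts` pattern with only two options. The function name `parse_args` is hypothetical, and wrapping the loop in a function is a choice made here so that it can be called several times.

```shell
parse_args () {
    OPTIND=1                      # reset so the function can be called repeatedly
    input=""
    cutoff=""
    # The leading ":" selects silent error reporting; "i:" means -i takes a value
    while getopts ":i:n:" opt; do
        case $opt in
            i) input="$OPTARG" ;;
            n) cutoff="$OPTARG" ;;
            :) echo "Option -$OPTARG needs a value" >&2; return 1 ;;
            \?) echo "Unknown option -$OPTARG" >&2; return 1 ;;
        esac
    done
}

parse_args -i Log.final.out -n 0.9
echo "input=$input cutoff=$cutoff"    # input=Log.final.out cutoff=0.9

# A missing value is reported through the ":" branch
parse_args -i || echo "parsing failed"
```

In silent mode, `getopts` sets the option variable to `:` for a missing value and to `?` for an unknown option, with the offending option letter in `OPTARG`.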

Save the script as `bin/sta.sh` and run it on every sample:

    for i in `ls 01.input/`; do
        echo $i
        bash bin/sta.sh -i 01.input/$i/Log.final.out -o 02.output/ -n 0.9 -p summary
    done

### Homework

**level 1:** type the code on your computer and understand the meaning of each command.

download link: [Week\_2\_files/bash\_example.zip](https://github.com/lulab/training/blob/master/assets/files/bash_example.zip)

**level 2:** write a bash script that checks the **md5** of the files in a folder and outputs the file name of any truncated file.

```
> cd homework/checkMD5/
> ls
file1 file2 file3 file4 file5 md5sum.txt

> cat md5sum.txt
MD5 (./file1) = 15e8d15469ba992caac192b8684396a3
MD5 (./file2) = 85dc26fde0917b4dbba6339607b63d31
MD5 (./file3) = 62fdd313368e8125115ef53a3caaf95b
MD5 (./file4) = d09c83de50bb2da6c0c824fe44ba887a
MD5 (./file5) = db648585dd242f899818247643e599dd
```

download link: [Week\_2\_files/homework/checkMD5.zip](https://github.com/lulab/training/blob/master/assets/files/checkMD5.zip)

**level 3:** write a bash script of your own.

### Reference

<https://www.tldp.org/LDP/Bash-Beginners-Guide/html/Bash-Beginners-Guide.html>


