Command-line shortcuts for phylogeny

These guidelines assume you are using a Mac OS X computer. They should also work on a Linux machine, with some tweaks.
The goal is to be able to do several steps in an analysis with one shortcut. For example, you could convert a file to the correct format, align it, convert to another format, run bootstraps, and convert the final file to a viewable tree file, all with one command.

Steps:

• Create a scripts directory
In your home directory, make a new folder, or ...
In the terminal window, type mkdir scripts
• Make sure you are using the bash shell
Type echo $SHELL and check that it prints /bin/bash
• Move files into the scripts directory
scripts like encodename.pl , decodename.pl
and executables like muscle, phyml, etc...
In some cases you might have to make these "executable" but do not worry about this yet. If you get an error, you can type:
chmod 775 programname.pl
• Edit .bash_profile
Add scripts directory to PATH by putting the following line at the top
export PATH=$PATH:/Users/lucy/scripts
NOTES:
You need to use a text editor, not Word, to edit this file. (TextWrangler is a great free option.)
This file is loaded each time you open a new terminal window, so you can put shortcuts here. There is a file called cnidtol.txt linked at the bottom of this page. If you do not already have a .bash_profile file, you can save this attached file under that name in your home directory (not scripts) to make those commands available.
This definition adds to the places that the system searches for programs when you type a command in the terminal. That way, you can run muscle from within your data directory and it will know where to find the file.
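To confirm that the PATH change took effect, a small check like the sketch below may help. The function name pathcheck is made up for this example and is not one of the original scripts; it only asks the shell where it would find a given program.

```shell
# pathcheck is a hypothetical helper, not part of the original scripts.
# It reports where the shell finds a program, or says that it cannot.
pathcheck () {
    if command -v "$1" > /dev/null 2>&1
    then
        echo "$1 found at $(command -v "$1")"
    else
        echo "$1 not found in PATH"
    fi
}
```

After editing .bash_profile, open a new terminal window (or type source ~/.bash_profile in the current one) and then try pathcheck muscle. If it reports that the program is not found, check the spelling of the scripts path in the export line.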
• This .bash_profile is a place to store your shortcuts.
For example, you can create the "functions" below, and then run them by typing them at the command line. In this case, the functions below expect the user to enter a file name to modify.
To use a script, move to the directory where you have your data, using the cd command. This gets a little complicated, but if you have a folder called mydata in your home directory, you can type:
cd ~/mydata
In this case, the tilde symbolizes your home directory.
To run one of the scripts, once you are in the correct directory, type:
mac2unix datafile.fasta
Other commands may take two input parameters (called arguments). The phymldna command below can take a file name as well as a number of bootstraps desired. That command is used by typing:
phymldna mysequencefile.NBRF 10
• A useful command for combining sequence files
This works for any kind of text file. In the terminal window, use cd to move to the folder containing your files. (You should become comfortable moving around with this command.)
Use the command cat *.fasta > combined.fta to join all .fasta files into one combined file. You do not want to use the same extension for the output file name, because you might get caught in an endless loop!
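The combining step can also be wrapped in a function that reports how many sequences ended up in the output. This is only a sketch: the name combinefasta is made up for this example, and it assumes every input file ends with a line break, since otherwise the last line of one file would run into the first header of the next.

```shell
# combinefasta is a hypothetical helper, not part of the original scripts.
# It joins every .fasta file in the current folder into combined.fta,
# then counts the header lines (those starting with >) in the result.
combinefasta () {
    cat *.fasta > combined.fta
    echo "combined.fta contains $(grep -c '^>' combined.fta) sequences"
}
```

Run it from inside the folder that holds the .fasta files. The output keeps the .fta extension so that it is never picked up as one of its own inputs.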
• Sample scripts
Place these scripts into your .bash_profile to have them accessible from wherever you are. Some require encodename.pl and decodename.pl. Download these files, remove the .txt extension, put them in your scripts folder, and then make them executable with the chmod command above.
You can edit the scripts below to your own purposes. In the examples, $1 (which can also be written ${1}; the braces are optional) stands for the first thing the user types after the command name. $2 is the second thing (after a space) they have entered.
When editing your own scripts, you can just copy the command you would normally type, and insert $1 wherever you want the file name to go. Keep in mind that spaces in file names will be misinterpreted by the command line as separate entities, unless you put a backslash before each space, or use the tab key to expand the file name (which will add the \ for you).
To change input and output files (for example, have it save to a file instead of the screen), put the > symbol at the end, and provide a new file name based on the old one. So >new_$1 will save the output of the command in a file called new_originalname
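As a small illustration of $1, $2, and the > redirection working together, here is a sketch of a function that saves the first few lines of a file under a new name. The name firstlines is made up for this example. Putting "$1" in double quotes inside the function keeps a file name that contains spaces in one piece.

```shell
# firstlines is a hypothetical example, not one of the original scripts.
# $1 is the file name and $2 is the number of lines to keep.
firstlines () {
    # The quotes around $1 keep a name like "my data.txt" together.
    head -n "$2" "$1" > new_"$1"
    echo "saved the first $2 lines of $1 as new_$1"
}
```

Typing firstlines datafile.fasta 4 would write the first four lines of datafile.fasta to new_datafile.fasta in the same folder.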
Simple version

function testname () {
echo This is the name you entered: $1
}

File converter mac to unix, no feedback

function mac2unix () {
tr '\r' '\n' < "$1" > u_"$1"
}

Expanded version - gives help with no input and gives feedback of what it's doing

function unix2mac () {
if [ $# -lt 1 ]
then
echo "convert unix to mac cr"
else
tr '\n' '\r' < "$1" > u_"$1"
echo "converting $1 to u_$1"
fi
}

This script takes a sequence file and aligns it using muscle. Because some programs want unix-style and not mac-style carriage returns, it also runs the sequence through a converter first.

function macmuscle () {
if [ $# -lt 1 ]
then
echo "convert mac to unix cr and run muscle"
else
tr '\r' '\n' < "$1" > u_"$1"
echo "converting $1 to u_$1"
muscle -in u_"$1" -out m_"$1"
tr '\n' '\r' < m_"$1" > align_"$1"
echo "aligned file saved as align_$1"
echo "unix formatted alignment is at m_$1"
fi
}

Create phyml format from fasta, then run bootstraps using phyml.

phymldna(){
if [ $# -lt 1 ]
then
echo "Runs a phyml DNA bootstrap run"
echo "Enter the sequence file name"
echo
echo "To specify number of bootstrap replicates"
echo "put the number after the name"
else
BOOTS=100
if [ -n "$2" ]
then
BOOTS=$2
fi
encodename.pl "$1"
# this removes any suffix after the last period
ROOTNAME=${1%.*}
phyml "${ROOTNAME}".ctx.phy 0 i 1 ${BOOTS} GTR e e 1 1.0 BIONJ y y
decodename.pl -h "${ROOTNAME}".ctx.lut "${ROOTNAME}".ctx.phy_phyml_tree.txt
fi
}

Here is some code to use raxml for amino acid analysis. Modify the raxml command line as needed (this uses raxmlHPC with two processors)

raxmlaa(){
if [ $# -lt 1 ]
then
echo "Performs a raxml run on the file specified"
echo "To indicate a number of bootstrap replicates"
echo "put the number after the name"
echo "For example: raxmlaa mydata.fta 50"
else
BOOTS=100
if [ -n "$2" ]
then
BOOTS=$2
fi
encodename.pl "$1"
# This weird command removes everything after the last period in the filename
# That is, it removes the suffix to leave the root file name
ROOTNAME=${1%.*}
raxmlHPC -T 2 -f a -x 12345 -N "$BOOTS" -m PROTGAMMAWAGF -n "$1" -s "${ROOTNAME}.ctx.phy"
decodename.pl -h "${ROOTNAME}.ctx.lut" -o "${ROOTNAME}.raxml_bipart.tre" "RAxML_bipartitions.$1"
fi
}
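Both phymldna and raxmlaa rely on two pieces of bash syntax: stripping the suffix from a file name, and falling back to 100 bootstraps when no number is given. The sketch below isolates those two steps so you can try them without running an analysis. The name showparts is made up for this example, and ${2:-100} is a shorter way of writing the if test used in the functions above.

```shell
# showparts is a hypothetical demo, not one of the original scripts.
showparts () {
    # ${1%.*} removes the last period and everything after it.
    ROOTNAME=${1%.*}
    # ${2:-100} uses the second argument, or 100 if none was given.
    BOOTS=${2:-100}
    echo "root name: $ROOTNAME"
    echo "bootstraps: $BOOTS"
}
```

Typing showparts mysequencefile.NBRF 10 prints a root name of mysequencefile and 10 bootstraps. With a name such as mydata.run1.fasta and no number, the root name is mydata.run1 and the bootstrap count is 100, because only the final suffix is removed.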


(Make your own corrections or contact Steve Haddock for clarifications or additions.)