These guidelines assume you are using a Mac OS X computer. They might work on a Linux machine, with some tweaks.
The goal is to be able to do several steps in an analysis with one shortcut. For example, you could convert a file to the correct format, align it, convert to another format, run bootstraps, and convert the final file to a viewable tree file, all with one command.
Steps:
Make a scripts directory in your home folder:
mkdir scripts
Type echo $SHELL to make sure it says /bin/bash
Put the scripts and programs you will use into the scripts directory: encodename.pl, decodename.pl, muscle, phyml, etc...
Make each one executable, for example:
chmod 775 programname.pl
Add the scripts directory to your path by putting this line in your .bash_profile (replace lucy with your own user name):
export PATH=$PATH:/Users/lucy/scripts
After that, you can type muscle from within your data directory and it will know where to find the file.
Move to your data with the cd command. This gets a little complicated, but if you have a folder called mydata in your home directory, you can type:
cd ~/mydata
mac2unix datafile.fasta
The phymldna command below can take a file name as well as a number of bootstraps desired. That command is used by typing:
phymldna mysequencefile.NBRF 10
Use cd to move to the folder containing your files. (You should become comfortable moving around with this command.)
Use cat *.fasta > combined.fta to join all .fasta files into one combined file. You do not want to use the same extension for the output file name, because you might get caught in an endless loop!
To make a script of your own executable, use the chmod command above.
$1, or ${1} (the brackets are actually optional), stands for the first thing the user types after the command name. $2 will be the second thing (after a space) they have entered.
>new_$1 will save the output of the command in a file called new_originalname
function testname () {
    echo "This is the name you entered: $1"
}
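As a minimal sketch of how $1 and >new_$1 work together, the function below writes an upper-case copy of a file. The name shout is hypothetical (it is not one of the original scripts), and the sketch assumes the file is in the current directory.

```shell
# Hypothetical example: "shout" is made up for illustration only.
# It writes an upper-case copy of the named file to new_<filename>.
shout () {
    if [ $# -lt 1 ]
    then
        echo "usage: shout filename"
        return 1
    fi
    # $1 is the first thing typed after the command name.
    # The output file name is built by putting new_ in front of it.
    tr '[:lower:]' '[:upper:]' < "$1" > new_"$1"
    echo "saved upper-case copy of $1 as new_$1"
}
```

Typing shout datafile.fasta would leave datafile.fasta untouched and create new_datafile.fasta next to it.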
function mac2unix () {
    # replace Mac carriage returns with Unix line feeds; output goes to u_<filename>
    tr '\r' '\n' < "$1" > u_"$1"
}
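Before converting, it can help to check whether a file contains carriage returns at all. The following is a rough sketch; the name countcr is hypothetical. Note that a non-zero count could also come from a Windows-style file, which uses a carriage return plus a line feed.

```shell
# Hypothetical helper: count the carriage-return characters in a file.
# A count of 0 suggests the file does not need mac2unix.
countcr () {
    if [ $# -lt 1 ]
    then
        echo "usage: countcr filename"
        return 1
    fi
    # tr -cd deletes every character except '\r', wc -c counts what is left,
    # and the final tr strips the padding spaces that some versions of wc add.
    tr -cd '\r' < "$1" | wc -c | tr -d ' '
}
```

For example, countcr datafile.fasta prints a single number.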
function unix2mac () {
    if [ $# -lt 1 ]
    then
        echo "convert unix to mac cr"
    else
        # replace Unix line feeds with Mac carriage returns
        tr '\n' '\r' < "$1" > u_"$1"
        echo "converting $1 to u_$1"
    fi
}
This script takes a sequence file and aligns it using muscle. Because some programs want Unix-style line feeds rather than Mac-style carriage returns, it first runs the sequence file through a converter.
function macmuscle (){
    if [ $# -lt 1 ]
    then
        echo "convert mac to unix cr and run muscle"
    else
        # convert to Unix line endings, align, then convert the result back
        tr '\r' '\n' < "$1" > u_"$1"
        echo "converting $1 to u_$1"
        muscle -in u_"$1" -out m_"$1"
        tr '\n' '\r' < m_"$1" > align_"$1"
        echo "aligned file saved as align_$1"
        echo "unix formatted alignment is at m_$1"
    fi
}
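If you have many sequence files, a small loop can apply a function such as macmuscle to each of them. This is only a sketch: the name foreach_ext is hypothetical, and it assumes the files are in the current directory.

```shell
# Hypothetical helper: run one command on every file with a given extension.
# Usage: foreach_ext extension command
foreach_ext () {
    if [ $# -lt 2 ]
    then
        echo "usage: foreach_ext extension command"
        return 1
    fi
    # The list of matching files is worked out once, before the loop starts,
    # so files created during the loop are not picked up in the same run.
    for f in *."$1"
    do
        # if nothing matches, the pattern itself is left over; skip it
        [ -e "$f" ] || continue
        "$2" "$f"
    done
}
```

It would be used by typing foreach_ext fasta macmuscle. Because macmuscle writes its output files with the same extension (u_, m_ and align_ in front of the original name), running the loop a second time in the same folder would also process those output files.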
Create phyml format from fasta, then run bootstraps using phyml.
phymldna(){
    if [ $# -lt 1 ]
    then
        echo "Runs a phyml DNA bootstrap run"
        echo "Enter the sequence file name"
        echo
        echo "To specify number of bootstrap replicates"
        echo "put the number after the name"
    else
        # default number of bootstrap replicates, used when no number is given
        BOOTS=100
        if [ -n "$2" ]
        then
            BOOTS=$2
        fi
        encodename.pl "$1"
        # this removes any suffix after the last period
        ROOTNAME=${1%.*}
        phyml "${ROOTNAME}".ctx.phy 0 i 1 "${BOOTS}" GTR e e 1 1.0 BIONJ y y
        decodename.pl -h "${ROOTNAME}".ctx.lut "${ROOTNAME}".ctx.phy_phyml_tree.txt
    fi
}
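To see what the suffix removal and the default bootstrap number do without running phyml, you can try a small test function like the sketch below. The name showroot is hypothetical, and ${2:-100} is a shorthand alternative to the if block used above, not the form the original function uses.

```shell
# Hypothetical helper: show the root name and bootstrap count
# that a function like phymldna would work with.
showroot () {
    # ${2:-100} means: use $2 if one was typed, otherwise use 100
    BOOTS=${2:-100}
    # ${1%.*} removes the last period and everything after it
    ROOTNAME=${1%.*}
    echo "root=$ROOTNAME boots=$BOOTS"
}
```

Typing showroot mysequencefile.NBRF 10 prints root=mysequencefile boots=10, and leaving off the number gives boots=100.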
Here is some code to use raxml for amino acid analysis. Modify the raxml command line as needed (this uses raxmlHPC with two processors, set by -T 2).
raxmlaa(){
    if [ $# -lt 1 ]
    then
        echo "Performs a raxml run on the file specified"
        echo "To indicate a number of bootstrap replicates"
        echo "put the number after the name"
        echo "For example: raxmlaa mydata.fta 50"
    else
        # default number of bootstrap replicates, used when no number is given
        BOOTS=100
        if [ -n "$2" ]
        then
            BOOTS=$2
        fi
        encodename.pl "$1"
        # This weird command removes everything after the last period in the filename
        # That is, it removes the suffix to leave the root file name
        ROOTNAME=${1%.*}
        # -s is given only once, pointing at the encoded alignment file
        raxmlHPC -T 2 -f a -x 12345 -N "$BOOTS" -m PROTGAMMAWAGF -n "$1" -s "${ROOTNAME}".ctx.phy
        decodename.pl -h "${ROOTNAME}".ctx.lut -o "${ROOTNAME}".raxml_bipart.tre RAxML_bipartitions."$1"
    fi
}
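The functions above only work if the programs they call can be found through your PATH. A quick check might look like the following sketch; the name needs is hypothetical.

```shell
# Hypothetical helper: report any listed program that cannot be found on PATH.
# Returns 0 if everything was found and 1 if something is missing.
needs () {
    missing=0
    for prog in "$@"
    do
        # command -v succeeds only if the shell can find the program
        if ! command -v "$prog" > /dev/null 2>&1
        then
            echo "not found on PATH: $prog"
            missing=1
        fi
    done
    return $missing
}
```

For example, needs encodename.pl decodename.pl raxmlHPC prints one line for each program that is missing and prints nothing if all three are found.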
(Make your own corrections or contact Steve Haddock for clarifications or additions.)