TTT Proteotyping Pipeline
Installation
The TTT Proteotyping Pipeline consists of several steps:
- RAW to mzXML conversion with ReAdW.
- Peptide identification with X!Tandem.
- X!Tandem XML output to FASTA conversion with convert_tandem_xml_2_fasta.py.
- Peptide to reference database alignment with BLAT.
- Antibiotic resistance detection with Proteotyping.
- Taxonomic composition estimation with Proteotyping.
- Listing distinct/unique proteins in X!Tandem output with create_unique_protein_list.py.
ReAdW, X!Tandem, and BLAT are external programs. This documentation will give
brief instructions on how to install and use them together with TTT
Proteotyping Pipeline. The Python programs are a part of this Python package,
available in the TTT_proteotyping_pipeline folder in the repository.
The steps can be run manually, or preferably via the included Snakemake script.
Download the code
To download the code, clone the repository:
$ hg clone https://bitbucket.org/chalmersmathbioinformatics/TTT_proteotyping_pipeline
This will clone the entire repository to a folder called TTT_proteotyping_pipeline in your current directory.
ReAdW
ReAdW is meant to be run under Windows, but it can be run under Linux using Wine (see the instructions below). Running ReAdW in Wine requires a Linux system with a working 32-bit Wine installation.
Note
It is important that you use 32-bit Wine, as ReAdW cannot be run under 64-bit Wine. As of this writing, 32-bit Wine is only available for Red Hat Enterprise Linux (RHEL) 6 and below; support for 32-bit Wine was removed in RHEL 7.
Get ReAdW
ReAdW can be downloaded from the ReAdW GitHub repository. Either clone the
entire repository or download the binary suitable for your system. Note the
information about the dependencies on three Windows DLL files:
XRawfile2.dll, fileio.dll, and fregistry.dll. These files are NOT
supplied with this pipeline.
Create a 32-bit Wine prefix
Install Wine. It is important that a 32-bit version of Wine is installed; this normally means packages named <package>.i686 instead of <package>.x86_64. In RHEL/CentOS it can be installed like this:
yum install wine
Create a Win32 prefix from which to run ReAdW. Make sure to set and export WINEARCH=win32 during the creation of the Wine prefix. Modify the command below to a path of your choice. Note that this step likely requires working X11 forwarding:
export WINEARCH=win32
export WINEPREFIX=/path/to/your/desired/wineprefix
winecfg
Click OK in any configuration windows that pop up.
Download winetricks to install the required Visual Studio C++ runtimes. vcrun2010 is required for ReAdW and vcrun2008 is required for the Thermo DLLs. Again, note that this step requires X11 forwarding to be enabled:
wget https://raw.githubusercontent.com/Winetricks/winetricks/master/src/winetricks
sh winetricks vcrun2010 vcrun2008
Click through any installation prompts that pop up, and after they complete, finish by registering XRawfile2.dll in your Wine prefix:
wine regsvr32 XRawfile2.dll
Running ReAdW
Make sure to set the WINEPREFIX environment variable to the correct path
(the same directory you specified when creating the 32-bit Wine prefix), then
run ReAdW from your Linux command prompt via Wine:
export WINEPREFIX=/path/to/your/desired/wineprefix
wine /path/to/ReAdW.201510.xcalibur.exe [options] /path/to/sample.raw
ReAdW should now run and convert the RAW file to mzXML.
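When many RAW files need converting, the two commands above can be wrapped in a small script. The following is only a minimal sketch, not part of the pipeline: the function names `build_wine_env`, `build_readw_command`, and `convert_raw` are hypothetical, and no ReAdW options are assumed beyond whatever you pass in yourself.

```python
import os
import subprocess


def build_wine_env(wineprefix, base_env=None):
    """Return a copy of the environment with WINEPREFIX pointing at the 32-bit prefix."""
    env = dict(os.environ if base_env is None else base_env)
    env["WINEPREFIX"] = str(wineprefix)
    return env


def build_readw_command(readw_exe, raw_file, options=()):
    """Return the argument list: wine <ReAdW executable> [options] <RAW file>."""
    return ["wine", str(readw_exe), *options, str(raw_file)]


def convert_raw(readw_exe, raw_file, wineprefix, options=()):
    """Run ReAdW on one RAW file; raises CalledProcessError if Wine/ReAdW fails."""
    subprocess.run(
        build_readw_command(readw_exe, raw_file, options),
        env=build_wine_env(wineprefix),
        check=True,
    )
```

Passing the command as a list avoids shell quoting problems with paths that contain spaces, and `check=True` makes a failed conversion visible instead of silently continuing.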
X!Tandem
Download and install X!Tandem according to the instructions on the X!Tandem homepage.
To run a sample in X!Tandem, several XML files must be prepared. The TTT
Proteotyping Pipeline includes a Python program called run_xtandem.py that
automatically creates the required input files and runs X!Tandem for you:
run_xtandem.py --output OUTFILE --db /PATH/TO/FASTA --threads N --xtandem /PATH/TO/TANDEM.EXE
Running
The TTT Proteotyping Pipeline can be run either by manually running each of the
steps, or it can be controlled automatically using Snakemake. The
automation ensures that each RAW input file is taken through all the required
steps to produce the final output. Together with the supplied
snakemake_crontab_script.sh, it can be used as a completely hands-off,
automated way of analyzing proteomics samples.
Note
The Snakemake workflow can only be run on Linux computers, as it depends on some Linux command line features.
Work directory
The Snakemake workflow requires a work directory containing the following folder structure:
0.raw
1.mzXML
2.xml
3.fasta
4.blast8
5.results
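The folder structure listed above can be created by hand, or with a few lines of Python. This is a minimal sketch only; the helper name `create_workdir` is hypothetical and not part of the pipeline, and the folder names are taken directly from the list above.

```python
from pathlib import Path

# Folder names required by the Snakemake workflow, as listed above.
WORKDIR_FOLDERS = ["0.raw", "1.mzXML", "2.xml", "3.fasta", "4.blast8", "5.results"]


def create_workdir(root):
    """Create the work directory folders under root and return their paths.

    Existing folders are left untouched, so the function is safe to re-run.
    """
    root = Path(root)
    created = []
    for name in WORKDIR_FOLDERS:
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

For example, `create_workdir("/path/to/workdir")` prepares an empty work directory whose path can then be entered in the config file.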
The reference data required to run the entire workflow is usually put in a
single directory (or symlinked there), but it can (in theory) be located
anywhere in the file system. The location of every required file must be
specified in the TTT_pipeline_snakemake_config.yaml file, and this file must
be specified on the command line when invoking the workflow.
Run the Snakemake workflow
To run the Snakemake workflow, ensure that a suitable Python/Conda environment
is activated in which all the proteotyping programs and scripts are available
in PATH. The minimal command line required to start the workflow is this:
snakemake --snakefile SNAKEFILE --configfile CONFIGFILE
As the work directory is specified in the configfile, the command can in theory
be run anywhere in the file system. It is recommended, however, that the
Snakemake workflow is invoked via the included snakemake_crontab_script.sh,
which sets some environment parameters to ensure reliable operation. It uses
the Linux command flock to ensure that only one instance of the workflow
runs at any given time.
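To illustrate the single-instance idea, the sketch below uses Python's `fcntl.flock` rather than the `flock` command that the shell script relies on, so it is an analogous technique and not the script's actual implementation. The names `single_instance` and `snakemake_command` are hypothetical, the non-blocking behaviour (fail immediately instead of waiting) is an assumption, and only the two Snakemake options shown above are used.

```python
import fcntl
from contextlib import contextmanager


@contextmanager
def single_instance(lock_path):
    """Hold an exclusive lock on lock_path for the duration of the with-block.

    Raises BlockingIOError immediately if another holder already has the lock.
    Linux/Unix only, since it relies on flock(2).
    """
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def snakemake_command(snakefile, configfile):
    """Return the minimal Snakemake command line as an argument list."""
    return ["snakemake", "--snakefile", str(snakefile), "--configfile", str(configfile)]
```

A caller would wrap the workflow run in `with single_instance(...)` and pass the result of `snakemake_command(...)` to `subprocess.run`, treating `BlockingIOError` as "another run is already in progress".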
Automatic invocation via crontab
The workflow can be invoked automatically at set times using the Linux built-in
crontab. To edit your personal user’s crontab, type crontab -e at the
command prompt. Add something like the following line to make the Snakemake
workflow check for new files to analyze three times daily (00:00, 12:00,
18:00):
0 0,12,18 * * * /bin/bash /PATH/TO/snakemake_crontab_script.sh
Make sure to modify the configfile (TTT_pipeline_snakemake_config.yaml) and
the crontab script file (snakemake_crontab_script.sh) to match your
environment.
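When adapting the schedule, it can help to check which times of day a crontab entry will fire. The sketch below expands the minute and hour fields of an entry like the one above. It is a deliberately limited illustration: the function name `daily_run_times` is hypothetical, and only plain numbers and comma-separated lists are handled (no ranges, steps, or `*` in the first two fields).

```python
def daily_run_times(cron_line):
    """Return the 'HH:MM' times at which a simple crontab entry fires each day.

    Only the first two fields (minute, hour) are inspected, and each must be
    a number or a comma-separated list of numbers.
    """
    minute_field, hour_field = cron_line.split()[:2]
    minutes = sorted(int(m) for m in minute_field.split(","))
    hours = sorted(int(h) for h in hour_field.split(","))
    return [f"{h:02d}:{m:02d}" for h in hours for m in minutes]


entry = "0 0,12,18 * * * /bin/bash /PATH/TO/snakemake_crontab_script.sh"
print(daily_run_times(entry))  # ['00:00', '12:00', '18:00']
```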