Utils

RNA Sequence

RNA Sequence with secondary structure prediction methods.

This tool takes a given sequence and returns the secondary structure prediction provided by 5 different tools: RNAfold, RNAsubopt, ipknot, contextfold and centroid_fold. You must have these tools installed. You don’t have to install all tools if you want to use only one of the methods.

It’s easy to add more methods of your choince to this class.

Installation:

ContextFold (https://www.cs.bgu.ac.il/~negevcb/contextfold/) needs Java. Try this on Ubuntu 14-04 https://askubuntu.com/questions/521145/how-to-install-oracle-java-on-ubuntu-14-04 Single chain only!

ViennaRNA (https://www.tbi.univie.ac.at/RNA/)

ipknot OSX (https://github.com/satoken/homebrew-rnatools)

RNAStructure (http://rna.urmc.rochester.edu/)

Works with 5.8.1; Jun 16, 2016.

Download http://rna.urmc.rochester.edu/RNAstructureDownload.html and untar it in <RNA_PDB_TOOLS>/opt/RNAstructure/; run make, the tools will be compiled in a folder exe. Set up DATPATH in your bashrc to <RNA_PDB_TOOLS>/opt/RNAstructure/data_tables DATAPATH=/home/magnus/work/src/rna-pdb-tools/opt/RNAstructure/data_tables/ (read more http://rna.urmc.rochester.edu/Text/Thermodynamics.html). RNAstructure can be run with SHAPE restraints, read more http://rna.urmc.rochester.edu/Text/File_Formats.html#Constraint about the format. The file format for SHAPE reactivity comprises two columns. The first column is the nucleotide number, and the second is the reactivity. Nucleotides for which there is no SHAPE data can either be left out of the file, or the reactivity can be entered as less than -500. Columns are separated by any white space.

FAQ:

  • Does it work for more than one chain??? Hmm.. I think it’s not. You have to check on your own. –magnus

TIPS:

Should you need to run it on a list of sequences, use the following script:

from rna_pdb_tools import Seq
f = open("listOfSequences.fasta")
for line in f:
 if line.startswith('>'):
   print line,
 else:
   print line,
   s = Seq.Seq(line.strip()) # module first Seq and class second Seq #without strip this has two lines
   print s.predict_ss(method="contextfold"),
   #print s.predict_ss(method="centroid_fold")

@todo should be renamed to RNASeq, and merged with RNASeq class from RNAalignment.

exception rna_pdb_tools.Seq.MethodNotChosen[source]
class rna_pdb_tools.Seq.RNASequence(seq)[source]

RNASequence.

Usage:

>>> seq = RNASequence("CCCCUUUUGGGG")
>>> seq.name = 'RNA03'
>>> print(seq.predict_ss("RNAfold", constraints="((((....))))"))
>RNA03
CCCCUUUUGGGG
((((....)))) ( -6.40)
eval(no_dangling_end_energies=True, verbose=False)[source]

Evaluate energy of RNA sequence.

Args:
no_dangling_end_energies (Boolean) verbose (Boolean)
Returns:
Energy (flaot)

The RNAeval web server calculates the energy of a RNA sequence on a given secondary structure. You can use it

to get a detailed thermodynamic description (loop free-energy decomposition) of your RNA structures.

Simply paste or upload your sequence below and click Proceed. To get more information on the meaning of the options click the help symbols. You can test the server using this sample sequence/structure pair.

An equivalent RNAeval command line call would have been

RNAeval -v -d0 < input.txt

Read more: http://rna.tbi.univie.ac.at//cgi-bin/RNAWebSuite/RNAeval.cgi

predict_ss(method='RNAfold', constraints='', shapefn='', verbose=0)[source]

Predict secondary structure of the seq.

Parameters:
  • method
  • constraints
  • shapefn – path to a file with shape reactivites
  • verbose

It creates a seq fasta file and runs various methods for secondary structure prediction. You can provide also a constraints file for RNAfold and RNAsubopt.

ContextFold:

$ java -cp bin contextFold.app.Predict in:CCCCUUUGGGGG
CCCCUUUGGGGG
((((....))))

It seems that a seq has to be longer than 9. Otherwise:

$ java -cp bin contextFold.app.Predict in:UUUUUUGGG
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 10

# this is OK
$ java -cp bin contextFold.app.Predict in:CCCCUUUGGG
CCCCUUUGGG
.(((...)))

RNAstructure:

>>> seq = RNASequence("GGGGUUUUCCC")
>>> print(seq.predict_ss("rnastructure"))
>  ENERGY = -4.4  rna_seq
GGGGUUUUCCC
((((...))))

and with the shape data:

>>> print(seq.predict_ss("rnastructure", shapefn="data/shape.txt"))
>  ENERGY = -0.2  rna_seq
GGGGUUUUCCC
.(((....)))

the shape data:

1 10
2 1
3 1

You can easily see that the first G is unpaired right now! The reactivity of this G was set to 10. Worked!

RNA Secondary Structure

Secondary structure analysis

exception rna_pdb_tools.SecondaryStructure.ExceptionOpenPairsProblem[source]
rna_pdb_tools.SecondaryStructure.draw_ss(title, seq, ss, img_out, resolution=4, verbose=False)[source]

Draw Secondary Structure using VARNA (you need correct configuration for this).

If everything is OK, return None, if an error (=exception) return stderr.

Usage:

>>> seq = 'GGAAACC'
>>> ss =  '((...))'
>>> img_out = 'output/demo.png'
>>> draw_ss('rna', seq, ss, img_out)
>>> print('Made %s' % img_out)
Made output/demo.png
_images/demo.png

Can be used with http://geekbook.readthedocs.io/en/latest/rna.html

rna_pdb_tools.SecondaryStructure.parse_vienna_to_pairs(ss, remove_gaps_in_ss=False)[source]

Parse Vienna (dot-bracket notation) to get pairs.

Parameters:
  • ss (str) – secondary stucture in Vienna (dot-bracket notation) notation
  • remove_gaps_in_ss (bool) – remove - from ss or not, design for DCA (tpp case ss = "(((((((((.((((.(((.....))))))......------)....." works with pk of the first level, [[]]
Returns:

(pairs, pairs_pk)

Return type:

list of two lists

Examples:

>>> parse_vienna_to_pairs('((..))')
([[1, 6], [2, 5]], [])

>>> parse_vienna_to_pairs('(([[))]]')
([[1, 6], [2, 5]], [[3, 8], [4, 7]])

>>> parse_vienna_to_pairs('((--))')
([[1, 6], [2, 5]], [])

>>> parse_vienna_to_pairs('((--))', remove_gaps_in_ss=True)
([[1, 4], [2, 3]], [])

>>> parse_vienna_to_pairs('((((......')
Traceback (most recent call last):
  File "/usr/lib/python2.7/doctest.py", line 1315, in __run
    compileflags, 1) in test.globs
  File "<doctest __main__.parse_vienna_to_pairs[4]>", line 1, in <module>
    parse_vienna_to_pairs('((((......')
  File "./SecondaryStructure.py", line 106, in parse_vienna_to_pairs
    raise ExceptionOpenPairsProblem('Too many open pairs (()) in structure')
ExceptionOpenPairsProblem: Too many open pairs (()) in structure

Blast PDB

A super-simple wrapper around Blast on the PDB db (online).

class rna_pdb_tools.BlastPDB.BlastPDB(seq)[source]

BlastPDB - run Blast online on the PDB database.

This can be used in Jupiter based RNA notebooks, e.g. https://github.com/mmagnus/rna-pdb-tools/blob/master/rp18.ipynb

Usage:

>>> p = BlastPDB('GGGUCAGGCCGGCGAAAGUCGCCACAGUUUGGGGAAAGCUGUGCAGCCUGUAACCCCCCCACGAAAGUGGG')
>>> p.search()
>>> p.result  
u'<HTML>\n<TITLE>BLAST Search Results</TITLE>...
Parameters:seq – string
search()[source]

Search online the seq.

Blastn - select sequences from teh database matched by BLASTn

A super-simple wrapper to parse by headers a BLASTn output (outfmt - 6) sequences in fasta from the database of sequences.

PDB Edit Bfactor/Occupancy

rna_pdb_edit_occupancy_bfactor.py - edit occupancy or bfactor in PDB file.

Example:

rna_pdb_edit_occupancy_bfactor.py --occupancy --select A:1-40,B:1-22 \
                                 --set-to 0 \
                                 19_Bujnicki_Human_4_rpr_n0-000001.pdb


rna_pdb_edit_occupancy_bfactor.py --occupancy \
                                  --select A:1-2 \
                                  --select-atoms P+C4\' \
                                  --set-to 10 \
                                  -o test_data/3w3s_homologymodel_out.PD
                                  --set-not-selected-to 8
                                  test_data/3w3s_homologymodel.pdb

usage: rna_pdb_edit_occupancy_bfactor.py [-h] (--bfactor | --occupancy)
                                         [--select SELECT] [--set-to SET_TO]
                                         [--set-not-selected-to SET_NOT_SELECTED_TO]
                                         [-o OUTPUT] [--verbose]
                                         [--select-atoms SELECT_ATOMS]
                                         file

Positional Arguments

file file

Named Arguments

--bfactor

set bfactor

Default: False

--occupancy

set occupancy

Default: False

--select get chain, e.g A:1-10, works also for multiple chainse.g A:1-40,B:1-22
--set-to

set value to, default is 1

Default: 1

--set-not-selected-to
 

set value to, default is 0

Default: 0

-o, --output file output
--verbose

be verbose

Default: False

--select-atoms select only given atomscan be only one atom, e.g. Por more, use ‘ for prims, e.g. P+C4’
rna_pdb_tools.utils.rna_pdb_edit_occupancy_bfactor.rna_pdb_edit_occupancy_bfactor.edit_occupancy_of_pdb(txt, pdb, pdb_out, bfactor, occupancy, set_to, set_not_selected_to, select_atoms, v=False)[source]

Change ouccupancy or bfactor of pdb file.

Load the structure, and first set everything to be set_not_selected_to and then set selected to sel_to.

Parameters:
  • txt (str) – A:1-10, selection, what to change
  • pdb (str) – filename to read as an input
  • pdb_out (str) – filename to save an output
  • bfactor (bool) – if edit bfactor
  • occupancy (bool) – if edit occupancy
  • set_to (float) – set to this value, if within selection
  • set_not_selected_to (float) – set to this value, if not within selection
  • select_atoms (str) – P, P+C4’, use + as a separator
  • v (bool) – be verbose
Returns:

if OK, save an output to pdb_out

Return type:

bool

Warning

this function requires BioPython

RNA Alignment

RNAalignment - a module to work with RNA sequence alignments.

To see a full demo what you can do with this util, please take a look at the jupiter notebook (https://github.com/mmagnus/rna-pdb-tools/blob/master/rna_pdb_tools/utils/rna_alignment/rna_alignment.ipynb)

Load an alignment in the Stockholm or fasta format:

import rna_alignment as ra
alignment = ra.fasta2stokholm(alignment.fasta)
alignment = ra.RNAalignment

Parameters of the aligmnent:

print(alignment.describe())

Consensus SS:

print(alignment.ss_cons_with_pk)

Get sequnce/s from teh aligment:

>>> seq = a.io[0]

RNASeq

class rna_pdb_tools.utils.rna_alignment.rna_alignment.RNASeq(id, seq, ss=None)[source]

RNASeq.

Parameters:
  • id (str) – id of a sequence
  • seq (str) – seq, it be uppercased.
  • ss (str) – secondary structure, default None
seq_no_gaps

str – seq.replace(‘-‘, ‘’)

ss_no_gaps

str – ss.replace(‘-‘, ‘’)

draw_ss(title='', verbose=False, resolution=1.5)[source]

Draw secondary structure of RNA with VARNA.

VARNA: Visualization Applet for RNA A Java lightweight component and applet for drawing the RNA secondary structure

_images/varna.png

Cite: VARNA: Interactive drawing and editing of the RNA secondary structure Kevin Darty, Alain Denise and Yann Ponty Bioinformatics, pp. 1974-197,, Vol. 25, no. 15, 2009

http://varna.lri.fr/

get_conserved(consensus, start=0, to_pymol=True, offset=0)[source]

Start UCGGGGUGCCCUUCUGCGUG————————————————–AAGGC-UGAGAAAUACCCGU————————————————-AUCACCUG-AUCUGGAU-AAUGC XXXXXXXXXXXXGXGXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX—————————-XXXXX-XCUGAGAXXXXXXXXXXXXXXXXXXXXXX———————————-XXXXXXXX-XXXXXXXX-ACXUG

get_ss_std()[source]
remove_columns(to_remove)[source]

indexing from 0

remove_gaps(check_bps=True, only_canonical=True, allow_gu=True)[source]

Remove gaps from seq and secondary structure of the seq.

Parameters:
  • check_bps (bool) – fix mistakes as
  • only_canonical (bool) – keep in ss only pairs GC, AU
  • allow_gu (bool) – keep in ss also GU pair
_images/ss_misgap.png

A residue “paired” with a gap.

_images/ss_misgap_wrong.png

paired with any residues (in the blue circle). If yes, then this residues is unpair (in this case ) -> .).

_images/ss_misgap_ok.png

if only_canonical (by default) is True then only GC, AU can be paired.

_images/ss_only_canonical.png

If allow_gu is False (be default is True) then GU pair is also possible.

_images/ss_no_gu.png

If you provide seq and secondary structure such as:

GgCcGGggG.GcggG.cc.u.aAUACAAuACCC.GaAA.GGGGAAUAaggCc.gGCc.gu......CU.......uugugcgGUuUUcaAgCccCCgGcCaCCcuuuu
(((((((((....((.((............(((......)))......))))..(((.(.....................)))).......)))))))))........

gaps will be remove as well.

ss_to_bps()[source]

Convert secondary structure into a list of basepairs.

Returns:a list of base pairs, e.g. [[0, 80], [1, 79], [2, 78], [4, 77], [6, 75], [7, 74], …]
Return type:bps (list)

RNAalignment

class rna_pdb_tools.utils.rna_alignment.rna_alignment.RNAalignment(fn='', fetch='')[source]

RNA alignment - adapter class around BioPython to do RNA alignment stuff

Usage (for more see IPython notebook https://github.com/mmagnus/rna-pdb-tools/blob/master/rna_pdb_tools/utils/rna_alignment/rna_alignment.ipynb)

>>> a = RNAalignment('test_data/RF00167.stockholm.sto')
>>> print(a.tail())
>>> print(a.ss_cons)
Parameters:
  • fn (str) – Filename
  • io (Bio.AlignIO) – AlignIO.read(fn, “stockholm”)
  • lines (list) – List of all lines of fn
  • seqs (list) – List of all sequences as class:RNASeq objects
  • rf (str) – ReFerence annotation, the consensus RNA sequence

Read more:

and on the format itself

Warning

fetch requires urllib3

align_seq(seq)[source]

Align seq to the alignment.

Using self.rf.

Parameters:seq (str) – sequence, e.g. -GGAGAGUA-GAUGAUUCGCGUUAAGUGUGUGUGA-AUGGGAUGUC...
Returns:seq that can be inserted into alignemnt, .-.GG.AGAGUA-GAUGAUUCGCGUUA ! . -> -
Return type:str
copy_ss_cons_to_all(verbose=False)[source]
copy_ss_cons_to_all_editing_sequence(seq_id, before, after)[source]

Change a sequence’s sec structure.

Parameters:
  • seq_id – string, sequence id to change, eg: AE009948.1/1094322-1094400
  • before – string, character to change from, eg: ,
  • after – string, character to change to, eg: .

Warning

before and after has to be one character long

describe()[source]

Describe the alignment.

> print(a.describe()) SingleLetterAlphabet() alignment with 13 rows and 82 columns

find_core(ids=None)[source]

Find common core for ids.

_images/find_core.png

Fig. By core, we understand columns that have all homologous residues. The core is here marked by x.

Parameters:id – list, ids of seq in the alignment to use
find_seq(seq, verbose=False)[source]

Find seq (also subsequences) and reverse in the alignment.

Parameters:
  • seq (str) – seq is upper()
  • verbose (bool) – be verbose
seq = "ggaucgcugaacccgaaaggggcgggggacccagaaauggggcgaaucucuuccgaaaggaagaguaggguuacuccuucgacccgagcccgucagcuaaccucgcaagcguccgaaggagaauc"
hit = a.find_seq(seq, verbose=False)
ggaucgcugaacccgaaaggggcgggggacccagaaauggggcgaaucucuuccgaaaggaagaguaggguuacuccuucgacccgagcccgucagcuaaccucgcaagcguccgaaggagaauc
Match: AL939120.1/174742-174619
ID: AL939120.1/174742-174619
Name: AL939120.1
Description: AL939120.1/174742-174619
Number of features: 0
/start=174742
/end=174619
/accession=AL939120.1
Per letter annotation for: secondary_structure
Seq('CCAGGUAAGUCGCC-G-C--ACCG---------------GUCA-----------...GGA', SingleLetterAlphabet())
GGAUCGCUGAACCCGAAAGGGGCGGGGGACCCAGAAAUGGGGCGAAUCUCUUCCGAAAGGAAGAGUAGGGUUACUCCUUCGACCCGAGCCCGUCAGCUAACCUCGCAAGCGUCCGAAGGAGAAUC
find_seq_exact(seq, verbose=False)[source]

Find seq (also subsequences) and reverse in the alignment.

Parameters:
  • seq – string, seq, seq is upper()
  • verbose – boolean, be verbose or not
format_annotation(t)[source]
get_clean_ss(ss)[source]
get_distances()[source]

Get distances (seq identity) all-vs-all.

With BioPython.

blastn: Bad alphabet 'U' in sequence 'AE008922.1/409481-409568' at position '7' only for DNA?

read more (also about matrix at <http://biopython.org/wiki/Phylo> and HTTP://biopython.org/DIST/docs/api/Bio.Phylo.TreeConstruction.DistanceCalculator-class.html

get_gc_rf()[source]

Return (str) #=GC RF or ‘’ if this line is not in the alignment.

get_seq(seq_id)[source]
get_seq_ss(seq_id)[source]
get_seq_with_name(seq_name)[source]
get_shift_seq_in_align()[source]

RF_cons vs ‘#=GC RF’ ???

get_ss_cons()[source]
Returns:SS_cons_pk line or None if there is now SS_cons_pk.
get_ss_cons_pk()[source]
Returns:SS_cons_pk line or None if there is now SS_cons_pk:
get_ss_remove_gaps(seq, ss)[source]
Parameters:
  • seq – string, sequence
  • ss – string, ss

UAU-AACAUAUAAUUUUGACAAUAUGG-GUCAUAA-GUUUCUACCGGAAUACC–GUAAAUAUUCU—GACUAUG-UAUA- (((.(.((((,,,(((((((_______.))))))).,,,,,,,,(((((((__.._____))))))…),,)))).)))).

get_the_closest_seq_to_ref_seq(verbose=False)[source]

Example:

>>> a = RNAalignment("test_data/RF02221.stockholm.sto")
>>> a.get_the_closest_seq_to_ref_seq()
AF421314.1/431-344
head()[source]
map_seq_on_align(seq_id, resis, v=True)[source]
Parameters:
  • seqid – seq_id, ‘CP000721.1/2204691-2204775’
  • resis – list resis, [5,6]

maps:

[5, 6, 8]
CAC-U
CAC-U-
CAC-U-UA
[4, None, 6]
map_seq_on_seq(seq_id, seq_id_target, resis, v=True)[source]
Parameters:
  • seq_id – seq_id, ‘AAML04000013.1/228868-228953’
  • seq_id_target – seq_id of target, ‘CP000721.1/2204691-2204778’
  • resis – list resis, [5,6]

map:

[4, 5, 6]
UAU-A
UAU-AA
UAU-AAC
[5, 6, 7]
CAC-U
CAC-U-
CAC-U-U
[4, None, 5]
plot(plot_fn='rchie.png')[source]
reload_alignment()[source]
remove_empty_columns(verbose=False)[source]

Remove empty columns in place.

Example:

>>> a = RNAalignment("test_data/zmp.stk")
>>> print(a)
SingleLetterAlphabet() alignment with 6 rows and 319 columns
---ACCUUGCGCGACUGGCGAAUCC-------------------...AAU CP001644.1/756294-756165
--GCUCUCGCGCGACUGGCGACUUUG------------------...GAA CU234118.1/352539-352459
UGAGUUUUCUGCGACUGACGGAUUAU------------------...CUG BAAV01000055.1/2897-2982
GCCCGUUCGCGUGACUGGCGCUAGU-------------------...CGA CP000927.1/5164264-5164343
-----GGGUCGUGACUGGCGAACA--------------------...--- zmp
UCACCCCUGCGUGACUGGCGAUA---------------------...GUU AP009385.1/718103-718202
>>> a.remove_empty_columns()
>>> print(a)
SingleLetterAlphabet() alignment with 6 rows and 138 columns
---ACCUUGCGCGACUGGCGAAUCC-UGAAGCUGCUUUG-AGCG...AAU CP001644.1/756294-756165
--GCUCUCGCGCGACUGGCGACUUUG------------------...GAA CU234118.1/352539-352459
UGAGUUUUCUGCGACUGACGGAUUAU------------------...CUG BAAV01000055.1/2897-2982
GCCCGUUCGCGUGACUGGCGCUAGU-------------------...CGA CP000927.1/5164264-5164343
-----GGGUCGUGACUGGCGAACA--------G-----------...--- zmp
UCACCCCUGCGUGACUGGCGAUA--------GAACCCUCGGGUU...GUU AP009385.1/718103-718202

go over all seq modifes self.nss_cons

ss_cons_std
ss_cons_with_pk

go over ss_cons and overwrite bp is there is pk (ss_cons_pk)

ss_cons: (((.(.((((,,,(((((((_______.))))))).,,,,,,,,(((((((__.._____))))))…),,)))).)))). ss_cons_pk: …………………..[[………………………….]]…………………… ss_cons_with_pk: (((.(.((((,,,(((((((___[[__.))))))).,,,,,,,,(((((((__.._]]__))))))…),,)))).)))).

“return ss_cons_with_pk: string, e.g. (((.(.((((,,,(((((((___[[__.))))

ss_cons_with_pk_std
subset(ids, verbose=False)[source]

Get subset for ids:

# STOCKHOLM 1.0
#=GF WK Tetrahydrofolate_riboswitch

AAQK01002704.1/947-1059 -U-GC-AAAAUAGGUUUCCAUGC.. #=GC SS_cons .(.((.((—-((((((((((… #=GC RF .g.gc.aGAGUAGggugccgugc.. //

tail()[source]
trimmed_rf_and_ss()[source]

Remove from RF and SS gaps.

Returns:trf, tss - new RF and SS
Return type:(str,str)
write(fn, verbose=False)[source]

Write the alignment to a file

Random assignment of nucleotides

random_assignment_of_nucleotides.py - Random assignment of nucleotides for non-typical characters in the sequence alignment (arg –alignfn or fasta file with sequneces (arg –seqfn)

R = G A (purine)
Y = U C (pyrimidine)
K = G U (keto)
M = A C (amino)
S = G C (strong bonds)
W = A U (weak bonds)
B = G U C (all but A)
D = G A U (all but C)
H = A C U (all but G)
V = G C A (all but T)
N = A G C U (any)

author: A. Zyla - azyla

Warning

Tested only on fasta files! and requires Biopython (tested with v1.68)

usage: random_assignment_of_nucleotides [-h] [-v] [--alignfn ALIGNFN]
                                        [--seqfn SEQFN] [--outfn OUTFN]

Named Arguments

-v, --verbose

increase output verbosity

Default: False

--alignfn alignment in the Fasta format
--seqfn sequences in the Fasta format
--outfn output aln file (default: alnfn .fasta -> _out.fasta)

random_assignment_of_nucleotides.py - Random assignment of nucleotides for non-typical characters in the sequence alignment (arg –alignfn or fasta file with sequneces (arg –seqfn)

R = G A (purine)
Y = U C (pyrimidine)
K = G U (keto)
M = A C (amino)
S = G C (strong bonds)
W = A U (weak bonds)
B = G U C (all but A)
D = G A U (all but C)
H = A C U (all but G)
V = G C A (all but T)
N = A G C U (any)

author: A. Zyla - azyla

Warning

Tested only on fasta files! and requires Biopython (tested with v1.68)

rna_pdb_tools.utils.rna_alignment.random_assignment_of_nucleotides.get_align(alignfn)[source]
Get seq from an alignment with gaps.
Args:
alignfn (str): a path to an alignment
Usage::
>>> get_align('test_data/aln1.fasta')
SingleLetterAlphabet() alignment with 2 rows and 13 columns
AGGGGGACAGNYU 1
CYGA------CGG 2
obj1’, description=’obj1’, dbxrefs=[]), id=’<unknown id>’, name=’<unknown name>’, description=’<unknown description>’, dbxrefs=[])
Returns:
alignment
rna_pdb_tools.utils.rna_alignment.random_assignment_of_nucleotides.get_parser()[source]
rna_pdb_tools.utils.rna_alignment.random_assignment_of_nucleotides.get_sequences(seqfn)[source]

Get seq from an fasta file. :param seqfn: a path to a fasta file :type seqfn: str

Usage::
>>> get_align('test_data/fasta.fasta')
Returns:[SeqRecord(seq=Seq(‘GGGYYGCCNRW’, SingleLetterAlphabet()), id=‘1’, name=‘1’, description=‘1’, dbxrefs=[]), SeqRecord(seq=Seq(‘GGRGYYGCCUURWAA’, SingleLetterAlphabet()), id=‘1’, name=‘1’, description=‘1’, dbxrefs=[])]
rna_pdb_tools.utils.rna_alignment.random_assignment_of_nucleotides.write_align(align, outfn)[source]

Write cleaned alignment with Biopython. :param align: a cleaned alignment :type align: obj :param outfn: a path to a new alignment file :type outfn: str

Returns:writes to a file in fasta format
Return type:none
rna_pdb_tools.utils.rna_alignment.random_assignment_of_nucleotides.write_seq(seqfn, outfn)[source]

Write cleaned alignment with Biopython. :param align: a cleaned alignment :type align: obj :param outfn: a path to a new alignment file :type outfn: str

Returns:writes to a file in fasta format
Return type:none

CMAlign

class rna_pdb_tools.utils.rna_alignment.rna_alignment.CMAlign(outputfn=None)[source]

CMAalign class around cmalign (of Inferal).

cmalign - aligns the RNA sequences in <seqfile> to the covariance model (CM) in <cmfile>. The new alignment is output to stdout in Stockholm format.

Example:

cma = ra.CMAlign()
cma.run_cmalign("ade_seq.fa", "RF00167.cm")
seq = cma.get_seq()
print 'cma hit  ', seq
print 'seq      ', a.align_seq(seq)
print 'a.rf     ', a.rf

cmd cmalign -g RF00167.cm ade_seq.fa

# STOCKHOLM 1.0
#=GF AU Infernal 1.1.2

ade          ----------------CGCUUCAUAUAAUCCUAAUGAUAUGGUUUGGGAGUUUCUACCAAGAG-CCUUAAA-CUCUUGAUUAUGAAGUGA------------
#=GR ade PP  ................99*********************************************.*******.***************999............
#=GC SS_cons :::::::::::::::::((((((((,,,<<<<<<<_______>>>>>>>,,,,,,,,<<<<<<<_______>>>>>>>,,))))))))::::::::::::::
#=GC RF      aaaaaauaaaaaaaauucccuCgUAUAAucccgggAAUAUGGcccgggaGUUUCUACCaggcagCCGUAAAcugccuGACUAcGagggaaauuuuuuuuuuu
//
cma hit   ----------------CGCUUCAUAUAAUCCUAAUGAUAUGGUUUGGGAGUUUCUACCAAGAG-CCUUAAA-CUCUUGAUUAUGAAGUGA------------
seq       ----------------CGCU-U-CAUAUAAUCCUAAUGAUAUGG-UUUGGGA-GUUUCUACCAAGAG-CC--UUAAA-CUCUU---GAUUAUG-AAGUGA-------------
a.rf      aaaaaauaaaaaaaauuccc.u.CgUAUAAucccgggAAUAUGG.cccggga.GUUUCUACCaggcagCC..GUAAAcugccu...GACUAcG.agggaaauuuuuuuuuuu.

Install http://eddylab.org/infernal/

Cite: Nawrocki and S. R. Eddy, Infernal 1.1: 100-fold faster RNA homology searches, Bioinformatics 29:2933-2935 (2013).

get_gc_rf()[source]

Get #=GC RF.

Variables:self.output – string
get_seq()[source]
Variables:self.output – output of cmalign, string
run_cmalign(seq, cm, verbose=True)[source]

Run cmalign and process the result.

Parameters:
  • seq – seq string
  • cm – cm fn

Run:

$ cmalign RF01831.cm 4lvv.seq
# STOCKHOLM 1.0
#=GF AU Infernal 1.1.2

4lvv         -GGAGAGUA-GAUGAUUCGCGUUAAGUGUGUGUGA-AUGGGAUGUCG-UCACACAACGAAGC---GAGA---GCGCGGUGAAUCAUU-GCAUCCGCUCCA
#=GR 4lvv PP .********.******************9999998.***********.8999999******8...5555...8**************.************
#=GC SS_cons (((((----(((((((((((,,,,,<<-<<<<<<<<___________>>>>>>>>>>,,,<<<<______>>>>,,,)))))))))))-------)))))
#=GC RF      ggcaGAGUAGggugccgugcGUuAAGUGccggcgggAcGGGgaGUUGcccgccggACGAAgggcaaaauugcccGCGguacggcaccCGCAUcCgCugcc
//

Warning

requires cmalign to be set in your shell

RChie

class rna_pdb_tools.utils.rna_alignment.rna_alignment.RChie[source]

RChie - plotting arc diagrams of RNA secondary structures.

_images/rchie.png

http://www.e-rna.org/r-chie/

The offline version of R-chie, which requires first installing R4RNA is available here, or clone our git repository here How to install it:

  • Ensure R is installed already, or download it freely from http://www.r-project.org/

  • Download the R4RNA (https://github.com/jujubix/r-chie), open R and install the package:

    install.packages("<path_to_file>/R4RNA", repos = NULL, type="source")
    # Install the optparse and RColorBrewer
    install.packages('optparse')
    install.packages('RColorBrewer')
    
  • Go to rna_pdb_tools/rpt_config_local.py and set RCHIE_PATH to the folder with RChie, e.g. "/home/magnus/work/opt/r-chie/".

To test if Rchie works on your machine (from rna_align folder):

<path to your rchie>/rchie.R --msafile test_data/rchie_test_files/fasta.txt test_data/rchie_test_files/helix.txt

you should have rchie.png file in the folder.

More at http://www.e-rna.org/r-chie/download.cgi

Cite: Daniel Lai, Jeff R. Proctor, Jing Yun A. Zhu, and Irmtraud M. Meyer (2012) R-chie: a web server and R package for visualizing RNA secondary structures. Nucleic Acids Research, first published online March 19, 2012. doi:10.1093/nar/gks241

plot_cov(seqs, ss_cons, plot_fn='rchie.png', verbose=False)[source]

Plot an RChie plot_conv.

Parameters:
  • seqs – seqs in the fasta format
  • ss_cons – a string of secondary structure consensus, use only ().. Works with pseuoknots.
show()[source]
write(outfn)[source]

Renumber a pdb file according to alignment

renum_pdb_to_aln.py - renumber a pdb file based on the alignment.

author: A. Zyla under supervision of mmagnus

Warning

works only for single chain! and requires Biopython (tested with v1.68)

usage: renum_pdb_to_aln [-h] [-v] [--residue_index_start RESIDUE_INDEX_START]
                        [--outfn OUTFN]
                        seqid alignfn pdbfn

Positional Arguments

seqid seq id in the alignemnt
alignfn alignemnt in the Fasta format
pdbfn pdb file

Named Arguments

-v, --verbose

increase output verbosity

Default: False

--residue_index_start
 

renumber starting number (default: 1)

Default: 1

--outfn output pdb file (default: pdbfn .pdb -> _out.pdb)

renum_pdb_to_aln.py - renumber a pdb file based on the alignment.

author: A. Zyla under supervision of mmagnus

Warning

works only for single chain! and requires Biopython (tested with v1.68)

rna_pdb_tools.utils.renum_pdb_to_aln.renum_pdb_to_aln.get_parser()[source]
rna_pdb_tools.utils.renum_pdb_to_aln.renum_pdb_to_aln.get_seq(alignfn, seqid)[source]

Get seq from an alignment with gaps.

Parameters:
  • alignfn (str) – a path to an alignment
  • seqid (str) – seq id in an alignment

Usage:

>>> get_seq('test_data/ALN_OBJ1_OBJ2.fa', 'obj1')
SeqRecord(seq=SeqRecord(seq=Seq('GUUCAG-------------------UGAC-', SingleLetterAlphabet()), id='obj1', name='obj1', description='obj1', dbxrefs=[]), id='<unknown id>', name='<unknown name>', description='<unknown description>', dbxrefs=[])
Returns:SeqRecord
rna_pdb_tools.utils.renum_pdb_to_aln.renum_pdb_to_aln.open_pdb(pdbfn)[source]

Open pdb with Biopython.

Parameters:pdbfn (str) – a path to a pdb structure
Returns:with a pdb structure
Return type:PDB Biopython object
rna_pdb_tools.utils.renum_pdb_to_aln.renum_pdb_to_aln.renumber(seq_with_gaps, struc, residue_index_start)[source]

Renumber a pdb file.

Parameters:
  • seq_with_gaps (str) – a target sequence extracted from the alignment
  • struc (pdb) – a structure
  • residue_index_start (int) – starting number
Returns:

BioPython Structure object

rna_pdb_tools.utils.renum_pdb_to_aln.renum_pdb_to_aln.write_struc(struc, outfn)[source]

Write renumbered pdb with Biopython.

Parameters:
  • struc (pdb) – a renumbered structure
  • outfn (str) – a path to a new, renumbered pdb file
Returns:

writes to a file

Return type:

none

RNA clustering with CLANS (clanstix)

rna_clanstix - a tool for visualizing RNA 3D structures based on pairwise structural similarity with Clans.

We hacked Clans thus instead of BLAST-based distances between sequences, you can analyze distances between structures described as p-values of rmsd (based on the method from the Dokholyan lab.)

Running Clans: To run CLANS you need to have Java 1.4 or better installed (java can be downloaded HERE). For full functionality you will also need the NCBI BLAST,PSI-BLAST and formatdb executables (NCBI). For command line parameters and basic help please refer to the README file. (source: http://www.eb.tuebingen.mpg.de/research/departments/protein-evolution/software/clans.html)

_images/yndSrLTb7l.gif

The RMSDs between structures are converted into p-values based on the method from the Dokholyan lab.

Color groups

You can color your groups:

_images/rna_clanstix.png

To get colors, run a cmd like this:

rna_clastix.py rnapz17_matrix_farfar_HelSeedCst.txt --groups 20:seq1+20+20+20+20+20+20:seq10

where with the + sign you separate groups. Each group has to have a number of structures. Optionally it can have a name, e.g., 20:seq1, use : as a separator. If a provided name is native then this group will be shown as starts.

Get inspiration for more colors (http://www.rapidtables.com/web/color/RGB_Color.htm)

How to use ClanstixRNA?

  1. Get a matrix of distances, save it as e.g. matrix.txt (see Comment below)

  2. run ClanstixRNA on this matrix to get an input file to Clans (e.g. clans_rna.txt):

    rna_clanstix.py test_data/matrix.txt > clans_run.txt
    
  3. open CLANS and click File -> Load run and load clans_run.txt

  4. You’re done! :-)

Comment: To get this matrix you can use for example another tool from the rna-pdb-tools packages:

rna_calc_rmsd_all_vs_all.py -i rp18 -o rp18_rmsd.csv
rna_clastix.py --groups 1:native+5:3dRNA+
      5:Chen+3:Dokh+5:Feng+5:LeeASModel+
      5:Lee+5:RNAComposer+10:RW3D+5:Rhiju+
      1:YagoubAli+3:SimRNA  rp18_rmsd.csv | tee clans.in

where rp18 is a folder with structure and rp18_rmsd.csv is a matrix of all-vs-all rmsds.

_images/rp18_clanstix.png

Hajdin, C. E., Ding, F., Dokholyan, N. V, & Weeks, K. M. (2010). On the significance of an RNA tertiary structure prediction. RNA (New York, N.Y.), 16(7), 1340–9. doi:10.1261/rna.1837410

An output of this tool can be viewed using CLANS.

Frickey, T., & Lupas, A. (2004). CLANS: a Java application for visualizing protein families based on pairwise similarity. Bioinformatics (Oxford, England), 20(18), 3702–4. doi:10.1093/bioinformatics/bth444

class rna_pdb_tools.utils.clanstix.rna_clanstix.RNAStructClans(n=10)[source]

Clans run.

Usage:

>>> f = open('matrix.txt')
>>> ids = f.readline().replace('#','').split()
>>> c = RNAStructClans(n=len(ids)) # 200?
>>> c.add_ids(ids)
>>> c.dist_from_matrix(f)
>>> print(c.txt)
add_ids(ids)[source]
dist_from_matrix(lines, matrix=0, use_pv=False)[source]
rna_pdb_tools.utils.clanstix.rna_clanstix.get_parser()[source]

Calculate Root Mean Square Deviation (RMSD)

rna_calc_rmsd

rna_calc_rmsd.py - calculate RMSDs of structures to the target

If you still have problem with various number of atoms, check out this issue: get_rnapuzzle_ready: how to deal with `Alternate location indicator (https://github.com/mmagnus/rna-pdb-tools/issues/30).

The program is using (https://github.com/charnley/rmsd).

Example #1:

$ rna_calc_rmsd.py -t 6_0_solution_4GXY_rpr.pdb --model_selection=A:1-17+24-110+115-168 *.pdb
rmsd_calc_rmsd_to_target
--------------------------------------------------------------------------------
method: all-atom-built-in
# of models: 35
6_0_solution_4GXY_rpr.pdb 0.0 3409
6_Blanchet_1_rpr.pdb 22.31 3409
6_Blanchet_2_rpr.pdb 21.76 3409
6_Blanchet_3_rpr.pdb 21.32 3409
6_Blanchet_4_rpr.pdb 22.22 3409
6_Blanchet_5_rpr.pdb 24.17 3409
6_Blanchet_6_rpr.pdb 23.28 3409
6_Blanchet_7_rpr.pdb 22.26 3409
6_Bujnicki_1_rpr.pdb 36.95 3409
6_Bujnicki_2_rpr.pdb 30.9 3409
6_Bujnicki_3_rpr.pdb 32.1 3409
6_Bujnicki_4_rpr.pdb 32.04 3409
...

Example #2:

time rmsd_calc_to_target.py
  -t 5k7c_clean_onechain_renumber_as_puzzle_srr.pdb
  --target_selection A:1-48+52-63
  --model_selection A:1-48+52-63
  --target_ignore_selection A/57/O2'
  clusters/*_AA.pdb

rmsd_calc_rmsd_to_target
--------------------------------------------------------------------------------
  target_selection:  A:1-48+52-63
  model_selection:   A:1-48+52-63
  target_ignore_selection:  A/57/O2'
  model_ignore_selection:
# of models: 801
fn,rmsd_all
pistol_thrs0.50A_clust01-000001_AA.pdb,7.596
pistol_thrs0.50A_clust02-000001_AA.pdb,7.766
pistol_thrs0.50A_clust03-000001_AA.pdb,18.171
[..]
pistol_thrs0.50A_clust799-000001_AA.pdb,5.356
pistol_thrs0.50A_clust800-000001_AA.pdb,15.282
pistol_thrs0.50A_clust801-000001_AA.pdb,16.339
# of atoms used: 1237
csv was created!  rmsds.csv
rmsd_calc_to_target.py -t 5k7c_clean_onechain_renumber_as_puzzle_srr.pdb
37.93s user 1.07s system 87% cpu 44.650 total

usage: rna_calc_rmsd [-h] [-t TARGET_FN] [--target_selection TARGET_SELECTION]
                     [--target_ignore_selection TARGET_IGNORE_SELECTION]
                     [--model_selection MODEL_SELECTION]
                     [--model_ignore_selection MODEL_IGNORE_SELECTION]
                     [-m METHOD] [-o RMSDS_FN] [-v]
                     files [files ...]

Positional Arguments

files files

Named Arguments

-t, --target_fn
 

pdb file

Default: “”

--target_selection
 

selection, e.g. A:10-16+20, where #16 residue is included

Default: “”

--target_ignore_selection
 

A/10/O2’

Default: “”

--model_selection
 

selection, e.g. A:10-16+20, where #16 residue is included

Default: “”

--model_ignore_selection
 

A/10/O2’

Default: “”

-m, --method

align, fit

Default: “all-atom-built-in”

-o, --rmsds_fn

ouput, matrix

Default: “rmsds.csv”

-v, --verbose

verbose

Default: False

rna_calc_rmsd.py - calculate RMSDs of structures to the target

If you still have problem with various number of atoms, check out this issue: get_rnapuzzle_ready: how to deal with `Alternate location indicator (https://github.com/mmagnus/rna-pdb-tools/issues/30).

The program is using (https://github.com/charnley/rmsd).

Example #1:

$ rna_calc_rmsd.py -t 6_0_solution_4GXY_rpr.pdb --model_selection=A:1-17+24-110+115-168 *.pdb
rmsd_calc_rmsd_to_target
--------------------------------------------------------------------------------
method: all-atom-built-in
# of models: 35
6_0_solution_4GXY_rpr.pdb 0.0 3409
6_Blanchet_1_rpr.pdb 22.31 3409
6_Blanchet_2_rpr.pdb 21.76 3409
6_Blanchet_3_rpr.pdb 21.32 3409
6_Blanchet_4_rpr.pdb 22.22 3409
6_Blanchet_5_rpr.pdb 24.17 3409
6_Blanchet_6_rpr.pdb 23.28 3409
6_Blanchet_7_rpr.pdb 22.26 3409
6_Bujnicki_1_rpr.pdb 36.95 3409
6_Bujnicki_2_rpr.pdb 30.9 3409
6_Bujnicki_3_rpr.pdb 32.1 3409
6_Bujnicki_4_rpr.pdb 32.04 3409
...

Example #2:

time rmsd_calc_to_target.py
  -t 5k7c_clean_onechain_renumber_as_puzzle_srr.pdb
  --target_selection A:1-48+52-63
  --model_selection A:1-48+52-63
  --target_ignore_selection A/57/O2'
  clusters/*_AA.pdb

rmsd_calc_rmsd_to_target
--------------------------------------------------------------------------------
  target_selection:  A:1-48+52-63
  model_selection:   A:1-48+52-63
  target_ignore_selection:  A/57/O2'
  model_ignore_selection:
# of models: 801
fn,rmsd_all
pistol_thrs0.50A_clust01-000001_AA.pdb,7.596
pistol_thrs0.50A_clust02-000001_AA.pdb,7.766
pistol_thrs0.50A_clust03-000001_AA.pdb,18.171
[..]
pistol_thrs0.50A_clust799-000001_AA.pdb,5.356
pistol_thrs0.50A_clust800-000001_AA.pdb,15.282
pistol_thrs0.50A_clust801-000001_AA.pdb,16.339
# of atoms used: 1237
csv was created!  rmsds.csv
rmsd_calc_to_target.py -t 5k7c_clean_onechain_renumber_as_puzzle_srr.pdb
37.93s user 1.07s system 87% cpu 44.650 total
rna_pdb_tools.utils.rna_calc_rmsd.rna_calc_rmsd.calc_rmsd(a, b, target_selection, target_ignore_selection, model_selection, model_ignore_selection, verbose)[source]

a is model b is target

Params:a = filename of structure a
Params:b = filename of structure b
Returns:rmsd, number of atoms
rna_pdb_tools.utils.rna_calc_rmsd.rna_calc_rmsd.calc_rmsd_pymol(pdb1, pdb2, method)[source]

Calculate rmsd using PyMOL. Two methods are available: align and fit

See:

Align can return a list with 7 items:

RMSD after refinement Number of aligned atoms after refinement Number of refinement cycles RMSD before refinement Number of aligned atoms before refinement Raw alignment score Number of residues aligned

in this version of function, the function returns RMSD before refinement.

Install on OSX: brew install homebrew/science/pymol and set PYTHONPATH to your PyMOL packages, .e.g

PYTHONPATH=$PYTHONPATH:/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages

If problem:

Match-Error: unable to open matrix file '/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/data/pymol/matrices/BLOSUM62'.

then define PYMOL_PATH in your .bashrc, e.g.:

export PYMOL_PATH=/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pymol/
rna_pdb_tools.utils.rna_calc_rmsd.rna_calc_rmsd.get_parser()[source]
rna_pdb_tools.utils.rna_calc_rmsd.rna_calc_rmsd.get_rna_models_from_dir(files)[source]
Parameters:models – a list of filenames

Example of the list:

['test_data/rp17/2_restr1_Michal1.pdb_clean.pdb', 'test_data/rp17/2a_nonrestr2_Michal1.pdb_clean.pdb',
'test_data/rp17/3_nonrestr1_Michal1.pdb_clean.pdb', 'test_data/rp17/5_restr1_Michal3.pdb_clean.pdb']
rna_pdb_tools.utils.rna_calc_rmsd.rna_calc_rmsd.sort_nicely(l)[source]

Sort the given list in the way that humans expect.

http://blog.codinghorror.com/sorting-for-humans-natural-sort-order/

rna_calc_evo_rmsd

rna_calc_rmsd_trafl

rmsd_calc_trafl - calculate RMSD of transition A->B based on a SimRNA trajectory

After this script, run:

rna_cal_rmsd_trafl_plot.py rmsd.txt

to get a plot like this:

_images/rmsd_transition.png

Prepare structures:

$ SimRNA -p 17_Das_2_rpr.pdb -n 0 -o 17_Das_2_rpr_n0 # no trafl, trafl will be added
$ SimRNA -p 5k7c_clean_onechain_renumber_as_puzzle_rpr_rmGapped.pdb -n 0 -o 5k7c_clean_onechain_renumber_as_puzzle_rpr_rmGapped
#(struc must be (~CG~) nope. It has to be a trajectory!)

and run:

 $ rmsd_calc_trafl.py 17_Das_2_rpr.pdb.trafl 17_Das_2_rpr_n0.trafl 5k7c_clean_onechain_renumber_as_puzzle_rpr_rmGapped_n0.trafl rp17_rmsd.txt
 > calc_rmsd_to_1st_frame
   /Users/magnus/work/opt/simrna/SimRNA_64bitIntel_MacOSX_staticLibs/calc_rmsd_to_1st_frame 17_Das_2_rpr.pdb.trafl 17_Das_2_rpr.pdb_rmsd_e
 < rmsd_out: 17_Das_2_rpr.pdb_rmsd_e
 > struc: 17_Das_2_rpr_n0.trafl 2
 > trafl: 17_Das_2_rpr.pdb.trafl 48
 % saved: 17_Das_2_rpr.pdb.trafl_17_Das_2_rpr_n0.trafl
 > calc_rmsd_to_1st_frame
   /Users/magnus/work/opt/simrna/SimRNA_64bitIntel_MacOSX_staticLibs/calc_rmsd_to_1st_frame 17_Das_2_rpr.pdb.trafl_17_Das_2_rpr_n0.trafl 17_Das_2_rpr.pdb_rmsd_e_17_Das_2_rpr_n0_rmsd_e
 < rmsd_out: 17_Das_2_rpr.pdb_rmsd_e_17_Das_2_rpr_n0_rmsd_e
 > struc: 5k7c_clean_onechain_renumber_as_puzzle_rpr_rmGapped_n0.trafl 2
 > trafl: 17_Das_2_rpr.pdb.trafl 48
 % saved: 17_Das_2_rpr.pdb.trafl_5k7c_clean_onechain_renumber_as_puzzle_rpr_rmGapped_n0.trafl
 > calc_rmsd_to_1st_frame
   /Users/magnus/work/opt/simrna/SimRNA_64bitIntel_MacOSX_staticLibs/calc_rmsd_to_1st_frame 17_Das_2_rpr.pdb.trafl_5k7c_clean_onechain_renumber_as_puzzle_rpr_rmGapped_n0.trafl 17_Das_2_rpr.pdb_rmsd_e_5k7c_clean_onechain_renumber_as_puzzle_rpr_rmGapped_n0_rmsd_e
 < rmsd_out: 17_Das_2_rpr.pdb_rmsd_e_5k7c_clean_onechain_renumber_as_puzzle_rpr_rmGapped_n0_rmsd_e
  0.000 -695.634
  0.000 -551.093
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
 < out: rp17_rmsd.txt

Warning

calc_rmsd_to_1st_frame (SimRNA) is required and the path to the binary file is defined in config_local.

usage: rna_calc_evo_rmsd [-h] trafl struc1 struc2 rmsds_fn

Positional Arguments

trafl trafil
struc1 structure A
struc2 structure B
rmsds_fn output file

rna_calc_rmsd_all_vs_all

rna_calc_rmsd_all_vs_all.py - calculate RMSDs all vs all.

Examples:

rna_calc_rmsd_all_vs_all.py -i test_data -o test_output/rmsd_calc_dir.tsv
 # of models: 4
... 1 test_data/struc1.pdb
... 2 test_data/struc2.pdb
... 3 test_data/struc3.pdb
... 4 test_data/struc4.pdb

The program is using (https://github.com/charnley/rmsd)

usage: rna_calc_rmsd_all_vs_all [-h] [-i INPUT_DIR] [-o MATRIX_FN]

Named Arguments

-i, --input_dir
 

input folder with structures

Default: “”

-o, --matrix_fn
 

ouput, matrix

Default: “matrix.txt”

rna_calc_rmsd_all_vs_all.py - calculate RMSDs all vs all.

Examples:

rna_calc_rmsd_all_vs_all.py -i test_data -o test_output/rmsd_calc_dir.tsv
 # of models: 4
... 1 test_data/struc1.pdb
... 2 test_data/struc2.pdb
... 3 test_data/struc3.pdb
... 4 test_data/struc4.pdb

The program is using (https://github.com/charnley/rmsd)

rna_pdb_tools.utils.rna_calc_rmsd.rna_calc_rmsd_all_vs_all.calc_rmsd(a, b)[source]

Calc rmsd.

rna_pdb_tools.utils.rna_calc_rmsd.rna_calc_rmsd_all_vs_all.get_parser()[source]
rna_pdb_tools.utils.rna_calc_rmsd.rna_calc_rmsd_all_vs_all.get_rna_models_from_dir(directory)[source]
rna_pdb_tools.utils.rna_calc_rmsd.rna_calc_rmsd_all_vs_all.sort_nicely(l)[source]

Sort the given list in the way that humans expect. http://blog.codinghorror.com/sorting-for-humans-natural-sort-order/

Calculate Interaction Network Fidelity (INF) and not only

rna_calc_inf

usage: rna_calc_inf [-h] [-t TARGET_FN] [-m NT] [-s SS] [-f] [-v] [-o OUT_FN]
                    files [files ...]

Positional Arguments

files files, .e.g folder_with_pdbs/*pdbs

Named Arguments

-t, --target_fn
 

pdb file

Default: “”

-m, --number_of_threads
 

number of threads used for multiprocessing, if 1 then mp is not used (useful for debugging)!

Default: 8

-s, --ss

A:(([[))]]

Default: “”

-f, --force

force to run ClaRNA even if <pdb>.outCR file is there

Default: False

-v, --verbose

be verbose, tell me more what’re doing

Default: False

-o, --out_fn

out csv file, be default inf.csv

Default: “inf.csv”

A tool to calc inf_all, inf_stack, inf_WC, inf_nWC, SNS_WC, PPV_WC, SNS_nWC, PPV_nWC between two structures.

Mind, that ClaRNA is pretty slow, it takes even a few seconds to analyze a structure, so for, say, 1000 models you need a few hours.

How to make it faster? First, you can use --number_of_threads to specify the number of cores used for multiprocessing.

Second, the procedure implemented in here is composed of two steps, first for each structure ClaRNA is used to generate an output with contacts, then these files are used for comparisons. So, if you want to re-run your analysis, you don’t have to run re-run ClaRNA itself. Thus, be default ClaRNA is not executed if <model>.outCR is found next to the analyzed files. To change this behavior force (--force) rna_cal_inf.py to re-run ClaRNA.

ClaRNA_play required! https://gitlab.genesilico.pl/RNA/ClaRNA_play (internal GS gitlab server). Contact <magnus@genesilico.pl>.

import progressbar (in version 2) is required!

rna_pdb_tools.utils.rna_calc_inf.rna_calc_inf.do_job(i)[source]

Run ClaRNA & Compare, add 1 to the counter, write output to csv file (keeping it locked)

rna_pdb_tools.utils.rna_calc_inf.rna_calc_inf.get_parser()[source]

rna_calc_dinf

Obtain a list of interaction in an RNA molecule where “Interaction” is purely distance based (defined by –cuf-off). Later, you can use it to calculate distance based INF, dINF ;-).

Example:

[mm] rna_calc_inf$ git:(master)$ ./rna_calc_dinf.py  test_output/1Y26.pdb
X    13   X   14          bp   G   C                  WW_cis   1
X    13   X   83          bp   G   C                  WW_cis   1
X    13   X   82          bp   U   C                  WW_cis   1
X    14   X   15          bp   C   G                  WW_cis   1
X    14   X   83          bp   G   G                  WW_cis   1
X    14   X   81          bp   G   G                  WW_cis   1
X    14   X   82          bp   U   G                  WW_cis   1

use clarna_compare.py:

[mm] rna_calc_inf$ ./rna_calc_dinf.py  test_output/1Y26.pdb > 1Y26.pdb.outCR
[mm] rna_calc_inf$ clarna_compare.py -iref 1Y26.pdb.outCR -ichk 1Y26.pdb.outCR
1Y26.pdb.outCR                               1Y26.pdb.outCR      1.000      0.000      1.000      1.000      1.000      1.000      1.000      1.000

You can use -d to get a list of all interacting bases, something like:

draw_dists([(13, 14),(13, 83),...(82, 83)])

so you can plot all interacting bases:

_images/1y26_dinf.png

Mind, that draw_dists works on C2 atoms, that might be different from atoms detected with the program (e.g. different base atom could be detected to make an interaction).

_images/1y26_dinf2.png

usage: rna_calc_inf [-h] [-d] [-c] [-v] file

Positional Arguments

file a PDB file

Named Arguments

-d, --draw-dists
 Default: False
-c, --cut-off Default: 5
-v, --verbose

be verbose

Default: False

Measure distance between atoms

diffpdb

diffpdb - a simple tool to compare text-content of PDB files

The method is quick-and-dirty, but works!

The script takes first 31 characters of lines (or only atom names and residue names) starting with HETATM or ATOM and save these lines to a <filename>.out file.

One file is created per pdb. In the final step DIFF_TOOL is executed on these two output files. You get a diff output. That’s it! Enjoy!

Configuration:

  • DIFF_TOOL="open -a diffmerge" or DIFF_TOOL="kompare" to set up what tool would you like to use to diff files in the file rna-pdb-tools/utils/diffpdb/diffpdb_conf.py (create it if needed)
_images/screenshot.png

./diffpdb.py --names test_data/4/1duq.pdb test_data/4/1duq_decoy0171_amb_clx.pdb

_images/screenshot2.png

and on the Mac (using diffmerge):

_images/diffpdb_osx_diffmerge.png

One of the difference that can be detected with the script is variants of atoms.

_images/atom-variants.png

or a detection of missing atom.

_images/missing-atoms.png

or a detection of missing OP3 at the beginning.

_images/missing-op3.png

diffpdb - a simple tool to compare text-content of PDB files

The method is quick-and-dirty, but works!

The script takes first 31 characters of lines (or only atom names and residue names) starting with HETATM or ATOM and save these lines to a <filename>.out file.

One file is created per pdb. In the final step DIFF_TOOL is executed on these two output files. You get a diff output. That’s it! Enjoy!

Configuration:

  • DIFF_TOOL="open -a diffmerge" or DIFF_TOOL="kompare" to set up what tool would you like to use to diff files in the file rna-pdb-tools/utils/diffpdb/diffpdb_conf.py (create it if needed)
_images/screenshot.png

./diffpdb.py --names test_data/4/1duq.pdb test_data/4/1duq_decoy0171_amb_clx.pdb

_images/screenshot2.png

and on the Mac (using diffmerge):

_images/diffpdb_osx_diffmerge.png

One of the difference that can be detected with the script is variants of atoms.

_images/atom-variants.png

or a detection of missing atom.

_images/missing-atoms.png

or a detection of missing OP3 at the beginning.

usage: rna_helix_vis [-h] [--names] [--names_and_resi] [--htmlout]
                     [--method METHOD]
                     f1 f2

Positional Arguments

f1 file
f2 file

Named Arguments

--names

take only atom residues names

Default: False

--names_and_resi
 

take only atom residues names

Default: False

--htmlout

take only atom residues names

Default: False

--method method e.g. diff

RNA filter

rna_filter.py - calculate distances based on given restrants on PDB files or SimRNA trajectories

rna_filter.py - calculate distances based on given restrants on PDB files or SimRNA trajectories.

Changes: weight is always 1 (at least for now). ,>,=,>=,<= .

[PREVIOUS DOCUMENTATION - TO BE REMOVED]

rna_filter.py -s 4gxy_rpr.pdb -r rp06_MohPairs.rfrestrs d:A5-A42 100.0 measured: 26.7465763417 [x] d:A11-A26 100.0 measured: 19.2863696104 [x]

[mm] rp06$ git:(master) $ rna_filter.py -s 4gxy_rpr.pdb -r rp06_MohPairs.rfrestrs
d:A5-A42 100.0 measured: 26.7465763417 [x] d:A11-A26 100.0 measured: 19.2863696104 [x]
Traceback (most recent call last):
File “/home/magnus/work-src/rna-pdb-tools/bin/rna_filter.py”, line 270, in <module>
calc_scores_for_pdbs(args.structures, restraints, args.verbose)
File “/home/magnus/work-src/rna-pdb-tools/bin/rna_filter.py”, line 221, in calc_scores_for_pdbs
dist = get_distance(residues[h[0]][‘mb’], residues[h[1]][‘mb’])

KeyError: ‘A24’

correct, there is no A24 in this structure:

The format of restraints:

(d:A1-A2 < 10.0 1) = if distance between A1 and A2 lower than 10.0, score it with 1

Usage:

$ python rna_filter.py -r test_data/restraints.txt -s test_data/CG.pdb
 d:A1-A2 10.0 measured: 6.58677550096 [x]
test_data/CG.pdb 1.0 1 out of 1

# $ python rna_filter.py -r test_data/restraints.txt -t test_data/CG.trafl
(d:A1-A2 <  10.0  1)|(d:A2-A1 <= 10 1)
 restraints [('A1', 'A2', '<', '10.0', '1'), ('A2', 'A1', '<=', '10', '1')]

Frame #1 e:1252.26
  mb for A1 [ 54.729   28.9375  41.421 ]
  mb for A2 [ 55.3425  35.3605  42.7455]
   d:A1-A2 6.58677550096
  mb for A2 [ 55.3425  35.3605  42.7455]
  mb for A1 [ 54.729   28.9375  41.421 ]
   d:A2-A1 6.58677550096
# this ^ is off right now

usage: rna_filter.py [-h] -r RESTRAINTS_FN [-v]
                     [-s STRUCTURES [STRUCTURES ...]] [--offset OFFSET]
                     [-t TRAJECTORY]

Named Arguments

-r, --restraints_fn
 restraints_fn: Format: (d:A9-A41 < 10.0 1)|(d:A41-A9 <= 10 1)
-v, --verbose

be verbose

Default: False

-s structures
--offset

use offset to adjust your restraints to numbering in PDB files, ade (1y26)pdb starts with 13, so offset is -12)

Default: 0

-t SimRNA trajectory

rna_dca_mapping.py

show_dists - show distances in PyMOL

show_dists - show distances in PyMOL

_images/getdists.png

Usage:

PyMOL>show_dists([[1,2]])
1, 2, 3.41

rna_ex2x.py - analyze an evolutionary coupling file.

rna_ex2x.py - analyze an evolutionary coupling file.

Files can be downloaded from https://marks.hms.harvard.edu/ev_rna/, e.g. RF00167.EC.interaction.csv

--pairs:

$ rna_ex2x.py RF00167.EC.interaction_LbyN.csv --pairs
[18, 78],[31, 39],[21, 75],[30, 40],[28, 42],[27, 43],[59, 67],[54, 72],[57, 69],[25, 45],[29, 41],[17, 79],[26, 44],[16, 80],[14, 82],[19, 77],[55, 71],[
    15, 81],[34, 63],[56, 70],[58, 68],[35, 63],[26, 45],[35, 64],[32, 39],[54, 73],[24, 74],[16, 82],[24, 45],[24, 43],[32, 36],[25, 48],[48, 82],[36, 48],

usage: rna_ec2x.py [-h] [--sep SEP] [--chain CHAIN] [--ec-pairs]
                   [--ss-pairs SS_PAIRS] [--pairs-delta]
                   interaction_fn

Positional Arguments

interaction_fn interaction file

Named Arguments

--sep

separator

Default: “,”

--chain

chain

Default: “A”

--ec-pairs Default: False
--ss-pairs file with secondary structure base pairs
--pairs-delta

delta: ec-bp - ss-paris

Default: False

rna_pairs2SimRNArestrs.py - convert pairs to SimRNA restraints

rna_pairs2SimRNArestrs.py - convert pairs to SimRNA restraints

Example:

$ rna_pairs2SimRNArestrs.py rp06_pairs_delta.txt -v
# of pairs: 42
SLOPE A/2/MB A/172/MB 0 6 1
SLOPE A/2/MB A/172/MB 0 7 -1
SLOPE A/3/MB A/169/MB 0 6 1
SLOPE A/3/MB A/169/MB 0 7 -1
SLOPE A/12/MB A/32/MB 0 6 1

usage: rna_pairs2SimRNArestrs.py [-h] [--offset OFFSET] [--weight WEIGHT] [-v]
                                 pairs

Positional Arguments

pairs a file with [[2, 172], [3, 169], [12, 32], [13, 31]]

Named Arguments

--offset

can be -10

Default: 0

--weight

weight

Default: 3

-v, --verbose

be verbose

Default: False

rna_ss_get_bps.py - get a list of base pairs for a given “fasta ss” file.

rna_ss_get_bps.py - get a list of base pairs for a given “fasta ss” file.

Input file:

cat ade_ss.fa
>1y26
CGCUUCAUAUAAUCCUAAUGAUAUGGUUUGGGAGUUUCUACCAAGAGCCUUAAACUCUUGAUUAUGAAGUG
(((((((((...((((((.........))))))........((((((.......))))))..)))))))))%

Usage:

$ rna_ss_get_bps.py ade_ss.fa --offset 12
[[13, 83], [14, 82], [15, 81], [16, 80], [17, 79], [18, 78], [19, 77], [20, 76], [21, 75], [25, 45], [26, 44], [27, 43], [28, 42], [29, 41], [30, 40], [54, 72], [55, 71], [56, 70], [57, 69], [58, 68], [59, 67]]

Now it also work with pseudoknots.

usage: rna_ss_get_bps [-h] [--offset OFFSET] [-v] file

Positional Arguments

file file in the Fasta format

Named Arguments

--offset offset
-v, --verbose be verbose

rna_pairs_diff.py - get a diff of pairs

rna_pairs_diff.py - get a diff of pairs

Usage:

$ rna_pairs_diff.py pistol_dca_all.bp pistol.bp
# of ec_paris: 31
# of ssbps   : 18
dalta#       : 13
[[4, 32], [6, 9], [6, 36], [6, 39], [9, 39], [13, 32], [16, 17], [17, 18], [22, 49], [29, 58]]

usage: rna_pairs_diff.py [-h] [-v] pairs1 pairs2

Positional Arguments

pairs1 a list of pairs, A
pairs2 a list of pairs to subtract, A-B, results in C(all pairs that are in A and are not in B

Named Arguments

-v, --verbose be verbose

Secondary structure format conversion

rna_convert_pseudoknot_formats

Run this as:

python rna-pk-simrna-to-one-line.py test_data/simrna.ss

Convert:

> a
....((.(..(((....)))..((((.(.........).)))....).)).......((((......))))
..............................((((...................))))..............

to:

> a
....((.(..(((....)))..((((.(..[[[[...).)))....).))...]]]]((((......))))

and:

>2 chains
(((((......)))))........(.((....(.......)..(((. .)))...)).)
.....((((((......................))))))........ ...........

to:

>2 chains
((((([[[[[[)))))........(.((....(]]]]]].)..(((. .)))...)).)

and:

> b
..(.......(((....)))..((((.(.........).))))).............((((......))))
....((.(......................................).)).....................
..............................((((...................))))..............

to:

> b
..(.[[.[..(((....)))..((((.(..{{{{...).)))))..].]]...}}}}((((......))))

and it works with VARNA:

_images/varna_2pk.png

Convert a secondary structure with a pk to the SimRNA format:

rna_convert_pseudoknot_formats git:(master) ✗ python rna_ss_pk_to_simrna.py test_data/ss_with_pk.ss
((((([[[[[[)))))........(.((....(]]]]]].)..(((. .)))...)).)

(((((......)))))........(.((....(.......)..(((. .)))...)).)
.....((((((......................))))))........ ...........
rna_pdb_tools.utils.rna_convert_pseudoknot_formats.rna_ss_pk_to_simrna.get_multiple_lines(ss)[source]
rna_pdb_tools.utils.rna_convert_pseudoknot_formats.rna_ss_pk_to_simrna.get_parser()[source]
rna_pdb_tools.utils.rna_convert_pseudoknot_formats.rna_ss_pk_to_simrna.is_pk(ss)[source]

x3DNA (contacts classification & secondary structure detection)

Python parser to 3dna <http://x3dna.org/>.

Installation:

# install the code from http://forum.x3dna.org/downloads/3dna-download/
Create a copy of the rna_x3dna_config_local_sample.py (remove "_sample") present in rna-pdb-tools/rna_pdb_tools/utils/rna_x3dna folder.
Edit this line :
BINARY_PATH = <path to your x3dna-dssr file>
matching the path with the path of your x3dna-dssr file.
e.g. in my case: BINARY_PATH = ~/bin/x3dna-dssr.bin

For one structure you can run this script as:

[mm] py3dna$ git:(master) ✗ ./rna_x3dna.py test_data/1xjr.pdb
test_data/1xjr.pdb
>1xjr nts=47 [1xjr] -- secondary structure derived by DSSR
gGAGUUCACCGAGGCCACGCGGAGUACGAUCGAGGGUACAGUGAAUU
..(((((((...((((.((((.....))..))..))).).)))))))

For multiple structures in the folder, run the script like this:

[mm] py3dna$ git:(master) ✗ ./rna_x3dna.py test_data/*
test_data/1xjr.pdb
>1xjr nts=47 [1xjr] -- secondary structure derived by DSSR
gGAGUUCACCGAGGCCACGCGGAGUACGAUCGAGGGUACAGUGAAUU
..(((((((...((((.((((.....))..))..))).).)))))))
test_data/6TNA.pdb
>6TNA nts=76 [6TNA] -- secondary structure derived by DSSR
GCGGAUUUAgCUCAGuuGGGAGAGCgCCAGAcUgAAgAPcUGGAGgUCcUGUGtPCGaUCCACAGAAUUCGCACCA
(((((((..((((.....[..)))).((((.........)))).....(((((..]....))))))))))))....
test_data/rp2_bujnicki_1_rpr.pdb
>rp2_bujnicki_1_rpr nts=100 [rp2_bujnicki_1_rpr] -- secondary structure derived by DSSR
CCGGAGGAACUACUG&CCGGCAGCCU&CCGGAGGAACUACUG&CCGGCAGCCU&CCGGAGGAACUACUG&CCGGCAGCCU&CCGGAGGAACUACUG&CCGGCAGCCU
[[[[(((.....(((&{{{{))))))&(((((((.....(.(&]]]]).))))&[[[[[[......[[[&))))]]].]]&}}}}(((.....(((&]]]]))))))
rna_pdb_tools.utils.rna_x3dna.rna_x3dna.get_parser()[source]
class rna_pdb_tools.utils.rna_x3dna.rna_x3dna.x3DNA(pdbfn)[source]

Atributes:

curr_fn report
clean_up(verbose=False)[source]
get_ion_water_report()[source]

@todo File name: /tmp/tmp0pdNHS

no. of DNA/RNA chains: 0 [] no. of nucleotides: 174 no. of waters: 793 no. of metals: 33 [Na=29, Mg=1, K=3]
get_modifications()[source]

Run find_pair to find modifications.

get_secstruc()[source]

Get secondary structure.

get_seq()[source]

Get sequence.

Somehow 1bzt_1 x3dna UCAGACUUUUAAPCUGA, what is P? P -> u

run_x3dna()[source]
exception rna_pdb_tools.utils.rna_x3dna.rna_x3dna.x3DNAMissingFile[source]

ClaRNA (contacts classification)

If you want to calculate “Interaction Network Fidelity (INF) and not only” see rna_calc_inf in the Utils.

usage:

./clarna_app.py ../../input/5k7c_clean_onechain_renumber_as_puzzle_srr.pdb
../../input/5k7c_clean_onechain_renumber_as_puzzle_srr.pdb
((((([[[[[[)))))........(.((....(]]]]]].)..(((......)))...)).)

Example

from rna_pdb_tools.utils.clarna_app import clarna_app
if __name__ == '__main__':
    ss = '((((.[[[[[[.))))........((((.....]]]]]]...(((((....)))))..))))'
    fnCRref = clarna_app.get_ClaRNA_output_from_dot_bracket(ss)
    f = '../rna_calc_rmsd/test_data/pistol/5k7c_clean_onechain_renumber_as_puzzle_srr.pdb'
    fnCR = clarna_app.clarna_run(f, force=False)
    results = clarna_app.clarna_compare(fnCRref, fnCR)
    print results #
    #tmp_Z42i_..pdb.outCR     5k7c_clean_onechain_renumber_as_puzzle_srr.pdb.outCR      0.706      NA         0.865      NA         0.842      0.889      NA         0.000

Warning

Setup a bash variable: ClaRNA_play_path, and add ClaRNA_play to your $PATH (install ClaRNA_play https://gitlab.genesilico.pl/RNA/ClaRNA_play (internal GS gitlab server)

rna_pdb_tools.utils.clarna_app.clarna_app.clarna_compare(target_cl_fn, i_cl_fn, verbose=False)[source]

Run ClaRNA compare.

Returns:a list target, fn, scores

Scores:

inf_all      0.706
inf_stack -999.999 -> NA
inf_WC       0.865
inf_nWC   -999.999 -> NA
SNS_WC       0.842
PPV_WC       0.889
SNS_nWC     NA
PPV_nWC      0.000

Example of the list:

5k7c_clean_onechain_renumber_as_puzzle_srr.pdb     pistol_thrs0.50A_clust01-000001_AA.pdb
 0.642      NA         0.874      0.000      0.944      0.810      0.000      0.000s

use results.split()[4] to get inf_WC

rna_pdb_tools.utils.clarna_app.clarna_app.clarna_run(fn, force=True)[source]

Run ClaRNA run

fn: str
filename to analyze
Returns:a filename to ClaRNA output (fn + ‘.outCR’)
rna_pdb_tools.utils.clarna_app.clarna_app.get_ClaRNA_output_from_dot_bracket(ss, temp=True, verbose=False)[source]

Get dummy ClaRNA output out of dat bracket secondary structure (ss)

Parameters:ss (string) – secondary structure
Returns:a filename to ClaRNA output
rna_pdb_tools.utils.clarna_app.clarna_app.get_dot_bracket_from_ClaRNAoutput(inCR, verbose=False)[source]

In inCR file

rna_pdb_tools.utils.clarna_app.clarna_app.get_parser()[source]

SimRNA

Select low energy frames

rna_simrna_lowest.py

Select lowest energy frames out of a SimRNA trajectory file. This code uses heavily the SimRNATrajectory class. Be default 100 lowest energy frames is exported.

usage: rna_simrna_lowest.py [-h] [-n NSTRUC] trafl

Positional Arguments

trafl SimRNA trafl file

Named Arguments

-n, --nstruc

SimRNA trafl file

Default: 100

Extract

rna_simrna_extra.py - extract full atom structures from a SimRNA trajectory file.

Options:

SIMRNA_DATA_PATH has to be properly defined in rpt_config_local.

usage: rna_simrna_extract.py [-h] -t TEMPLATE -f TRAFL [-c]
                             [-n NUMBER_OF_STRUCTURES]

Named Arguments

-t, --template template PDB file used for reconstruction to full atom models
-f, --trafl SimRNA trafl file
-c, --cleanup

Keep only *_AA.pdb files, move *.ss_detected and *.pdbto _<traj name folder>

Default: False

-n, --number_of_structures
 Default: 100

SimRNAweb

Download files of a SimRNAweb run

SimRNATrajectory

SimRNATrajectory module.

SimRNATrajectory / Frame / Residue / Atom

class rna_pdb_tools.utils.simrna_trajectory.simrna_trajectory.Atom(name, x, y, z)[source]

x y z coord

get_coord()[source]

Return coords (np.array).

class rna_pdb_tools.utils.simrna_trajectory.simrna_trajectory.Frame(id, header, coords, top_level=False)[source]
Syntax of header:
  • write_number
  • replica_id
  • total_energy
  • energy_without_restraints
  • temperature

Warning

If there is an invalid frame, please use repair_trafl.py to fix the trajectory first.

class rna_pdb_tools.utils.simrna_trajectory.simrna_trajectory.Residue(id, p, c4p, n1n9, b1, b2)[source]

Create Residue object.

Each residue in SimRNA coarse-grained represantation consists only 5 coarse-grained atoms:

  • backbone: p = phospate group, c4p = sugar moiety
  • nucleotide: n1n9 = N1 for pyrimidines, N9 for purines, b1 = C2 for purines and pyrimidines, b2 = C4 for pyrimidines, C6 for purines
get_atoms()[source]

Return all atoms

get_center()[source]

Return MB for residue `((self.n1n9 + self.b2) / 2)`

class rna_pdb_tools.utils.simrna_trajectory.simrna_trajectory.SimRNATrajectory[source]
load_from_file(fn, debug_break=False, top_level=False, only_first_frame=False)[source]

Create a trajectory based on give filename.

Parameters:top_level

top_level = True, don’t make a huge tree of objects (Residues/Atoms) == amazing speed up! Useful if you need only frames, energies and coords as text. You only get the info that is in header of each frame.

top_level = False, makes huge tree of objects (Residues/Atoms) == very slow for a huge trajectories

Warning

Loads up whole trafl file into memory, and get stuck. Use this if you want to compute e.g. distances between atoms, get the positions of specified atoms etc. If you can not process your trajectory

use top_level=True or look at load_from_string() to load a frame by frame from a file.

h(eader), l(line), f(ile)

load_from_list(frames)[source]
load_from_string(c, txt)[source]

Create a trajectory based on given string (txt) with id given by c.

Faster method, loads only one frame at a time to memory, and after computations loads the next frame (memory efficient).

plot_energy(plotfn='plot.png')[source]

Plots the SimRNA energy of the trajetory.

_images/simrnatrajectory.png
save(fn, verbose=True)[source]

Save the trajectory to file.

sort()[source]

Sort frames within the trajectory according to energy.

RNAkb

RNAkb (previous Gromacs) utils.

A module with different functions needed for Gromacs/RNAkb merriage.

Marcin Magnus Albert Bogdanowicz

  1. prepare groups and then (2) mdp score file.
rna_pdb_tools.utils.rnakb_utils.rnakb_utils.fix_gromacs_gro(path, verbose=False)[source]

It’s probably a bug in GROMACS, but box coordinates in gro files are not always separated by spaces. This function guesses how it should be separated and inserts spaces.

Parameters:path = path to gro file (*) –
Output:
  • file is overwritten with a corrected one
rna_pdb_tools.utils.rnakb_utils.rnakb_utils.fix_gromacs_ndx(path)[source]

Sometimes, GROMACS index has some atoms in more than one group, or doesn’t have all the groups grompp requires. This function fixes that.

Parameters:path = path to index file (*) –
Output:
  • index is overwritten with a corrected one
rna_pdb_tools.utils.rnakb_utils.rnakb_utils.format_score_mdp(mdp_out, energygrps, seq, verbose=False)[source]

Get a template score mdp and replace energygrps (it can be generated with prepare_groups) and energygrp_table

rna_pdb_tools.utils.rnakb_utils.rnakb_utils.get_res_code(line)[source]

Get residue code from a line of a PDB file

rna_pdb_tools.utils.rnakb_utils.rnakb_utils.get_res_num(line)[source]

Extract residue number from a line of PDB file

Parameters:line = ATOM line from a PDB file (*) –
Output:
  • residue number as an integer
rna_pdb_tools.utils.rnakb_utils.rnakb_utils.make_rna_gromacs_ready(pdb_string, verbose=False)[source]

GROMACS has some special requirements for PDB files.

Parameters:pdb_string = contents of PDB file as a string (*) –
Output:
  • new PDB returned as a string

(!!!) # hmm… [ RA5 ] will not be detected based on it (!?) Hmm.. becase it detects if the structure is already prepared.

rna_pdb_tools.utils.rnakb_utils.rnakb_utils.make_rna_rnakb_ready(pdb_string, verbose=False)[source]

RNAkb read (difference between this function and make_rna_gromacs_ready is ignoring R5U etc. RNAkb does not treat them differently so there is no point to distinguish them.

Parameters:pdb_string = contents of PDB file as a string (*) –
Output:
  • new PDB returned as a string
rna_pdb_tools.utils.rnakb_utils.rnakb_utils.prepare_groups(fn, gr_fn, potential='aa', verbose=False)[source]

Prepare an index for fn file. gr_fn is a file where gtxt is saved in.

Get seq and uniq & sort it. ['RG5', 'RA', 'RA', 'RA', 'RG', 'RU', 'RA', 'RA', 'RC3'] set(['RU', 'RG', 'RC3', 'RG5', 'RA'])

@todo RG5 etc – done!

gtxt:

del 1
r RU & a C1'
name 1 uC1s
r RU & a C2
name 2 uC2
r RU & a C2'
name 3 uC2s
...

return, gtxt (groups_txt), energygrps . The result is saved to g_fn. energygrps: [‘uP’, ‘uC4s’, ‘uC2’, ‘uC4’, ‘uC6’, ‘gP’, ‘gC4s’, ‘gC2’, ‘gC4’, ‘gC6’, ‘aP’, ‘aC4s’, ‘aC2’, ‘aC4’, ‘aC6’] gtxt: RA del 1 r RU & a P name 1 uP r RU & a C4s name 2 uC4s r RU & a C2 name 3 uC2 r RU & a C4 [..] r RA & a C6 name 15 aC6 1|2|3|4|5|6|7|8|9|10|11|12|13|14|15 name 16 RNA_5pt 0 & ! 16 name 17 other q

rna_pdb_tools.utils.rnakb_utils.rnakb_utils.set_res_code(line, code)[source]

Set residue code from a line of a PDB file

RNA Refinement (QRNAS)

rna_refinement - RNA refinement with QRNAS.

Models of RNA 3D structures obtained by modeling methods often suffer from local inaccuracies such as clashes or physically improbable bond lengths, backbone conformations, or sugar puckers. To ensure high quality of models, a procedure of re nement should be applied as a nal step in the modeling pipeline. The software tool QRNAS was developed in our laboratory to perform local re nement of nucleic acid structures based on an extended version of the AMBER force field. The extensions consist of energy terms associated with introduction of explicit hydrogen bonds, idealization of base pair planarity and regularization of backbone conformation.

Read more: Piatkowski, P., Kasprzak, J. M., Kumar, D., Magnus, M., Chojnowski, G., & Bujnicki, J. M. (2016). RNA 3D Structure Modeling by Combination of Template-Based Method ModeRNA, Template-Free Folding with SimRNA, and Refinement with QRNAS. Methods in Molecular Biology (Clifton, N.J.), 1490(Suppl), 217-235. http://doi.org/10.1007/978-1-4939-6433-8_14

Right now, there is 20k steps of refinement.

_images/qrnas_0k.png

The initial structure, 179c48aa-c0d3-4bd6-8e06-12081da22998_ALL_thrs6.20A_clust01-000001_AA.pdb.

_images/qrnas_3k.png

after 3k, ~10min

_images/qrnas_10k.png

after 10k steps, around 30min

_images/qrnas_20k.png

after 20k steps, around 1h.

Installation of QRNAS

Download the QRNAS package from http://genesilico.pl/qrnas/, unzip the archive, and compile it with the following command:

./qrnamake sequential

This should create an executable version of QRNAS.

Warning

Please, change the name of the binary file from QRNA to QRNAS!

Be default the script searches QRNAS in <rna-pdb-tools>/opt/qrnas/ .

Usage of QRNA:

QRNA - Quick Refinement of Nucleic Acids (0.2 alpha)
     by Juliusz Stasiewicz (jstasiewicz@genesilico.pl)

To use type:
  QRNA -i <input PDBfile> [-o <output PDBfile>] [-c <configfile>] [-p] [-m <restraintsfile>]
OR specify <input PDBfile>, <output PDBfile> and <restraintsfile> in <configfile> and type just:
  QRNA -c <configfile>

Installation of this util

Set up in your bashrc:

export QRNAS_PATH=<your path to qrnas> # e.g. /home/magnus/src/rna-pdb-tools/opt/qrnas

but default rna-pdb-tools searches for qrnas in <rna-pdb-tools>/opt/qrnas.

QRNAS at Peyote2

There is no problem to run QRNAS at our Genesilico cluster, peyote2. Tested by mmagnus –170822. Copy files of QRNAS to peyote and run ./qrnamake sequential.

To run it at a cluster with the Sun Grid Engine queuing system (this one with qusb ;-)):

for p in *.pdb; do echo "rna_refinement.py $p >& ${p}.log" | qsub -cwd -V -pe mpi 1 -N "r_$p" ; done

DONE:

  • [x] clean up the output structure
  • [x] configuration should not be hardcoded

usage: rna_refinement.py [-h] [-s STEPS] [-o OUTPUT_FILE] fn

Positional Arguments

fn input pdb file

Named Arguments

-s, --steps

# of steps, default: 20k

Default: 20000

-o, --output_file
 output pdb file

ROSETTA

A set of wrappers around Rosetta (https://www.rosettacommons.org/), mostly based on C. Y. Cheng, F. C. Chou, and R. Das, Modeling complex RNA tertiary folds with Rosetta, 1st ed., vol. 553. Elsevier Inc., 2015. http://www.sciencedirect.com/science/article/pii/S0076687914000524

Run (modeling)

run_rosetta - wrapper to ROSETTA tools for RNA modeling

Based on C. Y. Cheng, F. C. Chou, and R. Das, Modeling complex RNA tertiary folds with Rosetta, 1st ed., vol. 553. Elsevier Inc., 2015. http: // www.sciencedirect.com / science / article / pii / S0076687914000524

The script makes(1) a folder for you job, with seq.fa, ss.fa, input file is copied as input.fa to the folder(2) make helices(3) prepare rosetta input files(4) sends jobs to the cluster.

The header is take from the fast file(`` > /header / ``) not from the filename of your Fasta file.

I discovered this:

qstat -xml | tr '\n' ' ' | sed 's#<job_list[^>]*>#\n#g' \

> | sed ‘s#<[^>]*>##g’ | grep ” ” | column -t

(https://stackoverflow.com/questions/26104116/qstat-and-long-job-names) so there is now need to shorted my job ids.

Run:

rna_rosetta_run.py - i - e - r - g - c 200 cp20.fa

-i:

# prepare a folder for a run
>cp20
AUUAUCAAGAAUCUCAAAGAGAGAUAGCAACCUGCAAUAACGAGCAAGGUGCUAAAAUAGAUAAGCCAAAUUCAAUUGGAAAAAAUGUUAA
.(((((....(((((.....)))))(((..(((((..[[[[..)).))).)))......))))).((((......)))).......]]]].

[peyote2] ~ rna_rosetta_run.py - i cp20.fa
run rosetta for:
cp20
AUUAUCAAGAAUCUCAAAGAGAGAUAGCAACCUGCAAUAACGAGCAAGGUGCUAAAAUAGAUAAGCCAAAUUCAAUUGGAAAAAAUGUUAA
.(((((....(((((.....)))))(((..(((((..[[[[..)).))).)))......))))).((((......)))).......]]]].
/home / magnus // cp20 / created
Seq & ss created

Troubleshooting.

If one of the helices is missing you will get:

IOError: [Errno 2] No such file or directory: 'helix1.out'
rosetta_submit.py README_FARFAR o 500 100 taf
Could not find:  README_FARFAR

and the problem was a1 and g8 pairing:

outputting command line to:  helix0.RUN  # previous helix #0
Sequence:  AUGG CCGG
Secstruc:  (((())))
Not complementary at positions a1 and g8!
Sequence:  GUGGG CCCAU
Secstruc:  ((((()))))

Writing to fasta file:  helix2.fasta  # next helix #2

My case with a modeling of rp12

Sequence: cc gc Secstruc: (()) Not complementary at positions 1 and 4!

edit the secondary structure, run the program with -i(init, to overwrite seq.fa, ss.fa) and then it works.

Notes:

rp17hc 6 charactes.

usage: rna_rosetta_run.py [-h] [-i] [-e] [-r] [-g] [-n NSTRUC] [-c CPUS]
                          [--sandbox SANDBOX]
                          file

Positional Arguments

file
file:
> a04

UAUAACAUAUAAUUUUGACAAUAUGGGUCAUAAGUUUCUACCGGAAUACCGUAAAUAUUCUGACUAUGUAUA ((((.((((…((.((((…….)))).))……..(.(((((…….))))).)..))))))))

Named Arguments

-i, --init

prepare _folder with seq and ss

Default: False

-e, --helices

produce h(E)lices

Default: False

-r, --rosetta

prepare rosetta files (still you need go to send jobs to a cluster)

Default: False

-g, --go

send jobs to a cluster(run qsubMINI)

Default: False

-n, --nstruc

# of structures you want to get

Default: 10000

-c, --cpus

# of cpus to be used

Default: 200

--sandbox

where to run it (default: RNA_ROSETTA_RUN_ROOT_DIR_MODELING

Default: “/home/magnus/rosetta-runs”

Get a number of structures

rna_roseta_n.py - show me # of structure in a silent file

Example:

$ rna_rosetta_n.py ade.out
21594

usage: rna_rosetta_n.py [-h] [-v] file

Positional Arguments

file ade.out

Named Arguments

-v, --verbose Default: False

Get a head of a Rosetta silent file

rna_roseta_head.py - get a head of a Rosetta silent file.

Example:

$ rna_rosetta_head.py -n 10000 thf.out
# a new file will be created, thf_10000.out

Silent file:

[peyote2] thf head -n 100 thf.out
SEQUENCE: ggagaguagaugauucgcguuaagugugugugaaugggaugucgucacacaacgaagcgagagcgcggugaaucauugcauccgcucca
SCORE:     score    rna_data_backbone    rna_vdw    rna_base_backbone    rna_backbone_backbone    rna_repulsive    rna_base_pair    rna_base_axis    rna_base_stagger    rna_base_stack    rna_base_stack_axis     rna_rg    atom_pair_constraint    linear_chainbreak       N_WC      N_NWC       N_BS description
REMARK BINARY SILENTFILE
SCORE:  -601.975                0.000     31.113              -16.960                   -3.888           20.501         -302.742          -38.531            -158.004           -80.764               -110.053     23.750                   0.000               33.601         32          6         86    S_000001_000
FOLD_TREE  EDGE 1 4 -1  JEDGE 4 85 1  C4   C2   END  EDGE 4 5 -1  EDGE 85 80 -1  EDGE 85 89 -1  JEDGE 80 40 5  C4   C2   END  EDGE 80 78 -1  EDGE 40 43 -1  EDGE 40 33 -1  JEDGE 33 45 4  C4   C2   END  EDGE 45 54 -1  EDGE 45 44 -1  EDGE 33 20 -1  JEDGE 20 65 3  C2   C4   END  EDGE 65 67 -1  EDGE 20 17 -1  EDGE 65 63 -1  JEDGE 17 69 2  C4   C2   END  JEDGE 63 58 6  C4   C2   END  EDGE 17 6 -1  EDGE 58 59 -1  EDGE 63 60 -1  EDGE 69 77 -1  EDGE 58 55 -1  EDGE 69 68 -1  S_000001_000
RT -0.987743 0.139354 0.0703103 0.135963 0.989404 -0.0509304 -0.0766626 -0.0407466 -0.996224 6.25631 0.103544 0.0647696   S_000001_000
RT -0.98312 0.1587 -0.091045 0.166923 0.981743 -0.0912024 0.074909 -0.10486 -0.991662 5.89962 -1.95819 -0.1075   S_000001_000
RT -0.987645 0.154078 0.0285994 0.153854 0.988044 -0.00989514 -0.0297821 -0.00537275 -0.999542 6.13138 1.047 0.115722   S_000001_000
RT -0.989884 0.140722 0.0180554 0.140036 0.989532 -0.0348618 -0.0227723 -0.0319807 -0.999229 6.21787 0.201588 -0.0264223   S_000001_000
RT -0.988455 0.134013 0.0706914 0.134924 0.990822 0.00825457 -0.0689364 0.0176973 -0.997464 6.19447 0.189237 0.125791   S_000001_000
RT -0.990412 0.138036 0.00546261 0.137927 0.990299 -0.0168644 -0.00773751 -0.0159492 -0.999843 6.25456 0.0842849 -0.0135807   S_000001_000
ANNOTATED_SEQUENCE: g[RGU:LowerRNA:Virtual_Phosphate]gaga[RAD:rna_cutpoint_lower]g[RGU:rna_cutpoint_upper]uagaugauucgcguuaagugugugugaaugggauguc[RCY:rna_cutpoint_lower]g[RGU:rna_cutpoint_upper]ucacacaacg[RGU:rna_cutpoint_lower]a[RAD:rna_cutpoint_upper]agcg[RGU:rna_cutpoint_lower]a[RAD:rna_cutpoint_upper]gagcgcg[RGU:rna_cutpoint_lower]g[RGU:rna_cutpoint_upper]ugaaucauu[URA:rna_cutpoint_lower]g[RGU:rna_cutpoint_upper]cauccgcucca[RAD:UpperRNA] S_000001_000
Le3nY9smsa4zEMGdvAA+z+e

It seems to work:

-rw-rw-r--   1 magnus users 474M 2017-08-06 05:25 thf_10000.out
-rw -rw-r--   1 magnus users 947M 2017-08-06 04:54 thf.out

[peyote2] thf rna_rosetta_n.py thf_10000.out
10000

usage: rna_rosetta_n.py [-h] [-v] [-n NSTRUC] file

Positional Arguments

file ade.out

Named Arguments

-v, --verbose Default: False
-n, --nstruc Default: 10000

Cluster

usage: rna_rosetta_cluster.py [-h] [--no_select] file n

Positional Arguments

file ade.out
n # of total structures

Named Arguments

--no_select

Don’t run selection once again. Use selected.out in the current folder

Default: False

A wrapper to ROSETTA tools for RNA modeling

Based on C. Y. Cheng, F. C. Chou, and R. Das, Modeling complex RNA tertiary folds with Rosetta, 1st ed., vol. 553. Elsevier Inc., 2015. http://www.sciencedirect.com/science/article/pii/S0076687914000524

rna_rosetta_cluster.py ade_min.out 20000

Take n * 0.005 (.5%) of all frames and put them into selected.out. Then the tool clusters this selected.out.

rna_pdb_tools.utils.rna_rosetta.rna_rosetta_cluster.cluster(radius=1)[source]

Internal function of cluster_loop: It removes cluster.out first.

rna_pdb_tools.utils.rna_rosetta.rna_rosetta_cluster.cluster_loop(ns)[source]

Go from radius 1 to get 1/6 of structures of ns (# of selected structures) in the first cluster, then it stops.

rna_pdb_tools.utils.rna_rosetta.rna_rosetta_cluster.extract()[source]
rna_pdb_tools.utils.rna_rosetta.rna_rosetta_cluster.get_no_structures(file)[source]
rna_pdb_tools.utils.rna_rosetta.rna_rosetta_cluster.get_parser()[source]
rna_pdb_tools.utils.rna_rosetta.rna_rosetta_cluster.get_selected(file, nc)[source]

Get selected for clustering

rna_pdb_tools.utils.rna_rosetta.rna_rosetta_cluster.run()[source]

Pipline for modeling RNA

Minimize

rna_rosetta_min.py - a script to do minimization

The script takes the number of structures and the analyzed silence file and does the maths.

Job names will be as your silent file preceding with ~, .e.g ~tha.

http://www.sciencedirect.com/science/article/pii/S0076687914000524

ade$ rna_rosetta_cluster.py ade.out

The first number states how many processors to use for the run, while the second number is 1/6 the total number of previously generated FARNA models. If you are running on a supercomputer that only allows specific multiples of processors, use an appropriate number for the first input.:

rosetta_submit.py min_cmdline min_out 1 24

rosetta_submit.py min_cmdline min_out [1] [16] The first number states how many processors to use for each line in min_cmdline. Here, enter 1 for the first input so that the total number of processors used will be equal to the number of processors entered with the “-proc” flag in command line [12], above. The second number states the maximum time each job will be allowed to run (walltime). Start the run with the appropriate command listed by the out- put above (e.g., source qsubMPI for the Stampede cluster).

E.g. for 20k silet file, 1/6 will be minimized = 3.3k:

parallel_min_setup.py -silent rp21cr62.out -tag rp21cr62_min  -proc 200 -nstruct 3200 -out_folder mo -out_script MINIMIZE " -ignore_zero_occupancy false "
rosetta_submit.py MINIMIZE mo 1 100 m

[peyote2] rp21 easy_cat.py mo
Catting into:  rp21_min.out ... from 200 primary files. Found 3200  decoys.

# on 200 cpus it took around ~30min

usage: rna_rosetta_min.py [-h] [-g] [-c CPUS] file

Positional Arguments

file ade.out

Named Arguments

-g, --go Default: False
-c, --cpus

default: 200

Default: 200

Check progress

rna_rosetta_cluster.py - a script to cluster a silent file

Example:

[peyote2] rosetta_jobs rna_rosetta_check_progress.py .
         jobs  #curr  #todo  #decoys done
0  ./rp17s223    200      0      407  [ ]
1   ./rp17hcf      0      0        0  [ ]
# curr  232 #todo  0

usage: rna_rosetta_cluster.py [-h] [-v] [-m] [-k] dir

Positional Arguments

dir

directory with rosetta runs, define by RNA_ROSETTA_RUN_ROOT_DIR_MODELING right now: /home/magnus/rosetta-runs

Default: “/home/magnus/rosetta-runs”

Named Arguments

-v, --verbose

be verbose

Default: False

-m, --min-only

check only for mo folder

Default: False

-k, --kill

kill (qdel) jobs if your reach limit (nstruc) of structure that you want, right now is 10000 structures

Default: False

Plotting

Misc

rna_sali2dotbracket

usage: rna_sali2dotbracket [-h] filename

Positional Arguments

filename file in the Sali format

This beauty here will go to sali notation and convert it to dotbracket notation. The file name should be xxxx.sali

Author: Catarina Almeida

rna_pdb_tools.utils.rna_sali2dotbracket.rna_sali2dotbracket.convert_sali2dotbracket(fn)[source]

The function needs a filename in the Sali format. This function will get the secondary structure of the sequence, then get its identifier and then the sequence itself.

To get the ss

The line with the secondary structure is a list and will look like this:

['', '', '', '', '', '', '', '', '', '', '--...<<<[[...]]..>>>>', '', '', '\n']

In this case, the ss is in the 11th position. But in some files it may be in the 12th, 13th, 10th, etc..

If the longest element from the list is extracted, then this problem is overcomed.

The ss will some times have patterns of repeated gaps, which will come in the form of:

  1. x
  2. xnt
  3. ( x )

With x being any number, from 1 to 1000. These must be converted to the correspondent number of gaps (-) in the converted ss. This conversion is done by:

1 - Identifying the pattern with regex

2 - Replacing it with repl function.

As such, the following expressions will replace the previously mentioned patterns:

  1. re.sub(r'\d*\d', repl, temp)
  2. re.sub(r'\d*\dnt', repl, temp)
  3. re.sub(r'(?P<smthBeautiful>\(\d+\))', repl, temp)

To get the sequence

The sequence, much like the ss, can sometimes be in a different position in the list. Like in the ss, the longest element will be selected. Also, like in the ss, patterns for repeated gaps appear. So these must also be removed.

rna_pdb_tools.utils.rna_sali2dotbracket.rna_sali2dotbracket.get_parser()[source]
rna_pdb_tools.utils.rna_sali2dotbracket.rna_sali2dotbracket.repl(m)[source]

This function will replace the length of a given string by the correspondent number of dashes. The expression qwerty will be replaced by -----.

rna_add_chain

Example:

./rna_add_chain.py -c X ../../input/1msy_rnakbmd_decoy999_clx_noChain.pdb     > ../../output/1msy_rnakbmd_decoy999_clx_noChain_Xchain.pdb

From:

ATOM      1  O5'   U     1      42.778  25.208  46.287  1.00  0.00
ATOM      2  C5'   U     1      42.780  26.630  45.876  1.00  0.00
ATOM      3  C4'   U     1      42.080  27.526  46.956  1.00  0.00
ATOM      4  O4'   U     1      43.013  28.044  47.963  1.00  0.00
ATOM      5  C1'   U     1      42.706  29.395  48.257  1.00  0.00
ATOM      6  N1    U     1      43.857  30.305  47.703  1.00  0.00
ATOM      7  C6    U     1      45.057  29.857  47.308  1.00  0.00
ATOM      8  C5    U     1      46.025  30.676  46.763  1.00  0.00
ATOM      9  C4    U     1      45.720  32.110  46.702  1.00  0.00
ATOM     10  O4    U     1      46.444  32.975  46.256  1.00  0.00

to:

ATOM      1  O5'   U X   1      42.778  25.208  46.287  1.00  0.00
ATOM      2  C5'   U X   1      42.780  26.630  45.876  1.00  0.00
ATOM      3  C4'   U X   1      42.080  27.526  46.956  1.00  0.00
ATOM      4  O4'   U X   1      43.013  28.044  47.963  1.00  0.00
ATOM      5  C1'   U X   1      42.706  29.395  48.257  1.00  0.00
ATOM      6  N1    U X   1      43.857  30.305  47.703  1.00  0.00
ATOM      7  C6    U X   1      45.057  29.857  47.308  1.00  0.00
ATOM      8  C5    U X   1      46.025  30.676  46.763  1.00  0.00
ATOM      9  C4    U X   1      45.720  32.110  46.702  1.00  0.00
ATOM     10  O4    U X   1      46.444  32.975  46.256  1.00  0.00
rna_pdb_tools.utils.misc.rna_add_chain.get_parser()[source]
usage: rna_add_chain [-h] [-c CHAIN] file

Positional Arguments

file file

Named Arguments

-c, --chain a new chain, e.g. A

Cluster load

A very simple tool to see your cluster load per user:

MAX_JOBS: 1000
#jobs cluster 917 load:  0.917  to use: 83
#jobs you     749 load:  0.749  to use: 251
{'deepak': 160, 'azyla': 8, 'magnus': 749}
1 azyla        r 8
20 magnus       r 10
16 deepak       r 10
329 magnus       r 1
22 magnus       qw 10

A super simple script to get some statistics of who is running at a cluster

Set MAX_JOBS to calc % of usage, it’s an approximation of max number of jobs, e.g. peyote ~1k (rather 700, e.g. FARNA runs.).

rna_pdb_tools.utils.cluster_load.cluster_load.get_parser()[source]
rna_pdb_tools.utils.cluster_load.cluster_load.per_user()[source]

get stats (#cpus) per user

rna_pdb_tools.utils.cluster_load.cluster_load.stats_for_cluster()[source]

get stats (#jobs) per cluster

rna_pdb_tools.utils.cluster_load.cluster_load.stats_for_user()[source]

get stats (#jobs) per user