rna-pdb-tools

get RNAPuzzle ready

class rna_pdb_tools.rna_pdb_tools_lib.RNAStructure(fn)[source]

RNAStructure - handles an RNA pdb file.

Attributes:

fn (string) : filename of the PDB file
lines (list) : the PDB file is loaded and the ATOM/HETATM/TER/END records go to self.lines
get_rnapuzzle_ready(renumber_residues=True, fix_missing_atoms=True, rename_chains=True, report_missing_atoms=True, verbose=True)[source]

Get rnapuzzle (SimRNA) ready structure.

Clean up a structure and put the atoms in the correct order.

Parameters:
  • renumber_residues – boolean; renumber residues from 1 to …, with the second chain starting from 1 again, etc.
  • fix_missing_atoms – boolean; superimpose motifs from the minilibrary and copy-paste the missing atoms. This is very crude, so use it with caution.

Submission format: http://ahsoka.u-strasbg.fr/rnapuzzles/

Run rna_pdb_tools.rna_pdb_tools_lib.RNAStructure.std_resn() before this function to fix names.

  • 170305 Merged with get_simrna_ready and fixing OP3 terminal added
  • 170308 Fix missing atoms for bases, and O2’
_images/fix_missing_o_before_after.png

Fig. Add missing O2’ atom (before and after).

_images/fix_missing_superposition.png

Fig. The residue to fix is in cyan. The G base from the library is in red. Atoms O4’, C2’, C1’ are shared between the sugar (in cyan) and the G base from the library (in red). These atoms are used to superimpose the G base on the sugar, and then all atoms from the base are copied to the residue.

_images/fix_missing_bases.png

Fig. Rebuilding base-less A, C, G, and U residues. It’s not perfect but good enough for some applications.

Warning

It was only tested with the whole base missing!

Warning

requires: Biopython

get sequence

Example:

$ rna_pdb_toolsx.py --get_seq 5_solution_1.pdb
> 5_solution_1.pdb A:1-576
CAUCCGGUAUCCCAAGACAAUCUCGGGUUGGGUUGGGAAGUAUCAUGGCUAAUCACCAUGAUGCAAUCGGGUUGAACACUUAAUUGGGUUAAAACGGUGGGGGACGAUCCCGUAACAUCCGUCCUAACGGCGACAGACUGCACGGCCCUGCCUCAGGUGUGUCCAAUGAACAGUCGUUCCGAAAGGAAG
class rna_pdb_tools.rna_pdb_tools_lib.RNAStructure(fn)[source]

RNAStructure - handles an RNA pdb file.

Attributes:

fn (string) : filename of the PDB file
lines (list) : the PDB file is loaded and the ATOM/HETATM/TER/END records go to self.lines
get_seq(compact=False, chainfirst=True)[source]

Get the sequence (v2): returns segments of chains with correct numbering.

Run:

python rna_pdb_seq.py input/1ykq_clx.pdb
> 1ykq_clx A:101-111
GGAGCUCGCCC
> 1ykq_clx B:201-238
GGGCGAGGCCGUGCCAGCUCUUCGGAGCAAUACUCGGC

> 6_solution_0 A:1-19 26-113 117-172
GGCGGCAGGUGCUCCCGACGUCGGGAGUUAAAAGGGAAG

chains is {'A': {'header': 'A:1-19 26-113 117-172', 'resi': [1, 2, 3, ..., 19, 26, 27, ..., 172], 'seq': ['G', 'G', 'C', 'G', ..., 'C', 'G', 'U', 'C']}}

Chains are listed in the order in which they appear in the file.

Warning

Only ATOM and HETATM lines are taken into account.
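The header format shown above (for example A:1-19 26-113 117-172) can be derived from a list of residue numbers by splitting it wherever the numbering jumps. The following is a minimal sketch of that idea, not the library's own code; the function name segments_header is hypothetical.

```python
def segments_header(chain_id, resi):
    """Build a header such as 'A:1-19 26-113' from ordered residue numbers."""
    if not resi:
        return chain_id + ":"
    segments = []
    start = prev = resi[0]
    for num in resi[1:]:
        if num != prev + 1:  # a gap in numbering closes the current segment
            segments.append((start, prev))
            start = num
        prev = num
    segments.append((start, prev))
    return chain_id + ":" + " ".join("%d-%d" % seg for seg in segments)


# Residue numbers matching the 6_solution_0 example above.
resi = list(range(1, 20)) + list(range(26, 114)) + list(range(117, 173))
print(segments_header("A", resi))  # A:1-19 26-113 117-172
```

A segment made of a single residue would be printed as N-N in this sketch; how the library formats that case is not stated in the documentation.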

fetch

Example:

$ rna_pdb_toolsx.py --fetch 1xjr
downloading...1xjr ok
rna_pdb_tools.rna_pdb_tools_lib.fetch(pdb_id, path='.')[source]

Fetch a PDB file from RCSB.org, e.g. https://files.rcsb.org/download/1Y26.pdb
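As a rough illustration of what a fetch involves, the sketch below builds the download URL from a PDB id and saves the file with the standard library. It is not the implementation of fetch(); the names rcsb_pdb_url and download_pdb are hypothetical, and upper-casing the id simply mirrors the example URL above.

```python
import os
import urllib.request

# Base URL taken from the example URL in the docstring above.
RCSB_DOWNLOAD = "https://files.rcsb.org/download/"


def rcsb_pdb_url(pdb_id):
    """Return the download URL for a PDB id (assumption: upper-case id, as in the example)."""
    return RCSB_DOWNLOAD + pdb_id.upper() + ".pdb"


def download_pdb(pdb_id, path="."):
    """Hypothetical helper: save <pdb_id>.pdb under path and return the file name."""
    dest = os.path.join(path, pdb_id + ".pdb")
    urllib.request.urlretrieve(rcsb_pdb_url(pdb_id), dest)
    return dest


# Only the URL is built here; download_pdb() needs network access.
print(rcsb_pdb_url("1y26"))
```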

fetch Biological Assembly

Example:

$ rna_pdb_toolsx.py --fetch_ba 1xjr
downloading...1xjr_ba.pdb ok

or over a list of pdb ids in a text file:

$ cat data/pdb_ids.txt
1y26
1fir

$ while read p; do rna_pdb_toolsx.py --fetch_ba $p; done < data/pdb_ids.txt
downloading...1y26_ba.pdb ok
downloading...1fir_ba.pdb ok

$ ls *.pdb
1fir_ba.pdb 1y26_ba.pdb
rna_pdb_tools.rna_pdb_tools_lib.fetch_ba(pdb_id, path='.')[source]

Fetch a biological assembly PDB file from RCSB.org.

>>> fetch_ba('1xjr')
...

delete

Examples:

$ for i in *pdb; do rna_pdb_toolsx.py --delete A:46-56 $i > ../rpr_rm_loop/$i ; done

Go over all PDB files in the current directory, remove a fragment of chain A, residues 46-56 (inclusive), and save the outputs to the folder ../rpr_rm_loop.
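To make the selection syntax concrete, here is a small sketch of how a selection such as A:46-56 could be applied to PDB lines. It assumes the standard fixed-column PDB layout (chain id in column 22, residue number in columns 23-26) and is not the tool's own code; parse_selection and delete_fragment are hypothetical names.

```python
def parse_selection(selection):
    """'A:46-56' -> ('A', 46, 56). Negative residue numbers are not handled."""
    chain, rng = selection.split(":")
    start, end = rng.split("-")
    return chain, int(start), int(end)


def delete_fragment(lines, selection):
    """Drop ATOM/HETATM lines of the selected chain and residue range (inclusive)."""
    chain, start, end = parse_selection(selection)
    kept = []
    for line in lines:
        if line.startswith(("ATOM", "HETATM")):
            if line[21] == chain and start <= int(line[22:26]) <= end:
                continue  # inside the fragment: skip this line
        kept.append(line)
    return kept


lines = [
    "ATOM      1  C5'   A A   3      25.674  19.091   3.459  1.00 16.99           C",
    "ATOM     23  C5'   C A   4      19.700  19.206   5.034  1.00 12.65           C",
    "ATOM     43  C5'   C A   5      14.537  16.130   6.444  1.00  8.74           C",
    "TER",
]
for line in delete_fragment(lines, "A:4-5"):
    print(line)
```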

edit

rna_pdb_tools.rna_pdb_tools_lib.edit_pdb(args)[source]

Edit your structure.

The function accepts a specification such as A:3-21>A:1-19, or several comma-separated ones such as A:3-21>A:1-19,B:22-32>B:20-30, and applies the edit.

The output is printed, line by line. Only ATOM lines are edited!

Examples:

$ rna_pdb_toolsx.py --edit 'A:3-21>A:1-19' 1f27_clean.pdb > 1f27_clean_A1-19.pdb

or even:

$ rna_pdb_toolsx.py --edit 'A:3-21>A:1-19,B:22-32>B:20-30' 1f27_clean.pdb > 1f27_clean_renumb.pdb
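The edit specification can be read as a list of old-selection to new-selection pairs. The sketch below parses such a specification and shifts the residue number of a matching ATOM line; it assumes the standard fixed-column PDB layout and is only an illustration, not the code behind edit_pdb. All function names are hypothetical.

```python
def parse_selection(sel):
    """'A:3-21' -> ('A', 3, 21)."""
    chain, rng = sel.split(":")
    start, end = rng.split("-")
    return chain, int(start), int(end)


def parse_edit_spec(spec):
    """'A:3-21>A:1-19,B:22-32>B:20-30' -> list of (old, new) selections."""
    edits = []
    for part in spec.split(","):
        old, new = part.split(">")
        edits.append((parse_selection(old), parse_selection(new)))
    return edits


def edit_line(line, edits):
    """Rewrite chain id and residue number of an ATOM line; other lines pass through."""
    if not line.startswith("ATOM"):
        return line
    chain, resi = line[21], int(line[22:26])
    for (old_chain, old_start, old_end), (new_chain, new_start, _) in edits:
        if chain == old_chain and old_start <= resi <= old_end:
            shifted = resi - old_start + new_start
            return line[:21] + new_chain + "%4d" % shifted + line[26:]
    return line


edits = parse_edit_spec("A:3-21>A:1-19,B:22-32>B:20-30")
line = "ATOM      1  C5'   A A   3      25.674  19.091   3.459  1.00 16.99           C"
print(edit_line(line, edits))
```

With the first pair, residue 3 of chain A becomes residue 1, and the rest of the line is left untouched.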

the library

rna_pdb_tools_lib.py - the main library file; many tools in this package use it.

exception rna_pdb_tools.rna_pdb_tools_lib.PDBFetchError[source]
class rna_pdb_tools.rna_pdb_tools_lib.RNAStructure(fn)[source]

RNAStructure - handles an RNA pdb file.

Attributes:

fn (string) : filename of the PDB file
lines (list) : the PDB file is loaded and the ATOM/HETATM/TER/END records go to self.lines
edit_occupancy_of_pdb(pdb, pdb_out, v=False)[source]

Set the occupancy of all atoms to 1 (flexible) and then set the occupancy to 0 for selected atoms. Returns False on error, True if OK.

fix_O_in_UC()[source]
fix_op_atoms()[source]

Replace OXP with OPX, e.g. 'O1P' -> 'OP1'.
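A minimal sketch of this kind of renaming, assuming the standard PDB layout where the atom name sits in columns 13-16; the mapping below covers only O1P and O2P and is an assumption, not the library's actual table. The coordinates in the example line are made up.

```python
# Assumed mapping from old-style to new-style phosphate oxygen names.
OLD_TO_NEW = {"O1P": "OP1", "O2P": "OP2"}


def fix_op_atom_name(line):
    """Rename O1P/O2P to OP1/OP2 in an ATOM/HETATM line, keeping column widths."""
    if not line.startswith(("ATOM", "HETATM")):
        return line
    field = line[12:16]          # atom name field (columns 13-16)
    name = field.strip()
    new_name = OLD_TO_NEW.get(name)
    if new_name is None:
        return line
    return line[:12] + field.replace(name, new_name) + line[16:]


# Example line with made-up coordinates.
line = "ATOM      2  O1P   G A   1      45.000  30.000  20.000  1.00 50.00           O"
print(fix_op_atom_name(line))
```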

fix_with_qrnas(outfn='', verbose=False)[source]

Add missing heavy atoms.

A residue is recognized based on its residue name.

Copies the QRNAS folder to the current folder, runs QRNAS, and then removes the QRNAS folder.

Warning

QRNAS required (http://genesilico.pl/QRNAS/QRNAS.tgz)

get_all_chain_ids()[source]
Returns: chain ids, e.g. set(['A', 'B'])
Return type: set
get_atom_code(line)[source]

Get atom code from a line of a PDB file

get_atom_coords(line)[source]

Get atom coordinates from a line of a PDB file

get_atom_num(line)[source]

Extract the atom number from a line of a PDB file.

Parameters:
  • line – ATOM line from a PDB file

Output:
  • atom number as an integer
get_info_chains()[source]

Return chain info, e.g. A:3-21 B:22-32

get_report()[source]
Returns: report, messages collected while parsing this file
Return type: string
get_res_code(line)[source]

Get residue code from a line of a PDB file

get_res_num(line)[source]

Extract the residue number from a line of a PDB file.

Parameters:
  • line – ATOM line from a PDB file

Output:
  • residue number as an integer
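The line-level getters above (get_atom_num, get_atom_coords, get_res_num) can be pictured as reads of fixed columns in a PDB record. The sketch below uses the standard PDB column positions (serial in columns 7-11, residue number in 23-26, coordinates in 31-54); it is an illustration under that assumption, not the library's code, and the function names are hypothetical.

```python
def atom_num(line):
    """Atom serial number, columns 7-11."""
    return int(line[6:11])


def res_num(line):
    """Residue sequence number, columns 23-26."""
    return int(line[22:26])


def atom_coords(line):
    """(x, y, z) as floats, columns 31-38, 39-46, 47-54."""
    return (float(line[30:38]), float(line[38:46]), float(line[46:54]))


line = "ATOM      1  C5'   A A   3      25.674  19.091   3.459  1.00 16.99           C"
print(atom_num(line), res_num(line), atom_coords(line))
```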
get_rnapuzzle_ready(renumber_residues=True, fix_missing_atoms=True, rename_chains=True, report_missing_atoms=True, verbose=True)[source]

get_seq(compact=False, chainfirst=True)[source]

get_text(add_end=True)[source]

works on self.lines.

is_amber_like()[source]

Use self.lines and check if there is XX line

is_mol2()[source]

Return True if the file is in mol2 format, based on the presence of `@<TRIPOS>`.

is_nmr()[source]

True if the file is an NMR-style multiple model pdb

Returns: True or False
Return type: boolean
is_pdb()[source]

Return True if the file is in PDB format.

If self.lines is empty it means that nothing was parsed into the PDB format.

remove(verbose)[source]

Delete file, self.fn

remove_ion()[source]

Example of ion lines (MG) following a TER record:

TER 1025 U A 47
HETATM 1026 MG MG A 101 42.664 34.395 50.249 1.00 70.99 MG
HETATM 1027 MG MG A 201 47.865 33.919 48.090 1.00 67.09 MG

Return type: object
remove_water()[source]

Remove HOH and TIP3
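Filtering waters can be sketched as dropping every ATOM/HETATM line whose residue name is HOH or TIP3. The sketch assumes the residue name starts in column 18 and reads four characters so that TIP3 fits; it is not the library's implementation, and the coordinates in the example lines are made up.

```python
WATER_NAMES = {"HOH", "TIP3"}


def remove_water(lines):
    """Return the lines without water ATOM/HETATM records."""
    kept = []
    for line in lines:
        is_atom = line.startswith(("ATOM", "HETATM"))
        if is_atom and line[17:21].strip() in WATER_NAMES:
            continue  # water record: skip
        kept.append(line)
    return kept


# Example lines; the HOH coordinates are made up.
lines = [
    "ATOM      1  C5'   A A   3      25.674  19.091   3.459  1.00 16.99           C",
    "HETATM 1028  O   HOH A 301      40.000  30.000  50.000  1.00 60.00           O",
    "TER",
]
for line in remove_water(lines):
    print(line)
```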

renum_atoms()[source]

Renumber atoms from 1 to X, for ATOM/HETATM lines.
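Renumbering can be pictured as rewriting the serial field (columns 7-11 in the standard PDB layout) with a running counter. This is a sketch under that assumption rather than the method's actual code.

```python
def renum_atoms(lines):
    """Renumber ATOM/HETATM serials from 1 upwards; other lines are kept as-is."""
    out = []
    serial = 1
    for line in lines:
        if line.startswith(("ATOM", "HETATM")):
            line = line[:6] + "%5d" % serial + line[11:]
            serial += 1
        out.append(line)
    return out


lines = [
    "ATOM      1  C5'   A A   3      25.674  19.091   3.459  1.00 16.99           C",
    "ATOM     23  C5'   C A   4      19.700  19.206   5.034  1.00 12.65           C",
    "ATOM     43  C5'   C A   5      14.537  16.130   6.444  1.00  8.74           C",
    "TER",
]
for line in renum_atoms(lines):
    print(line)
```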

set_atom_occupancy(line, occupancy)[source]

Set the occupancy for a line.
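In the standard PDB layout the occupancy occupies columns 55-60, so setting it amounts to replacing that six-character field. The sketch below assumes that layout and a line at least 60 characters long; it is an illustration, not the library's code.

```python
def set_atom_occupancy(line, occupancy):
    """Return the line with the occupancy field (columns 55-60) replaced."""
    return line[:54] + "%6.2f" % occupancy + line[60:]


line = "ATOM      1  C5'   A A   3      25.674  19.091   3.459  1.00 16.99           C"
print(set_atom_occupancy(line, 0))
```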

set_occupancy_atoms(occupancy)[source]
Parameters: occupancy
std_resn()[source]

‘Fix’ residue names, i.e. change them to the standard names, e.g. RA5 -> A.

Works on self.lines, and returns the result to self.lines.

Will change things like:

# URI -> U, URA -> U
1xjr_clx_charmm.pdb:ATOM    101  P   URA A   5      58.180  39.153  30.336  1.00 70.94
rp13_Dokholyan_1_URI_CYT_ADE_GUA_hydrogens.pdb:ATOM  82  P   URI A   4     501.633 506.561 506.256  1.00  0.00         P
un_nmr(verbose=False)[source]

Un NMR - Split NMR-style multiple model pdb files into individual models.

Takes self.fn and creates new files as follows:

input/1a9l_NMR_1_2_models.pdb
   input/1a9l_NMR_1_2_models_0.pdb
   input/1a9l_NMR_1_2_models_1.pdb

Warning

This function requires biopython.

write(outfn, v=True)[source]

Write `self.lines` to a file (and add END at the end of the file).

rna_pdb_tools.rna_pdb_tools_lib.collapsed_view(args)[source]

Collapsed view of a PDB file. Only lines with C5’ atoms are shown, plus TER, MODEL, and END lines.

example:

$ python rna-pdb-tools.py --cv input/1f27.pdb
ATOM      1  C5'   A A   3      25.674  19.091   3.459  1.00 16.99           C
ATOM     23  C5'   C A   4      19.700  19.206   5.034  1.00 12.65           C
ATOM     43  C5'   C A   5      14.537  16.130   6.444  1.00  8.74           C
ATOM     63  C5'   G A   6      11.726  11.579   9.544  1.00  9.81           C
ATOM     86  C5'   U A   7      12.007   7.281  13.726  1.00 11.35           C
ATOM    106  C5'   C A   8      12.087   6.601  18.999  1.00 12.74           C
TER
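The collapsed view above can be approximated by keeping one line per residue (the C5' atom) together with the TER, MODEL, and END records. The following sketch assumes the standard PDB layout with the atom name in columns 13-16; it is not the code behind collapsed_view, and the C4' line uses made-up coordinates.

```python
def collapsed_view(lines):
    """Keep C5' atom lines plus TER, MODEL, and END records."""
    kept = []
    for line in lines:
        if line.startswith(("TER", "MODEL", "END")):
            kept.append(line)  # note: the END prefix also matches ENDMDL
        elif line.startswith(("ATOM", "HETATM")) and line[12:16].strip() == "C5'":
            kept.append(line)
    return kept


lines = [
    "ATOM      1  C5'   A A   3      25.674  19.091   3.459  1.00 16.99           C",
    # Made-up C4' line to show that other atoms are dropped.
    "ATOM      2  C4'   A A   3      24.000  19.000   3.000  1.00 16.00           C",
    "ATOM     23  C5'   C A   4      19.700  19.206   5.034  1.00 12.65           C",
    "TER",
]
for line in collapsed_view(lines):
    print(line)
```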
rna_pdb_tools.rna_pdb_tools_lib.edit_pdb(args)[source]

rna_pdb_tools.rna_pdb_tools_lib.fetch(pdb_id, path='.')[source]

Fetch a PDB file from RCSB.org, e.g. https://files.rcsb.org/download/1Y26.pdb

rna_pdb_tools.rna_pdb_tools_lib.fetch_ba(pdb_id, path='.')[source]

Fetch a biological assembly PDB file from RCSB.org.

>>> fetch_ba('1xjr')
...
rna_pdb_tools.rna_pdb_tools_lib.fetch_cif_ba(cif_id, path='.')[source]

Fetch a biological assembly CIF file from RCSB.org.

rna_pdb_tools.rna_pdb_tools_lib.get_version(currfn='', verbose=False)[source]

Get the version of the tool based on the state of the git repository and return it. If currfn is empty, the path is ‘.’; this is expected to work but has not been thoroughly verified. The version is not printed. See https://github.com/m4rx9/curr_version/

rna_pdb_tools.rna_pdb_tools_lib.replace_chain(struc_fn, insert_fn, chain_id)[source]

Replace chain of the main file (struc_fn) with some new chain (insert_fn) of given chain id.

Parameters:
  • struc_fn (str) – path to the main PDB file
  • insert_fn (str) – path to the file that will be injected into the main PDB file
  • chain_id (str) – chain that will be inserted into the main PDB file
Returns:

text in the PDB format

Return type:

string