Astonishingly, this uncoupled polycistronic transcription is achieved in the absence of many basal transcription factors otherwise conserved throughout Archaea and Eukarya Ivens et al. This is also true for T. Similarly, an unstable association between epitope-tagged RPB6z and the largest subunit of Pol I was also detected Nguyen et al. Here, we use a number of approaches to demonstrate for the first time diversification of function and complex specialization by different isoforms of the conventionally shared RNA polymerase subunits RPB5 and RPB6.

Iterative profile-based searching techniques are a powerful way of identifying homologues in diverse, distantly related organisms. Several sequences with low expectation values from closely related organisms were easily identified. These sequences were aligned using ClustalX Thompson et al. These models were then used to search a large protein sequence database from a diverse set of organisms (see Supplemental Material). The resultant hits were aligned to the model, and unconvincing low-scoring matches were discarded. A new alignment containing the additional sequence matches was then used to generate another hidden Markov model, and the database was searched again.

This process was repeated until no further satisfactory matches could be detected. The final amino-acid alignments were manually edited and trimmed to informative blocks of and 79 characters for RPB5 and RPB6, respectively. These alignments were then used to infer Bayesian maximum likelihood trees by using the program MrBayes 3.
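The iterative search described above is a fixed-point loop: build a model from the current alignment, search the database, keep convincing hits, and stop when a round adds nothing new. The sketch below illustrates only that control flow. The function name, the `build_model` and `search` callables, the score cutoff `min_score`, and the `max_rounds` cap are all hypothetical stand-ins; in the study, the rejection of unconvincing matches was a judgment made on the aligned hits, not a single numeric threshold.

```python
from typing import Callable, Iterable, Set, Tuple

Hit = Tuple[str, float]  # (sequence identifier, match score); higher is better


def iterative_profile_search(
    seeds: Set[str],
    build_model: Callable[[Set[str]], object],
    search: Callable[[object], Iterable[Hit]],
    min_score: float,
    max_rounds: int = 20,
) -> Set[str]:
    """Grow a set of homologues until a search round finds no new members.

    build_model and search are placeholders for a profile-building step and
    a database search step; they are assumptions of this sketch.
    """
    members = set(seeds)
    for _ in range(max_rounds):
        model = build_model(members)
        # Discard low-scoring matches (hypothetical numeric cutoff).
        accepted = {seq_id for seq_id, score in search(model) if score >= min_score}
        new_members = accepted - members
        if not new_members:
            break  # converged: no further satisfactory matches
        members |= new_members
    return members
```

Passing the model-building and search steps in as callables keeps the loop independent of any particular profile-search package.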

The WAG substitution matrix was used with a gamma-distributed substitution rate variation approximated by four discrete categories with shape parameter estimated from the data. Bootstrap support for the inferred topologies was estimated from replicate Bayesian tree inferences with character resampling. Cloning of procyclic-form cells was performed by limiting dilution on well plates. Cloning of bloodstream-form cells was performed either by limiting dilution on well plates or by plating cells on agarose Carruthers and Cross, These vectors were linearized with NotI restriction endonuclease and then transfected into procyclic form cells Wirtz et al.

Louis, MO to culture medium. After stable transfection into 29-13 cells Wirtz et al. Cells were then permeabilized with 0. Cells were subsequently incubated with rabbit polyclonal anti-GFP (gift from Jeff Errington, Newcastle University, United Kingdom) and mouse monoclonal L1C6 primary antibodies, washed, and then incubated with goat anti-rabbit AlexaFluor (Invitrogen, Carlsbad, CA) and goat anti-mouse tetramethylrhodamine B isothiocyanate (Sigma-Aldrich)-conjugated secondary antibodies. L1C6 is a monoclonal antibody (mAb) with reactivity to the nucleolus; the precise antigen is unknown.

Cells were resuspended in ice-cold lysis buffer 0. The gels were stained with Sypro Ruby reagent (Invitrogen), and the bands were excised from the gel before in-gel tryptic digestion and analysis by mass spectrometry Devaux et al. Cells were then fixed with 3. Run-on transcription in permeabilized trypanosomes was performed as described previously Ullu and Tschudi, PCR products encompassing the whole open reading frame of each gene were used as hybridization targets.

Iterative profile-based searches of 19 disparate eukaryotic genomes resulted in the identification of 29 RPB5 and 24 RPB6 sequences. These predicted protein sequences were aligned and edited, and the alignments were then used to infer Bayesian phylogenies Huelsenbeck and Ronquist, for each subunit. The resulting trees (Figure 1A) show that duplication of RPB5 has occurred in multiple eukaryotic lineages. We find that A. Multiple RPB5 sequences can be found in the genomes of other higher plants that can similarly be grouped by the presence or absence of this short N-terminal extension (see Supplemental Material).

However, only a single RPB5 sequence can be found in the genomes of the red alga Cyanidioschyzon merolae or green alga Chlamydomonas reinhardtii.

Figure 1. (A) Bayesian maximum likelihood tree of RPB5 sequences. (B) Bayesian maximum likelihood tree of RPB6 sequences. Instances where multiple paralogues exist are indicated by shaded boxes; GenBank accession numbers are shown with taxon names.

Accession numbers for Cyanidioschyzon merolae, Chlamydomonas reinhardtii, and Paramecium tetraurelia proteins are specific for their genome project databases Matsuzaki et al. Bootstrap support for the Bayesian topology from maximum likelihood (ML), maximum parsimony (MP), and neighbor-joining (NJ) approaches is indicated next to nodes. Bar, 0.

The single-cell eukaryotes G. The three trypanosomatids each encode two RPB5 paralogues forming two separate clades, but they are clearly derived from a single common ancestor (Figure 1A). The grouping of these sequences suggests that they arose independently within the trypanosomatid lineage predating the split of L. Similarly for RPB6, it is apparent that duplications have occurred in at least three different eukaryotic lineages.

    The small C-terminal domain is well conserved across Eukarya and Archaea and is responsible for the attachment of RPB5 to the largest subunit of the polymerase complex. The third and smallest insertion resides just before the C-terminal domain. Interestingly, an insertion is found at the same position in both paralogues of G. Insertions within the N-terminal region of RPB5 are not specific to trypanosomatids and can also be found in the single putative RPB5 homologue in Tetrahymena thermophila (see Supplemental Material).

    In trypanosomatids, the N-terminal insertions differ in size and are poorly alignable. However, they share some similarity in their sequence composition, each being composed of a high proportion of charged residues (Figure 2B). These insertions were not used to infer the tree and thus are not responsible for the clustered grouping of the TbRPB5z sequences (Figure 1A).
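The compositional comparison above comes down to a per-insertion fraction of charged residues. A minimal sketch of that calculation follows, assuming that "charged" means Asp, Glu, Lys, and Arg (histidine excluded); the text does not state which residues were counted, so the `CHARGED` set and the function name are assumptions of this example.

```python
# Assumed definition of "charged": Asp (D), Glu (E), Lys (K), Arg (R).
# Histidine is excluded here; the source does not specify the set used.
CHARGED = frozenset("DEKR")


def charged_fraction(sequence: str) -> float:
    """Return the fraction of residues in `sequence` that are charged.

    Non-letter characters such as alignment gaps ('-') are ignored.
    """
    residues = [aa for aa in sequence.upper() if aa.isalpha()]
    if not residues:
        raise ValueError("sequence contains no residues")
    return sum(aa in CHARGED for aa in residues) / len(residues)
```

Applied to each insertion, this gives one value per sequence that can be plotted against insertion length.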

    Figure 2. Comparison between duplicated subunits in trypanosomatids. For all panels, conserved structure is in blue, and residues that are predicted to cause alterations to the solved structure or residues that are not present in the solved structure are in green. (A) Cartoon depicting domains and insertions in trypanosomatid RPB5 sequences. (B) Comparison between insertion size and proportion of charged residues for the two largest insertions in the three trypanosomatids. (C) Cartoon depicting domains and insertions in trypanosomatid RPB6 sequences. (D) Comparison between the known S. (E) Comparison between the known S. The structure of the RPB6 charged N-terminal domain is not known and hence is not included.

    The level of sequence identity between all the organisms used to infer the RPB5 tree is very low. Of the sequence positions sampled, only four are identical across all sequences in the tree not including the highly truncated A. Given this low level of sequence identity, we looked to see how the four residues shown to be important for transcription factor binding in human RPB5 F76, I, T, and S Le et al.

    S is the least well conserved of these four residues, only occurring in metazoans and fungi. I is represented by a nonpolar residue in all eukaryotes apart from one G. The canonical RPB6 subunit is also composed of two domains: a short negatively charged N-terminal domain and a highly conserved C-terminal domain that is homologous to the archaeal RNA polymerase subunit rpoK. This domain is poorly alignable across the eukaryotes and is absent from archaeal rpoK. Intriguingly, this highly charged N-terminal domain is absent from G.

    A similar insertion is found at the same position in the predicted gene models for RPB6 in the apicomplexan parasites Plasmodium falciparum and Theileria annulata but not in the related alveolates T. Unlike the N-terminal insertions of RPB5z, the length and sequence of this small insertion are conserved within the trypanosomatids (for alignment, see Supplemental Material). Of the 79 sequence positions sampled, seven are identical across all taxa.
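Counts such as "seven of 79 positions identical across all taxa" can be read directly from a trimmed alignment by counting columns that hold the same residue in every sequence. The sketch below shows one way to do this. The function name and the choice to treat `-` and `.` as gap characters are assumptions; a column consisting only of gaps is not counted as identical.

```python
from typing import Sequence


def count_identical_columns(alignment: Sequence[str], gap_chars: str = "-.") -> int:
    """Count alignment columns with the same residue in every sequence.

    `alignment` holds equal-length aligned sequences. Columns containing any
    gap character are never counted.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("aligned sequences must all have the same length")

    gaps = set(gap_chars)
    identical = 0
    for column in zip(*alignment):
        residues = {char.upper() for char in column}
        if len(residues) == 1 and not (residues & gaps):
            identical += 1
    return identical
```

Dividing the result by the alignment length gives the fraction of invariant positions.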

    To investigate how the insertions found in the noncanonical trypanosomatid RPB5 and RPB6 subunits might affect their interactions with other polymerase components, we used the solved crystal structures for the yeast Pol II complex subunits Cramer et al. For T. The most N-terminal proximal insertion is predicted to occupy a region adjacent to the DNA entrance channel, creating a novel interaction surface for the polymerase complex facing DNA-bound transcriptional regulators (Figure 2F).

    The second largest insert is predicted to form a small helix that protrudes away from the core complex (Figure 2, D and F). The third and smallest insertion is predicted to take the form of a small sheet that projects toward RPB6 and the dissociable heterodimer complex.

    By homology modeling of T. This insertion is predicted to form a surface-exposed structure that extends across a gap on the side of the Pol II complex (Figure 2, E and F). To address whether the multiple RPB5 and RPB6 subunits represent different forms of the same polymerase class or whether they are unique to different polymerase classes, we interrogated their subnuclear localization by generating chimeras of each isoform fused at their N terminus to GFP.

    In tsetse-form procyclic T. Figure 3. (E-H) Bloodstream-form cells processed in the same manner. ESBs are indicated with white arrows. This extranucleolar compartment was shown to be the expression site body by colocalization with a mAb to the Pol I largest subunit Figure 3 I; Navarro and Gull, To address whether the different paralogues of RPB5 did indeed form part of different polymerase complexes, we generated procyclic-form T.

    The highest molecular weight bands of each purified complex were identified by mass spectrometry. Figure 4. Expression was induced overnight by addition of doxycycline and analyzed by Western blot with anti-TAP tag antibodies. The highest molecular weight bands of each purification were analyzed by mass spectrometry.

    To evaluate the effect of loss of either trypanosome RPB5 subunit on the transcription profile of the cell, tetracycline-inducible RNAi constructs were generated to specifically target degradation of each RPB5 mRNA independently. Knockdown of TbRPB5 resulted in rapid growth arrest and cell death in both bloodstream- and tsetse-form cells (Figure 5A). Knockdown of TbRPB5z also produced a growth phenotype in tsetse-form cells, but it did not cause any observable growth defect in bloodstream-form cells (Figure 5B).

      Numerous attempts to generate an RPB5z knockout cell line in both life cycle stages have failed.

      Figure 5. Effect of RNAi on growth. In each case, transcript level and production of double-stranded RNA were monitored by Northern blot. Quantification of hybridization signals is shown in histograms before (D0) or after 2 (D2) or 4 (D4) days of doxycycline induction.

      Because the nucleolus is the sole site of Pol I-mediated rRNA transcription in procyclic-form cells, we used transmission electron microscopy to determine whether the knockdown of the Pol I-specific subunit affected nucleolar structure. In each case, induced and control cells were analyzed from thin sections of nuclei.

      Cells differing from the previously defined canonical nucleolar structure Ogbadoyi et al. A variety of abnormal nucleoli were observed, the majority of which fell into two categories: (1) unusual heterogeneous density and (2) the appearance of an intranucleolar electron-dense ring (Figure 6B).

      Figure 6. Effect of RNAi on structure of the nucleolus. (A) Wild-type homogeneous density nucleolus. Numbers below images are percentages from counts of RNAi-induced cells.

      It has been previously noted that there have been expansions in the repertoire of RPB5 subunits encoded by the genomes of higher plants Larkin et al. Best et al. However, to date, no functional consequence of these expansions has been described. Here, we have shown that, in addition to the RPB5 expansions mentioned above, there has also been a duplication of RPB5 in the ciliate P. We also identify duplications in RPB6 in higher plants, trypanosomes, and P.

      Again, these most likely represent independent expansions in the different lineages and not an ancestral eukaryotic state. Nuclear transcription in this organism is particularly interesting for two reasons. Second, although Pol II transcribes the majority of protein-coding genes, some highly expressed proteins of the cell surface are transcribed by Pol I.

      Moreover, in at least one life cycle stage, this Pol I-mediated transcription occurs not in the nucleolus, but in a transcription factory in the nucleoplasm called the ESB. On the basis of domain organization, the trypanosomatid paralogues of both RPB5 and RPB6 can be divided into two groups. One group is identical in domain organization to the canonical eukaryotic subunit; these subunits are named RPB5 and RPB6, respectively.

      The other group differs in domain organization by gain of novel insertions or loss of conserved regions; these variant subunits are named RPB5z and RPB6z, respectively. By expression of chimeric proteins, we have demonstrated that the canonical forms of both RPB5 and RPB6 are found exclusively in the nucleoplasm of T. The duplicated RPB5 and RPB6 subunits are the first documented examples of conventionally shared subunits being specific to different polymerase classes. Taken together, these data demonstrate that the elaboration of RPB5 and RPB6 in trypanosomes is linked to a specialization of the paralogues to particular polymerase complexes.

      This elaboration involves the emergence of a noncanonical variant of both subunits that is specific for Pol I. Interestingly, the specificity of the variant forms for Pol I is true for nucleolar rRNA transcription and also for the Pol I-mediated transcription of protein-coding genes in the nucleoplasm.

