PgIC

N,N'-diacetylbacillosaminyl-1-phosphate transferase PgIC I57M/Q175M variant (PDB ID: 5W7L) belongs to a family of proteins called polyprenol phosphate phosphoglycosyl transferases (PGTs), which catalyze the first membrane-committed step in the synthesis of essential glycoconjugates, including glycolipids, glycoproteins, and peptidoglycan (1). PGTs catalyze the transfer of a phosphosugar from a nucleoside diphosphate sugar to a polyprenol phosphate (Pren-P), forming a membrane-bound product (2). PGTs are subdivided into two superfamilies, the polytopic and the monotopic PGTs, and PgIC is a representative member of the monotopic superfamily because it contains the minimal functional core of the three known families (1). Monotopic membrane proteins are localized to one leaflet of the membrane, and membrane association most commonly occurs through hydrophobic interactions of amphipathic helices positioned parallel to the membrane (3). PgIC catalysis has been reported to proceed through a two-step ping-pong mechanism in which a highly conserved Asp-Glu dyad is essential for function (1). The first step is the formation of an enzyme-substrate intermediate with release of UMP, and the second step is a nucleophilic attack by the Pren-P substrate on the intermediate to form the membrane-associated product (Figure 1) (4). Complex glycoconjugates are integral to bacterial survival and virulence and can modulate the interaction between bacteria and the host metabolism and immune system, making PGTs important pharmacological targets (1, 5). Elucidating the mechanism by which bacteria synthesize glycoconjugates is crucial to understanding host cell responses to bacteria and to controlling or targeting pathogenic infections.
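
The two-step ping-pong mechanism described above is commonly modeled with the textbook ping-pong bi-bi rate law. The following is a minimal sketch of that general rate law, not a kinetic model taken from the cited studies. The function name, parameter names, and numerical values are hypothetical placeholders chosen only for illustration.

```python
def ping_pong_rate(a, b, vmax, km_a, km_b):
    """Initial rate for a generic ping-pong bi-bi mechanism.

    a, b  : concentrations of the first substrate (here, the nucleotide
            sugar) and the second substrate (here, Pren-P)
    vmax  : maximal velocity
    km_a, km_b : Michaelis constants for each substrate
    All values passed in below are illustrative, not measured data.
    """
    return vmax * a * b / (km_b * a + km_a * b + a * b)


# Hypothetical inputs: equal concentrations and unit constants.
rate = ping_pong_rate(a=1.0, b=1.0, vmax=1.0, km_a=1.0, km_b=1.0)
print(f"rate = {rate:.3f}")
```

In this rate law the slope of a double-reciprocal plot with respect to the first substrate does not depend on the second substrate's concentration, which is why ping-pong mechanisms are typically recognized by parallel lines in such plots.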

The I57M/Q175M variant of PgIC from Campylobacter concisus, referred to here as PgIC, was expressed in Escherichia coli (E. coli) and, after purification from the cells, was crystallized by hanging-drop vapor diffusion under conditions of 0.1 M HEPES, pH 7.5, 0.2 M MgCl2, and 25% polyethylene glycol (PEG) (1, 6). The crystals were flash-cooled in liquid nitrogen without additional cryoprotection (1). X-ray diffraction was used to obtain the final crystal structure of the protein (6). Each subunit of PgIC has a molecular weight of 23418.37 Da and an isoelectric point of 9.62, as determined using ExPASy (7). The crystallized variant comprises two identical subunits, chains A and B, for a total of 410 residues, 205 in each subunit (6). Residues 1-183 in each chain were crystallized and refined to 2.74 Å resolution (1). The structure of PgIC does not contain large domains; however, the superfamily of monotopic PGTs generally has a dual-domain architecture consisting of a small, soluble, globular C-terminal domain approximately 180 residues long and an N-terminal membrane-inserted domain about 20 residues long (1, 5). In addition to the N-terminal and C-terminal domains, PgIC has a globular double-twisted loop domain (residues 105-140) (1).

The primary structure of PgIC in the crystal comprises 410 amino acids, 205 in chain A and 205 in the identical chain B. Each subunit is 45% helical (12 helices, 94 residues) and 10% β-strand (6 strands, 22 residues), with the remainder consisting of random coils (6). The α-helices in PgIC, including a reentrant helix-break-helix (helix A, residues 1-22, and helix B, residues 25-37) and amphipathic coplanar membrane-associated helices (helix C, residues 82-90; helix D, residues 92-103; and helix I, residues 166-183), interact to stabilize the protein (1).
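
The secondary structure percentages above follow directly from the residue counts. The short sketch below reproduces that arithmetic using only the numbers stated in the text (205 residues per chain, 94 helical, 22 in strands); the function and variable names are illustrative. The exact fractions come out slightly above the reported 45% and 10%, which suggests the reported values were rounded down.

```python
CHAIN_LENGTH = 205      # residues per subunit, from the text
HELIX_RESIDUES = 94     # residues in the 12 helices
STRAND_RESIDUES = 22    # residues in the 6 strands


def composition(total, helix, strand):
    """Return the fraction of residues in helix, strand, and coil."""
    coil = total - helix - strand  # everything not assigned to helix or strand
    return {
        "helix": helix / total,
        "strand": strand / total,
        "coil": coil / total,
    }


fractions = composition(CHAIN_LENGTH, HELIX_RESIDUES, STRAND_RESIDUES)
for name, fraction in fractions.items():
    print(f"{name}: {fraction:.1%}")
# helix: 45.9%
# strand: 10.7%
# coil: 43.4%
```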

In terms of tertiary structure, the reentrant helix-break-helix motif, consisting of helices A and B, is the primary component of the N-terminal domain that anchors PgIC in the membrane. The interhelix angle of 118°, formed by the proline kink at Pro-24 and stabilized by a 2.6 Å hydrogen bond between Ser-23 and the backbone carbonyl of Ile-20, allows the helix to penetrate 14 Å into the cytoplasmic side of the membrane and reemerge on the same side. The globular double-twisted loop domain is formed by helices E, F, G, and H and is supported by the C terminus of helix D. Backbone amide and carbonyl groups also form an extensive hydrogen-bonding network that stabilizes the double-twisted loop motif (1). Although the quaternary structure of PgIC in the crystal comprises two identical subunits, researchers have found that in lipid bilayer disks PgIC is functional as a monomeric biological assembly (1, 8).
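
An interhelix angle such as the 118° value quoted above can be computed as the angle between the two helix axis vectors. The sketch below shows that calculation in a minimal form. The axis vectors are hypothetical placeholders constructed to give 118°, not coordinates taken from the 5W7L structure, and the angle convention used in the cited work is assumed rather than known.

```python
import math


def interhelix_angle(axis_u, axis_v):
    """Angle in degrees between two helix axis vectors."""
    dot = sum(a * b for a, b in zip(axis_u, axis_v))
    norm_u = math.sqrt(sum(a * a for a in axis_u))
    norm_v = math.sqrt(sum(b * b for b in axis_v))
    # Clamp to [-1, 1] to guard against floating-point round-off.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cosine))


# Hypothetical axes: helix A along x, helix B rotated by 118 degrees in the xy plane.
helix_a_axis = (1.0, 0.0, 0.0)
helix_b_axis = (math.cos(math.radians(118.0)), math.sin(math.radians(118.0)), 0.0)

angle = interhelix_angle(helix_a_axis, helix_b_axis)
print(f"interhelix angle: {angle:.1f} degrees")
```

In practice the axis vectors would first have to be fitted to the backbone atoms of each helix, a step this sketch leaves out.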

An Asp-Glu catalytic dyad (residues 93 and 94) forms the active site of PgIC and is essential for function, as Asp is the nucleophile that forms the covalent phosphosugar intermediate. Basic residues in and around the active site promote the binding of the negatively charged, phosphate-rich substrates. The binding site of the phosphate ligand marks where the UDP-N,N'-diacetylbacillosamine (UDP-diNAcBac) substrate would bind to PgIC. A strictly conserved Pro-Arg-Pro motif (residues 111-113) positions Arg-112 toward the active site, potentially allowing the side chain of Arg-112 to interact with the uracil of UDP-diNAcBac. The position of the phosphatidylethanolamine (PE) ligand binding site at Arg-8 suggests the presence of a narrow, hydrophobic volume that is proposed to be the Pren-P binding site. The location of the PE ligand binding site at the membrane interface would position Pren-P proximal to the active site and Arg-112, allowing the Pren-P to react with the phosphosugar intermediate to form the membrane-associated product. Additionally, the Mg2+ ligand acts as a cofactor and binds in the active site by forming a coordinate bond with Asp-93 (1).

The Basic Local Alignment Search Tool (BLAST) algorithm is used to determine sequence similarities among protein or nucleotide sequences. Specifically, the protein-specific Position-Specific Iterated BLAST (PSI-BLAST) was used to compare PgIC with similar proteins. The algorithm calculates an E-value based on how well conserved positions are between two proteins, where lower E-values signal more significant sequence similarity (9, 10). A threshold of E < 0.05 was used to indicate significant sequence similarity. The Dali server is another tool for quantifying protein similarity; it examines tertiary structure using a sum-of-pairs method that compares intramolecular distances. Protein structures with significant similarities have Z-scores exceeding 2, with larger Z-scores indicating greater similarity (11). Monotopic membrane proteins are sparsely represented in the Protein Data Bank, and the lack of comparison proteins with significant sequence similarities points to the need to characterize additional monotopic membrane proteins like PgIC (1).
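
The two significance criteria described above (E < 0.05 for sequence similarity, Z > 2 for structural similarity) can be applied to a comparison hit as in the minimal sketch below. The function and constant names are illustrative and not part of BLAST or Dali. The example inputs are the E-value and Z-score reported for the AFF4 comparison in the following paragraph.

```python
E_VALUE_CUTOFF = 0.05   # sequence similarity is significant below this value
Z_SCORE_CUTOFF = 2.0    # structural similarity is significant above this value


def assess_hit(e_value, z_score):
    """Apply the document's thresholds to one sequence/structure comparison."""
    return {
        "sequence_significant": e_value < E_VALUE_CUTOFF,
        "structure_significant": z_score > Z_SCORE_CUTOFF,
    }


# Values reported in this document for AFF4 (PDB ID: 6R80) versus PgIC.
aff4 = assess_hit(e_value=0.25, z_score=4.5)
print(aff4)
# {'sequence_significant': False, 'structure_significant': True}
```

Under these thresholds the AFF4 hit is significant by structure but not by sequence, which is consistent with the noted lack of comparison proteins with significant sequence similarity.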

AF4/FMR2 family member 4 (AFF4) C-terminal homology domain (PDB ID: 6R80), derived from Homo sapiens, had the most significant primary and tertiary structure similarity with PgIC, with an E-value of 0.25 and a Z-score of 4.5 (10, 11). The E-value does not meet the E < 0.05 threshold, so the sequence similarity is not statistically significant, whereas the Z-score exceeds 2 and indicates significant structural similarity. Although both AFF4 and PgIC were expressed in E. coli, they have drastically different functions. AFF4 is the scaffold protein of the super-elongation complex (SEC), which mediates the release of RNA polymerase II from promoter-proximal pausing and plays a role in the transactivation of HIV-1 transcription (12). The primary and secondary structure of the AFF4 C-terminal domain differs from that of PgIC: AFF4 has 284 residues and is 58% helical (9 helices, 167 residues), with the remainder of the protein consisting of random coils. The AFF4 C-terminal domain also consists of a single subunit, compared with the two subunits of PgIC (13). In terms of tertiary structure, different domains of AFF4 are specialized for interaction with other components of the SEC. The kinase complex P-TEFb associates with the N-terminal region (residues 1-73), the RNA polymerase II elongation factor with residues 301-350, and the factors ENL and AF9 with residues 761-774 (12). By interacting with these components, AFF4 is able to perform its function as a scaffold protein and assist in assembling the SEC. More broadly, AFF4 is largely positively charged, reflecting that its function necessitates close interaction with negatively charged nucleic acids (12). In contrast, the function of PgIC requires that it closely associate with the plasma membrane. The structure of PgIC reflects this function, as the reentrant helix-break-helix motif that is part of the reentrant membrane helix (RMH) and the amphipathic α-helices allow PgIC to be localized to the membrane (1). AFF4 and PgIC have entirely different functions, and this is reflected in their differing structures.

Overall, PgIC is biologically significant because of its integral role in catalyzing the first membrane-committed step in the synthesis of complex glycoconjugates in bacteria. The role of glycoconjugates in influencing the host cell immune response, and thus the pathogenicity of bacteria, makes PgIC an important pharmacological target in bacterial infections. The relatively minuscule representation of monotopic membrane proteins in the Protein Data Bank makes it difficult to find other proteins similar in function and structure to PgIC, limiting the use of sequence and structure comparison in elucidating how the structure of PgIC informs its function. Characterizing additional monotopic membrane proteins would facilitate greater understanding of the catalytic mechanism involved in the transfer of a phosphosugar from a nucleoside diphosphate sugar to Pren-P.