CAP

Catabolite Gene Activator Protein (CAP)/DNA Complex and Adenosine-3’,5’-cyclic Monophosphate (1CGP) from Escherichia coli

Created by: Sarah Hatef

The Catabolite Gene Activator Protein (CAP)/DNA Complex and Adenosine-3’,5’-cyclic Monophosphate is a transcription regulator in the bacterium Escherichia coli. CAP has a molecular weight of 23,640 Da and an isoelectric point of 7.7122 (1). CAP, whose Protein Data Bank (PDB) identification is 1CGP, is the receptor protein for cyclic AMP (adenosine 3’,5’-cyclic monophosphate, or cAMP). When CAP is complexed with cAMP, transcription is activated at more than 20 different promoters in E. coli (2). CAP is thus responsible for the transcription and subsequent translation of many proteins in E. coli. E. coli is a valuable bacterium in research because it is inexpensive and easy to grow. It takes only 20 minutes to reproduce, and it can be manipulated in order to work with recombinant DNA. Because E. coli is so widely used in biology and biotechnology, it has been studied extensively, and its specific use in recombinant DNA research means that any protein involved in DNA replication, transcription, or translation is of significance to researchers. CAP was discovered in E. coli, along with its ligand cAMP, over 40 years ago during research into the cause of diauxic growth, or growth in two phases (3).

Catabolite gene activator protein regulates hundreds of transcription units by sensing changing concentrations of cyclic adenosine monophosphate, or cAMP (8). This molecule is a second messenger derived from ATP, and when bound to CAP it induces a conformational change that allows for the protein-protein interactions that induce transcription (8). In apo-CAP, the form of CAP not attached to its ligands, the two DNA-binding domains interact so as to bury the DNA-recognition helices, leaving them inaccessible to the DNA (3). When cAMP attaches, a hinge is activated that allows the DNA-binding domains to move in order to bind the DNA molecule. During this conformational change, the C-helices are lengthened by six residues and the D-helices are shortened by four residues (3).

CAP is composed of two subunits, each of which has 3 domains. The N-terminus is a cAMP-binding domain which includes a cyclic nucleotide-binding unit. The C-terminus is a DNA-binding domain which includes a helix-turn-helix motif for binding DNA (2). These two domains are connected by a long alpha helix called a C-helix that forms a coiled-coil, a linker, and a D-helix to dimerize the two subunits (3). In 1CGP the protein is bound to its cAMP molecule and a DNA molecule because, until around 2009, the protein could not be crystallized without its ligand. This DNA molecule is composed of 4 smaller subunits (2).

Catabolite gene activator protein displays multiple types of secondary structure. Beta sheets, random coils, and alpha helices are all present, but the most notable areas of specific secondary structure are the helices (8). The protein contains 7 helices in total (6). Two recognition helices are present along the DNA-binding sites, and both are split into two halves by a beta-turn (3). These helices participate in sequence-specific binding to DNA molecules. A D-helix is present at the dimer interface (3). A long C-helix connects the two subunits by forming a coiled-coil at the interface, and this helix is involved in the conformational change induced by cAMP (3).

The most important residues in the catabolite gene activator protein are Ser-128 and Asp-138. When cAMP binds, the bending of the hinge is induced by hydrogen bond formation between the N(6) position of adenosine and Ser-128 of the C-helices (3). When the bending of the hinge shortens the D-helices by four residues, Asp-138 becomes the N-terminal capping residue of the D-helix. The propensity for Asp-138 to cap the D-helix is correlated with how much cAMP binds the CAP (3). In addition to Ser-128 and Asp-138, several other residues are very important for the function of CAP. His-159 and Gly-162 are both exposed on the outer surface of the domain of CAP and can be contacted by RNA polymerase once transcription has been induced (9). Glu-181 positions base pair 5 in a way that requires a kink and bends the hinge in order to give the DNA its characteristic 90° bend (9). Arg-180 contributes to binding RNA polymerase when the DNA is bent (9). Arg-185 interacts with bases in the major groove of the DNA helix (9). Thr-140, Ser-179, and Thr-182 all hydrogen-bond to the DNA molecule (9).

No ionic interactions are present in the protein, but hydrogen bonding is important in catabolite gene activator protein’s structure and function. Three side chains emanating from the DNA-recognition helix hydrogen-bond directly to three base pairs in the major groove of the DNA, allowing the DNA to bind securely but temporarily (9). Additionally, the guanidinium group of Arg-180 hydrogen-bonds to the O(6) and N(7) of the guanine at position 7 (Gua-7), which helps to bind RNA polymerase (9). Finally, in response to cAMP binding, Ser-128 in the dimerization helix hydrogen-bonds with the N(6) position of adenosine, closing the hinge (3).

Binding of cAMP to CAP is dependent upon the concentration of cAMP present. Since cAMP is made from ATP, the activity of CAP is thought to be connected to how much energy is present (2). Binding of cAMP is also dependent on the presence of different types of phosphates on the protein. If ethylated phosphate groups are present, this ethylation interferes with cAMP binding. If non-ethylated phosphates are present, they increase binding affinity (9). If the CAP is not bound to cAMP, it is referred to as apo. For years, there was no way to crystallize the apo form of the protein. In 2009 the apo form was crystallized, and the conformation of the unbound CAP became known (2). Knowing both the bound and unbound conformations allowed researchers to work out details of the conformational change that had previously been unknown. In the unbound CAP, both the DNA-binding domains and the cAMP-binding domains are unbound (2). Ser-128 is not connected to adenosine by hydrogen bonds, so the hinge is unbent (3). The D-helices are not shortened by four residues, so Asp-138 is not the N-terminal capping residue. The DNA-recognition helices are not moved by the hinge, so they remain buried in the protein (3). The PDB ID for the unbound conformation of CAP is 3HIF (6).

Using the bioinformatics servers Dali and PSI-BLAST, CAP was compared to other similar proteins. The Dali server compares a protein’s tertiary structure to the tertiary structures of the 61,000 proteins listed in the PDB (4). It uses a z-score to show how similar the structures are, and a z-score higher than 2 means that the proteins have similar folds. The PSI-BLAST server enables the user to compare a protein’s amino acid sequence, or primary structure, to the library of known amino acid sequences. It uses an E value, determined by comparing the sequences of the proteins and assigning gaps, and an E value below 0.5 indicates high similarity between proteins (5).

Using these servers, the Virulence Factor Regulator (VFR) protein in the bacterium Pseudomonas aeruginosa was found to be very similar to CAP in both amino acid sequence and tertiary structure. The comparison structure’s z-score is 28.4, which is well above the minimum value of 2, and its E value is 2e-68, which is well below the maximum value of 0.5 (4, 5). Virulence Factor Regulator, whose PDB ID is 2OZ6, has a primary structure only four residues longer than that of CAP (6). VFR weighs more than CAP and has only one subunit instead of two, so it does not include any of the structures that CAP uses to dimerize its two subunits. The two proteins have very similar secondary structures. CAP is composed of 38% helical structure and 26% beta sheet structure, while VFR is 38% helical and 29% beta sheet (6). The two proteins are also functionally similar: VFR regulates the transcription of a virulence factor present in the pathogenic bacterium, and both proteins bind cAMP in order to regulate gene expression (7).
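The comparison above comes down to checking two reported scores against two cutoffs. As a minimal sketch, assuming the cutoffs stated in this article (a Dali z-score above 2 and a PSI-BLAST E value below 0.5) and using the VFR values quoted above, the check could be written as follows. The class and function names are hypothetical and exist only for illustration; the code does not query either server.

```python
from dataclasses import dataclass

# Cutoffs as stated in this article; they are assumptions of this sketch,
# not universal thresholds.
DALI_Z_MIN = 2.0     # a z-score above this suggests a similar fold
E_VALUE_MAX = 0.5    # an E value below this suggests a similar sequence


@dataclass(frozen=True)
class Comparison:
    """Scores reported for one comparison protein (hypothetical container)."""
    name: str
    pdb_id: str
    dali_z: float
    e_value: float

    def similar_fold(self) -> bool:
        # Tertiary-structure similarity, judged by the Dali z-score.
        return self.dali_z > DALI_Z_MIN

    def similar_sequence(self) -> bool:
        # Primary-structure similarity, judged by the PSI-BLAST E value.
        return self.e_value < E_VALUE_MAX


def summarize(c: Comparison) -> str:
    """Return a one-line summary of both similarity checks."""
    return (f"{c.name} ({c.pdb_id}): similar fold = {c.similar_fold()}, "
            f"similar sequence = {c.similar_sequence()}")


# Values for VFR as quoted in the text above.
vfr = Comparison("Virulence Factor Regulator", "2OZ6",
                 dali_z=28.4, e_value=2e-68)

if __name__ == "__main__":
    print(summarize(vfr))
```

Note that the two scores run in opposite directions: a larger z-score indicates greater structural similarity, while a smaller E value indicates greater sequence similarity.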

Catabolite gene activator protein’s function is determined by its structure. Its secondary and tertiary structures are determined by the primary structure, and the residues necessary for specific hydrogen bonding and hinge interactions are coded for in the primary structure. If the protein had a mutation at one of these positions in its primary structure, it might not be able to undergo the same conformational changes in order to bind cAMP. The secondary structure reflects the hydrogen bonding of the protein, since characteristic folding patterns such as alpha helices and beta sheets are determined by hydrogen bonding patterns. Important functions, such as DNA recognition, cAMP binding, and moving the hinge, are all dependent upon alpha helices, and if these alpha helices had a different structure or a mutation that made them misfold, the protein’s function would be compromised (2). These helices, and the beta sheets and beta turns that separate them, must exist in order for cAMP to bind and induce the conformational change that exposes the DNA-recognition helices and allows DNA to bind.