MED1DNA_Complex

MED1:DNA Complex (C. elegans)

Created by Muhibullah Tora

   The MED-1:DNA complex (PDB ID: 2KAE) is the association of the MED-1 zinc finger protein (MED1zf) with a very specific DNA sequence. Functionally, MED1zf acts primarily as a transcriptional regulator affecting cell fates and differentiation 1. MED1zf is the first GATA-type protein identified whose binding site diverges significantly from the canonical GATA nucleotide sequence recognized by typical GATA-type proteins 2. The proper development of C. elegans hinges on this one protein’s association with a very specific DNA sequence, and this degree of physiological importance cannot be separated from the protein’s structure.

   Beyond its regulatory targets, MED-1 can be considered a divergent GATA-type zinc finger transcription factor. GATA-type transcription factors have been shown to have varying levels of involvement with an array of proteins, such as P53 3 and CKD-8 4, among others, and have also been implicated in disease pathways such as Down syndrome 5 and acute megakaryoblastic leukemia 6. The biological significance of the MED1zf protein is therefore apparent. Its coupling with DNA to form the MED-1:DNA complex is essential to its function, and its function is the source of its biological significance. By extension, its structure confers both its function and its significance. It is thus imperative to analyze the structure of the MED-1:DNA complex to elucidate the intricacies of its function and mechanism.

   On a macro scale, the complex can be viewed as a protein associated with DNA. By binding to DNA, the central locus of cellular transcriptional activity, MED-1 can affect transcription by mediating the activity of RNA polymerases. MED1zf achieves transcriptional regulation through interactions with the Tcf/Lef-1-like factor POP-1, acting as either a repressor or an enhancer and promoting development of the mesoderm and endoderm lineages 1. More specifically, MED-1 binds in vivo to the end-1 and end-3 promoters present on extrachromosomal transgenes in both endodermal and mesodermal cells, as part of a system of gene regulation that influences cell fates 1.

   The MED-1:DNA complex can also be analyzed in terms of its individual components. The DNA of interest is dsDNA on extrachromosomal transgenes 2. MED1zf recognizes the specific sequence GTATACT(T/C) on the DNA molecule 2. With this understood, it is important to consider the primary sequence of the protein.
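
   The recognition sequence GTATACT(T/C) can be written as a simple pattern and searched for in a DNA string. The following is a minimal sketch in Python; the function name and the example sequence are hypothetical illustrations, not taken from the end-1 or end-3 promoters, and the search covers only the strand that is supplied.

```python
import re

# Recognition site as stated in the text: GTATACT followed by T or C.
# The lookahead form lets overlapping occurrences be reported as well.
MED1_SITE = re.compile(r"(?=GTATACT[TC])")

def find_med1_sites(seq):
    """Return 0-based start positions of GTATACT(T/C) on the given strand."""
    return [m.start() for m in MED1_SITE.finditer(seq.upper())]

# Hypothetical example sequence, made up for illustration only.
example = "ccaGTATACTTggaGTATACTCaaGTATACTA"
print(find_med1_sites(example))  # the final GTATACTA is not a match
```

   A sequence match of this kind says nothing about binding affinity; it only locates candidate sites that fit the stated consensus.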

   From primary sequence alone, MED1zf was expected to bind the canonical GATA-family consensus sequence 2. Specific contacts are responsible for binding affinity, including contacts with flanking thymines, residues that contact downstream pyrimidines, and a very important arginine residue that confers specificity of binding. However, upon investigation of secondary and tertiary structure together with DNA titrations, it was found that MED1zf binds a divergent consensus sequence 1. While the primary sequence supplies the specific residues that confer binding, the secondary and tertiary structures make such interactions possible. The primary sequence contributes to the overall folding of the protein and is responsible for the structural characteristics of MED1zf through which sequence recognition proceeds. But solvent interactions and intramolecular and intermolecular interactions also produce conformational and chemical changes that affect protein structure and function 2. Notably, the C-terminal α-helix was not predicted by prediction tools such as JUFO or PORTER 2, which underscores the influence of binding partners on protein conformation beyond primary sequence and solvent interactions alone.

   Most members of the GATA family of transcription factors contain one or more zinc finger (zf) domains. These zf domains allow binding to dsDNA that contains a certain consensus sequence 2. Given that the GTATACT(T/C) sequence diverges from the sites recognized by other GATA family transcription factors, it is understandable that the zf of MED1zf differs from the four other classes of GATA zinc fingers.

   The free zf domain of MED1zf is similar in secondary structure to the zf domains of other GATA-family transcription factors. There are four short β-strands and a single α-helix, termed helix 1. Between residues Serine112 and Tyrosine146 there is a significant chemical shift change following DNA binding in the second and third β-strands and the loop connecting them 2. The most striking change occurs across 14 residues on the basic tail. This formerly disordered region undergoes a conformational change to form another α-helix of 11 residues at the C-terminal tail, from Valine152 to Lysine162 2.

   These conformational changes and chemical shift changes occur upon interaction with DNA to form the MED-1:DNA complex. The aforementioned β2-loop-β3 region of MED1zf contacts bases at the 5’ end of the DNA sequence. In addition, arginine-guanine hydrogen bonding occurs between Arg124 and specific G bases 2. Arginine-guanine hydrogen bonds are in fact the most common base-specific interactions in protein-DNA complexes, but consider that Arg124 is replaced with a leucine in GATA proteins that recognize AGATA. It appears that this otherwise common interaction is the driving force for the specificity of binding 2,7. It might also be asserted that hydrogen bonding confers structural stability on the complex, allowing effective binding of the protein and DNA to form the MED-1:DNA complex; however, this has yet to be directly demonstrated.

   In helix 1, an isoleucine residue of MED1zf contacts the ATA portion of the GTATACT sequence, and this portion of the consensus sequence is completely conserved 2. The primary difference in sequence recognition between the canonical GATA site and the MED1zf recognition site is largely due to the Leu-to-Arg substitution that allows the aforementioned arginine-guanine hydrogen bonding 2. Data confirm that this single Arg residue in the core zf is responsible for the switch in specificity 2. Helix 2 contacts thymines at Thy14-16 that assist in binding, as mutation of these thymine bases reduces binding affinity. The reasons for this effect on binding affinity have yet to be elucidated 2.

   The MED1zf C-terminal residues form another α-helix that contacts numerous pyrimidines downstream of the GTATAC site; this is in contrast to GATA zinc finger homologues, whose C-terminal residues loop and wrap around the DNA and contact the minor groove 2. The formerly disordered C-terminal tail of MED1zf forms an α-helical structure as a direct result of interaction with DNA. The structure discussed here, in its bound and unbound conformations, can be considered the wild-type structure of MED1zf, which has been identified in several species of Caenorhabditis 2.

   As mentioned before, MED1zf is similar to but divergent from GATA-type transcription factors. In comparing them, BLAST sequence analysis understandably yields conserved domains shared with the GATA DNA binding protein 8, with an E value of approximately 2e-4. Chicken GATA-1 (PDB ID: 2GAT) has 53% sequence similarity with MED1zf over the core zinc finger structure and 46% similarity over the basic tail regions 2. The conservation of structure reflects the basic function of both proteins: each is involved in transcriptional regulation and must form a protein:DNA complex. However, the differences in tail structure and in the zinc finger domain may confer a different specificity of binding sites. Recall that in MED1zf the unique α-helix at the C-terminus binds the 3’ pyrimidine-rich region, and that the arginine substitution confers the arginine-guanine binding that differs from other GATA-type proteins 2. Despite the differences, one significant similarity between MED1zf and GATA proteins such as chicken GATA-1 is an isoleucine residue in MED1zf involved in recognition of an adenine-thymine-adenine sequence. This particular motif is completely conserved across GATA-type zinc finger transcription factors 2.

   The discussion of the structure of MED1zf and the MED-1:DNA complex brings to light a few key points about MED1zf and about protein structure as an overall concept. Firstly, primary structure defines key residues, but solvent and binding partner interactions affect secondary and tertiary structure and therefore the protein’s final functional capacity. Secondly, single residues and common interactions can be responsible for unique and divergent interactions, as evidenced by Arg124-guanine hydrogen bonding. Also consider the flanking thymine bases that, if mutated, severely reduce binding affinity 2. Thirdly, the conformational changes observed after DNA binding reflect the dynamic nature of protein structures. Consider the C-terminal tail of MED1zf, which is first disordered and then forms a highly ordered α-helix that interacts with a downstream pyrimidine-rich region 2. Next, consider that the GTATACT sequence, nearly a palindrome, does not show any competitive binding in the inverse orientation even at a 150-fold molar excess, which emphasizes the importance of directionality and the high level of specificity involved 2. Lastly, stepping back to observe the protein with respect to its homologues and its physiological implications helps define MED1zf in living organisms in terms of conservation of protein structure and conferral of function.
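
   The remark that GTATACT is nearly a palindrome can be checked at the sequence level by comparing the site with its reverse complement. The sketch below is a plain Python illustration with hypothetical helper names; it addresses only sequence symmetry and is not a model of the competition experiment described above.

```python
# Complement table for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def is_palindromic(seq):
    """True if the site reads the same on both strands."""
    return seq.upper() == reverse_complement(seq)

core = "GTATAC"   # six-base core named in the text
site = "GTATACT"  # core plus the following thymine

print(core, reverse_complement(core), is_palindromic(core))
print(site, reverse_complement(site), is_palindromic(site))
```

   Under this check the six-base core GTATAC is its own reverse complement, while the seven-base site is not; the extra 3’ thymine is what breaks the symmetry and gives the site a direction.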

   It is thus important to consider this protein structure not only as a subset of function, or for its contributions to understanding the basic science of complex macromolecular interactions, but also as the primary determining factor of function and therefore of physiological significance. Structure is inseparable from function, and it is inseparable from biological significance.