CoVSpikeaglycoprotein
Prefusion 2019-nCoV Spike Glycoprotein - Single Receptor-Binding Domain up (PDB ID: 6VSB) from 2019-nCoV
Created by: Alex Adatsi
          The Prefusion 2019-nCoV Spike Glycoprotein (PDB ID: 6VSB) belongs to the novel coronavirus 2019-nCoV, which was discovered at the end of December 2019. It is a key protein that allows the virus to enter the somatic cells of certain animals, including humans (Homo sapiens). This coronavirus is responsible for COVID-19, a disease that has grown into a pandemic-level threat (1). The disease has disrupted normal life across the world, and the public must now take several precautions to mitigate its spread. Stopping the spread of the virus completely will require a vaccine. Before a vaccine can be produced, however, researchers must first understand the structure of the virus, starting with its proteins. The key to producing a vaccine likely lies in the spike glycoprotein of the virus.
          As stated earlier, the spike glycoprotein is a viral protein that appears in Homo sapiens only after infection with the novel coronavirus. To determine the structure of the glycoprotein, researchers used cryogenic electron microscopy (cryo-EM) at a resolution of 3.5 angstroms (2). In terms of experimental data, the aggregation state was a particle and the reconstruction method was single particle (2). The clashscore was 14, Ramachandran outliers made up 0.2% of residues, and sidechain outliers made up 0.5%. The spike glycoprotein of the novel coronavirus is trimeric (3), meaning it consists of three protein chains. They are labeled A, B, and C, and together they have a total atom count of 22954 and a total residue count of 2905. Chain A contributes 959 residues and 7311 atoms to the structure (2). Chain B contributes 973 residues and 7362 atoms, while chain C contributes 973 residues and 7327 atoms. Only one key ligand is present in the structure: 2-acetamido-2-deoxy-beta-D-glucopyranose (2).
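          The per-chain figures above can be checked against the reported totals with a short tally. The sketch below uses only the numbers quoted in this paragraph; the variable names are illustrative. Any interpretation of the leftover atoms (for example, that they are non-protein atoms such as bound glycans) is an assumption, since the source does not say.

```python
# Per-chain counts as quoted in the text for PDB entry 6VSB.
chains = {
    "A": {"residues": 959, "atoms": 7311},
    "B": {"residues": 973, "atoms": 7362},
    "C": {"residues": 973, "atoms": 7327},
}

# Totals as quoted in the text.
reported_total_residues = 2905
reported_total_atoms = 22954

residue_sum = sum(chain["residues"] for chain in chains.values())
atom_sum = sum(chain["atoms"] for chain in chains.values())

# Atoms in the reported total that the three chain figures do not account for.
# What these atoms are is NOT stated in the source.
unassigned_atoms = reported_total_atoms - atom_sum

print("residues:", residue_sum, "matches total:", residue_sum == reported_total_residues)
print("atoms in chains:", atom_sum, "unassigned:", unassigned_atoms)
```

          The residue counts add up to the reported 2905, while the three chain atom counts add up to 22000, leaving 954 atoms of the reported 22954 unaccounted for by the chain figures alone.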
          The reported molecular weight of the spike glycoprotein is 440.39 kDa, whereas its estimated weight is 318.88 kDa (2, 4). The three chains that make up the spike glycoprotein contain an estimated total of 3868 amino acids (4). The structure contains only one unique protein chain, which is present in three copies (2). It has an isoelectric point of 6.62 (4). As described earlier, the spike glycoprotein is trimeric, consisting of chains A, B, and C. Each chain consists of two subunits, known as S1 and S2 (5). The first subunit, S1, contains the receptor-binding domain. The second, S2, is responsible for fusing the viral membrane with the host cell membrane. Together, they allow the virus to enter the somatic cells of Homo sapiens through receptor-mediated endocytosis. More specifically, the virus attaches to the angiotensin-converting enzyme 2 (ACE2) receptors found on the plasma membranes of certain somatic cells (6). The structure of the spike glycoprotein is not complete: several residues throughout the amino acid sequence are missing from the 3-D model, especially toward the C-terminal end of the sequence (2).
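          A back-of-the-envelope estimate may help put the two weights in context. The sketch below assumes an average residue mass of about 110 Da, a common rule of thumb rather than a value from the cited sources, and it ignores glycans entirely. The residue counts (3868 in the full sequence, 2905 in the modeled chains) are the ones quoted in this report; the function name is illustrative.

```python
# Rule-of-thumb average mass of one amino acid residue, in daltons.
# This is an ASSUMPTION for illustration, not a figure from the sources.
AVG_RESIDUE_MASS_DA = 110.0


def estimate_mass_kda(n_residues, avg_mass_da=AVG_RESIDUE_MASS_DA):
    """Rough protein mass in kDa from a residue count (glycans ignored)."""
    if n_residues < 0:
        raise ValueError("residue count must be non-negative")
    return n_residues * avg_mass_da / 1000.0


full_sequence_kda = estimate_mass_kda(3868)  # all residues in the three chains
modeled_only_kda = estimate_mass_kda(2905)   # residues resolved in the model

print(f"full sequence: ~{full_sequence_kda:.2f} kDa")
print(f"modeled only:  ~{modeled_only_kda:.2f} kDa")
```

          Under this assumption the full sequence comes to roughly 425 kDa and the modeled residues to roughly 320 kDa. The second figure is close to the 318.88 kDa estimate, which suggests, but does not prove, that the lower weight reflects only the residues resolved in the model.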
          The primary amino acid sequence of the spike glycoprotein, as described earlier, consists of nearly 4000 amino acids across the three chains. Within the S1 subunit, the receptor-binding domain is a 193-amino-acid sequence (N318-V510) that is a key target of neutralizing antibodies (7). As for the secondary structure, the spike glycoprotein displays alpha helices, beta sheets, 3/10 helices, and random coils. Alpha helices predominate toward the surface of the viral envelope, while beta sheets predominate in the outer region where receptor binding takes place (8). Random coils are dispersed throughout each subunit. The structure contains hydrophobic, acidic, and basic regions, but few uncharged polar regions. About 40% of the spike glycoprotein is random coil, about 30% is beta sheet, about 25% is alpha helix, and only about 5% is 3/10 helix (8). The significant domains of the S2 subunit consist of 12-14 strands for domain II and 6 for domain III (5). The tertiary structure of the 2019-nCoV spike glycoprotein is in the prefusion conformation with a single receptor-binding domain in the up state (2, 3). As for the quaternary structure, the three chains assemble into the trimer, but individual spike trimers do not associate with one another. Rather, each is anchored in the viral envelope and acts individually in receptor binding.
          As described earlier, receptor-mediated endocytosis by way of the ACE2 receptor is required for the virus to enter somatic cells. The transmembrane spike glycoprotein harbors a furin cleavage site at the boundary between the S1 and S2 subunits, which is processed during biogenesis (9). Once the spike glycoprotein binds the ACE2 receptor, it undergoes an irreversible conformational change and the virus enters the cell. The virus then takes over translation in the cell, and the cell begins to produce more coronaviruses. These leave the cell through exocytosis and spread to other cells, advancing the disease. Important residues include F318 to F541; this range of amino acids designates the receptor-binding domain of the S1 subunit. Other functionally important residues include A684 and amino acids 14-305, 788-806, and 1237-1273 (8, 3). A684 is part of the furin cleavage site, which has been shown to be essential for the infection of human lung cells. Amino acids 14-305 make up the N-terminal domain, 788-806 make up the fusion peptide, and 1237-1273 make up the cytoplasmic domain. Specific hydrogen bonds occur between D113 and R41, D516 and K85, D520 and K82, and T125 and R14. There is only one key ligand associated with the spike glycoprotein: 2-acetamido-2-deoxy-beta-D-glucopyranose-(1-4)-2-acetamido-2-deoxy-beta-D-glucopyranose, which is involved in N-glycosylation. It allows the coronavirus to override glycosylation of the host cell (2, 3). It is bound at N717.
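          The residue ranges listed above can be turned into domain lengths and used to look up which region a given residue falls in. The sketch below is a minimal illustration that uses only the ranges quoted in this paragraph; the dictionary and the helper names `span_length` and `region_of` are hypothetical, and the ranges are treated as inclusive at both ends.

```python
# Residue ranges (inclusive) as quoted in the text.
regions = {
    "N-terminal domain": (14, 305),
    "receptor-binding domain": (318, 541),
    "fusion peptide": (788, 806),
    "cytoplasmic domain": (1237, 1273),
}


def span_length(start, end):
    """Number of residues in an inclusive residue range."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


def region_of(position):
    """Return the name of the listed region containing a residue, or None."""
    for name, (start, end) in regions.items():
        if start <= position <= end:
            return name
    return None


lengths = {name: span_length(*bounds) for name, bounds in regions.items()}

for name, length in lengths.items():
    print(f"{name}: {length} residues")

# Residues singled out in the text: furin-site residue A684 and glycosylated N717.
for position in (684, 717):
    print(position, "->", region_of(position))
```

          With these ranges the N-terminal domain spans 292 residues, the receptor-binding domain 224, the fusion peptide 19, and the cytoplasmic domain 37. A684 and N717 fall outside all four listed regions, which fits A684 lying at the S1/S2 boundary rather than inside one of the named domains.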
          There are three alternative conformations of the spike glycoprotein, with PDB IDs 6XKL, 6XF6, and 6VYB. 6XKL is functionally the same as 6VSB, except that it is an engineered variant known as HexaPro. In this variant, six beneficial proline substitutions give higher expression than the parental 6VSB protein, as well as the ability to withstand heat stress, storage at room temperature, and three freeze-thaw cycles (2). It retains the prefusion spike conformation. 6XF6 is functionally the same as 6VSB, including the receptor-binding domain in the up state, except that it is biotinylated. 6VYB differs in conformation: it is in the open state, meaning that the receptor-binding site is accessible because the receptor-binding domain has shifted relative to the up state seen in 6VSB (2).
          PSI-BLAST revealed several proteins with primary structures similar to that of the spike glycoprotein, and the Dali server identified several proteins with similar tertiary structures. Among the proteins with similar primary structures, three notable comparisons were the MERS Spike Glycoprotein (PDB ID: 6L8Q_B), the Spike Glycoprotein [Murine coronavirus] (PDB ID: 3R4D_B), and the Ebola Virus Makona Glycoprotein 2 [Ebola virus - Zaire (1995)] (PDB ID: 6DZL_D) (10). These proteins had E-values of 7e-67, 2e-34, and 0.004, respectively. Among the proteins with similar tertiary structures, notable comparisons were the Agrocybe aegerita lectin AAL complexed with Thomsen-Friedenreich antigen (PDB ID: 3AFK), the SPRY domain-containing SOCS box protein 2 (SSB-2) (PDB ID: 2AFJ), and the N-terminal NC4 domain of collagen IX (PDB ID: 2UUR). These proteins had Z-scores of 9.2, 2.2, and 2.6, respectively.
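          Because E-values span many orders of magnitude, the three sequence hits are easier to compare on a logarithmic scale. The sketch below ranks the hits quoted above and flags those under an E-value cutoff of 0.05; the cutoff is a conventional choice assumed here for illustration, and the variable names are hypothetical.

```python
import math

# (description, PDB chain ID, E-value) as quoted in the text.
hits = [
    ("Ebola virus Makona glycoprotein 2", "6DZL_D", 0.004),
    ("MERS spike glycoprotein", "6L8Q_B", 7e-67),
    ("Murine coronavirus spike glycoprotein", "3R4D_B", 2e-34),
]

# Conventional significance cutoff; an ASSUMPTION for this illustration.
E_VALUE_CUTOFF = 0.05

# A smaller E-value means a match is less likely to have arisen by chance.
ranked = sorted(hits, key=lambda hit: hit[2])
significant = [hit for hit in ranked if hit[2] < E_VALUE_CUTOFF]

for description, pdb_id, e_value in ranked:
    strength = -math.log10(e_value)  # larger means a stronger match
    print(f"{pdb_id}: E = {e_value:g}, -log10(E) = {strength:.1f}  ({description})")
```

          All three hits pass the cutoff, but the Ebola glycoprotein does so by only a few orders of magnitude, whereas the two coronavirus spike proteins are stronger matches by tens of orders of magnitude.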
          Of these proteins, an especially interesting comparison is the Ebola Virus Makona Glycoprotein 2, from a virus that also infects Homo sapiens, since its tertiary structure is involved in the endocytosis of a completely different type of virus. As indicated by the BLAST results, the primary sequences of the Ebola glycoprotein and the spike glycoprotein are significantly similar, as the E-value of 0.004 is below 0.05, although the match is far weaker than those to the other coronavirus spike proteins. As for the secondary structure, the Ebola protein consists of alpha helices and a high number of beta sheets, as well as a few random coils. Like the spike glycoprotein, the Ebola glycoprotein is trimeric and is involved in the endocytosis of its virus. However, the Ebola virus enters cells by macropinocytosis by way of the Ebola glycoprotein, which is slightly different from the endocytosis of the coronavirus. In addition, the target receptor for the Ebola glycoprotein on somatic cells is the NPC1 receptor, a membrane protein typically associated with cholesterol regulation (12). As for the quaternary structure of the Ebola glycoprotein, the glycoproteins are spaced far from one another, like those of the coronavirus.
          In conclusion, understanding the spike glycoprotein is a high priority, as scientists believe that its function holds the key to ending the COVID-19 pandemic through the development of a vaccine. To advance this work, scientists could complete the structure of the glycoprotein by resolving and adding the residues that are currently missing from it.