Mapping the precursors of modern structural biology

This essay was originally published in the Current Contents print editions on December 5, 1994, when Clarivate Analytics was known as the Institute for Scientific Information (ISI).

Watching a new frontier of scientific research emerge is fascinating. Once a new sub-specialty begins to appear in the literature, its roots and direction can be traced by combining data from Clarivate Analytics' Science Citation Index (SCI) with mapping software.

An example of an emergent multidisciplinary field is structural biology. In this essay, we use a sequence of steps to identify the core papers of this relatively new field and then use a multidimensional scaling program to create a map showing how these papers are connected by citation relationships. Each step offers a distinct perspective on some of the classic work that went into creating this field.

Information Retrieval

The field of structural biology illustrates the challenges encountered when using spatial configuration techniques for information retrieval. As the following data indicate, the field touches on many different disciplines. Ordinarily, one might not appreciate the connection between, for example, the molecular basis of blood coagulation and the three-dimensional structure of poliovirus at 2.9 Å resolution. At first glance these topics may seem unrelated, but both involve the configurations of biological macromolecules.

Identification of Core Papers

The process begins with the identification of a set of core papers. A knowledgeable researcher might try to do this from memory or, preferably, by a keyword search on the term “structural biology.” Searching the SCI file for 1981-93, we retrieved 26 papers containing the term. In a process sometimes called “cycling,” we then created a second set of 539 papers cited by those 26. From these cited references, we selected those cited 500 or more times during 1981-93; this step was done by a computerized process that automated the look-up procedure. The result was the 17 core papers listed in Table 1. Note that the listing is chronological, from the Haskins paper (1983) to the Lenardo paper (1989).
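The selection steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not ISI's actual look-up procedure: the toy `records` dictionary, its `title` and `refs` fields, and the functions `total_citations` and `core_papers` are all hypothetical, and citation totals are counted within the toy collection rather than across the full SCI file.

```python
from collections import Counter


def total_citations(records):
    """Count how many records cite each reference (each citing record once)."""
    counts = Counter()
    for rec in records.values():
        counts.update(set(rec["refs"]))
    return counts


def core_papers(records, term, min_citations):
    """Hypothetical sketch: keyword search, then 'cycling' to the cited
    references, then a citation-count threshold (500 in the essay)."""
    term = term.lower()
    # Step 1: records whose title contains the search term.
    hits = [rid for rid, rec in records.items() if term in rec["title"].lower()]
    # Step 2: every reference cited by those hits ("cycling").
    cited = set()
    for rid in hits:
        cited.update(records[rid]["refs"])
    # Step 3: keep only the heavily cited references.
    counts = total_citations(records)
    return sorted(ref for ref in cited if counts[ref] >= min_citations)


# Hypothetical toy database standing in for the SCI file (not real data).
records = {
    "p1": {"title": "Advances in structural biology", "refs": ["A", "B", "C"]},
    "p2": {"title": "Structural biology of receptors", "refs": ["A", "D"]},
    "p3": {"title": "T cell activation", "refs": ["A", "B"]},
    "p4": {"title": "Zinc finger proteins", "refs": ["B", "D"]},
}
print(core_papers(records, "structural biology", min_citations=3))  # prints ['A', 'B']
```

Lowering `min_citations` to 2 would also admit `D`; in the essay the corresponding threshold was 500 citations over 1981-93.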

Using the lists of papers that cite the core items, we determined how often each pair of the 17 was cited together, that is, co-cited. If every one of the 17 papers had been co-cited with every other, there would have been 17 × 16 / 2 = 136 pairs; the number actually observed was 124. These pairs of core papers are used to create the matrix of co-citation relationships (see Figure 1). From this matrix, the map is generated using multidimensional scaling, which, according to Kruskal, is a “set of mathematical techniques that enable a researcher to uncover the ‘hidden structure’ of databases.”1 The map is shown in Figure 2, and an additional list of papers cited 250 to 500 times is shown in Table 2 below.
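Co-citation counting can be sketched the same way. In this hypothetical example each citing paper is reduced to its list of reference identifiers, and a pair of core papers is counted as co-cited once for every citing paper whose list contains both. The function names `cocitation_counts` and `cocitation_matrix` and the toy data are assumptions for illustration only; the standard-library `math.comb` gives the number of distinct unordered pairs, which for 17 papers is 136.

```python
from collections import Counter
from itertools import combinations
from math import comb


def cocitation_counts(citing_ref_lists, core):
    """For each unordered pair of core papers, count the citing papers
    whose reference lists contain both (the pair's co-citation count)."""
    core = set(core)
    counts = Counter()
    for refs in citing_ref_lists:
        present = sorted(core.intersection(refs))  # sorted -> canonical pair order
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


def cocitation_matrix(counts, core):
    """Symmetric matrix of co-citation counts, rows/columns in `core` order."""
    index = {p: i for i, p in enumerate(core)}
    m = [[0] * len(core) for _ in core]
    for (a, b), c in counts.items():
        m[index[a]][index[b]] = m[index[b]][index[a]] = c
    return m


# Hypothetical reference lists of four citing papers (not real data).
citing = [["A", "B", "C"], ["A", "B"], ["B", "C", "X"], ["A", "D"]]
core = ["A", "B", "C", "D"]
counts = cocitation_counts(citing, core)
print(len(counts), "of", comb(len(core), 2), "possible pairs are co-cited")
print(cocitation_matrix(counts, core))
print(comb(17, 2))  # distinct pairs possible among 17 core papers
```

The reference `X` is ignored because it is not a core paper, and the pair (B, D) never appears together, so it stays at zero in the matrix.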

Configuration of the Map

The multidimensional scaling program charts the map, assigning x,y coordinates to each point so that the distances between points indicate the strength of association. Papers that cluster together on the map should (and, in this example, do) share similar topics. For instance, nodes K and G are closely linked on the map, with a link strength of 277. This strong relationship is supported by the fact that both papers concern metal-binding domains. A study of the nodes on the map and the corresponding articles will show analogous similarities between other closely linked nodes. The map also contains, for example, one region pertaining to immunology and another to genetics; both fields use the tools of the new structural biology, and the map may be seen as a multidisciplinary display of notable precursors of the new approach.
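A sketch of the scaling step follows. The essay does not say which algorithm SCI-Map uses; Kruskal's own method is non-metric, whereas this sketch swaps in classical (Torgerson) metric MDS because it fits in a few lines of standard-library Python. The rule that turns a co-citation strength s into a distance, 1/(1 + s), is likewise an assumption, as are the function names `strength_to_distance` and `classical_mds`.

```python
import math


def strength_to_distance(strength):
    """Turn a symmetric co-citation strength matrix into dissimilarities.
    The 1 / (1 + s) rule is an illustrative assumption, not SCI-Map's."""
    n = len(strength)
    return [[0.0 if i == j else 1.0 / (1.0 + strength[i][j]) for j in range(n)]
            for i in range(n)]


def classical_mds(dist, dims=2, iters=1000):
    """Classical (Torgerson) MDS: find coordinates whose Euclidean distances
    approximate `dist`. Uses shifted power iteration with deflation, which is
    adequate for a handful of points such as 17 core papers."""
    n = len(dist)
    # Double-centre the squared distances: B = -1/2 * J D^2 J.
    sq = [[d * d for d in row] for row in dist]
    mean = [sum(row) / n for row in sq]  # row means equal column means (symmetric)
    grand = sum(mean) / n
    b = [[-0.5 * (sq[i][j] - mean[i] - mean[j] + grand) for j in range(n)]
         for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Shift by a Gershgorin bound so the largest eigenvalue of B dominates.
        shift = max(sum(abs(x) for x in row) for row in b)
        v = [1.0 + i for i in range(n)]  # arbitrary non-symmetric start vector
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) + shift * v[i] for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue of the unshifted B.
        lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        scale = math.sqrt(max(lam, 0.0))  # clamp negative (non-Euclidean) parts
        for i in range(n):
            coords[i][k] = v[i] * scale
        # Deflate so the next pass finds the next-largest eigenvalue.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


if __name__ == "__main__":
    # Sanity check: distances taken from a known planar layout should be
    # reproduced up to rotation and reflection.
    pts = [(0, 0), (3, 0), (0, 4), (3, 4)]
    dist = [[math.dist(p, q) for q in pts] for p in pts]
    xy = classical_mds(dist)
    print([[round(math.dist(a, c), 3) for c in xy] for a in xy])
```

Because classical MDS recovers coordinates only up to rotation and reflection, the axes of such a map carry no meaning of their own; only the relative distances between nodes do, which is how the essay reads its map.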


The convenience of using a software system such as SCI-Map speaks for itself, but tracing the roots of a new field of study has merits of its own. Each step reveals a different set of papers, describing current work as well as the research that preceded it. Science does have structure, and each of us can draw maps of the aspects that interest us.

Dr. Eugene Garfield 
Founder and Chairman Emeritus, ISI


1. Kruskal J. Multidimensional Scaling. Beverly Hills: Sage Publications, 1978. p. 5.

Table 1. Core papers in the field of structural biology.

A Haskins K, Kubo R, White J, Pigeon M, Kappler J, Marrack P. The major histocompatibility complex-restricted antigen receptor on T cells. 1. Isolation with a monoclonal antibody. J. Exp. Med. 157:1149, 1983.
B Brooks B R, Bruccoleri R E, Olafson B D, States D J, Swaminathan S, Karplus M. CHARMM: a program for macromolecular energy, minimization, and dynamics calculations. J. Comput. Chem. 4:187, 1983.
C Allen F H, Kennard O, Taylor R. Systematic analysis of structural data as a research technique in organic chemistry. Acc. Chem. Res. 16:146, 1983.
D Meuer S C, Hussey R E, Fabbi M, Fox D, Acuto O, Fitzgerald K A, Hodgdon J C, Protentis J P, Schlossman S F, Reinherz E L. An alternative pathway of T cell activation: a functional role for the 50 kD T11 sheep erythrocyte receptor protein. Cell 36:897, 1984.
E Bevilacqua M P, Pober J S, Majeau G R, Cotran R S, Gimbrone M A. Interleukin-1 (IL-1) induces biosynthesis and cell surface expression of procoagulant activity in human vascular endothelial cells. J. Exp. Med. 160:618, 1984.
F Truneh A, Albert F, Golstein P, Schmittverhulst A M. Early steps of lymphocyte activation bypassed by synergy between calcium ionophores and phorbol ester. Nature 313:318, 1985.
G Miller J, McLachlan A D, Klug A. Repetitive zinc-binding domains in the protein transcription factor IIIA from Xenopus oocytes. EMBO J. 4:1609, 1985.
H Hollenberg S M, Weinberger C, Ong E S, Cerelli G, Oro A, Lebo R, Thompson E B, Rosenfeld M G, Evans R M. Primary structure and expression of a functional human glucocorticoid receptor cDNA. Nature 318:635, 1985.
I Kadonaga J T, Jones K A, Tjian R. Promoter-specific activation of RNA polymerase-II transcription by SP1. Trends Biochem. Sci. 11:20, 1986.
J Caput D, Beutler B, Hartog K, Thayer R, Brownshimer S, Cerami A. Identification of a common nucleotide sequence in the 3′-untranslated region of messenger-RNA molecules specifying inflammatory mediators. P.N.A.S.U.S. 83:1670, 1986.
K Berg J M. Potential metal-binding domains in nucleic acid-binding proteins. Science 232:485, 1986.
L Bevilacqua M P, Pober J S, Majeau G R, Fiers W, Cotran R S, Gimbrone M A. Recombinant tumor necrosis factor induces procoagulant activity in cultured human vascular endothelium: Characterization and comparison with the actions of interleukin-1. P.N.A.S.U.S. 83:4533, 1986.
M Amit A G, Mariuzza R A, Phillips S E V. 3-Dimensional structure of an antigen-antibody complex at 2.8 Å resolution. Science 233:747, 1986.
N Shaw G, Kamen R. A conserved A-U sequence from the 3′ untranslated region of GM-CSF messenger-RNA mediates selective messenger-RNA degradation. Cell 46:659, 1986.
O Lee W, Mitchell P, Tjian R. Purified transcription factor AP-1 interacts with TPA-inducible enhancer elements. Cell 49:741, 1987.
P Sanders M E, Makgoba M W, Sharrow S O, Stephany D, Springer T A, Young H A, Shaw S. Human memory lymphocytes-T express increased levels of three cell adhesion molecules (LFA-3, CD2, and LFA-1) and three other molecules (UCHL1, CDW29, and PGP-1) and have enhanced IFN-γ production. J. Immunol. 140:1401, 1988.
Q Lenardo M J, Baltimore D. NF-κB: a pleiotropic mediator of inducible and tissue-specific gene control. Cell 58:227, 1989.


Table 2. Core papers in the field of structural biology (cited between 250 and 500 times) listed in descending order of total citations.

Kamoun M, Martin P J, Hansen J A, Brown M A, Siadak A W, Nowinski R C. Identification of a human lymphocyte-T surface protein associated with the E-rosette receptor. J. Exp. Med. 153:207, 1981.

Treisman R. Identification of a protein binding site that mediates transcriptional response of the C-FOS gene to serum factors. Cell 46:567, 1986.

Rudd C E, Trevillyan J M, Dasgupta J D, Wong L L, Schlossman S F. The CD4 receptor is complexed in detergent lysates to a protein-tyrosine kinase (pp58) from human lymphocytes-T. P.N.A.S.U.S. 85:5190, 1988.

Seed B. An LFA-3 cDNA encodes a phospholipid-linked membrane-protein homologous to its receptor CD2. Nature 329:840, 1987.

Hogle J M, Chow M, Filman D J. 3-Dimensional structure of poliovirus at 2.9 Å resolution. Science 229:1358, 1985.

Lau L F, Nathans D. Expression of a set of growth-related immediate early genes in BALB-c 3T3 cells: Coordinate regulation with C-FOS or C-MYC. P.N.A.S.U.S. 84:1182, 1987.

Seed B, Aruffo A. Molecular cloning of the CD2 antigen, the T cell erythrocyte receptor, by a rapid immunoselection procedure. P.N.A.S.U.S. 84:3365, 1987.

Dame J B, Williams J L, McCutchan T F, Weber J L, Wirtz R A, Hockmeyer W T, Maloy W L, Haynes J D, Schneider I, Roberts D, Sanders G S, Reddy E P, Diggs C L, Miller L H. Structure of the gene encoding the immunodominant surface antigen on the sporozoite of the human malaria parasite Plasmodium falciparum. Science 225:593, 1984.

Veillette A, Bookman M A, Horak E M, Samelson L E, Bolen J B. Signal transduction through the CD4 receptor involves the activation of the internal membrane tyrosine-protein kinase p56lck. Nature 338:257, 1989.

Wagner G, Wuthrich K. Sequential resonance assignments in protein H-1 nuclear magnetic resonance spectra: Basic pancreatic trypsin inhibitor. J. Mol. Biol. 155:347, 1982.

Kornblihtt A R, Umezawa K, Vibepedersen K, Baralle F E. Primary structure of human fibronectin: Differential splicing may generate at least 10 polypeptides from a single gene. EMBO J. 4:1755, 1985.

Shaw S, Luce G E G, Quinones R, Gress R E, Springer T A, Sanders M A. Two antigen-independent adhesion pathways used by cytotoxic T cell clones. Nature 323:262, 1986.

Bazan J F. Structural design and molecular evolution of a cytokine receptor superfamily. P.N.A.S.U.S. 87:6934, 1990.

Pugh B F, Tjian R. Mechanism of transcriptional activation by SP1: Evidence for coactivators. Cell 61:1187, 1990.

Tainer J A, Getzoff E D, Beem K M, Richardson J S, Richardson D C. Determination and analysis of the 2 Å structure of copper, zinc superoxide-dismutase. J. Mol. Biol. 160:181, 1982.

Gearing D P, King J A, Gough N M, Nicola N A. Expression cloning of a receptor for human granulocyte-macrophage colony-stimulating factor. EMBO J. 8:3667, 1989.

Giniger E, Varnum S M, Ptashne M. Specific DNA binding of GAL4, a positive regulatory protein of yeast. Cell 40:767, 1985.

Ollis D L, Brick P, Hamlin R, Xuong N G, Steitz T A. Structure of large fragment of Escherichia coli DNA polymerase I complexed with dTMP. Nature 313:762, 1985.

Tsai S Y, Carlstedtduke J, Weigel N L, Dahlman K, Gustafsson J A, Tsai M J, O’Malley B W. Molecular interactions of steroid-hormone receptor with its enhancer element: Evidence for receptor dimer formation. Cell 55:361, 1988.

Selvaraj P, Plunkett M L, Dustin M, Sanders M E, Shaw S, Springer T A. The lymphocyte-T glycoprotein CD2 binds the cell surface ligand LFA-3. Nature 326:400, 1987.

Howard F D, Ledbetter J A, Wong J, Bieber C P, Stinson E B, Herzenberg L A. A human lymphocyte-T differentiation marker defined by monoclonal antibodies that block E-rosette formation. J. Immunol. 126:2117, 1981.

Hollenberg S M, Giguere V, Segui P, Evans R M. Colocalization of DNA binding and transcriptional activation functions in the human glucocorticoid receptor. Cell 49:39, 1987.

Umesono K, Murakami K K, Thompson C C, Evans R M. Direct repeats as selective response elements for the thyroid hormone, retinoic acid, and vitamin D3 receptors. Cell 65:1255, 1991.

Jorgensen W L. Quantum and statistical mechanical studies of liquids. 10. Transferable intermolecular potential functions for water, alcohols, and ethers: Application to liquid water. J. Am. Chem. Soc. 103:335, 1981.

Umesono K, Evans R M. Determinants of target gene specificity for steroid/thyroid hormone receptors. Cell 57:1139, 1989.

Keegan L, Gill G, Ptashne M. Separation of DNA binding from the transcription-activating function of a eukaryotic regulatory protein. Science 231:699, 1986.

Furie B, Furie B C. The molecular basis of blood coagulation. Cell 53:505, 1988.

Kissinger C R, Liu B S, Martinblanco E, Kornberg T B, Pabo C O. Crystal structure of an engrailed homeodomain-DNA complex at 2.8 Å resolution: a framework for understanding homeodomain-DNA interactions. Cell 63:579, 1990.

Meuer S C, Hussey R E, Hodgdon J C, Hercend T, Schlossman S F, Reinherz E L. Surface structures involved in target recognition by human cytotoxic lymphocytes-T. Science 218:471, 1982.