Institutional Unification: Getting the Full Picture

The world of research and scholarship is sprawling and complex. Under the umbrella of large universities and corporate or government institutions, smaller laboratories, research facilities, installations and outposts all add to the accumulation of knowledge and report their findings in published papers. The names of these facilities as listed in publications, however, may not overtly reflect their affiliation with the larger parent body.

For the advancement of knowledge and technology, this vast landscape of affiliated institutions is entirely appropriate. On the other hand, for Clarivate Analytics – an organization whose mission includes the precise and comprehensive tabulation of publication and citation statistics and the attribution of credit to the ultimately responsible institution – the tangle of names presents a challenge.

The answer, underlying the Web of Science™ and other Clarivate Analytics tools and resources, is a thorough and constant process of identifying, disambiguating and unifying name variants. Embodied most visibly in the “Organization-Enhanced” feature in the Web of Science, this institutional unification allows users to search the database for organizations’ main or “preferred” names as well as variant names. This affords the most comprehensive and accurate search results, reflecting the entire output and impact of institutions.

In analytic and benchmarking tools such as InCites, Institutional Profiles, and Essential Science Indicators, the process of unification ensures that the publication- and citation-based metrics for large institutions reflect the contributions of all affiliated or component facilities.

Various Variants

The example of one institution offers only a hint of the vast range of names that may be subsumed under a parent body. The government organization Fisheries and Oceans Canada, for instance, encompasses facilities whose indexed names, based on the author listings in the original papers, might include “Fisheries Canada Oceans,” “Dept Fisheries Oceans Sci Oceans Environm,” “Fed Dept Fisheries Oceans” and many more, including seemingly unrelated but affiliated installations such as “Bedford Institute.”

Faced with this array of names associated with just one institution, not to mention countless other comparable tangles representing organizations around the world, database specialists at Clarivate Analytics painstakingly oversee the compilation of name and address variants. Through this process of collating Web of Science publication addresses, the specialists are able to create variant rules that will attribute publications to a specific organization or organizations in a specific geographic location. These rules guide the process by which address variations in current and future indexed Web of Science papers are automatically treated in the database.

Examples of these rules include instances of “Wharton School” being unified under the “University of Pennsylvania,” or “University Hospital” in Arlington, Texas, being assigned to the “University of Texas, Arlington.”
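As a rough illustration of how such variant rules might behave, the sketch below maps a name variant, optionally qualified by a location, to a preferred organization name. It is a minimal sketch under stated assumptions, not a description of the actual rule format: the `VARIANT_RULES` table, the `normalize` and `unify` helpers, and the location strings are all hypothetical, and only the name pairings are taken from the examples in this article.

```python
from typing import Optional

# Hypothetical rule table: (normalized variant, location or None) -> preferred name.
# A location of None means the rule applies wherever the variant appears.
VARIANT_RULES = {
    ("wharton school", None): "University of Pennsylvania",
    ("university hospital", "arlington, texas"): "University of Texas, Arlington",
    ("bedford institute", None): "Fisheries and Oceans Canada",
    ("fisheries canada oceans", None): "Fisheries and Oceans Canada",
}


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(text.lower().split())


def unify(name: str, location: Optional[str] = None) -> Optional[str]:
    """Return the preferred organization name for a variant, or None if no rule applies."""
    key_name = normalize(name)
    # Try a location-specific rule first, then fall back to a location-free rule.
    if location is not None:
        match = VARIANT_RULES.get((key_name, normalize(location)))
        if match is not None:
            return match
    return VARIANT_RULES.get((key_name, None))


print(unify("Wharton School"))
print(unify("University Hospital", "Arlington, Texas"))
print(unify("University Hospital", "Boston, Massachusetts"))
```

Keying the generic name “University Hospital” on its location reflects the point made above: the same string can belong to different parent bodies in different places, so a rule for one city should not capture addresses from another.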

Without this unification, a user who wanted to find all Web of Science publications associated with a given organization would have to search multiple times for variant names and then manually aggregate the results.

Instead, a unified “Organization-Enhanced” record spares the user this exhaustive manual process: all the publications from a given organization can be retrieved with a single search.

To return to our previous example: An “Organization-Enhanced” search on “Fisheries & Oceans Canada,” with the “Select from Index” option, gives the user a choice: to retrieve all papers from the main organization, or to scan a list of all associated variant names (including “Bedford Inst” and other affiliated centers). With the latter option, the user can select one or more desired variants by using the “add” button next to each entry. This operation is ideal for a focused search on research being done in specific centers within the parent organization.
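The two search modes just described can be sketched in a few lines. This is an illustrative model only, assuming a simple in-memory index: `ORG_INDEX`, `RECORDS` and `search` are hypothetical names, the record IDs are made-up placeholders rather than real Web of Science data, and the actual search implementation is not described in this article.

```python
from typing import List, Optional

# Hypothetical index: preferred name -> variant names unified under it.
ORG_INDEX = {
    "Fisheries & Oceans Canada": [
        "Fisheries Canada Oceans",
        "Dept Fisheries Oceans Sci Oceans Environm",
        "Fed Dept Fisheries Oceans",
        "Bedford Inst",
    ],
}

# Made-up placeholder records; each carries the organization name as indexed.
RECORDS = [
    {"id": 1, "address_org": "Fisheries Canada Oceans"},
    {"id": 2, "address_org": "Bedford Inst"},
    {"id": 3, "address_org": "Fed Dept Fisheries Oceans"},
    {"id": 4, "address_org": "Wharton School"},
]


def search(preferred: str, selected_variants: Optional[List[str]] = None) -> List[int]:
    """Return record IDs for an organization, or for selected variants of it."""
    variants = ORG_INDEX.get(preferred, [])
    if selected_variants is None:
        # Whole-organization search: the preferred name plus every variant.
        wanted = set(variants) | {preferred}
    else:
        # Focused search: only variants that really belong to this organization.
        wanted = set(selected_variants) & set(variants)
    return [r["id"] for r in RECORDS if r["address_org"] in wanted]


print(search("Fisheries & Oceans Canada"))                    # [1, 2, 3]
print(search("Fisheries & Oceans Canada", ["Bedford Inst"]))  # [2]
```

The first call corresponds to retrieving all papers from the main organization; the second corresponds to adding a single variant from the index for a focused search.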

An Ongoing Process

The Web of Science’s extensive store of variant names and addresses now covers more than 4,000 institutions worldwide and is reflected in upwards of 35 million records. And the store is growing all the time: The process of unification is a constant, year-round effort. In some instances, the process is undertaken according to a specific need, such as a request from a customer or a given organization.

At the heart of institutional unification for the Web of Science and other resources is the WAAN (Web Application for Address Normalization) system. As newly published papers and other source items are indexed in the Web of Science, all of the listed information concerning institutions and addresses is imported into WAAN. There, the addresses are unified – in some cases, via automated analysis of text strings that finds possible variants and attributes them to an organization where appropriate.
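The article does not say which string-analysis method WAAN uses. Purely as an illustration of the general idea, the sketch below scores an incoming address name against known variants by token overlap (Jaccard similarity) and returns candidates above a threshold for review. The technique, the `KNOWN_VARIANTS` table, the `suggest` function and the 0.5 threshold are all assumptions made for this example, not details of WAAN.

```python
from typing import List, Tuple

# Hypothetical table of already-unified variants -> preferred organization name.
KNOWN_VARIANTS = {
    "Fisheries Canada Oceans": "Fisheries and Oceans Canada",
    "Fed Dept Fisheries Oceans": "Fisheries and Oceans Canada",
    "Wharton School": "University of Pennsylvania",
}


def tokens(name: str) -> set:
    """Split a name into a set of lowercase word tokens."""
    return set(name.lower().replace(",", " ").split())


def jaccard(a: str, b: str) -> float:
    """Token-set similarity: shared tokens divided by all distinct tokens."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def suggest(address_name: str, threshold: float = 0.5) -> List[Tuple[float, str, str]]:
    """Return (score, preferred name, matched variant) candidates, best first."""
    scored = []
    for variant, org in KNOWN_VARIANTS.items():
        score = jaccard(address_name, variant)
        if score >= threshold:
            scored.append((score, org, variant))
    scored.sort(reverse=True)
    return scored


for score, org, variant in suggest("Dept Fisheries Oceans"):
    print(f"{score:.2f}  {org}  (via {variant})")
```

A score-and-threshold approach of this kind only proposes candidates; the phrase “where appropriate” suggests that attribution is not purely mechanical, so the sketch returns suggestions rather than assigning an organization outright.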

In turn, the unified addresses are included in the processing of data for InCites, Essential Science Indicators, and other tools, ultimately finding their way to the Web of Science to deepen the “Organization-Enhanced” search utility.

To ensure that Web of Science users can retrieve all pertinent publications from a given organization or its sub-units, and that analytic tools such as InCites and Institutional Profiles reflect the totality of an organization’s output, unification is an essential and unceasing process.

In addition to aiding Web of Science users, the process of unification benefits the institutions themselves. A complete and accurate accounting of output and impact can contribute substantially to an organization’s performance in rankings and other evaluative exercises that include publication- and citation-based metrics. Comprehensive unification means that no key institutional components, or their contributions, will be omitted.