Cited title unification

This is the first in a series of editorial comments, written especially for those users of the Journal Citation Reports (JCR) with an interest in deepening their understanding of that product.

The topics addressed in this series all arise from the comments we receive regarding JCR data and its appropriate use. Our goal is to answer commonly asked questions, to provide new ideas to help our users develop cogent journal evaluation methods, and to remind users and critics alike about the many JCR applications.

Cited Title Unification or Making a Molehill out of a Mountain

Key to an understanding of the value of a citation index is the recognition that the birth of citation indexing was closely tied to a desire to apply machine automation to the organization and management of scientific literature. Machine-generated indexing offered a way of reducing the amount of human intervention needed for subject classification and indexing. The computer also promised to reduce the processing time required to incorporate a bibliographic record into a computerized database. (For more on this, read our essay on the history of citation indexing.)

But by virtue of its incorporation into a database or machine-readable file, the bibliographic record must be standardized in order to facilitate and maximize retrieval of results. A simple example of how standardization might be required can be seen in the various abbreviations for journal titles in records produced from a wide variety of journals and publishers. For instance, the journal title may be referenced as J CHROM or it may read J CHROMATOGR. Different disciplines, even different publishing organizations, have established their own bibliographic styles and practices over time. How does a database producer impose order on a widely divergent set of records?

Database producers, such as Clarivate Analytics, generally have established processes for imposing standardization on the various data, known as unification. In particular, journal title unification may be defined as the process whereby a computer locates data strings that match known variant title spellings and updates them to an associated standard or preferred title spelling.
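The definition above can be sketched as a simple dictionary lookup. This is a minimal illustration, not the production process: the mapping entries, the choice of J CHROMATOGR as the preferred form, and the function names `normalize` and `unify` are all assumptions made for the example.

```python
# Hypothetical variant-to-preferred mapping. The entries are illustrative only;
# treating J CHROMATOGR as the preferred spelling is an assumption.
VARIANT_TO_PREFERRED = {
    "J CHROM": "J CHROMATOGR",
    "J CHROMATOG": "J CHROMATOGR",
    "J CHROMATOGR": "J CHROMATOGR",
}


def normalize(cited_title: str) -> str:
    """Upper-case the string, drop periods, and collapse runs of whitespace."""
    return " ".join(cited_title.upper().replace(".", " ").split())


def unify(cited_title: str) -> str:
    """Return the preferred title for a known variant.

    Unknown strings are returned in normalized form and left un-unified.
    """
    key = normalize(cited_title)
    return VARIANT_TO_PREFERRED.get(key, key)


if __name__ == "__main__":
    for raw in ["J. Chrom.", "J CHROMATOGR", "Some Unlisted Title"]:
        print(raw, "->", unify(raw))
```

A lookup of this kind only resolves variants that are already known, which is why the size and upkeep of the variant dictionary matter so much.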

Consider that approximately 25 million citations are processed annually for inclusion in the Clarivate Analytics database. About 14 million of these citations refer to one of the nearly 6500 journals listed in the Journal Citation Reports. The remaining 10-11 million citations are distributed over roughly 1.5 million titles, including cited journals, cited books as well as individually cited chapters, patents, conversations, etc.

For some products, such as the Journal Citation Reports (JCR), data unification represents a significant aspect of its value to the user. In the absence of some form of unification, citations that have some variant form of a journal title could be passed over or misattributed, thus affecting the reported citation frequency of the journal and hence any subsequent perception of the journal’s influence. Yet ironically, as we will see, the subtitle of this segment, Making a Molehill Out of a Mountain, has two meanings. Clarivate Analytics editorial experts must correlate tens of millions of cited titles with a set of standard abbreviations, a mountain of effort but an important aspect of database quality. The retrieval of citations associated with a given journal title and its associated abbreviation has direct bearing on the ranking that journal may receive in the JCR. Yet we may go to great extremes to attribute a handful of additional cites to JCR journals only to see resulting averages, such as the impact factor, altered by perhaps only a tenth of a percentage point. That represents the molehill of our title. Unification is necessary to the value of the product but its final impact may be less than would appear justified.
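To make the molehill concrete, here is a small worked calculation. It uses the standard two-year impact factor definition (citations in the JCR year to items published in the previous two years, divided by the number of citable items published in those two years). The journal figures are invented for illustration, and "a tenth of a percentage point" is read here as a relative change of about 0.1%, which is an assumption about the intended meaning.

```python
def impact_factor(cites_to_prior_two_years: int, citable_items: int) -> float:
    """Two-year impact factor: citations divided by citable items."""
    return cites_to_prior_two_years / citable_items


# Invented figures for a hypothetical journal.
CITABLE_ITEMS = 1000
CITES_BEFORE_UNIFICATION = 2000
RECOVERED_CITES = 2  # a handful of extra cites recovered from variant titles

before = impact_factor(CITES_BEFORE_UNIFICATION, CITABLE_ITEMS)
after = impact_factor(CITES_BEFORE_UNIFICATION + RECOVERED_CITES, CITABLE_ITEMS)
relative_change_pct = (after - before) / before * 100

if __name__ == "__main__":
    print(f"before: {before:.3f}  after: {after:.3f}")
    print(f"relative change: {relative_change_pct:.2f}%")
```

For a journal that is cited less often, the same two recovered cites would move the figure proportionally more, so the size of the molehill depends on the journal.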

Because the JCR dataset is more narrowly circumscribed than that of the citation indexes and because of the statistical calculations that must be performed on the data, the extent of unification in the JCR is greater than in the Science Citation Index and Social Sciences Citation Index.

The time and effort involved in normal system maintenance is compounded by the time and effort required to seek out those problems specific to this resource. The JCR unification dictionary lists approximately 200,000 cited variants for roughly 15,000 journals that either are currently covered or were covered at one time in Clarivate Analytics products. There is an average of twenty cited variants per journal listed in this dictionary. (Some journals are listed with more than 100 variants.)

Any and all of the following can affect the standardization and unification of JCR data:

  • bibliographic standards and accuracy in published references
  • bibliographic standards in journal title presentation on covers, spines, copyright pages, headers or footers
  • Clarivate Analytics data entry guidelines for keying references
  • Clarivate Analytics title abbreviations in our source journal records
  • Clarivate Analytics policies on source journal record maintenance
  • cited title unification during JCR production
  • ambiguous cited titles
  • the potential for human error in all phases of presenting and processing journal titles

It is easy to see how authors, journal editors and Clarivate Analytics data entry personnel could make simple keying errors when processing citations, but errors of this kind are often anticipated and caught during regular automated and manual checks in our database load and JCR production systems. In fact, our input accuracy is rated at more than 98%. More significant unification problems tend to stem from exceptional situations and deviations from bibliographic standards. It seems that every possible exception that might occur in the presentation and citation of a journal title will occur at some point. Such exceptions can cause conflicts in JCR-related systems.

For example, to process cited titles, we use a field that is twenty characters long. Occasionally, when standard abbreviation practices are followed, that length is not enough to hold all the discriminating information for a longer title. Imagine a journal with the title International Journal of Manufacturing and Production Systems. Such a title might appear within the Clarivate Analytics system as INT J MANUF PROD S. But imagine that another journal with a very similar title, the International Journal of Manufacturing and Production Services, is considered for inclusion in the database. The second title could easily be abbreviated in exactly the same way as the first. On discovering such a conflict, we would slightly modify the first title's abbreviation within the system so that it would appear as INT J MANUF PROD SYS. Such handling might not perfectly meet ISO standards for such material, but it would still eliminate a great many of the variant citations.
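The conflict described above can be detected mechanically. The following is a minimal sketch, assuming a hypothetical word-abbreviation table chosen only to reproduce the example; the names `WORD_ABBREVIATIONS`, `STOP_WORDS`, `abbreviate` and `find_conflicts` are illustrative and do not describe the actual production system.

```python
from collections import defaultdict

FIELD_LENGTH = 20  # cited-title field width stated in the text

# Hypothetical word abbreviations, chosen to reproduce the example above.
WORD_ABBREVIATIONS = {
    "INTERNATIONAL": "INT",
    "JOURNAL": "J",
    "MANUFACTURING": "MANUF",
    "PRODUCTION": "PROD",
    "SYSTEMS": "S",
    "SERVICES": "S",
}
STOP_WORDS = {"OF", "AND", "THE"}  # assumed to be dropped when abbreviating


def abbreviate(title: str) -> str:
    """Abbreviate each significant word, then truncate to the field length."""
    words = [w for w in title.upper().split() if w not in STOP_WORDS]
    short = " ".join(WORD_ABBREVIATIONS.get(w, w) for w in words)
    return short[:FIELD_LENGTH]


def find_conflicts(titles: list[str]) -> dict[str, list[str]]:
    """Group titles by abbreviation and keep only the groups that collide."""
    groups: dict[str, list[str]] = defaultdict(list)
    for title in titles:
        groups[abbreviate(title)].append(title)
    return {abbr: ts for abbr, ts in groups.items() if len(ts) > 1}


if __name__ == "__main__":
    titles = [
        "International Journal of Manufacturing and Production Systems",
        "International Journal of Manufacturing and Production Services",
    ]
    for abbr, clashing in find_conflicts(titles).items():
        print(f"{abbr!r} is shared by {len(clashing)} titles")
```

Note that the modified abbreviation INT J MANUF PROD SYS is exactly twenty characters long, so it still fits the field while separating the two titles.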

In another case, the Journal of General Microbiology changed its title to Microbiology, which brought it into conflict with the translation journal for the Russian Mikrobiologiya. To handle difficulties of this type, a new JCR production table, the JCR Homograph Table, will compare volume and publication year to confirm which Clarivate Analytics association is correct. MICROBIOL-UK will be the Clarivate Analytics designation for the former Journal of General Microbiology, while MICROBIOLOGY+ is our designation for the translation journal. (Note: The plus sign on this last abbreviation indicates that cites to the original language journal are unified with cites to the translation title.)
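A volume-and-year check of this kind might look like the sketch below. The table layout, the assumption of one volume per calendar year, and the base year and volume numbers are all placeholders invented for the example; they are not taken from the actual Homograph Table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HomographEntry:
    """One candidate journal for an ambiguous cited title (hypothetical layout)."""
    designation: str
    base_year: int
    base_volume: int

    def expected_volume(self, year: int) -> int:
        # Assumes exactly one volume per calendar year, which is a simplification.
        return self.base_volume + (year - self.base_year)


# Placeholder values for illustration only.
HOMOGRAPHS = {
    "MICROBIOLOGY": [
        HomographEntry("MICROBIOL-UK", base_year=1994, base_volume=140),
        HomographEntry("MICROBIOLOGY+", base_year=1994, base_volume=63),
    ],
}


def resolve(cited_title: str, volume: int, year: int) -> Optional[str]:
    """Return the single designation whose volume matches the year, else None."""
    candidates = HOMOGRAPHS.get(cited_title, [])
    matches = [c.designation for c in candidates
               if c.expected_volume(year) == volume]
    return matches[0] if len(matches) == 1 else None


if __name__ == "__main__":
    print(resolve("MICROBIOLOGY", volume=141, year=1995))
    print(resolve("MICROBIOLOGY", volume=64, year=1995))
    print(resolve("MICROBIOLOGY", volume=10, year=1995))
```

Returning `None` when no candidate or more than one candidate matches leaves the citation unassigned, so that an ambiguous cite is not credited to the wrong journal.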

Users have a role in our unification process as well. They frequently perform an invaluable service when they identify ongoing, new or potential problems with our methods of coping with irregularities — and we are deeply grateful when these are brought to our attention.

We are also appreciative when publishers consult us on title changes they may be considering. Such title changes may adversely affect JCR production by creating an unusual situation or by causing a conflict with other journal title abbreviations. By working with our users and with publishers, we can maintain the high level of precision and quality in our citation databases and in the data analysis that appears in the JCR.

This essay was prepared by: Janet Robertson, JCR Project Coordinator, Clarivate Analytics