Counting is easy; metrics are difficult

by Marie McVeigh
Product Director
Science Research Connect

In assessment and metrics, it is increasingly the case that anyone can produce a count: a count of views, a count of shares, a count of comments, a count of downloads from this site or from that repository, a count of link-outs and link-backs and social activity. When vast quantities of information are easily accessible online, and the scholarly network is generously open, anyone can count. Counting is easy. Metrics for evaluation are different, just as evidence differs from anecdote.


Three characteristics distinguish counting from metrics:

  1. A known and curated corpus of materials
  2. Rigor and consistency in data capture
  3. Objectivity of the output values


A curated corpus

Every chef knows that the quality of a meal is determined by the quality of the ingredients. An excellent recipe will produce results that are visually appealing, but that’s not enough; you need to know what went into the dish.


Bibliographic data and metrics from Clarivate Analytics are not an add-on or by-product of another business interest. The importance of high-quality, reliable data and objective metrics was baked in from the very start. Eugene Garfield noted in his 1955 paper “Citation Indexes for Science” (Science vol. 122: 108-111. doi: 10.1126/science.122.3159.108) that a key question for the compilation of the Citation Index was “the selection of the periodicals to be covered in order to obtain citations”. Selection is not a separate question from the production of the data; it is a fundamental decision about the quality of the output. The selection, management and identification of sources are critical because these sources carry the work of the scholars, scientists and researchers whose citations create the metrics.


Clarivate Analytics’s publication evaluation process is long established, and has resulted in a “Core” of regional, national, and international sources that reflect the global scholarly community in every topic. Selection considers how each source can contribute to the assessment of other sources, whether because of the stringency of its editorial oversight or because of its ability to reflect a novel perspective on a topic or region. The index must have breadth, but it must also have authority. The Web of Science Core Collection includes over 18,400 journals, books, and proceedings from 106 countries, encompassing the full range of scholarly and scientific research.


Once selected, the bibliographic and citation identity of each publication is maintained through careful review of the source materials themselves. The overall content of each publication and its current and distinguishable relationship with other publications, both within and outside of the Clarivate Analytics source files, are noted to establish a unique, title-centric algorithm for the resolution of citations. Sections and content types are reviewed to support our cover-to-cover indexing of all substantive content.


Rigor and consistency of data

For a measure to be valid, the instrument must be precise. Citation data capture at Clarivate Analytics is based on heuristic use of a historic backfile of 1.4 billion references. What matters, though, is not the scale of the number (a search retrieving millions of results is so commonplace that it inures us to large numbers) but the degree to which data management reduces this complexity. The goal is not to capture a lot of references, but to capture every meaningful reference only once, either as a source item or as a citation; any subsequent occurrence establishes a new relationship between two published works. The 1.4 billion citations are compressed roughly six-fold, to 220 million novel or previously captured citations. Variations are not discarded as errors, nor erased as irrelevant, but retained in the primary data to constantly inform and continuously tune both capture and linking.
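The capture-once idea can be illustrated with a small Python sketch. This is not Clarivate's actual method: the normalization rule, the class name `CitationIndex`, and the function `normalize_reference` are all hypothetical, chosen only to show how each distinct cited work can be stored once while every observed variant and every citing relationship is retained.

```python
import re
from collections import defaultdict


def normalize_reference(raw: str) -> str:
    """Reduce a cited-reference string to a crude matching key.

    Assumption: lowercasing and dropping punctuation is enough to match
    variants. A real system would need far richer, curated rules.
    """
    key = re.sub(r"[^a-z0-9 ]", " ", raw.lower())
    return " ".join(key.split())


class CitationIndex:
    """Hypothetical index: one entry per cited work, variants retained."""

    def __init__(self) -> None:
        self.variants = defaultdict(set)   # key -> raw strings as captured
        self.cited_by = defaultdict(set)   # key -> ids of citing items
        self.total_captured = 0            # every reference occurrence seen

    def add(self, citing_id: str, raw_reference: str) -> str:
        key = normalize_reference(raw_reference)
        self.variants[key].add(raw_reference)   # keep the variant, do not discard it
        self.cited_by[key].add(citing_id)       # a repeat adds a relationship, not a record
        self.total_captured += 1
        return key

    def compression_ratio(self) -> float:
        """Occurrences captured per distinct cited work."""
        if not self.variants:
            return 0.0
        return self.total_captured / len(self.variants)


index = CitationIndex()
index.add("paper-A", "Garfield E, 1955, Science, 122, 108")
index.add("paper-B", "GARFIELD E., 1955, SCIENCE 122: 108")
index.add("paper-C", "Garfield E, 1955, Science, 122, 108")
```

In this toy run, three captured references collapse to a single cited work with two retained spelling variants and three citing items. The same ratio applied to the figures in the text is 1.4 billion divided by 220 million, or about 6.4.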


A similar process of standardized capture and additional identity resolution is applied to institution names. The address as given by the authors in their publication is further identified as associated with a parent institution or other organizational level through the Organization Enhanced database. As with the citation index file, these data are informed by the variants observed in the original literature, and they are also subjected to review and resolution to allow effective aggregation and reporting.
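A minimal Python sketch of this kind of address resolution follows. It assumes a simple lookup table; the institution "Example University", the abbreviation list, and the function names are invented for illustration and say nothing about how the Organization Enhanced database is actually built.

```python
import re
from typing import Optional

# Hypothetical abbreviation expansions applied before lookup.
ABBREVIATIONS = {"univ": "university", "dept": "department", "inst": "institute"}

# Hypothetical curated table: normalized name variant -> parent institution.
PARENT_BY_VARIANT = {
    "example university": "Example University",
    "example university hospital": "Example University",
}


def normalize_segment(segment: str) -> str:
    """Lowercase, strip punctuation, and expand known abbreviations."""
    words = re.sub(r"[^\w\s]", "", segment.lower()).split()
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)


def resolve_parent(raw_address: str) -> Optional[str]:
    """Return the parent institution for an author-supplied address.

    Returns None when no segment matches, so the address can be
    queued for human review rather than guessed at.
    """
    for segment in raw_address.split(","):
        parent = PARENT_BY_VARIANT.get(normalize_segment(segment))
        if parent is not None:
            return parent
    return None


addresses = [
    "Dept. of Physics, Example Univ., Springfield",
    "Example University Hospital, Springfield",
    "Unknown Lab, Nowhere",
]
resolved = {address: resolve_parent(address) for address in addresses}
```

The first two addresses aggregate under the same parent, while the third resolves to `None`. The original address string stays the dictionary key, so the author-supplied form is never overwritten.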



Objectivity of the output values

When everyone is generating numbers, the field is quickly flooded by players who have too little at stake in the quality of their product, or too much at stake in the results themselves. Web search and social metrics don’t need to be accurate, or stable, or reproducible. They only need to be interesting at the time you’re reviewing them. Publishers who provide metrics, whether article-level indicators of “interest” or journal-level indicators of impact, use those metrics to support the value of their own journals to authors, to libraries, to the next Big Deal. They have an inescapable interest in how a metric supports the value of their content.


Clarivate Analytics is uniquely positioned to create high value because our business is dependent on the quality of the data and metrics we produce, NOT on the value of the output metric for any journal, any institution, any publisher, or any organization.


If you use metrics to support and guide your decision making, you need authoritative, independent, informed metrics based on accurate data. The curated entries in the Web of Science Core Collection, accurate data capture and standardization, and the objectivity of the resulting values ensure metrics you can trust.
