Web of Science Big Data Blooms in Bloomington

This November, we had the opportunity to partner with the KnowledgeLab at the University of Chicago and the I-School of Indiana University to host a workshop with over 40 of our colleagues and customers to discuss research approaches to using our data. More than 8,500 scholarly articles based on Web of Science data have been published in the past 15 years by researchers in myriad fields; and with the recent explosion in big data technologies, this trend is sure to continue. Working with these researchers on ways to improve both our data services and their research was the inspiration for the event.

Attendees joined us from all over North America and from as far away as Switzerland and Poland for two days of discussion, presentations, networking, and hacking with Web of Science data. The attendees included representatives from top university big data research labs, Key Opinion Leaders, development partners, and our expert colleagues. Check out our ‘lightning talks’, which offer an in-depth look at the history of, and the technology powering, Web of Science.

The Hackathon on the second day was the real highlight. We provided everyone with a custom Web of Science dataset for the event, and attendees split up into small teams to dig into the data. The teams dove in and brought all kinds of tools to bear on the dataset, including data analytics and compute power from both the KnowledgeLab Data Enclave and the IU secure data desktop, as well as visualizations in commercial tools like Tableau. In the collaborative spirit of the event, folks shared their own custom code from GitHub and immediately realized how much time they could save each other by sharing tools and approaches to things like disambiguation, recommendations, similarity metrics, SQL schemas, XML parsers, matching IDs, gender identification, and data format conversions.

We had wide-ranging discussions on many topics throughout the workshop, but the following stand out as perhaps the top takeaways and most interesting food for thought:

Our customers use multiple datasets (full-text publisher data, usage data, PubMed, and other freely available metadata), but the Web of Science dataset is the standard they rely on for the broadest, deepest, and most authoritative record of scholarly research output. It is the scale, scope, and thoroughness of Web of Science that make it an essential big data research tool.
Attendees of the event represented a Data Scientist Researcher user persona, with a focus on technical software tools and data needs quite different from those of our more typical end-user researcher or librarian. The Web of Science product team is continually working to accommodate the needs of a broad range of customers and use cases.
Although many of the attendees work for “competing” labs and universities, the interactions were highly open and collaborative, and several teams started new projects that they will continue after the event.
Many of our users are duplicating effort on core data cleanup tasks such as disambiguation, ID matching, parsing, and data format conversion. Disambiguation of author names is perhaps the most substantial core data science challenge. We are working closely with this community to find ways to better support them, saving them time and money and enabling them to focus on their core research, which is the value Clarivate Analytics always strives to deliver to our customers.

This was a great two days and many people walked away with new collaborations started and plans to follow up and continue to learn from one another—connections they would very likely not have made without the chance to meet at this workshop.

This event is just one example of the way the Clarivate Analytics Web of Science team is engaging with our customers to build a vibrant user community. I would love to hear from other Web of Science users on ways that we can work with and support them in their citation-based big data analytical research projects. Feel free to reach out to me via email for further information.