Project Overview

Excerpt from “Cat’s Adventures” by People Too [1].

Introduction

This project was inspired by a request to analyze how author affiliations and paper topics have changed over time in the proceedings of the International Society for Music Information Retrieval (ISMIR) conference. My motivations are both practical (the requester suggested using dblp to access the paper metadata) and born of a genuine curiosity to learn more about RDF data and how to work with it using SPARQL. SPARQL is an RDF query language that leverages the triple structure (subject, predicate, object) of RDF data to enable targeted, complex pattern matching across interconnected data sources [2, 3]. Using the ISMIR conference proceedings as a case study, this project investigates how dblp’s new SPARQL query service mediates access to the papers published in the 2000-2024 proceedings volumes. To do this, RDF triples were retrieved from dblp’s Knowledge Graph (dblp KG) via SPARQL. The query results were then transformed into structured tables using Python’s pandas library and merged on shared key properties. The report that follows details the steps taken to retrieve and transform the data, then gives an overview of the dataset composition and property descriptions. The dataset is intended for public use, and feedback is welcome!
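To make the triple idea concrete, here is a toy sketch in plain Python. It is not SPARQL and it does not use dblp data: every identifier and value below is invented for illustration. The hypothetical `match` helper treats `None` as a wildcard, loosely like a SPARQL variable, and `to_rows` groups triples by subject, which approximates the reshaping that the report performs with pandas.

```python
from collections import defaultdict

# Hypothetical (subject, predicate, object) triples; not real dblp records.
TRIPLES = [
    ("paper:1", "title", "Example Paper A"),
    ("paper:1", "year", "2000"),
    ("paper:1", "authoredBy", "person:x"),
    ("paper:2", "title", "Example Paper B"),
    ("paper:2", "year", "2001"),
    ("paper:2", "authoredBy", "person:x"),
    ("person:x", "name", "Example Author"),
]


def match(triples, s=None, p=None, o=None):
    """Return triples that fit the pattern; None matches anything."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def to_rows(triples):
    """Group triples by subject so each subject becomes one row.

    Simplification: a repeated predicate overwrites the earlier value,
    so multi-valued properties (e.g. several authors) are not handled.
    """
    rows = defaultdict(dict)
    for s, p, o in triples:
        rows[s][p] = o
    return dict(rows)


# Pattern (?paper, authoredBy, person:x): which papers did person:x write?
papers = [s for s, _, _ in match(TRIPLES, p="authoredBy", o="person:x")]

# Join each paper to its author's name through the shared "authoredBy" key.
rows = to_rows(TRIPLES)
merged = [
    {
        "paper": pid,
        "title": rows[pid]["title"],
        "author": rows[rows[pid]["authoredBy"]]["name"],
    }
    for pid in papers
]
print(merged)
```

A real query engine matches several patterns at once and handles multi-valued properties; the sketch only shows why a shared key makes the join possible.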

ISMIR Conference Proceedings

The Origins of ISMIR

On August 13, 1999, at a bar in Berkeley, California, J. Stephen Downie and Donald Byrd discovered they shared an interest in organizing a large-scale gathering that would bring together people from different fields to collaborate and explore emerging challenges in music information retrieval (MIR) [4, 5, 6]. Originally called the International Symposium on Music Information Retrieval, the first conference took place in October 2000 in Plymouth, Massachusetts. The name was changed in 2002 (from “Symposium” to “Conference”) and again in 2008 (from “Conference” to “Society”), when the group was formally established as the International Society for Music Information Retrieval. The conference has since been held every fall in locations around the world and draws participants from a wide range of disciplines, including computer science, engineering, library science, musicology, psychology, and many others [5, 7, 8, 9].

Discovery & Access of ISMIR Papers

Between 2000 and 2024, an estimated 2,494 peer-reviewed papers were published in the annual conference proceedings. The papers cover a broad range of MIR-related topics, such as signal processing techniques, music representation models, musicology perspectives, and more. The proceedings have always been made freely available through multiple sources, including open-access bibliographic databases (e.g., dblp), research repositories (e.g., Zenodo), AI-powered academic search engines (e.g., Semantic Scholar), individual conference websites (e.g., MUSIC IR 2000), the ISMIR conference archive portal, and a GitHub repository with version-controlled metadata [7, 10, 11, 12, 13]. While these sources are freely accessible, they differ in the level of detail they provide and the functionality they offer, which may affect their usefulness depending on specific research needs.

For example, dblp offers highly structured, standardized metadata for basic publication information (e.g., titles, authors, venues, and dates), but it does not store full texts [14]. Instead, it links to external repositories like the ISMIR archive and Zenodo. While dblp’s consistent metadata format simplifies bibliographic searches, it lacks additional details such as abstracts and keywords. In contrast, Zenodo provides full-text hosting (including support for datasets and software) and DOI assignment for each record. However, because it relies on voluntary uploads, it does not automatically index external records and offers limited author disambiguation. Authors can link their records to ORCID IDs to mitigate name ambiguity, but this is optional [15]. This report primarily explores the steps taken to access the ISMIR paper metadata on dblp via SPARQL. A forthcoming follow-up report will assess potential enhancements to the dataset, such as incorporating paper abstracts available from other sources, along with other ideas for future work.

dblp Access Methods

The dblp computer science bibliography is an online database that provides curated bibliographic metadata for computer science publications. Launched in 1993 as a small experimental web server at the University of Trier, dblp has since evolved into a branch of Schloss Dagstuhl – Leibniz-Zentrum für Informatik [14, 16]. As of January 2025, the database indexes over 7.7 million publications [17]. All publication, author, and venue records are freely available and can be accessed through multiple methods:

Search page and API: Visitors can perform basic keyword searches via dblp’s website. The search API provides structured data for publications, persons, and venues in XML and JSON formats [18, 19].
Linked Open Data API: Lightweight RDF data is available via the Linked Open Data API for small requests involving individual bibliographic records and author profiles [20].
Direct website export: A drop-down menu supports direct data downloads from dblp’s webpages for any specific page or record (up to 1,000 items per export). This option doesn’t require scripting or SPARQL knowledge but can become cumbersome for large-scale exports.
Daily updated RDF and XML dumps: The entire dblp database is available to download in RDF (RDF/XML, N-Triples, and Turtle) and XML formats. These data dumps are updated daily but unversioned [20, 21].
Monthly RDF and XML snapshots: Monthly snapshots of the database are also available, archived with persistent URLs for citation and research purposes. While the daily data dumps and persistent snapshots provide direct access, both approaches require additional processing to parse and extract relevant records [16, 20, 21].
Public SPARQL query service: As of September 9, 2024, visitors can explore the dblp Knowledge Graph (dblp KG) through a public SPARQL endpoint. This option provides a fully semantic view of dblp’s bibliographic data, enhanced with OpenCitations for advanced querying and linked data integration [16]. All query results can be downloaded in CSV or TSV formats. While SPARQL is often considered an advanced method for data retrieval, dblp offers a guided tour and interactive user interface with embedded queries to help those without prior experience [22, 23, 24].
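As a rough sketch of the SPARQL option, the snippet below builds a request that follows the standard SPARQL protocol (a `query` parameter over HTTP GET) and flattens the standard SPARQL JSON result format into plain rows. Several details are assumptions not stated in this report and should be checked against dblp's documentation: the endpoint URL, the `dblp:` schema prefix, the property names in the query, and the ISMIR stream IRI. The helper names are hypothetical, and this is not necessarily the query used to build the dataset.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint URL; verify against dblp's SPARQL service documentation.
ENDPOINT = "https://sparql.dblp.org/sparql"

# Assumed schema terms and stream IRI; illustrative only.
QUERY = """
PREFIX dblp: <https://dblp.org/rdf/schema#>
SELECT ?paper ?title ?year WHERE {
  ?paper dblp:publishedInStream <https://dblp.org/streams/conf/ismir> ;
         dblp:title ?title ;
         dblp:yearOfPublication ?year .
}
LIMIT 5
"""


def build_request(query, endpoint=ENDPOINT):
    """Build a GET request that asks for SPARQL JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    headers = {"Accept": "application/sparql-results+json"}
    return urllib.request.Request(url, headers=headers)


def bindings_to_rows(results):
    """Flatten a SPARQL JSON result document into a list of dicts.

    Variables left unbound in a solution are reported as None.
    """
    variables = results["head"]["vars"]
    return [
        {var: binding.get(var, {}).get("value") for var in variables}
        for binding in results["results"]["bindings"]
    ]


def run_query(query, endpoint=ENDPOINT):
    """Send the query (requires network access) and return rows."""
    request = build_request(query, endpoint)
    with urllib.request.urlopen(request, timeout=30) as response:
        return bindings_to_rows(json.load(response))


# Offline check using a hand-written document in the standard result format.
SAMPLE = {
    "head": {"vars": ["paper", "title", "year"]},
    "results": {"bindings": [
        {
            "paper": {"type": "uri", "value": "https://example.org/rec/1"},
            "title": {"type": "literal", "value": "Example Title"},
            "year": {"type": "literal", "value": "2000"},
        },
    ]},
}
print(bindings_to_rows(SAMPLE))
```

Calling `run_query(QUERY)` would contact the live service; the rows it returns are dictionaries that can be passed directly to `pandas.DataFrame` for the table-building step described in the introduction.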