Data Retrieval

Cropped section from "Cat's Adventures" by People Too. A miniature sheet music illustration of several cats having a blast while skiing and snowboarding outside. There's a cat in the final measure of the sheet music carrying a glowing star with a crescent moon shown in the treble clef overhead.

Excerpt from “Cat’s Adventures” by People Too [1].

“We are convinced that the best way to learn how to query a new knowledge graph is to get right into it and play with it…Be bold! Even if your query is too demanding for our server to compute a result, you can’t really break anything” [24].

dblp’s Public SPARQL Service

Fetching RDF triples from dblp’s Knowledge Graph via SPARQL

This next section provides an overview of the approach I used to retrieve the ISMIR papers metadata from the dblp KG via SPARQL. It includes the following:

  • Specific queries used to fetch RDF triples for distinct sets of information (e.g., proceedings, inproceedings, author metadata, and author order)

  • Links to run these queries directly in dblp’s SPARQL query service (for anyone who is interested in testing it out)

  • Access to the raw datasets, including output previews and full download links

Later sections in this report include additional details, such as the various transformations applied to pivot and merge the datasets. A data dictionary with property definitions from the dblp RDF schema is shared to provide a general overview of what the final dataset contains [25]. There are also several helpful external resources discussing dblp in RDF available on blog.dblp.org (many of which are also cited in the report bibliography) as well as a dblp KG Tutorial for readers interested in learning more about the dblp Knowledge Graph used to access the ISMIR papers data [24].

Query 1: Proceedings Metadata

SPARQL Query #1: Retrieve ISMIR Proceedings Metadata from dblp

Broadly speaking, ISMIR conference records can be differentiated into two main categories:

  1. Proceedings: The “parent” records that represent the entire conference volume.

  2. Inproceedings: The individual papers that are part of that volume.

The queries used to retrieve the RDF triples for both the ISMIR proceedings and inproceedings records are closely related. The property dblp:publishedInStream selects all records in the ISMIR conference stream, and the dblp:bibtexType property then narrows these to a specific record type (e.g., proceedings or inproceedings). The final triple pattern, ?record ?property ?value, returns every property associated with each matching resource (an ISMIR publication in this case) along with its corresponding values.

Query #1 retrieves all triples for ISMIR proceedings published during 2000-2024 (n=551 results).

PREFIX dblp: <https://dblp.org/rdf/schema#>
PREFIX bibtex: <http://purl.org/net/nknouf/ns/bibtex#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?record ?property ?value
WHERE {
  ?record dblp:publishedInStream <https://dblp.org/streams/conf/ismir> ; 
          dblp:yearOfPublication ?year ;
          dblp:bibtexType bibtex:Proceedings .
  FILTER(?year >= "2000"^^xsd:gYear && ?year <= "2024"^^xsd:gYear)
  ?record ?property ?value .
}

Run this query (long URL; full query string runs in the dblp SPARQL query service)
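
As a rough illustration of how a "Run this query" link could be assembled, the sketch below fills in the record type and year range of the query and URL-encodes the result. The endpoint address `https://sparql.dblp.org/sparql` is an assumption rather than a detail taken from this report, and the helper names `build_query` and `build_url` are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint address; check the dblp SPARQL service documentation.
ENDPOINT = "https://sparql.dblp.org/sparql"

QUERY_TEMPLATE = """\
PREFIX dblp: <https://dblp.org/rdf/schema#>
PREFIX bibtex: <http://purl.org/net/nknouf/ns/bibtex#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?record ?property ?value
WHERE {{
  ?record dblp:publishedInStream <https://dblp.org/streams/conf/ismir> ;
          dblp:yearOfPublication ?year ;
          dblp:bibtexType bibtex:{bibtex_type} .
  FILTER(?year >= "{start}"^^xsd:gYear && ?year <= "{end}"^^xsd:gYear)
  ?record ?property ?value .
}}
"""


def build_query(bibtex_type: str, start: int = 2000, end: int = 2024) -> str:
    """Fill the template with a BibTeX type and an inclusive year range."""
    return QUERY_TEMPLATE.format(bibtex_type=bibtex_type, start=start, end=end)


def build_url(query: str, endpoint: str = ENDPOINT) -> str:
    """URL-encode the query as the `query` parameter of a GET request."""
    return f"{endpoint}?{urlencode({'query': query})}"


proceedings_url = build_url(build_query("Proceedings"))
inproceedings_url = build_url(build_query("Inproceedings"))
print(proceedings_url[:80] + "...")
```

Keeping the record type as a parameter means the same template can produce both the proceedings and the inproceedings variants of the query.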

Query #1 Output

Table 1. Example of Properties and Values for 2000 Proceedings.

| record | property | value |
| --- | --- | --- |
| https://dblp.org/rec/conf/ismir/2000 | http://purl.org/spar/datacite/hasIdentifier | _:bn10203589 |
| https://dblp.org/rec/conf/ismir/2000 | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | https://dblp.org/rdf/schema#Editorship |
| https://dblp.org/rec/conf/ismir/2000 | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | https://dblp.org/rdf/schema#Publication |
| https://dblp.org/rec/conf/ismir/2000 | http://www.w3.org/2000/01/rdf-schema#label | ISMIR 2000, 1st International Symposium on Music Information Retrieval, Plymouth, Massachusetts, USA, October 23-25, 2000, Proceedings (2000) |
| https://dblp.org/rec/conf/ismir/2000 | https://dblp.org/rdf/schema#bibtexType | http://purl.org/net/nknouf/ns/bibtex#Proceedings |
| https://dblp.org/rec/conf/ismir/2000 | https://dblp.org/rdf/schema#listedOnTocPage | https://dblp.org/db/conf/ismir/ismir2000 |
| https://dblp.org/rec/conf/ismir/2000 | https://dblp.org/rdf/schema#numberOfCreators | 0 |
| https://dblp.org/rec/conf/ismir/2000 | https://dblp.org/rdf/schema#publishedInStream | https://dblp.org/streams/conf/ismir |
| https://dblp.org/rec/conf/ismir/2000 | https://dblp.org/rdf/schema#title | ISMIR 2000, 1st International Symposium on Music Information Retrieval, Plymouth, Massachusetts, USA, October 23-25, 2000, Proceedings |
| https://dblp.org/rec/conf/ismir/2000 | https://dblp.org/rdf/schema#yearOfPublication | 2000 |

Total dataset size: 551 rows × 3 columns

Download the full dataset: dblp_ismir_proceedings_raw.csv
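
Because each row of the raw output is a single (record, property, value) triple, one record's metadata is spread over many rows. The sketch below shows one possible way to pivot this long format into a single entry per record, using three rows copied from Table 1. The function names `local_name` and `pivot_triples`, and the choice to store every property as a list (a property such as rdf:type can repeat), are illustrative assumptions and not necessarily the transformation applied later in this report.

```python
from collections import defaultdict

# Sample rows copied from Table 1: (record, property, value).
rows = [
    ("https://dblp.org/rec/conf/ismir/2000",
     "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
     "https://dblp.org/rdf/schema#Editorship"),
    ("https://dblp.org/rec/conf/ismir/2000",
     "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
     "https://dblp.org/rdf/schema#Publication"),
    ("https://dblp.org/rec/conf/ismir/2000",
     "https://dblp.org/rdf/schema#yearOfPublication",
     "2000"),
]


def local_name(iri: str) -> str:
    """Return the part of an IRI after the last '#' or '/'."""
    return iri.rsplit("#", 1)[-1].rsplit("/", 1)[-1]


def pivot_triples(triples):
    """Group triples into {record: {property: [values, ...]}}."""
    wide = defaultdict(lambda: defaultdict(list))
    for record, prop, value in triples:
        wide[record][local_name(prop)].append(value)
    # Convert nested defaultdicts to plain dicts for predictable lookups.
    return {rec: dict(props) for rec, props in wide.items()}


wide = pivot_triples(rows)
for prop, values in wide["https://dblp.org/rec/conf/ismir/2000"].items():
    print(prop, values)
```

Shortening property IRIs to their local names keeps the keys readable, at the cost of possible collisions if two namespaces share a local name; keeping the full IRI as the key avoids that.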

Query 2: Inproceedings Metadata

SPARQL Query #2: Retrieve ISMIR Inproceedings Metadata

To retrieve metadata for the inproceedings ISMIR records, simply change the BibTeX type from bibtex:Proceedings to bibtex:Inproceedings in the SPARQL query. This is the primary dataset I am interested in (n=65,567 results).

PREFIX dblp: <https://dblp.org/rdf/schema#>
PREFIX bibtex: <http://purl.org/net/nknouf/ns/bibtex#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?record ?property ?value
WHERE {
  ?record dblp:publishedInStream <https://dblp.org/streams/conf/ismir> ;
          dblp:yearOfPublication ?year ;
          dblp:bibtexType bibtex:Inproceedings .
  FILTER(?year >= "2000"^^xsd:gYear && ?year <= "2024"^^xsd:gYear)
  ?record ?property ?value .
}

Run this query (long URL, with full query string)

Similar to Query #1, the results for Query #2 list all properties associated with ISMIR inproceedings records from 2000-2024, with columns for record, property, and value. Each entry in the record column identifies a specific inproceedings record (e.g., an individual paper from a conference volume, such as the ISMIR 2020 paper by Woo-Sung Choi et al. shown in Table 2).

Query #2 Output

Table 2. Example of properties and values for a single ISMIR Paper (Inproceedings).

| record | property | value |
| --- | --- | --- |
| https://dblp.org/rec/conf/ismir/00010CLJ20 | http://purl.org/spar/datacite/hasIdentifier | _:bn10198088 |
| https://dblp.org/rec/conf/ismir/00010CLJ20 | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | https://dblp.org/rdf/schema#Inproceedings |
| https://dblp.org/rec/conf/ismir/00010CLJ20 | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | https://dblp.org/rdf/schema#Publication |
| https://dblp.org/rec/conf/ismir/00010CLJ20 | http://www.w3.org/2000/01/rdf-schema#label | Woo-Sung Choi et al.: Investigating U-Nets with various Intermediate Blocks for Spectrogram-based Singing Voice Separation. (2020) |
| https://dblp.org/rec/conf/ismir/00010CLJ20 | https://dblp.org/rdf/schema#authoredBy | https://dblp.org/pid/132/3591 |
| https://dblp.org/rec/conf/ismir/00010CLJ20 | https://dblp.org/rdf/schema#authoredBy | https://dblp.org/pid/43/5538 |
| https://dblp.org/rec/conf/ismir/00010CLJ20 | https://dblp.org/rdf/schema#authoredBy | https://dblp.org/pid/45/3571 |
| https://dblp.org/rec/conf/ismir/00010CLJ20 | https://dblp.org/rdf/schema#authoredBy | https://dblp.org/pid/65/2574-3 |
| https://dblp.org/rec/conf/ismir/00010CLJ20 | https://dblp.org/rdf/schema#authoredBy | https://dblp.org/pid/75/5350-1 |
| https://dblp.org/rec/conf/ismir/00010CLJ20 | https://dblp.org/rdf/schema#bibtexType | http://purl.org/net/nknouf/ns/bibtex#Inproceedings |
| https://dblp.org/rec/conf/ismir/00010CLJ20 | https://dblp.org/rdf/schema#createdBy | https://dblp.org/pid/132/3591 |
| https://dblp.org/rec/conf/ismir/00010CLJ20 | https://dblp.org/rdf/schema#createdBy | https://dblp.org/pid/43/5538 |
| https://dblp.org/rec/conf/ismir/00010CLJ20 | https://dblp.org/rdf/schema#createdBy | https://dblp.org/pid/45/3571 |
| https://dblp.org/rec/conf/ismir/00010CLJ20 | https://dblp.org/rdf/schema#createdBy | https://dblp.org/pid/65/2574-3 |
| https://dblp.org/rec/conf/ismir/00010CLJ20 | https://dblp.org/rdf/schema#createdBy | https://dblp.org/pid/75/5350-1 |
| https://dblp.org/rec/conf/ismir/00010CLJ20 | https://dblp.org/rdf/schema#documentPage | http://archives.ismir.net/ismir2020/paper/000046.pdf |
| https://dblp.org/rec/conf/ismir/00010CLJ20 | https://dblp.org/rdf/schema#hasSignature | _:bn10198089 |
| https://dblp.org/rec/conf/ismir/00010CLJ20 | https://dblp.org/rdf/schema#hasSignature | _:bn10198090 |
| https://dblp.org/rec/conf/ismir/00010CLJ20 | https://dblp.org/rdf/schema#hasSignature | _:bn10198091 |
| https://dblp.org/rec/conf/ismir/00010CLJ20 | https://dblp.org/rdf/schema#hasSignature | _:bn10198092 |
| https://dblp.org/rec/conf/ismir/00010CLJ20 | https://dblp.org/rdf/schema#hasSignature | _:bn10198093 |
| https://dblp.org/rec/conf/ismir/00010CLJ20 | https://dblp.org/rdf/schema#listedOnTocPage | https://dblp.org/db/conf/ismir/ismir2020 |
| https://dblp.org/rec/conf/ismir/00010CLJ20 | https://dblp.org/rdf/schema#numberOfCreators | 5 |
| https://dblp.org/rec/conf/ismir/00010CLJ20 | https://dblp.org/rdf/schema#pagination | 192-198 |
| https://dblp.org/rec/conf/ismir/00010CLJ20 | https://dblp.org/rdf/schema#primaryDocumentPage | http://archives.ismir.net/ismir2020/paper/000046.pdf |
| https://dblp.org/rec/conf/ismir/00010CLJ20 | https://dblp.org/rdf/schema#publishedAsPartOf | https://dblp.org/rec/conf/ismir/2020 |
| https://dblp.org/rec/conf/ismir/00010CLJ20 | https://dblp.org/rdf/schema#publishedIn | ISMIR |
| https://dblp.org/rec/conf/ismir/00010CLJ20 | https://dblp.org/rdf/schema#publishedInBook | ISMIR |
| https://dblp.org/rec/conf/ismir/00010CLJ20 | https://dblp.org/rdf/schema#publishedInStream | https://dblp.org/streams/conf/ismir |
| https://dblp.org/rec/conf/ismir/00010CLJ20 | https://dblp.org/rdf/schema#title | Investigating U-Nets with various Intermediate Blocks for Spectrogram-based Singing Voice Separation. |
| https://dblp.org/rec/conf/ismir/00010CLJ20 | https://dblp.org/rdf/schema#yearOfEvent | 2020 |
| https://dblp.org/rec/conf/ismir/00010CLJ20 | https://dblp.org/rdf/schema#yearOfPublication | 2020 |

Total dataset size: 65,567 rows × 3 columns

Download the full dataset: dblp_ismir_inproceedings_raw.csv
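
For readers who download the raw file, the following sketch shows one way to get a quick overview of it: counting the distinct records and how often each property occurs. It reads from an in-memory sample (rows copied from Table 2) so that it runs on its own. The header names `record`, `property`, and `value` are assumed to match the query's SELECT variables, and `summarize` is a hypothetical helper.

```python
import csv
import io
from collections import Counter

# Stand-in for dblp_ismir_inproceedings_raw.csv; rows copied from Table 2.
# The header names are assumed to match the query's SELECT variables.
SAMPLE_CSV = """\
record,property,value
https://dblp.org/rec/conf/ismir/00010CLJ20,https://dblp.org/rdf/schema#authoredBy,https://dblp.org/pid/132/3591
https://dblp.org/rec/conf/ismir/00010CLJ20,https://dblp.org/rdf/schema#authoredBy,https://dblp.org/pid/43/5538
https://dblp.org/rec/conf/ismir/00010CLJ20,https://dblp.org/rdf/schema#pagination,192-198
https://dblp.org/rec/conf/ismir/00010CLJ20,https://dblp.org/rdf/schema#yearOfPublication,2020
"""


def summarize(csv_file):
    """Count distinct records and how often each property occurs."""
    records = set()
    property_counts = Counter()
    for row in csv.DictReader(csv_file):
        records.add(row["record"])
        property_counts[row["property"]] += 1
    return len(records), property_counts


n_records, counts = summarize(io.StringIO(SAMPLE_CSV))
print(n_records)
for prop, n in counts.most_common():
    print(n, prop)
```

To run this against the downloaded file instead of the sample, the file can be opened with `open("dblp_ismir_inproceedings_raw.csv", newline="", encoding="utf-8")` and passed to `summarize` in place of the `io.StringIO` object.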

Query 3: Authors Metadata

SPARQL Query #3: Retrieve ISMIR Authors Metadata

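To retrieve metadata for the authors of the ISMIR inproceedings records, the query extends Query #2 in two ways: it follows each record's dblp:hasSignature link to the signature's dblp:signatureCreator, and it then retrieves all triples for that creator rather than for the record itself (n=81,153 results).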

PREFIX dblp: <https://dblp.org/rdf/schema#>
PREFIX bibtex: <http://purl.org/net/nknouf/ns/bibtex#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT DISTINCT ?record ?creator ?property ?value
WHERE {
  # Start with Inproceedings records in the ISMIR stream
  ?record dblp:publishedInStream <https://dblp.org/streams/conf/ismir> ;
          dblp:yearOfPublication ?year ;
          dblp:bibtexType bibtex:Inproceedings ;
          dblp:hasSignature ?signature .
  FILTER(?year >= "2000"^^xsd:gYear && ?year <= "2024"^^xsd:gYear)
  
  # Get the creator from the signature
  ?signature dblp:signatureCreator ?creator .
  
  # Retrieve all available triples for this creator entity
  ?creator ?property ?value .
}
ORDER BY ?record ?creator ?property

Run this query (long URL, with full query string)

Query #3 lists all properties associated with ISMIR authors from 2000-2024 (see Table 3). In addition to the record, property, and value columns, the output includes a creator column whose entries identify an individual inproceedings author. For example, Table 3 lists all properties and associated metadata for Francisco J. Castellanos, as linked to a specific publication [26].

Query #3 Output

Table 3. Example of properties and values for an ISMIR Inproceedings Author.

| record | creator | property | value |
| --- | --- | --- | --- |
| https://dblp.org/rec/conf/ismir/00010F23 | https://dblp.org/pid/216/1535 | http://purl.org/spar/datacite/hasIdentifier | _:bn29345491 |
| https://dblp.org/rec/conf/ismir/00010F23 | https://dblp.org/pid/216/1535 | http://purl.org/spar/datacite/hasIdentifier | _:bn29345492 |
| https://dblp.org/rec/conf/ismir/00010F23 | https://dblp.org/pid/216/1535 | http://purl.org/spar/datacite/hasIdentifier | _:bn29345494 |
| https://dblp.org/rec/conf/ismir/00010F23 | https://dblp.org/pid/216/1535 | http://purl.org/spar/datacite/hasIdentifier | _:bn29345495 |
| https://dblp.org/rec/conf/ismir/00010F23 | https://dblp.org/pid/216/1535 | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | https://dblp.org/rdf/schema#Creator |
| https://dblp.org/rec/conf/ismir/00010F23 | https://dblp.org/pid/216/1535 | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | https://dblp.org/rdf/schema#Person |
| https://dblp.org/rec/conf/ismir/00010F23 | https://dblp.org/pid/216/1535 | http://www.w3.org/2000/01/rdf-schema#label | Francisco J. Castellanos 0001 |
| https://dblp.org/rec/conf/ismir/00010F23 | https://dblp.org/pid/216/1535 | http://www.w3.org/2002/07/owl#sameAs | https://orcid.org/0000-0001-9949-5522 |
| https://dblp.org/rec/conf/ismir/00010F23 | https://dblp.org/pid/216/1535 | https://dblp.org/rdf/schema#affiliation | University of Alicante, Department of Software and Computing Systems, Spain |
| https://dblp.org/rec/conf/ismir/00010F23 | https://dblp.org/pid/216/1535 | https://dblp.org/rdf/schema#creatorName | Francisco J. Castellanos |
| https://dblp.org/rec/conf/ismir/00010F23 | https://dblp.org/pid/216/1535 | https://dblp.org/rdf/schema#homepage | https://cvnet.cpd.ua.es/Directorio/Home/FichaPersona/114596 |
| https://dblp.org/rec/conf/ismir/00010F23 | https://dblp.org/pid/216/1535 | https://dblp.org/rdf/schema#homepage | https://cvnet.cpd.ua.es/curriculum-breve/es/castellanos-regalado-francisco-jose/10063 |
| https://dblp.org/rec/conf/ismir/00010F23 | https://dblp.org/pid/216/1535 | https://dblp.org/rdf/schema#orcid | https://orcid.org/0000-0001-9949-5522 |
| https://dblp.org/rec/conf/ismir/00010F23 | https://dblp.org/pid/216/1535 | https://dblp.org/rdf/schema#primaryAffiliation | University of Alicante, Department of Software and Computing Systems, Spain |
| https://dblp.org/rec/conf/ismir/00010F23 | https://dblp.org/pid/216/1535 | https://dblp.org/rdf/schema#primaryCreatorName | Francisco J. Castellanos |
| https://dblp.org/rec/conf/ismir/00010F23 | https://dblp.org/pid/216/1535 | https://dblp.org/rdf/schema#primaryHomepage | https://cvnet.cpd.ua.es/curriculum-breve/es/castellanos-regalado-francisco-jose/10063 |
| https://dblp.org/rec/conf/ismir/00010F23 | https://dblp.org/pid/216/1535 | https://dblp.org/rdf/schema#webpage | https://es.linkedin.com/in/francisco-josé-castellanos-regalado-b30095226 |
| https://dblp.org/rec/conf/ismir/00010F23 | https://dblp.org/pid/216/1535 | https://dblp.org/rdf/schema#webpage | https://scholar.google.com/citations?user=ugPQHzEAAAAJ |

Total dataset size: 81,154 rows × 4 columns

Download the full dataset: dblp_ismir_authors_raw.csv
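
Because Query #3 returns a creator's triples once for every record that creator is linked to, the same author metadata can appear many times in the output. The sketch below shows one possible way to collapse those rows into a single profile per creator. The first record's rows are copied from Table 3; the second record IRI is a made-up placeholder included only to illustrate the repetition, and `author_profiles` is a hypothetical helper name.

```python
from collections import defaultdict

CREATOR = "https://dblp.org/pid/216/1535"
SCHEMA = "https://dblp.org/rdf/schema#"

# (record, creator, property, value). The first record's rows are copied
# from Table 3; the second record IRI is a hypothetical placeholder that
# only illustrates the repetition across records.
rows = [
    ("https://dblp.org/rec/conf/ismir/00010F23", CREATOR,
     SCHEMA + "creatorName", "Francisco J. Castellanos"),
    ("https://dblp.org/rec/conf/ismir/00010F23", CREATOR,
     SCHEMA + "orcid", "https://orcid.org/0000-0001-9949-5522"),
    ("https://example.org/hypothetical-record", CREATOR,
     SCHEMA + "creatorName", "Francisco J. Castellanos"),
    ("https://example.org/hypothetical-record", CREATOR,
     SCHEMA + "orcid", "https://orcid.org/0000-0001-9949-5522"),
]


def author_profiles(author_rows):
    """Collapse rows to {creator: {property: sorted unique values}}."""
    seen = defaultdict(lambda: defaultdict(set))
    for _record, creator, prop, value in author_rows:
        seen[creator][prop].add(value)
    return {
        creator: {prop: sorted(values) for prop, values in props.items()}
        for creator, props in seen.items()
    }


profiles = author_profiles(rows)
print(profiles[CREATOR][SCHEMA + "creatorName"])
```

Collecting values in a set keeps genuinely multi-valued properties (such as the two homepage entries in Table 3) while discarding exact repeats.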

Query 4: Author Order (dblp Signatures)

SPARQL Query #4: Retrieve ISMIR Inproceedings Author Order (dblp Signatures)

The author order for the ISMIR papers can be retrieved with a separate query that extracts the signature information dblp stores for each record and its authors. Because the broader inproceedings query (Query #2) returns a large number of property values per record, isolating the signature data (n=7,726 results) made it easier to process and later merge with record-level details. For comparison, Table 5 provides an example of the raw signature output for a single paper, alongside a snapshot of the author names as they appear in the document (Figure 1) [27].

PREFIX dblp: <https://dblp.org/rdf/schema#>
PREFIX bibtex: <http://purl.org/net/nknouf/ns/bibtex#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?record ?creator ?ordinal
WHERE {
  ?record dblp:publishedInStream <https://dblp.org/streams/conf/ismir> ;
          dblp:yearOfPublication ?year ;
          dblp:bibtexType bibtex:Inproceedings ;
          dblp:hasSignature ?signature .
  FILTER(?year >= "2000"^^xsd:gYear && ?year <= "2024"^^xsd:gYear)
  ?signature dblp:signatureCreator ?creator ;
             dblp:signatureOrdinal ?ordinal .
}
ORDER BY ?record ?ordinal

Run this query (long URL, with full query string)

Query #4 Output

Table 4. Example of the raw signature data for ISMIR inproceedings authors.

| record | creator | ordinal |
| --- | --- | --- |
| https://dblp.org/rec/conf/ismir/00010CLJ20 | https://dblp.org/pid/75/5350-1 | 1 |
| https://dblp.org/rec/conf/ismir/00010CLJ20 | https://dblp.org/pid/65/2574-3 | 2 |
| https://dblp.org/rec/conf/ismir/00010CLJ20 | https://dblp.org/pid/132/3591 | 3 |
| https://dblp.org/rec/conf/ismir/00010CLJ20 | https://dblp.org/pid/43/5538 | 4 |
| https://dblp.org/rec/conf/ismir/00010CLJ20 | https://dblp.org/pid/45/3571 | 5 |
| https://dblp.org/rec/conf/ismir/00010F23 | https://dblp.org/pid/216/1535 | 1 |
| https://dblp.org/rec/conf/ismir/00010F23 | https://dblp.org/pid/54/1887 | 2 |
| https://dblp.org/rec/conf/ismir/00010F23 | https://dblp.org/pid/f/IchiroFujinaga | 3 |
| https://dblp.org/rec/conf/ismir/0001BASL21 | https://dblp.org/pid/l/JinHaLee | 1 |
| https://dblp.org/rec/conf/ismir/0001BASL21 | https://dblp.org/pid/163/7773 | 2 |

Total dataset size: 7,726 rows × 3 columns

Download the full dataset: dblp_ismir_author_order_raw.csv
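
As an illustration of how the signature rows could be turned into an ordered author list per paper, the sketch below groups rows by record and sorts each group by ordinal. The sample rows are copied from Table 4 (deliberately shuffled), with ordinals kept as strings on the assumption that they arrive that way from a CSV file. The helper name `ordered_authors` is hypothetical.

```python
from collections import defaultdict

# (record, creator, ordinal) rows copied from Table 4, deliberately
# shuffled; ordinals are kept as strings, as they would be in a CSV.
rows = [
    ("https://dblp.org/rec/conf/ismir/00010F23", "https://dblp.org/pid/f/IchiroFujinaga", "3"),
    ("https://dblp.org/rec/conf/ismir/00010F23", "https://dblp.org/pid/216/1535", "1"),
    ("https://dblp.org/rec/conf/ismir/0001BASL21", "https://dblp.org/pid/163/7773", "2"),
    ("https://dblp.org/rec/conf/ismir/00010F23", "https://dblp.org/pid/54/1887", "2"),
    ("https://dblp.org/rec/conf/ismir/0001BASL21", "https://dblp.org/pid/l/JinHaLee", "1"),
]


def ordered_authors(signature_rows):
    """Return {record: [creator, ...]} sorted by numeric ordinal."""
    by_record = defaultdict(list)
    for record, creator, ordinal in signature_rows:
        # Cast to int so that "10" sorts after "2" rather than before it.
        by_record[record].append((int(ordinal), creator))
    return {
        record: [creator for _, creator in sorted(pairs)]
        for record, pairs in by_record.items()
    }


authors = ordered_authors(rows)
print(authors["https://dblp.org/rec/conf/ismir/00010F23"])
```

The resulting lists of creator PIDs can then be joined to author names from the authors dataset and to record-level details from the inproceedings dataset.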

Example: Comparison of Document Author Listing with Extracted Author Order Data

Todo

Revisit: Annotate snapshot so that it highlights how the extracted RDF data corresponds to what is shown in the image. Includes:

  • Map PIDs and ordinal values to author names

  • Add a note highlighting the absence of Don Byrd’s name from the dblp data provided for the paper

Snapshot of Document Author Listing

Figure 1. Screenshot of the author listings as shown in the paper [27].

Corresponding Signature Data for the Document (Extracted RDF Data)

Table 5. Query #4 Output: Example of the raw signature output for a single paper.

| record | creator | ordinal |
| --- | --- | --- |
| https://dblp.org/rec/conf/ismir/PickensBCDMS02 | https://dblp.org/pid/71/43 | 1 |
| https://dblp.org/rec/conf/ismir/PickensBCDMS02 | https://dblp.org/pid/56/4223 | 2 |
| https://dblp.org/rec/conf/ismir/PickensBCDMS02 | https://dblp.org/pid/48/5681 | 3 |
| https://dblp.org/rec/conf/ismir/PickensBCDMS02 | https://dblp.org/pid/33/82 | 4 |
| https://dblp.org/rec/conf/ismir/PickensBCDMS02 | https://dblp.org/pid/58/3427 | 5 |
| https://dblp.org/rec/conf/ismir/PickensBCDMS02 | https://dblp.org/pid/s/MarkBSandler | 6 |