Data Retrieval
Excerpt from “Cat’s Adventures” by People Too [1].¶
“We are convinced that the best way to learn how to query a new knowledge graph is to get right into it and play with it…Be bold! Even if your query is too demanding for our server to compute a result, you can’t really break anything” [24].
dblp’s Public SPARQL Service
Fetching RDF triples from dblp’s Knowledge Graph via SPARQL
This section provides an overview of the approach I used to retrieve the ISMIR paper metadata from the dblp KG via SPARQL. It includes the following:
Specific queries used to fetch RDF triples for distinct sets of information (e.g., proceedings, inproceedings, author metadata, and author order)
Links to run these queries directly in dblp’s SPARQL query service (for anyone who is interested in testing it out)
Access to the raw datasets, including output previews and full download links
Later sections in this report include additional details, such as the various transformations applied to pivot and merge the datasets. A data dictionary with property definitions from the dblp RDF schema is also shared to give a general overview of what the final dataset contains [25]. For readers interested in learning more about the dblp Knowledge Graph used to access the ISMIR papers data, blog.dblp.org hosts several helpful posts on dblp in RDF (many of which are also cited in the report bibliography), and dblp provides a dblp KG Tutorial [24].
Query 1: Proceedings Metadata
SPARQL Query #1: Retrieve ISMIR Proceedings Metadata from dblp
Broadly speaking, ISMIR conference records can be differentiated into two main categories:
Proceedings: The “parent” records that represent the entire conference volume.
Inproceedings: The individual papers that are part of that volume.
The queries used to retrieve the RDF triples for the ISMIR proceedings and inproceedings records are closely related. The property dblp:publishedInStream precisely selects all ISMIR conference records, and the dblp:bibtexType property then narrows these records to a specific type (e.g., proceedings or inproceedings). The final triple pattern, ?record ?property ?value, returns every property associated with each matching resource (an ISMIR publication in this case) together with its values.
Query #1 retrieves all triples for ISMIR proceedings published during 2000-2024 (n=551 results).
```sparql
PREFIX dblp: <https://dblp.org/rdf/schema#>
PREFIX bibtex: <http://purl.org/net/nknouf/ns/bibtex#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?record ?property ?value
WHERE {
  ?record dblp:publishedInStream <https://dblp.org/streams/conf/ismir> ;
          dblp:yearOfPublication ?year ;
          dblp:bibtexType bibtex:Proceedings .
  FILTER(?year >= "2000"^^xsd:gYear && ?year <= "2024"^^xsd:gYear)
  ?record ?property ?value .
}
```
Run this query (long URL; full query string runs in the dblp SPARQL query service)
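The "Run this query" links are long because the entire query text travels inside the URL. The following is a minimal Python sketch of how such a link could be built and decoded again. It makes two assumptions: that the query is passed in a `query` GET parameter, as the W3C SPARQL 1.1 Protocol specifies, and that `https://sparql.dblp.org/sparql` is the endpoint. Neither detail is taken from the links above. The helper names are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed endpoint (not taken from the report's links): dblp's public SPARQL service.
DBLP_SPARQL_ENDPOINT = "https://sparql.dblp.org/sparql"

# Query #1 from above, as a plain string.
QUERY_1 = """PREFIX dblp: <https://dblp.org/rdf/schema#>
PREFIX bibtex: <http://purl.org/net/nknouf/ns/bibtex#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?record ?property ?value
WHERE {
  ?record dblp:publishedInStream <https://dblp.org/streams/conf/ismir> ;
          dblp:yearOfPublication ?year ;
          dblp:bibtexType bibtex:Proceedings .
  FILTER(?year >= "2000"^^xsd:gYear && ?year <= "2024"^^xsd:gYear)
  ?record ?property ?value .
}"""


def sparql_get_url(query: str, endpoint: str = DBLP_SPARQL_ENDPOINT) -> str:
    """Build a SPARQL 1.1 Protocol GET URL that carries the full query text."""
    # urlencode percent-encodes newlines, '#', '<', '>', quotes, etc. and turns
    # spaces into '+', which is why links embedding a whole query get so long.
    return f"{endpoint}?{urlencode({'query': query})}"


def query_from_url(url: str) -> str:
    """Recover the query text from a URL built by sparql_get_url."""
    return parse_qs(urlsplit(url).query)["query"][0]
```

For example, `sparql_get_url(QUERY_1)` produces a URL that begins `https://sparql.dblp.org/sparql?query=PREFIX+dblp%3A+`. Passing that URL to `query_from_url` returns the original query text, which is a convenient way to inspect a long shared link.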
Query #1 Output
Table 1. Example of properties and values for the ISMIR 2000 proceedings.
| record | property | value |
|---|---|---|
| | | _:bn10203589 |
| | | ISMIR 2000, 1st International Symposium on Music Information Retrieval, Plymouth, Massachusetts, USA, October 23-25, 2000, Proceedings (2000) |
| | | 0 |
| | | ISMIR 2000, 1st International Symposium on Music Information Retrieval, Plymouth, Massachusetts, USA, October 23-25, 2000, Proceedings |
| | | 2000 |
Total dataset size: 551 rows × 3 columns
Download the full dataset: dblp_ismir_proceedings_raw.csv
Query 2: Inproceedings Metadata
SPARQL Query #2: Retrieve ISMIR Inproceedings Metadata
To retrieve metadata for the ISMIR inproceedings records, change the BibTeX type from bibtex:Proceedings to bibtex:Inproceedings in the SPARQL query. This is the primary dataset I am interested in (n=65,567 results).
```sparql
PREFIX dblp: <https://dblp.org/rdf/schema#>
PREFIX bibtex: <http://purl.org/net/nknouf/ns/bibtex#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?record ?property ?value
WHERE {
  ?record dblp:publishedInStream <https://dblp.org/streams/conf/ismir> ;
          dblp:yearOfPublication ?year ;
          dblp:bibtexType bibtex:Inproceedings .
  FILTER(?year >= "2000"^^xsd:gYear && ?year <= "2024"^^xsd:gYear)
  ?record ?property ?value .
}
```
Run this query (long URL, with full query string)
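Queries #1 and #2 differ only in the object of dblp:bibtexType, so both can be generated from a single template. The following is a minimal sketch. The function name `ismir_triples_query`, the whitelist of allowed types, and the year-range parameters are illustrative choices and not part of the original workflow.

```python
# Hypothetical helper: generate Query #1 or Query #2 from one template.
ALLOWED_TYPES = {"Proceedings", "Inproceedings"}

# Doubled braces are literal { } in str.format; single braces are placeholders.
TEMPLATE = """PREFIX dblp: <https://dblp.org/rdf/schema#>
PREFIX bibtex: <http://purl.org/net/nknouf/ns/bibtex#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?record ?property ?value
WHERE {{
  ?record dblp:publishedInStream <https://dblp.org/streams/conf/ismir> ;
          dblp:yearOfPublication ?year ;
          dblp:bibtexType bibtex:{bibtex_type} .
  FILTER(?year >= "{start}"^^xsd:gYear && ?year <= "{end}"^^xsd:gYear)
  ?record ?property ?value .
}}"""


def ismir_triples_query(bibtex_type: str, start: int = 2000, end: int = 2024) -> str:
    """Return the ISMIR triple-dump query for one BibTeX record type and year range."""
    if bibtex_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported bibtex type: {bibtex_type!r}")
    if start > end:
        raise ValueError("start year must not be after end year")
    return TEMPLATE.format(bibtex_type=bibtex_type, start=start, end=end)
```

Restricting `bibtex_type` to a known set keeps arbitrary text from being spliced into the query string.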
Like Query #1, Query #2 returns all properties associated with the ISMIR inproceedings records from 2000-2024, with columns for record, property, and value. Entries in the record column identify a specific inproceedings record, that is, an individual paper from a conference volume, such as the ISMIR 2020 paper by Woo-Sung Choi et al. shown in Table 2.
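In this long format, each row holds a single record, property, value triple. A paper's metadata is therefore spread across many rows, and properties such as dblp:hasSignature repeat once per author. The following minimal standard-library sketch pivots such rows into one dictionary per record. Every value is kept in a list so that multi-valued properties survive. It assumes the CSV headers match the query's SELECT variables (record, property, value). The function names are hypothetical.

```python
import csv
from collections import defaultdict
from typing import Iterable, Mapping


def pivot_triples(rows: Iterable[Mapping[str, str]]) -> dict[str, dict[str, list[str]]]:
    """Group long-format (record, property, value) rows into one dict per record.

    Each property maps to a list of its values in input order, because
    properties such as dblp:hasSignature occur once per author.
    """
    records: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for row in rows:
        records[row["record"]][row["property"]].append(row["value"])
    # Convert the nested defaultdicts to plain dicts for predictable lookups.
    return {rec: dict(props) for rec, props in records.items()}


def load_and_pivot(path: str) -> dict[str, dict[str, list[str]]]:
    """Read a CSV export with record, property, value columns and pivot it."""
    with open(path, newline="", encoding="utf-8") as handle:
        return pivot_triples(csv.DictReader(handle))
```

For instance, two hasSignature rows for the same record end up as a two-element list under that record's hasSignature key. Flattening these lists into wide columns is a separate decision that depends on the property.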
Query #2 Output
Table 2. Example of properties and values for a single ISMIR paper (inproceedings).
| record | property | value |
|---|---|---|
| | | _:bn10198088 |
| | | Woo-Sung Choi et al.: Investigating U-Nets with various Intermediate Blocks for Spectrogram-based Singing Voice Separation. (2020) |
| | | _:bn10198089 |
| | | _:bn10198090 |
| | | _:bn10198091 |
| | | _:bn10198092 |
| | | _:bn10198093 |
| | | 5 |
| | | 192-198 |
| | | ISMIR |
| | | ISMIR |
| | | Investigating U-Nets with various Intermediate Blocks for Spectrogram-based Singing Voice Separation. |
| | | 2020 |
| | | 2020 |
Total dataset size: 65,567 rows × 3 columns
Download the full dataset: dblp_ismir_inproceedings_raw.csv
Query 3: Authors Metadata
SPARQL Query #3: Retrieve ISMIR Authors Metadata
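To retrieve metadata for the authors of the ISMIR inproceedings records, Query #3 builds on Query #2. It follows each record's dblp:hasSignature link to a signature node, uses dblp:signatureCreator to reach the author (creator) entity, and then retrieves all triples for that creator (n=81,153 results).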
```sparql
PREFIX dblp: <https://dblp.org/rdf/schema#>
PREFIX bibtex: <http://purl.org/net/nknouf/ns/bibtex#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT DISTINCT ?record ?creator ?property ?value
WHERE {
  # Start with Inproceedings records in the ISMIR stream
  ?record dblp:publishedInStream <https://dblp.org/streams/conf/ismir> ;
          dblp:yearOfPublication ?year ;
          dblp:bibtexType bibtex:Inproceedings ;
          dblp:hasSignature ?signature .
  FILTER(?year >= "2000"^^xsd:gYear && ?year <= "2024"^^xsd:gYear)
  # Get the creator from the signature
  ?signature dblp:signatureCreator ?creator .
  # Retrieve all available triples for this creator entity
  ?creator ?property ?value .
}
ORDER BY ?record ?creator ?property
```
Run this query (long URL, with full query string)
Query #3 lists all properties associated with the authors of ISMIR inproceedings records from 2000-2024 (see Table 3). In addition to the record, property, and value columns, the output includes a creator column whose entries identify the individual author linked to each inproceedings record. For example, Table 3 lists the properties and associated metadata for Francisco J. Castellanos as linked to a specific publication [26].
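Because ?record is part of the SELECT DISTINCT projection in Query #3, a creator's properties are returned again for every ISMIR paper linked to that creator. Building an author table therefore usually means collapsing rows on the creator column. The following is a minimal sketch. It assumes rows with Query #3's record, creator, property, and value columns. The helper name is hypothetical, and the test values are placeholders rather than real dblp identifiers.

```python
from collections import defaultdict
from typing import Iterable, Mapping


def collapse_creators(
    rows: Iterable[Mapping[str, str]],
) -> tuple[dict[str, dict[str, set[str]]], dict[str, set[str]]]:
    """Split Query #3 rows into per-creator properties and per-creator papers.

    Returns (properties, papers):
      properties[creator][property] -> distinct values across all records
      papers[creator]               -> records the creator is linked to
    Using sets drops the repeats caused by ?record being in the projection.
    """
    properties: dict[str, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))
    papers: dict[str, set[str]] = defaultdict(set)
    for row in rows:
        creator = row["creator"]
        properties[creator][row["property"]].add(row["value"])
        papers[creator].add(row["record"])
    return ({c: dict(p) for c, p in properties.items()}, dict(papers))
```

Keeping the creator-to-papers mapping separate makes it easy to count papers per author without re-reading the raw rows.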
Query #3 Output
Table 3. Example of properties and values for an ISMIR inproceedings author.
| record | creator | property | value |
|---|---|---|---|
| | | | _:bn29345491 |
| | | | _:bn29345492 |
| | | | _:bn29345494 |
| | | | _:bn29345495 |
| | | | Francisco J. Castellanos 0001 |
| | | | University of Alicante, Department of Software and Computing Systems, Spain |
| | | | Francisco J. Castellanos |
| | | | https://cvnet.cpd.ua.es/curriculum-breve/es/castellanos-regalado-francisco-jose/10063 |
| | | | University of Alicante, Department of Software and Computing Systems, Spain |
| | | | Francisco J. Castellanos |
| | | | https://cvnet.cpd.ua.es/curriculum-breve/es/castellanos-regalado-francisco-jose/10063 |
| | | | https://es.linkedin.com/in/francisco-josé-castellanos-regalado-b30095226 |
Total dataset size: 81,154 rows × 4 columns
Download the full dataset: dblp_ismir_authors_raw.csv
Query 4: Author Order (dblp Signatures)
SPARQL Query #4: Retrieve ISMIR Inproceedings Author Order (dblp Signatures)
The author order for the ISMIR papers can also be retrieved with a separate query that extracts the signature information dblp stores for each record and author. The broader inproceedings query (Query #2) returns a large number of property values for every record, so isolating the signature data (n=7,726 results) made it easier to process and later merge with record-level details. For comparison, Table 5 provides an example of the raw signature output for a single paper, alongside a snapshot of the author names as they appear in the document (Figure 1) [27].
```sparql
PREFIX dblp: <https://dblp.org/rdf/schema#>
PREFIX bibtex: <http://purl.org/net/nknouf/ns/bibtex#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?record ?creator ?ordinal
WHERE {
  # ';' continues the same subject (?record); '.' ends the block
  ?record dblp:publishedInStream <https://dblp.org/streams/conf/ismir> ;
          dblp:yearOfPublication ?year ;
          dblp:bibtexType bibtex:Inproceedings ;
          dblp:hasSignature ?signature .
  FILTER(?year >= "2000"^^xsd:gYear && ?year <= "2024"^^xsd:gYear)
  # Each signature links one creator to its position in the author list
  ?signature dblp:signatureCreator ?creator ;
             dblp:signatureOrdinal ?ordinal .
}
ORDER BY ?record ?ordinal
```
Run this query (long URL, with full query string)
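?ordinal records each author's position on the paper, so the author order for a paper can be rebuilt by grouping signature rows on record and sorting by ordinal. The following is a minimal sketch. It assumes rows with Query #4's record, creator, and ordinal columns, and the helper name is hypothetical. When the rows come from a CSV file, the ordinal arrives as text, so the sketch converts it to `int` first. Sorting the raw strings would place "10" before "2".

```python
from collections import defaultdict
from typing import Iterable, Mapping


def author_order(rows: Iterable[Mapping[str, str]]) -> dict[str, list[str]]:
    """Map each record to its creators, ordered by signature ordinal (1 = first author)."""
    grouped: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for row in rows:
        # CSV values are strings; cast so that ordinal 10 sorts after ordinal 2.
        grouped[row["record"]].append((int(row["ordinal"]), row["creator"]))
    return {
        record: [creator for _, creator in sorted(pairs)]
        for record, pairs in grouped.items()
    }
```

The resulting ordered creator lists can then be joined to the per-record details or to author names by record and creator.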
Query #4 Output
Table 4. Example of the raw signature data for ISMIR inproceedings authors.
| record | creator | ordinal |
|---|---|---|
| | | 1 |
| | | 2 |
| | | 3 |
| | | 4 |
| | | 5 |
| | | 1 |
| | | 2 |
| | | 3 |
| | | 1 |
| | | 2 |
Total dataset size: 7,726 rows × 3 columns
Download the full dataset: dblp_ismir_author_order_raw.csv
Example: Comparison of Document Author Listing with Extracted Author Order Data
Todo
Revisit: Annotate the snapshot to highlight how the extracted RDF data corresponds to what is shown in the image. This includes:
Map PIDs and ordinal values to author names
Add a note highlighting the absence of Don Byrd’s name from the dblp data provided for the paper
Snapshot of Document Author Listing
Figure 1. Screenshot of the author listings as shown in the paper [27].
Corresponding Signature Data for the Document (Extracted RDF Data)
Table 5. Query #4 output: example of the raw signature output for a single paper.
| record | creator | ordinal |
|---|---|---|
| | | 1 |
| | | 2 |
| | | 3 |
| | | 4 |
| | | 5 |
| | | 6 |