Published on Dec 02, 2020
Citation-ID proposal + schema

The reason I'd like a Citation ID is to clarify how parsing is improving over time: "C192b + 193 newly identified, C51 reclassified as line noise, intent of C22 was different across 3 sources but has now converged on 'ironic result confirmation, in passing'".  WP citations are cleanly delineated, but not persistently numbered, so it's currently hard to talk about how a specific cite changes over time. ("the citation for <string> pointing to <url>"?)

citation ID [CID]: an identifier unique to each instance of a citation within a document.  Ex: the WP article on the Internet Archive has 111 refs, and 14 repeated refs.  That would be 125 different CIDs, each with the same source (the article), a different location in the article, and various citation strings (the ref text) and targets. As a document is edited, the displayed citation numbers would change but the CIDs usually would not.
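
A minimal sketch of what a CID record might look like, assuming one record per citation instance. The field names, the use of a random UUID as the identifier, and the integer `location` are all illustrative assumptions, not part of the proposal. The point it shows is that display numbers are derived from position, while the CID stays attached to the instance.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Citation:
    source: str        # the citing document (e.g. the article)
    location: int      # position of this instance in the document (assumed: a simple offset)
    cite_string: str   # the ref text
    target: str        # what the cite points to
    # Assumption: a random UUID assigned once, when the instance is first identified.
    cid: str = field(default_factory=lambda: uuid.uuid4().hex)


def display_numbers(citations):
    """Map display number -> CID, numbering citations in document order."""
    ordered = sorted(citations, key=lambda c: c.location)
    return {n: c.cid for n, c in enumerate(ordered, start=1)}


# Two existing cites, then an edit that adds a new cite earlier in the document.
a = Citation("Internet Archive", 10, "ref text A", "https://example.org/a")
b = Citation("Internet Archive", 20, "ref text B", "https://example.org/b")
before = display_numbers([a, b])

c = Citation("Internet Archive", 5, "ref text C", "https://example.org/c")
after = display_numbers([a, b, c])

# 'a' was number 1 and is now number 2, but its CID is unchanged.
print(before[1] == after[2] == a.cid)
```

A repeated ref would simply be another `Citation` with the same `cite_string` and `target` but a different `location` and its own CID.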

overlap across citation indexes : overlap in coverage (citations per document) and in the details of equivalent cites (maps between what they capture and what they infer).  It can be harder to correctly capture inline/repeated/abbreviated cites than to get the right total set of distinct cited sources.
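
One way to make this overlap concrete, as a rough sketch: assume each index gives a list of target identifiers for one document, with one entry per citation instance. The function name and the input shape are hypothetical. Comparing distinct targets measures coverage of cited sources; comparing per-target counts surfaces disagreements about repeated cites.

```python
from collections import Counter


def compare_indexes(cites_a, cites_b):
    """Compare two indexes' cites for one document.

    cites_a / cites_b: lists of target identifiers, one entry per
    citation instance (an assumed input shape).
    """
    count_a, count_b = Counter(cites_a), Counter(cites_b)
    distinct_a, distinct_b = set(count_a), set(count_b)
    shared = distinct_a & distinct_b
    return {
        "shared_sources": shared,
        "only_in_a": distinct_a - distinct_b,
        "only_in_b": distinct_b - distinct_a,
        # Same source found by both, but a different number of instances.
        "instance_mismatch": {
            t: (count_a[t], count_b[t]) for t in shared if count_a[t] != count_b[t]
        },
    }


# Index A sees the repeated cite of "x"; index B misses it but finds "z".
result = compare_indexes(["x", "x", "y"], ["x", "y", "z"])
print(result)
```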

inferred cites : extract a cite string, reconcile its target.

citation intent : scicite and scite have taxonomies for this (+ confidence scores)

confidence scores : separate confidence that a source string is a cite, that its target has been identified, and that its intent has been well-classified.
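
A small sketch of keeping the three confidences separate rather than collapsing them into one score. The class and field names are assumptions for illustration, as is the choice of floats in [0, 1]; a stage that has not run yet is left as `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CiteConfidence:
    is_cite: float                   # confidence that the source string is a cite at all
    target: Optional[float] = None   # confidence that its target has been identified
    intent: Optional[float] = None   # confidence that its intent is well-classified

    def weakest(self) -> str:
        """Name of the lowest-scoring stage among those that have been scored."""
        scored = {k: v for k, v in vars(self).items() if v is not None}
        return min(scored, key=scored.get)


# A string that is clearly a cite, with a shaky target match.
conf = CiteConfidence(is_cite=0.9, target=0.4, intent=0.7)
print(conf.weakest())
```

Keeping the scores apart lets a change log say which stage improved for a given CID, for example that the target was reconciled while the intent classification stayed uncertain.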

