I'd like a Citation ID is to clarify how parsing is improving over time -- "C192b + 193 newly identified, C51 reclassified as line noise, intent of C22 was different across 3 source but has now converged on 'ironic result confirmation, in passing' ". WP citations are cleanly delineated, but not persistently numbered, so it's currently hard to talk about how a specific cite changes over time. ("the citation for <string> pointing to <url>"?)
citation ID [CID]: something unique to each instance of a citation within a document. Ex: the WP article on the Internet Archive has 111 refs, and 14 repeated refs. That would be 125 different CIDs, each with the same source (the article), a different location in the article, various citation strings (the ref text) and targets. As a document is edited, citation numbers would change but the CIDs usually would not.
overlap across citation indexes : overlap in coverage (citations per document) and in the details of equivalent cites (maps b/t what they capture and infer). It can be harder to properly get inline/repeated/abbreviated cites than to get the right total set of distinct cited sources.
inferred cites : extract a cite string, reconcile its target.
confidence scores : separate confidence that a source string is a cite, that its target has been identified, that its intent has been well-classified.