September 28, 2020, added following text to end of point 5:
(These style rules should not be applied if the attribute value is a URI. If the attribute value is a URI, the published URI should be used.)
At the end of 2018, the JATS4R Steering Committee was reviewing the roadmap to plan our work when we realized that several of the topics on the list to be discussed were related to defining and controlling a list of values for attributes.
One example was to define a list of approved values for @fn-type. We knew that we had been making recommendations for attribute usage and had defined some values, but we had no understanding of how and when these recommendations were created. Restricting attribute values to a controlled list has a great positive effect on the “reusefulness” of XML because future users know what to expect and have an understanding of what the values mean.
But controlled lists of attribute values can have a negative effect on adoption. A user who needs a value that is not on the controlled list has two choices: petition the restricting agent to add the value to the list of acceptable values, or ignore the recommendation. Restricting agents think that they can respond to users’ requests and keep the controlled value lists up to date, but in reality no restricting agency can work at the pace needed by XML users who are under real publication deadlines.
This does not mean that we should not write any controlled value lists into JATS4R recommendations, but we should be aware of the costs and tradeoffs of doing so.
By early 2019, JATS4R had published 11 recommendations. We reviewed these published recommendations to get an idea of what guidance we had already given. The results of this research are presented in the Appendix. Reviewing the current recommendations, we were able to classify the attribute rules we had written into:
- Identification: define the attribute:value combination as an Identifier for an object type
- Prescribed usage: prescribe how an attribute should be used in a given circumstance
- Prescribe which values may/must be used:
  - Controlled values: values must be from a list, either defined in the recommendation or from an outside authority
  - Suggested values: values should be from a list, either defined in the recommendation or from an outside authority
Types of attribute recommendations
An identification attribute is an attribute:value pair that identifies the object as a specific thing. This is important to JATS4R because we can apply specific tests to general JATS objects if we know what the object is.
For example: if we can identify a citation as a data citation (with @publication-type=“data”), then we can apply tests for rules specific to data citations to the citation element and, if necessary, all of its descendants. One such test: is there an <article-title> but no <data-title>?
Testing: There is no way to test whether an identification attribute is set appropriately. We will have to assume that the identification is correct. Then we can make other tests related to that specific object. We can also set values of other attributes related to that object, given the knowledge of that object’s identity. For example, if a <contrib> is an author (@contrib-type=”author”), we should be able to control values in another attribute on that contrib, but not other contribs.
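The identify-then-test pattern described above could be sketched as follows. This is a minimal illustration in Python using the standard library's ElementTree, not the JATS4R validator itself. The sample XML, the function name check_data_citations, and the message wording are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one data citation and one journal citation.
SAMPLE = """<ref-list>
  <ref id="r1"><element-citation publication-type="data">
    <article-title>Survey responses</article-title>
  </element-citation></ref>
  <ref id="r2"><element-citation publication-type="journal">
    <article-title>A journal paper</article-title>
  </element-citation></ref>
</ref-list>"""


def check_data_citations(root):
    """Apply a data-citation-specific test to citations identified as data."""
    messages = []
    for tag in ("element-citation", "mixed-citation"):
        for citation in root.iter(tag):
            # Trust the identification attribute; it cannot itself be verified.
            if citation.get("publication-type") != "data":
                continue
            has_article_title = citation.find(".//article-title") is not None
            has_data_title = citation.find(".//data-title") is not None
            if has_article_title and not has_data_title:
                messages.append(
                    f"<{tag}>: data citation has <article-title> but no <data-title>"
                )
    return messages


messages = check_data_citations(ET.fromstring(SAMPLE))
for message in messages:
    print(message)
```

Only the citation identified as data is examined. The journal citation is skipped even though it also has an <article-title>, because the test is specific to the identified object type.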
A prescribed usage recommendation describes how, when, and for what an attribute may, should, or must be used. The recommendation for usage of the attribute is separate from any values that we suggest or require be used. Prescribed usage should not be conflated with value lists, although most of the time prescribed usage is paired with either a controlled values or suggested values list.
For example, the recommendation for <institution-id> says to use @institution-id-type to indicate the type of ID; e.g., “orcid” or “ringgold”.
A controlled value list describes an attribute whose content must be from a list, either defined in the recommendation or from an outside authority. An attribute that has a value not in the list would be an ERROR. We can write rules and apply validations to controlled value lists on general attributes for given circumstances once we have identified the element with an identification attribute.
For example: @date-type on <pub-date>. Use value “original-publication” to indicate that the date is the original date of publication, and “update” for dates that represent published updates to the publication.
A suggested value list describes an attribute whose content should be from a list, either defined in the recommendation or from an outside authority.
For example: use “journal” or “book” as the value of @publication-type to indicate that the citation is to a journal or book, respectively. Other examples are: “letter”, “review”, “patent”, “report”, “standard”, “data”, “working-paper”. This list is not exhaustive and is sourced from the JATS guidelines. This is not a limited field so others can be used as appropriate, for example, “website”. However, in the interests of standardisation, JATS4R requests publishers to contact JATS4R if using additional values so we can create a definitive list and reduce variation across XML sources. “Other” is not a preferred value.
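In its simplest reading, the difference between the two kinds of list is the severity of an out-of-list value. The sketch below shows that reading in Python. The dictionaries CONTROLLED and SUGGESTED and the function classify are hypothetical names, and the values are only those quoted in the examples above, which may not be the complete lists.

```python
# Values copied from the examples above; assumed here to be the whole list.
CONTROLLED = {
    ("pub-date", "date-type"): {"original-publication", "update"},
}

SUGGESTED = {
    ("element-citation", "publication-type"): {
        "journal", "book", "letter", "review", "patent",
        "report", "standard", "data", "working-paper",
    },
}


def classify(element, attribute, value):
    """Return 'ERROR', 'WARNING', or None for an attribute value."""
    key = (element, attribute)
    if key in CONTROLLED and value not in CONTROLLED[key]:
        return "ERROR"    # controlled list: the value must be on the list
    if key in SUGGESTED and value not in SUGGESTED[key]:
        return "WARNING"  # suggested list: the value should be on the list
    return None           # on the list, or no list defined for this attribute


print(classify("pub-date", "date-type", "epub"))
print(classify("element-citation", "publication-type", "blog"))
print(classify("element-citation", "publication-type", "journal"))
```

A real test would usually also need to know the circumstances in which the attribute appears, so a flat lookup like this is only a starting point.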
Early version workaround
This is not a type of attribute recommendation, but it is worth mentioning here. The “early version workaround” describes a way to meet a recommendation with an earlier version of the DTD that does not have the attribute we recommend as the best solution. An example comes from the Article and publication dates recommendation: “use /article/@specific-use for article version values from NISO JAV for JATS 1.1 and earlier schemas”.
Testing the recommendations
Attributes used for identification are not themselves tested. Instead, they tell the validator WHAT an object is so that tests specific to that object type can be applied. We can, however, run a clean value test on the values of attributes that we have defined as identification.
Example: We can’t test whether every <sec> in a document has a type that conforms to a list of values, but if a user has <sec sec-type=“data_availability_statement”>, we could raise an error. In testing, we need to anticipate the things people are likely to write that are close to the value we want.
Tests for attribute usage will be situational and may depend on identifying the object that we are in. Attributes will be tested:
- that they exist if they are defined as REQUIRED (this may be situational, based on any property that can be found in the article XML, such as the type of object or the existence of other attributes or attribute values)
- that they don’t exist if they are declared DO NOT USE (I don’t think we have any of these)
- that their values are acceptable, if the prescribed usage is paired with a controlled values or suggested values list
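An existence test of the first kind might look like the sketch below, which uses the rule from the appendix that @country must be present whenever <country> is used. The sample XML, the function name check_required_attribute, and the choice of ERROR as the severity are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the second <country> lacks the required attribute.
SAMPLE = """<contrib-group>
  <aff id="aff1"><institution>Example University</institution>, <country country="GB">United Kingdom</country></aff>
  <aff id="aff2"><institution>Example Institute</institution>, <country>France</country></aff>
</contrib-group>"""


def check_required_attribute(root, element, attribute):
    """Report every <element> that does not carry @attribute."""
    return [
        f"ERROR: <{element}> is missing required @{attribute}"
        for el in root.iter(element)
        if el.get(attribute) is None
    ]


errors = check_required_attribute(ET.fromstring(SAMPLE), "country", "country")
for error in errors:
    print(error)
```

The test is situational in the sense described above: it fires only where a <country> element exists, and says nothing about affiliations that do not use the element.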
Controlled and suggested values
At first it seems that the difference in tests for a controlled value list and a suggested value list would be ERROR vs. WARNING. But it is not that simple. We have already defined controlled value lists and suggested value lists in our recommendations. We have examples that have been defined in the recommendation itself and examples that refer to an outside controlled list like the JAV. The proper values will be circumstantial in many cases, so the tests are not as simple as “on the good list” or “not on the good list”.
We need to be aware of how our attribute recommendations can be tested so that they can be most effective.
Should out-of-list values be errors or warnings? That is: do we allow things in a JATS4R-compliant article that we have not defined?
We prescribe @sec-type=”data-availability” for Data availability statements, but do we want to exclude all other possible values of @sec-type?
We will either need to
- define all values of @sec-type that we expect to see in JATS4R-conforming articles (the Maloney proposal https://github.com/JATS4R/JATS4R-Participant-Hub/issues/118), or
- test only “circumstantial” values (it is possible that @person-group-type may be allowed to have different values if it is in a data citation than if it is in some other citation), or
- come up with a “forbidden list” of values related to the ones that we define. This is the “clean value test”.
Clean value test
A clean value test is a way to try to control an attribute value without restricting that attribute to a controlled set of allowed values. The test involves thinking up as many ways as possible that the value you want could be miswritten and then explicitly excluding those. This can be frustrating and requires ongoing maintenance.
A good example of this is when we are defining values for Identification. We prescribe @sec-type=”data-availability” to identify Data availability statements. We cannot exclude all values for @sec-type except for “data-availability”. Nor can we provide a list of all “approved” values for @sec-type. Instead we must write an error for any value that is approaching but not equal to “data-availability”. The list of values that would generate an ERROR would include but not be limited to: data_availability, data availability, Data-Availability, data-statement, dataavailbility.
This is a list that will grow as people find new ways to misrepresent “data-availability”. The question may come up about “normalizing” the values before testing, but this would weaken our recommendations because any normalizing we do in the validator must then be done for any future application that is looking for “data-availability”.
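A clean value test for @sec-type could be sketched as below. The forbidden values are the ones listed above. The names NEAR_MISSES and clean_value_test and the sample XML are hypothetical. In line with the point about normalizing, the comparison is exact: nothing is lowercased or stripped before matching.

```python
import xml.etree.ElementTree as ET

EXPECTED = "data-availability"

# Forbidden near-misses from the text above; this list is expected to grow.
NEAR_MISSES = {
    "data_availability",
    "data availability",
    "Data-Availability",
    "data-statement",
    "dataavailbility",
}

# Hypothetical sample with one near-miss, one unrelated value, one correct value.
SAMPLE = """<body>
  <sec sec-type="data_availability"><title>Data availability</title></sec>
  <sec sec-type="methods"><title>Methods</title></sec>
  <sec sec-type="data-availability"><title>Data availability</title></sec>
</body>"""


def clean_value_test(root):
    """Flag @sec-type values that approach, but do not equal, EXPECTED."""
    errors = []
    for sec in root.iter("sec"):
        value = sec.get("sec-type")
        # Exact match only: values are deliberately not normalized first.
        if value in NEAR_MISSES:
            errors.append(f'ERROR: @sec-type="{value}" should be "{EXPECTED}"')
    return errors


errors = clean_value_test(ET.fromstring(SAMPLE))
for error in errors:
    print(error)
```

Unrelated values such as "methods" pass untouched, which is the point of the technique: the attribute stays open while the one value we care about is protected from variants.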
- Define as many identification attribute/values as we can. This allows us to identify a given object in the XML for testing or for later text mining. Usually these are set using general attributes like @content-type or @fn-type. We can define the value we want used, but we cannot exclude any other value. We can check identification values with a clean value test. We will avoid using @specific-use for identification attributes. If there is no “-type” attribute available, we will request that one be added by the JATS Standing Committee.
- Define prescribed usage for attributes. The existence of attributes under certain circumstances can be tested (with the aid of Identification attributes) easily. These can be ERRORS or WARNINGS depending on the cases.
- Be careful about defining controlled value lists; we should not use them generally. So no rules like “@fn-type must be one of (corresp | reference | suppdata)”. But we could make a rule something like “a footnote referenced from a contrib with a role of ‘illustrator’ must have a @fn-type from (media | style | corresp)”. Not matching a value in a controlled value list (under the appropriate circumstances) will be an ERROR.
- Use suggested value lists, which are a little more forgiving. Not matching a value in a suggested value list will be a WARNING. These will be tempting to use when we don’t want to commit to a decision. I think we should avoid this and define them circumstantially like the controlled value lists. Because there is no ERROR, suggested value lists have no teeth. I suggest that we use them to do two things: strongly encourage usage to move in a certain way and/or test out values that we may want to control in a future version of the recommendation. If this is our intent, we should list this in the recommendation.
- Use the following style for attribute recommendations:
  - All attribute values we define should follow a style: all letters are lowercase and, where a space would separate words in text, a hyphen (U+002D) is used instead, for example original-publication. (These style rules should not be applied if the attribute value is a URI. If the attribute value is a URI, the published URI should be used.)
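The style rule in the last point can be checked mechanically. The sketch below is one possible reading, not an official test. It assumes that digits are allowed, that underscores are not, and that a value counts as a URI when it parses with both a scheme and a network location. The function names and the example.org URI are hypothetical.

```python
import re
from urllib.parse import urlparse

# Assumed reading of the rule: lowercase words (digits allowed) joined by U+002D.
STYLE_PATTERN = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")


def looks_like_uri(value):
    """Heuristic (an assumption): a URI has both a scheme and a network location."""
    parsed = urlparse(value)
    return bool(parsed.scheme and parsed.netloc)


def follows_style(value):
    """Return True if the value meets the style rule or is exempt as a URI."""
    if looks_like_uri(value):
        return True  # published URIs are used as-is
    return STYLE_PATTERN.fullmatch(value) is not None


for candidate in ("original-publication", "Original Publication",
                  "data_availability", "http://example.org/Some_Term"):
    print(candidate, follows_style(candidate))
```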
Appendix: survey of attributes in recommendations (early 2019)
- ID: Identification
- PU: Prescribed usage
- CV: Controlled values
- SV: Suggested values
- PU/CV – use /article/@specific-use for article version values from NISO JAV for JATS 1.1 and earlier schemas
- CV – @date-type on <pub-date>: Use value “original-publication” to indicate that the date is the original date of publication, and “update” for dates that represent published updates to the publication.
- PU/CV – Use @date-type on <date> in <event> to indicate what stage of publication this version was in. Use NISO JAV values.
- CV – @ref-type (on <xref>). When linking a <contrib> to its <aff> use @ref-type=”aff” on <xref> [[Validator tool result: error if @ref-type on <xref> != “aff” if @rid references an <aff> element ]]
- PU – <institution-id>. Capturing the institutional ID is not mandatory at this time. However, if the publisher does capture it, they should make every effort to ensure that it is accurate. Use <institution-id> to contain the ID, and @institution-id-type to indicate the type of ID; e.g., “grid” or “ringgold”. Contain the <institution-id> and <institution> elements within <institution-wrap>.
- PU/CV – @country. If <country> is used, then @country must also be used and must be set to the two-letter country code, as specified in ISO 3166-1 (recommended in the JATS tag library). (binary)
- ID – <contrib>, @contrib-type. Contain each author within a <contrib> element. If a <contrib> contains an author, then @contrib-type must be set to “author”
- PU – @corresp. Use the corresp attribute on <contrib>, set to value “yes”, to identify the corresponding author(s). (binary)
- PU/SV – @publication-type=”…” on <mixed-citation> or <element-citation>. Use “journal” or “book” as the value of @publication-type to indicate that the citation is to a journal or book, respectively. Other examples are: “letter”, “review”, “patent”, “report”, “standard”, “data”, “working-paper”. This list is not exhaustive and is sourced from the JATS guidelines. This is not a limited field so others can be used as appropriate, for example, “website”. However, in the interests of standardisation, JATS4R requests publishers to contact JATS4R if using additional values so we can create a definitive list and reduce variation across XML sources. “Other” is not a preferred value
- PU/CV – <person-group> and @person-group-type. Use the <person-group> element to specify authors and other contributors in a citation. Use the @person-group-type attribute to specify the role of a contributor, when it is possible to identify them with a role. A separate <person-group> element should be used for each role. This attribute has a fixed list of allowed values in the Journal Publishing tag set: all-authors; assignee; author; compiler; curator; director; editor; guest-editor; inventor; transed; translator.
- PU/CV – @pub-id-type on <pub-id>. Use this attribute to specify the type of the identifier. For example, a DOI would have the @pub-id-type value of “doi”. The value should be one of the valid values from the list in the Tag Library.
- PU/CV (Should be ID) – related-object/@content-type. Use the content-type attribute optionally to indicate which stage of the trial the publication is reporting on. Since this information is intended for content providers submitting linked clinical trial information to Crossref, if @content-type is used, its value must be “pre-results”, “results”, or “post-results”, as defined in the Crossref schema. [[Validator result: If absent, no message. If present, ERROR if the value is not “pre-results”, “results”, or “post-results” ]]
- PU – related-object/@source-id. The source-id attribute must be used to identify the clinical trial registry. Crossref curates a list of WHO-approved registries and assigns them a DOI. Content providers are encouraged to select an appropriate registry from this list and supply the registry DOI or the WHO registry name as the source-id value.
- PU/CV – related-object/@source-id-type. The source-id-type attribute must be used to identify the type of ID provided in @source-id. The value of @source-id-type should be “crossref-doi” or “registry-name”, as appropriate (see Recommendation 3.)
- PU/CV – related-object/@document-id-type: The document-id-type attribute is required and must identify the kind of @document-id. The value must be either “clinical-trial-number” or “doi”.
- ID – @publication-type=”data” on <mixed-citation> or <element-citation>. Use “data” as the value of @publication-type to indicate that the citation is to a data set, even if that data set is the entire data repository. (binary)
- ID – @sec-type=”data-availability”. Use this attribute on the <sec> containing the DAS. (binary)
- ID – <element-citation> or <mixed-citation>, @publication-type=”data”. Use this attribute on all <element-citation> or <mixed-citation> elements that contain references to data. (binary)
- PU/CV – @specific-use on <element-citation> or <mixed-citation>. For publishers who elect to collect such granularity in their workflow, see the table below for four @specific-use attributes recommended for JATS XML.
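The validator note recorded for @ref-type in the survey above (error if @ref-type on <xref> is not “aff” when @rid references an <aff> element) could be sketched as follows. The sample XML and the function name check_xref_to_aff are assumptions made for illustration. The code treats @rid as a space-separated list of IDs.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the second <xref> points at an <aff> with the wrong @ref-type.
SAMPLE = """<contrib-group>
  <contrib contrib-type="author"><xref ref-type="aff" rid="aff1"/></contrib>
  <contrib contrib-type="author"><xref ref-type="fn" rid="aff2"/></contrib>
  <aff id="aff1">Example University</aff>
  <aff id="aff2">Example Institute</aff>
</contrib-group>"""


def check_xref_to_aff(root):
    """Error when an <xref> targets an <aff> but @ref-type is not 'aff'."""
    aff_ids = {aff.get("id") for aff in root.iter("aff")}
    errors = []
    for xref in root.iter("xref"):
        rids = (xref.get("rid") or "").split()
        points_at_aff = any(rid in aff_ids for rid in rids)
        if points_at_aff and xref.get("ref-type") != "aff":
            errors.append(
                f'ERROR: <xref rid="{xref.get("rid")}"> references an <aff> '
                f'but @ref-type is "{xref.get("ref-type")}"'
            )
    return errors


errors = check_xref_to_aff(ET.fromstring(SAMPLE))
for error in errors:
    print(error)
```

Here the target element plays the role of the identification step: the controlled value is enforced only for cross-references that resolve to an <aff>.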