Character Level Attributes
USFM and USX 3 provide a syntax for adding named attributes to character markers. Attributes define additional properties for the marked content, and are a means of extending the meta-information contained within a text. USFM formally defines attributes for a selected set of the current character types.
General Syntax
In USFM, within a character marker span, the attribute list is separated from the text content by a vertical bar |. Attributes are listed as name and value pairs using the syntax: attribute="value". The attribute name is a single ASCII string. The value is wrapped in double quotes.
In USX, attributes are applied to elements in the standard XML syntax: attribute="value".
Example: lemma attribute
USFM:
    \w gracious|lemma="grace"\w*
USX:
    <char style="w" lemma="grace">gracious</char>
USJ:
    {
      "type": "char",
      "marker": "w",
      "lemma": "grace",
      "content": ["gracious"]
    }
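As an illustration of the general syntax, the following is a minimal sketch of reading one USFM character span and producing a USJ-style object. The function name parse_char_span and both regular expressions are assumptions made for this example and are not part of the specification; the sketch handles only a single, non-nested span with quoted name="value" pairs.

```python
import re

# Hypothetical helper patterns (not defined by USFM itself).
# Matches one non-nested character span such as \w text|attrs\w*
SPAN_RE = re.compile(r'\\(?P<marker>[a-z0-9]+) (?P<body>.*?)\\(?P=marker)\*')
# Matches name="value" pairs inside the attribute list
ATTR_RE = re.compile(r'([A-Za-z0-9_-]+)\s*=\s*"([^"]*)"')


def parse_char_span(usfm: str) -> dict:
    """Convert a single USFM character span into a USJ-like dict."""
    match = SPAN_RE.search(usfm)
    if match is None:
        raise ValueError("no character span found")
    # Everything before the first vertical bar is text content;
    # everything after it is the attribute list (possibly empty).
    text, _, attr_list = match.group("body").partition("|")
    node = {"type": "char", "marker": match.group("marker")}
    node.update(dict(ATTR_RE.findall(attr_list)))
    node["content"] = [text]
    return node


if __name__ == "__main__":
    print(parse_char_span(r'\w gracious|lemma="grace"\w*'))
```

A span with no vertical bar simply yields an object with no attribute keys, so markers used without attributes pass through unchanged.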
Default Attribute
In USFM, when content is supplied in the position of an attribute but without an explicit attribute name, it is read as the value of the single default attribute that the specification defines for that marker. This allows a commonly used attribute (the default) to be added with as little additional markup as possible.
The concept and syntax of a "default attribute" are valid only in USFM. In USX, every attribute must be written out in full as attribute="value".
Example: lemma attribute
USFM:
    \w gracious|grace\w*
USX:
    <char style="w" lemma="grace">gracious</char>
USJ:
    {
      "type": "char",
      "marker": "w",
      "lemma": "grace",
      "content": ["gracious"]
    }
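The expansion of a default attribute when converting USFM to USX or USJ could be sketched as follows. The function expand_default and the DEFAULT_ATTRIBUTE table are hypothetical names for this example; only the \w entry is taken from the example above, and other markers would need to be filled in from the specification. The sketch also assumes that a bare default value never contains an equals sign.

```python
# Hypothetical lookup table: marker -> name of its default attribute.
# Only the \w entry is taken from the example in this section.
DEFAULT_ATTRIBUTE = {"w": "lemma"}


def expand_default(marker: str, attr_list: str) -> str:
    """Return the attribute list with a bare default value made explicit."""
    attr_list = attr_list.strip()
    if not attr_list or "=" in attr_list:
        # Empty, or already written as name="value" pairs: leave untouched.
        return attr_list
    name = DEFAULT_ATTRIBUTE.get(marker)
    if name is None:
        raise ValueError(f"marker {marker!r} has no known default attribute")
    return f'{name}="{attr_list}"'


if __name__ == "__main__":
    print(expand_default("w", "grace"))
```

Raising an error for markers without a known default is one possible design choice; a converter might instead prefer to keep the raw text and report a warning.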
Multiple Attribute Values
Where more than one value is needed for an attribute key, use a comma-separated list within the value string. Whitespace adjacent to the comma separators is ignored.
Example: strong values
USFM:
    \w gracious|strong="H1234,G5485"\w*
USX:
    <char style="w" strong="H1234,G5485">gracious</char>
USJ:
    {
      "type": "char",
      "marker": "w",
      "strong": "H1234,G5485",
      "content": ["gracious"]
    }
See the attributes for wordlist/glossary entry for other examples.
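A consumer of such a value might split it as in the sketch below. The function name split_values is an assumption for this example; the only behaviour taken from the text is that whitespace next to the commas is ignored.

```python
def split_values(value: str) -> list:
    """Split a comma-separated attribute value into its individual values.

    Whitespace adjacent to the comma separators is dropped, so
    "H1234,G5485" and "H1234 , G5485" produce the same list.
    """
    return [part.strip() for part in value.split(",")]


if __name__ == "__main__":
    print(split_values("H1234, G5485"))
```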
Multiple Attribute Parts
In cases where an attribute value is composed of multiple parts (e.g. a compound word or phrase), separate the parts using a colon : within the value string.
See the gloss attribute for ruby glosses for an example of the use of this syntax.
Backward Compatibility
Pre-existing markers which formally provide attributes in USFM/USX 3 (or newer) may continue to be used without attributes. \w gracious\w* (no attributes) and <char style="w">gracious</char> remain valid.
User Defined Attributes
Using the general syntax above, attributes may be added to any character markers beyond the formally defined set in the current version of the USFM/USX specification. These will not be considered canonical, and there are no specific requirements defining how software supporting USFM/USX must process user-defined attributes.
User-defined attribute names should begin with the prefix x or z (for example, x-myattr).
USFM:
    \w gracious|x-myattr="value"\w*
    \w gracious|lemma="grace" x-myattr="value"\w*
USX:
    <char style="w" x-myattr="value">gracious</char>
    <char style="w" lemma="grace" x-myattr="value">gracious</char>
USJ:
    {
      "type": "char",
      "marker": "w",
      "x-myattr": "value",
      "content": ["gracious"]
    }
    {
      "type": "char",
      "marker": "w",
      "lemma": "grace",
      "x-myattr": "value",
      "content": ["gracious"]
    }
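Because the specification does not say how software must treat user-defined attributes, a tool may at least want to tell them apart from the formally defined ones. The sketch below shows one way to do so. The names FORMAL_ATTRIBUTES and classify_attribute are hypothetical; the table is copied from the list of character types with attributes in this section, and the prefix check follows the x or z rule as worded above.

```python
# Formally defined attributes per marker, copied from the list in this section.
FORMAL_ATTRIBUTES = {
    "jmp": {"href", "title", "id"},
    "rb": {"gloss"},
    "w": {"lemma", "strong", "srcloc"},
    "ref": {"loc", "gen"},
    "fig": {"alt", "src", "size", "loc", "copy", "ref"},
}


def classify_attribute(marker: str, name: str) -> str:
    """Label an attribute as 'formal', 'user-defined', or 'unknown'."""
    if name in FORMAL_ATTRIBUTES.get(marker, set()):
        return "formal"
    # Assumption: any name starting with x or z counts as user-defined,
    # following the wording of the rule in this section.
    if name.startswith(("x", "z")):
        return "user-defined"
    return "unknown"


if __name__ == "__main__":
    for attr in ("lemma", "x-myattr", "colour"):
        print(attr, classify_attribute("w", attr))
```

What to do with an "unknown" attribute (keep, warn, or reject) is left to the application.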
Character Types with Attributes
- jmp - Link text: href, title, id
- rb - Ruby gloss: gloss
- w - Wordlist entry: lemma, strong, srcloc
- ref - Scripture reference(s): loc, gen
- fig - Figure: alt, src, size, loc, copy, ref