Character Level Attributes
USFM and USX 3 provide a syntax for adding named attributes to character markers. Attributes define additional properties for the marked content, and are a means of extending the meta-information contained within a text. USFM formally defines attributes for a selected set of current character types.
General Syntax
In USFM, within a character marker span, the attribute list is separated from the text content by a vertical bar |. Attributes are listed as pairs of a name and its corresponding value using the syntax attribute="value". The attribute name is a single ASCII string. The value is wrapped in quotes.
In USX, attributes are applied to elements in the standard XML syntax: attribute="value".
lemma attribute, USFM:
\w gracious|lemma="grace"\w*
lemma attribute, USX:
<char style="w" lemma="grace">gracious</char>
lemma attribute, USJ:
{
  "type": "char",
  "marker": "w",
  "lemma": "grace",
  "content": ["gracious"]
}
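The USFM syntax above can be illustrated with a small parsing sketch. It is not a reference implementation: the function name split_span and the pattern ATTR_PAIR are hypothetical, the allowed characters in an attribute name are an assumption, and quotes escaped inside a value are not handled.

```python
import re

# name="value" pairs. Assumption: names consist of ASCII letters, digits,
# '-' and '_'; values contain no embedded double quotes.
ATTR_PAIR = re.compile(r'([A-Za-z][A-Za-z0-9_-]*)\s*=\s*"([^"]*)"')


def split_span(span: str) -> tuple[str, dict[str, str]]:
    """Split the inside of a character span into (text, attributes)."""
    # Everything before the first vertical bar is text content;
    # everything after it is the attribute list.
    text, _bar, attr_list = span.partition("|")
    attrs = {name: value for name, value in ATTR_PAIR.findall(attr_list)}
    return text, attrs


text, attrs = split_span('gracious|lemma="grace"')
print(text, attrs)  # gracious {'lemma': 'grace'}
```

The resulting dictionary corresponds to the attribute keys that appear alongside "type", "marker" and "content" in the USJ form of the same example.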
Default Attribute
In USFM, the specification defines a single default attribute. When content is supplied in the position of an attribute but without an explicit attribute name, it is read as the value of that default. This allows a commonly used attribute (the default) to be added with as little additional markup as possible.
The concept and syntax of a "default attribute" are valid only in USFM. In USX, attributes must always be expressed fully as an attribute="value" pair.
lemma attribute, USFM:
\w gracious|grace\w*
lemma attribute, USX:
<char style="w" lemma="grace">gracious</char>
lemma attribute, USJ:
{
  "type": "char",
  "marker": "w",
  "lemma": "grace",
  "content": ["gracious"]
}
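One way a converter might expand the USFM shorthand into the explicit form required by USX is sketched below. The names expand_attributes and DEFAULT_ATTRIBUTE are hypothetical, and the table lists only the default shown in the example (lemma for \w); defaults for other markers would have to be taken from the specification.

```python
import re

# Assumption: attribute names consist of ASCII letters, digits, '-' and '_'.
ATTR_PAIR = re.compile(r'([A-Za-z][A-Za-z0-9_-]*)\s*=\s*"([^"]*)"')

# Only the default from the example above; not a complete table.
DEFAULT_ATTRIBUTE = {"w": "lemma"}


def expand_attributes(marker: str, attr_list: str) -> dict[str, str]:
    """Return explicit name/value pairs for the text after the vertical bar."""
    pairs = ATTR_PAIR.findall(attr_list)
    if pairs:
        # Attributes were already written out as name="value".
        return dict(pairs)
    value = attr_list.strip()
    if value and marker in DEFAULT_ATTRIBUTE:
        # Bare content in the attribute position: assign it to the default.
        return {DEFAULT_ATTRIBUTE[marker]: value}
    return {}


print(expand_attributes("w", "grace"))  # {'lemma': 'grace'}
```

With this expansion, \w gracious|grace\w* and \w gracious|lemma="grace"\w* yield the same attribute set, which matches the identical USX and USJ output shown for both forms.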
Multiple Attribute Values
In cases where more than one value is needed for an attribute key, use a comma separated list within the value string. Whitespace adjacent to the comma separators is ignored.
strong values, USFM:
\w gracious|strong="H1234,G5485"\w*
strong values, USX:
<char style="w" strong="H1234,G5485">gracious</char>
strong values, USJ:
{
  "type": "char",
  "marker": "w",
  "strong": "H1234,G5485",
  "content": ["gracious"]
}
See the attributes for wordlist/glossary entry for other examples.
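A consumer of such a value could split it as in the following sketch. The helper name split_values is hypothetical, and dropping empty items (for example after a trailing comma) is an assumption rather than something the text above specifies.

```python
def split_values(value: str) -> list[str]:
    """Split a comma separated attribute value into its individual values."""
    # Whitespace next to the comma separators is ignored, so strip each item.
    # Assumption: empty items are dropped rather than reported as errors.
    return [item.strip() for item in value.split(",") if item.strip()]


print(split_values("H1234,G5485"))    # ['H1234', 'G5485']
print(split_values("H1234 , G5485"))  # ['H1234', 'G5485']
```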
Multiple Attribute Parts
In cases where an attribute value is composed of multiple parts (e.g. a compound word or phrase), separate the parts using a colon : within the value string.
See the gloss attribute for ruby glosses for an example of the use of this syntax.
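Reading such a value back could look like the sketch below. The helper name split_parts and the sample value "first:second" are purely illustrative. Unlike the comma separated case, nothing above says that whitespace around the colon is ignored, so the parts are returned unchanged.

```python
def split_parts(value: str) -> list[str]:
    """Split an attribute value into its colon separated parts."""
    # No whitespace trimming: the text above does not state that spaces
    # around the colon are insignificant.
    return value.split(":")


print(split_parts("first:second"))  # ['first', 'second']
```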
Backward Compatibility
Pre-existing markers which formally provide attributes in USFM/USX 3 (or newer) may continue to be used without attributes. \w gracious\w* (no attributes) and <char style="w">gracious</char> remain valid.
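For software reading USFM, this means the attribute list has to be treated as optional. A minimal sketch for the \w marker follows, assuming a simple span with no nested markers; the pattern name W_SPAN and the helper read_w_span are hypothetical.

```python
import re

# A \w ... \w* span whose "|attributes" part is optional.
# Assumption: no nested markers and no backslashes inside the span.
W_SPAN = re.compile(r'\\w (?P<text>[^|\\]*)(?:\|(?P<attrs>[^\\]*))?\\w\*')


def read_w_span(usfm: str):
    """Return (text, raw attribute list or None) for a single \\w span."""
    match = W_SPAN.fullmatch(usfm)
    if match is None:
        return None
    return match.group("text"), match.group("attrs")


print(read_w_span(r'\w gracious\w*'))                # ('gracious', None)
print(read_w_span(r'\w gracious|lemma="grace"\w*'))  # ('gracious', 'lemma="grace"')
```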
User Defined Attributes
Using the general syntax above, attributes may be added to any character markers beyond the formally defined set in the current version of the USFM/USX specification. These will not be considered canonical, and there are no specific requirements defining how software supporting USFM/USX must process user-defined attributes.
User-defined attributes should begin with the prefix x or z.
USFM:
\w gracious|x-myattr="value"\w*
\w gracious|lemma="grace" x-myattr="value"\w*
USX:
<char style="w" x-myattr="value">gracious</char>
<char style="w" lemma="grace" x-myattr="value">gracious</char>
USJ:
{
  "type": "char",
  "marker": "w",
  "x-myattr": "value",
  "content": ["gracious"]
}
{
  "type": "char",
  "marker": "w",
  "lemma": "grace",
  "x-myattr": "value",
  "content": ["gracious"]
}
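The correspondence between the USJ and USFM forms in these examples can be sketched as a small serializer. It assumes, as in the examples, that every key other than "type", "marker" and "content" is an attribute and that "content" holds plain strings only; the function name usj_char_to_usfm is hypothetical.

```python
STRUCTURAL_KEYS = ("type", "marker", "content")


def usj_char_to_usfm(node: dict) -> str:
    """Render a USJ char object as a USFM character span."""
    marker = node["marker"]
    # Assumption: content items are plain strings (no nested objects).
    text = "".join(node.get("content", []))
    # Formal and user-defined attributes are written out the same way.
    attrs = {k: v for k, v in node.items() if k not in STRUCTURAL_KEYS}
    attr_list = " ".join(f'{name}="{value}"' for name, value in attrs.items())
    if attr_list:
        return f"\\{marker} {text}|{attr_list}\\{marker}*"
    return f"\\{marker} {text}\\{marker}*"


node = {
    "type": "char",
    "marker": "w",
    "lemma": "grace",
    "x-myattr": "value",
    "content": ["gracious"],
}
print(usj_char_to_usfm(node))  # \w gracious|lemma="grace" x-myattr="value"\w*
```

Because the sketch passes unknown keys through unchanged, user-defined attributes survive the conversion without any special handling.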
Character Types with Attributes
- jmp - Link text: href, title, id
- rb - Ruby gloss: gloss
- w - Wordlist entry: lemma, strong, srcloc
- ref - Scripture reference(s): loc, gen
- fig - Figure: alt, src, size, loc, copy, ref
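The list above can be turned into a lookup table for checking attribute names. In the sketch below the names FORMAL_ATTRIBUTES, USER_PREFIXES and classify_attribute are hypothetical, and the hyphenated prefixes "x-" and "z-" are an assumption based on the x-myattr example.

```python
# Formally defined attributes per marker, copied from the list above.
FORMAL_ATTRIBUTES = {
    "jmp": {"href", "title", "id"},
    "rb": {"gloss"},
    "w": {"lemma", "strong", "srcloc"},
    "ref": {"loc", "gen"},
    "fig": {"alt", "src", "size", "loc", "copy", "ref"},
}

# Assumption: user-defined names use a hyphenated prefix, as in x-myattr.
USER_PREFIXES = ("x-", "z-")


def classify_attribute(marker: str, name: str) -> str:
    """Classify an attribute name as 'formal', 'user-defined' or 'unrecognized'."""
    if name in FORMAL_ATTRIBUTES.get(marker, set()):
        return "formal"
    if name.startswith(USER_PREFIXES):
        return "user-defined"
    return "unrecognized"


print(classify_attribute("w", "lemma"))     # formal
print(classify_attribute("w", "x-myattr"))  # user-defined
print(classify_attribute("w", "href"))      # unrecognized
```

Since the specification places no processing requirements on user-defined attributes, a tool might simply preserve names in the last two categories rather than reject them.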