Character Level Attributes

USFM 3 and USX 3 provide a syntax for adding named attributes to character markers. Attributes define additional properties for the marked content and are a means of extending the meta-information contained within a text. USFM formally defines attributes for a selected set of current character types.

General Syntax

In USFM, within a character marker span, the attribute list is separated from the text content by a vertical bar (|). Attributes are listed as name/value pairs using the syntax attribute="value", with multiple pairs separated by whitespace. The attribute name is a single ASCII string, and the value is wrapped in double quotes.

In USX, attributes are applied to elements using standard XML syntax: attribute="value". In USJ, each attribute becomes a property of the JSON object.

Example 1. USFM: Glossary word with lemma attribute
\w gracious|lemma="grace"\w*
Example 2. USX: Glossary word with lemma attribute
<char style="w" lemma="grace">gracious</char>
Example 3. USJ: Glossary word with lemma attribute
{
  "type": "char",
  "marker": "w",
  "lemma": "grace",
  "content": ["gracious"]
}
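
As an illustration of the general syntax (not part of the specification), the following minimal Python sketch splits a single USFM character span into its marker, text content and attribute dictionary. It assumes there are no nested markers inside the span and that attribute names consist of ASCII letters, digits, hyphens and underscores; the function name parse_char_span is hypothetical. Un-named default values are not handled, and a span with no vertical bar yields an empty attribute dictionary.

```python
import re

# A character span: backslash, marker name, one space, content, closing \marker*.
# Simplification (assumption): no nested markers inside the span.
SPAN = re.compile(r'\\(?P<marker>[A-Za-z0-9]+) (?P<content>.*?)\\(?P=marker)\*')

# One name="value" pair. The name pattern is an assumption; the text only says
# the attribute name is a single ASCII string.
PAIR = re.compile(r'([A-Za-z0-9_-]+)="([^"]*)"')


def parse_char_span(usfm: str) -> tuple[str, str, dict[str, str]]:
    """Return (marker, text, attributes) for a single USFM character span."""
    m = SPAN.fullmatch(usfm.strip())
    if m is None:
        raise ValueError(f"not a character span: {usfm!r}")
    # The attribute list is separated from the text content by a vertical bar.
    text, bar, attr_list = m.group("content").partition("|")
    # Pairs are separated by whitespace; findall returns (name, value) tuples.
    attrs = dict(PAIR.findall(attr_list)) if bar else {}
    return m.group("marker"), text, attrs


print(parse_char_span(r'\w gracious|lemma="grace"\w*'))
# ('w', 'gracious', {'lemma': 'grace'})
print(parse_char_span(r'\w gracious\w*'))
# ('w', 'gracious', {})
```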

Nearly all USX elements contain a required style attribute. For <para> and <char> elements, the style attribute defines the paragraph or character type.
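
To show how the same attributes map onto the three formats, the sketch below builds a USX <char> element, with the character type in the style attribute, and the equivalent USJ object, with the character type in the marker property. It is an illustrative sketch using only the Python standard library; the function names to_usx and to_usj are hypothetical, and it assumes no user attribute is named style, type, marker or content.

```python
import json
import xml.etree.ElementTree as ET


def to_usx(marker: str, text: str, attrs: dict[str, str]) -> str:
    """Build a USX <char> element; the style attribute holds the character type."""
    # ElementTree keeps attribute insertion order (Python 3.8+), so style comes first.
    elem = ET.Element("char", {"style": marker, **attrs})
    elem.text = text
    return ET.tostring(elem, encoding="unicode")


def to_usj(marker: str, text: str, attrs: dict[str, str]) -> dict:
    """Build the equivalent USJ object; attributes become plain JSON properties."""
    return {"type": "char", "marker": marker, **attrs, "content": [text]}


print(to_usx("w", "gracious", {"lemma": "grace"}))
# <char style="w" lemma="grace">gracious</char>
print(json.dumps(to_usj("w", "gracious", {"lemma": "grace"})))
# {"type": "char", "marker": "w", "lemma": "grace", "content": ["gracious"]}
```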

Default Attribute

In USFM, when a value is supplied in the attribute position without an explicit attribute name, it is interpreted as the value of the marker's default attribute (the specification defines a single default attribute for the marker). This allows a commonly used attribute to be added with as little additional markup as possible.

The concept and syntax of a "default attribute" are valid only in USFM. In USX (and USJ), attributes must always be expressed in full, with both the attribute name and its value (attribute="value").

Example 4. USFM: Glossary word with unnamed default lemma attribute
\w gracious|grace\w*
Example 5. USX: Glossary word with lemma attribute
<char style="w" lemma="grace">gracious</char>
Example 6. USJ: Glossary word with lemma attribute
{
  "type": "char",
  "marker": "w",
  "lemma": "grace",
  "content": ["gracious"]
}
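
A converter from USFM to USX or USJ has to expand the unnamed default value into a named attribute. The minimal sketch below does this with a lookup table; the table DEFAULT_ATTRIBUTE and the function resolve_attributes are hypothetical, and only the \w to lemma entry is taken from the examples above (a real table would list the default the specification defines for each marker). It also assumes that an attribute list containing no name="value" pair is an unnamed default value.

```python
import re

# Hypothetical table of default attributes. Only "w" -> "lemma" comes from the
# examples in this section; other markers would need their own entries.
DEFAULT_ATTRIBUTE = {"w": "lemma"}

PAIR = re.compile(r'([A-Za-z0-9_-]+)="([^"]*)"')


def resolve_attributes(marker: str, attr_list: str) -> dict[str, str]:
    """Return named attributes, expanding an unnamed default value if present."""
    pairs = PAIR.findall(attr_list)
    if pairs:
        return dict(pairs)  # explicit name="value" pairs: nothing to expand
    value = attr_list.strip()
    if not value:
        return {}
    if marker not in DEFAULT_ATTRIBUTE:
        raise ValueError(f"\\{marker} has no default attribute in this table")
    return {DEFAULT_ATTRIBUTE[marker]: value}


print(resolve_attributes("w", "grace"))          # {'lemma': 'grace'}
print(resolve_attributes("w", 'lemma="grace"'))  # {'lemma': 'grace'}
```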

Multiple Attribute Values

When an attribute needs more than one value, provide a comma-separated list within the value string. Whitespace adjacent to the comma separators is ignored.

Example 7. USFM: Glossary word with multiple strong values
\w gracious|strong="H1234,G5485"\w*
Example 8. USX: Glossary word with multiple strong values
<char style="w" strong="H1234,G5485">gracious</char>
Example 9. USJ: Glossary word with multiple strong values
{
  "type": "char",
  "marker": "w",
  "strong": "H1234,G5485",
  "content": ["gracious"]
}
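
As the examples show, the list stays a single string in all three formats, so splitting it is left to the consuming software. A minimal sketch of that step follows; the helper names split_values and join_values are hypothetical. The split pattern discards only whitespace next to a comma, matching the rule above.

```python
import re


def split_values(value: str) -> list[str]:
    """Split a comma-separated attribute value, ignoring whitespace around commas."""
    return re.split(r"\s*,\s*", value)


def join_values(values: list[str]) -> str:
    """Serialize several values back into one attribute value string."""
    return ",".join(values)


print(split_values("H1234,G5485"))          # ['H1234', 'G5485']
print(split_values("H1234 ,  G5485"))       # ['H1234', 'G5485']
print(join_values(["H1234", "G5485"]))      # H1234,G5485
```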

See the attributes for the wordlist/glossary entry marker (\w) for other examples.

Multiple Attribute Parts

In cases where an attribute value is composed of multiple parts (e.g. a compound word or phrase), separate the parts using a colon (:) within the value string.

See the gloss attribute for ruby glosses for an example of the use of this syntax.
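
A minimal sketch of reading and writing multi-part values follows. The helpers split_parts and join_parts and the values "part1:part2" are hypothetical placeholders. Because the text states no whitespace rule for the colon separator, the sketch does not strip whitespace around parts; and since a part cannot itself contain the separator, join_parts rejects such input rather than silently producing extra parts.

```python
def split_parts(value: str) -> list[str]:
    """Split a multi-part attribute value at each colon."""
    return value.split(":")


def join_parts(parts: list[str]) -> str:
    """Join parts into one attribute value; no part may contain a colon."""
    for part in parts:
        if ":" in part:
            raise ValueError(f"part {part!r} contains the ':' separator")
    return ":".join(parts)


print(split_parts("part1:part2"))        # ['part1', 'part2']
print(join_parts(["part1", "part2"]))    # part1:part2
```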

Backward Compatibility

Pre-existing markers (those defined before USFM/USX 3) that formally provide attributes in USFM/USX 3 (or newer) may continue to be used without attributes. For example, \w gracious\w* (no attributes) and <char style="w">gracious</char> both remain valid.

User Defined Attributes

Using the general syntax above, attributes beyond the set formally defined in the current version of the USFM/USX specification may be added to any character marker. Such user-defined attributes are not considered canonical, and the specification places no requirements on how software supporting USFM/USX must process them.

User-defined attribute names should begin with the prefix x or z (for example, x-myattr).

Example 10. USFM: Glossary word with user-defined attribute
\w gracious|x-myattr="value"\w*

\w gracious|lemma="grace" x-myattr="value"\w*
Example 11. USX: Glossary word with user-defined attribute
<char style="w" x-myattr="value">gracious</char>

<char style="w" lemma="grace" x-myattr="value">gracious</char>
Example 12. USJ: Glossary word with user-defined attribute
{
  "type": "char",
  "marker": "w",
  "x-myattr": "value",
  "content": ["gracious"]
}

{
  "type": "char",
  "marker": "w",
  "lemma": "grace",
  "x-myattr": "value",
  "content": ["gracious"]
}
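
Since the specification places no processing requirements on user-defined attributes, one reasonable approach for software is to keep them apart from the canonical attributes and pass them through unchanged. The sketch below does that; the helper names are hypothetical, and it assumes the x or z prefix is followed by a hyphen, as in x-myattr.

```python
def is_user_defined(name: str) -> bool:
    """True for user-defined attribute names (assumed form: x-... or z-...)."""
    return name.startswith(("x-", "z-"))


def split_user_defined(attrs: dict[str, str]) -> tuple[dict[str, str], dict[str, str]]:
    """Separate canonical attributes from user-defined ones, preserving order."""
    canonical = {k: v for k, v in attrs.items() if not is_user_defined(k)}
    user = {k: v for k, v in attrs.items() if is_user_defined(k)}
    return canonical, user


print(split_user_defined({"lemma": "grace", "x-myattr": "value"}))
# ({'lemma': 'grace'}, {'x-myattr': 'value'})
```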

Character Types with Attributes