General Markup Syntax
There are five general categories of USFM markup:
Markup Expressions
USFM (USFM-FM)
USFM is a backslash-based representation of the USFM data model. In USFM, all markers begin with a backslash character (\).
- Paragraph markers, and the opening markers for characters and notes, are followed by a space. Example: \p
- Character markers occur in pairs, marking a span of text within a paragraph. The closing marker is identical to the opening marker, terminated with an asterisk character (*). Example: \w grace\w*
- Note markers also occur in pairs, marking the start and end of the footnote or cross reference content.
- Milestone markers follow a syntax similar to character markers, but use a self-closing syntax which immediately terminates the marker with a second backslash plus asterisk (\*). Example: \qt-s\*
- Chapters and verses are unique elements. Their syntax is similar to a paragraph marker. The chapter or verse number itself is added after the marker and its space. Chapters and verses are also like milestones because they identify the location for the start of a chapter or verse. In USFM there is no marker to indicate the end of a chapter or verse.
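The marker shapes listed above can be told apart mechanically. The following Python sketch is illustrative only: the `MARKER` pattern and the `classify` helper are hypothetical names, not part of any USFM tool, and the pattern handles only the attribute-free forms shown in the examples (a milestone carrying attributes before its closing \* would need a fuller parser).

```python
import re

# Assumed, simplified pattern: a backslash, a marker name (letters, optional
# digits, optional -s/-e milestone suffix), then an optional terminator:
# "*" closes a character or note marker, "\*" self-closes a milestone.
MARKER = re.compile(r"\\([a-z]+[0-9]*(?:-[se])?)(\\\*|\*)?")


def classify(usfm):
    """Return a list of (marker_name, kind) pairs found in the USFM text."""
    found = []
    for match in MARKER.finditer(usfm):
        name, suffix = match.group(1), match.group(2)
        if suffix == "\\*":
            kind = "milestone"
        elif suffix == "*":
            kind = "close"
        else:
            kind = "open"
        found.append((name, kind))
    return found


if __name__ == "__main__":
    for name, kind in classify(r"\p \w grace\w* \qt-s\*"):
        print(name, kind)
```

Chapter and verse markers come out as "open" in this sketch, which matches the text above: they mark a start position and have no closing form in USFM.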
USX (USFM-XML)
USX is an XML representation of the USFM data model.
- Paragraphs occur within a <para> element.
- Character spans occur within a <char> element.
- Notes occur within a <note> element.
- Milestones occur as an <ms/> element.
- Chapters and verses occur as 'milestone-like' self-closing <chapter/> or <verse/> elements which identify the start of a chapter or verse. Optionally, <chapter/> and <verse/> elements can be added to mark the end of a chapter or verse by using the eid attribute instead of the sid attribute. The end elements should be placed at the end of the Scripture text for the chapter or verse.
Different types of paragraphs, character spans, notes, or milestones are identified by a style attribute. The style attribute is the value which associates the element with its corresponding USFM marker.
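To make the element and attribute roles concrete, here is a minimal Python sketch that reads a small USX fragment with the standard library and reports each element's style and whether it is a start (sid) or end (eid) milestone. The fragment itself and the `describe` helper are invented for illustration and are not taken from the USX specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, written only to exercise the rules above.
USX_FRAGMENT = """
<usx version="3.0">
  <chapter number="1" style="c" sid="GEN 1"/>
  <para style="p">
    <verse number="1" style="v" sid="GEN 1:1"/>In the beginning
    <char style="nd">God</char> created...
    <verse eid="GEN 1:1"/>
  </para>
  <chapter eid="GEN 1"/>
</usx>
"""


def describe(usx_text):
    """Return (tag, style, role) for every element below the root.

    role is "start" when the element carries sid, "end" when it carries eid,
    and None for elements that are not chapter or verse milestones.
    """
    root = ET.fromstring(usx_text.strip())
    rows = []
    for element in root.iter():
        if element is root:
            continue
        if "eid" in element.attrib:
            role = "end"
        elif "sid" in element.attrib:
            role = "start"
        else:
            role = None
        rows.append((element.tag, element.get("style"), role))
    return rows


if __name__ == "__main__":
    for row in describe(USX_FRAGMENT):
        print(row)
```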
USJ (USFM-JSON)
USJ is a JSON representation of the USFM data model. It is a first-class citizen among the other formats (USFM and USX) and has been added as a target in the test suite maintained by the committee.
JSON (JavaScript Object Notation) is a modern data serialization format supported by many software tools and libraries. It is especially convenient to use in JavaScript-based technologies where it is natively supported.
- Chapters and verses are treated as milestones, denoting only their absolute position and not encapsulating text or other content within them.
- Chapter elements are valid at the outermost level, the same level as id and book headers.
- Verse elements, like character elements, are valid within paragraph-like parents at the same level as the textual content.
- The type key has values that refer to the node types in USX, e.g. para, char, book, chapter, verse, ms.
- The marker key has values that refer to marker names used in USFM and the style attribute in USX.
- Every attribute in a USX node becomes a key in the corresponding JSON object.
{
  "type": "book",
  "marker": "id",
  "code": "GEN",
  "content": []
},
{
  "type": "chapter",
  "marker": "c",
  "number": "1",
  "sid": "GEN 1"
}
- Any object may hold nested objects or text in its content key, whose value is an array.
{
  "type": "para",
  "marker": "ide",
  "content": ["UTF-8"]
},
{
  "type": "para",
  "marker": "usfm",
  "content": ["3.0"]
},
{
  "type": "para",
  "marker": "is",
  "content": [
    "Introduction"
  ]
},
{
  "type": "para",
  "marker": "ip",
  "content": [
    {
      "type": "char",
      "marker": "bk",
      "content": [
        "The Gospel according to Mark"
      ]
    },
    " begins with the statement..."
  ]
}
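The mapping rules in this section (node type from the USX element name, marker from the style attribute, every other attribute copied as a key, children and text collected in a content array) can be sketched in a few lines of Python. This is a hedged illustration, not the committee's converter: `usx_to_usj` is a hypothetical helper, and it simply omits the content key when an element is empty.

```python
import xml.etree.ElementTree as ET


def usx_to_usj(element):
    """Convert one parsed USX element into a USJ-style dict (illustrative)."""
    node = {"type": element.tag}
    for name, value in element.attrib.items():
        # The USX style attribute is exposed as the USJ "marker" key;
        # all other attributes keep their names.
        node["marker" if name == "style" else name] = value

    content = []
    if element.text:
        content.append(element.text)
    for child in element:
        content.append(usx_to_usj(child))
        if child.tail:  # text that follows the child inside the parent
            content.append(child.tail)
    if content:
        node["content"] = content
    return node


if __name__ == "__main__":
    para = ET.fromstring(
        '<para style="ip"><char style="bk">The Gospel according to Mark</char>'
        " begins with the statement...</para>"
    )
    print(usx_to_usj(para))
```

Applied to the ip paragraph above, the sketch yields the same nesting as the last JSON example: a char object followed by a plain string inside the paragraph's content array.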
Additional Syntax Notes
Numbered Markers/Styles
Some markers or styles include an optional numeric variable, which is represented in this documentation by a hash character (#). The number indicates:
- A portion of a larger text, or relative weighting of the components of the text, such as mt1, mt2, mt3, which are parts of a main title.
- The level of a division or section (hierarchy).
- The level of indentation relative to other like elements, as in poetry (q#), lists (li#), or outlines (io#).
marker = marker1: The unnumbered version of a marker or style should be used only when a single level of that marker exists within the text. Numbered markers should always be used when more than one level of the marker exists within the text.
A specific numbered marker or style should not be used to indicate a specific occurrence of the element type (i.e. you should not use |
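The equivalence marker = marker1 is easy to apply when comparing or grouping markers. The Python sketch below is a rough illustration under stated assumptions: `split_marker` is a hypothetical helper, and `NUMBERED_BASES` lists only the numbered markers mentioned in this section, not the full set defined by USFM.

```python
import re

# Assumption: only the numbered markers named in this section are listed.
NUMBERED_BASES = {"mt", "q", "li", "io"}


def split_marker(marker):
    """Split a marker into (base, level), treating marker as marker1.

    Markers outside NUMBERED_BASES are returned unchanged with level None.
    """
    match = re.fullmatch(r"([a-z]+)([0-9]*)", marker)
    if match is None:
        raise ValueError(f"not a simple marker name: {marker!r}")
    base, digits = match.groups()
    if base not in NUMBERED_BASES:
        return marker, None
    return base, int(digits) if digits else 1


if __name__ == "__main__":
    print(split_marker("mt"))   # ('mt', 1)
    print(split_marker("q2"))   # ('q', 2)
    print(split_marker("p"))    # ('p', None)
```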
USFM Endmarkers in Footnotes and Cross References
Elements which make up footnote or cross reference content are character level markers. With USFM 3.1, ending character markers have been made required, except for the markers used to start sections in footnotes and cross references (the note's 'structural elements'). The majority of scripture translation projects working with USFM already follow the implicit closure syntax for footnote or cross reference structural markup.
Examples of the two markup approaches for notes are given below. Both of these are syntactically acceptable in USFM, but the implicit syntax is strongly recommended. Note content must always occur within a submarker; unmarked content may not appear directly within the note container itself.
Other nested character markers within a note's structural sections always require explicit opening and closing markers.
USFM (implicit closure):
\f + \fk Isaac: \ft In Hebrew means "laughter"\f*
USFM (explicit closure):
\f + \fk Isaac: \fk*\ft In Hebrew means "laughter"\ft*\f*
USX:
<note caller="+" style="f">
<char style="fk">Isaac: </char>
<char style="ft">In Hebrew means "laughter"</char>
</note>
USFM (implicit closure):
\f + \fr 1.14 \fq religious festivals; \ft or \fqa seasons.\f*
USFM (explicit closure):
\f + \fr 1.14 \fr*\fq religious festivals; \fq*\ft or \ft*\fqa seasons.\fqa*\f*
USX:
<note caller="+" style="f">
<char style="fr">1.14 </char>
<char style="fq">religious festivals; </char>
<char style="ft">or </char>
<char style="fqa">seasons.</char>
</note>
USFM (implicit closure):
\f + \fr 2.4 \fk The \nd Lord\nd*: \ft See \nd Lord\nd* in Word List.\f*
USFM (explicit closure):
\f + \fr 2.4 \fr*\fk The \nd Lord\nd*: \fk*\ft See \nd Lord\nd* in Word List.\ft*\f*
USX:
<note caller="+" style="f">
<char style="fr">2.4 </char>
<char style="fk">The <char style="nd">Lord</char>: </char>
<char style="ft">See <char style="nd">Lord</char> in Word List.</char>
</note>
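Because the two USFM forms differ only in the closing markers of the note's structural elements, the explicit form can be reduced to the recommended implicit form by dropping those closers. The Python sketch below is a simplification for illustration: `to_implicit` is a hypothetical helper, and `STRUCTURAL` contains only the structural markers that appear in the examples above, not the complete list. Closers of nested character markers such as \nd* and the note's own \f* are left in place.

```python
import re

# Assumption: only the structural note markers used in the examples above.
STRUCTURAL = ("fr", "fk", "ft", "fq", "fqa")

# Matches closing markers such as \fr* or \fqa*, but never \f* or \nd*.
CLOSERS = re.compile(r"\\(?:" + "|".join(STRUCTURAL) + r")\*")


def to_implicit(note):
    """Remove explicit closers of structural note markers from a USFM note."""
    return CLOSERS.sub("", note)


if __name__ == "__main__":
    explicit = (
        r"\f + \fr 2.4 \fr*\fk The \nd Lord\nd*: \fk*"
        r"\ft See \nd Lord\nd* in Word List.\ft*\f*"
    )
    print(to_implicit(explicit))
```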