API Reference

Parsing

JuliaSyntax.parsestmt Function
# Parse a single expression/statement
parsestmt(TreeType, text, [index];
          version=VERSION,
          ignore_trivia=true,
          filename=nothing,
          ignore_errors=false,
          ignore_warnings=ignore_errors)

# Parse all statements at top level (file scope)
parseall(...)

# Parse a single syntax atom
parseatom(...)

Parse Julia source code string text into a data structure of type TreeType. parsestmt parses a single Julia statement, parseall parses top level statements at file scope and parseatom parses a single Julia identifier or other "syntax atom".

If text is passed without index, all the input text must be consumed and a tree data structure is returned. When an integer byte index is passed, a tuple (tree, next_index) will be returned containing the next index in text to resume parsing. By default whitespace and comments before and after valid code are ignored but you can turn this off by setting ignore_trivia=false.

version (default VERSION) may be used to set the syntax version to any Julia version >= v"1.0". We aim to parse all Julia syntax which has been added after v"1.0", emitting an error if it's not compatible with the requested version.
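
As a rough sketch of the version keyword, the snippet below parses the same text under two syntax versions. It assumes that `import ... as` renaming is treated as syntax introduced in Julia 1.6; the helper name `parses_ok` is hypothetical and not part of JuliaSyntax.

```julia
using JuliaSyntax: parsestmt, ParseError

# Hypothetical helper: true if `text` parses without a ParseError
# under the requested syntax version.
function parses_ok(text; version)
    try
        parsestmt(Expr, text; version=version)
        return true
    catch err
        err isa ParseError || rethrow()
        return false
    end
end

# Assumption: `import A as B` is only accepted for syntax versions >= 1.6.
new_ok = parses_ok("import A as B"; version=v"1.6")
old_ok = parses_ok("import A as B"; version=v"1.5")
```

Pinning version this way may be useful when checking that a code base stays parseable on an older Julia release.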

Pass filename to set any file name information embedded within the output tree, if applicable. This will also annotate errors and warnings with the source file name.

A ParseError will be thrown if any errors or warnings occurred during parsing. To avoid exceptions due to warnings, use ignore_warnings=true. To also avoid exceptions due to errors, use ignore_errors=true.
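
The calls described above might be used roughly as follows. This is a minimal sketch, assuming `Expr` and `SyntaxNode` as tree types; the input strings are made up for illustration.

```julia
using JuliaSyntax: parsestmt, parseall, SyntaxNode, ParseError

# Parse one statement into a Base `Expr`; the whole string must be consumed.
ex = parsestmt(Expr, "a + b")

# Parse all top-level statements; for `Expr` the result is a :toplevel expression.
top = parseall(Expr, "x = 1\ny = 2")

# With a byte index, a (tree, next_index) tuple is returned and parsing
# can be resumed from next_index.
text = "x + y\nz"
first_ex, next_index = parsestmt(Expr, text, 1)
second_ex, _ = parsestmt(Expr, text, next_index)

# Invalid input throws a ParseError by default.
threw = try
    parsestmt(Expr, "a +")
    false
catch err
    err isa ParseError
end

# With ignore_errors=true a tree is returned even though the input is invalid.
partial = parsestmt(SyntaxNode, "a +"; ignore_errors=true)
```

The index form is the one to reach for when several statements need to be pulled out of a larger buffer one at a time.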


Low level parsing API

The ParseStream interface provides a low-level, stream-like I/O interface for writing the parser. The parser does not depend on or produce any concrete tree data structure during the parsing phase, but the output spans can be post-processed into various tree data structures as required using JuliaSyntax.build_tree.

JuliaSyntax.parse! Function
parse!(stream::ParseStream; rule=:all)

Parse Julia source code from a ParseStream object. Output tree data structures may be extracted from stream with the build_tree function.

rule may be any of

  • :all (default) — parse a whole "file" of top level statements. In this mode, the parser expects to fully consume the input.
  • :statement — parse a single statement, or statements separated by semicolons.
  • :atom — parse a single syntax "atom": a literal, identifier, or parenthesized expression.
parse!(TreeType, io::IO; rule=:all, version=VERSION)

Parse Julia source code from a seekable IO object. The output is a tuple (tree, diagnostics). When parse! returns, the stream io is positioned directly after the last byte which was consumed during parsing.
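
A small sketch of the IO form, assuming an in-memory `IOBuffer` as the seekable stream and `Expr` as the tree type:

```julia
using JuliaSyntax

io = IOBuffer("x = 1\ny = 2")

# Parse a single statement; the result is a (tree, diagnostics) tuple.
tree, diagnostics = JuliaSyntax.parse!(Expr, io; rule=:statement)

# The stream is left just after the consumed bytes, so the rest can still be read.
rest = read(io, String)
```

Here `rest` holds whatever follows the first statement, which could be handed to another `parse!` call.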

JuliaSyntax.ParseStream Type
ParseStream(text::AbstractString,          index::Integer=1; version=VERSION)
ParseStream(text::IO;                                        version=VERSION)
ParseStream(text::Vector{UInt8},           index::Integer=1; version=VERSION)
ParseStream(ptr::Ptr{UInt8}, len::Integer, index::Integer=1; version=VERSION)

Construct a ParseStream from input which may come in various forms:

  • A string (zero copy for String and SubString)
  • An IO object (zero copy for IOBuffer). The IO object must be seekable.
  • A buffer of bytes (zero copy). The caller is responsible for preserving buffers passed as (ptr,len).

A byte index may be provided as the position to start parsing.

ParseStream provides an IO interface for the parser which provides lexing of the source text input into tokens, manages insignificant whitespace tokens on behalf of the parser, and stores output tokens and tree nodes in a pair of output arrays.

version (default VERSION) may be used to set the syntax version to any Julia version >= v"1.0". We aim to parse all Julia syntax which has been added after v"1.0", emitting an error if it's not compatible with the requested version.
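
The low-level pipeline could be driven by hand along these lines. This is a sketch only, assuming a string input and `Expr` as the output tree type:

```julia
using JuliaSyntax

# 1. Wrap the source text in a ParseStream.
stream = JuliaSyntax.ParseStream("a + b")

# 2. Run the parser; output is recorded inside the stream, not as a tree.
JuliaSyntax.parse!(stream; rule=:statement)

# 3. Post-process the recorded output into a concrete tree type.
ex = JuliaSyntax.build_tree(Expr, stream)
```

Keeping parsing and tree construction separate is what lets the same parse feed different tree data structures.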

JuliaSyntax.build_tree Function
build_tree(make_node::Function, ::Type{StackEntry}, stream::ParseStream; kws...)

Construct a tree from a ParseStream using depth-first traversal. make_node must have the signature

make_node(head::SyntaxHead, span::Integer, children)

where children is either nothing for leaf nodes or an iterable of the children of type StackEntry for internal nodes. StackEntry may be a node type, but also may include other information required during building the tree.

If the ParseStream has multiple nodes at the top level, K"wrapper" is used to wrap them in a single node.

The tree here is constructed depth-first in postorder.


Tokenization

JuliaSyntax.tokenize Function
tokenize(text; operators_as_identifiers=true)

Returns the tokenized UTF-8 encoded text as a vector of Tokens. The text for the token can be retrieved by using untokenize(). The full text can be reconstructed with, for example, join(untokenize.(tokenize(text), text)).

This interface works on UTF-8 encoded string or buffer data only.

The keyword operators_as_identifiers specifies whether operators in identifier-position should have K"Identifier" as their kind, or be emitted as more specific operator kinds. For example, whether the + in a + b should be emitted as K"Identifier" (the default) or as K"+".
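
A brief sketch of tokenizing and reconstructing a string, and of the effect of operators_as_identifiers. The input text is made up for illustration.

```julia
using JuliaSyntax: tokenize, untokenize, kind, @K_str

text = "a + b"
toks = tokenize(text)

# Every token, including whitespace, maps back to a slice of the text.
pieces = [untokenize(t, text) for t in toks]
roundtrip = join(pieces)

# Find the token covering "+" and compare its kind under both settings.
plus_default = only(t for t in toks if untokenize(t, text) == "+")
strict_toks = tokenize(text; operators_as_identifiers=false)
plus_strict = only(t for t in strict_toks if untokenize(t, text) == "+")
```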

JuliaSyntax.untokenize Function

Return the string representation of a token kind, or nothing if the kind represents a class of tokens like K"Identifier".

When unique=true only return a string when the kind uniquely defines the corresponding input token, otherwise return nothing. When unique=false, return the name of the kind.

TODO: Replace untokenize() with Base.string()?

JuliaSyntax.Token Type

Token type resulting from calling tokenize(text)

Use

  • kind(tok) to get the token kind
  • untokenize(tok, text) to retrieve the text
  • Predicates like is_error(tok) to query token categories and flags

Source code handling

This section describes the generic functions for source text, source location computation and formatting functions.

Contiguous syntax objects like nodes in the syntax tree should implement the following where possible:

JuliaSyntax.sourcefile Function
sourcefile(x)

Get the source file object (usually SourceFile) for a given syntax object x. The source file along with a byte range may be used to compute source_line(), source_location(), filename(), etc.


This will provide implementations of the following which include range information, line numbers, and fancy highlighting of source ranges:

JuliaSyntax.filename Function
filename(x)

Get file name associated with source, or an empty string if one didn't exist.

For objects x such as syntax trees, defers to filename(sourcefile(x)) by default.

JuliaSyntax.source_line Function
source_line(x)
source_line(source::SourceFile, byte_index::Integer)

Get the line number of the first line on which object x appears. In the second form, get the line number at the given byte_index within source.

JuliaSyntax.source_location Function
source_location(x)
source_location(source::SourceFile, byte_index::Integer)

source_location(LineNumberNode, x)
source_location(LineNumberNode, source, byte_index)

Get (line,column) of the first byte where object x appears in the source. The second form allows one to be more precise with the byte_index, given the source file.

Providing LineNumberNode as the first argument will return the line and file name in a line number node object.

JuliaSyntax.char_range Function
char_range(x)

Compute the range in character indices over the source text for syntax object x. If you want to index the source string you need this, rather than byte_range.

JuliaSyntax.sourcetext Function
sourcetext(x)

Get the full source text of syntax object x.

sourcetext(source::SourceFile)

Get the full source text of a SourceFile as a string.

JuliaSyntax.highlight Function
highlight(io, x; color, note, notecolor,
          context_lines_before, context_lines_inner, context_lines_after)

highlight(io::IO, source::SourceFile, range::UnitRange; kws...)

Print the lines of source code surrounding x, with x highlighted using a background color and underlined with markers in the text. A note in notecolor may be provided as an annotation. By default, x should be an object with sourcefile(x) and byte_range(x) implemented.

The context arguments context_lines_before, etc, refer to the number of lines of code which will be printed as context before and after, with inner referring to context lines inside a multiline region.

The second form shares the keywords of the first but allows an explicit source file and byte range to be supplied.
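
The location helpers above might be applied to a node of a parsed tree as follows. This is a sketch, assuming a `SyntaxNode` tree; the file name "demo.jl" and the source text are placeholders.

```julia
using JuliaSyntax: parseall, SyntaxNode, filename, source_line, source_location, highlight

text = "a = 1\nb = f(a)"
tree = parseall(SyntaxNode, text; filename="demo.jl")

# Second top-level statement, `b = f(a)`.
stmt = tree[2]

name = filename(stmt)
line = source_line(stmt)
loc = source_location(stmt)   # (line, column) of the first byte of stmt

# Render the statement with surrounding context into a buffer.
io = IOBuffer()
highlight(io, stmt)
shown = String(take!(io))
```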


SourceFile-specific functions:

JuliaSyntax.SourceFile Type
SourceFile(code [; filename=nothing, first_line=1, first_index=1])

UTF-8 source text with associated file name and line number, storing the character indices of the start of each line. first_line and first_index can be used to specify the line number and index of the first character of code within a larger piece of source text.

SourceFile may be indexed via getindex or view to get a string. Line information for a byte offset can be looked up via the source_line, source_location and source_line_range functions.
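
A minimal sketch of using SourceFile directly, assuming ASCII text so that byte and character indices coincide; the file name is a placeholder.

```julia
using JuliaSyntax: SourceFile, filename, source_line, source_location, sourcetext

code = "a = 1\nb = 2\n"
src = SourceFile(code; filename="demo.jl")

# Byte 7 is the `b` that starts the second line; byte 9 is the `=` on that line.
line = source_line(src, 7)
loc = source_location(src, 9)

# Indexing with a byte range returns the covered text as a string.
snippet = src[7:11]
```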


Expression predicates, kinds and flags

Expressions are tagged with a kind, which is like a type but represented as an integer tag rather than a full Julia type for efficiency (much like the tag of a "sum type"). Kinds are constructed with the @K_str macro.

JuliaSyntax.@K_str Macro
K"s"

The kind of a token or AST internal node with string "s".

For example

  • K")" is the kind of the right parenthesis token
  • K"block" is the kind of a block of code (eg, statements within a begin-end).
JuliaSyntax.Kind Type
K"name"
Kind(namestr)

Kind is a type tag for specifying the type of tokens and interior nodes of a syntax tree. Abstractly, this tag is used to define our own sum types for syntax tree nodes. We do this explicitly outside the Julia type system because (a) Julia doesn't have sum types and (b) we want concrete data structures which are unityped from the Julia compiler's point of view, for efficiency.

Naming rules:

  • Kinds which correspond to exactly one textual form are represented with that text. This includes keywords like K"for" and operators like K"*".
  • Kinds which represent many textual forms have UpperCamelCase names. This includes kinds like K"Identifier" and K"Comment".
  • Kinds which exist merely as delimiters are all uppercase.

The kind of an expression ex in a tree should be accessed with kind(ex).

In addition to the kind, a small integer set of "flags" is included to further distinguish details of each expression, accessed with the flags function. The kind and flags can be wrapped into a SyntaxHead which is accessed with the head function.

JuliaSyntax.flags Function
flags(x)

Return the flag bits of a syntactic construct. Prefer to query these with the predicates is_trivia, is_prefix_call, is_infix_op_call, is_prefix_op_call, is_postfix_op_call, is_dotted, is_suffixed, is_decorated.

Or extract numeric portion of the flags with numeric_flags.

JuliaSyntax.SyntaxHead Type
SyntaxHead(kind, flags)

A SyntaxHead combines the Kind of a syntactic construct with a set of flags. The kind defines the broad "type" of the syntactic construct, while the flag bits compactly store more detailed information about the construct.
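
To illustrate how kind, flags and head relate, here is a small sketch that assumes a `SyntaxNode` parsed from an infix call:

```julia
using JuliaSyntax: parsestmt, SyntaxNode, Kind, SyntaxHead, kind, head, flags, @K_str
import JuliaSyntax

node = parsestmt(SyntaxNode, "a + b")

k = kind(node)   # the kind tag of the node
h = head(node)   # kind and flags wrapped together

# A head can be rebuilt from its two components.
rebuilt = SyntaxHead(kind(node), flags(node))

# Flags are usually queried through predicates rather than inspected directly.
infix = JuliaSyntax.is_infix_op_call(node)
```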


Details about the flags may be extracted using various predicates:

JuliaSyntax.is_trivia Function
is_trivia(x)

Return true for "syntax trivia": tokens in the tree which are either largely invisible to the parser (eg, whitespace) or implied by the structure of the AST (eg, reserved words).

JuliaSyntax.numeric_flags Function
numeric_flags(x)

Return the number attached to a SyntaxHead. This is only for kinds K"nrow" and K"ncat", for now.


Some of the more unusual predicates are accessed merely with has_flags(x, flag_bits), where any of the following uppercase constants may be used for flag_bits after checking that the kind is correct.

JuliaSyntax.TRAILING_COMMA_FLAG Constant

Set for various delimited constructs when they contain a trailing comma. For example, to distinguish (a,b,) vs (a,b), and f(a) vs f(a,). Kinds where this applies are: tuple call dotcall macrocall vect curly braces <: >:.
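
A tentative sketch of checking this flag on a call node. It assumes that the flag bits of a `SyntaxNode` carry the trailing comma information recorded by the parser, and it passes the raw bits from `flags` to `has_flags`.

```julia
using JuliaSyntax: parsestmt, SyntaxNode, kind, flags, has_flags, TRAILING_COMMA_FLAG, @K_str

with_comma = parsestmt(SyntaxNode, "f(a,)")
without_comma = parsestmt(SyntaxNode, "f(a)")

# Check the kind first, since the flag bits are only meaningful for certain kinds.
both_calls = kind(with_comma) == K"call" && kind(without_comma) == K"call"

has_comma = has_flags(flags(with_comma), TRAILING_COMMA_FLAG)
no_comma = has_flags(flags(without_comma), TRAILING_COMMA_FLAG)
```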


Syntax trees

Access to the children of a tree node is provided by the functions

JuliaSyntax.is_leaf Function
is_leaf(node)

Determine whether the node is a leaf of the tree. In our trees a "leaf" corresponds to a single token in the source text.

JuliaSyntax.children Function
children(node)

Return an iterable list of children for the node. For leaves, return nothing.


For convenient access to the children, we also provide node[i], node[i:j] and node[begin:end] by implementing Base.getindex(), Base.firstindex() and Base.lastindex(). We choose to return a view from node[i:j] to make it non-allocating.

Tree traversal is supported by using these functions along with the predicates such as kind listed above.
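
A recursive traversal built from these pieces might look like the sketch below. The helper `collect_leaves!` is hypothetical and not part of JuliaSyntax.

```julia
using JuliaSyntax: parsestmt, SyntaxNode, is_leaf, children, kind, @K_str

# Hypothetical helper: append every leaf below `node` to `out`, in source order.
function collect_leaves!(out, node)
    if is_leaf(node)
        push!(out, node)
    else
        for child in children(node)
            collect_leaves!(out, child)
        end
    end
    return out
end

call = parsestmt(SyntaxNode, "f(x, y)")
leaves = collect_leaves!(Any[], call)

# Indexing is shorthand for indexing into children(node).
first_child = call[1]
last_child = call[end]
```

Guarding with `is_leaf` before iterating avoids calling `children` on leaves, where it does not return an iterable.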

Trees referencing the source

JuliaSyntax.SyntaxNode Type
SyntaxNode(source::SourceFile, raw::GreenNode{SyntaxHead};
           keep_parens=false, position::Integer=1)

An AST node with a similar layout to Expr. Typically constructed from source text by calling one of the parser API functions such as parseall.


Functions applicable to SyntaxNode include everything in the sections on heads/kinds as well as the accessor functions in the source code handling section.

Relocatable syntax trees

GreenNode is a special low level syntax tree: it's "relocatable" in the sense that it doesn't carry an absolute position in the source code or even a reference to the source text. This allows it to be reused for incremental parsing, but does make it a pain to work with directly!

JuliaSyntax.GreenNode Type
GreenNode(head, span)
GreenNode(head, children...)

A "green tree" is a lossless syntax tree which overlays all the source text. The most basic properties of a green tree are that:

  • Nodes cover a contiguous span of bytes in the text
  • Sibling nodes are ordered in the same order as the text

As implementation choices, we choose that:

  • Nodes are immutable and don't know their parents or absolute position, so can be cached and reused
  • Nodes are homogeneously typed at the language level so they can be stored concretely, with the head defining the node type. Normally this would include a "syntax kind" enumeration, but it can also include flags and record information the parser knew about the layout of the child nodes.
  • For simplicity and uniformity, leaf nodes cover a single token in the source. This is like rust-analyzer, but different from Roslyn where leaves can include syntax trivia.
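
The properties listed above can be checked on a small example. This sketch assumes that `GreenNode` may be passed as the tree type to parsestmt and that `span` returns a node's length in bytes.

```julia
using JuliaSyntax: parsestmt, GreenNode, children, span

text = "a + b"
green = parsestmt(GreenNode, text)

# The root covers the whole input, whitespace included.
total = span(green)

# Children are contiguous and ordered, so their spans add up to the parent's.
child_total = sum(span, children(green))
```

Because only spans are stored, absolute positions have to be recovered by accumulating spans while walking the tree.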

Green nodes only have a relative position so implement span() instead of byte_range():