API Reference

Parsing

JuliaSyntax.parsestmt — Function

# Parse a single expression/statement
parsestmt(TreeType, text, [index];
          version=VERSION,
          ignore_trivia=true,
          filename=nothing,
          ignore_errors=false,
          ignore_warnings=ignore_errors)

# Parse all statements at top level (file scope)
parseall(...)

# Parse a single syntax atom
parseatom(...)

Parse Julia source code string text into a data structure of type TreeType. parsestmt parses a single Julia statement, parseall parses top level statements at file scope and parseatom parses a single Julia identifier or other "syntax atom".

If text is passed without index, all the input text must be consumed and a tree data structure is returned. When an integer byte index is passed, a tuple (tree, next_index) will be returned containing the next index in text to resume parsing. By default whitespace and comments before and after valid code are ignored but you can turn this off by setting ignore_trivia=false.

version (default VERSION) may be used to set the syntax version to any Julia version >= v"1.0". We aim to parse all Julia syntax which has been added after v"1.0", emitting an error if it's not compatible with the requested version.

Pass filename to set any file name information embedded within the output tree, if applicable. This will also annotate errors and warnings with the source file name.

A ParseError will be thrown if any errors or warnings occurred during parsing. To avoid exceptions due to warnings, use ignore_warnings=true. To also avoid exceptions due to errors, use ignore_errors=true.
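As a rough illustration of the three entry points described above, the following sketch parses into Expr, which is only one possible choice of TreeType. The sample strings are invented for illustration, and the snippet assumes the JuliaSyntax package is installed.

```julia
using JuliaSyntax

# Parse a single statement; without an index, all of the text must be consumed.
ex = JuliaSyntax.parsestmt(Expr, "a + b")

# Parse every top-level statement in a chunk of source text.
top = JuliaSyntax.parseall(Expr, "x = 1\ny = 2")

# Parse a single atom, here an identifier.
atom = JuliaSyntax.parseatom(Expr, "x")

# With a byte index, a (tree, next_index) tuple is returned so that
# parsing can resume where the previous call stopped.
text = "x = 1\ny = 2"
first_stmt, next_index = JuliaSyntax.parsestmt(Expr, text, 1)
second_stmt, _ = JuliaSyntax.parsestmt(Expr, text, next_index)

# Invalid input throws a ParseError unless ignore_errors=true is passed.
threw = try
    JuliaSyntax.parsestmt(Expr, "(a + b")
    false
catch err
    err isa JuliaSyntax.ParseError
end
recovered = JuliaSyntax.parsestmt(Expr, "(a + b"; ignore_errors=true)
```

The indexed form is the one to reach for when statements are consumed one at a time from a larger buffer, for example in a REPL-style loop.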

JuliaSyntax.parseall — Function

parseall shares the docstring shown above under JuliaSyntax.parsestmt.

JuliaSyntax.parseatom — Function

parseatom shares the docstring shown above under JuliaSyntax.parsestmt.

Low level parsing API

The ParseStream interface provides a low-level, stream-like I/O interface for writing the parser. The parser does not depend on or produce any concrete tree data structure as part of the parsing phase, but the output spans can be post-processed into various tree data structures as required using JuliaSyntax.build_tree.

JuliaSyntax.parse! — Function

parse!(stream::ParseStream; rule=:all)

Parse Julia source code from a ParseStream object. Output tree data structures may be extracted from stream with the build_tree function.

rule may be any of

:all (default) — parse a whole "file" of top level statements. In this mode, the parser expects to fully consume the input.
:statement — parse a single statement, or statements separated by semicolons.
:atom — parse a single syntax "atom": a literal, identifier, or parenthesized expression.

parse!(TreeType, io::IO; rule=:all, version=VERSION)

Parse Julia source code from a seekable IO object. The output is a tuple (tree, diagnostics). When parse! returns, the stream io is positioned directly after the last byte which was consumed during parsing.

JuliaSyntax.ParseStream — Type

ParseStream(text::AbstractString,          index::Integer=1; version=VERSION)
ParseStream(text::IO;                                        version=VERSION)
ParseStream(text::Vector{UInt8},           index::Integer=1; version=VERSION)
ParseStream(ptr::Ptr{UInt8}, len::Integer, index::Integer=1; version=VERSION)

Construct a ParseStream from input which may come in various forms:

A string (zero copy for String and SubString)
An IO object (zero copy for IOBuffer). The IO object must be seekable.
A buffer of bytes (zero copy). The caller is responsible for preserving buffers passed as (ptr,len).

A byte index may be provided as the position to start parsing.

ParseStream provides an IO interface for the parser: it lexes the source text input into tokens, manages insignificant whitespace tokens on behalf of the parser, and stores output tokens and tree nodes in a pair of output arrays.

version (default VERSION) may be used to set the syntax version to any Julia version >= v"1.0". We aim to parse all Julia syntax which has been added after v"1.0", emitting an error if it's not compatible with the requested version.
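The low-level flow described in this section might look roughly like the sketch below. Note that build_tree has no docstring on this page, so the call build_tree(Expr, stream) is an assumption about its calling convention rather than documented behaviour; the input string is invented for illustration.

```julia
using JuliaSyntax

# Wrap the source text in a ParseStream; for a String this is zero copy.
stream = JuliaSyntax.ParseStream("f(x) + 1")

# Run the parser over the stream, stopping after one statement.
JuliaSyntax.parse!(stream; rule=:statement)

# Convert the parser's output into a tree type of our choosing.
# Assumption: build_tree(TreeType, stream) is the calling convention.
ex = JuliaSyntax.build_tree(Expr, stream)
```

Keeping parsing and tree construction as separate steps is what lets the same parse feed several different tree types.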

Missing docstring for JuliaSyntax.build_tree.

Tokenization

JuliaSyntax.tokenize — Function

tokenize(text; operators_as_identifiers=true)

Returns the tokenized UTF-8 encoded text as a vector of Tokens. The text for the token can be retrieved by using untokenize(). The full text can be reconstructed with, for example, join(untokenize.(tokenize(text), text)).

This interface works on UTF-8 encoded string or buffer data only.

The keyword operators_as_identifiers specifies whether operators in identifier-position should have K"Identifier" as their kind, or be emitted as more specific operator kinds. For example, whether the + in a + b should be emitted as K"Identifier" (the default) or as K"+".
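A small sketch of the tokenizer round trip described above, using an invented input string. It only relies on tokenize, kind and the two-argument untokenize mentioned in these docstrings.

```julia
using JuliaSyntax
using JuliaSyntax: @K_str

text = "a + b"
tokens = JuliaSyntax.tokenize(text)

# Each token covers a slice of the input; untokenize(tok, text) recovers it.
pieces = [JuliaSyntax.untokenize(tok, text) for tok in tokens]

# Token kinds are compared against K"..." literals.
kinds = [JuliaSyntax.kind(tok) for tok in tokens]
first_is_identifier = kinds[1] == K"Identifier"

# Concatenating the pieces reproduces the original text, whitespace included.
roundtrip = join(pieces)
```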

JuliaSyntax.untokenize — Function

Return the string representation of a token kind, or nothing if the kind represents a class of tokens like K"Identifier".

When unique=true only return a string when the kind uniquely defines the corresponding input token, otherwise return nothing. When unique=false, return the name of the kind.

TODO: Replace untokenize() with Base.string()?

JuliaSyntax.Token — Type

Token type resulting from calling tokenize(text)

Use

kind(tok) to get the token kind
untokenize(tok, text) to retrieve the text
Predicates like is_error(tok) to query token categories and flags

Source code handling

This section describes the generic functions for source text, source location computation and formatting functions.

Contiguous syntax objects like nodes in the syntax tree should implement the following where possible:

JuliaSyntax.sourcefile — Function

sourcefile(x)

Get the source file object (usually SourceFile) for a given syntax object x. The source file along with a byte range may be used to compute source_line(), source_location(), filename(), etc.

JuliaSyntax.byte_range — Function

byte_range(x)

Return the range of bytes which x covers in the source text. See also char_range.

This will provide implementations of the following which include range information, line numbers, and fancy highlighting of source ranges:

JuliaSyntax.first_byte — Function

first_byte(x)

Return the first byte of x in the source text.

JuliaSyntax.last_byte — Function

last_byte(x)

Return the last byte of x in the source text.

JuliaSyntax.filename — Function

filename(x)

Get file name associated with source, or an empty string if one didn't exist.

For objects x such as syntax trees, defers to filename(sourcefile(x)) by default.

JuliaSyntax.source_line — Function

source_line(x)
source_line(source::SourceFile, byte_index::Integer)

Get the line number of the first line on which object x appears. In the second form, get the line number at the given byte_index within source.

JuliaSyntax.source_location — Function

source_location(x)
source_location(source::SourceFile, byte_index::Integer)

source_location(LineNumberNode, x)
source_location(LineNumberNode, source, byte_index)

Get (line,column) of the first byte where object x appears in the source. The second form allows one to be more precise with the byte_index, given the source file.

Providing LineNumberNode as the first argument will return the line and file name in a line number node object.

JuliaSyntax.char_range — Function

char_range(x)

Compute the range in character indices over the source text for syntax object x. If you want to index the source string you need this, rather than byte_range.

JuliaSyntax.sourcetext — Function

sourcetext(x)

Get the full source text of syntax object x.

sourcetext(source::SourceFile)

Get the full source text of a SourceFile as a string.
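To make the location functions above concrete, the sketch below queries the right-hand side of an assignment parsed into a SyntaxNode. The source string is invented for illustration, and the snippet assumes that the assignment node's second child is its right-hand side.

```julia
using JuliaSyntax

node = JuliaSyntax.parsestmt(JuliaSyntax.SyntaxNode, "x = foo(1, 2)")

# Assumption: the right-hand side of the assignment is the second child.
rhs = JuliaSyntax.children(node)[2]

bytes = JuliaSyntax.byte_range(rhs)       # bytes covered by foo(1, 2)
fb = JuliaSyntax.first_byte(rhs)
lb = JuliaSyntax.last_byte(rhs)
line = JuliaSyntax.source_line(rhs)
loc = JuliaSyntax.source_location(rhs)    # (line, column)
```

All of these are byte-based; for text containing multi-byte characters, char_range is the function to use when indexing the source string.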

JuliaSyntax.highlight — Function

highlight(io, x; color, note, notecolor,
          context_lines_before, context_lines_inner, context_lines_after)

highlight(io::IO, source::SourceFile, range::UnitRange; kws...)

Print the lines of source code surrounding x which is highlighted with background color and underlined with markers in the text. A note in notecolor may be provided as annotation. By default, x should be an object with sourcefile(x) and byte_range(x) implemented.

The context arguments context_lines_before, etc, refer to the number of lines of code which will be printed as context before and after, with inner referring to context lines inside a multiline region.

The second form shares the keywords of the first but allows an explicit source file and byte range to be supplied.
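A possible use of the first form of highlight is sketched below. The input string and note text are invented; output is captured in an IOBuffer purely so it can be inspected, and the exact layout of the printed markers is not shown here because it depends on the terminal and keyword settings.

```julia
using JuliaSyntax

node = JuliaSyntax.parsestmt(JuliaSyntax.SyntaxNode, "x = foo(1, 2)")
rhs = JuliaSyntax.children(node)[2]

# Print into a buffer so the result can be inspected as a string;
# pass stdout instead to see it in the terminal.
io = IOBuffer()
JuliaSyntax.highlight(io, rhs; note="the call being assigned")
out = String(take!(io))
```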

SourceFile-specific functions:

JuliaSyntax.SourceFile — Type

SourceFile(code [; filename=nothing, first_line=1, first_index=1])

UTF-8 source text with associated file name and line number, storing the character indices of the start of each line. first_line and first_index can be used to specify the line number and index of the first character of code within a larger piece of source text.

SourceFile may be indexed via getindex or view to get a string. Line information for a byte offset can be looked up via the source_line, source_location and source_line_range functions.
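The following sketch shows a SourceFile built directly from a string and queried by byte index. The code string and the file name demo.jl are made up for illustration.

```julia
using JuliaSyntax

src = JuliaSyntax.SourceFile("a = 1\nb = 2\n"; filename="demo.jl")

# Byte 7 is the 'b' that starts the second line.
line = JuliaSyntax.source_line(src, 7)
loc = JuliaSyntax.source_location(src, 7)   # (line, column)
name = JuliaSyntax.filename(src)
```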

JuliaSyntax.source_line_range — Function

Get byte range of the source line at byte_index, buffered by context_lines_before and context_lines_after before and after.

Expression predicates, kinds and flags

Expressions are tagged with a kind, which is like a type but is represented as an integer tag rather than a full Julia type for efficiency (much like the tag of a "sum type"). Kinds are constructed with the @K_str macro.

JuliaSyntax.@K_str — Macro

K"s"

The kind of a token or AST internal node with string "s".

For example

K")" is the kind of the right parenthesis token
K"block" is the kind of a block of code (eg, statements within a begin-end).

JuliaSyntax.Kind — Type

K"name"
Kind(namestr)

Kind is a type tag for specifying the type of tokens and interior nodes of a syntax tree. Abstractly, this tag is used to define our own sum types for syntax tree nodes. We do this explicitly outside the Julia type system because (a) Julia doesn't have sum types and (b) we want concrete data structures which are unityped from the Julia compiler's point of view, for efficiency.

Naming rules:

Kinds which correspond to exactly one textual form are represented with that text. This includes keywords like K"for" and operators like K"*".
Kinds which represent many textual forms have UpperCamelCase names. This includes kinds like K"Identifier" and K"Comment".
Kinds which exist merely as delimiters are all uppercase.

The kind of an expression ex in a tree should be accessed with kind(ex)
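As a brief sketch of how kinds are typically compared, the snippet below parses an invented call expression and checks its kind against a K"..." literal. The @K_str macro is imported explicitly so the example does not depend on what the package exports.

```julia
using JuliaSyntax
using JuliaSyntax: @K_str

node = JuliaSyntax.parsestmt(JuliaSyntax.SyntaxNode, "f(x)")

# kind(node) returns a Kind, which compares equal to the matching K"..." literal.
is_call = JuliaSyntax.kind(node) == K"call"

# K"..." and Kind(...) are two spellings of the same value.
same = JuliaSyntax.Kind("call") == K"call"
is_kind = K"call" isa JuliaSyntax.Kind
```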

JuliaSyntax.kind — Function

kind(x)

Return the Kind of x.

In addition to the kind, a small integer set of "flags" is included to further distinguish details of each expression, accessed with the flags function. The kind and flags can be wrapped into a SyntaxHead which is accessed with the head function.

JuliaSyntax.flags — Function

flags(x)

Return the flag bits of a syntactic construct. Prefer to query these with the predicates is_trivia, is_prefix_call, is_infix_op_call, is_prefix_op_call, is_postfix_op_call, is_dotted, is_suffixed, is_decorated.

Alternatively, extract the numeric portion of the flags with numeric_flags.

JuliaSyntax.SyntaxHead — Type

SyntaxHead(kind, flags)

A SyntaxHead combines the Kind of a syntactic construct with a set of flags. The kind defines the broad "type" of the syntactic construct, while the flag bits compactly store more detailed information about the construct.

JuliaSyntax.head — Function

head(x)

Get the SyntaxHead of a node of a tree or other syntax-related data structure.

Details about the flags may be extracted using various predicates:

JuliaSyntax.is_trivia — Function

is_trivia(x)

Return true for "syntax trivia": tokens in the tree which are either largely invisible to the parser (eg, whitespace) or implied by the structure of the AST (eg, reserved words).

JuliaSyntax.is_prefix_call — Function

is_prefix_call(x)

Return true for normal prefix function call syntax such as the f call node parsed from f(x).

JuliaSyntax.is_infix_op_call — Function

is_infix_op_call(x)

Return true for infix operator calls such as the + call node parsed from x + y.

JuliaSyntax.is_prefix_op_call — Function

is_prefix_op_call(x)

Return true for prefix operator calls such as the + call node parsed from +x.

JuliaSyntax.is_postfix_op_call — Function

is_postfix_op_call(x)

Return true for postfix operator calls such as the 'ᵀ call node parsed from x'ᵀ.
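The call-form predicates above can be tried out on small inputs, as in this sketch. parse_node is a hypothetical helper defined here for brevity, and the input strings are invented.

```julia
using JuliaSyntax

# Hypothetical helper: parse one statement into a SyntaxNode.
parse_node(src) = JuliaSyntax.parsestmt(JuliaSyntax.SyntaxNode, src)

prefix_call = JuliaSyntax.is_prefix_call(parse_node("f(x)"))
infix_call  = JuliaSyntax.is_infix_op_call(parse_node("x + y"))
prefix_op   = JuliaSyntax.is_prefix_op_call(parse_node("+x"))
postfix_op  = JuliaSyntax.is_postfix_op_call(parse_node("x'"))

# An infix operator call is not reported as a normal prefix call.
infix_is_prefix = JuliaSyntax.is_prefix_call(parse_node("x + y"))
```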

Missing docstring for JuliaSyntax.is_dotted.

JuliaSyntax.is_suffixed — Function

is_suffixed(x)

Return true for operators which have suffixes, such as +₁

Missing docstring for JuliaSyntax.is_decorated.

JuliaSyntax.numeric_flags — Function

numeric_flags(x)

Return the number attached to a SyntaxHead. This is only for kinds K"nrow" and K"ncat", for now.

Some of the more unusual predicates are accessed merely with has_flags(x, flag_bits), where any of the following uppercase constants may be used for flag_bits after checking that the kind is correct.

JuliaSyntax.has_flags — Function

has_flags(x, test_flags)

Return true if any of test_flags are set.

JuliaSyntax.TRIPLE_STRING_FLAG — Constant

Set when K"string" or K"cmdstring" was triple-delimited as with """ or ```

JuliaSyntax.RAW_STRING_FLAG — Constant

Set when a K"string", K"cmdstring" or K"Identifier" needs raw string unescaping

JuliaSyntax.PARENS_FLAG — Constant

Set for K"tuple", K"block" or K"macrocall" which are delimited by parentheses

JuliaSyntax.TRAILING_COMMA_FLAG — Constant

Set for various delimited constructs when they contain a trailing comma. For example, to distinguish (a,b,) vs (a,b), and f(a) vs f(a,). Kinds where this applies are: tuple call dotcall macrocall vect curly braces <: >:.

JuliaSyntax.COLON_QUOTE — Constant

Set for K"quote" for the short form :x as opposed to long form quote x end

JuliaSyntax.TOPLEVEL_SEMICOLONS_FLAG — Constant

Set for K"toplevel" whose statements are separated by semicolons

JuliaSyntax.MUTABLE_FLAG — Constant

Set for K"struct" when mutable

JuliaSyntax.BARE_MODULE_FLAG — Constant

Set for K"module" when it's bare (baremodule, not module)

JuliaSyntax.SHORT_FORM_FUNCTION_FLAG — Constant

Set for K"function" in short form definitions such as f() = 1
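The pattern of checking the kind first and then testing a flag with has_flags might look like the sketch below. parse_node, is_mutable_struct and is_parenthesized_tuple are hypothetical helpers written for this example, and the input strings are invented.

```julia
using JuliaSyntax
using JuliaSyntax: @K_str

# Hypothetical helper: parse one statement into a SyntaxNode.
parse_node(src) = JuliaSyntax.parsestmt(JuliaSyntax.SyntaxNode, src)

# Check the kind first, then test the flag that is meaningful for that kind.
is_mutable_struct(node) =
    JuliaSyntax.kind(node) == K"struct" &&
    JuliaSyntax.has_flags(node, JuliaSyntax.MUTABLE_FLAG)

is_parenthesized_tuple(node) =
    JuliaSyntax.kind(node) == K"tuple" &&
    JuliaSyntax.has_flags(node, JuliaSyntax.PARENS_FLAG)

m1 = is_mutable_struct(parse_node("mutable struct A end"))
m2 = is_mutable_struct(parse_node("struct B end"))
t1 = is_parenthesized_tuple(parse_node("(a, b)"))
t2 = is_parenthesized_tuple(parse_node("a, b"))
```

The kind check matters because flag bits are only meaningful relative to the kind they are attached to.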

Syntax trees

Access to the children of a tree node is provided by the functions

JuliaSyntax.is_leaf — Function

is_leaf(node)

Determine whether the node is a leaf of the tree. In our trees a "leaf" corresponds to a single token in the source text.

JuliaSyntax.numchildren — Function

numchildren(node)

Return length(children(node)) but possibly computed in a more efficient way.

JuliaSyntax.children — Function

children(node)

Return an iterable list of children for the node. For leaves, return nothing.

For convenient access to the children, we also provide node[i], node[i:j] and node[begin:end] by implementing Base.getindex(), Base.firstindex() and Base.lastindex(). We choose to return a view from node[i:j] to make it non-allocating.

Tree traversal is supported by using these functions along with the predicates such as kind listed above.
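A minimal traversal built only from is_leaf, children and numchildren is sketched below. count_leaves is a hypothetical helper, and the input string is invented for illustration.

```julia
using JuliaSyntax

# Hypothetical helper: count the leaves below a node by recursing
# through children(node).
function count_leaves(node)
    JuliaSyntax.is_leaf(node) && return 1
    return sum(count_leaves, JuliaSyntax.children(node); init=0)
end

tree = JuliaSyntax.parsestmt(JuliaSyntax.SyntaxNode, "f(a, b + 1)")

# The call node has three children: f, a, and the nested call b + 1.
n_children = JuliaSyntax.numchildren(tree)

# The leaves are f, a, b, + and 1.
n_leaves = count_leaves(tree)
```

The same recursion shape works for collecting identifiers or searching for a particular kind, by replacing the leaf case with a check on kind(node).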

Trees referencing the source

JuliaSyntax.SyntaxNode — Type

SyntaxNode(source::SourceFile, raw::GreenNode{SyntaxHead};
           keep_parens=false, position::Integer=1)

An AST node with a similar layout to Expr. Typically constructed from source text by calling one of the parser API functions such as parseall.

Functions applicable to SyntaxNode include everything in the sections on heads/kinds as well as the accessor functions in the source code handling section.

Relocatable syntax trees

GreenNode is a special low level syntax tree: it's "relocatable" in the sense that it doesn't carry an absolute position in the source code or even a reference to the source text. This allows it to be reused for incremental parsing, but does make it a pain to work with directly!

JuliaSyntax.GreenNode — Type

struct GreenNode

An explicit pointer-y representation of the green tree produced by the parser. See RawGreenNode for documentation on working with the implicit green tree directly. However, this representation is useful for introspection as it provides O(1) access to the children (as well as forward iteration).

Green nodes only have a relative position so implement span() instead of byte_range():

JuliaSyntax.span — Function

span(node)

Get the number of bytes this node covers in the source text.
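Because green nodes keep whitespace and other trivia as children, the spans of a node's children should add up to the node's own span. The sketch below checks this on an invented input, assuming that parsestmt accepts GreenNode as its TreeType.

```julia
using JuliaSyntax

text = "a + b"

# Assumption: GreenNode can be requested directly as the TreeType.
green = JuliaSyntax.parsestmt(JuliaSyntax.GreenNode, text)

# The root covers the whole input, measured in bytes.
total = JuliaSyntax.span(green)

# Children include whitespace tokens, so their spans sum to the parent's span.
parts = sum(JuliaSyntax.span, JuliaSyntax.children(green))
```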
