API Reference

Parsing

JuliaSyntax.parsestmt - Function
# Parse a single expression/statement
parsestmt(TreeType, text, [index];
          version=VERSION,
          ignore_trivia=true,
          filename=nothing,
          ignore_errors=false,
          ignore_warnings=ignore_errors)

# Parse all statements at top level (file scope)
parseall(...)

# Parse a single syntax atom
parseatom(...)

Parse Julia source code string text into a data structure of type TreeType. parsestmt parses a single Julia statement, parseall parses top level statements at file scope and parseatom parses a single Julia identifier or other "syntax atom".

If text is passed without index, all the input text must be consumed and a tree data structure is returned. When an integer byte index is passed, a tuple (tree, next_index) is returned, where next_index is the index in text at which to resume parsing. By default, whitespace and comments before and after valid code are ignored, but you can turn this off by setting ignore_trivia=false.

version (default VERSION) may be used to set the syntax version to any Julia version >= v"1.0". We aim to parse all Julia syntax which has been added after v"1.0", emitting an error if it's not compatible with the requested version.

Pass filename to set any file name information embedded within the output tree, if applicable. This will also annotate errors and warnings with the source file name.

A ParseError will be thrown if any errors or warnings occurred during parsing. To avoid exceptions due to warnings, use ignore_warnings=true. To also avoid exceptions due to errors, use ignore_errors=true.
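As a minimal sketch of the three entry points described above, the following assumes a JuliaSyntax release matching these docstrings and uses Expr as the TreeType. The input strings and variable names are illustrative only.

```julia
using JuliaSyntax

# Parse a single statement into a plain `Expr`.
ex = JuliaSyntax.parsestmt(Expr, "x + y")

# Parse a whole "file" of top-level statements.
top = JuliaSyntax.parseall(Expr, "a = 1\nb = 2")

# Parse a single syntax atom (here, an identifier).
atom = JuliaSyntax.parseatom(Expr, "foo")

# With a byte index, a `(tree, next_index)` tuple is returned and the
# input does not have to be consumed in full.
text = "x = 1\ny = 2"
first_stmt, next_index = JuliaSyntax.parsestmt(Expr, text, 1)
second_stmt, _ = JuliaSyntax.parsestmt(Expr, text, next_index)

# Invalid input throws a ParseError unless `ignore_errors=true` is passed.
threw = try
    JuliaSyntax.parsestmt(Expr, "(x + ")
    false
catch err
    err isa JuliaSyntax.ParseError
end
```

Passing JuliaSyntax.SyntaxNode or JuliaSyntax.GreenNode instead of Expr selects a different output tree type with the same call shape.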

source
JuliaSyntax.parseall - Function

parseall shares its docstring with parsestmt; see JuliaSyntax.parsestmt above for the signature, keyword arguments and error handling.

source
JuliaSyntax.parseatom - Function

parseatom shares its docstring with parsestmt; see JuliaSyntax.parsestmt above for the signature, keyword arguments and error handling.

source

Low level parsing API

The ParseStream interface provides a low-level, stream-like I/O interface for writing the parser. The parser does not depend on or produce any concrete tree data structure as part of the parsing phase, but the output spans can be post-processed into various tree data structures as required using JuliaSyntax.build_tree.

JuliaSyntax.parse! - Function
parse!(stream::ParseStream; rule=:all)

Parse Julia source code from a ParseStream object. Output tree data structures may be extracted from stream with the build_tree function.

rule may be any of

  • :all (default) — parse a whole "file" of top level statements. In this mode, the parser expects to fully consume the input.
  • :statement — parse a single statement, or statements separated by semicolons.
  • :atom — parse a single syntax "atom": a literal, identifier, or parenthesized expression.
source
parse!(TreeType, io::IO; rule=:all, version=VERSION)

Parse Julia source code from a seekable IO object. The output is a tuple (tree, diagnostics). When parse! returns, the stream io is positioned directly after the last byte which was consumed during parsing.
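The two parse! methods can be sketched side by side as follows. This assumes build_tree accepts Expr as the tree type when called with just a type and a stream, as it does for the high-level API in the releases these docstrings come from. The input text is illustrative.

```julia
using JuliaSyntax

# Stream route: parse into a ParseStream, then build a tree from it.
stream = JuliaSyntax.ParseStream("a + b")
JuliaSyntax.parse!(stream; rule=:statement)
ex = JuliaSyntax.build_tree(Expr, stream)

# IO route: returns a `(tree, diagnostics)` tuple and leaves `io`
# positioned after the last byte consumed by the parser.
io = IOBuffer("a + b")
tree, diagnostics = JuliaSyntax.parse!(Expr, io; rule=:statement)
```

The stream route keeps parsing and tree construction separate, so the same parsed stream can in principle be turned into more than one tree type.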

source
JuliaSyntax.ParseStream - Type
ParseStream(text::AbstractString,          index::Integer=1; version=VERSION)
ParseStream(text::IO;                                        version=VERSION)
ParseStream(text::Vector{UInt8},           index::Integer=1; version=VERSION)
ParseStream(ptr::Ptr{UInt8}, len::Integer, index::Integer=1; version=VERSION)

Construct a ParseStream from input which may come in various forms:

  • A string (zero copy for String and SubString)
  • An IO object (zero copy for IOBuffer). The IO object must be seekable.
  • A buffer of bytes (zero copy). The caller is responsible for preserving buffers passed as (ptr,len).

A byte index may be provided as the position to start parsing.

ParseStream provides an IO interface for the parser: it lexes the source text into tokens, manages insignificant whitespace tokens on behalf of the parser, and stores output tokens and tree nodes in a pair of output arrays.

version (default VERSION) may be used to set the syntax version to any Julia version >= v"1.0". We aim to parse all Julia syntax which has been added after v"1.0", emitting an error if it's not compatible with the requested version.
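To illustrate the constructor forms listed above, here is a small sketch that feeds the same source through a string, a byte vector, an IOBuffer and a string with a start index. The helper parse_statement is hypothetical and exists only for this example; it assumes build_tree(Expr, stream) is available as in the high-level API.

```julia
using JuliaSyntax

# Hypothetical helper for this sketch: parse one statement from an
# already-constructed stream and convert the result to an `Expr`.
function parse_statement(stream)
    JuliaSyntax.parse!(stream; rule=:statement)
    return JuliaSyntax.build_tree(Expr, stream)
end

from_string = parse_statement(JuliaSyntax.ParseStream("a + b"))
from_bytes  = parse_statement(JuliaSyntax.ParseStream(Vector{UInt8}("a + b")))
from_io     = parse_statement(JuliaSyntax.ParseStream(IOBuffer("a + b")))

# Start parsing at byte 4, skipping the leading "xx ".
from_index  = parse_statement(JuliaSyntax.ParseStream("xx a + b", 4))
```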

source
JuliaSyntax.build_tree - Function
build_tree(make_node::Function, ::Type{StackEntry}, stream::ParseStream; kws...)

Construct a tree from a ParseStream using depth-first traversal. make_node must have the signature

make_node(head::SyntaxHead, span::Integer, children)

where children is either nothing for leaf nodes or an iterable of the children of type StackEntry for internal nodes. StackEntry may be a node type, but also may include other information required during building the tree.

If the ParseStream has multiple nodes at the top level, K"wrapper" is used to wrap them in a single node.

The tree here is constructed depth-first in postorder.

source

Tokenization

JuliaSyntax.tokenize - Function
tokenize(text)

Returns the tokenized UTF-8 encoded text as a vector of Tokens. The text for a token can be retrieved with untokenize(). The full text can be reconstructed with, for example, join(untokenize.(tokenize(text), text)).

This interface works on UTF-8 encoded string or buffer data only.
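The round trip described above can be sketched as follows. The sample text is arbitrary, and @K_str is imported explicitly so the example does not depend on which names the package exports.

```julia
using JuliaSyntax
using JuliaSyntax: @K_str

text = "x = 1 # note"
toks = JuliaSyntax.tokenize(text)

# Each token maps back to a slice of the original text, including
# whitespace and comments, so joining the slices restores the input.
pieces = [JuliaSyntax.untokenize(tok, text) for tok in toks]
roundtrip = join(pieces)

# Token kinds can be compared against K"..." literals.
first_kind = JuliaSyntax.kind(toks[1])
```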

source
JuliaSyntax.untokenize - Function

Return the string representation of a token kind, or nothing if the kind represents a class of tokens like K"Identifier".

When unique=true only return a string when the kind uniquely defines the corresponding input token, otherwise return nothing. When unique=false, return the name of the kind.
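A short sketch of the behavior described above. The signature is not shown in this entry, so the call form untokenize(kind; unique=...) with unique=true as the default is an assumption inferred from the text.

```julia
using JuliaSyntax
using JuliaSyntax: @K_str

# A keyword kind corresponds to exactly one piece of source text.
for_text = JuliaSyntax.untokenize(K"for")

# K"Identifier" stands for a whole class of tokens, so no unique text exists.
ident_text = JuliaSyntax.untokenize(K"Identifier")

# With unique=false the name of the kind is returned instead.
ident_name = JuliaSyntax.untokenize(K"Identifier"; unique=false)
```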

TODO: Replace untokenize() with Base.string()?

source
JuliaSyntax.Token - Type

Token type resulting from calling tokenize(text)

Use

  • kind(tok) to get the token kind
  • untokenize(tok, text) to retrieve the text
  • Predicates like is_error(tok) to query token categories and flags
source

Source file handling

JuliaSyntax.SourceFile - Type
SourceFile(code [; filename=nothing, first_line=1, first_index=1])

UTF-8 source text with associated file name and line number, storing the character indices of the start of each line. first_line and first_index can be used to specify the line number and index of the first character of code within a larger piece of source text.

SourceFile may be indexed via getindex or view to get a string. Line information for a byte offset can be looked up via the source_line, source_location and source_line_range functions.
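A minimal sketch of the lookups mentioned above, assuming source_line and source_location take the SourceFile followed by a byte index and that source_location returns a (line, column) tuple. The file name and code are made up for the example.

```julia
using JuliaSyntax

src = JuliaSyntax.SourceFile("a = 1\nb = 2\n"; filename="demo.jl")

# Indexing by a byte range yields the corresponding text.
first_stmt = src[1:5]

# Byte 7 is the `b` that starts the second line.
line = JuliaSyntax.source_line(src, 7)

# Byte 9 is the `=` on the second line, in column 3.
location = JuliaSyntax.source_location(src, 9)
```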

source
JuliaSyntax.highlight - Function

Print the lines of source code surrounding the given byte range, which is highlighted with background color and markers in the text.

source
JuliaSyntax.sourcetext - Function
sourcetext(source::SourceFile)

Get the full source text of a SourceFile as a string.

source
sourcetext(node)

Get the full source text of a node.

source

Expression heads/kinds

JuliaSyntax.Kind - Type
K"name"
Kind(namestr)

Kind is a type tag for specifying the type of tokens and interior nodes of a syntax tree. Abstractly, this tag is used to define our own sum types for syntax tree nodes. We do this explicitly outside the Julia type system because (a) Julia doesn't have sum types and (b) we want concrete data structures which are unityped from the Julia compiler's point of view, for efficiency.

Naming rules:

  • Kinds which correspond to exactly one textual form are represented with that text. This includes keywords like K"for" and operators like K"*".
  • Kinds which represent many textual forms have UpperCamelCase names. This includes kinds like K"Identifier" and K"Comment".
  • Kinds which exist merely as delimiters are all uppercase.
source
JuliaSyntax.SyntaxHead - Type
SyntaxHead(kind, flags)

A SyntaxHead combines the Kind of a syntactic construct with a set of flags. The kind defines the broad "type" of the syntactic construct, while the flag bits compactly store more detailed information about the construct.

source
JuliaSyntax.@K_str - Macro
K"s"

The kind of a token or AST internal node with string "s".

For example

  • K")" is the kind of the right parenthesis token
  • K"block" is the kind of a block of code (e.g., statements within a begin-end).
source
JuliaSyntax.flags - Function
flags(x)

Return the flag bits of a syntactic construct. Prefer to query these with the predicates is_trivia, is_prefix_call, is_infix_op_call, is_prefix_op_call, is_postfix_op_call, is_dotted, is_suffixed, is_decorated.

Alternatively, extract the numeric portion of the flags with numeric_flags.
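The relationship between Kind, SyntaxHead and the flag predicates can be sketched as follows. The example assumes JuliaSyntax.head returns the SyntaxHead of a parsed SyntaxNode, and queries the predicate on that head rather than on the node itself.

```julia
using JuliaSyntax
using JuliaSyntax: @K_str

node = JuliaSyntax.parsestmt(JuliaSyntax.SyntaxNode, "a + b")

# The head bundles the kind with the flag bits.
h = JuliaSyntax.head(node)
k = JuliaSyntax.kind(h)

# `a + b` is a call written in infix form, which is recorded in the flags.
is_infix = JuliaSyntax.is_infix_op_call(h)

# Kind(namestr) and K"name" refer to the same kind.
same_kind = JuliaSyntax.Kind("for") == K"for"
```

Using the predicates keeps calling code independent of how the flag bits are laid out, which is why the docstring recommends them over inspecting flags(x) directly.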

source

See also the predicates related to flags.

Syntax tree types

JuliaSyntax.SyntaxNode - Type
SyntaxNode(source::SourceFile, raw::GreenNode{SyntaxHead};
           keep_parens=false, position::Integer=1)

An AST node with a similar layout to Expr. Typically constructed from source text by calling one of the parser API functions such as parseall.

source
JuliaSyntax.GreenNode - Type
GreenNode(head, span)
GreenNode(head, children...)

A "green tree" is a lossless syntax tree which overlays all the source text. The most basic properties of a green tree are that:

  • Nodes cover a contiguous span of bytes in the text
  • Sibling nodes are ordered in the same order as the text

As implementation choices, we choose that:

  • Nodes are immutable and don't know their parents or absolute position, so can be cached and reused
  • Nodes are homogeneously typed at the language level so they can be stored concretely, with the head defining the node type. Normally this would include a "syntax kind" enumeration, but it can also include flags and record information the parser knew about the layout of the child nodes.
  • For simplicity and uniformity, leaf nodes cover a single token in the source. This is like rust-analyzer, but different from Roslyn where leaves can include syntax trivia.
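The difference between the two tree types can be sketched by parsing the same text into each. This assumes the accessors JuliaSyntax.children and JuliaSyntax.span are available for these node types. The input string is illustrative.

```julia
using JuliaSyntax

text = "f(x, y)"

# SyntaxNode: Expr-like layout, with trivia such as `(`, `,` and
# whitespace left out of the children.
node = JuliaSyntax.parsestmt(JuliaSyntax.SyntaxNode, text)
n_args = length(JuliaSyntax.children(node))   # f, x, y
node_text = JuliaSyntax.sourcetext(node)

# GreenNode: lossless, so the root spans every byte of the input and
# the child spans add up to the parent span.
green = JuliaSyntax.parsestmt(JuliaSyntax.GreenNode, text)
total_span = JuliaSyntax.span(green)
child_span = sum(JuliaSyntax.span, JuliaSyntax.children(green))
```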
source