API Reference
Parsing
JuliaSyntax.parsestmt
— Function
# Parse a single expression/statement
parsestmt(TreeType, text, [index];
version=VERSION,
ignore_trivia=true,
filename=nothing,
ignore_errors=false,
ignore_warnings=ignore_errors)
# Parse all statements at top level (file scope)
parseall(...)
# Parse a single syntax atom
parseatom(...)
Parse Julia source code string text into a data structure of type TreeType. parsestmt parses a single Julia statement, parseall parses top level statements at file scope and parseatom parses a single Julia identifier or other "syntax atom".
If text is passed without index, all the input text must be consumed and a tree data structure is returned. When an integer byte index is passed, a tuple (tree, next_index) is returned, where next_index is the index in text at which to resume parsing. By default, whitespace and comments before and after valid code are ignored; you can turn this off by setting ignore_trivia=false.
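A minimal sketch of the two calling forms, assuming the JuliaSyntax package is installed and using Expr as the TreeType. The resume index follows from "x+y" occupying bytes 1 to 3 of the input.

```julia
using JuliaSyntax

# Whole-input form: all of the text must be consumed; only the tree is returned.
ex = JuliaSyntax.parsestmt(Expr, "x + y")

# Surrounding whitespace and comments are trivia and are skipped by default.
ex2 = JuliaSyntax.parsestmt(Expr, "  # comment\nx + y  ")

# Indexed form: parse one statement starting at byte 1 and get the resume index.
text = "x+y\nz"
stmt, next = JuliaSyntax.parsestmt(Expr, text, 1)   # next points just past "x+y"
rest, _ = JuliaSyntax.parsestmt(Expr, text, next)    # resumes and parses `z`

# Without an index, leftover code ("z") is an error rather than being ignored.
leftover_error = try
    JuliaSyntax.parsestmt(Expr, text)
    false
catch err
    err isa JuliaSyntax.ParseError
end
```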
version (default VERSION) may be used to set the syntax version to any Julia version >= v"1.0". We aim to parse all Julia syntax which has been added after v"1.0", emitting an error if it's not compatible with the requested version.
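For example, the ;; array concatenation syntax was introduced in Julia 1.7, so requesting an older syntax version should turn it into a parse error. A sketch, assuming the JuliaSyntax package is installed:

```julia
using JuliaSyntax

text = "[a ;; b]"

# Accepted when the requested syntax version is new enough.
ex = JuliaSyntax.parsestmt(Expr, text; version=v"1.7")

# Rejected with a ParseError for older syntax versions.
rejected = try
    JuliaSyntax.parsestmt(Expr, text; version=v"1.6")
    false
catch err
    err isa JuliaSyntax.ParseError
end
```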
Pass filename to set any file name information embedded within the output tree, if applicable. This will also annotate errors and warnings with the source file name.
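A sketch of both effects of filename, assuming the JuliaSyntax package is installed. The name "foo.jl" is arbitrary. When converting to Expr, the file name should appear in the LineNumberNodes of a block, and the printed ParseError should mention it.

```julia
using JuliaSyntax

# The statement `a` sits on line 2 of the parsed text.
ex = JuliaSyntax.parsestmt(Expr, "begin\na\nend"; filename="foo.jl")
linenode = ex.args[1]   # LineNumberNode recorded before `a`

# Error messages are annotated with the same file name.
msg = try
    JuliaSyntax.parsestmt(Expr, "x +"; filename="foo.jl")
    ""
catch err
    sprint(showerror, err)
end
```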
A ParseError will be thrown if any errors or warnings occurred during parsing. To avoid exceptions due to warnings, use ignore_warnings=true. To also avoid exceptions due to errors, use ignore_errors=true.
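A short sketch of the error-handling keywords, assuming the JuliaSyntax package is installed. With ignore_errors=true, the parser returns a tree containing error nodes instead of throwing. The check below only confirms that an Expr comes back, because the exact shape of the error markers is not described here.

```julia
using JuliaSyntax

bad = "x +"   # incomplete expression

# Default behaviour: errors raise a ParseError.
err = try
    JuliaSyntax.parsestmt(Expr, bad)
    nothing
catch e
    e
end

# Opting out of exceptions: the returned tree embeds the error instead.
ex = JuliaSyntax.parsestmt(Expr, bad; ignore_errors=true)
```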
JuliaSyntax.parseall
— Function
Shares the docstring of JuliaSyntax.parsestmt above, which documents parsestmt, parseall and parseatom together.
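A sketch of parseall on a two-statement input, assuming the JuliaSyntax package is installed. When converted to Expr, the result is a :toplevel expression whose arguments may also contain LineNumberNodes, so these are filtered out before comparing.

```julia
using JuliaSyntax

ex = JuliaSyntax.parseall(Expr, "a = 1\nb = 2")

# Drop any line number annotations to keep just the statements.
stmts = filter(x -> !(x isa LineNumberNode), ex.args)
```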
JuliaSyntax.parseatom
— Function
Shares the docstring of JuliaSyntax.parsestmt above, which documents parsestmt, parseall and parseatom together.
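A sketch of parseatom, assuming the JuliaSyntax package is installed. An atom may be a bare identifier or a parenthesized expression. With a byte index, parsing stops right after the atom.

```julia
using JuliaSyntax

a = JuliaSyntax.parseatom(Expr, "x")
b = JuliaSyntax.parseatom(Expr, "(x)")       # parentheses are dropped in Expr form

# Indexed form: only `x` is consumed from "x+y", so parsing can resume at byte 2.
atom, next = JuliaSyntax.parseatom(Expr, "x+y", 1)
```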
Low level parsing API
The ParseStream interface provides a low-level, stream-like I/O interface for writing the parser. The parser does not depend on or produce any concrete tree data structure as part of the parsing phase, but the output spans can be post-processed into various tree data structures as required using JuliaSyntax.build_tree.
JuliaSyntax.parse!
— Function
parse!(stream::ParseStream; rule=:all)
Parse Julia source code from a ParseStream object. Output tree data structures may be extracted from stream with the build_tree function.
rule may be any of:
- :all (default): parse a whole "file" of top level statements. In this mode, the parser expects to fully consume the input.
- :statement: parse a single statement, or statements separated by semicolons.
- :atom: parse a single syntax "atom": a literal, identifier, or parenthesized expression.
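A sketch of the low-level flow, assuming the JuliaSyntax package is installed. It builds a ParseStream, runs parse! with an explicit rule, then converts the stream's output with build_tree. Here Expr is used as the output tree type.

```julia
using JuliaSyntax

stream = JuliaSyntax.ParseStream("x + y")
JuliaSyntax.parse!(stream; rule=:statement)

# The stream itself holds no tree; build one from its output spans.
ex = JuliaSyntax.build_tree(Expr, stream)
```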
parse!(TreeType, io::IO; rule=:all, version=VERSION)
Parse Julia source code from a seekable IO object. The output is a tuple (tree, diagnostics). When parse! returns, the stream io is positioned directly after the last byte which was consumed during parsing.
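A sketch of the IO-based form, assuming the JuliaSyntax package is installed. Parsing a single statement from an IOBuffer should leave the buffer positioned after that statement, so the remaining input can still be read or parsed.

```julia
using JuliaSyntax

io = IOBuffer("x + y\nz")
tree, diagnostics = JuliaSyntax.parse!(Expr, io; rule=:statement)

# The IO now sits after "x + y"; the rest of the input is still unread.
remaining = read(io, String)
```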
JuliaSyntax.ParseStream
— Type
ParseStream(text::AbstractString, index::Integer=1; version=VERSION)
ParseStream(text::IO; version=VERSION)
ParseStream(text::Vector{UInt8}, index::Integer=1; version=VERSION)
ParseStream(ptr::Ptr{UInt8}, len::Integer, index::Integer=1; version=VERSION)
Construct a ParseStream from input which may come in various forms:
- A string (zero copy for String and SubString)
- An IO object (zero copy for IOBuffer). The IO object must be seekable.
- A buffer of bytes (zero copy). The caller is responsible for preserving buffers passed as (ptr,len).
A byte index may be provided as the position to start parsing.
ParseStream provides an IO interface for the parser: it lexes the source text input into tokens, manages insignificant whitespace tokens on behalf of the parser, and stores output tokens and tree nodes in a pair of output arrays.
version (default VERSION) may be used to set the syntax version to any Julia version >= v"1.0". We aim to parse all Julia syntax which has been added after v"1.0", emitting an error if it's not compatible with the requested version.
JuliaSyntax.build_tree
— Function
build_tree(make_node::Function, ::Type{StackEntry}, stream::ParseStream; kws...)
Construct a tree from a ParseStream using depth-first traversal. make_node must have the signature
make_node(head::SyntaxHead, span::Integer, children)
where children is either nothing for leaf nodes or an iterable of the children of type StackEntry for internal nodes. StackEntry may be a node type, but may also include other information required while building the tree.
If the ParseStream has multiple nodes at the top level, K"wrapper" is used to wrap them in a single node.
The tree here is constructed depth-first in postorder.
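A sketch of a custom make_node callback, assuming the JuliaSyntax package is installed. NodeInfo is a hypothetical summary type invented for this example. The callback records each node's kind, byte span and child count, so the root returned by build_tree summarizes the whole parse.

```julia
using JuliaSyntax
using JuliaSyntax: @K_str

# Hypothetical StackEntry type used only for this illustration.
struct NodeInfo
    kind::JuliaSyntax.Kind
    span::Int
    nchildren::Int
end

# Called once per node in postorder; children is `nothing` for leaves.
function make_node(head, span, children)
    n = children === nothing ? 0 : count(_ -> true, children)
    return NodeInfo(JuliaSyntax.kind(head), span, n)
end

stream = JuliaSyntax.ParseStream("x + y")
JuliaSyntax.parse!(stream; rule=:all)
root = JuliaSyntax.build_tree(make_node, NodeInfo, stream)
```

With rule=:all, the root should be a K"toplevel" node covering the whole input, so root.span equals the input length in bytes.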
Tokenization
JuliaSyntax.tokenize
— Function
tokenize(text)
Return the tokenized UTF-8 encoded text as a vector of Tokens. The text for each token can be retrieved by using untokenize(). The full text can be reconstructed with, for example, join(untokenize.(tokenize(text), text)).
This interface works on UTF-8 encoded string or buffer data only.
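A sketch of the lossless round trip, assuming the JuliaSyntax package is installed. Whitespace is tokenized as well, which is why concatenating the token texts should reproduce the input exactly.

```julia
using JuliaSyntax
using JuliaSyntax: @K_str

text = "x = 1 + y"
toks = JuliaSyntax.tokenize(text)

# Recover each token's text; broadcasting treats the string `text` as a scalar.
pieces = JuliaSyntax.untokenize.(toks, text)
first_kind = JuliaSyntax.kind(first(toks))
```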
JuliaSyntax.untokenize
— Function
Return the string representation of a token kind, or nothing if the kind represents a class of tokens like K"Identifier".
When unique=true, only return a string when the kind uniquely defines the corresponding input token, otherwise return nothing. When unique=false, return the name of the kind.
TODO: Replace untokenize() with Base.string()?
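A sketch contrasting the two modes on a keyword kind and a token-class kind, assuming the JuliaSyntax package is installed. The unique keyword is passed explicitly rather than relying on its default.

```julia
using JuliaSyntax
using JuliaSyntax: @K_str

kw = JuliaSyntax.untokenize(K"for"; unique=true)             # keyword: unique text
cls = JuliaSyntax.untokenize(K"Identifier"; unique=true)     # class of tokens
name = JuliaSyntax.untokenize(K"Identifier"; unique=false)   # falls back to the kind name
```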
JuliaSyntax.Token
— Type
Token type resulting from calling tokenize(text).
Use:
- kind(tok) to get the token kind
- untokenize(tok, text) to retrieve the text
- Predicates like is_error(tok) to query token categories and flags
Source code handling
This section describes the generic functions for source text, source location computation and formatting functions.
Contiguous syntax objects like nodes in the syntax tree should implement the following where possible:
JuliaSyntax.sourcefile
— Function
sourcefile(x)
Get the source file object (usually SourceFile) for a given syntax object x. The source file along with a byte range may be used to compute source_line(), source_location(), filename(), etc.
JuliaSyntax.byte_range
— Function
byte_range(x)
Return the range of bytes which x covers in the source text. See also char_range.
This will provide implementations of the following which include range information, line numbers, and fancy highlighting of source ranges:
JuliaSyntax.first_byte
— Function
first_byte(x)
Return the first byte of x in the source text.
JuliaSyntax.last_byte
— Function
last_byte(x)
Return the last byte of x in the source text.
JuliaSyntax.filename
— Function
filename(x)
Get the file name associated with a source file, or an empty string if it has none.
For objects x such as syntax trees, this defers to filename(sourcefile(x)) by default.
JuliaSyntax.source_line
— Function
source_line(x)
source_line(source::SourceFile, byte_index::Integer)
Get the line number of the first line on which object x appears. In the second form, get the line number at the given byte_index within source.
JuliaSyntax.source_location
— Function
source_location(x)
source_location(source::SourceFile, byte_index::Integer)
source_location(LineNumberNode, x)
source_location(LineNumberNode, source, byte_index)
Get (line, column) of the first byte where object x appears in the source. The second form allows one to be more precise with the byte_index, given the source file.
Providing LineNumberNode as the first argument will return the line and file name in a line number node object.
JuliaSyntax.char_range
— Function
char_range(x)
Compute the range in character indices over the source text for syntax object x. If you want to index the source string, you need this rather than byte_range.
JuliaSyntax.sourcetext
— Function
sourcetext(x)
Get the full source text of syntax object x.
sourcetext(source::SourceFile)
Get the full source text of a SourceFile as a string.
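A sketch of these accessors on a parsed tree, assuming the JuliaSyntax package is installed. SyntaxNode omits trivia such as the = token, so the assignment's children are just y and α. The non-ASCII identifier α occupies two bytes, which is where byte_range and char_range differ.

```julia
using JuliaSyntax

src = "x = 1\ny = α"
tree = JuliaSyntax.parseall(JuliaSyntax.SyntaxNode, src)
stmt = tree[2]    # the assignment `y = α`
rhs = stmt[2]     # the identifier `α`

line = JuliaSyntax.source_line(stmt)
loc = JuliaSyntax.source_location(rhs)     # (line, column)
bytes = JuliaSyntax.byte_range(rhs)        # spans both bytes of α
chars = JuliaSyntax.char_range(rhs)        # safe for indexing `src`
```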
JuliaSyntax.highlight
— Function
highlight(io, x; color, note, notecolor,
context_lines_before, context_lines_inner, context_lines_after)
highlight(io::IO, source::SourceFile, range::UnitRange; kws...)
Print the lines of source code surrounding x, with x highlighted in background color and underlined with markers in the text. A note in notecolor may be provided as an annotation. By default, x should be an object with sourcefile(x) and byte_range(x) implemented.
The context arguments context_lines_before, etc., refer to the number of lines of code which will be printed as context before and after, with inner referring to context lines inside a multiline region.
The second form shares the keywords of the first but allows an explicit source file and byte range to be supplied.
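A sketch that renders a highlight into a string instead of the terminal, assuming the JuliaSyntax package is installed. The exact layout and colors of the output are not specified here, so the check only confirms that the highlighted text appears.

```julia
using JuliaSyntax

src = "x = 1\ny = α"
tree = JuliaSyntax.parseall(JuliaSyntax.SyntaxNode, src)
rhs = tree[2][2]   # the identifier `α`

io = IOBuffer()
JuliaSyntax.highlight(io, rhs; note="right-hand side")
out = String(take!(io))
print(out)
```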
SourceFile-specific functions:
JuliaSyntax.SourceFile
— Type
SourceFile(code [; filename=nothing, first_line=1, first_index=1])
UTF-8 source text with associated file name and line number, storing the character indices of the start of each line. first_line and first_index can be used to specify the line number and index of the first character of code within a larger piece of source text.
SourceFile may be indexed via getindex or view to get a string. Line information for a byte offset can be looked up via the source_line, source_location and source_line_range functions.
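A sketch of building a SourceFile directly and querying it by byte index, assuming the JuliaSyntax package is installed. The file name "f.jl" is arbitrary.

```julia
using JuliaSyntax

sf = JuliaSyntax.SourceFile("a\nbc\n"; filename="f.jl")

line = JuliaSyntax.source_line(sf, 3)        # byte 3 is 'b', on line 2
loc = JuliaSyntax.source_location(sf, 4)     # byte 4 is 'c': line 2, column 2
name = JuliaSyntax.filename(sf)
piece = sf[3:4]                              # indexing by a byte range gives text
```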
JuliaSyntax.source_line_range
— Function
Get the byte range of the source line at byte_index, buffered by context_lines_before and context_lines_after lines before and after.
Expression predicates, kinds and flags
Expressions are tagged with a kind: like a type, but represented as an integer tag rather than a full Julia type for efficiency (very like the tag of a "sum type"). Kinds are constructed with the @K_str macro.
JuliaSyntax.@K_str
— Macro
K"s"
The kind of a token or AST internal node with string "s".
For example:
- K")" is the kind of the right parenthesis token
- K"block" is the kind of a block of code (e.g., statements within a begin-end).
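A small sketch comparing a parsed node's kind against a K"..." literal, assuming the JuliaSyntax package is installed:

```julia
using JuliaSyntax
using JuliaSyntax: @K_str

node = JuliaSyntax.parsestmt(JuliaSyntax.SyntaxNode, "begin x end")
is_block = JuliaSyntax.kind(node) == K"block"
```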
JuliaSyntax.Kind
— Type
K"name"
Kind(namestr)
Kind is a type tag for specifying the type of tokens and interior nodes of a syntax tree. Abstractly, this tag is used to define our own sum types for syntax tree nodes. We do this explicitly outside the Julia type system because (a) Julia doesn't have sum types and (b) we want concrete data structures which are unityped from the Julia compiler's point of view, for efficiency.
Naming rules:
- Kinds which correspond to exactly one textual form are represented with that text. This includes keywords like K"for" and operators like K"*".
- Kinds which represent many textual forms have UpperCamelCase names. This includes kinds like K"Identifier" and K"Comment".
- Kinds which exist merely as delimiters are all uppercase.
The kind of an expression ex in a tree should be accessed with kind(ex).
JuliaSyntax.kind
— Function
kind(x)
Return the Kind of x.
In addition to the kind, a small integer set of "flags" is included to further distinguish details of each expression, accessed with the flags function. The kind and flags can be wrapped into a SyntaxHead, which is accessed with the head function.
JuliaSyntax.flags
— Function
flags(x)
Return the flag bits of a syntactic construct. Prefer to query these with the predicates is_trivia, is_prefix_call, is_infix_op_call, is_prefix_op_call, is_postfix_op_call, is_dotted, is_suffixed, is_decorated.
Or extract the numeric portion of the flags with numeric_flags.
JuliaSyntax.SyntaxHead
— Type
SyntaxHead(kind, flags)
A SyntaxHead combines the Kind of a syntactic construct with a set of flags. The kind defines the broad "type" of the syntactic construct, while the flag bits compactly store more detailed information about the construct.
JuliaSyntax.head
— Function
head(x)
Get the SyntaxHead of a node of a tree or other syntax-related data structure.
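A sketch showing that head bundles the same kind and flags that kind and flags return individually, assuming the JuliaSyntax package is installed:

```julia
using JuliaSyntax
using JuliaSyntax: @K_str

node = JuliaSyntax.parsestmt(JuliaSyntax.SyntaxNode, "x + y")
h = JuliaSyntax.head(node)
```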
Details about the flags may be extracted using various predicates:
JuliaSyntax.is_trivia
— Function
is_trivia(x)
Return true for "syntax trivia": tokens in the tree which are either largely invisible to the parser (e.g., whitespace) or implied by the structure of the AST (e.g., reserved words).
JuliaSyntax.is_prefix_call
— Function
is_prefix_call(x)
Return true for normal prefix function call syntax such as the f call node parsed from f(x).
JuliaSyntax.is_infix_op_call
— Function
is_infix_op_call(x)
Return true for infix operator calls such as the + call node parsed from x + y.
JuliaSyntax.is_prefix_op_call
— Function
is_prefix_op_call(x)
Return true for prefix operator calls such as the + call node parsed from +x.
JuliaSyntax.is_postfix_op_call
— Function
is_postfix_op_call(x)
Return true for postfix operator calls such as the 'ᵀ call node parsed from x'ᵀ.
JuliaSyntax.is_dotted
— Function
is_dotted(x)
Return true for dotted syntax tokens.
JuliaSyntax.is_suffixed
— Function
is_suffixed(x)
Return true for operators which have suffixes, such as +₁.
JuliaSyntax.is_decorated
— Function
is_decorated(x)
Return true for operators which are decorated with a dot or suffix.
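A sketch of the call-style predicates on a few parsed expressions, assuming the JuliaSyntax package is installed. node_of is a hypothetical helper defined only for this example.

```julia
using JuliaSyntax

# Hypothetical helper: parse one statement into a SyntaxNode.
node_of(s) = JuliaSyntax.parsestmt(JuliaSyntax.SyntaxNode, s)

prefix_call = JuliaSyntax.is_prefix_call(node_of("f(x)"))
infix_call = JuliaSyntax.is_infix_op_call(node_of("x + y"))
prefix_op = JuliaSyntax.is_prefix_op_call(node_of("-x"))
not_infix = !JuliaSyntax.is_infix_op_call(node_of("f(x)"))
```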
JuliaSyntax.numeric_flags
— Function
numeric_flags(x)
Return the number attached to a SyntaxHead. This is only used for kinds K"nrow" and K"ncat", for now.
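A sketch, assuming the JuliaSyntax package is installed and running on Julia 1.7 or later (the default syntax version is the running VERSION). For K"ncat", the attached number is presumably the concatenation dimension, so [a ;; b] should report 2, matching the 2 in the equivalent Expr(:ncat, 2, :a, :b).

```julia
using JuliaSyntax
using JuliaSyntax: @K_str

node = JuliaSyntax.parsestmt(JuliaSyntax.SyntaxNode, "[a ;; b]")
dim = JuliaSyntax.numeric_flags(node)   # expected: the concatenation dimension
```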
Some of the more unusual predicates are accessed merely with has_flags(x, flag_bits), where any of the following uppercase constants may be used for flag_bits after checking that the kind is correct.
JuliaSyntax.has_flags
— Function
has_flags(x, test_flags)
Return true if any of test_flags are set.
JuliaSyntax.TRIPLE_STRING_FLAG
— Constant
Set when K"string" or K"cmdstring" was triple-delimited as with """ or ```.
JuliaSyntax.RAW_STRING_FLAG
— Constant
Set when a K"string", K"cmdstring" or K"Identifier" needs raw string unescaping.
JuliaSyntax.PARENS_FLAG
— Constant
Set for K"tuple", K"block" or K"macrocall" which are delimited by parentheses.
JuliaSyntax.COLON_QUOTE
— Constant
Set for K"quote" for the short form :x as opposed to the long form quote x end.
JuliaSyntax.TOPLEVEL_SEMICOLONS_FLAG
— Constant
Set for K"toplevel" which is delimited by semicolons.
JuliaSyntax.MUTABLE_FLAG
— Constant
Set for K"struct" when mutable.
JuliaSyntax.BARE_MODULE_FLAG
— Constant
Set for K"module" when it is bare (baremodule, not module).
JuliaSyntax.SHORT_FORM_FUNCTION_FLAG
— Constant
Set for K"function" in short form definitions such as f() = 1.
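A sketch of the check-the-kind-first pattern, assuming the JuliaSyntax package is installed. Each check confirms the kind, then tests the relevant flag constant.

```julia
using JuliaSyntax
using JuliaSyntax: @K_str

str = JuliaSyntax.parsestmt(JuliaSyntax.SyntaxNode, "\"\"\"abc\"\"\"")
triple = JuliaSyntax.kind(str) == K"string" &&
         JuliaSyntax.has_flags(str, JuliaSyntax.TRIPLE_STRING_FLAG)

st = JuliaSyntax.parsestmt(JuliaSyntax.SyntaxNode, "mutable struct A end")
is_mutable = JuliaSyntax.kind(st) == K"struct" &&
             JuliaSyntax.has_flags(st, JuliaSyntax.MUTABLE_FLAG)

# A plain struct should not carry the mutable flag.
plain = JuliaSyntax.parsestmt(JuliaSyntax.SyntaxNode, "struct B end")
plain_mutable = JuliaSyntax.has_flags(plain, JuliaSyntax.MUTABLE_FLAG)
```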
Syntax trees
Access to the children of a tree node is provided by the functions:
JuliaSyntax.is_leaf
— Function
is_leaf(node)
Determine whether the node is a leaf of the tree. In our trees a "leaf" corresponds to a single token in the source text.
JuliaSyntax.numchildren
— Function
numchildren(node)
Return length(children(node)) but possibly computed in a more efficient way.
JuliaSyntax.children
— Function
children(node)
Return an iterable list of children for the node. For leaves, return nothing.
For convenient access to the children, we also provide node[i], node[i:j] and node[begin:end] by implementing Base.getindex(), Base.firstindex() and Base.lastindex(). We choose to return a view from node[i:j] to make it non-allocating.
Tree traversal is supported by using these functions along with the predicates such as kind listed above.
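A sketch of a recursive traversal built from is_leaf, children and kind, assuming the JuliaSyntax package is installed. identifiers! is a hypothetical helper that collects the source text of every identifier leaf.

```julia
using JuliaSyntax
using JuliaSyntax: @K_str, kind, children, is_leaf

# Hypothetical helper: walk the tree and collect identifier leaves in source order.
function identifiers!(out::Vector{String}, node)
    if is_leaf(node)
        kind(node) == K"Identifier" && push!(out, String(JuliaSyntax.sourcetext(node)))
    else
        for child in children(node)
            identifiers!(out, child)
        end
    end
    return out
end

tree = JuliaSyntax.parsestmt(JuliaSyntax.SyntaxNode, "f(x, y)")
names = identifiers!(String[], tree)
```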
Trees referencing the source
JuliaSyntax.SyntaxNode
— Type
SyntaxNode(source::SourceFile, raw::GreenNode{SyntaxHead};
keep_parens=false, position::Integer=1)
An AST node with a similar layout to Expr. Typically constructed from source text by calling one of the parser API functions such as parseall.
Functions applicable to SyntaxNode include everything in the sections on heads/kinds as well as the accessor functions in the source code handling section.
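A sketch of the constructor shown in the signature, assuming the JuliaSyntax package is installed. It parses into a relocatable GreenNode first, then pairs it with a SourceFile to get a SyntaxNode with absolute positions.

```julia
using JuliaSyntax
using JuliaSyntax: @K_str

text = "a + b"
green = JuliaSyntax.parsestmt(JuliaSyntax.GreenNode, text)
node = JuliaSyntax.SyntaxNode(JuliaSyntax.SourceFile(text), green)
```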
Relocatable syntax trees
GreenNode is a special low level syntax tree: it's "relocatable" in the sense that it doesn't carry an absolute position in the source code or even a reference to the source text. This allows it to be reused for incremental parsing, but does make it a pain to work with directly!
JuliaSyntax.GreenNode
— Type
GreenNode(head, span)
GreenNode(head, children...)
A "green tree" is a lossless syntax tree which overlays all the source text. The most basic properties of a green tree are that:
- Nodes cover a contiguous span of bytes in the text
- Sibling nodes are ordered in the same order as the text
As implementation choices, we choose that:
- Nodes are immutable and don't know their parents or absolute position, so can be cached and reused
- Nodes are homogeneously typed at the language level so they can be stored concretely, with the head defining the node type. Normally this would include a "syntax kind" enumeration, but it can also include flags and record information the parser knew about the layout of the child nodes.
- For simplicity and uniformity, leaf nodes cover a single token in the source. This is like rust-analyzer, but different from Roslyn where leaves can include syntax trivia.
Green nodes only have a relative position, so they implement span() instead of byte_range():
JuliaSyntax.span
— Function
span(node)
Get the number of bytes this node covers in the source text.
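A sketch of the lossless property, assuming the JuliaSyntax package is installed. Because leaves include whitespace trivia, the spans of a green node's children should add up to the parent's span.

```julia
using JuliaSyntax

g = JuliaSyntax.parsestmt(JuliaSyntax.GreenNode, "x + y")
kids = JuliaSyntax.children(g)

total = sum(JuliaSyntax.span, kids)
has_trivia = any(JuliaSyntax.is_trivia, kids)   # e.g. the whitespace tokens
```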