API Reference
Parsing
JuliaSyntax.parsestmt
— Function
# Parse a single expression/statement
parsestmt(TreeType, text, [index];
          version=VERSION,
          ignore_trivia=true,
          filename=nothing,
          ignore_errors=false,
          ignore_warnings=ignore_errors)
# Parse all statements at top level (file scope)
parseall(...)
# Parse a single syntax atom
parseatom(...)
Parse Julia source code string text into a data structure of type TreeType. parsestmt parses a single Julia statement, parseall parses top level statements at file scope and parseatom parses a single Julia identifier or other "syntax atom".
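As a rough illustration of the three entry points, the sketch below parses short strings into Expr values. It assumes the JuliaSyntax package is installed. Names are qualified with the module name in case they are not exported by the installed version.

```julia
using JuliaSyntax

# One statement: the result should compare equal to the quoted expression.
stmt = JuliaSyntax.parsestmt(Expr, "x + 1")

# Several statements at file scope come back wrapped in a :toplevel Expr.
file = JuliaSyntax.parseall(Expr, "a = 1\nb = 2")

# A single atom, here a bare identifier.
atom = JuliaSyntax.parseatom(Expr, "x")
```

Passing SyntaxNode or GreenNode as TreeType instead of Expr should select the other tree representations described under "Syntax tree types".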
If text is passed without index, all the input text must be consumed and a tree data structure is returned. When an integer byte index is passed, a tuple (tree, next_index) will be returned containing the next index in text to resume parsing. By default whitespace and comments before and after valid code are ignored but you can turn this off by setting ignore_trivia=false.
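The index form can be used to step through a string one statement at a time. This is a minimal sketch under the assumption that JuliaSyntax is installed. The exact value of next_index depends on how trailing trivia is handled, so it is passed along rather than inspected.

```julia
using JuliaSyntax

text = "x = 1\ny = 2"

# Passing a byte index returns the tree plus the index at which to resume.
first_stmt, next_index = JuliaSyntax.parsestmt(Expr, text, 1)

# Resume from the returned index to pick up the following statement.
second_stmt, _ = JuliaSyntax.parsestmt(Expr, text, next_index)
```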
version (default VERSION) may be used to set the syntax version to any Julia version >= v"1.0". We aim to parse all Julia syntax which has been added after v"1.0", emitting an error if it's not compatible with the requested version.
Pass filename to set any file name information embedded within the output tree, if applicable. This will also annotate errors and warnings with the source file name.
A ParseError will be thrown if any errors or warnings occurred during parsing. To avoid exceptions due to warnings, use ignore_warnings=true. To also avoid exceptions due to errors, use ignore_errors=true.
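The two error-handling modes might be exercised as follows. This is a sketch that assumes JuliaSyntax is installed; the incomplete input "x +" is just an illustrative string, and the exact shape of the recovered tree is not specified here.

```julia
using JuliaSyntax

# Incomplete input: by default this is expected to throw a ParseError.
threw = try
    JuliaSyntax.parsestmt(Expr, "x +")
    false
catch err
    err isa JuliaSyntax.ParseError
end

# With ignore_errors=true a tree is returned instead of an exception.
recovered = JuliaSyntax.parsestmt(Expr, "x +"; ignore_errors=true)
```

Tools that must keep working on broken code, such as editors and linters, would typically use the second form and inspect the returned tree for error nodes.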
JuliaSyntax.parseall
— Function
Shares the docstring of JuliaSyntax.parsestmt above.
JuliaSyntax.parseatom
— Function
Shares the docstring of JuliaSyntax.parsestmt above.
Low level parsing API
The ParseStream interface provides a low-level stream-like I/O interface for writing the parser. The parser does not depend on or produce any concrete tree data structure as part of the parsing phase, but the output spans can be post-processed into various tree data structures as required using JuliaSyntax.build_tree.
JuliaSyntax.parse!
— Function
parse!(stream::ParseStream; rule=:all)
Parse Julia source code from a ParseStream object. Output tree data structures may be extracted from stream with the build_tree function.
rule may be any of
- :all (default): parse a whole "file" of top level statements. In this mode, the parser expects to fully consume the input.
- :statement: parse a single statement, or statements separated by semicolons.
- :atom: parse a single syntax "atom": a literal, identifier, or parenthesized expression.
parse!(TreeType, io::IO; rule=:all, version=VERSION)
Parse Julia source code from a seekable IO object. The output is a tuple (tree, diagnostics). When parse! returns, the stream io is positioned directly after the last byte which was consumed during parsing.
JuliaSyntax.ParseStream
— Type
ParseStream(text::AbstractString, index::Integer=1; version=VERSION)
ParseStream(text::IO; version=VERSION)
ParseStream(text::Vector{UInt8}, index::Integer=1; version=VERSION)
ParseStream(ptr::Ptr{UInt8}, len::Integer, index::Integer=1; version=VERSION)
Construct a ParseStream from input which may come in various forms:
- A string (zero copy for String and SubString)
- An IO object (zero copy for IOBuffer). The IO object must be seekable.
- A buffer of bytes (zero copy). The caller is responsible for preserving buffers passed as (ptr,len).
A byte index may be provided as the position to start parsing.
ParseStream provides an IO interface for the parser which provides lexing of the source text input into tokens, manages insignificant whitespace tokens on behalf of the parser, and stores output tokens and tree nodes in a pair of output arrays.
version (default VERSION) may be used to set the syntax version to any Julia version >= v"1.0". We aim to parse all Julia syntax which has been added after v"1.0", emitting an error if it's not compatible with the requested version.
JuliaSyntax.build_tree
— Function
build_tree(make_node::Function, ::Type{StackEntry}, stream::ParseStream; kws...)
Construct a tree from a ParseStream using depth-first traversal. make_node must have the signature
make_node(head::SyntaxHead, span::Integer, children)
where children is either nothing for leaf nodes or an iterable of the children of type StackEntry for internal nodes. StackEntry may be a node type, but also may include other information required during building the tree.
If the ParseStream has multiple nodes at the top level, K"wrapper" is used to wrap them in a single node.
The tree here is constructed depth-first in postorder.
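The low-level pieces fit together roughly as in the sketch below: build a stream, run the parser over it, then materialize a tree. It assumes JuliaSyntax is installed and that a convenience method build_tree(Expr, stream) is available. Only the make_node form is documented in this section, so treat that call as an assumption.

```julia
using JuliaSyntax

# Wrap the source text in a stream; the stream handles lexing for the parser.
stream = JuliaSyntax.ParseStream("x + 1")

# Run the parser. This records output spans in the stream, not a tree.
JuliaSyntax.parse!(stream; rule=:statement)

# Materialize the recorded spans as a concrete tree type (assumed method).
ex = JuliaSyntax.build_tree(Expr, stream)
```

Keeping parsing and tree building as separate steps is what allows one parse to feed several tree representations.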
Tokenization
JuliaSyntax.tokenize
— Function
tokenize(text)
Returns the tokenized UTF-8 encoded text as a vector of Tokens. The text for a token can be retrieved by using untokenize(). The full text can be reconstructed with, for example, join(untokenize.(tokenize(text), text)).
This interface works on UTF-8 encoded string or buffer data only.
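A small round-trip check along the lines described above, assuming JuliaSyntax is installed. The input string is arbitrary.

```julia
using JuliaSyntax
using JuliaSyntax: @K_str

text = "x = 42"
tokens = JuliaSyntax.tokenize(text)

# untokenize(tok, text) recovers each token's text from the original input.
pieces = [JuliaSyntax.untokenize(tok, text) for tok in tokens]

# Whitespace is tokenized too, so joining the pieces restores the input.
roundtrip = join(pieces)

# The first token is the identifier `x`.
first_kind = JuliaSyntax.kind(tokens[1])
```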
JuliaSyntax.untokenize
— Function
Return the string representation of a token kind, or nothing if the kind represents a class of tokens like K"Identifier".
When unique=true only return a string when the kind uniquely defines the corresponding input token, otherwise return nothing. When unique=false, return the name of the kind.
TODO: Replace untokenize() with Base.string()?
JuliaSyntax.Token
— Type
Token type resulting from calling tokenize(text)
Use
- kind(tok) to get the token kind
- untokenize(tok, text) to retrieve the text
- Predicates like is_error(tok) to query token categories and flags
Source file handling
JuliaSyntax.SourceFile
— Type
SourceFile(code [; filename=nothing, first_line=1, first_index=1])
UTF-8 source text with associated file name and line number, storing the character indices of the start of each line. first_line and first_index can be used to specify the line number and index of the first character of code within a larger piece of source text.
SourceFile may be indexed via getindex or view to get a string. Line information for a byte offset can be looked up via the source_line, source_location and source_line_range functions.
JuliaSyntax.highlight
— Function
Print the lines of source code surrounding the given byte range, which is highlighted with background color and markers in the text.
JuliaSyntax.sourcetext
— Function
sourcetext(source::SourceFile)
Get the full source text of a SourceFile as a string.
sourcetext(node)
Get the full source text of a node.
JuliaSyntax.source_line
— Function
Get the line number at the given byte index.
JuliaSyntax.source_location
— Function
Get line number and character within the line at the given byte index.
JuliaSyntax.source_line_range
— Function
Get byte range of the source line at byte_index, buffered by context_lines_before and context_lines_after before and after.
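The lookup functions can be tried on a two-line snippet. This sketch assumes JuliaSyntax is installed. The file name demo.jl is made up for illustration.

```julia
using JuliaSyntax

code = "a = 1\nb = 2\n"
src = JuliaSyntax.SourceFile(code; filename="demo.jl")

# "a = 1\n" occupies bytes 1 to 6, so byte 7 is the `b` that starts line 2.
line = JuliaSyntax.source_line(src, 7)

# source_location returns the line together with the character position in it.
loc = JuliaSyntax.source_location(src, 7)
```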
Expression heads/kinds
JuliaSyntax.Kind
— Type
K"name"
Kind(namestr)
Kind is a type tag for specifying the type of tokens and interior nodes of a syntax tree. Abstractly, this tag is used to define our own sum types for syntax tree nodes. We do this explicitly outside the Julia type system because (a) Julia doesn't have sum types and (b) we want concrete data structures which are unityped from the Julia compiler's point of view, for efficiency.
Naming rules:
- Kinds which correspond to exactly one textual form are represented with that text. This includes keywords like K"for" and operators like K"*".
- Kinds which represent many textual forms have UpperCamelCase names. This includes kinds like K"Identifier" and K"Comment".
- Kinds which exist merely as delimiters are all uppercase.
JuliaSyntax.SyntaxHead
— Type
SyntaxHead(kind, flags)
A SyntaxHead combines the Kind of a syntactic construct with a set of flags. The kind defines the broad "type" of the syntactic construct, while the flag bits compactly store more detailed information about the construct.
JuliaSyntax.@K_str
— Macro
K"s"
The kind of a token or AST internal node with string "s".
For example
- K")" is the kind of the right parenthesis token
- K"block" is the kind of a block of code (e.g., statements within a begin-end).
JuliaSyntax.kind
— Function
kind(x)
Return the Kind of x.
JuliaSyntax.head
— Function
head(x)
Get the SyntaxHead of a node of a tree or other syntax-related data structure.
JuliaSyntax.flags
— Function
flags(x)
Return the flag bits of a syntactic construct. Prefer to query these with the predicates is_trivia, is_prefix_call, is_infix_op_call, is_prefix_op_call, is_postfix_op_call, is_dotted, is_suffixed, is_decorated.
Or extract the numeric portion of the flags with numeric_flags.
See also predicates related to flags.
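To see how kinds and heads relate on a parsed node, something like the following may help. It assumes JuliaSyntax is installed. The input f(x) is arbitrary.

```julia
using JuliaSyntax
using JuliaSyntax: @K_str

node = JuliaSyntax.parsestmt(JuliaSyntax.SyntaxNode, "f(x)")

# kind() gives the broad category of the construct.
k = JuliaSyntax.kind(node)

# head() bundles that kind together with the flag bits.
h = JuliaSyntax.head(node)

# The string constructor and the K"..." macro name the same kind.
same = JuliaSyntax.Kind("call") == K"call"
```

Comparing against K"..." literals, as in k == K"call", is the usual way to branch on node type, since all nodes share one Julia type.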
Syntax tree types
JuliaSyntax.SyntaxNode
— Type
SyntaxNode(source::SourceFile, raw::GreenNode{SyntaxHead};
           keep_parens=false, position::Integer=1)
An AST node with a similar layout to Expr. Typically constructed from source text by calling one of the parser API functions such as parseall.
JuliaSyntax.GreenNode
— Type
GreenNode(head, span)
GreenNode(head, children...)
A "green tree" is a lossless syntax tree which overlays all the source text. The most basic properties of a green tree are that:
- Nodes cover a contiguous span of bytes in the text
- Sibling nodes are ordered in the same order as the text
As implementation choices, we choose that:
- Nodes are immutable and don't know their parents or absolute position, so can be cached and reused
- Nodes are homogeneously typed at the language level so they can be stored concretely, with the head defining the node type. Normally this would include a "syntax kind" enumeration, but it can also include flags and record information the parser knew about the layout of the child nodes.
- For simplicity and uniformity, leaf nodes cover a single token in the source. This is like rust-analyzer, but different from Roslyn where leaves can include syntax trivia.
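The span properties listed above can be checked on a small input. This is a sketch assuming JuliaSyntax is installed and that the span and children accessors are available for GreenNode. Neither is documented in this section, so both are assumptions.

```julia
using JuliaSyntax

text = "x + 1"
green = JuliaSyntax.parsestmt(JuliaSyntax.GreenNode, text)

# The root covers every byte of the input, whitespace included.
root_span = JuliaSyntax.span(green)

# Children are stored in source order, so their spans add up to the parent's.
child_spans = [JuliaSyntax.span(c) for c in JuliaSyntax.children(green)]
```

Because nodes store only a span and not an absolute position, byte offsets are recovered by accumulating sibling spans while walking down from the root.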