mirror of https://github.com/cmur2/joe-syntax.git synced 2024-12-22 02:54:23 +01:00

Updated HowItWorks

Christian Nicolai 2011-08-01 19:21:50 +02:00
parent 81411e1446
commit f4be1d570d


@@ -1,5 +1,8 @@
-# How JOE syntax highlighting works
-*from [c.jsf](http://joe-editor.hg.sourceforge.net/hgweb/joe-editor/joe-editor/file/tip/syntax/c.jsf.in)*
+How JOE syntax highlighting works
+=================================
+*from [c.jsf](http://joe-editor.hg.sourceforge.net/hgweb/joe-editor/joe-editor/file/tip/syntax/c.jsf.in),
+slightly modified*
 A (deterministic) state machine which performs lexical analysis of C.
 (This is the "assembly language" of syntax highlighting. A separate
@@ -10,6 +13,7 @@ Each state begins with:
 :<name> <color-name>
+\<name\> is the state's name.
 \<color-name\> is the color used for characters eaten by the state
 (really a symbol for a user definable color).
@@ -18,9 +22,9 @@ The first state defined is the initial state.
 Within a state, define transitions (jumps) to other states. Each
 jump has the form:
-<character-list> <target-state> [<option>s]
+<character-list> <target-state-name> [<option>s]
-There are three ways to specify <character-list>s, either * for any
+There are three ways to specify \<character-list\>s, either * for any
 character not otherwise specified, & to match the character in the
 delimiter match buffer or a literal list of characters within quotes
 (ranges and escape sequences allowed). When the next character matches
@@ -30,67 +34,71 @@ eaten (we advance to the next character of the file to be colored).
 The * transition should be the first transition specified in the state.
 There are several options:
-noeat do not eat the character, instead feed it to the next state
+* __noeat__ - Do not eat the character, instead feed it to the next state
 (this tends to make the states smaller, but be careful: you
 can make infinite loops). 'noeat' implies 'recolor=-1'.
-recolor=-N Recolor the past N characters with the color of the
+* __recolor=-N__ - Recolor the past N characters with the color of the
 target-state. For example once /* is recognized as the
 start of C comment, you want to color the /* with the C
 comment color with recolor=-2.
-mark Mark beginning of a region with current position.
+* __mark__ - Mark beginning of a region with current position.
-markend Mark end of region.
+* __markend__ - Mark end of region.
-recolormark Recolor all of the characters in the marked region with
+* __recolormark__ - Recolor all of the characters in the marked region with
 the color of the target-state. If markend is not given,
 all of the characters up to the current position are recolored.
 Note that the marked region can not cross line boundaries and
 must be on the same line as recolormark.
-buffer start copying characters to a string buffer, beginning with this
+* __buffer__ - Start copying characters to a string buffer, beginning with this
 one (it's ok to not terminate buffering with a matching
 'strings' option- the buffer is limited to leading 23
 characters).
-save_c Save character in delimiter match buffer.
+* __save_c__ - Save character in delimiter match buffer.
-save_s Copy string buffer to delimiter match buffer.
+* __save_s__ - Copy string buffer to delimiter match buffer.
-strings A list of strings follows. If the buffer matches any of the
+* __strings__ - A list of strings follows. If the buffer matches any of the
 given strings, a jump to the target-state in the string list
 is taken instead of the normal jump.
-istrings Same as strings, but case is ignored.
+* __istrings__ - Same as strings, but case is ignored.
 Note: strings and istrings should be the last option on the
 line. They cause any options which follow them to be ignored.
-hold Stop buffering string- a future 'strings' or 'istrings' will
+* __hold__ - Stop buffering string- a future 'strings' or 'istrings' will
 look at contents of buffer at this point. Useful for distinguishing
 commands and function calls in some languages 'write 7' is a command
 'write (' is a function call- hold lets us stop at the space and delay
 the string lookup until the ( or 7.
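As a rough illustration of `noeat` and `recolor=-N`, here is a minimal sketch of C comment recognition, modeled on the pattern used in c.jsf. The state names (`slash`, `comment`, `maybe_end_comment`) and the `=Idle` / `=Comment green` color definition lines are assumptions made for this example; color definitions are not covered by the text above.

```jsf
# Color definitions (assumed for this sketch).
=Idle
=Comment green

:idle Idle
	*		idle
	"/"		slash

# A lone "/" is not a comment: hand the next character back to idle
# without eating it. If it is "*", recolor the "/*" pair as a comment.
:slash Idle
	*		idle		noeat
	"*"		comment		recolor=-2

:comment Comment
	*		comment
	"*"		maybe_end_comment

# "*/" ends the comment; a run of "*" stays in this state.
:maybe_end_comment Comment
	*		comment
	"/"		idle
	"*"		maybe_end_comment
```

The `noeat` in `slash` keeps the machine small: the character after a stray `/` is re-examined by `idle` instead of needing its own copy of every `idle` transition.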
 The format of the string list is:
 "string" <target-state> [<options>s]
 "string" <target-state> [<options>s]
 "&" <target-state> [<options>s] # matches contents of delimiter match buffer
 done
 (all of the options above are allowed except "strings", "istrings" and "noeat". noeat is
 always implied after a matched string).
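The `buffer` and `strings` options are typically used together for keyword lookup. The following is a minimal sketch in the style of c.jsf; the state names (`ident`, `kw`), the keyword list and the `=Keyword bold` color are assumptions for illustration only.

```jsf
# Color definitions (assumed for this sketch).
=Idle
=Keyword bold

:idle Idle
	*		idle
	"a-zA-Z_"	ident		buffer

# Keep collecting identifier characters. On the first character that is
# not part of an identifier, compare the buffered word with the list.
:ident Idle
	*		idle		noeat strings
	"if"		kw
	"while"		kw
done
	"a-zA-Z0-9_"	ident

# Dummy state that exists only to carry the Keyword color.
:kw Keyword
	*		idle		noeat
```

If the buffered word is not in the list, the normal jump (`idle` with `noeat`) is taken and the word keeps the `Idle` color.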
 Weirdness: only states have colors, not transitions. This means that you
-sometimes have to make dummy states with '* next-state noeat' just to get
-a color specification.
+sometimes have to make dummy states with
+* <next-state> noeat
+just to get a color specification.
 Delimiter match buffer is for perl and shell: a regex in perl can be s<..>(...)
 and in shell you can say: <<EOS ....... EOS
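A minimal sketch of how `save_c` and the `&` character list might fit together, assuming a language in which `q` followed by an arbitrary delimiter starts a quoted string. The state names (`q_open`, `q_body`) and the `=Constant cyan` color are hypothetical, and a real syntax file would also check that the `q` is not part of a longer identifier.

```jsf
# Color definitions (assumed for this sketch).
=Idle
=Constant cyan

:idle Idle
	*		idle
	"q"		q_open

# Whatever character follows the q is saved as the delimiter.
:q_open Idle
	*		q_body		save_c

# Stay in the string until the saved delimiter shows up again.
:q_body Constant
	*		q_body
	&		idle
```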
 New feature: subroutines
+------------------------
 Highlighter state machines can now make subroutine calls. This works by
 template instantiation: the called state machine is included in your
@@ -108,6 +116,7 @@ The subroutine called 'string' is called and the jump to 'fred' is
 ignored. The 'dquote' option is passed to the subroutine.
 The subroutine itself returns to the caller like this:
 "\"" whatever return
 If we're in a subroutine, the return is made. Otherwise the jump
@@ -116,14 +125,14 @@ to 'whatever' is made.
 There are several ways of delimiting subroutines which show up in how it
 is called. Here are the options:
-call=string() A file called string.jsf is the subroutine.
+* __call=string()__ - A file called string.jsf is the subroutine.
 The entire file is the subroutine. The starting
 point is the first state in the file.
-call=library.string() A file called library.jsf has the subroutine.
+* __call=library.string()__ - A file called library.jsf has the subroutine.
 The subroutine within the file is called string.
-call=.string() There is a subroutine called string in the current file.
+* __call=.string()__ - There is a subroutine called string in the current file.
 When a subroutine is within a file, but is not the whole file, it is delimited
 as follows:
@@ -137,11 +146,11 @@ as follows:
 Option flags can be passed to subroutines which control preprocessor-like
 directives. For example:
 .ifdef dquote
 "\"" idle return
 .endif
 .ifdef squote
 "'" idle return
 .endif
 .else is also available. .ifdefs can be nested.
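Putting the pieces together, a same-file subroutine shared by double-quoted and single-quoted strings might look roughly like the sketch below. The `.subr` / `.end` delimiters come from the JOE syntax documentation rather than from the excerpt shown here, and the state name `body` and the `=Constant cyan` color are assumptions for this example.

```jsf
# Color definitions (assumed for this sketch).
=Idle
=Constant cyan

# Caller: both kinds of quote use the same subroutine, each passing
# its own option flag. The jump target is ignored when call= is given.
:idle Idle
	*		idle
	"\""		idle		call=.string(dquote)
	"'"		idle		call=.string(squote)

.subr string

# Only the terminator selected by the caller's flag is instantiated.
:body Constant
	*		body
.ifdef dquote
	"\""		idle		return
.endif
.ifdef squote
	"'"		idle		return
.endif

.end
```

Because calls work by template instantiation, each `call=` with a different flag produces its own copy of the subroutine's states, so the two string kinds do not interfere with each other.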