mirror of
git://git.sv.gnu.org/emacs.git
synced 2026-01-04 19:10:37 -08:00
The PDF versions of the GNU manuals used curved single quotes to
represent grave accent and apostrophe, which made it a pain to cut
and paste code examples from them. Fix the PDF versions to use
grave accent and apostrophe for Lisp source code, keystrokes, etc.
This change does not affect the info files, nor does it affect
ordinary uses of curved single quotes in PDF.
* doc/emacs/docstyle.texi: New file, which specifies treatment for
grave accent and apostrophe, as well as the document encoding.
* doc/emacs/emacs-xtra.texi, doc/emacs/emacs.texi:
* doc/lispintro/emacs-lisp-intro.texi:
* doc/lispref/back.texi, doc/lispref/book-spine.texi:
* doc/lispref/elisp.texi, doc/lispref/lay-flat.texi:
* doc/misc/ada-mode.texi, doc/misc/auth.texi:
* doc/misc/autotype.texi, doc/misc/bovine.texi, doc/misc/calc.texi:
* doc/misc/cc-mode.texi, doc/misc/cl.texi, doc/misc/dbus.texi:
* doc/misc/dired-x.texi, doc/misc/ebrowse.texi, doc/misc/ede.texi:
* doc/misc/ediff.texi, doc/misc/edt.texi, doc/misc/efaq-w32.texi:
* doc/misc/efaq.texi, doc/misc/eieio.texi, doc/misc/emacs-gnutls.texi:
* doc/misc/emacs-mime.texi, doc/misc/epa.texi, doc/misc/erc.texi:
* doc/misc/ert.texi, doc/misc/eshell.texi, doc/misc/eudc.texi:
* doc/misc/eww.texi, doc/misc/flymake.texi, doc/misc/forms.texi:
* doc/misc/gnus-coding.texi, doc/misc/gnus-faq.texi:
* doc/misc/gnus.texi, doc/misc/htmlfontify.texi:
* doc/misc/idlwave.texi, doc/misc/ido.texi, doc/misc/info.texi:
* doc/misc/mairix-el.texi, doc/misc/message.texi, doc/misc/mh-e.texi:
* doc/misc/newsticker.texi, doc/misc/nxml-mode.texi:
* doc/misc/octave-mode.texi, doc/misc/org.texi, doc/misc/pcl-cvs.texi:
* doc/misc/pgg.texi, doc/misc/rcirc.texi, doc/misc/reftex.texi:
* doc/misc/remember.texi, doc/misc/sasl.texi, doc/misc/sc.texi:
* doc/misc/semantic.texi, doc/misc/ses.texi, doc/misc/sieve.texi:
* doc/misc/smtpmail.texi, doc/misc/speedbar.texi:
* doc/misc/srecode.texi, doc/misc/todo-mode.texi, doc/misc/tramp.texi:
* doc/misc/url.texi, doc/misc/vhdl-mode.texi, doc/misc/vip.texi:
* doc/misc/viper.texi, doc/misc/widget.texi, doc/misc/wisent.texi:
* doc/misc/woman.texi:
Use it instead of '@documentencoding UTF-8', to lessen the need for
global changes like this in the future.
* doc/emacs/Makefile.in (EMACS_XTRA):
* doc/lispintro/Makefile.in (srcs):
* doc/lispref/Makefile.in (srcs):
Add dependency on docstyle.texi.
* doc/misc/Makefile.in (style): New macro.
(${buildinfodir}/%.info, %.dvi, %.pdf, %.html)
(${buildinfodir}/ccmode.info, ${buildinfodir}/efaq%.info, gnus_deps):
Use it.
475 lines
14 KiB
Text
\input texinfo @c -*-texinfo-*-
@c %**start of header
@setfilename ../../info/bovine.info
@set TITLE Bovine parser development
@set AUTHOR Eric M. Ludlam, David Ponce, and Richard Y. Kim
@settitle @value{TITLE}
@include docstyle.texi

@c *************************************************************************
@c @ Header
@c *************************************************************************

@c Merge all indexes into a single index for now.
@c We can always separate them later into two or more as needed.
@syncodeindex vr cp
@syncodeindex fn cp
@syncodeindex ky cp
@syncodeindex pg cp
@syncodeindex tp cp

@c @footnotestyle separate
@c @paragraphindent 2
@c @@smallbook
@c %**end of header

@copying
Copyright @copyright{} 1999--2004, 2012--2015 Free Software Foundation, Inc.

@quotation
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with no
Invariant Sections, with the Front-Cover Texts being ``A GNU Manual,''
and with the Back-Cover Texts as in (a) below.  A copy of the license
is included in the section entitled ``GNU Free Documentation License''.

(a) The FSF's Back-Cover Text is: ``You have the freedom to copy and
modify this GNU manual.''
@end quotation
@end copying

@dircategory Emacs misc features
@direntry
* Bovine: (bovine).             Semantic bovine parser development.
@end direntry

@iftex
@finalout
@end iftex

@c @setchapternewpage odd
@c @setchapternewpage off

@titlepage
@sp 10
@title @value{TITLE}
@author by @value{AUTHOR}
@page
@vskip 0pt plus 1 fill
@insertcopying
@end titlepage
@page

@macro semantic{}
@i{Semantic}
@end macro

@c *************************************************************************
@c @ Document
@c *************************************************************************
@contents

@node top
@top @value{TITLE}

The @dfn{bovine} parser is the original @semantic{} parser, and is an
implementation of an @acronym{LL} parser.  It is well suited to simple
languages and offers many conveniences that make grammars easy to
write.  Those conveniences make it less powerful than a Bison-like
@acronym{LALR} parser.  For more information, @inforef{Top, The Wisent
Parser Manual, wisent}.

Bovine @acronym{LL} grammars are stored in files with a @file{.by}
extension.  When compiled, the contents are converted into a file of
the form @file{NAME-by.el}, which is in turn byte-compiled.
@inforef{top, Grammar Framework Manual, grammar-fw}.

@ifnottex
@insertcopying
@end ifnottex

@menu
* Starting Rules::              The starting rules for the grammar.
* Bovine Grammar Rules::        Rules used to parse a language.
* Optional Lambda Expression::  Actions to take when a rule is matched.
* Bovine Examples::             Simple Samples.
* GNU Free Documentation License::  The license for this documentation.
@c * Index::
@end menu

@node Starting Rules
@chapter Starting Rules

In Bison, one and only one nonterminal is designated as the ``start''
symbol.  In @semantic{}, one or more nonterminals can be designated as
the ``start'' symbol.  They are declared after the @code{%start}
keyword, separated by spaces.  @inforef{start Decl, ,grammar-fw}.

If no @code{%start} keyword is used in a grammar, then the very first
nonterminal is used.  Internally the first start nonterminal is
targeted by the reserved symbol @code{bovine-toplevel}, so it can be
found by the parser harness.

To find locally defined variables, the local context handler needs to
parse the body of functional code.  The @code{%scopestart} declaration
specifies the name of a nonterminal used as the goal to parse a local
context; @inforef{scopestart Decl, ,grammar-fw}.  Internally the
scopestart nonterminal is targeted by the reserved symbol
@code{bovine-inner-scope}, so it can be found by the parser harness.

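As a minimal sketch of the declarations described above, a grammar
prologue might contain the following.  The nonterminal names
@code{declaration}, @code{statement}, and @code{codeblock} are
placeholders chosen for illustration, not required names.

@example
;; Two start nonterminals, separated by spaces.
%start declaration statement
;; Goal used when parsing a local context, such as a function body.
%scopestart codeblock
@end example

With these declarations, @code{declaration} is the first start
nonterminal, so it is the one targeted by @code{bovine-toplevel}, and
@code{codeblock} is the one targeted by @code{bovine-inner-scope}.
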
@node Bovine Grammar Rules
@chapter Bovine Grammar Rules

The rules are what allow the compiler to create tags from a language
file.  Once the setup is done in the prologue, you can start writing
rules.  @inforef{Grammar Rules, ,grammar-fw}.

@example
@var{result} : @var{components1} @var{optional-semantic-action1}
       | @var{components2} @var{optional-semantic-action2}
       ;
@end example

@var{result} is a nonterminal, that is, a symbol synthesized in your grammar.
@var{components} is a list of elements that are to be matched if @var{result}
is to be made.  @var{optional-semantic-action} is an optional sequence
of simplified Emacs Lisp expressions for concocting the parse tree.

In Bison, each time an element of @var{components} is found, it is
@dfn{shifted} onto the parser stack (the stack of matched elements).
When all of the elements of @var{components} have been matched, they are
@dfn{reduced} to @var{result}.  @xref{Algorithm,,, bison, The GNU Bison Manual}.

A particular @var{result} written into your grammar becomes
the parser's goal.  It is designated by a @code{%start} statement
(@pxref{Starting Rules}).  The value returned by the associated
@var{optional-semantic-action} is the parser's result.  It should be
a tree of @semantic{} @dfn{tags}, @inforef{Semantic Tags, ,
semantic-appdev}.

@var{components} is made up of symbols.  A symbol such as @code{FOO}
means that a syntactic token of class @code{FOO} must be matched.

@menu
* How Lexical Tokens Match::
* Grammar-to-Lisp Details::
* Order of components in rules::
@end menu

@node How Lexical Tokens Match
@section How Lexical Tokens Match

A lexical rule must be used to define how to match a lexical token.

For instance:

@example
%keyword FOO "foo"
@end example

means that @code{FOO} is a reserved language keyword, matched as such
by looking it up in a keyword table, @inforef{keyword Decl,
,grammar-fw}.  This is because @code{"foo"} will be converted to
@code{FOO} in the lexical analysis stage.  Thus the symbol @code{FOO}
won't be available any other way.

If we specify our token in this way:

@example
%token <symbol> FOO "foo"
@end example

then @code{FOO} will match the string @code{"foo"} explicitly, but it
won't do so at the lexical level, allowing use of the text
@code{"foo"} in other forms of regular expressions.

In that case, @code{FOO} is a @code{symbol}-type token.  To match, a
@code{symbol} must first be encountered, and then it must
@code{string-match "foo"}.

@table @strong
@item Caution:
Be especially careful to remember that @code{"foo"}, and more
generally the @code{%token}'s match-value string, is a regular expression!
@end table

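To illustrate the caution above, here is a sketch contrasting a few
declarations.  The token name @code{ANYPUNCT} is invented for
illustration, and the comments assume that the match-value is compared
with a plain, unanchored @code{string-match} against the token text,
as described above.

@example
;; "." is the regexp for any single character (except newline), so
;; this token matches any punctuation character, not only a period.
%token <punctuation> ANYPUNCT "."

;; A character class restricts the match to a literal period.
%token <punctuation> PERIOD "[.]"

;; Unanchored: under the assumption above, this would also match a
;; symbol such as foobar, because the regexp matches a substring.
%token <symbol> FOO "foo"

;; Anchored at both ends: matches only the exact text foo.
%token <symbol> FOO "\\`foo\\'"
@end example

Anchoring the pattern at the start and end of the string is a common
way to get an exact match when a substring match is not wanted.
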
Non-symbol tokens are also allowed.  For example:

@example
%token <punctuation> PERIOD "[.]"

filename : symbol PERIOD symbol
         ;
@end example

@code{PERIOD} is a @code{punctuation}-type token that will explicitly
match one period when used in the above rule.

@table @strong
@item Please Note:
@code{symbol}, @code{punctuation}, etc., are predefined lexical token
types, based on the @dfn{syntax class}-character associations
currently in effect.
@end table

@node Grammar-to-Lisp Details
@section Grammar-to-Lisp Details

For the bovinator, lexical token matching patterns are @emph{inlined}.
When the grammar-to-Lisp converter encounters a lexical token
declaration of the form:

@example
%token <@var{type}> @var{token-name} @var{match-value}
@end example

it substitutes every occurrence of @var{token-name} in rules with its
expanded form:

@example
@var{type} @var{match-value}
@end example

For example:

@example
%token <symbol> MOOSE "moose"

find_a_moose: MOOSE
            ;
@end example

will generate this pseudo-equivalent rule:

@example
find_a_moose: symbol "moose"   ;; invalid syntax!
            ;
@end example

Thus, from the bovinator's point of view, the @var{components} part of a
rule is made up of symbols and strings.  A string in the mix means
that the previous symbol must have the additional constraint of
exactly matching it, as described in @ref{How Lexical Tokens Match}.

@table @strong
@item Please Note:
For the bovinator, this task was mixed into the language definition to
simplify implementation, though Bison's technique is more efficient.
@end table

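Applying the same substitution to the @code{PERIOD} token from
@ref{How Lexical Tokens Match} gives a sketch of what the converter
would produce for the @code{filename} rule.  As with the example
above, the expanded rule is only a pseudo-equivalent and is not valid
grammar syntax.

@example
%token <punctuation> PERIOD "[.]"

filename : symbol PERIOD symbol
         ;

;; After inlining (pseudo-rule, invalid syntax):
filename : symbol punctuation "[.]" symbol
         ;
@end example

Here the string @code{"[.]"} constrains only the @code{punctuation}
element that precedes it; the two @code{symbol} elements carry no
additional constraint.
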
@node Order of components in rules
@section Order of components in rules

If a rule has multiple components, order is important.  For example,

@example
headerfile : symbol PERIOD symbol
           | symbol
           ;
@end example

would match @samp{foo.h} or the @acronym{C++} header @samp{foo}.
The bovine parser will first attempt to match the long form, and then
the short form.  If they were in reverse order, then the long form
would never be tested.

@c @xref{Default syntactic tokens}.

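For contrast, here is a sketch of the reversed ordering described
above.  It is shown only to illustrate the problem and should not be
used as written.

@example
headerfile : symbol                 ;; tried first, and it matches
           | symbol PERIOD symbol   ;; never tested
           ;
@end example

With this ordering, the input @samp{foo.h} matches the short form on
its first symbol, so the long form is never tried.
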
@node Optional Lambda Expression
@chapter Optional Lambda Expressions

The @acronym{OLE} (@dfn{Optional Lambda Expression}) is converted into
a bovine lambda.  This lambda has special short-cuts to simplify
reading the semantic action definition.  An @acronym{OLE} like this:

@example
( $1 )
@end example

results in a lambda whose return value consists entirely of the string
or object found by matching the first (zeroth) element of the match.
An @acronym{OLE} like this:

@example
( ,(foo $1) )
@end example

executes @code{foo} on the first argument, and then splices its return
value into the return list, whereas:

@example
( (foo $1) )
@end example

executes @code{foo}, and its return value is placed in the return list
as a single element.

Here are other things that can appear inline:

@table @code
@item $1
The first object matched.

@item ,$1
The first object spliced into the list (assuming it is a list from a
non-terminal).

@item '$1
The first object matched, placed in a list.  I.e., @code{( $1 )}.

@item foo
The symbol @code{foo} (exactly as displayed).

@item (foo)
A function call to @code{foo} which is stuck into the return list.

@item ,(foo)
A function call to @code{foo} which is spliced into the return list.

@item '(foo)
A function call to @code{foo} which is stuck into the return list in a list.

@item (EXPAND @var{$1} @var{nonterminal} @var{depth})
A list starting with @code{EXPAND} performs a recursive parse on the
token passed to it (represented by @samp{$1} above).  The
@dfn{semantic list} is a common token to expand, as there are often
interesting things in the list.  The @var{nonterminal} is a symbol in
your table which the bovinator will start with when parsing.
@var{nonterminal}'s definition is the same as any other nonterminal.
@var{depth} should be at least @samp{1} when descending into a
semantic list.

@item (EXPANDFULL @var{$1} @var{nonterminal} @var{depth})
Is like @code{EXPAND}, except that the parser will iterate over
@var{nonterminal} until there are no more matches, in the same way the
parser iterates over the starting rule (@pxref{Starting Rules}).  This
lets you have much simpler rules in this specific case, and also lets
you have positional information in the returned tokens, and error
skipping.

@item (ASSOC @var{symbol1} @var{value1} @var{symbol2} @var{value2} @dots{})
This is used for creating an association list.  Each @var{symbol} is
included in the list if the associated @var{value} is non-@code{nil}.
While the items are all listed explicitly, the created structure is an
association list of the form:

@example
((@var{symbol1} . @var{value1}) (@var{symbol2} . @var{value2}) @dots{})
@end example

@item (TAG @var{name} @var{class} [@var{attributes}])
This creates one tag in the current buffer.

@table @var
@item name
Is a string that represents the tag in the language.

@item class
Is the kind of tag being created, such as @code{function}, or
@code{variable}, though any symbol will work.

@item attributes
Is an optional set of labeled values such as @code{:constant-flag t :parent
"parenttype"}.
@end table

@item (TAG-VARIABLE @var{name} @var{type} @var{default-value} [@var{attributes}])
@itemx (TAG-FUNCTION @var{name} @var{type} @var{arg-list} [@var{attributes}])
@itemx (TAG-TYPE @var{name} @var{type} @var{members} @var{parents} [@var{attributes}])
@itemx (TAG-INCLUDE @var{name} @var{system-flag} [@var{attributes}])
@itemx (TAG-PACKAGE @var{name} @var{detail} [@var{attributes}])
@itemx (TAG-CODE @var{name} @var{detail} [@var{attributes}])
Create a tag with @var{name} of respectively the class
@code{variable}, @code{function}, @code{type}, @code{include},
@code{package}, and @code{code}.
See @inforef{Creating Tags, , semantic-appdev} for the Lisp
functions these translate into.
@end table

If the declaration @code{%quotemode backquote} is specified, then use
@code{,@@} to splice a list in, and @code{,} to evaluate the expression.
This lets you send @code{$1} as a symbol into a list instead of having
it expanded inline.

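As a sketch of how the tag-building forms in the table above can be
used in a rule, consider the following grammar fragment.  The rule
name @code{variable}, the token names, and the input language are all
hypothetical, invented only for illustration.

@example
%token <punctuation> EQUAL "="
%token <punctuation> SEMICOLON ";"

;; Hypothetical rule for a declaration such as:  int count = zero;
;; $1 is the type, $2 the name, and $4 the default value.
variable : symbol symbol EQUAL symbol SEMICOLON
           (TAG-VARIABLE $2 $1 $4)
         ;
@end example

The arguments follow the order given in the table: @var{name} first,
then @var{type}, then @var{default-value}.  No optional
@var{attributes} are passed in this sketch.
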
@node Bovine Examples
@chapter Examples

The rule:

@example
any-symbol: symbol
          ;
@end example

is equivalent to

@example
any-symbol: symbol
            ( $1 )
          ;
@end example

which, if it matched the string @samp{"A"}, would return

@example
( "A" )
@end example

If this rule were used like this:

@example
%token <punctuation> EQUAL "="
@dots{}
assign: any-symbol EQUAL any-symbol
        ( $1 $3 )
      ;
@end example

it would match @samp{"A=B"}, and return

@example
( ("A") ("B") )
@end example

The letters @samp{A} and @samp{B} come back in lists because
@samp{any-symbol} is a nonterminal, not an actual lexical element.

To get a better result with nonterminals, use @asis{,} to splice lists
in like this:

@example
%token <punctuation> EQUAL "="
@dots{}
assign: any-symbol EQUAL any-symbol
        ( ,$1 ,$3 )
      ;
@end example

which would return

@example
( "A" "B" )
@end example

@node GNU Free Documentation License
@appendix GNU Free Documentation License

@include doclicense.texi

@c There is nothing to index at the moment.
@ignore
@node Index
@unnumbered Index
@printindex cp
@end ignore

@iftex
@contents
@summarycontents
@end iftex

@bye

@c Following comments are for the benefit of ispell.

@c LocalWords: bovinator inlined