Sale!

CSC 467 Lab 3 AST Construction and Semantic Checking solved

Original price was: $35.00.Current price is: $30.00. $25.50

Category:

Description

5/5 - (9 votes)

Overview This lab is about constructing abstract syntax trees (ASTs) and making three AST visitors:
one that does semantic checking, one that de-allocates an AST, and one that prints an AST. These
steps will be described in this document.
Note: This lab does not include the while control-flow statement. Be sure to adjust your scanners/-
parsers accordingly.
Note: This lab includes many bonus opportunities. Some of these bonuses are challenging and should
not be taken lightly. If you decide to do one or more of the bonus questions, then state exactly which
bonuses you have done in your report. Your mark in this report will not exceed 100%.
Deliverables The following must be implemented:
• AST construction.
• AST tear-down.
• AST printing (-Da).
• Semantic checking.
• Report: approach to type/semantic checking, approach to AST structure, breakdown of work by
teammates, challenges faced, novelties, etc.
1 Starter Files
This assignment includes the following starter files. As before, you are free to use these starter files,
use your own files, or mix and match.
The following is a list of interesting starter files:
compiler467.c Revised main module for Lab 3.
Makefile Revised Makefile for Lab 3.
scanner.l The working Flex scanner.
parser.y The working Bison parser with full Lab 2 functionality (excluding while loops).
semantic.h/c This is where you should put your code for traversing the AST and checking types,
argument counts, etc. See the AST visitors section and semantic checking section below.
symbol.h/c This is where you would likely put your symbol table. You will need to build some form of
symbol table abstraction wherein you will know the mappings of symbols to types within
each scope. The symbol table will allow you to determine if a variable has been defined,
and if so, what it’s type is.
1
2 ASTs 2
ast.h/c This is where you will define your AST structures/classes/etc. There is already an example
of one approach; however, this approach is not the end-all-be-all. Consider finding your
own approach. Please be sure to make your AST nodes self descriptive; it will make
understanding your programs easier for me, and it will make debugging your programs
easier for you.
Note: the starter files are provided “as is” and should represent a complete starting point for assignment 3; however, there might be bugs, or bug fixes. If you discover errors in the starter files then
please politely report them on the course group.
2 ASTs
An AST is a tree that represents the high-level structure of a program. The AST should omit unnecessary complexities/particularities of the grammar. That is, you will likely have fewer AST node types
than you will non-terminals, because some non-terminals exist for the sake of getting a specific kind
of parse (e.g. operator precedence).
2.1 Construction
Below is a snippet of parser action code that constructs an AST node for the logical AND binary
operator. The exampled code is valid according to the starter files (see ast.h and ast.c); however,
there are many ways to approach AST construction.
$$ = a st_a ll o c a t e (BINARY_EXPRESSION_NODE, AND, $1 , $3 );
Like parsing, AST construction will happen bottom-up, because the parser operates bottom-up. This
is very convenient because it implies that when construction some AST node, you have already constructed that node’s children, and so you need only connect them in. Hopefully the following hypothetical calculator will help you to understand the “flow” of data through the parser:
. . .
%union {
i n t as_int ;
. . .
}
. . .
%type sum
%type product
%type f a c t o r
%token INT
. . .
sum
: sum ’+ ’ product { $$ = $1 + $3 ; }
| product { $$ = $1 ; }
;
product
: product ’∗ ’ f a c t o r { $$ = $1 ∗ $3 ; }
| f a c t o r { $$ = $1 ; }
;
f a c t o r
: INT { $$ = $1 ; }
;
Fig. 1: Hypothetical calculator parser
2 ASTs 3
2.2 Visitors
An AST visitor is a function that is recursively called on each node of an AST. The traversal order
depends on what the visitor is doing. For example, printing an AST would involve a pre-order traversal,
whereas de-allocating an AST would involve a post-order traversal.
There are two typical and analogous ways of implementing visitors. One way is using virtual function
dispatch (as in C++), and another way is to manually switch and dispatch. Below are examples of
each:
clas s Node {
publ ic :
virtual void v i s i t ( V i s i t o r &v i s i t o r ) = 0 ;
} ;
clas s Expr e s s ion : publ ic Node {
publ ic :
virtual void v i s i t ( V i s i t o r &v i s i t o r ) {
v i s i t o r . v i s i t ( this ) ;
}
} ;
clas s Statement : publ ic Node {
publ ic :
virtual void v i s i t ( V i s i t o r &v i s i t o r ) {
v i s i t o r . v i s i t ( this ) ;
}
} ;
clas s V i s i t o r {
publ ic :
void v i s i t ( Expr e s s ion ∗ expr ) ;
void v i s i t ( Statement ∗stmt ) ;
}
Fig. 2: Example AST visitor in C++
2 ASTs 4
struct Node {
enum {
STATEMENT, EXPRESSION
} kind ;
union {
struct {
struct Node ∗ next ;
} stmt ;
struct {
. . .
} expr ;
} ;
} ;
struct V i s i t o r {
void (∗ v i s i t _ e x p r e s s i o n ) ( V i s i t o r ∗ , Node ∗ ) ;
void (∗ vi s i t_s t a t ement ) ( V i s i t o r ∗ , Node ∗ ) ;
}
void v i s i t ( V i s i t o r ∗ v i s i t o r , Node ∗ ro ot ) {
i f (NULL == ro ot ) return ;
switch ( ro ot−>kind ) {
case STATEMENT: v i s i t o r −>vi s i t_s t a t ement ( ro ot ) ; break ;
case EXPRESSION: v i s i t o r −>v i s i t _ e x p r e s s i o n ( ro ot ) ; break ;
}
} ;
void my_vi s i t_expres s ion ( V i s i t o r ∗ v i s i t o r , Node ∗ expr ) ;
void my_visit_statement ( V i s i t o r ∗ v i s i t o r , Node ∗stmt ) {
. . .
v i s i t ( v i s i t o r , stmt−>next )
}
Fig. 3: Example AST visitor in C
2.3 Printing
You must print out the AST. Your AST printer must print the AST according to the following recursively defined forms. Note: your AST printer can add arbitrary spaces for readability.
AST printing is done with the function ast_print, in ast.c.
• Statement Forms
SCOPE (SCOPE (DECLARATIONS …) (STATEMENTS …))
DECLARATIONS (DECLARATIONS …) where … is zero or more DECLARATIONs
DECLARATION (DECLARATION variable-name type-name initial-value?) where initial-value? is
present only if an initial value is assigned to the variable. The value printed should
be an expression form.
STATEMENTS (STATEMENTS …) where … is zero or more statements (ASSIGN, IF, SCOPE).
ASSIGN (ASSIGN type variable-name new-value) where new-value is an expression form, and
type is the type form corresponding to the type of the variable.
IF (IF cond then-stmt else-stmt?) where cond is an expression form, then-stmt is a
statement form, and else-stmt? is an optional statement form.
• Expression Forms
UNARY (UNARY type op expr) where op is either of – or ! and expr is an expression form, and
type is the type form corresponding to the type of the resulting unary expression.
BINARY (BINARY type op left right) where op is one of the binary operators of the language,
left and right are both expression forms, and type is the type form corresponding to
the type of the resulting binary expression.
3 Semantic Checking 5
INDEX (INDEX type id index) where id is the identifier of the variable, index is an expression
form, and type is the type form corresponding to the type of the resulting value.
CALL (CALL name …), where name is the name of a function (in the case of a function
call) or type (in the cast of a constructor call) and … are one or more expression
forms representing the arguments, respectively.
hliterali A literal value. If the value has type bool then true or false should be printed. If
the value is an integer, then a decimal number should be printed. If the value is a
floating point number then a decimal floating point number should be printed.
hidentifieri The exact name of the variable.
• Type Forms
Print out the exact name of the type (e.g. int, bool, vec3, etc.).
Hint: Printing your AST is a crucial step in this lab because your assignment will be graded according
to what is printed.
Hint: Printing should be performed top-down.
3 Semantic Checking
Semantic checking must be automatically performed after AST construction (Hint: parser action of
scope). Technically, you can do it during AST construction; however, I strongly suggest doing it after!
You must implement the following semantic checks. If you feel that some are missing, then please
implement more:
• Implicit Type Conversions
No implicit type conversions are supported. This means, for example, that one cannot use an
int where a float is expected.
• Operators
– All operands to logical operators must have boolean types.
– All operands to arithmetic operators must have arithmetic types.
– All operands to comparison operators must have arithmetic types.
– Both opeands of a binary operator must have exactly the same base type (e.g. it is valid to
multiple an int and an ivec3). See the below table for an outline of which classes of types
can be operated on simultaneously.
Hint: you might find it convenient to consider a basic type to be a vector of size 1 as
a way of treating types in a uniform way.
– If both arguments to a binary operator are vectors, then they must be vectors of the same
order. For example, it is not valid to add an ivec2 with an ivec3.
– The table documents each operator and the classes of operands which that operator accepts.
In the table, s represents a scalar operand and v represents a vector operand:
3 Semantic Checking 6
Operator Operand Classes Category
– s, v
Arithmetic +, – ss, vv
∗ ss, vv, sv, vs
/, ^ ss
! s, v Logical
&&, || ss, vv Logical
<, <=, >, >= ss Comparison
==, != ss, vv
• Conditions
The expression that determins which branch of an if statement should be taken must have the
type bool (not bveci).
• Function Calls
The following functions must be type-checked according to the following C declarations.
float r sq ( float ) ;
float r sq ( int ) ;
float dp3 ( vec4 , ve c4 ) ;
float dp3 ( vec3 , ve c3 ) ;
int dp3 ( ive c4 , i v e c 4 ) ;
int dp3 ( ive c3 , i v e c 3 ) ;
ve c4 l i t ( ve c4 ) ;
• Constructor Calls
Constructors for basic types (bool, int, float) must have one argument that exactly matches
that type. Constructors for vector types must have as many arguments as there are elements
in the vector, and the types of each argument must be exactly the type of the basic type corresponding to the vector type. For example, the following is valid bvec2(true,false), whereas
bvec2(1,true) is invalid.
• Vector Indexing
– The index into a vector (e.g. 1 in foo[1]) must be in the range [0,i − 1] if the vector has
type veci. For example, the maximum index into a variable of type vec2 is 1.
– The result type of indexing into a vector is the base type associated with that vector’s type.
For example, if v has type bvec2 then v[0] has type bool.
• Initialization
– const qualified variables must be initialized with a literal value or with a uniform variable
(see pre-defined variables below), and not an expression.
– Bonus: If you choose to do this bonus then the previous requirement is subsumed by this
requirement. const qualified variables must be initialized with constant expressions. A constant expression is an expression where every operand is either a literal or const qualified.
If you do this bonus, then the value printed for the initial-value? field of the DECLARATION
must be the value of the expression, and not the AST-printed expression. This is challenging because you will need to do minimal constant folding and constant propagation.
Hint: Store the constant value of a variable/expression in a field in each AST node, as well
as in your symbol table. Add an extra field/bit to your types to distinguish constant (i.e.
compile-time) expressions from non-constant expressions. Interpret constant expressions
while checking types.
3 Semantic Checking 7
• Assignment
– The value assigned to a variable (this includes variables declared with an initial value) must
have the same type as that variable, or a type that can be widened to the correct type.
• Variables
– Every variable must be declared before it is used. A variable’s declaration must appear
either within the current scope or an enclosing scope.
– A variable can only be declared once within the same scope. That is, the following is invalid:
{
int a = 1 ;
int a = 2 ;
}
– If a variable is const qualified then it’s value cannot be re-assigned.
– One variable can shadow another variable. That is, the following is valid:
{
int a = 1 ;
{
int a = 2 ;
/∗ in here , a == 2 ∗/
}
/∗ down here , a == 1 ∗/
}
– Bonus: Ensure that every variable has been assigned a value before being read. This is
challenging because it requires checking potentially all control flow paths that lead to a read
of a variable.
• Pre-Defined Variables
There are several pre-defined variables. Each variable fits into one of the following type classes:
Attribute Read-only, non-constant.
Uniform Read-only, constant. These can be assigned to const qualified variables.
Result Write-only, cannot be assigned anywhere in the scope of an if or else statement.
Below is a list of the pre-defined variables. and their types. Each has been annotated with its
type class.
r e s u l t ve c4 gl_FragColor ;
r e s u l t bo ol gl_FragDepth ;
r e s u l t ve c4 gl_FragCoord ;
a t t r i b u t e ve c4 gl_TexCoord ;
a t t r i b u t e ve c4 gl_Color ;
a t t r i b u t e ve c4 gl_Se condary ;
a t t r i b u t e ve c4 gl_FogFragCoord ;
uniform ve c4 gl_Light_Half ;
uniform ve c4 gl_Light_Ambient ;
uniform ve c4 gl_Mat e r ial_S hinine s s ;
uniform ve c4 env1 ;
uniform ve c4 env2 ;
uniform ve c4 env3 ;
Hint: type/semantic checking is usually done bottom-to-top so that you can determine the type of an
expression by looking at the types of its sub-expressions.
4 Symbol Table 8
3.1 Reporting Semantic Errors
Report as many semantic errors as possible, and fprintf each one to errorFile (as declared in
common.h). If an error is found, change errorOccurred to 1. Your errors should be descibe what the
issue encountered was.
Reporting as many errors as possible implies that you have some mechanism for partially recovering
from an error. Here are two example approaches of how to recover from an error that can easily be
generalized:
• Use of an undefined variable: Declare the variable with an any type. If the variable is later
declared in the same scope, then overwrite the fake definition.
• Bad operand type: If an int and a float are used as operands to a +, then report the error,
and set the expression to have the any type.
We say an expression with an any type can be used anywhere. In this sense, assigning an erroneous
expression to have the any type allows the compiler to continue performing type/semantic checks.
Bonus: Report the line number on which the semantic error occured.
Bonus: Report the column of the line, with the line number, on which the error occured.
Bonus: Provide a contextual message that gives extra meaning to the error to help the programmer
debug the error. For example, if the error involves writing to a const qualified variable, then report
the line number where that variable was declared.
4 Symbol Table
The symbol table maps variables to information about those variables (e.g. their type, etc.). The
symbol table must have a notion of scope/context so that two same-named variables in different
scopes do not clobber eachother within the table.
It is suggested that you opt for a simple symbol table implementation. You will not be graded on the
efficiency/cleverness of your implementation.
Hint: One way to implement a symbol table is to have a cactus stack of tables, where each table
corresponds to one scope. Each time you enter a scope, you push a table on to the stack and each time
you exit a scope, you pop the scope table from the stack and insert it into the corresponding scope
AST node.