CS 440 Homework 2 DFAs, NFAs, Regexes, CFLs solution

$30.00

Download Details:

  • Name: hw2-kae9fp.zip
  • Type: zip
  • Size: 259.19 KB

Category:

Description

5/5 - (6 votes)

Programming Languages and Translators
1 Regular Expressions and Finite Automata
Task 1.1 (Written, 9 points).
For each of the following language descriptions, give a regular expression that matches exactly that
language. You don’t have to find the shortest possible regular expression. You may use expressions such
as [a − z] to represent the letters a through z, and/or regular definitions:
letter → [a − z]
string → letter ∗
To make things clearer, use to mean a single space (e.g. write “Hello world” instead of “Hello
world”.)
(a) Strings over the alphabet {a, b, c, d} that are in alphabetical order.
Examples: aabccc, abcd, bd, a, ad
Not examples: dc, ca, cba
(b) Natural numbers (0 and up), with no leading zero unless the whole thing is a single zero, and going
right-to-left, groups of 3 digits are separated by commas.
Examples: 0; 1; 12; 123; 1,234; 12,345; 1,000,000
Not examples: 01; 1000; 123,4; 12,34
(c) Strings consisting of letters a-z and spaces that do not begin or end with whitespace and all white
space is one space long.
Example: “so this is ok”
Not examples: “this is bad”, ” this too”, “and this ”
Task 1.2 (Written, 10 points).
For each of the following NFAs, identify the language accepted by the NFA (describe it or give a regular
expression).
(a)
start 0
1
3
2
4


a
b
a
b
(b)
start 0 1 2 3
a, b
a b b
2 CS 440
1.1 NFA to DFA
Consider the following NFA, which accepts strings that:
• begin with b and end with b or aa (with 0 or more as and bs in between) or
• begin and end with a (with 0 or more as and bs in between).
(convince yourself this is the lanugage recognized by the NFA).
start 0
1 2 3
4 5
6 7 8
9 10 11


b 


b
a a
a a
a, b
a, b
Task 1.3 (Written, 6 points).
Recall that −closure(s) for a state s is the set of states reachable from s using only -transitions, and
that this idea of “reachability” is transitive, that is, if there is an -transition from s to s
0 and one from s
0
to s
00, then s
00 is in −closure(s).
Compute
(a) −closure(0)
(b) −closure(3)
(c) −closure(2)
Task 1.4 (Written, 10 points).
Use the subset construction to convert the NFA above into a DFA. You may present the resulting DFA
using either a transition diagram or a transition table. Recall that, in the subset construction, states of
the DFA correspond to sets of states of the NFA. Label your DFA states with the set of NFA states,
e.g. {1, 2, 3}. The start state should be −closure(0).
Task 1.5 (Written, 5 points).
Why does the DFA you constructed above not have an “error” state? (Think about when, during
scanning an input, we would realize that a string cannot be in this language.)
2 DFA Simulator
In the next section, we’re going to build a regular expression matcher, which requires simulating an NFA.
As a warmup, in this section, you’re going to simulate a DFA, which is a little easier. Your code for this
section will be in dfa.ml. At the top, we give a few type definitions for DFAs: states and symbols are just
integers and characters, respectively, but we still define them using “type synonyms”:
type state = int
3 CS 440
This defines a type state that’s just the same as int, but we can refer to state to make it clearer when
we mean a state in the code. (This is also good software engineering practice because if we decide we don’t
want to just represent states using ints later, we can change it easily.)
DFAs are implemented as records:
type dfa = { states : int; (* Number of states *)
(* States are numbered 0 through states
* State 0 is the start state *)
accept : state list; (* List of accept states *)
alpha : symbol list; (* Alphabet of the DFA *)
trans : (state * symbol * state) list
(* List of transitions in the form
* (from state, transition symbol, to state) *)
}
For example, we could represent the DFA
start 0 1 2 3
b
a
a
b
b
a
a
b
with the record
{ states = 4; (* 4 states, 0-3 *)
accept = [2; 3]; (* states 2 and 3 are accepting *)
alpha = [’a’; ’b’]; (* alphabet is a, b *)
trans = [(0, ’b’, 0); (0, ’a’, 1);
(1, ’a’, 1); (1, ’b’, 2);
(2, ’b’, 2); (2, ’a’, 3);
(3, ’a’, 3); (3, ’b’, 2)]
}
Writing large DFAs like that is hard though, so we’ve provided functions that parse DFAs and inputs
from input files. The format of the input file is as follows:

<alphabet, a character string with no spaces>
<input, a character string with no spaces>
<transitions, one per line, each of the format:>
An example is in examples/dfa1. You may want to write your own examples for testing.
The function Parser.parse_file : string -> dfa * symbol list takes a filename and returns a pair
of the DFA and the input. The parsing code is in a different file, so you won’t be able to use it by just copying
and pasting code into the toplevel or TryOCaml. You can still test using asserts, though. For example:
assert (transition (fst (Parser.parse_file “examples/dfa1”)).trans ’a’ 0 = 1)
Updated (Code updated 3/1). You can then check if your assert passes by compiling the file and running it:
ocamlc -o dfa_test parseDFA.ml dfa.ml
./dfa_test
4 CS 440
If you don’t see any errors, then your assertion passed. You can also test that your whole DFA simulator
works using the instructions at the end of this section.
Task 2.1 (Programming, 5 points).
Implement the function
transition : (state * symbol * state) list) -> symbol -> state -> state.
transition d.trans sy st should return the new state that we’ll be in if we see symbol sy while in
state st, where d.trans is the list of transitions of the DFA (in the form above).
For example, if d is the DFA above,
• transition d.trans ’a’ 0 = 1
• transition d.trans ’b’ 1 = 2
You should raise the exception IllformedInput if the state or symbol isn’t in the transition list (e.g.
if st is 5 or sy is ’q’). The exception IllformedInput takes a string, so you can output a helpful message
that will be printed, e.g.
raise (IllformedInput
(Printf.sprintf
“State, symbol pair (%d, %c) not in transitions”
state symb))
Now you’ll write the function that actually simulates the DFA. Applying dfa_sim dfa st input simulates execution of a DFA starting in state st on the input input. It should return true if we end up in
an accepting state, and false otherwise. We should get an IllformedInput exception if something is wrong
(e.g., a transition is missing from dfa.trans or we see a symbol not in dfa.alpha). The initial state of the
DFA is always state 0, but you’ll probably want dfa_sim to call itself recursively when you’re in a new state,
so dfa_sim takes the current state as an argument.
Task 2.2 (Programming, 8 points).
Implement the function dfa_sim: dfa -> state -> symbol list -> bool as described above. Remember that a DFA only accepts or rejects once it has read all of the input: we can pass through an
accepting state, then read more input, and end up in a rejecting state.
2.1 Testing
Run make dfa to compile the whole DFA simulator. If you don’t have make installed, you can run
ocamlc -o dfa parseDFA.ml dfa.ml dfaMain.ml instead.
Then run ./dfa to parse the DFA and input from the given file and run your DFA simulator
on it. The program will print either “ACCEPTED” or “REJECTED” depending on whether your simulator
returns true or false.
3 Regular Expression Matcher
Lots of software has to solve the problem of deciding whether or not a piece of text matches a given
regular expression. In fact, the find/replace feature of the editor you’re using to write your code probably
implements it. In this section, you’ll build part of a regular expression matcher. To do this, you’ll write code
that simulates an NFA (recall that we can “compile” any regular expression into an NFA that determines
whether input matches it).
3.1 NFA Simulator
You’ll implement the NFA simulator in nfa.ml. This file already includes type definitions for NFAs.
They’re similar to the types for DFAs above, with the difference that the type of a transition is now
5 CS 440
state * symbol option * state. If the symbol is None, this is an -transition.
We can represent the NFA
start 0
1
3
2
4


a
b
a
b
with the record
{ states = 5; (* 5 states, 0-4 *)
accept = [2; 4]; (* states 2 and 4 are accepting *)
alpha = [’a’; ’b’]; (* alphabet is a, b *)
trans = [(0, None, 1); (0, None, 3); (* epsilon transitions *)
(1, Some ’a’, 2); (3, Some ’b’, 4);
(2, Some ’a’, 2); (4, Some ’b’, 4)]
}
Recall that NFAs don’t always list all of the transitions: an unlisted transition is assumed to automatically
go to a rejecting state and stay there regardless of the rest of the input. So, for example, on the above NFA,
if we’re in state 1 and see a b, we can immediately reject the input. States like this that lead to immediate
rejecting states aren’t listed in the transition list (so, in particular, unlike with the DFA simulator, seeing a
state and symbol not listed in the transition list should no longer raise an exception).
Again, we’ve given you the function Parser.parse_file: string -> nfa * symbol list. The file
format is the same as above, except that the lines for transitions can now have the form
indicating an -transition. You can test asserts by compiling as follows:
ocamlc -o nfa_test parseNFA.ml nfa.ml
./nfa_test
You can, of course, simulate an NFA by converting it to a DFA using the algorithm we learned in class
and using your DFA simulator. However, this is both less efficient (because the DFA might be exponentially
larger than the NFA) and more complicated than just simulating the NFA directly. An efficient algorithm
for this is listed in Chapter 3.7.2 of Aho et al.’s “Purple Dragon Book.” The basic idea is to keep track of a
set S of states that can be reached from the initial state with the input we’ve seen so far. We start off with
S = {0}, and then repeat the following until we’ve seen all the input.
• Take the -closure of S.
• If we’ve reached the end of the input, then accept if and only if at least one state in S is an accepting
state.
• Otherwise, if the next symbol is a, for each state s ∈ S, remove s from S and add to S any states that
s transitions to on a (since this is an NFA, this may be zero, one or more states).
In the above, we talk about sets, but we haven’t seen a set data structure in OCaml (there is one in
the standard library, but it’s a little tricky to work with). Instead, we’ll still use lists with the observation that we can treat a list as a set if we sort it and remove duplicates. We’ve given you a function
norm : state list -> state list that does just this.
6 CS 440
Task 3.1 (Programming, 7 points).
Implement the function
transitions : (state * symbol option * state) list -> symbol option -> state -> state list
transitions n.trans (Some sy) st should return the list of states we can transition to on seeing symbol
sy in state st. transitions n.trans None st should return the list of states with -transitions from st.
For example, if n is the NFA above,
• transitions n.trans None 0 = [1; 3]
• transitions n.trans (Some ’a’) 1 = [2]
• transitions n.trans (Some ’b’) 1 = []
• transitions n.trans None 1 = []
Your output does not need to be normalized, i.e. it may contain duplicates and need not be sorted
(though you can do this if you want).
Next, you’ll implement a function eps_clos that takes the -closure of a set of states, as defined above.
The procedure for this is:
• For each s ∈ S, add to S any states with -transitions from s.
• Repeat until you’ve reached a fixed point, that is, running the above step doesn’t add any more states
to S.
For example, on the NFA above,
• eps_clos [0] = [0; 1; 3]
• eps_clos [1] = [1]
• eps_clos [0; 1] = [0; 1; 3]
• eps_clos [0; 2] = [0; 1; 2; 3]
Note that we don’t ever remove states from S when taking the -closure. The second step is important
to make sure your function doesn’t run into an infinite loop. For example, consider the NFA
start 0 1


When taking the -closure of [0], we can stop as soon as we’ve gotten to [0; 1], but if we’re not careful,
we might try to add 0 to the set again, and then add 1 again, and so on… you should recognize when you’re
not adding any new states and stop. Note that taking the -closure of a set is always guaranteed to terminate
because on each iteration, if we don’t add any new states, we stop and you can only continue adding states
until you’ve added all the states in the NFA, at which point you have to stop. Note that the Purple Dragon
Book gives an algorithm for computing the -closure, but their algorithm is very imperative: we’re trying to
think more functionally.
Task 3.2 (Programming, 12 points).
Implement the function eps_clos : nfa -> state list -> state list described above.
Hint: You may (or may not) find the standard library function List.concat helpful.
7 CS 440
Task 3.3 (Programming, 10 points).
Implement the function nfa_sim: nfa -> state list -> symbol list -> bool.
nfa_sim nfa states input should:
• Let states’ be the -closure of states (we’ve done this part for you)
• If we’re out of input, return true if any state in states’ is an accepting state, otherwise return false.
• Otherwise, look at the next symbol, update the set of states accordingly, and call nfa_sim with the
new set of states.
Hint: You may (or may not) find the standard library function List.exists helpful.
3.2 Testing the NFA Simulator
Run make nfa to compile the whole NFA simulator. If you don’t have make installed, you can run
ocamlc -o nfa parseNFA.ml nfa.ml nfaMain.ml instead.
Then run ./nfa to parse the NFA and input from the given file and run your NFA simulator
on it. The program will print either “ACCEPTED” or “REJECTED” depending on whether your simulator
returns true or false.
3.3 Converting Regular Expressions to NFAs
The file regex.ml has a type definition for regular expressions, a bunch of code for converting regular
expressions to NFAs, a bunch more code for parsing regular expressions, and then a bit of code at the end
that runs a regular expression matcher on an input string and regular expression. Most of this code is already
written. You’ll just write one small function that builds an NFA that recognizes a single symbol.
Task 3.4 (Programming, 3 points).
Implement the function symb_nfa : symbol option -> nfa.
symb_nfa (Some s) should return an NFA over the alphabet [s] that accepts only the input s.
symb_nfa None should return an NFA over the empty alphabet that accepts only the empty string.
Hint: In both cases, the NFA should have two states and one transition.
3.4 Testing the Regular Expression Matcher
Run make regex to compile the regular expression matcher. This depends on your NFA code, so you need
to have done that part already. If you don’t have make installed, you can run
ocamlc -o regex parseNFA.ml nfa.ml regex.ml regexMain.ml instead.
Then run ./regex to check if the given string matches the regular
expression. The program will print either “ACCEPTED” or “REJECTED” depending on whether your
matcher returns true or false.
For example, ./regex “ab*” “abbbbbbb” should print “ACCEPTED” but ./regex “ab*” “b” should
print “REJECTED.” Updated (Typo fixed 3/1)
4 Standard Written Questions
Task 4.1 (Written, 0 points).
How long (approximately, in hours/minutes of actual working time) did you spend on this homework,
total? Your honest feedback will help us with future homeworks.
8 CS 440
Task 4.2 (Written, 0 points).
Who, if anyone, did you collaborate with (and in what way), and what outside sources, if any, did you
consult in working on this homework?
9 CS 440