Over the past several months I've been reading up on parsing, lexical analysis, and the like. It has been an on-and-off interest for some time, but I recently started writing my own PEG grammars using a fairly approachable utility called PEG.js.

Unfortunately, I've hit a snag in my rather noobishly written grammar, and I'm having trouble getting around it. Understandably, most solutions I've seen default to context-based implementations: multiple grammars, one per context, that are identified and selected using business logic. However, I wanted to avoid that as much as possible and get as close as I can to a solid context-free grammar that produces an intuitive JSON result for easy traversal.

The original snippet is here, but I'll also post it below (because everyone hates broken links in forum posts when they click on a search result months or years down the road).

The Problem
The following command parses into a very intuitive AST:

Code:
./quux.sh foobar --baz="hello world"

However, the parser chokes on this:

Code:
./appleSauce.sh 2&>1

The major issue plaguing me right now is redirection that begins with an integer. I'm aware of why it happens: the '2' is consumed as an argument before the redirection rule ever sees it. My trouble is rethinking some of the definitions so that the grammar understands that the '2' belongs to the redirection operation.

It produces the following:

Code:
[
   {
      "cmd": "./appleSauce.sh",
      "args": [
         2
      ],
      "div": {
         "redirect": "&>",
         "addend": 1
      }
   }
]


It should produce this:

Code:
[
   {
      "cmd": "./appleSauce.sh",
      "args": [],
      "div": {
         "redirect": "&>",
         "addend": 1,
         "augend": 2
      }
   }
]
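
To make the failure mode concrete, here is a minimal sketch in plain JavaScript (not PEG.js itself) of how a greedy "arg*" repetition swallows the 2 before the redirection rule runs. The helper names (skipBlanks, matchNumber, matchArg, and so on) are made up for illustration, and the number rule is cut down to integers; it only mimics the shape of the grammar, under those assumptions.

```javascript
// Hypothetical miniature of the grammar's behaviour; this is not PEG.js.
// Each matcher takes (input, pos) and returns { value, pos } or null.

const skipBlanks = (input, pos) => {
  while (input[pos] === " " || input[pos] === "\t") pos++;
  return pos;
};

// Stand-in for the "number" rule: integers only, for brevity.
const matchNumber = (input, pos) => {
  const m = /^(?:0|[1-9][0-9]*)/.exec(input.slice(pos));
  return m ? { value: Number(m[0]), pos: pos + m[0].length } : null;
};

// Stand-in for the "identifier" rule.
const matchIdentifier = (input, pos) => {
  const m = /^[~.\/A-Za-z][A-Za-z0-9_.\/~-]*/.exec(input.slice(pos));
  return m ? { value: m[0], pos: pos + m[0].length } : null;
};

// Stand-in for "arg": optional blanks, then a number or an identifier.
const matchArg = (input, pos) => {
  const start = skipBlanks(input, pos);
  return matchNumber(input, start) || matchIdentifier(input, start);
};

// Stand-in for the only "&>" alternatives in the grammar: no leading number.
const matchRedirection = (input, pos) => {
  if (!input.startsWith("&>", pos)) return null;
  const target = matchNumber(input, pos + 2);
  return target
    ? { value: { redirect: "&>", addend: target.value }, pos: target.pos }
    : { value: { redirect: "&>" }, pos: pos + 2 };
};

// Stand-in for "commandText": identifier arg* nbws redirection?
const parseCommand = (input) => {
  const cmd = matchIdentifier(input, 0);
  if (!cmd) return null;
  let pos = cmd.pos;
  const args = [];
  // arg* is greedy: it keeps every argument it can match and never gives one back.
  for (let next = matchArg(input, pos); next; next = matchArg(input, pos)) {
    args.push(next.value);
    pos = next.pos;
  }
  const div = matchRedirection(input, skipBlanks(input, pos));
  return { cmd: cmd.value, args, div: div ? div.value : null };
};

// The 2 ends up in args, and the redirection only sees "&>1".
console.log(JSON.stringify(parseCommand("./appleSauce.sh 2&>1")));
```

Since the repetition never hands the 2 back, any fix has to either stop the argument rule from matching it in the first place or give the redirection rule a chance at the number before the arguments are collected.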


You can copy the grammar below into the Parser Generator to test it. Paste the grammar into the left text area, then write a single-line shell command in the top right box; the resulting AST will be displayed in the grey box in the bottom right corner of the page.

PEG.js Parser Generator -- Used For Testing Grammars

Grammar

Code:
commands "command"
  = cmd:commandText* { return cmd; }

commandText "command"
  = ws identifier:identifier args:arg* nbws div:cmdDivider ws { return {
      cmd: identifier,
      args: args,
      div: div
  };}

cmdDivider "command separator"
  = redir:redirection? { return redir; }

nbws "whitespace" = [ \t]*
ws "whitespace" = [ \t\n\r]*
rws "whitespace" = [ \t\n\r]+

redirection "redirection"
  = ";" { return ";" }
  / "|" { return "|" }
  / dig:number op:">&" dig2:number { return { redirect: op, augend: dig, addend: dig2 } }
  / dig:number op:">&" add:"-" { return { redirect: op, augend: dig, addend: add } }
  / dig:number op:"<&" dig2:number { return { redirect: op, addend: dig2, augend: dig } }
  / dig:number op:"<&" add:"-" { return { redirect: op, augend: dig, addend: add } }

  / dig:number op:">>" { return { redirect: op, augend: dig } }
  / dig:number op:">&" { return { redirect: op, augend: dig } }
  / dig:number op:">|" { return { redirect: op, augend: dig } }
  / dig:number op:">" { return { redirect: op, augend: dig } }

  / dig:number op:"<<-" { return { redirect: op, augend: dig } }
  / dig:number op:"<<" { return { redirect: op, augend: dig } }
  / dig:number op:"<&" { return { redirect: op, augend: dig } }
  / dig:number op:"<>" { return { redirect: op, augend: dig } }
  / dig:number op:"<" { return { redirect: op, augend: dig } }

  / op:">&" dig:number { return { redirect: op, addend: dig } }
  / op:">&" add:"-" { return { redirect: op, addend: add } }

  / op:"<&" dig:number { return { redirect: op, addend: dig } }
  / op:"<&" add:"-" { return { redirect: op, addend: add } }

  / op:">>" { return { redirect: op } }
  / op:">&" { return { redirect: op } }
  / op:">|" { return { redirect: op } }
  / op:">" { return { redirect: op } }

  / op:"<<-" { return { redirect: op } }
  / op:"<<" { return { redirect: op } }
  / op:"<&" { return { redirect: op } }
  / op:"<>" { return { redirect: op } }
  / op:"<" { return { redirect: op } }

  / op:"&>" dig:number { return { redirect: op, addend: dig } }
  / op:"&>" { return { redirect: op } }

// ----- 3. Values -----

arg "argument"
  = nbws val:(value/variable) { return val; }

value "value"
  = false
    / null
    / true
    / number
    / string
    / identifier

false = "false" { return false; }
null  = "null"  { return null;  }
true  = "true"  { return true;  }

variable "variable"
  = "-"+ varname:variablename ws val:variablesetter? { return { name: varname, value: val }; }

variablename "variable"
  = [A-Za-z][A-Za-z0-9_-]* { return text(); }

variablesetter "variable assignment"
  = "="? ws val:value { return val; }

// ----- 6. Numbers -----

number "number"
  = minus? int frac? exp? { return parseFloat(text()); }

decimal_point "decimal point"
  = "."

digit1_9 "digit"
  = [1-9]

e "e"
  = [eE]

exp "expression"
  = e (minus / plus)? DIGIT+

frac "fraction"
  = decimal_point DIGIT+

int "integer"
  = zero / (digit1_9 DIGIT*)

minus "minus"
  = "-"

plus "plus"
  = "+"

zero "zero"
  = "0"

// ----- 7. Strings -----

identifier "identifier"
  = [~./A-Za-z][A-Za-z0-9_./~-]* { return text(); }

string "string"
  = quotation_mark chars:char* quotation_mark { return chars.join(""); }

char "character"
  = unescaped
  / escape
    sequence:(
        '"'
      / "\\"
      / "/"
      / "b" { return "\b"; }
      / "f" { return "\f"; }
      / "n" { return "\n"; }
      / "r" { return "\r"; }
      / "t" { return "\t"; }
      / "u" digits:$(HEXDIG HEXDIG HEXDIG HEXDIG) {
          return String.fromCharCode(parseInt(digits, 16));
        }
    )
    { return sequence; }

escape "escape"
  = "\\"

quotation_mark "quotation mark"
  = '"'

unescaped "unescaped character"
  = [^\0-\x1F\x22\x5C]

// ----- Core ABNF Rules -----

// See RFC 4234, Appendix B (http://tools.ietf.org/html/rfc4234).
DIGIT "digit" = [0-9]
HEXDIG "hexadecimal digit" = [0-9a-f]i

A somewhat hacky solution has been implemented, but it seems to be working better than anything else I've tried so far.

The first problem I ran into was that I had completely failed to give the redirection definition the type of redirection I was attempting to perform. DOH! So the "#&>#" style of redirection was added.

Lastly, I used a bit of hackery: a negative lookahead predicate (the "![&><]+" at the end of the arg rule) that essentially prevents an argument from being placed directly next to an ampersand or angle bracket. My assumption is that as I test more redirections, I may have to add to this list. It's not ideal, but hopefully I can produce something a little more suitable moving forward.
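
The lookahead idea can be sketched outside PEG.js with a JavaScript regular expression, where "(?!...)" plays roughly the same role as the "!" predicate. The patterns and the takeArg helper are illustrative assumptions, limited to integer arguments, and not part of the grammar.

```javascript
// Illustrative only: a regex stand-in for a numeric "arg", with and without the guard.
const plainArg = /^[ \t]*(0|[1-9][0-9]*)/;

// (?![0-9&><]) consumes nothing and rejects the match when a redirection
// character follows. The extra 0-9 is needed only because regexes backtrack
// (so "12&" could otherwise match just "1"); a PEG repetition does not.
const guardedArg = /^[ \t]*(0|[1-9][0-9]*)(?![0-9&><])/;

// Returns the matched number and the unconsumed remainder, or null.
const takeArg = (pattern, rest) => {
  const m = pattern.exec(rest);
  return m ? { value: Number(m[1]), rest: rest.slice(m[0].length) } : null;
};

console.log(takeArg(plainArg, " 2&>1"));    // the 2 is taken as an argument
console.log(takeArg(guardedArg, " 2&>1"));  // null: the 2 is left for the redirection rule
console.log(takeArg(guardedArg, " 2 &>1")); // a 2 followed by a blank still counts as an argument
```

The trade-off is the same as in the grammar: the guard is a list of characters, so every new operator that can directly follow a number has to be added to it.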

The following:

Code:
./appleSauce.sh testing pew pew 2&>1 ./foobaz.sh

Produces:

Code:
[
   {
      "cmd": "./appleSauce.sh",
      "args": [
         "testing",
         "pew",
         "pew"
      ],
      "div": {
         "redirect": "&>",
         "augend": 2,
         "addend": 1
      }
   },
   {
      "cmd": "./foobaz.sh",
      "args": [],
      "div": null
   }
]


Success!

I think a logical next step will be to flesh out the "redirection" operator and break it out into the individual operators (e.g., list terminators vs logical operators vs pipes vs redirection).
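
As a rough sketch of what that split might look like, the operators could be grouped by kind before being turned into separate grammar rules. The category names, the OPERATOR_CLASSES table, and classifyOperator are all hypothetical placeholders, and the redirection list simply collects the operators already present in the grammar.

```javascript
// Hypothetical grouping; the category names are placeholders, not from the grammar.
const OPERATOR_CLASSES = {
  terminator: [";", "&"],
  logical: ["&&", "||"],
  pipe: ["|"],
  redirection: ["<<-", "&>", ">&", "<&", ">>", ">|", "<<", "<>", ">", "<"],
};

// Longest operators first so that "&&" is tried before "&", mirroring how
// ordered choice needs the longer literal listed ahead of its prefix.
const OPERATORS = Object.entries(OPERATOR_CLASSES)
  .flatMap(([kind, ops]) => ops.map((op) => ({ kind, op })))
  .sort((a, b) => b.op.length - a.op.length);

// Returns the kind and text of the operator at the start of `text`, or null.
const classifyOperator = (text) => {
  const hit = OPERATORS.find(({ op }) => text.startsWith(op));
  return hit ? { kind: hit.kind, op: hit.op } : null;
};

console.log(classifyOperator("&& ls"));  // logical
console.log(classifyOperator("&>1"));    // redirection
console.log(classifyOperator("| grep")); // pipe
```

Each kind could then become its own rule, with the command separator rule choosing between them instead of one long list of alternatives.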


Code:
commands "command"
  = cmd:commandText* { return cmd; }

commandText "command"
  = ws identifier:identifier args:(arg)* nbws div:cmdDivider ws { return {
      cmd: identifier,
      args: args,
      div: div
  };}

cmdDivider "command separator"
  = redir:redirection? { return redir; }

nbws "whitespace" = [ \t]*
ws "whitespace" = [ \t\n\r]*
rws "whitespace" = [ \t\n\r]+

redirection "redirection"
  = ";"+ { return ";" }
  / "||"
  / "|"

  / op:"&>" dig:number { return { redirect: op, addend: dig } }
  / op:"&>" { return { redirect: op } }

  / "&&"
  / "&"

  / dig:number op:"&>" dig2:number { return { redirect: op, augend: dig, addend: dig2 } }
  / dig:number op:"&>" add:"-" { return { redirect: op, augend: dig, addend: add } }
  / dig:number op:"&<" dig2:number { return { redirect: op, augend: dig2, addend: dig } }
  / dig:number op:"&<" add:"-" { return { redirect: op, augend: add, addend: dig } }

  / dig:number op:">&" dig2:number { return { redirect: op, augend: dig, addend: dig2 } }
  / dig:number op:">&" add:"-" { return { redirect: op, augend: dig, addend: add } }
  / dig:number op:"<&" dig2:number { return { redirect: op, addend: dig2, augend: dig } }
  / dig:number op:"<&" add:"-" { return { redirect: op, augend: dig, addend: add } }

  / dig:number op:">>" { return { redirect: op, augend: dig } }
  / dig:number op:">&" { return { redirect: op, augend: dig } }
  / dig:number op:">|" { return { redirect: op, augend: dig } }
  / dig:number op:">" { return { redirect: op, augend: dig } }

  / dig:number op:"<<-" { return { redirect: op, augend: dig } }
  / dig:number op:"<<" { return { redirect: op, augend: dig } }
  / dig:number op:"<&" { return { redirect: op, augend: dig } }
  / dig:number op:"<>" { return { redirect: op, augend: dig } }
  / dig:number op:"<" { return { redirect: op, augend: dig } }

  / op:">&" dig:number { return { redirect: op, addend: dig } }
  / op:">&" add:"-" { return { redirect: op, addend: add } }

  / op:"<&" dig:number { return { redirect: op, addend: dig } }
  / op:"<&" add:"-" { return { redirect: op, addend: add } }

  / op:">>" { return { redirect: op } }
  / op:">&" { return { redirect: op } }
  / op:">|" { return { redirect: op } }
  / op:">" { return { redirect: op } }

  / op:"<<-" { return { redirect: op } }
  / op:"<<" { return { redirect: op } }
  / op:"<&" { return { redirect: op } }
  / op:"<>" { return { redirect: op } }
  / op:"<" { return { redirect: op } }

// ----- 3. Values -----

arg "argument"
  = nbws val:(value/variable) ![&><]+ { return val; }

value "value"
  = false
    / null
    / true
    / number
    / string
    / identifier

false = "false" { return false; }
null  = "null"  { return null;  }
true  = "true"  { return true;  }

variable "variable"
  = "-"+ varname:variablename nbws val:variablesetter? { return { name: varname, value: val }; }

variablename "variable"
  = [A-Za-z][A-Za-z0-9_-]* { return text(); }

variablesetter "variable assignment"
  = "="? nbws val:value { return val; }

// ----- 6. Numbers -----

number "number"
  = minus? int frac? exp? { return parseFloat(text()); }

decimal_point "decimal point"
  = "."

digit1_9 "digit"
  = [1-9]

e "e"
  = [eE]

exp "expression"
  = e (minus / plus)? DIGIT+

frac "fraction"
  = decimal_point DIGIT+

int "integer"
  = zero / (digit1_9 DIGIT*)

minus "minus"
  = "-"

plus "plus"
  = "+"

zero "zero"
  = "0"

// ----- 7. Strings -----

identifier "identifier"
  = [~./A-Za-z][A-Za-z0-9_./~-]* { return text(); }

string "string"
  = quotation_mark chars:char* quotation_mark { return chars.join(""); }

char "character"
  = unescaped
  / escape
    sequence:(
        '"'
      / "\\"
      / "/"
      / "b" { return "\b"; }
      / "f" { return "\f"; }
      / "n" { return "\n"; }
      / "r" { return "\r"; }
      / "t" { return "\t"; }
      / "u" digits:$(HEXDIG HEXDIG HEXDIG HEXDIG) {
          return String.fromCharCode(parseInt(digits, 16));
        }
    )
    { return sequence; }

escape "escape"
  = "\\"

quotation_mark "quotation mark"
  = '"'

unescaped "unescaped character"
  = [^\0-\x1F\x22\x5C]

// ----- Core ABNF Rules -----

// See RFC 4234, Appendix B (http://tools.ietf.org/html/rfc4234).
DIGIT "digit" = [0-9]
HEXDIG "hexadecimal digit" = [0-9a-f]i
  