Hello,
So with recent stuff going on with gCAS, I decided to document the expression evaluator I use in tiDE, and that I plan to use in KnightOS. It is designed to work with minimal memory, and is surprisingly simple.
The first step is to tokenize the expression. For now, we can assume that we are working with 8 bit signed numbers. Each token is an ID, and for literals, the value. This can easily be extended to work with variables and other functions. Here are the basic IDs:
-00: End of Expression
-01: Literal value (followed by value)
-02: +
-03: -
-04: *
-05: /
-06: (
-07: )
-...: Add more as you see fit
For example, "2+6*3" would become 010203010605010300
Then, a simple loop starts:

Code:
As long as the expression contains operators...
Find the next operator to be executed, based on order of operations.
Execute the operator and replace the operator and its operands with the result in the expression.  In the case of parenthesis, extract the expression within and recursively call this routine again.


A very simple and memory efficient algorithm that ideally shouldn't use any more memory than is required to store the expression, and to recursively call itself for parenthesis.

Any feedback on this?
Your sample has two typographical errors and should in fact be 010202010604010300, I believe. I'm a bit confused why the parser would need an intermediate "tokenization" of that form, though; the main difference seems to be that it simply marks off the numerical literals. I'm trying to think if that is a good approach or not; I'll have to keep thinking.
It is faster when you have more complex expressions. You'd have to convert it back and forth from ASCII to actual numbers through the whole execution of the algorithm.
SirCmpwn wrote:
It is faster when you have more complex expressions. You'd have to convert it back and forth from ASCII to actual numbers through the whole execution of the algorithm.
Ah, that's a good point. I think that assuming 8-bit signed numbers only is unrealistic, but I get that that's just a placeholder for now. Aight, I could see the benefit, then, definitely.
Yeah, 8 bit was just so I didn't have to type out a ton of hex, and to simplify the example. It will be much different eventually.
SirCmpwn wrote:
Yeah, 8 bit was just so I didn't have to type out a ton of hex, and to simplify the example. It will be much different eventually.
What about parentheses and order of operations? Have you thought about that yet? Ideally I would think a tree structure would be necessary based on my compiler-writing experience.
The portion of my post in code tags addresses that. Also keep in mind that this algorithm is implemented in tiDE's assembler and works great.
SirCmpwn wrote:
The portion of my post in code tags addresses that. Also keep in mind that this algorithm is implemented in tiDE's assembler and works great.
It seems a tad vague to me:
Code:
Find the next operator to be executed, based on order of operations.
Scanning linearly left to right? right to left? How do you keep track of how deep you are? For "3+(4+(5+6)+7)+8", you need to start from the inside, for instance.
You would search the tokenized expression for each operator in order, and choose them like that. If you find a parentheses, find its pair and send the contents back through the routine recursively.
Here is an example in C# (this is what tiDE uses) that is very similar to this routine, but it skips the tokenizing step.

Code:
/// <summary>
        /// Parses an expression, performing all required math, and
        /// returns the binary equivalent with the specified number of bits.
        /// This handles references and constants as well.  Use this
        /// method to parse code like this:
        /// ld hl, PlotsScreen - (12 * 16h) + 2 + 'd'
        /// This method adds an error and returns an empty value if neccissary
        /// </summary>
        private string ParseExpression(string Expression, int numBits,
            bool offsetFromCurrent = false, int subtractFromValue = 0, bool negateValue = false, bool LittleEndian = true)
        {
            string value = "";
            // How this is done:
            // First, find all constants and replace them with their decimal equivalents
            // Replace all hex/binary/characters with their decimal equivalents
            // Create a stack of operations, push them to the stack according to order of operations
            // Pop each item from the stack and execute it
            // Check that the expression fits the number of available bits
            // If it doesn't, add an error and return the default value
            // Otherwise, return the result

            // TODO: Handle non-literals

            ushort parsedValue = 0;

            if (UseOrderOfOperations)
            {
                parsedValue = ParseSubexpressionOOO(Expression, offsetFromCurrent);
            }
            else
            {
            }

            if (offsetFromCurrent)
                parsedValue = (ushort)(parsedValue - (ushort)OutputBytes.Count);
            parsedValue -= (ushort)subtractFromValue;
            if (negateValue)
                parsedValue = (ushort)-parsedValue;
            value = Convert.ToString(parsedValue, 2);

            if (numBits != -1)
            {
                if (value.Length > numBits)
                    value = value.Substring(numBits);
                while (value.Length < numBits)
                    value = "0" + value;
                while (value.Length > numBits)
                    value = value.Remove(value.Length - 1);
                if (numBits == 16 && LittleEndian)
                    value = value.Substring(8) + value.Remove(8); // Little endian
            }
            return value;
        }

        private int FindClosingParenthesis(string Value, int Opening)
        {
            int numParenthesis = 0, length = 0;
            for (int i = Opening; i < Value.Length; i++)
            {
                if (Value[i] == '(')
                    numParenthesis++;
                if (Value[i] == ')')
                    numParenthesis--;
                if (numParenthesis == 0)
                    return length;
                length++;
            }
            return -1;
        }

        /// <summary>
        /// Parses a subexpression using order of operations
        /// </summary>
        /// <param name="Expression"></param>
        /// <returns></returns>
        private ushort ParseSubexpressionOOO(string Expression, bool offestFromCurrent = false)
        {
            // Recursively parse parenthesis first
            if (Expression.Contains('('))
            {
                int opening = Expression.IndexOf("(");
                ushort value = ParseSubexpressionOOO(Expression.Substring(opening + 1,
                    FindClosingParenthesis(Expression, opening) - 1));
                Expression = Expression.Remove(opening,
                    FindClosingParenthesis(Expression, opening) + 1);
                Expression = Expression.Insert(opening, value.ToString());
            }
            bool negative = false;
            if (Expression.StartsWith("-"))
            {
                negative = true;
                Expression = Expression.Substring(1);
            }
            while (HasMathSymbols(Expression))
            {
                if (Expression.Contains("*"))
                {
                    string value1 = Expression.Substring(0, Expression.IndexOf("*"));
                    string value2 = Expression.Substring(Expression.IndexOf("*") + 1);
                    int startIndex = 0, length = 0;
                    for (int i = value1.Length - 1; i != -1; i--)
                    {
                        if ("*-+/|^&".Contains(value1[i]))
                        {
                            value1 = value1.Substring(i + 1);
                            startIndex = i + 1;
                            break;
                        }
                    }
                    for (int i = 0; i < value2.Length; i++)
                    {
                        if ("*-+/|^&".Contains(value2[i]))
                        {
                            value2 = value2.Remove(i);
                            length = value1.Length + i + 1;
                            break;
                        }
                    }
                    if (length == 0)
                        length = Expression.Length - startIndex;
                    ushort result = (ushort)(ParseValue(value1, offestFromCurrent) * ParseValue(value2, offestFromCurrent));
                    Expression = Expression.Remove(startIndex, length);
                    Expression = Expression.Insert(startIndex, result.ToString());
                    continue;
                }
                if (Expression.Contains("/"))
                {
                    string value1 = Expression.Substring(0, Expression.IndexOf("/"));
                    string value2 = Expression.Substring(Expression.IndexOf("/") + 1);
                    int startIndex = 0, length = 0;
                    for (int i = value1.Length - 1; i != -1; i--)
                    {
                        if ("*-+/|^&".Contains(value1[i]))
                        {
                            value1 = value1.Substring(i + 1);
                            startIndex = i + 1;
                            break;
                        }
                    }
                    for (int i = 0; i < value2.Length; i++)
                    {
                        if ("*-+/|^&".Contains(value2[i]))
                        {
                            value2 = value2.Remove(i);
                            length = value1.Length + i + 1;
                            break;
                        }
                    }
                    if (length == 0)
                        length = Expression.Length;
                    ushort result = (ushort)(ParseValue(value1, offestFromCurrent) / ParseValue(value2, offestFromCurrent));
                    Expression = Expression.Remove(startIndex, length);
                    Expression = Expression.Insert(startIndex, result.ToString());
                    continue;
                }
                if (Expression.Contains("+"))
                {
                    string value1 = Expression.Substring(0, Expression.IndexOf("+"));
                    string value2 = Expression.Substring(Expression.IndexOf("+") + 1);
                    int startIndex = 0, length = 0;
                    for (int i = value1.Length - 1; i != -1; i--)
                    {
                        if ("*-+/|^&".Contains(value1[i]))
                        {
                            value1 = value1.Substring(i + 1);
                            startIndex = i + 1;
                            break;
                        }
                    }
                    for (int i = 0; i < value2.Length; i++)
                    {
                        if ("*-+/|^&".Contains(value2[i]))
                        {
                            value2 = value2.Remove(i);
                            length = value1.Length + i + 1;
                            break;
                        }
                    }
                    if (length == 0)
                        length = Expression.Length;
                    ushort result = (ushort)(ParseValue(value1, offestFromCurrent) + ParseValue(value2, offestFromCurrent));
                    Expression = Expression.Remove(startIndex, length);
                    Expression = Expression.Insert(startIndex, result.ToString());
                    continue;
                }
                if (Expression.Contains("-"))
                {
                    string value1 = Expression.Substring(0, Expression.IndexOf("-"));
                    string value2 = Expression.Substring(Expression.IndexOf("-") + 1);
                    int startIndex = 0, length = 0;
                    for (int i = value1.Length - 1; i != -1; i--)
                    {
                        if ("*-+/|^&".Contains(value1[i]))
                        {
                            value1 = value1.Substring(i + 1);
                            startIndex = i + 1;
                            break;
                        }
                    }
                    for (int i = 0; i < value2.Length; i++)
                    {
                        if ("*-+/|^&".Contains(value2[i]))
                        {
                            value2 = value2.Remove(i);
                            length = value1.Length + i + 1;
                            break;
                        }
                    }
                    if (length == 0)
                        length = Expression.Length;
                    ushort result = (ushort)(ParseValue(value1, offestFromCurrent) - ParseValue(value2, offestFromCurrent));
                    Expression = Expression.Remove(startIndex, length);
                    Expression = Expression.Insert(startIndex, result.ToString());
                    continue;
                }
            }
            if (negative)
                return (ushort)(-(int)ParseValue(Expression, offestFromCurrent));
            return ParseValue(Expression, offestFromCurrent);
        }

        private bool HasMathSymbols(string Expression)
        {
            bool inString = false;
            bool inChar = false;
            foreach (char c in Expression)
            {
                if (c == '"' && !inString)
                    inChar = !inChar;
                if (c == '\'' && !inChar)
                    inString = !inString;
                if ("+-*/|&^".Contains(c) && !inString && !inChar)
                    return true;
            }
            return false;
        }

        private string GetWithinParenthesis(string Value)
        {
            int startIndex = -1, length = 0;
            int numOpen = 0, i = 0;
            foreach (char c in Value)
            {
                if (c == '(')
                {
                    if (numOpen == 0)
                        startIndex = i;
                    numOpen++;
                }
                if (c == ')')
                {
                    numOpen--;
                    if (numOpen == 0)
                        break;
                }
                if (numOpen != 0)
                    length++;
                i++;
            }
            if (startIndex == -1)
                return "";
            return Value.Substring(startIndex, length);
        }
As you said on IRC, that looks overly complex, but I'm not sure how to simplify it. I see a few repeated sections of code that would be very happy to be pulled out into some separate methods, I think...
Looks simpler than gCAS... Razz I was always wondering this, but how does this method handle nested parentheses? (As-in put them in order-of-ops)
AHelper wrote:
Looks simpler than gCAS... Razz I was always wondering this, but how does this method handle nested parentheses? (As-in put them in order-of-ops)

To handle this kind of situation, I would use recursion.
Take this expression:
2*(4+6-(12*3))-2
This is how my algorithm would handle it (this example is a bit complex):


Code:

1) Search for parenthesis - found at index 2
  a) Search for closing parenthesis, and extract 4+6-(12*3)
  b) Recursively run this routine for 4+6-(12*3)
    1) Search for parenthesis - found at index 4
       a) Search for closing parenthesis and extract 12*3
       b) Recursively run this routine for 12*3
          1) Search for parenthesis - none found
          2) Search for operators - found 12*3 according to Order of Operations
            a) Perform 12*3, replace with 36 in the expression
            b) Loop back to 1
          1) Search for parenthesis, operators, none found - done.  Return.
        c) Take the result, substitute for parenthesis within expression: 4+6-36.  Loop back to 1
      1) Search for parenthesis - none found
      2) Search for operators - found 4+6
          a) Perform 4+6, replace with 10 in expression
          b) Loop back to 1
     1) Search for parenthesis - none found
     2) Search for operators - found 10-36
        a) Perform 10-36, replace with -26 in expression
         b) Loop back to 1
     1) Search for parenthesis, operators, none found - done.  Return
  c) Take the result, subsitute for parenthesis within expression: 2*-26-2
  d) Loop back to 1
1) Look for operators - found 2*-26
   a) Perform 2*-26, replace with -52 in expression
   b) Loop back to 1
1) Look for operators, found -52-2
   a) Perform -52-2, replace with -54 in expression
   b) Loop back to 1
1) Look for operators, none found - done.  Return


-54 is returned.
So your parsing will calculate the answer and replace it in a string, or a stack? Or will it be storing the data differently? (gCAS's main memory issue)
It doesn't use much more memory than is required to store the expression string in the first place. As the expression progresses, it just simplifies the tokenized expression string.
Here's a more simple example, showing the evolution of the expression string as the algorithm progresses:
1) 6+4*7-2
2) [Evaluate 4*7] 6+28-2
3) [Evaluate 6+28] 34-2
4) [Evaluate 34-2] 32
5) Done: Expression is simplified to "32"
I see. That's the main difference between gCAS and this parser, it doesn't handle unknown variables, correct?

Typing in (6*x*12*x)/(2*x) in gCAS will give 36*x, etc.
AHelper wrote:
Typing in (6*x*12*x)/(2*x) in gCAS will give 36*x, etc.

That is true, although with some changes, I think you could make it work. Perhaps make [operator][variable] it's own operator?
Also, this will work if the variable is defined, it's just keeping an expression the same in terms of a variable that would cause difficulty.
:-S It may be a long process to get it to check everything, like doing 100!/99! on a ti84 will give an overflow, and most likely for yours, too, but gCAS would be able to simplify it to 100 without doing any math. It may be a good idea to add in this to your function just to avoid possible overflows
AHelper wrote:
:-S It may be a long process to get it to check everything, like doing 100!/99! on a ti84 will give an overflow, and most likely for yours, too, but gCAS would be able to simplify it to 100 without doing any math. It may be a good idea to add in this to your function just to avoid possible overflows

How would you suggest doing so?
(For me, making a tree will help) For yours, you should treat numbers as variables and try to factor things out. 2/4 isn't what I mean, but typing in (2*2)/(8*2) is more in the scope ->2/8.
AHelper wrote:
(For me, making a tree will help) For yours, you should treat numbers as variables and try to factor things out. 2/4 isn't what I mean, but typing in (2*2)/(8*2) is more in the scope ->2/8.

I'm not entirely sure I follow what you are saying here.
  
Register to Join the Conversation
Have your own thoughts to add to this or any other topic? Want to ask a question, offer a suggestion, share your own programs and projects, upload a file to the file archives, get help with calculator and computer programming, or simply chat with like-minded coders and tech and calculator enthusiasts via the site-wide AJAX SAX widget? Registration for a free Cemetech account only takes a minute.

» Go to Registration page
Page 1 of 2
» All times are UTC - 5 Hours
 
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum

 

Advertisement