This is just a little script I wrote to renumber Khavi's library, because the labels change almost every time I make a major change to it. Basically, you use the standard Awk/gawk invocation to run it (awk -f awk_script input_file > output_file).

However, it doesn't renumber all of the labels in a program. Everything to be renumbered must be after a line containing "StartAwk" and before a line containing "EndAwk". The tags can be used multiple times in a file without problems.

All lines to be renumbered must be matched by the regex "[0-9]+:". Here's an example file:




Code:
1+1
2:
StartAwk
libcall_1:
3:

EndAwk
libcall_3:


The output file will look like this:


Code:
1+1
2:
StartAwk
libcall_2:
4:

EndAwk
libcall_3:
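
For anyone who wants to see the StartAwk/EndAwk idea without opening the attachment, here's a rough Python sketch of the region handling. The post doesn't spell out the actual renumbering rule, so `bump` (add one to each number) is purely an assumption that happens to reproduce the example above. `renumber` and `bump` are made-up names, and the real awk_script may well work differently.

```python
import re

LABEL = re.compile(r"[0-9]+:")  # the regex quoted in the post

def renumber(lines, bump=lambda n: n + 1):
    """Rewrite label numbers only between StartAwk and EndAwk marker lines.

    `bump` is a hypothetical rule (old number -> new number); swap in
    whatever numbering scheme is actually wanted.
    """
    out = []
    active = False
    for line in lines:
        if "StartAwk" in line:
            active = True            # marker line itself is left alone
        elif "EndAwk" in line:
            active = False
        elif active:
            # strip the trailing ':' from the match, renumber, put it back
            line = LABEL.sub(lambda m: f"{bump(int(m.group()[:-1]))}:", line)
        out.append(line)
    return out

sample = ["1+1", "2:", "StartAwk", "libcall_1:", "3:", "", "EndAwk", "libcall_3:"]
print("\n".join(renumber(sample)))
```

Because the markers just flip a flag, they can appear any number of times in one file, which matches the behavior described above.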




Hope it's useful to someone.

awk_script
Very nice, thanks for sharing! I didn't know that anyone actually wrote Awk scripts anymore other than short snippets to split lines. :)
Awk is awesome. I use it in the Punix build system to generate source code from a master source file. I guess you could call it a short snippet at only 80 lines long, for large values of "short".

Heck, a couple years ago I wrote a small awk script that was also a valid C program (which also produced the same output when run as either language). It's nice that C and awk have nearly-identical printf() functions. :)
Very impressive; I didn't realize they were quite that close. Sounds like I need to look more closely at some of the things that awk can do besides splitting things one of these days, then.
christop wrote:
Heck, a couple years ago I wrote a small awk script that was also a valid C program (which also produced the same output when run as either language). It's nice that C and awk have nearly-identical printf() functions. :)


Only if you really exploited that # is a comment in AWK to wield some crazy macro magic in C.

KermMartian wrote:
Very impressive; I didn't realize they were quite that close. Sounds like I need to look more closely at some of the things that awk can do besides splitting things one of these days, then.


Not really. If you find yourself about to actually write an AWK script, just switch to Python at that point. AWK is really just good for the splitting stuff that everyone uses it for.
Last time I did something with awk I later realized I could have done it with a few "cut -d # -f #" but that could just be that I failed at awk.
Yes, I did use # as a comment in awk and a preprocessor line in C. I only had to use a few simple #defines to make my polyglot program work. Many other things have the same syntax in both languages, such as the control structures and printf(). Here's a basic C/awk program that shows how easily it can be done:
Code:
#include <stdio.h>
#include <stdlib.h>
#define BEGIN int main()

BEGIN {
        printf("Hello world!\n");
        exit(0);
}


For practical use, awk is good for more than just splitting lines. I generally use sed instead for splitting lines. Here's the source code generator script that I mentioned:

Code:
#! /usr/bin/awk -f
# num  name  words  flags
# Example:
# 3    read  5      0

BEGIN {
        maxnum = 0
}

/^[^#]/ && ! /^$/ {
        num=$1;
        name=$2;
        words=$3;
        flags=$4;
        if (NF < 4) {
                printf("Warning: line %d has too few fields; skipping.\n", NR) >"/dev/stderr";
                next
        } else if (NF > 4) {
                printf("Warning: line %d has too many fields; skipping.\n", NR) >"/dev/stderr";
                next
        }
        if (calls[num, "name"] != "") {
                printf("syscall %d (%s) is redefined! (Redefinition at line %d)\n", num, name, NR) > "/dev/stderr";
                err = 1
                exit 1
        }
        calls[num, "name"] = name;
        calls[num, "words"] = words;
        calls[num, "flags"] = flags;
        if (num > maxnum) { maxnum = num; }
}

END {
        if (err) exit;

        # preamble
        print("/*\n * NOTICE: This file is auto-generated.\n * DO NOT MODIFY THIS FILE!\n */\n");
        print("#include \"sysent.h\"");
        print("#include \"punix.h\"");
        print("");

        # prototypes for all syscalls
        printf("void sys_NONE();\n");
        for (i = 0; i <= maxnum; ++i) {
                if (calls[i, "name"] == "") {
                        calls[i, "name"] = "NONE";
                        calls[i, "words"] = "0";
                        calls[i, "flags"] = "0";
                } else {
                        printf("void sys_%s();\n", calls[i, "name"]);
                }
        }
        print("");

        # sysent[] array
        print("STARTUP(const struct sysent sysent[]) = {");
        for (i = 0; i <= maxnum; ++i) {
                printf("\t{ %d, sys_%s, %s },", calls[i, "words"], calls[i, "name"], calls[i, "flags"]);
                if ((i % 5) == 0) {
                        printf("\t/* %d */", i);
                }
                print "";
        }
        print("};");
        print("");
        print("const int nsysent = sizeof(sysent) / sizeof(struct sysent);");
        print("");

        # sysname[] array
        print("STARTUP(const char *const sysname[]) = {");
        for (i = 0; i <= maxnum; ++i) {
                printf("\t\"%s\",", calls[i, "name"]);
                if ((i % 5) == 0) {
                        printf("\t/* %d */", i);
                }
                print "";
        }
        print("};");
}

It takes lines containing the system call number, name, number of argument words, and additional flags, and it outputs an expanded (non-sparse) C array containing the same information. For example, if you give it this input:

Code:
1 exit 1 0
2 fork 0 0
3 read 5 0
4 write 5 0
5 open 4 0
6 close 1 0
#9 link 4 0
#10 unlink 2 0
12 chdir 2 0

you'll get this output:

Code:
/*
 * NOTICE: This file is auto-generated.
 * DO NOT MODIFY THIS FILE!
 */

#include "sysent.h"
#include "punix.h"

void sys_NONE();
void sys_exit();
void sys_fork();
void sys_read();
void sys_write();
void sys_open();
void sys_close();
void sys_chdir();

STARTUP(const struct sysent sysent[]) = {
        { 0, sys_NONE, 0 },     /* 0 */
        { 1, sys_exit, 0 },
        { 0, sys_fork, 0 },
        { 5, sys_read, 0 },
        { 5, sys_write, 0 },
        { 4, sys_open, 0 },     /* 5 */
        { 1, sys_close, 0 },
        { 0, sys_NONE, 0 },
        { 0, sys_NONE, 0 },
        { 0, sys_NONE, 0 },
        { 0, sys_NONE, 0 },     /* 10 */
        { 0, sys_NONE, 0 },
        { 2, sys_chdir, 0 },
};

const int nsysent = sizeof(sysent) / sizeof(struct sysent);

STARTUP(const char *const sysname[]) = {
        "NONE", /* 0 */
        "exit",
        "fork",
        "read",
        "write",
        "open", /* 5 */
        "close",
        "NONE",
        "NONE",
        "NONE",
        "NONE", /* 10 */
        "NONE",
        "chdir",
};

I used awk's associative arrays to do this. Its automatic line-reading loop, pattern matching, and word splitting help too. About one third of the script's lines are for printing source code or errors/warnings, so I'd be surprised if it could be made much shorter in another language like Python.
Python:


Code:
import os,sys
from functools import *

def main():
    inf = sys.stdin
    if len(sys.argv) == 2:
        inf = open(sys.argv[1])
    lines = inf.read().splitlines()
    d = {int(l.split()[0]): mval(l) for l in lines if not l.startswith('#')}
    r = range(0, reduce(lambda x,y: max(x,y), d.keys()) + 1)
    arr = [d.get(i, DEF_ENTRY) for i in r]
    print('''
/*
 * NOTICE: This file is auto-generated.
 * DO NOT MODIFY THIS FILE!
 */

#include "sysent.h"
#include "punix.h"
''')
    floop(F_DECL, set(arr))
    print('\nSTARTUP(const struct sysent sysent[]) = {')
    floop(F_STRUCT, arr)
    print('''};

const int nsysent = sizeof(sysent) / sizeof(struct sysent);

STARTUP(const char *const sysname[]) = {''')
    floop(F_NAME, arr)
    print('};')

def mval(l):
    s = l.split()
    return s[1], int(s[2]), int(s[3])
   
def floop(sf, lines):
    for l in lines:
        print(sf.format(*l))

DEF_ENTRY = ('NONE', 0, 0)

F_STRUCT = '\t{{ {1}, sys_{0}, {2} }},'
F_DECL = "void sys_{0}();"
F_NAME = '\t"{0}",'

if __name__ == '__main__':
    main()


Shorter, sexier, cleaner, and just all around better.

More importantly, though, doing it in Python means you don't need to know awk.
Most importantly, my awk script works, and your Python script doesn't:
Code:
Traceback (most recent call last):
  File "mksysent.py", line 47, in <module>
    main()
  File "mksysent.py", line 9, in main
    d = {int(l.split()[0]): mval(l) for l in lines if not l.startswith('#')}
  File "mksysent.py", line 9, in <dictcomp>
    d = {int(l.split()[0]): mval(l) for l in lines if not l.startswith('#')}
  File "mksysent.py", line 34, in mval
    return s[1], int(s[2]), int(s[3])
IndexError: list index out of range

Yours doesn't check for and report lines with too few or too many fields, or error out if any index is duplicated, which mine does. I suspect that's why your script bombs on my real input file. It also doesn't add the index comments every 5 lines in the arrays, which admittedly is not really needed in generated output anyway. Both of these contribute to the line count of my script and would increase your script's size if you also included them. I also could've reduced the line count in my script by defining functions for repeated tasks like you did in yours, but it wasn't a high priority for me to reduce or minimize the line count in a script that's already pretty simple and short as it is.

Awk is old and showing its age somewhat, but it's still suited for tasks like this.
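
For comparison, the field-count and duplicate checks described above might look roughly like this on the Python side. This is only a sketch: `parse_calls` is a made-up name, the messages simply mirror the awk script's wording, and it hasn't been run against the real Punix input file.

```python
import sys

def parse_calls(lines):
    """Return {num: (name, words, flags)}.

    Warns on stderr and skips lines without exactly 4 fields;
    exits with an error if a syscall number is defined twice.
    """
    calls = {}
    for lineno, line in enumerate(lines, start=1):
        if not line or line.startswith("#"):
            continue  # empty line or comment
        fields = line.split()
        if len(fields) != 4:
            kind = "few" if len(fields) < 4 else "many"
            print(f"Warning: line {lineno} has too {kind} fields; skipping.",
                  file=sys.stderr)
            continue
        num, name, words, flags = int(fields[0]), fields[1], fields[2], fields[3]
        if num in calls:
            # SystemExit with a string prints it to stderr and exits with status 1
            raise SystemExit(
                f"syscall {num} ({name}) is redefined! (Redefinition at line {lineno})")
        calls[num] = (name, words, flags)
    return calls

print(parse_calls(["1 exit 1 0", "#9 link 4 0", "", "3 read 5"]))
```

A non-zero exit status is what lets 'make' stop on a duplicate number, which is the behavior the awk version gets from `exit 1`.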
Kllrnohj wrote:


Not really. If you find yourself about to actually write an AWK script, just switch to Python at that point. AWK is really just good for the splitting stuff that everyone uses it for.


Eh, not a big fan of python simply because it's rather obsessive compulsive about formatting and "pythonic" code. It's a great language to read though :P
christop wrote:
Yours doesn't check for and report lines with too few or too many fields


Fine, here (note, same line count :p)


Code:
import os,sys
from functools import *

def main():
    inf = sys.stdin
    if len(sys.argv) == 2:
        inf = open(sys.argv[1])
    lines = inf.read().splitlines()
    d = {int(l.split()[0]): mval(l) for l in lines if not l.startswith('#') and len(l.split()) == 4}
    r = range(0, reduce(lambda x,y: max(x,y), d.keys()) + 1)
    arr = [d.get(i, DEF_ENTRY) for i in r]
    print('''
/*
 * NOTICE: This file is auto-generated.
 * DO NOT MODIFY THIS FILE!
 */

#include "sysent.h"
#include "punix.h"
''')
    floop(F_DECL, set(arr))
    print('\nSTARTUP(const struct sysent sysent[]) = {')
    floop(F_STRUCT, arr)
    print('''};

const int nsysent = sizeof(sysent) / sizeof(struct sysent);

STARTUP(const char *const sysname[]) = {''')
    floop(F_NAME, arr)
    print('};')

def mval(l):
    s = l.split()
    return s[1], int(s[2]), int(s[3])
   
def floop(sf, lines):
    for l in lines:
        print(sf.format(*l))

DEF_ENTRY = ('NONE', 0, 0)

F_STRUCT = '\t{{ {1}, sys_{0}, {2} }},'
F_DECL = "void sys_{0}();"
F_NAME = '\t"{0}",'

if __name__ == '__main__':
    main()


Quote:
or error out if any index is duplicated, which mine does.


Wasn't in the requirements :P (sounds like mine has an extra feature ;) )

What actually happens with my program is last index wins, eg:


Code:
1 foo 0 0
1 bar 0 0


results in:


Code:
/*
 * NOTICE: This file is auto-generated.
 * DO NOT MODIFY THIS FILE!
 */

#include "sysent.h"
#include "punix.h"

void sys_NONE();
void sys_bar();

STARTUP(const struct sysent sysent[]) = {
        { 0, sys_NONE, 0 },
        { 0, sys_bar, 0 },
};

const int nsysent = sizeof(sysent) / sizeof(struct sysent);

STARTUP(const char *const sysname[]) = {
        "NONE",
        "bar",
};


Quote:
I suspect that's why your script bombs on my real input file.


I only tested on the snippet you posted, and didn't spend all that much time on it - which is more likely why it bombs on your "real input"

Quote:
It also doesn't add the index comments every 5 lines in the arrays, which admittedly is not really needed in generated output anyway.


I know, I thought it was stupid so I didn't do it. That's trivially added tbh.
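
For what it's worth, the every-fifth-entry comment could be bolted on with something like the following. A sketch only: `with_index_comments` is a hypothetical helper, not part of the script above, and it assumes the array lines have already been formatted as strings.

```python
def with_index_comments(entries, every=5):
    """Append a C comment holding the index to every `every`-th line (0, 5, 10, ...)."""
    out = []
    for i, entry in enumerate(entries):
        out.append(f"{entry}\t/* {i} */" if i % every == 0 else entry)
    return out

# Made-up sample data in the same shape as the sysname[] lines
names = ["NONE", "exit", "fork", "read", "write", "open", "close"]
rows = ['\t"{0}",'.format(n) for n in names]
print("\n".join(with_index_comments(rows)))
```

Since it only post-processes a list of strings, the same helper would cover both the sysent[] and sysname[] arrays.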

Quote:
Both of these contribute to the line count of my script and would increase your script's size if you also included them. I also could've reduced the line count in my script by defining functions for repeated tasks like you did in yours, but it wasn't a high priority for me to reduce or minimize the line count in a script that's already pretty simple and short as it is.

Awk is old and showing its age somewhat, but it's still suited for tasks like this.


Sure, but you're kind of missing the point. Once you actually have an awk script and not just something in line, you should switch to a better scripting language. Then you don't need to bother remembering all the magic awk variables and crap, and more people will be able to understand it and help. Moreover, the ability to add features increases greatly as you actually have a solid base to work from.

Qwerty.55 wrote:
Eh, not a big fan of python simply because it's rather obsessive compulsive about formatting and "pythonic" code. It's a great language to read though :P


If you aren't already doing what python forces you to do, you are a terrible coder. No excuse for not indenting your code.
Kllrnohj wrote:
Quote:
or error out if any index is duplicated, which mine does.


Wasn't in the requirements :P (sounds like mine has an extra feature ;) )

Nope, it's not an extra feature. I put it in my script since duplicate system call numbers are an error and should stop the 'make' command. I didn't list the full requirements (though I did list the source) because I wasn't expecting such fierce one-upmanship in this thread.

Quote:
Quote:
Both of these contribute to the line count of my script and would increase your script's size if you also included them. I also could've reduced the line count in my script by defining functions for repeated tasks like you did in yours, but it wasn't a high priority for me to reduce or minimize the line count in a script that's already pretty simple and short as it is.

Awk is old and showing its age somewhat, but it's still suited for tasks like this.


Sure, but you're kind of missing the point. Once you actually have an awk script and not just something in line, you should switch to a better scripting language. Then you don't need to bother remembering all the magic awk variables and crap, and more people will be able to understand it and help.


The only "magic" awk variables that I had to use are NR (number of records) and NF (number of fields), and those are used only for reporting error conditions (which yours doesn't even do). The BEGIN and END are analogous to main in C, being the entry points of the program. The variables $1, $2, etc, are like positional parameters in the shell. Nothing in the script is radically different from many other scripting languages, and it's quite easily understandable.

On the other hand, your script uses the "magic" Python variable "__name__", and it also uses Perl-esque "compound" statements (for lack of a better name) which reduces both line count and readability, maintainability, and all that good stuff.

Quote:
Moreover, the ability to add features increases greatly as you actually have a solid base to work from.

More features? I don't need more features for this script. It does its job and doesn't need to do anything else (besides possibly adding/removing fields as the system evolves). Do one thing and do it well, you know?

For some other tasks I might use Python or Bash or Perl or C, but like I said, this type of task is suitable for awk.

By the way, if I strip out comments and extra lines to match your script's style (but leaving in the warning/error messages, which I think is a good thing), mine comes out to 51 lines, only 4 lines longer than yours.

This is all becoming a pissing match, really.
christop wrote:
The only "magic" awk variables that I had to use are NR (number of records) and NF (number of fields), and those are used only for reporting error conditions (which yours doesn't even do). The BEGIN and END are analogous to main in C, being the entry points of the program. The variables $1, $2, etc, are like positional parameters in the shell. Nothing in the script is radically different from many other scripting languages, and it's quite easily understandable.


Code:
/^[^#]/ && ! /^$/ {


That snippet right there proves you wrong. That's perl level ugly.

Quote:
On the other hand, your script uses the "magic" Python variable "__name__",


Purely for convention, it's entirely unnecessary.

Quote:
and it also uses Perl-esque "compound" statements (for lack of a better name) which reduces both line count and readability, maintainability, and all that good stuff.


I disagree. It's quite readable, very maintainable, and downright sexy. It does reduce line count.

Frankly I was actually having too much fun with those, which is why I kept using them.

Quote:
By the way, if I strip out comments and extra lines to match your script's style (but leaving in the warning/error messages, which I think is a good thing), mine comes out to 51 lines, only 4 lines longer than yours.

This is all becoming a pissing match, really.


A pissing match I'm winning, I'll point out. I didn't exactly strip mine down, either. I have unnecessary line breaks, the unnecessary __name__ == '__main__' convention, etc... Pretty sure I could make it smaller if you *really* want.

It's also off point. All you've proven is that someone actually has written an awk script, not that there is any REASON to write an awk script. Clearly it's no shorter, simpler, or more readable.
Kllrnohj wrote:

Code:
/^[^#]/ && ! /^$/ {


That snippet right there proves you wrong. That's perl level ugly.


Those are simple and succinct regular expressions, which predate Perl by many years. Any self-respecting programmer should be able to recognize and decode basic regular expressions like this one.

I actually could have left off the second regular expression because it never matches when the first one does, so it could've been shorter like this:

Code:
/^[^#]/ {

Easy!
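
The claim that the second pattern is redundant is easy to check. Here's a quick sketch using Python's re module as a stand-in for awk's regex matching. That's an assumption: the two engines should agree on patterns this simple, but this isn't awk itself, and the sample lines are made up.

```python
import re

FIRST = re.compile(r"^[^#]")  # first character exists and is not '#'
EMPTY = re.compile(r"^$")     # line is empty

def long_form(line):
    """Mirrors the original pattern: /^[^#]/ && ! /^$/"""
    return bool(FIRST.search(line)) and not EMPTY.search(line)

def short_form(line):
    """Mirrors the shortened pattern: /^[^#]/"""
    return bool(FIRST.search(line))

samples = ["3 read 5 0", "#9 link 4 0", "", "   ", "x"]
for s in samples:
    # The two columns should be identical on every line
    print(repr(s), long_form(s), short_form(s))
```

An empty line has no first character at all, so `^[^#]` can't match it, and the extra emptiness test never changes the result.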

Quote:
A pissing match I'm winning, I'll point out.

Hahaha, ok. So you're winning a pissing match. What will be your next empty victory? Spitting the farthest?

You're only "winning" on the line count front. Readability and sexiness are more subjective. If I were trying to compete on line count alone, while not sacrificing readability, I'd also remove the warning/error messages in mine. That would bring it down to 38 lines, or 9 lines shorter than yours. If I also removed the hash-bang line (which isn't strictly necessary, but I would have to invoke the script with the awk command) and blank line below it, it would be 11 lines shorter than yours. But warning and error messages are a Good Thing™, and so is the hash-bang line, so I'm going to leave them in there.
Kllrnohj wrote:

Qwerty.55 wrote:
Eh, not a big fan of python simply because it's rather obsessive compulsive about formatting and "pythonic" code. It's a great language to read though :P


If you aren't already doing what python forces you to do, you are a terrible coder. No excuse for not indenting your code.


Who says I don't? That doesn't mean I like being forced to do it.
STFU.
  
 