Learn a little jq, awk and sed

Dear new developer,

You are probably going to be dealing with text files at some point in your development career. These could be plain text, CSV, or JSON. They may hold data you want to extract, or they may be log files you want to examine. Or you may need to transform them from one format to another.

Now, if this is a regular occurrence, you may want to build a script or a program around the problem (or use a third-party service that aggregates everything together). But sometimes these files are one-offs, or you only see them once in a blue moon. And it can take a while to write a script, look at the libraries, and put it all together.

Another alternative is to learn some of the Unix tools available on the command line. Here are three that I consider “table stakes”.

awk

This is a multipurpose line-processing utility. I often want to grab lines of a log file and figure out what is going on. Here are a few lines of a log file:

54.147.20.92 - - [26/Jul/2019:20:21:04 -0600] "GET /wordpress HTTP/1.1" 301 241 "-" "Slackbot 1.0 (+https://api.slack.com/robots)"
185.24.234.106 - - [26/Jul/2019:20:20:50 -0600] "GET /wordpress/archives/date/2004/02 HTTP/1.1" 200 87872 "http://www.mooreds.com" "DuckDuckBot/1.0; (+http://duckduckgo.com/duckduckbot.html)"
185.24.234.106 - - [26/Jul/2019:20:20:50 -0600] "GET /wordpress/archives/date/2004/08 HTTP/1.1" 200 81183 "http://www.mooreds.com" "DuckDuckBot/1.0; (+http://duckduckgo.com/duckduckbot.html)"

If I want to see only the IP addresses (assuming these lines are in a file called logs.txt), I’d run something like:

$ awk '{print $1}' logs.txt
54.147.20.92
185.24.234.106
185.24.234.106

There’s a lot more to awk, but you can see that it lets you slice and dice delimited data pretty easily. Here’s a great article which dives in further.
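To give a taste of that slicing and dicing, here’s a small sketch, assuming the log lines look like the ones above (the Apache/Nginx “combined” format). awk splits each line on whitespace, so in these lines `$9` is the HTTP status code and `$10` is the response size in bytes. The function names are made up for illustration; each one reads log lines on standard input.

```shell
#!/bin/sh
# Hypothetical helpers; each reads combined-format log lines on stdin.

# Print the IP address ($1) and status code ($9), only for 200 responses.
ok_requests() { awk '$9 == 200 {print $1, $9}'; }

# Count requests per IP with an awk associative array, then sort by IP.
requests_per_ip() { awk '{count[$1]++} END {for (ip in count) print ip, count[ip]}' | sort; }

# Sum the response size column ($10) across all lines.
total_bytes() { awk '{bytes += $10} END {print bytes}'; }
```

On the three sample lines, `requests_per_ip < logs.txt` prints `185.24.234.106 2` followed by `54.147.20.92 1`, and `total_bytes < logs.txt` prints 169296.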

sed

This is another line-oriented utility. You can use it for all kinds of things, but I primarily use it for search and replace. Suppose you had the same log file, but you wanted to anonymize the IP address and the user agent. Perhaps you’re going to ship the logs off for long-term storage or something. You can easily remove both with a couple of sed commands.

$ sed 's/^[^ ]* //' logs.txt | sed 's/ "[^"]*"$//'
- - [26/Jul/2019:20:21:04 -0600] "GET /wordpress HTTP/1.1" 301 241 "-"
- - [26/Jul/2019:20:20:50 -0600] "GET /wordpress/archives/date/2004/02 HTTP/1.1" 200 87872 "http://www.mooreds.com"
- - [26/Jul/2019:20:20:50 -0600] "GET /wordpress/archives/date/2004/08 HTTP/1.1" 200 81183 "http://www.mooreds.com"

Yes, it looks like line noise, but this is the power of regular expressions. They’re available in nearly every language (with slight variations in syntax) and are worth learning. sed gives you regular expressions at the command line for processing files. I haven’t found a great sed tutorial, but a web search turns up plenty.
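A variation I find handy, sketched under a couple of assumptions: instead of deleting the fields, replace the IP with a placeholder (0.0.0.0) and the user agent with "-", so each line keeps its shape for later parsing. `-E` turns on extended regular expressions (supported by both GNU and BSD sed), and each `-e` adds another substitution to the same sed process. The `anonymize` name is hypothetical, and the IP pattern only matches IPv4 addresses.

```shell
#!/bin/sh
# Hypothetical helper: anonymize combined-format log lines read on stdin.
anonymize() {
  sed -E \
    -e 's/^[0-9]{1,3}(\.[0-9]{1,3}){3} /0.0.0.0 /' \
    -e 's/ "[^"]*"$/ "-"/'
  # 1st expression: replace a leading IPv4 address with 0.0.0.0
  # 2nd expression: replace the final quoted field (the user agent) with "-"
}
```

Run as `anonymize < logs.txt`, the first sample line becomes `0.0.0.0 - - [26/Jul/2019:20:21:04 -0600] "GET /wordpress HTTP/1.1" 301 241 "-" "-"`.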

jq

If you work on the command line with modern software at all, you have encountered JSON. It’s used for configuration files and data transmission. Sometimes you get an array of JSON objects and you just want to pick out certain attributes. Tools like sed and awk struggle here, because they expect newlines to separate records, not curly braces and commas. Sure, you could use regular expressions to parse simple JSON, and there are times when I’ve done this. But a far better tool is jq. I’m not as savvy with it as with the others, but I have used it whenever I’m dealing with an API that returns JSON (which is most modern ones). I can pull data from the API with curl (another great tool) and parse it with jq. I can put these commands in a script and make the exploration repeatable.
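Here’s roughly what picking attributes out of an array looks like. The `users` JSON is a made-up stand-in for an API response. `.[]` iterates over the array, `{id, name}` is jq’s shorthand for building an object with just those keys, and `select` keeps only the elements that match. `-r` prints raw strings instead of JSON-quoted ones, and `-c` prints each object on a single line.

```shell
#!/bin/sh
# Made-up sample data standing in for an API response: a JSON array of users.
users='[
  {"id": 1, "name": "alice", "active": true,  "email": "alice@example.com"},
  {"id": 2, "name": "bob",   "active": false, "email": "bob@example.com"}
]'

# One attribute per element, as raw text: alice, then bob
printf '%s\n' "$users" | jq -r '.[].name'

# Keep only some attributes of each element, one compact object per line
printf '%s\n' "$users" | jq -c '.[] | {id, name}'

# Filter elements first, then pick an attribute: alice@example.com
printf '%s\n' "$users" | jq -r '.[] | select(.active) | .email'
```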

I used curl and jq this way a few months ago, while exploring an Elasticsearch system. I crafted the queries with curl and then used jq to parse the results so that I could make sense of them. Yes, I could have done this with a real programming language, but it would have taken longer. I could also have used a GUI tool like Postman, but then the exploration would not have been replicable.
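A script for that kind of exploration might look roughly like the sketch below. Elasticsearch’s `_search` endpoint returns matching documents under `.hits.hits`, each with an `_id` and the original document in `_source`. The host, the index name (`my-index`), the query, the `title` field, and the `summarize_hits` helper name are all assumptions for illustration. The offline sample response lets the jq part run without a cluster, and `@tsv` turns each array into one tab-separated line, which plays nicely with awk and sort afterwards.

```shell
#!/bin/sh
# Hypothetical helper: read an Elasticsearch _search response on stdin and
# print one tab-separated line per hit: the document id, then its "title".
# "title" is an assumed field name; change it to match your index.
summarize_hits() {
  jq -r '.hits.hits[] | [._id, ._source.title] | @tsv'
}

# Against a live cluster this might look like (host and index are assumptions):
#   curl -s -H 'Content-Type: application/json' \
#     'http://localhost:9200/my-index/_search' \
#     -d '{"query": {"match": {"title": "sed"}}}' | summarize_hits

# Offline sample shaped like a search response, so the sketch runs anywhere.
sample='{"hits": {"hits": [
  {"_id": "a1", "_source": {"title": "Learn a little sed"}},
  {"_id": "b2", "_source": {"title": "Learn a little jq"}}
]}}'
printf '%s\n' "$sample" | summarize_hits
```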

sed and awk should be on every Unix-like system you run across; jq isn’t installed by default, but it is easy to install. It’s worth spending some time getting to know these tools. So next time you are processing a text file and need to extract just a bit of it, reach for sed and awk. Next time you get a hairy JSON file and you are peering at it, look at jq. I think you’ll be happy with the result.

Sincerely,

Dan

9 thoughts on “Learn a little jq, awk and sed”

  1. In a galaxy far far away….

    Larry Wall wondered why he needed to learn 3 pretty bad languages, sh, awk, sed…. and devised perl as the Grand Unifying Language.

    Perl sadly borrowed too much from its inspirations, and wasn’t much more readable.

    Then Matz came along and resolved to borrow the best from perl and scheme and ….. and make something more powerful than them all, yet more readable.

    It’s called Ruby.

    And yes, you can do everything in Ruby, in one line if you must, that you can do in bash, awk, sed, jq, perl…. in a more powerful and maintainable form.

    All this has been available for decades, why are we (still) bashing (pun intended) our heads against the Lowest Common Denominator?

    1. Saw this on lobste.rs as well https://lobste.rs/s/obfzfg/learn_little_jq_awk_sed#c_bk0fgs and there’s some good discussion there. I know ruby and perl and think they’re both great languages. Sometimes when I’m just doing some quick logfile or data file investigation or transmogrification, it feels easier to use awk/sed/jq rather than reach for a full featured language. I haven’t tested it, but it feels more efficient to string together pieces of a pipeline than to build a full program. YMMV, of course.

    1. Love the suggestion! There are definitely times when it makes sense to reach for a more full featured language, and perl is always there. It definitely is a superset of awk and sed out of the box. Haven’t touched perl in a few years, but it looks like json support isn’t available unless you install a module, which is a bit more effort than downloading jq. But perl is definitely an option worth exploring.
