Learn a little jq, awk and sed

Dear new developer,

You are probably going to be dealing with text files at some point during your development career. These could be plain text, CSV, or JSON. They may hold data you want to extract, or log files you want to examine. You may be transforming them from one format to another.

Now, if this is a regular occurrence, you may want to build a script or a program around the problem (or use a third-party service which aggregates everything together). But sometimes these files are one-offs. Or you use them once in a blue moon. And it can take a little while to write a script, look at the libraries, and put it all together.

Another alternative is to learn some of the Unix tools available on the command line. Here are three that I consider “table stakes”.

awk

This is a multi-purpose line processing utility. I often want to grab lines of a log file and figure out what is going on. Here are a few lines of a log file:

54.147.20.92 - - [26/Jul/2019:20:21:04 -0600] "GET /wordpress HTTP/1.1" 301 241 "-" "Slackbot 1.0 (+https://api.slack.com/robots)"
185.24.234.106 - - [26/Jul/2019:20:20:50 -0600] "GET /wordpress/archives/date/2004/02 HTTP/1.1" 200 87872 "http://www.mooreds.com" "DuckDuckBot/1.0; (+http://duckduckgo.com/duckduckbot.html)"
185.24.234.106 - - [26/Jul/2019:20:20:50 -0600] "GET /wordpress/archives/date/2004/08 HTTP/1.1" 200 81183 "http://www.mooreds.com" "DuckDuckBot/1.0; (+http://duckduckgo.com/duckduckbot.html)"

If I want to see only the IP addresses (assuming these lines are all in a file called logs.txt), I’d run something like:

$ awk '{print $1}' logs.txt
54.147.20.92
185.24.234.106
185.24.234.106

There’s lots more, but you can see that you’d be able to slice and dice delimited data pretty easily. Here’s a great article which dives in further.
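To give one more taste of that slicing and dicing: awk’s associative arrays make quick aggregations easy. Here’s a sketch that counts requests per IP address, assuming the same logs.txt file as above:

```shell
# Tally how many requests each IP address (field 1) made.
awk '{count[$1]++} END {for (ip in count) print ip, count[ip]}' logs.txt
```

For the three lines above, this prints each IP address once, along with its request count.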

sed

This is another line-oriented utility. You can use it for all kinds of things, but I primarily use it to do search and replace on a file. Suppose you had the same log file, but you wanted to anonymize the IP address and the user agent. Perhaps you’re going to ship the logs off for long-term storage or something. You can easily remove these fields with a couple of sed commands.

$ sed 's/^[^ ]*//' logs.txt | sed 's/"[^"]*"$//'
- - [26/Jul/2019:20:21:04 -0600] "GET /wordpress HTTP/1.1" 301 241 "-"
- - [26/Jul/2019:20:20:50 -0600] "GET /wordpress/archives/date/2004/02 HTTP/1.1" 200 87872 "http://www.mooreds.com"
- - [26/Jul/2019:20:20:50 -0600] "GET /wordpress/archives/date/2004/08 HTTP/1.1" 200 81183 "http://www.mooreds.com"

Yes, it looks like line noise, but this is the power of regular expressions. They’re in every language (though with slight variations) and worth learning. sed gives you the power of regular expressions at the command line for processing files. I haven’t found a single great sed tutorial, but a quick search turns up a number of decent ones.
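Incidentally, you don’t need the pipeline above: sed accepts multiple expressions in one invocation via -e, which has the same effect as piping two sed commands together:

```shell
# Same as the two piped sed commands: strip the leading IP address,
# then the trailing quoted user agent.
sed -e 's/^[^ ]*//' -e 's/"[^"]*"$//' logs.txt
```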

jq

If you work on the command line with modern software at all, you have encountered JSON. It’s used for configuration files and data transmission. Sometimes you get an array of JSON objects and you just want to pick out certain attributes. Tools like sed and awk struggle with this, because they expect newlines to separate records, not curly braces and commas. Sure, you could use regular expressions to parse simple JSON, and there are times when I’ve done this. But a far better tool is jq. I’m not as savvy with it as with the others, but I’ve used it whenever I’m dealing with an API that delivers JSON (which is most modern ones). I can pull the response down with curl (another great tool) and parse it with jq. I can put these all in a script and make the exploration repeatable.
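As a small taste, here’s how you might pull one attribute out of each element of a JSON array (the data and the "name" field are invented for illustration):

```shell
# Print the "name" attribute of every object in the array.
# -r outputs raw strings instead of quoted JSON.
echo '[{"name":"alpha","size":1},{"name":"beta","size":2}]' | jq -r '.[].name'
```

This prints alpha and beta, one per line. Notice how jq walks the structure of the JSON itself, rather than relying on line boundaries the way awk and sed do.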

I did this a few months ago when I was doing some exploration of an Elasticsearch system. I crafted the queries with curl and then used jq to parse out the results so that I could make some sense of them. Yes, I could have done this with a full programming language, but it would have taken longer. I could also have used a GUI tool like Postman, but then the work would not have been replicable.
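A rough sketch of that kind of boiling-down: an Elasticsearch-style search response nests the matching documents under .hits.hits[]._source, and jq can pull just those out. The sample data here is invented; in practice the JSON would come from curl against the search API.

```shell
# Extract just the _source documents from an Elasticsearch-style
# search response (sample data invented for illustration).
echo '{"hits":{"hits":[{"_source":{"id":1}},{"_source":{"id":2}}]}}' \
  | jq '.hits.hits[]._source'
```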

sed and awk should be on every system you run across; jq is non-standard, but easy to install. It’s worth spending some time getting to know these tools. So next time you are processing a text file and need to extract just a bit of it, reach for sed and awk. Next time you are peering at a hairy JSON file, look at jq. I think you’ll be happy with the result.

Sincerely,

Dan

13 thoughts on “Learn a little jq, awk and sed”

  1. In a galaxy far far away….

    Larry Wall wondered why he needed to learn 3 pretty bad languages, sh, awk, sed…. and devised perl as the Grand Unifying Language.

    Perl sadly borrowed too much from its inspirations, and wasn’t much more readable.

    Then Matz came along and resolved to borrow the best from perl and scheme and ….. and make something more powerful than them all, yet more readable.

    It’s called Ruby.

    And yes, you can do everything in Ruby, in one line if you must, that you can do in bash, awk, sed, jq, perl…. in a more powerful and maintainable form.

    All this has been available for decades, why are we (still) bashing (pun intended) our heads against the Lowest Common Denominator?

    1. Saw this on lobste.rs as well https://lobste.rs/s/obfzfg/learn_little_jq_awk_sed#c_bk0fgs and there’s some good discussion there. I know ruby and perl and think they’re both great languages. Sometimes when I’m just doing some quick logfile or data file investigation or transmogrification, it feels easier to use awk/sed/jq rather than reach for a full featured language. I haven’t tested it, but it feels more efficient to string together pieces of a pipeline than to build a full program. YMMV, of course.

    1. Love the suggestion! There are definitely times when it makes sense to reach for a more full featured language, and perl is always there. It definitely is a superset of awk and sed out of the box. Haven’t touched perl in a few years, but it looks like json support isn’t available unless you install a module, which is a bit more effort than downloading jq. But perl is definitely an option worth exploring.

  2. Haven’t seen jq, will check it out, but I find that I’m able to do a lot with python one-liners. I took the example in jq’s manpage and did it in python:

    python -c 'import sys,json;f=json.load(open(sys.argv[1]));print(sum([x["price"] for x in f["objs"]]))' <(echo '{"objs": [{"price": 1},{"price":3}]}')

    Certainly more verbose than jq, but portable.

    1. That’s a great point. Python (or perl or ruby) is going to be present on almost every system. These full-fledged scripting solutions are, as you say, a bit more verbose. They’re also a bit heavier, but more powerful. I recommend jq because I want to avoid someone having to dive into the deep end of the Python (or perl or ruby) standard library when all they want to do is extract some JSON.
