Learn a little jq, awk and sed

Dear new developer,

You are probably going to be dealing with text files sometime during your development career. These could be plain text, csv, or json. They may have data you want to get out, or log files you want to examine. You may be transforming from one format to another.

Now, if this is a regular occurrence, you may want to build a script or a program around this problem (or use a third party service which aggregates everything together). But sometimes these files are one-offs. Or you use them once in a blue moon. And it can take a little while to write a script, look at the libraries, and put it all together.

Another alternative is to learn some of the unix tools available on the command line. Here are three that I consider “table stakes”.

awk

This is a multi-purpose line processing utility. I often want to grab lines of a log file and figure out what is going on. Here are a few lines of a log file:

54.147.20.92 - - [26/Jul/2019:20:21:04 -0600] "GET /wordpress HTTP/1.1" 301 241 "-" "Slackbot 1.0 (+https://api.slack.com/robots)"
185.24.234.106 - - [26/Jul/2019:20:20:50 -0600] "GET /wordpress/archives/date/2004/02 HTTP/1.1" 200 87872 "http://www.mooreds.com" "DuckDuckBot/1.0; (+http://duckduckgo.com/duckduckbot.html)"
185.24.234.106 - - [26/Jul/2019:20:20:50 -0600] "GET /wordpress/archives/date/2004/08 HTTP/1.1" 200 81183 "http://www.mooreds.com" "DuckDuckBot/1.0; (+http://duckduckgo.com/duckduckbot.html)"

If I want to see only the IP addresses (assuming these are all in a file called logs.txt), I’d run something like:

$ awk '{print $1}' logs.txt
54.147.20.92
185.24.234.106
185.24.234.106

There’s lots more, but you can see that you’d be able to slice and dice delimited data pretty easily. Here’s a great article which dives in further.
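awk also plays well with other command line tools. As a quick sketch (assuming the same logs.txt from above), you can count requests per IP address by piping awk’s output through sort and uniq:

```shell
# Print the first field (the IP), group identical IPs together with sort,
# count each group with uniq -c, then show the busiest IPs first.
awk '{print $1}' logs.txt | sort | uniq -c | sort -rn

# For comma-delimited data, change the field separator with -F,
# e.g. to print the second column of a (hypothetical) data.csv:
#   awk -F',' '{print $2}' data.csv
```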

sed

This is another line utility. You can use it for all kinds of things, but I primarily use it to do search and replace on a file. Suppose you had the same log file, but you wanted to anonymize the IP address and the user agent. Perhaps you’re going to ship the logs off for long term storage or something. You can easily remove these fields with a couple of sed commands.

$ sed 's/^[^ ]*//' logs.txt |sed 's/"[^"]*"$//'
- - [26/Jul/2019:20:21:04 -0600] "GET /wordpress HTTP/1.1" 301 241 "-"
- - [26/Jul/2019:20:20:50 -0600] "GET /wordpress/archives/date/2004/02 HTTP/1.1" 200 87872 "http://www.mooreds.com"
- - [26/Jul/2019:20:20:50 -0600] "GET /wordpress/archives/date/2004/08 HTTP/1.1" 200 81183 "http://www.mooreds.com"

Yes, it looks like line noise, but this is the power of regular expressions. They’re in every language (though with slight variations) and worth learning. sed gives you the power of regular expressions at the command line for processing files. I don’t have a great sed tutorial I’ve found, but a quick search turns up a number.
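Deleting isn’t your only option; the replacement side of `s///` can be any text. A small sketch, again assuming the logs.txt file from above, that swaps each leading IP for a placeholder rather than stripping it:

```shell
# Replace everything before the first space (the IP) with a placeholder:
sed 's/^[^ ]*/0.0.0.0/' logs.txt

# GNU sed can also edit the file in place with -i
# (BSD/macOS sed wants -i '' instead):
#   sed -i 's/^[^ ]*/0.0.0.0/' logs.txt
```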

jq

If you work on the command line with modern software at all, you have encountered json. It’s used for configuration files and data transmission. Sometimes you get an array of json and you just want to pick out certain attributes of it. Tools like sed and awk fall down here, because they expect newlines to separate records, not curly braces and commas. Sure, you could use regular expressions to parse simple json, and there are times when I’ve done this. But a far better tool is jq. I’m not as savvy with this as with the others, but I have used it whenever I’m dealing with an API that delivers json (which is most modern ones). I can pull the response down with curl (another great tool) and parse it out with jq. I can put these all in a script and have the exploration be repeatable.
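To make that concrete, here’s a small sketch with a made-up json array (the names and values are invented for illustration). The `.[]` filter iterates over the array, and `.name` picks one attribute out of each object; `-r` prints raw strings instead of quoted json:

```shell
# Pull the "name" attribute out of each object in a json array:
echo '[{"name":"alpha","size":10},{"name":"beta","size":20}]' | jq -r '.[].name'
# alpha
# beta
```

Swap the echo for a curl call and you have the basic pattern for exploring a json API.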

I did this a few months ago when I was doing some exploration of an Elasticsearch system. I crafted the queries with curl and then used jq to parse out the results so that I could make some sense of them. Yes, I could have done this with a real programming language, but it would have taken longer. I could also have used a GUI tool like Postman, but then it would not have been replicable.
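That kind of exploration script can be just a few lines. This is a hypothetical sketch (the host, index name, and query are made up), relying on the fact that Elasticsearch’s _search endpoint returns matching documents under `.hits.hits`, with each document’s body in `._source`:

```shell
#!/bin/sh
# Query a (hypothetical) local Elasticsearch index for errors,
# then pull just the url field out of each matching document:
curl -s 'http://localhost:9200/logs/_search?q=status:500' \
  | jq -r '.hits.hits[]._source.url'
```

Because it’s a script, you can tweak the query and rerun it, and the whole exploration stays repeatable.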

sed and awk should be on every system you run across; jq is not standard, but it’s easy to install. It’s worth spending some time getting to know these tools. So next time you are processing a text file and need to extract just a bit of it, reach for sed and awk. Next time you get a hairy json file and you are peering at it, look at jq. I think you’ll be happy with the result.

Sincerely,

Dan