Understanding Unix/Linux Pipes

I only remember a little about my first experiences with Linux. I remember burning some half-a-gig ISO to my flash drive, struggling to figure out how to reach my BIOS, eventually setting my BIOS config to boot from USB, and seeing mountains of text flooding my charcoal screen. I felt like there was this ocean of possibilities at my fingertips that I hadn’t the dexterity to properly wield, but I was going to anyway. Over the first few years I learned more and more, but for some reason the pipe character, yes the | symbol, seemed too difficult to learn. Mostly that was because of the pipes in the massive “one-liners” that I saw peppered around the internet; they were intimidating. Because of this pipe fear I avoided learning how useful piping is, which severely limited my ability to succeed in the terminal.

Let’s fast forward to today. I use pipes constantly, which has greatly extended my capability and effectiveness. I also have way more fun in the terminal now. I may take a little more time writing a massive one-liner than I would using multiple separate commands, but it’s a way to hone my skills, keep things interesting, and challenge myself.

Now, with that out of the way, let’s give an informal definition of Unix/Linux (Bash) pipes.

A pipe directs the output of one command to the input of another.

I acknowledge that this is a pretty vague summary, but I want to simplify this concept to help offer a solid foundation for others to build on. Let’s take a look at the basic piping structure.

a | b

That’s not so intimidating. Let’s explain this with a really simple example.

So let’s give ourselves an a and a b. One very common command is the cat command. This is used to read text from a file and print it as output to the terminal. So if a file named numbers.txt contained the text..

0000000001 0000000002 0000000003 0000000004
0000000005 0000000006 0000000007 0000000008
0000000009 0000000010 0000000011 0000000012
0000000013 0000000014 0000000015 0000000016
0000000017 0000000018 0000000019 0000000020

.. running..

cat numbers.txt

would print the above file contents as output to the terminal. Ugh, how many lines is that? Frankly it’s much too much to count. Why not take advantage of the wc command, which counts the lines, words, and bytes in a file? The normal structure for the command is..

wc numbers.txt

..running that would give us the output..

5 20 220 numbers.txt

Meaning 5 lines, 20 words, 220 characters (wc counts bytes by default, which is the same thing for plain ASCII text like this), and then the input file name. Let’s use cat as our a and wc as our b. The goal here is to take the output of cat and direct it as input to wc, so an input file doesn’t need to be specified in the wc command; instead a chunk of text is passed in that can be processed on the fly.

cat numbers.txt | wc

That above command would return..

5 20 220

.. to the terminal. Notice there is no file name this time, because wc never saw a file. Rather than having to specify a file, this pipe passes a chunk of text from the first command to be used as input for wc in place of a file. That might not seem like it saves you much time, but what if you wanted to pass something that wasn’t a file? What if you wanted to pass something else, like the output of a command? Let’s replace the a with another command. How about echo? A simple use of this command is..

echo "Hello World"

..which would display..

Hello World

.. as output in the terminal. So now that we have our “chunk of text” from echo let’s plug that into our a.

echo "Hello World" | wc

This example outputs..

  1 2 12

..into the terminal. One weird trick that helped me grasp this concept is to mentally replace the | with an arrow pointing from the output of a to the input of b, or to think of the command line as a tube with text flowing through it from left to right. Let’s try some more useful commands. Say we want to print the first column of text in numbers.txt (the first word of each line). One way would be like so..

awk '{print $1}' numbers.txt

..which outputs..

0000000001
0000000005
0000000009
0000000013
0000000017
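In awk, $1 is the first whitespace-separated field of each line, so the command walks down the file and prints one column. Here is a minimal sketch of the same idea, assuming a scratch directory and a numbers.txt rebuilt with printf and seq (the variable names are mine, not part of the original example):

```shell
# Work in a throwaway directory so nothing in your home folder is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate numbers.txt: 1 through 20, zero-padded to 10 digits, 4 per line.
printf '%010d %010d %010d %010d\n' $(seq 1 20) > numbers.txt

# $1 is the first field of each line, $NF is the last one.
first_column=$(awk '{print $1}' numbers.txt)
last_column=$(awk '{print $NF}' numbers.txt)

echo "$first_column"
echo "$last_column"
```

Swapping $1 for $2, $3, or $NF is all it takes to pick a different column.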

Another way to do this with pipes is to cat a file and to awk the desired data out. Here’s one way to do that above command with pipes.

cat numbers.txt | awk '{print $1}'

That above command will output the same as the stand-alone awk from before. Yes, the pipe version starts two separate programs, which is more resource intensive than starting just one, but the pipe approach is much easier to extend. Let’s add a c to this cat | awk command. Hey, how about using wc again?

cat numbers.txt | awk '{print $1}' | wc
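Run against the sample file, that three-stage command should report 5 lines, 5 words, and 55 bytes, since each of the five lines awk hands over is one 10-digit word plus a newline. Here is a small sketch to check that, assuming numbers.txt is rebuilt in a scratch directory (wc pads its columns with spaces differently from system to system, so the sketch squeezes the padding out):

```shell
# Work in a throwaway directory and recreate the sample file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%010d %010d %010d %010d\n' $(seq 1 20) > numbers.txt

# a | b | c: cat feeds awk, awk feeds wc.
counts=$(cat numbers.txt | awk '{print $1}' | wc)

# Unquoted on purpose so the shell collapses wc's padding to single spaces.
echo $counts
```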

You may be thinking, “Man, you can’t just throw in a c like that!” Well, let me explain. This stumped me for quite some time when I first saw it (“I’ll stick with my a’s and b’s”), but over time I found a simple way to grasp the concept.

a + b * c

Some of you who are a few years out of math class would need to think about the order of operations for a few seconds. Now think about what would occur first. For me, I want to be brutally specific instead of relying on my foggy memory, so I would put parentheses around what I wanted done first, just to be sure of my formula.

(a + b) * c

So now we know that a + b will be done first, and that the output of a + b will be multiplied by c. I understand that we’re not here to MATH, let’s replace that non-letter junk with awesome Unix Pipes!

a | b | c

So let’s think about that, the output of a will go into b, then the output of b will go into c. For one last drawn out explanation look at this..

(a | b) | c

.. now this is just a concept, please don’t type that into your terminal. Think of it like a formula. So a longer piped command can be thought of like this..

(((((a | b) | c) | d) | e) | f) | g

.. to help you recognize how your pipes will work: they will “solve from left to right”. Now let’s use a couple of pipes. Say we want to print the contents of numbers.txt to the terminal but we want to remove all the 0s in the output. A common way to do such a task would be to use the sed (stream editor) command like so.

sed 's/0//g' numbers.txt

This would output the below text.

1 2 3 4
5 6 7 8
9 1 11 12
13 14 15 16
17 18 19 2
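Look closely at that output: because every 0 is removed, 0000000010 turns into 1 and 0000000020 turns into 2. That is exactly what the command asks for, and the rest of this walkthrough relies on it. If you only wanted the leading zeros gone, one possible variation (my own sketch, not the command used here) is to match zeros only at the start of the line or right after a space, using sed’s -E extended regex option:

```shell
# Work in a throwaway directory and recreate the sample file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%010d %010d %010d %010d\n' $(seq 1 20) > numbers.txt

# (^| ) captures the line start or a space; 0+ is the run of zeros after it.
# \1 puts the captured space (or nothing) back and drops the zeros.
stripped=$(sed -E 's/(^| )0+/\1/g' numbers.txt)

echo "$stripped"
```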

That same output can be done with pipes like so..

cat numbers.txt | sed 's/0//g'

We don’t have to do it that way but, like a language, there are various ways to reach the desired output depending on the user and their comfort. I want to use as many pipes as humanly possible right now, so I will use the cat way. Now let’s use the same command as before but only print the second word in each line.

cat numbers.txt | sed 's/0//g' | awk '{print $2}'

This command prints..

2
6
1
14
18

Now let’s use that same command but only print lines that contain the number 1 in them. The command grep is a solid fit to do this, though it isn’t the only way. So with our current command, let’s add a grep to pull the data that we want.

cat numbers.txt | sed 's/0//g' | awk '{print $2}' | grep '1'

What happens here is that the last 3 lines of the above output are printed, because they contain the digit 1.

1
14
18
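grep matches its pattern anywhere in a line, which is why 14 and 18 come through along with 1. If you ever want only the lines that are exactly 1, the -x option asks for whole-line matches. Here is a quick sketch, with printf standing in for the earlier stages of the pipe (the variable names are just for illustration):

```shell
# The five values that awk produced in the previous step.
values=$(printf '2\n6\n1\n14\n18\n')

# Plain grep: any line containing a 1 somewhere.
contains_one=$(echo "$values" | grep '1')

# grep -x: only lines where the whole line is 1.
exactly_one=$(echo "$values" | grep -x '1')

echo "$contains_one"
echo "$exactly_one"
```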

Now wouldn’t it be nice to add these all up? You could try by hand but good luck. I’ll solve this enigma using pipes. One command we could use is bc aka Basic Calculator, but that requires input like this..

echo "1 + 2 + 3+4" | bc

..which will print..

10

..to the terminal as output. So how do we get..

1
14
18

..to look like..

1+14+18

..so it can be used with bc? Well, the paste command is an option. Paste can merge lines and put a delimiter of your choice between them. So let’s merge those three lines and put a plus between each number.

cat numbers.txt | sed 's/0//g' | awk '{print $2}' | grep '1' | paste -sd'+'

All this paste command does is merge the three input lines into one (the -s option) and delimit them with a + sign (the -d option). So the above command will print..

1+14+18
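One portability note: some paste implementations (the one on macOS, for example) expect a file operand and may just print a usage message without one. Adding a lone - at the end tells paste to read standard input, which should behave the same everywhere. Here is a small sketch, with printf standing in for the earlier stages of the pipe:

```shell
# The three lines that grep let through.
# -s joins all input lines into one, -d '+' puts a plus between them,
# and the trailing - means "read from standard input".
expression=$(printf '1\n14\n18\n' | paste -s -d '+' -)

echo "$expression"
```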

Now let’s find the sum of these three numbers by using the bc command like I mentioned before. All the bc command will do here is take a string and interpret it for us as a math formula which it will then calculate.

cat numbers.txt | sed 's/0//g' | awk '{print $2}' | grep '1' | paste -sd'+' | bc

..which will print..

33

..to the terminal as output. For good measure let’s wrap this up by writing the output of this command to a file. There are many ways to do that, but not every way uses pipes. The tee command sure uses pipes though. Tee takes its input and writes it to the terminal as well as to a file. So this example..

echo 'books' | tee backpack.txt

..would print..

books

..to the terminal and would create a file named backpack.txt which has the content books.
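Keep in mind that tee overwrites the file each time by default. If you want to keep adding to the backpack instead, the -a option appends. Here is a short sketch, assuming a scratch directory so no real files get clobbered (pencils is just a made-up second item):

```shell
# Work in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# tee prints its input and also writes it to backpack.txt.
shown=$(echo 'books' | tee backpack.txt)
echo "$shown"

# tee -a appends to the file instead of overwriting it.
echo 'pencils' | tee -a backpack.txt

# The file now holds both lines.
cat backpack.txt
```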

Finally we will write the output of our long command to a file just so we don’t forget what that total was of those few summed numbers.

cat numbers.txt | sed 's/0//g' | awk '{print $2}' | grep '1' | paste -sd'+' | bc | tee add.txt

This will print..

33

..to the terminal and will also write the text 33 to add.txt.
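If bc isn’t installed on your machine, the shell can do the same integer sum itself with arithmetic expansion, $(( )). This is a variation on the final command rather than the one used above, and it assumes a scratch directory with numbers.txt rebuilt by printf and seq. The lone - after paste just means "read standard input".

```shell
# Work in a throwaway directory and recreate the sample file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%010d %010d %010d %010d\n' $(seq 1 20) > numbers.txt

# Same pipe as before, stopping once paste has built the string 1+14+18.
expression=$(cat numbers.txt | sed 's/0//g' | awk '{print $2}' | grep '1' | paste -sd'+' -)

# $(( )) evaluates the string as integer math; tee shows it and saves it.
echo "$(( $expression ))" | tee add.txt
```

Arithmetic expansion only handles whole numbers, so bc is still the better fit once decimals show up.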

Though that isn’t the most useful command I have ever written it hopefully helps explain the use/usefulness of pipes. Lots of commands that are written with pipes may be poorly written and not resource friendly, but they get the job done. In fact, this other poorly written command can do almost exactly the same thing and it hasn’t a single pipe.

awk '{gsub("0",""); if ($2 ~ /1/) SUM+=$2} END{print SUM}' numbers.txt > add.txt

The only difference with this command is that no text is displayed on the screen. The output is redirected to add.txt instead of being redirected AND displayed, which to my knowledge cannot be done without pipes (or without being some sort of redirection genius). So why show that the command really doesn’t need pipes? Well, for one, it does need a pipe to replicate the behavior of the tee command, and for another, the pipe-less single awk way took over twice as long to write. Writing this command took longer because I wasn’t as familiar with awk and all its functionality. I’m not an awk master. It was a good lesson for me to use awk in such a way and learn more about it. So why use pipes if one command is really all that is technically needed? Well, like the language metaphor from before, it’s easier to stick with what you are comfortable with to get the idea across. Using multiple programs with pipes instead of one long command to get the job done 30 minutes faster can lead to quicker project completions and more effectiveness. Passing a..

| grep "word in a line I really need to search for"

.. at the end of a command instead of going back to the drawing board and limiting yourself to single commands can be so much quicker, and it reaches the same conclusion. In using pipes I also learned many more commands. Instead of searching for

“Bash sum numbers in awk?”

..which may severely limit answers to the ones only using awk, searching for..

“Bash sum numbers”

..may return 10 times more answers, some using sed, perl, python, awk, bc, etc. The diversity of commands used can aid in learning over time and can speed up the current task. The ability to process text in infinite ways can help you see farther into the ocean of Unix/Linux possibilities. The ability to manipulate text and then iterate over it can save hard drives from being cluttered with 100s of files that were only used one time and then forgotten about. The ability to string things together can immediately give a user more ways to use the tools in their tool belt so they can get much more done.

On a personal note, I really hope this helped you learn pipes. The second I learned them my ability to do what I wanted skyrocketed, and I hope that you have that experience as well.

Thank you for hearing what I had to say,

Str
