Unix 101
Posted on October 30, 2007
Filed Under Mac, Unix
In the old days, when people walked to school uphill both ways in the snow, the only way into your computer was by using a Command Line Interface, otherwise known as a console. Ah, just you and a blinking cursor. No wallpaper, no stupid, cryptic icons… that’s how computers were meant to be used!
The command line is still around, even today.
On a Mac running OS X, you can boot into “single-user” mode to do emergency repairs, by holding down the Command and S keys as it boots. Of course, that won’t help you much unless you know a little bit about Unix.
SSH is another example — it gives you secure (command-line only) access to a remote machine. It’s the system administrator’s best friend.
Unix (or Linux) is very powerful, and very flexible. The best book of tips and tricks I’ve seen to date is the size of a phone book, so obviously we won’t be getting into that kind of detail… but we can provide a little framework so that your other reference material will make sense to you, perhaps a little sooner than it would have otherwise.
“So why do I have to log in, anyway?”
The concept of “logging in” should, I hope, make some sense; you have to give your “user name” and a password to connect to your account. That brings up the first and most important difference between Unix and the “PC” operating system you may have used:
- UNIX IS A MULTI-USER SYSTEM
This has deeper significance than merely being asked to log in. Windows 98 might ask you to log in, but it doesn’t really mean it. (If you don’t believe me, next time you start up Win98 press the “Esc” key when it asks for your password.) Windows NT and 2000 try to be much more like Unix: if you don’t log in, you don’t use the machine.
But why do you have to log in? Simply put, it’s to protect you from the consequences of your own mistakes. In DOS you are the only user. You can say a file is “read-only,” but you can always change your mind and undo that protection. (And so can any random virus…) In Unix, there is the concept of an administrator, the so-called “superuser” or “root,” who can protect you from shooting yourself in the foot. Root decides WHO can read or write any file. The system needs to know who you are, so it knows whether or not you are “root”.
Here’s another difference that isn’t so surprising, but takes a little getting used to:
- UNIX IS CASE-SENSITIVE
Kevin is not the same as KEVIN or kevin or kEvin… get the picture? Maybe it took you a day or two to get used to the idea that your computer (actually the MS-DOS inside it) treats them all the same way. To me, it seemed natural that Kevin and KEVIN would be two different words. Well, in Unix, they are different.
Commands
It’s called a Command Line Interface because you type in commands. Learning what those commands are takes a bit of time but as you perform tasks, they quickly become quite natural.
For instance, to display a text file, we would use the “cat” command:
$ cat file
“cat” is the command; “file” is its parameter, the thing to which we want to apply the command. “cat” got its name because it can be used to add one thing to another, or concatenate them, but as a side effect, it can also display the content of files onto your console.
Before we introduce a long list of commands, though, we need to understand a bit more about where we will be using these commands, and what we will be using them on.
Unix as a tough nut
Unix, or any operating system, is a program that makes a computer work. You might say it gives the computer its personality. Specifically, it is in charge of starting and stopping other programs, and providing access to the resources of the machine such as the disk storage. It makes it possible for you (a human) to use the resources of the computer (a machine).
Unix is divided into two major pieces, called the kernel and the shell. The kernel is the machine-oriented part of the operating system; the shell is the part that you deal with as a user. You give commands to a shell, and it talks to the kernel to make things happen. You may have more than one shell, which will have slightly different “personalities.”
Shell scripts
A “shell script” is a text file that contains Unix commands, the same ones that you would key in at the command line. (If you know what a .BAT file is in DOS, you’ve got the idea.) It is interpreted by a command shell. In the DOS world, the shell is the program COMMAND.COM which is written to every bootable floppy; in XP, you can get this kind of shell by typing “cmd” in the Run box.
If you are a real old-time hacker, you may have encountered 4DOS, a third-party replacement shell for DOS. In the Unix world, you have even more choices, but they fall into two major families: the Bourne shell family and the C shell family. The default shell on most Linux distributions is the Bourne-Again Shell, or bash.
There are enough differences between shells that you need some way to make sure you’re using the one you expect, especially if you are writing a script that may do a lot of important stuff. The way to do that is to start your shell script with a “hash bang” (#!) line like this one (sometimes called a “shebang,” for sharp + bang):
#!/bin/sh
This says that this script is to be run by the Bourne shell that resides in a specific directory (/bin in this case). This ensures the script will run as expected, even if your default shell is the C shell or even something else. We’ll get to directories in just a minute.
Input and output redirection
In Unix, a file and a device are treated the same way — they are both viewed as “a stream of bytes.” Commands will accept input from EACH OTHER as easily as from your keyboard.
In order to get maximum flexibility, Unix commands try very hard to present “standard output” that can be piped into other commands as “standard input.” You usually will not see headings on standard output, or any acknowledgement from a successful command, because both of these would interfere with this “pipeline” concept.
Remember the “cat” command?
$ cat file
You can use “cat” with redirection to append “myfile” to the end of “anotherfile”:
$ cat myfile >> anotherfile
We’ll come back to this example in a moment.
We can create a new text file called myfile by typing it in through the console keyboard:
$ cat > myfile
This is a line of text I want to go into myfile.
^D
Try this for yourself. See how the terminal waits for you to enter text? Of course, you need to be able to tell it when you’re done. The Control-D character (^D) terminates an “input stream.”
If you terminate your input stream at the command prompt, the default behavior of most Unix terminals is to log you off.
WARNING:
Because ‘cat > myfile’ creates a new file, it will replace an existing file with that name. In other words, if you have an existing file named ‘myfile’ it will be destroyed!
To append to a file, you can enter:
$ cat >> myfile
This is a line of text I want to ADD to myfile.
^D
Note the DOUBLE >> redirection symbol. Compare this to the example above, where we appended myfile to anotherfile. In this case, your keyboard entry takes the place of “myfile”.
A file is just a stream of bytes. Unix doesn’t care where the stream of bytes comes from.
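Here is a minimal sketch of the difference between > and >>, run in a scratch directory so nothing real gets overwritten. The directory is created with mktemp -d, which I am assuming your system provides (Linux and Mac OS X do), and the file contents are made up for illustration.

```shell
#!/bin/sh
# Work in a throwaway directory so no real file is harmed.
scratch=$(mktemp -d)
cd "$scratch" || exit 1

echo "first line"  >  myfile    # > creates (or REPLACES) myfile
echo "second line" >> myfile    # >> appends to what is already there
cat myfile                      # shows both lines

echo "oops" > myfile            # > again: the two old lines are gone
cat myfile                      # shows only the new line
```

The second cat is the WARNING above in action: a single > never asks before it replaces a file.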
We’ve talked about files; where do those files live? On disks, in directories. Writing that stream of bytes to a disk makes it permanent, and giving it a name means we can find it again when we want it.
File and directory functions
A Unix machine is built around a filesystem, which starts at a root from which it grows and branches like a corporate organization chart. Perhaps a family tree is a better example, with children descending from parents. The root is a directory, and directories can hold other directories or files.
/
    Animals/
        Wild/
            Birds/
            Rabbits/
                Thumper
                Jessica
            Deer/
                Bambi
            Elephants/
                Jumbo
                Dumbo
        Farm/
            Cows/
                Bessie
                Elsie
            Horses/
                Flicka
                Black_Beauty
        Pets/
            Dogs/
                Spot
                Rover
            Cats/
                Fluffy
In this example, individual animals are files, and directories are classes or types of animals. Any of the directories (Pets, Dogs, Cats) could be a completely separate physical disk, but mounted and visible within the overall “Animals” filesystem. The “hierarchy” from general to specific provides a logical scheme; this is entirely a made-up example, but it shows the way Unix people think about this sort of thing. The full path to “Fluffy” would be /Animals/Pets/Cats/Fluffy
To add more drives, it’s a matter of creating a “mount point” — which is just a directory — and using the mount command to make the new device part of the existing filesystem tree. That’s getting into advanced territory; your version of Unix may provide “automount” to handle floppies or CD-ROMs automagically, but some old-timers prefer to do a manual mount of their removable media. Why? Because MS-DOS/Windows makes assumptions — perhaps too many — every time you stick a floppy into your floppy drive.
Unix could do that too, but a Unix admin’s gut feeling says that “if you let the computer do things without being told to do them, sometimes it will do the wrong thing. I’d rather have it wait until I tell it to mount that floppy.”
Here are some commands related to filesystems (the $ is the command prompt, and the # marks a comment):
$ pwd # ("Present Working Directory" - where am I?)
/home/kevin # You are here
$ cd # change directory ("go $HOME")
# Examples using filesystem commands:
$ cd / # go to the ROOT directory
$ cd /bin # do this, then do a "pwd" to see
$ pwd # ...our new location:
/bin # we are in "bin" just below root (/).
$ echo $HOME # HOME is defined as the directory you go
/home/kevin # to when you say "cd" with no value.
# $HOME gives us the value of the variable
# named HOME.
$ echo HOME # See the difference between HOME and $HOME?
HOME
$ cd # Unlike DOS, 'cd' with no parameter
$ pwd # ALWAYS takes you "HOME".
/home/kevin
$ cd bin # This time use "bin" with no leading "/":
$ pwd # Where are we now?
/home/kevin/bin
The root of the filesystem tree is identified by the single character “/” which is pronounced “slash.” Do not confuse it with the Microsoft backslash, “\”. Note the difference between an absolute path, starting from the root of the filesystem, and a relative path, which does not start with “/”. The cd command “moves” you from directory to directory, like moving between rooms in a house. A path that starts with a dot (period) is relative to the current directory, so “cd .” does exactly nothing. (“Move me from where I am to, uh, where I am.”) A double-dot (..) refers to the directory above the current one. Thus:
$ pwd
/home/kevin/marsupial/wombat
$ cd ..
$ pwd
/home/kevin/marsupial
$ cd ..
$ pwd
/home/kevin
You do NOT have to cd into a directory in order to run commands that refer to it — these two examples do exactly the same thing, but notice the wasted steps in the first one.
# Example one:
$ pwd
/home/silly
$ cd /var/tmp
$ ls
thisfile thatfile otherfile
$ cd
$
# Example two:
$ pwd
/home/silly
$ ls /var/tmp
thisfile thatfile otherfile
$
Another form of redirection: PIPES
The character for “pipe” is a vertical line or “vertical bar” (|); on my keyboard, it’s the uppercase of the backslash, “\”. In Unix, it is used to symbolize “piping” data from one place to another. Here is an example:
$ ps -ef | grep kevin
“List every active process on the system in detail, (process status, every, full) and pipe the results into the search utility grep; i.e. tell me what processes if any have the string ‘kevin’ associated with them. Or more simply, ‘find all of kevin’s jobs.’”
Okay, you’re wondering why you have to jump through that hoop just to “show Kevin’s jobs” — right? Why isn’t there a nice command like “SHOW USERS”? The answer is not just because the Unix developers hated to type.
The Unix philosophy is to avoid creating another program when you can combine ones you already have. That’s especially true of overly complex programs that try to be all things to all people. Instead, Unix commands are very small, and very simple, and ideally each one does just one thing, really fast. They are also supposed to be written so that they can be hooked together with the “plumbing” syntax we have just seen.
Every properly-written Unix program has one input and two outputs, called stdin (standard input), stdout (standard output), and stderr (standard error). When you connect two commands with a “pipe” you are connecting the standard output of the first program to the standard input of the second. Why do errors have their own channel? That way they can be sent off to the side and they won’t make a mess of your results.
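To see the two output channels for yourself, here is a small sketch that sends stdout to one file and stderr to another. The file names are hypothetical, the scratch directory comes from mktemp -d (assumed to be available), and the exact wording of the error message varies from system to system.

```shell
#!/bin/sh
scratch=$(mktemp -d)
touch "$scratch/realfile"       # this one exists; no_such_file does not

# ">" catches standard output, "2>" catches standard error.
# ls exits with an error status because one file is missing;
# that is expected here, so "|| true" ignores it.
ls "$scratch/realfile" "$scratch/no_such_file" \
    > "$scratch/out.txt" 2> "$scratch/err.txt" || true

echo "stdout got:"; cat "$scratch/out.txt"
echo "stderr got:"; cat "$scratch/err.txt"
```

The complaint about the missing file lands in err.txt, so out.txt stays clean enough to feed into another command.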
Note: If you’re following along on a Mac, “ps -ef” may not work on older versions of Mac OS X. Mac OS X is based on BSD Unix, which uses the syntax “ps -aux” for the command “process status of All jobs and Users eXecuting.” Linux uses the POSIX syntax, which was based on System V Unix, and its syntax is “ps -ef” to “process status for Everyone in Full.” This is the only difference between the two that consistently trips me up.
More Unix plumbing
Run the command ls -l to get a “long” directory listing. (That’s a lowercase letter L as in long. If you typed the number 1 instead, you would get a one-column listing.) Note that the output displayed by this command does not include a header to tell you what each column means. This seems inconsiderate, but there is a reason for it: Displaying a header would get in the way of reusing that information as input to another command. If you need to know what each column means, you can use the “manual” to find out by typing man ls
The command “more” passes through whatever it sees on its standard input, but pauses after each “n” lines, where “n” is the height of your display screen:
$ more file
So, if the output from ls -l is too long to fit on one screen, you might want to pipe it through the more command, like so:
ls -l | more
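Here is a rough sketch of why header-free output matters: because ls prints nothing but file names, its output can be counted directly by wc -l (“word count, lines”). The three file names are borrowed from the earlier example, and the scratch directory from mktemp -d is an assumption for illustration.

```shell
#!/bin/sh
scratch=$(mktemp -d)
touch "$scratch/thisfile" "$scratch/thatfile" "$scratch/otherfile"

# When ls writes to a pipe it prints one bare name per line, so wc -l
# counts the files. Some versions of wc pad the number with spaces;
# tr -d ' ' strips them.
count=$(ls "$scratch" | wc -l | tr -d ' ')
echo "files found: $count"
```

If ls printed a heading line, the count would be off by one and every script that used it would have to compensate.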
SOME UNIX COMMANDS
Unix          What they should have called it
----          ------------------------------------------------
man           HELP (displays a section of the 'manual,' as
              in "Read The Fine Manual" or RTFM.)
mv f1 f2      RENAME (MoVe the name entry)
rm file       DELETE (ReMove)
rm -i file    DELETE with prompt to confirm (ReMove -Interactive)
who           SHOW USERS (See note *)
ps -e         SHOW SYSTEM
od -c file    DUMP and display ASCII (-c=char) values
ls            DIRECTORY/SIZE/OWNER (LiSt of files)
ls -l         DIR/PROT/SIZE/DATE/OWNER (LiSt -long format)
*Yes, other operating systems do have a “SHOW USERS” command — I got this list from my years on VMS, a multi-user system that gave Unix a pretty good run for its money back in the 1980s.
WE’RE NOT IN REDMOND ANYMORE, TOTO
This brings us to another important difference between Unix and DOS: Unix is a multi-process system. A Unix machine can run many programs at the same time, unlike DOS, which only does one thing at a time well. (You can have “TSR” programs under DOS, but that is not nearly the same thing as having a true multiprocessing operating system.) You can have one job feeding data to another at the same time.
Unix is also a multi-user system. The users on a Unix machine are divided into three groups. YOU, the “owner” of your account; your GROUP, other people who belong with you for some reason; and the rest of the WORLD.
You’ll hear a lot about setting “permissions” on files. What does that mean?
First, the idea of permission is tied up with ownership, and you can’t really describe one without the other. For that reason, you will be using two tools, not one, to control who can see and change and delete your files: chmod (change mode) and chown (change owner).
Everyone on a Unix machine is a member of one or more “groups.” A group is just a way of saying “these people belong together in some way.” By default, Red Hat Linux makes a new group for each user you create, so you start out in a “group of one.” Other systems place all regular users into a group called “users.” The root user can move users into different groups, and yes, there is a way to belong to more than one group at the same time.
Now, let’s think about how this relates to files and directories. What can you do with a file? You can read it, you can write to it, and if it’s a program you can run it. You can also decide whether to allow members of your group or the “world” to do the same. That’s decided by the “mode” of the file, also known as its “permissions.”
The “long” directory listing, ls -l, displays that information in the format: drwxrwxrwx
Whoa! How do you read that? It’s not so bad. It’s always in the same layout, for starters. Also, the position of each letter has a meaning.
A “d” in the first position indicates the file is a directory. The letters that follow indicate WHAT KIND of access has been granted, while the position of each letter indicates WHO has that access.
The positions are “dxxxyyyzzz” where xxx=owner yyy=group zzz=world
The types of access are:
"r" = read
"w" = write
"x" = execute
So a permission string of -rwxr-x--- means it’s an ordinary file (because there is no leading ‘d’); the owner can read, write, or execute it (rwx); members of the same group can read or execute it but not write to it (r-x), and the rest of the world cannot access it at all (---). Those three values for owner, group, and world are actually three bits which can be set ON or OFF, and those values from all OFF to all ON and any value in between can be expressed very nicely as a number between 0 and 7. Watch:
0 0 0 = 0 base 8
0 0 1 = 1 base 8
0 1 0 = 2 base 8
0 1 1 = 3 base 8
1 0 0 = 4 base 8
1 0 1 = 5 base 8
1 1 0 = 6 base 8
1 1 1 = 7 base 8
Oldtimers like to think of the “mode” this way, as a three-digit “Base 8” or “octal” number. For example, setting the “world” values to rwx can be represented as an octal 7 (binary 111), while setting them to rw- would be a 6 (binary 110). Setting a file’s protection to 666 means anyone can write to it, which fits nicely with the Biblical mnemonic for “evil! evil! evil!”
The command “chmod 750 filename” would set the file’s protection to rwxr-x--- exactly as in our example above. That’s a common protection for directories; the owner has full access, the group can browse, but strangers are locked out.
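The arithmetic behind an octal mode can be sketched in the shell itself. Each digit is the sum of read=4, write=2 and execute=1; the variable names and the scratch file below are made up for illustration, and mktemp -d is assumed to be available.

```shell
#!/bin/sh
scratch=$(mktemp -d)
touch "$scratch/filename"

# Build each digit from its three bits: read=4, write=2, execute=1.
owner=$((4 + 2 + 1))   # rwx
group=$((4 + 0 + 1))   # r-x
world=$((0 + 0 + 0))   # ---
mode="$owner$group$world"
echo "mode is $mode"

chmod "$mode" "$scratch/filename"
ls -l "$scratch/filename"      # the listing should begin with -rwxr-x---
```

Once the sums become second nature, you can read 750 as “everything for me, look-but-don’t-touch for my group, nothing for anyone else.”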
[For more about binary numbers, check out Introduction to the TCP/IP LAN]
See “man chmod” for more about setting permissions on a file. There is no permission bit to “hide” a file, and to really protect a file, you need to move it to a private directory. The “ls” command won’t list files whose names begin with a dot (for example, “.htaccess”) but that’s mostly to avoid clutter, not actually hide them. Adding the switch -a (for “all”) to the ls command will show your dot files right along with all the regular ones.
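The “execute bit” has a slightly different meaning if the file in question is a directory. (Yes, directories are files. In Unix, everything is a file… and a file is just a stream of characters.) The execute bit is used to provide some control over whether a directory is “searchable.” The read bit controls whether you can get a directory listing; the execute bit controls whether you can reach the files inside at all. If a directory’s read bit is turned off but its execute bit is on, you cannot list its contents, but if you have read access to a file within that directory, you can still read that file by specifying its exact name. If the execute bit is turned off, you cannot get at anything inside the directory, even if you know its name.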
As a general rule, though, you will want your directories to be “executable.”
This reuse of the execute bit may seem a little odd, but it’s also quite practical. You can’t execute a directory, but it would be wasteful to add an extra bit to every file to serve as a “searchable directory” flag. By taking the context into account (“is this a plain file that might be executable, or is it a directory that we might want to search?”) Unix is able to reuse that bit instead of wasting it. Over a few million files, saving a bit per file adds up! This idea of using an item differently depending on its context is sometimes called “overloading.”
What about deleting files? Delete permission is controlled by the w (write) flag on the directory, not the file — anyone who has write access to the directory can delete any file IN the directory. Think of it this way; the file exists only as long as the directory provides a “link” to it. If you can write into the directory, you can erase or write over that link, and when you do, the file becomes just another block of free space.
Running a script
To run a script, it has to be an executable file (chmod +x scriptname) and it has to be in a directory that is in your $PATH (mv scriptname $HOME/bin). If both of those conditions are met, then simply typing the name of the script at the command prompt will run it.
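Both conditions can be tried out safely with a throwaway script. In this sketch a scratch directory stands in for $HOME/bin, and “hello_script” is a made-up name; it also assumes your temporary directory allows programs to be executed from it, which is the usual case.

```shell
#!/bin/sh
# A scratch directory stands in for $HOME/bin.
mybin=$(mktemp -d)

# Create a two-line script, starting with the "hash bang" line.
cat > "$mybin/hello_script" <<'EOF'
#!/bin/sh
echo "hello from a script"
EOF

chmod +x "$mybin/hello_script"   # condition 1: make it executable
PATH=$PATH:$mybin                # condition 2: its directory is in the PATH
export PATH

hello_script                     # now the bare name is enough
```

Skip either step and the shell will refuse: without chmod +x you get “permission denied,” and without the PATH entry you get “command not found.”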
Let’s look at some more commands…
- cat is like TYPE, but also CREATE or APPEND, when used with output redirection (> and >>).
- more displays the input stream one screenful at a time; press the space bar for next page, ‘q’ to quit; enter ‘/text’ with no quotes to find the string ‘text’
- grep ‘grep pattern file’ is like ‘FIND “pattern” file’ in DOS. The name comes from the ed editor command g/re/p: Globally search for a Regular Expression and Print.
- file “file filename” tells you what kind of file “filename” is. Trying to display a binary file can easily lock up your terminal; use ‘file’ first to find out whether it’s a text file.
- reset If you forget to do “file somefile” and try to ‘cat’ the wrong type of file, your console may become very confused. Why? There are special binary characters that can be used to change your terminal settings. Some of the stuff in a non-text file may have the same effect but in a random, nonsense way (“Start displaying everything backwards using Gujarati characters”). You may be able to recover from that by typing “reset” and hitting enter… if you’re a touch typist, that is. If that doesn’t work, you’ll need to log out and log back in.
COMMAND-LINE SWITCHES
You’ll note that some of the commands above had “switches”: either a letter that is set off by a hyphen, or a word that is set off by a pair of hyphens. Some commands even have positive (on) and negative (off) switches:
command -switch
command +switch
… but most just use the hyphen as a delimiter.
Note that Unix command switches do not use /. That is used to separate directories, not to identify the parameters of a command. Unix directories are never, ever separated by a backslash \. The purpose of the backslash is to change the meaning of the character that follows, usually to “quote” it.
grep -v pattern file
In this case, the -v stands for “inVert the test”. This command will find all lines in “file” that DON’T have an instance of “pattern”.
This command will list all the processes that are NOT being run by “root”:
ps -ef | grep -v root
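Since the output of ps is different on every machine, here is a small sketch that runs grep and grep -v over three made-up lines instead. The sample data is invented to look vaguely like a process list.

```shell
#!/bin/sh
# Three made-up lines stand in for the output of ps.
sample='root   1  init
kevin  42 bash
root   7  cron'

matches=$(printf '%s\n' "$sample" | grep root)      # lines WITH "root"
others=$(printf '%s\n' "$sample" | grep -v root)    # lines WITHOUT it

echo "lines with root:";    echo "$matches"
echo "lines without root:"; echo "$others"
```

Between them, the two commands account for every line of input exactly once, which is a handy way to remember what -v does.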
More about switches and getting help
The greatest weakness of Unix, in my opinion, is that there is no single authority or standard to determine what switches will be recognized by any single program. For example, the use of -v with grep (above) is unique to grep. How do we cope with this chaos?
The safest way to find out how to run a command on your machine is to look at the man page for that command (by typing man command ). If you’re not sure what command you want, the command “apropos” or “man -k” will list all the man pages that contain a certain string in their title. man -k search would list all the commands that relate to “search”ing. If there are too many, don’t forget you can pipe the output through more to scroll, or even use “grep -v” to remove unwanted words:
apropos search | grep -v Tcl
A riskier way to get information is to run the command with the switch “--help”. If it doesn’t have a built-in help display, the fact that it doesn’t understand “--help” should make it display an “invalid switch” error and a “usage” line… but that won’t always save you from a destructive command like rm. Always have some idea what a command does before you run it!
Finding things and running them
To find executables, Unix uses a PATH, much like MS-DOS. You may not be familiar with this, because MS-DOS (and Microsoft Windows) always included the CURRENT directory by default. This meant you could run any program by “going” to the directory where it was installed. (In fact, this is one reason that the concept of using cd and “going to” a directory matters.)
Unix does not do this, unless you include “current directory” explicitly as the dot symbol (“.”). There are good reasons not to do that, but they all boil down to “Don’t run that! You don’t know where it’s been!” You don’t want to run something that just happens to be lying around in any old directory.
You’ll hear people say “Windows is only exploited by viruses because it’s so popular.” That happens not to be the case; this is just one example of a fundamental difference in approach that makes Unix a tougher nut to crack than Windows.
To see your path, you can type:
$ echo $PATH # Note the lowercase command, UPPERCASE
# variable, and a leading $ to get its
# contents.
To find out:                      Run this command:
"What's your HOME directory?"     $ echo $HOME
"What's in your dot-profile?"     $ more .profile
Environment Variables
Those UPPERCASE variables with leading $ are called environment variables. By convention, they are written in UPPERCASE. Ones you define yourself will work in lowercase or mixed case, but certain ones used by Unix itself, such as PATH, have to be in uppercase. Following that model for yours is merely a good idea.
In DOS you’d create them with a SET command. In Unix (except in the C-shell) you just assign them with an = and no spaces. It would look something like this (The $ at the start of the line is the command prompt; the lines that have no prompt are the output of the echo command):
$ CRITTER=beast # no $ on the assignment
$ export CRITTER # it doesn't "count" until you export it
$ echo $CRITTER # leading $ means "The value of"
beast
$ echo CRITTER
CRITTER # see the difference?
$ echo $CRITTER # how about now?
beast
At any given time, there are lots of “processes” running on a Unix machine. Every command you run exists as a “child” of your own login process. The export command is important because it makes your environment variables visible to your child processes. The example above would have worked even without doing the export, because we are remaining within a single interactive process… but in almost every other case, you have to use export to get the desired effect. You especially need it if you want to set an environment variable at the command prompt and then use it inside a script.
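You can watch export make the difference by asking a child process what it sees. This sketch uses sh -c to start the child; the square brackets are only there to make an empty value visible.

```shell
#!/bin/sh
unset CRITTER                      # start from a clean slate
CRITTER=beast                      # set, but not exported

# sh -c starts a child process. The single quotes keep the parent
# shell from expanding $CRITTER before the child gets to run.
before=$(sh -c 'echo "child sees: [$CRITTER]"')

export CRITTER                     # now the children inherit it
after=$(sh -c 'echo "child sees: [$CRITTER]"')

echo "$before"
echo "$after"
```

A script you run from the command prompt is a child process in exactly the same way, which is why it only sees the variables you have exported.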
Directories in the PATH
By convention, system executables reside in a directory called /bin (the directory named bin located immediately under the root directory). /sbin is a very special directory; it holds programs that are critical to the running of the system. Other programs that are more in the nature of “applications” reside in /usr/bin and /usr/local/bin — and if you have private scripts or programs, they could go into a bin directory under your $HOME directory. You would then add $HOME/bin to your PATH by adding a line like this to your .profile file:
PATH=$PATH:$HOME/bin # Keep the current value of PATH and add my
# personal bin directory to it, as the last
# place to look for executables.
“Hidden” files
Files that begin with a period (like .profile) are used by various system utilities. They don’t appear by default on directory lists, mostly for neatness’ sake. You can see them by doing:
ls -a # '-a' stands for 'all'
Aliases
“Unix is user-friendly; it’s just picky about who its friends are.” One way to make Unix more friendly is to make up your own commands by giving a command line an “alias.” For instance, if you have trouble remembering to type “ls -l” consistently, you can create your own “dir” command like so:
alias dir="ls -l"
This assumes you’re using something with a bit more moxie than the old Bourne shell — you need bash or the Korn shell to support aliases. Please resist the urge to define so many aliases that you’ve created your own language. You’re better off learning the same Unix that (almost) everyone else uses. Dialects or flavors of Unix are more alike than they are different.
In conclusion…
Power delights, and absolute power is absolutely delightful. Enjoy the power that understanding Unix gives you, and please use it only for Good!
Recommended Reading
- Unix for Dummies IDG Books - I tease them, but this is a good book
- Unix Power Tools O’Reilly and Associates / Random House - The “phone book” I mentioned above
- The Unix Programming Environment Kernighan & Pike / Prentice Hall - By two of the guys from Bell Labs, where Unix was invented.
- Unix System Security Rik Farrow/Addison Wesley (ISBN 0-201-57030-0)
The Unix Philosophy
Write programs that do one thing and do it well.
Write programs that work together.
Write programs that handle text streams, because that is a universal interface.