Wednesday, November 28, 2007

Excursions With Find, Xargs, and Perl

It's a common sysadmin task to want to change permissions on all the files and subdirectories under a top-level directory. You could just use the '-R' switch to chmod, but what if your files and directories need different permissions? One scenario that comes up is with shared directories - you have a directory tree that has to be writable by users in a specific group. To do this you want to set the group ID (setgid) bit on all the directories, so that files created by any one user inherit the directory's group and - given a sensible umask - stay writable by the entire group (this is numeric permission mode 2775). Regular files should just get permissions 664.

Find

So we first need a way to differentiate files and directories - one easy way is with the find command, which as a bonus will also recurse into subdirectories for us. Here's our first crack at a solution - let's assume we have already changed into the top-level directory we are interested in:

find . -type f -exec chmod 664 {} \;
find . -type d -exec chmod 2775 {} \;
A word of warning - don't try something like "find . -type f | chmod 664 *" - chmod will ignore its standard input and change the permissions on every file in the current directory. That mistake is easily fixed by re-running chmod, but it would be a disaster if you were trying to delete only certain files or directories.

In the commands above, "-type f" and "-type d" match just files and just directories, respectively. The "-exec" executes the given command on each file or directory produced by find, and the special construct "{}" is a placeholder for the current pathname, as output by find. These commands will work, but they are very slow on large directory trees, since chmod is invoked once per file or directory. We could try to improve the speed by feeding the entire output of find to chmod:

chmod 664 $(find . -type f)
chmod 2775 $(find . -type d)

Xargs

These last two commands will work fine until the list of files or directories grows longer than the kernel's limit on command-line length - when it does, we'll get the error "/bin/chmod: Argument list too long". That's a cue that we should be using xargs, a very useful command that submits its input in manageable chunks to the specified command. Here is our next try:

find . -type f | xargs chmod 664
find . -type d | xargs chmod 2775
This is better - the errors about the command line being too long will go away, and this will work most of the time. But what happens if we have directories or filenames with spaces, quotes or other special characters in them? This comes up quite a bit when you have transferred files from Windows filesystems. The end result is that xargs mangles its input, and the command fails with an error like "xargs: unmatched single quote; by default quotes are special to xargs unless you use the -0 option". That error points us in the right direction: the solution is a pair of options to find and xargs that go together, -print0 and -0.

-print0: Find option that prints the full filename to standard output, terminated by a null character instead of a newline.

-0: Xargs option that says input is terminated by a null character, rather than a newline, and all characters are treated literally (even quotes or backslashes).

Here is our final attempt with find and xargs:

find . -type f -print0 | xargs -0 chmod 664
find . -type d -print0 | xargs -0 chmod 2775
This will work for us all the time, no matter what special characters comprise file or directory names.
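If you want to see the difference for yourself, here is a quick demonstration using a throwaway directory and a filename containing a quote (the names are just examples):

mkdir /tmp/quotetest && cd /tmp/quotetest
touch "don't panic.txt"
find . -type f | xargs chmod 664              # fails: xargs: unmatched single quote
find . -type f -print0 | xargs -0 chmod 664   # works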

Perl

There are some versions of find that don't support the "-print0" switch. On these systems, you may be able to use a Perl solution:

perl -MFile::Find -e 'find sub { -f && chmod 0664, $_; -d && chmod 02775, $_ },"."'
The find procedure exported by the File::Find module takes two arguments - a callback subroutine and a list of directories. It will recursively descend into the supplied directories (in this case just the current directory ".") and run the callback subroutine on each file or directory found. The subroutine here is an anonymous one, given by "sub { -f && chmod 0664, $_; -d && chmod 02775, $_ }". It first tests whether the current argument is a regular file; if it is, it performs the required "chmod 664". It then tests whether the current argument is a directory and, as you might expect, performs the required "chmod 2775". The variable "$_" holds the current argument, in this case whatever the current file or directory name is. Note also that the numeric permissions must have a leading zero so that the Perl interpreter knows they are octal numbers.

This solution has the advantage of working on any Unix system that has Perl installed, since File::Find is a core Perl module.
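Since the find procedure takes a list of directories, the same one-liner can be pointed at whatever top-level directories you like without changing into them first (the paths below are just placeholders):

perl -MFile::Find -e 'find sub { -f && chmod 0664, $_; -d && chmod 02775, $_ }, @ARGV' /srv/share /srv/projects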

I was curious about how fast each solution ran. Here are the timings on a directory tree with 9105 files and 370 directories:

time find . -type f -exec chmod 664 {} \;
real 0m15.687s  user 0m5.676s  sys 0m9.877s

time find . -type f -print0 | xargs -0 chmod 664
real 0m0.132s  user 0m0.036s  sys 0m0.080s

time perl -MFile::Find -e 'find sub { -f && chmod 0664, $_; },"."'
real 0m0.151s  user 0m0.080s  sys 0m0.056s

time perl -MFile::Find -e 'find sub { -f && chmod 0664, $_; -d && chmod 02775, $_ },"."'
real 0m0.160s  user 0m0.064s  sys 0m0.076s

The Perl solution was surprisingly fast, very much comparable to the xargs solution. And since the last Perl timing handled both files and directories in a single pass, it is actually faster than running the two xargs pipelines in a row.
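One footnote: if your version of find supports terminating -exec with a plus sign instead of an escaped semicolon (GNU find does), it will bundle many pathnames into each chmod invocation by itself, giving speed comparable to xargs without the extra pipeline:

find . -type f -exec chmod 664 {} +
find . -type d -exec chmod 2775 {} +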

Friday, November 23, 2007

Article Roundup

Some humor from WTF-d00d.com - Bourne shell server pages. Classic:

The basic idea behind all server page technologies is this: rather than writing code that generates an HTML document on-the-fly by writing it out as a series of print statements, you start with a "skeleton" HTML document and embed the code right inside it. Voila! Instead of having a tangled, unreadable, unmaintainable mess of HTML embedded in source code, you have a tangled, unreadable, unmaintainable mess of source code embedded in HTML.

Bourne Shell Server Pages are ordinary ASCII text files, with the special extension .shit, which denotes "Shell-Interpreted Template." The result of invoking the page compiler on a .shit file, is, naturally, a shell script.

and yet...the minimalist in me thinks this might be a good idea...

Didier Stevens wanted to see if people would click on an ad that offered to infect them with a virus. Short version: they did.

Mark Pilgrim expresses his frustrations with Amazon's new ebook reader and DRM.

More humor - what if Gmail had been designed by Microsoft?.

Finally, you can run multiple HTTPS sites off of one IP address with OpenSSL and TLS extensions. You can also do this with mod_gnutls.

Saturday, November 17, 2007

Great Firefox Extension - It's All Text!

I just came across a great Firefox extension called "It's All Text!". Any HTML textarea you see while browsing gets a little edit button in the bottom right corner - clicking it launches your favorite editor (the first time you use it, it brings you to the preferences screen). For me, that's GNU Emacs.

To use it with Emacs, just add (server-start) to your .emacs and use /usr/bin/emacsclient as your editor in the preferences dialog. Now when you click on the 'edit' button, you'll get a new, empty Emacs buffer to type in. When you are done, type C-x # to close the buffer and get back to the browser. You'll see the contents of the Emacs buffer in the text window. Made a mistake? Clicking 'edit' a second or subsequent time will copy whatever is in the textarea into your editor once again.

Friday, November 16, 2007

Article Roundup

This Code Goes to Eleven asks if adding namespaces to PHP can save it. That question presupposes that PHP is in need of saving - for better or worse, I think PHP is far too widely used at this point to be in danger of extinction. But yes, the lack of proper namespaces in PHP is a royal pain for anything outside of a trivial script.

You have to worry when Bruce Schneier wonders if the NSA put a backdoor in the new PRNG standard.

Gene Simmons is an idiot. Perhaps he should speak to one of these gentlemen. They seem to be doing quite well, despite the evil music downloaders.

For Emacs fans who still like to read printed manuals, the GNU Emacs manual for the latest version 22 is finally out in paperback.

TechRepublic talks about alternative Linux desktops.

OFB.biz has a good series of articles on desktop FreeBSD.

Thursday, November 15, 2007

Do You Run X on Linux or Unix Servers?

I very infrequently install X11/Xorg on any servers, unless I'm doing an install for a client and they ask for it. My most common server install is a base installation of Debian stable that weighs in at about 300MB. I always thought there was no need for a graphical display on a server, for the standard reasons:

  • The X server uses resources better devoted to key server processes
  • There are security implications to having the additional libraries and binaries on a system
  • The command line is much more efficient when you need to get something done

Of course, you can leave out the X server, and just install the needed X clients. SSH works great with its built-in X forwarding. But you still have a potential security problem to deal with on the server itself - local privilege escalation from an insecure X binary, for example.
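For reference, forwarding an X client over SSH needs nothing more than the -X switch, assuming X11Forwarding is enabled in the server's sshd_config (the hostname and client below are just examples):

ssh -X admin@server.example.com
xterm &    # or any other X client installed on the server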

It seems things have been changing lately. Memory and CPU are more plentiful, so resources are not as much of a concern as they were even five years ago. Default installs from the commercial Linux vendors include a full-blown graphical desktop, though they still offer the choice of a minimal installation. Security will always be an issue, but SELinux and AppArmor ease the concerns about buffer overflows and privilege escalation. And there are some useful graphical tools with features that would be hard to replicate from a shell - Red Hat's virtual machine manager comes to mind. I still refuse to install X on servers, mainly because I'm habituated to years of shell use (hell, even on my desktops I spend a disproportionate amount of time in a terminal or Emacs buffer). But there seems to be less reason not to install X these days, apart from personal preference.

So I'm wondering, do you install X on your servers, or recommend it for your clients or employer? If so, why?

Tuesday, November 13, 2007

Can a small business afford not to run Linux?

There's an interesting article at ITWire on whether or not a small business can afford not to run Linux. The author's conclusion is that small businesses should be running Linux, both on the desktop and the server. One part of the article caught my eye:

I copped some flack from the Windows crowd for some comments in the prequel to this story in which I expressed my dismay at how slow my highly configured computer ran under Windows Home Server...Apparently, this was my fault according to those who serve Redmond. I should have configured and optimised my computer correctly, chosen my security package more wisely, so I was just an idiot and a dumbass who didn't know what he was talking about...Believe it or not, like most people who use computers for work, I don't have time to fiddle around to optimise my computers and network.


This is understandable: small-business owners don't have time to waste tweaking server and desktop settings to get something usable. They want something that works out of the box. Next comes this:
...how come when I partitioned my disk and installed a dual boot Ubuntu 7.10 system without any special tweaking, only then, when I had Linux up and running, did my computer give me the sort of performance I expected from the hardware?

This was the surprising part - for years, Windows advocates have picked on Linux for needing configuration and tweaking, which was largely true. It's only been in the last year or so that Linux distributions like Ubuntu and Fedora have gained broad enough hardware and video support to make installation and configuration pretty painless. Witness the automatic printer configuration in Ubuntu 7.10, for example, or the automatic X configuration that now happens under Xorg.

Monday, November 12, 2007

Wasting Time With Web 2.0? Say it Isn't So...

Boy, is this ever true. Of course, you don't need Google and Web 2.0 to waste time. Those who like to organize can spend hours doing just that, whether the organization is digital or paper-based (the funny thing is, the standard response to "I don't have enough time!" is usually to get organized). Those of us who are geeks can spend days "organizing" our Linux desktops just so... or fine-tuning our Emacs/Vim configurations to "be more efficient"... or blogging... which reminds me, I'd better get back to work.

Sunday, November 11, 2007

Article Roundup

I'm not sure why the fact that Hushmail is giving users' data to the Feds surprises anyone. First of all, they are complying with a court order. Secondly, if you're using Hushmail's servers and trusting their Java applets, don't expect too much. If you really want security, send mail through MixMaster only after encrypting it manually on your laptop, replete with encrypted swap and drive partitions, while sitting inside a Faraday cage.

Linux.com brings us Basic presentations with LaTeX Beamer.

From Red Hat Magazine, splitting tar archives on the fly.

John Wiegley has written a great article on using Emacs as a day-planner with org-mode.

Free Operating Systems: Plenty of Choices Here, People

Apparently, there is some disagreement about whether or not Gobuntu is a 'free enough' operating system. I often wonder about these disputes - there are plenty of truly free operating systems for the taking. Debian without the non-free or contrib repositories would be quite free enough for even the most ardent Free Software advocate. The same goes for Fedora, which has steadfastly refused to ship support for proprietary audio or video formats, for example. Likewise for OpenBSD, with its principled stance on binary firmware and free software in general. If you're that angry about it, why not devote your time and energy to operating systems that already fit the bill?

Thursday, November 08, 2007

Using cURL for FTP over SSL file transfers

I recently helped a client work through some errors while trying to transfer a file over a secure FTP connection (FTP over SSL) with cURL. If you haven't used curl, it is a great tool that lends itself to scripted data transfers quite nicely. I'll quote from the curl website:
curl is a command line tool for transferring files with URL syntax, supporting FTP, FTPS, HTTP, HTTPS, SCP, SFTP, TFTP, TELNET, DICT, LDAP, LDAPS and FILE. curl supports SSL certificates, HTTP POST, HTTP PUT, FTP uploading, HTTP form based upload, proxies, cookies, user+password authentication (Basic, Digest, NTLM, Negotiate, kerberos...), file transfer resume, proxy tunneling and a busload of other useful tricks.


Anyway, using curl with FTP over SSL is usually done something like this:

curl -3 -v --cacert /etc/ssl/certs/cert.pem \
     --ftp-ssl -T "/file/to/upload/file.txt" \
     ftp://user:pass@ftp.example.com:port

Let's go over these options:
  • -3: Force the use of SSL v3.
  • -v: Gives verbose debugging output. Lines starting with ">" mean data sent by curl. Lines starting with "<" show data received by curl. Lines starting with "*" display additional information presented by curl.
  • --cacert: Specifies which file contains the SSL certificate(s) used to verify the server. This file must be in PEM format.
  • --ftp-ssl: Try to use SSL or TLS for the FTP connection. If the server does not support SSL/TLS, curl will fall back to unencrypted FTP.
  • -T: Specifies a file to upload

The last part of the command line ftp://user:pass@ftp.example.com:port is simply a way to specify the username, password, host and port all in one shot.
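One caveat with that form: the password ends up visible in your shell history and in the process listing. If that matters, curl's -n (--netrc) option can read the credentials from a ~/.netrc file instead - a sketch, with the machine name and credentials as placeholders:

# ~/.netrc, readable only by you (chmod 600 ~/.netrc):
# machine ftp.example.com login myuser password mypass
curl -3 -v --cacert /etc/ssl/certs/cert.pem \
     --ftp-ssl -n -T "/file/to/upload/file.txt" \
     ftp://ftp.example.com:port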

How FTP Works

Before I get to the problem, I need to explain a bit about how FTP works. FTP operates in one of two modes - active or passive. In active mode, the client connects to the server on a control port (usually TCP port 21), then starts listening on a random high port and sends this port number back to the server. The server then connects back to the client on the specified port (usually the server's source TCP port is 20). Active mode isn't used much or even recommended anymore, since the reverse connection from the server to the client is frequently blocked, and can be a security risk if not handled properly by intervening firewalls.

Contrast this with passive mode, in which the client makes an initial connection to the server on the control port, then waits for the server to send an IP address and port number. The client connects to the specified IP address and port and then sends the data. From a firewall's perspective, this is much nicer, since the control and data connections are in the same direction and the ports are well-defined. Most FTP clients now default to passive mode, curl included.
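As an aside, if you ever do need active mode, curl can request it with the -P (--ftp-port) option; passing '-' tells curl to reuse the address of the interface already carrying the control connection. A sketch using the same placeholder URL as above:

curl -v -P - --ftp-ssl -T "/file/to/upload/file.txt" \
     ftp://user:pass@ftp.example.com:port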

The problem

Now, a problem can arise when the server sends back the IP address in response to a passive mode request. If the server is not configured properly, it will send back its own host IP address, which is almost always a private IP address and different from the address the client connected to. Usually a firewall or router is doing Network Address Translation (NAT) to map requests from the server's public IP address to the server's internal IP address. When the client gets this internal IP address from the server, it tries to connect to a non-routable address and the connection times out. How do you know when this problem has manifested itself? Take a look at this partial debug output from curl:

...
> PASV
< 227 Entering Passive Mode (172,19,2,90,41,20)
* Trying 172.19.2.90...
Here the client has sent the PASV command, which asks the server for a passive data connection. The server returns a string of six decimal numbers, representing the IP address (the first four numbers) and the port (the last two numbers, as high byte and low byte - here 41 * 256 + 20 = 10516). The IP address is 172.19.2.90 - a non-routable address as per RFC 1918. When the client tries to connect to it, the connection will fail.

The solution...sort of

In 1998 RFC 2428 was released, which specified 'Extended Passive Mode', specifically meant to address this problem. In extended passive mode, only the port is returned to the client; the client assumes the server's IP address has not changed. The problem with this solution is that many FTP servers still do not support extended passive mode. If you try, you will see something like this:
> EPSV
* Connect data stream passively
< 500 'EPSV': command not understood.
* disabling EPSV usage
> PASV
< 227 Entering Passive Mode (172,19,2,90,41,20)
* Trying 172.19.2.90...

...and we're back to the same problem again.

The Real Solution

Curl has a neat solution to this problem, requiring two additional options. The first is --disable-epsv, which prevents curl from sending the EPSV command - it will just default to standard passive mode. The second is --ftp-skip-pasv-ip, which tells curl to ignore the IP address returned by the server, and to connect back to the server IP address specified in the command line. Let's put it all together:
curl -3 -v --cacert /etc/ssl/certs/cert.pem \
     --disable-epsv --ftp-skip-pasv-ip \
     --ftp-ssl -T "/file/to/upload/file.txt" \
     ftp://user:pass@ftp.example.com:port
If this succeeds, you'll see something like this:

* SSL certificate verify ok.
...
< 226- Transfer complete - acknowledgment message is pending.
< 226 Transfer complete.
> QUIT
< 221 Goodbye.
The final 226 Transfer complete is the sign that the file was transferred to the server successfully.
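Since the whole point of curl here is scripted transfers, it's also worth remembering that curl exits non-zero on failure, so a wrapper script can check the exit status instead of parsing the output. A minimal sketch, with the same placeholder paths and address as above:

#!/bin/sh
if curl -3 --silent --show-error --cacert /etc/ssl/certs/cert.pem \
        --disable-epsv --ftp-skip-pasv-ip \
        --ftp-ssl -T "/file/to/upload/file.txt" \
        ftp://user:pass@ftp.example.com:port
then
    echo "upload succeeded"
else
    echo "upload failed, curl exit code $?" >&2
fi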

Tuesday, November 06, 2007

Happy Birthday VAX!

From Yahoo! news, the VMS operating system just turned 30 years old. Amazing that there are so many VAXen still in use today:
Gareth Williams, associate director of the Smithsonian Astrophysical Observatory Minor Planet Center since 1990, has been tracking the 400,000 orbits of known asteroids and comets in the solar system using a cluster of 12 VAXes, from offices on the Harvard University campus. The Deutsche Börse stock exchange in Frankfurt runs on VMS. The Australian Stock Exchange runs on it. The train system in Ireland, Irish Rail, runs on it, as does the Amsterdam police department. The U.S. Postal Service runs its mail sorters on OpenVMS, and Amazon.com uses it to ship 112,000 packages a day. It has "a very loyal installed base of customers," says Ann McQuaid, general manager of OpenVMS at HP, who shows no signs of wanting to give it up.


I haven't sat in front of a VAX terminal in years; the last time was in the late eighties when I was a CS student at UMASS, Amherst. It was a VAX 11-780, which I did C programming on. I still recall the VAX lab being reserved for junior and senior-year students only, as it was light-years ahead of the horrific Cyber mainframe freshman CS and Engineering students were subjected to.

Sunday, November 04, 2007

BusyBox Developers settle GPL Lawsuit against Monsoon Multimedia

You had to know this was going to happen, although it would have been nice if the GPL had finally been tested in court. Perhaps the fact that it has not speaks to the thought that went into the GPL's development over the years. One thing I've noticed about these lawsuits - they are rare. Most cases are settled without an actual legal filing. It's safe to say that free software authors want to write code, not file lawsuits, so I'm guessing the number of frivolous GPL lawsuits is close to non-existent. Clearly the defendants' attorneys keep recommending settlement rather than a (likely) loss in court.

Update: You can read some interesting commentary on this case at the following blogs:


Sunday Morning Humor

Some light reading, thanks to the alt.sysadmin.recovery Manpage Collection.

Saturday, November 03, 2007

Two Useful Firefox Extensions

I came across CustomizeGoogle the other day, a great Firefox extension that improves your Google experience, especially if you are concerned about privacy. Some highlights: it links to competitors and to the Wayback Machine in search results, anonymizes your Google user ID, removes click-tracking, and forces Google apps to use HTTPS URLs.

I've also been using the Better Gmail extension for a while, which duplicates some of CustomizeGoogle's Gmail features, plus lots more.

Friday, November 02, 2007

FlickOff: Escaping the Clutches of Web 2.0

A nice article from the latest Linux Gazette on escaping Flickr. The article details how to have pictures from a camera phone auto-posted to a web gallery. I can relate: I used Yahoo Photos for a few years, until Yahoo discontinued it earlier this year and provided a migration path to Flickr. I took it, but later deleted my account after being nagged to pay for a pro account. In the end I just set up Gallery on my own web server.