Thursday, September 27, 2007

GNU Screen

I use GNU screen a lot. Nothing beats it for keeping SSH connections open to multiple servers, and it has some killer features if you spend a lot of time at a shell prompt. There is a very good introductory article up at Red Hat Magazine, a guide to GNU Screen. I particularly like their customization of the status line.

One customization I've had in my .screenrc for some time replaces the normal command-key prefix (ctrl-a) with a single key (backquote), which is much faster:
escape ``
If you ever need to type a single backquote (like when you are editing a shell script), you just have to hit the backquote key twice in a row.

I would only add to the article that on Debian or Ubuntu, you can install screen with apt-get install screen.

Article Roundup

From HowtoForge, Speeding Up Perl Scripts With SpeedyCGI On Debian Etch (a great site, BTW). Simpler and with fewer features than mod_perl, but good for running legacy Perl CGI scripts faster.

Here is a presentation on the new regex engine coming in Perl 5.10.

From C. Titus Brown, a blog post on writing Python code that doesn't suck. Most of it applies to any language. This interested me because as a Perl coder, I'm constantly dealing with the tired "Perl code is soooo hard to read...but Python, on the other hand...". Really, you can write sucky code in any language.

Linux.com gives us Implementing Quotas to Restrict Disk Space Usage.

Apparently, OpenOffice is a hit on WalMart PCs.

The Simple Dollar gives us 30 Essential Pieces Of Free (and Open) Software for Windows. Missing were Cygwin and WinSCP - the first things I install on any Windows box I get stuck at.

Tuesday, September 25, 2007

What it Means to be a Hacker

A decent article for a change about what it means to be a hacker. You don't usually see commentary like this from the mainstream media; to them, "hacker" is usually synonymous with the teenage script-kiddie. This about sums it up:

By focusing on the bad apples, Priest says, Madigan was glossing over DefCon's true spirit: smart people getting together to mess around with technology.

"Middle America thinks we're stealing your social security numbers, raping your children and breaking into your bank account," he says. "The reality is, we are the ultimate explorers. We see a technology, and we want to know how it works."

I've been to DefCon once (0x0c), so I can definitely back up that sentiment. While there was the occasional network nuisance, the culprits were not given any attention. It was more of a "if you know what you are doing, you won't be bothered by these idiots" attitude.

Snowed by SCO

Dan Lyons of Forbes.com admits he was wrong about SCO. If only we could get all journalists to be this honest...

The truth, as is often the case, is far less exciting than the conspiracy theorists would like to believe. It is simply this: I got it wrong. The nerds got it right.

SCO is road kill. Its lawsuit long ago ceased to represent any threat to Linux. That operating system has become far too successful to be dislodged. Someday soon the SCO lawsuits will go away, and I will never have to write another article about SCO ever again. I can't wait.

Saturday, September 22, 2007

Why Linux Hasn't "Made It" to a Desktop Near You

There is a column over at DesktopLinux.com titled 13 reasons why Linux won't make it to a desktop near you. The author's main premise is that Linux will never truly infiltrate the consumer desktop because Linux isn't a "normal" product in the sense that you can easily market and brand it, and even if it were, it is far too complex and there are too many choices for consumers. I think he's way off-base. I've talked about this issue before. What's killing desktop Linux is Microsoft's lock on the OEM market.

Here's a quote from the article:
Even basic things like partitioning, windows managers, file managers and software update processes are not standardized across our shortlist of user-friendly Linux distros. To varying degrees, you will strike problems getting Linux set up correctly if your PC has an LCD screen that is large or wide, or if you have a fancy graphics card (NVIDIA or ATI) or you want to set up WI-FI or play video clips out of the box.

And if you're installing Linux on the same hard drive as Windows XP, you'll need to create a new partition or two. That's a knee trembler for simple users, a leap of faith of the white knuckle kind. It's a good idea to make full backups before you do this, yet the process can be quite straightforward. For example, Ubuntu offers to shrink your Windows partition to your chosen size and to create the additional partitions you need automatically.

It's not that it's hard, just that it's unfamiliar. Linux doesn't know about C, D and E drives and Windows will show up as sda1/dev or hda1/dev in the partitioning table. What's missing is a simple explanation of these basics, and none of the Linux desktops provide that. You're traveling in a foreign country and you have trouble reading the road signs, and there's no helpful traffic cop to be found. It spoils your trip.

Comparing the ease of use of Windows with Linux by saying "Linux is too difficult to install" misses the point - that few users ever have to install Windows. Their PCs come pre-loaded with the operating system. You could swap "Linux" and "Windows" in the above excerpt and it would still make sense when viewed through the eyes of a non-Windows user.

Similarly, the issue of "too much choice" is meaningless. Another quote:

On closer inspection, you find that there are 500 versions of the product. When you try to understand the subtle differences between them, you become confused. Your enthusiasm starts to flag.

If, say, Ubuntu Linux came pre-installed on consumer laptops, the issue of choice would become "which model laptop do I buy?". Yes, you might have different laptop manufacturers offering different distributions of Linux, but most consumers won't use that fact to decide which laptop to buy. They will primarily look at the hardware support or the company's reputation, not the technical particulars.

My point above about replacing "Linux" with "Windows" pertains to most of the article, really. Those of us who have been comfortably using Linux desktops for years read articles like this and immediately see the problem - the articles are always written from the point of view of a Windows user. It's an example of confirmation bias - you have a belief that Linux will never make inroads into the consumer desktop, so you look for theories that affirm this belief ("Linux is too complicated", "The support sucks", "Who can install Linux, anyway?"), and ignore the fact that "YourOS" suffers from the same problems.

There are some good points in the article, but they don't really impact Linux's future on the desktop. For example:

When you discover that some of the designers have made deals with their biggest competitor, the last drop of your enthusiasm drains away.

Obviously a poke at Novell, but I don't see Novell's deal with Microsoft impacting the OEM market for consumer PCs. Novell is after the business desktop.

In the end, I still contend that Microsoft's OEM agreements and monopolistic practices are what is preventing desktop Linux from taking hold. This is the simplest explanation, and I'm not sure it will change anytime soon. The Dell/Ubuntu offering is a good start, but you won't see these laptops in stores, and they are not linked from Dell's main site, which is where the average consumer is likely to buy.

Don't Mess With Your Sysadmin

A funny reminder not to mess with your sysadmin. Reminds me of the BOFH stories.

Thursday, September 13, 2007

Counting Words in Files With HTML Markup

I write blog posts with HTML markup, and I sometimes want to get a fairly accurate word count of my posts. By accurate I mean that HTML tags themselves as well as quoted values are not counted as words. There are lots of utilities and scripts that do word counting, from the venerable Unix 'wc' to an elisp subroutine in the FSF's An Introduction to Programming in Emacs Lisp. The ones I looked at all suffered from the same problem - they counted markup as 'words'. If there was some way to strip out or ignore markup, the various methods of word counting would work.
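To make the problem concrete, here is a minimal shell sketch. The sample fragment and the sed expression are purely illustrative assumptions and not something any of the tools mentioned here use. The sed pattern is a crude stand-in that only copes with tags that open and close on the same line and contain no '>' inside attribute values.

```shell
#!/bin/sh
# Hypothetical sample fragment; any HTML snippet would do.
sample='<p class="intro lead">Hello <a href="http://example.com/">big world</a></p>'

# wc -w splits on whitespace only, so pieces of tags and attributes
# such as '<p' and 'class="intro' are counted as words.
raw_count=$(printf '%s\n' "$sample" | wc -w | tr -d '[:space:]')

# Crude illustration only: replace every <...> run with a space, then count.
# This breaks on tags spanning several lines and on HTML comments.
stripped_count=$(printf '%s\n' "$sample" | sed -e 's/<[^>]*>/ /g' | wc -w | tr -d '[:space:]')

echo "with markup:    $raw_count"       # 6
echo "without markup: $stripped_count"  # 3
```

The fragment contains only three real words, yet wc reports six when the markup is left in. A proper solution needs a real HTML stripper in place of the sed one-liner.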

First I tried a few ready-made utilities. The Unix text-mode browser lynx has a 'dump' option that will output formatted text content from a given html file (lynx -dump -nolist foo.html). However, some of the formatting it adds to that text is itself counted as words by the 'wc' utility. w3m is similar in its output, so it has the same problem. I found a Debian package called unhtml that seemed to do what I wanted, but after experimenting a bit with it, I found that it could not handle multiple opening and closing tags on the same line (it counted them as one tag, meaning any real words in that line were skipped). Thinking I might have to write my own utility, I set out to not reinvent the wheel and did a CPAN search - and had success on the first hit. After a few tests I found that HTML::Strip did indeed handle multiple tags on a line as well as HTML comments and values properly.

The next step was to write a wrapper around HTML::Strip for command line use. After a bit of hacking, I came up with unhtml.pl. From the script header:
Script that strips HTML tags from text. It uses HTML::Strip to do the real work; this is a wrapper around that module that allows you to specify command line arguments - standard input/output is assumed if no args are given. If only one arg is given, it is assumed to be the input pathname.

Requires HTML::Strip (perl -MCPAN -e 'install HTML::Strip' as root on any Unix-based OS will work).

Examples (the following have equivalent results):

unhtml.pl < foo.html > foo.txt
unhtml.pl foo.html > foo.txt
unhtml.pl foo.html foo.txt
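
The argument convention described in the header can be sketched in shell. This only illustrates the calling convention and is not the actual unhtml.pl: the function names unhtml_sketch and strip_tags are hypothetical, and the sed filter is a crude stand-in for HTML::Strip that only handles tags opening and closing on one line.

```shell
#!/bin/sh
# strip_tags: stand-in filter (NOT HTML::Strip); replaces each <...> run
# found on a single line with a space.
strip_tags() {
    sed -e 's/<[^>]*>/ /g'
}

# unhtml_sketch [INPUT [OUTPUT]]
#   no args  -> read standard input, write standard output
#   one arg  -> treat it as the input pathname, write standard output
#   two args -> input pathname, then output pathname
unhtml_sketch() {
    case $# in
        0) strip_tags ;;
        1) strip_tags < "$1" ;;
        2) strip_tags < "$1" > "$2" ;;
        *) echo "usage: unhtml_sketch [input [output]]" >&2; return 2 ;;
    esac
}
```

With these definitions, unhtml_sketch < foo.html > foo.txt, unhtml_sketch foo.html > foo.txt and unhtml_sketch foo.html foo.txt all leave the same text in foo.txt.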


I also needed a way to integrate this into Emacs. Here is an elisp snippet you can put in your .emacs (don't forget to modify the path to the script):
(defun word-count ()
  "Count words in region"
  (interactive)
  ;; Pipe the region through the stripper, then count what is left.
  (shell-command-on-region (point) (mark) "/home/dmaxwell/bin/unhtml.pl | wc -w"))
(global-set-key "\C-c=" 'word-count)

As a bonus, it also handles XML and SGML properly. To use it while editing, just type C-c = to get a word count of the current region (use C-x h to make the region the entire buffer), minus HTML tags.