Saturday, June 10, 2006

Shiny and New: Emacs 22

I finally upgraded to Emacs 22 a few weeks ago, and now I'm wishing I'd braved it sooner. Technically it's not released yet; I'm working from a build of a CVS snapshot from a month or so ago. But the Emacs dev team works pretty hard to make sure it builds cleanly on a whole slew of platforms, so just following their instructions has a pretty good chance of working for you.

It's worth the effort. Truly. Reading through its NEWS file, there's just tons and tons of new functionality. It's going to take me some time, maybe a few weekends, just to absorb it all.

Personally, though, I think there are two features that by themselves justify the entire effort of upgrading: the Unicode and UTF-8 support, and the enhanced replace-regexp command.

International At Last

It's been a very long wait for Unicode and UTF-8 support, and now that I have it, I could never go back. There isn't much to say about it, except that it works. Seamlessly. It used to be hard to get international characters into and out of Emacs, because it had its own custom way of dealing with them. Now it's a snap.

In fact — here, I'll show ya. If you type C-h h, it brings up the HELLO file, which contains greetings in a variety of languages. Here's some Chinese: 中文,普通话,汉语. Here's some Korean: 안녕하세요, 안녕하십니까. Here's some Russian: Здравствуйте!

I'm not doing anything special; I'm just copying the strings out of the HELLO buffer and into my html buffer, and saving the file. I added the content-type header line in this HTML file, and all the characters just show up effortlessly in Firefox.

If you can't see them in your browser, well... Firefox is free. Or it might be a font problem on your system. As far as I'm concerned, any problems you may have viewing them are no longer the fault of my Emacs session, which makes me Happy. Speaking as a developer who needs to internationalize every program I write, I can't begin to tell you how useful it's been to have seamless editing of UTF-8 encoded files for the past month.

Right there, that feature alone is worth the upgrade.

But wait, there's more... Even though Emacs 22 has a bunch of noteworthy and exciting new features, blah blah blah, I'm going to blithely ignore them all today and focus with single-minded zeal on just one feature. It has a teeny tiny entry in the NEWS file; it's barely mentioned, really. I'm sure it was a thousand times less work than the UTF-8 support, but even so, it might well be strong enough on its own to justify the (moderate) pain of upgrading from Emacs 21.

Take a look at my examples and see if you agree!

Replacement Super-Powers

Emacs 22 sports an amazing new editing feature that's had me drooling in anticipation since I first heard about it, maybe six or eight months ago. As you can well imagine, that's a lot of drool.

And what might the feature be, you ask? Well, they've enhanced M-x {query-}replace-regexp to accept Lisp expressions to be evaluated in the replacement string.

That might not seem like a big deal, so let's run through some examples, from simple to very fancy.

Example: Changing Case in Replacement Strings

Have you ever wanted to change the case of certain letters in the replacement string for M-x replace-regexp? It used to be a real pain; you either had to write a Lisp function or fix them all by hand. Now it's trivial.

As a simple demonstration, let's say you have a list of names that you need capitalized, like so:

bob
sue
ralph
alice
jimmy
preston
billy joe jim bob

It's a contrived example, since Emacs already has M-x capitalize-region. Or you could use C-u 10 M-x capitalize-word. But let's try it with the new replace-regexp evaluation feature to see how it works.

It's just like a normal M-x {query-}replace-regexp, but you'll prefix any lisp expressions in the replacement string with the sequence `\,' (i.e., a backslash and a comma). In this case, we match the whole word, and invoke the Emacs-Lisp function `capitalize' to capitalize the word we just matched:

M-x replace-regexp
Replace regexp: \(\w+\)
Replace regexp with: \,(capitalize \1)

and we wind up with each word capitalized, just like we wanted:

Bob
Sue
Ralph
Alice
Jimmy
Preston
Billy Joe Jim Bob
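
If it helps to see the mechanism in more familiar terms: plenty of languages offer the same trick through a regexp-replace function that takes a callback instead of a fixed string. Here's a rough analogue in Python (a comparison sketch, not Emacs code), where `re.sub' calls the lambda once per match and inserts whatever string it returns, much as `\,(capitalize \1)' does:

```python
import re

NAMES = """bob
sue
ralph
alice
jimmy
preston
billy joe jim bob"""

def capitalize_words(text: str) -> str:
    # The lambda plays the role of \,(capitalize \1): it receives each
    # match and returns the replacement string for that match.
    return re.sub(r"(\w+)", lambda m: m.group(1).capitalize(), text)

print(capitalize_words(NAMES))
```

Like the Emacs `capitalize' function, Python's `str.capitalize' also lowercases the rest of the word, so "BOB" comes out as "Bob" either way.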

Unlike the capitalize-{word/region} commands, which have hardwired behavior, the {query-}replace-regexp commands give us tremendous flexibility. For instance, we could have capitalized the last letters of the names instead, by splitting each word into two groups, with the second group matching just the last character. Then it's a simple matter to reconstruct the word with the last letter capitalized:

M-x replace-regexp
Replace regexp: \(\w+\)\(\w\)
Replace regexp with: \1\,(capitalize \2)

...to get the reverse capitalization we wanted:

boB
suE
ralpH
alicE
jimmY
prestoN
billY joE jiM boB
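
The same two-group split carries over directly to a Python comparison sketch: the greedy first group takes everything but the last word character, and the callback upcases the second group before gluing the two back together.

```python
import re

def capitalize_last_letter(text: str) -> str:
    # Group 1 is greedy, so it swallows all but the final word character;
    # group 2 is that last character, which gets upcased.
    return re.sub(r"(\w+)(\w)", lambda m: m.group(1) + m.group(2).upper(), text)

print(capitalize_last_letter("bob\nbilly joe jim bob"))
```

As in Emacs, a one-letter word such as "a" won't match at all, since the pattern needs at least two word characters.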

For a somewhat more realistic example, let's say you've defined some "getter" functions in a Java class, like so:
public Relative father() { return this.father; }
public Relative mother() { return this.mother; }
public Relative sister() { return this.sister; }
public Relative brother() { return this.brother; }
public Relative auntie() { return this.auntie; }
public Relative uncle() { return this.uncle; }
...
and your code reviewer wants you to prepend the word "get" to each of them.

Well, this is a classic refactoring situation ("rename method"), but you're going to have to invoke the refactoring manually for each method, and you may have hundreds of them.

If these methods have been around a while, and they're being referenced by many external callers, then you're safest using a refactoring tool. You may even consider writing a one-off refactoring script, perhaps in Jython or Mozilla Rhino, that makes programmatic use of either your IDE's refactoring APIs or a lower-level tool such as ANTLR or JavaCC.

Whew! That's going to be a lot of work, no matter how you slice it. And if you've published your APIs externally, then you're screwed; all you can do is mark the old names @Deprecated and hope people stop using them someday.

But that's why you get your code reviews done early, right? In many real-world situations, you're performing a rename-method on a new class that has no external callers yet. And in those situations, Emacs 22 will get the job done far faster than a refactoring IDE can.

In this case, we'd just do a straightforward replacement with capitalization, similar to the one in our last example, like so:
M-x replace-regexp
Replace regexp: \(public Relative \)\(\w\)\(\w+\)
Replace regexp with: \1get\,(capitalize \2)\3
Et voilà: <-- (p.s.: C-x 8 ` a gives you that neat 'à' character.)
public Relative getFather() { return this.father; }
public Relative getMother() { return this.mother; }
public Relative getSister() { return this.sister; }
public Relative getBrother() { return this.brother; }
public Relative getAuntie() { return this.auntie; }
public Relative getUncle() { return this.uncle; }
...
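
For comparison, here's the same rename as a Python `re.sub' callback. It's a sketch that assumes, exactly as the Emacs regexp does, that every getter starts with the literal text "public Relative ". Like the Emacs version, it renames only the declarations and doesn't touch any callers, which is the caveat discussed above.

```python
import re

JAVA = """public Relative father() { return this.father; }
public Relative mother() { return this.mother; }"""

def add_get_prefix(source: str) -> str:
    # Mirrors \1get\,(capitalize \2)\3: keep the prefix, insert "get",
    # upcase the first letter of the method name, keep the rest as-is.
    return re.sub(
        r"(public Relative )(\w)(\w+)",
        lambda m: m.group(1) + "get" + m.group(2).upper() + m.group(3),
        source,
    )

print(add_get_prefix(JAVA))
```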
Even if you do most of your coding in the comfy confines of a visual IDE, it can be awfully handy to keep Emacs around for your fine-grained text surgery.

Now we can move on to some more interesting examples, so you can feel you got your money's worth out of today's blog entry. But first...

A Note About Emacs Regexps

Emacs regular expression syntax is very old, predating Perl 5's fancier regex syntax by almost a decade. Perl's regexp enhancements are the de facto standard, supported in virtually all major programming languages. Unfortunately, nobody has ever seen fit to retrofit poor Emacs with an alternate Perl-compatible regexp syntax, so Emacs regexps are now nonstandard and a bit awkward. Here are a few of the noteworthy differences:
  • You have to escape the (, ), {, }, and | metacharacters. That is, they're not metacharacters by default — without the backslash, they match themselves.

  • There's no '\d' shortcut for the [0-9] character class.

  • There are no lookahead or lookbehind assertions.

  • There are no direct equivalents for Perl's {n}?, {n,}?, {n,m}?, /i, /m, /s, /x, \G, or (?# ...) constructs.
But Emacs still supports some of the constructs you've come to expect:
  • You can specify exactly N repetitions with `\{N\}', and between N and M repetitions (inclusive) with `\{N,M\}'. E.g. [0-9]\{3\}-[0-9]\{4\} matches a 7-digit phone-number in the format xxx-xxxx.

  • You can have backreferences to previously matched groups in the regular expression. E.g. \(re\).*\1 matches words with "re" appearing at least twice, like carefree and première.

  • You can specify "shy" groups that don't record the match with \(?: ... \).
There are also some Emacs-specific enhancements, such as matchers for entries in the mode-specific syntax tables. The Info pages have more details on Emacs regular expressions.
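
To make the differences concrete, here are the constructs from the two lists above spelled in the Perl-style syntax most other languages use, via Python's `re' module (which follows Perl closely, though not identically). The comments show the Emacs spelling of each pattern; the lookahead has no Emacs 22 equivalent at all.

```python
import re

# Emacs: [0-9]\{3\}-[0-9]\{4\}   (braces must be escaped in Emacs)
PHONE = re.compile(r"\d{3}-\d{4}")

# Emacs: \(re\).*\1               (parens must be escaped in Emacs)
TWO_RES = re.compile(r"(re).*\1")

# Emacs: \(?:ab\)+                (shy group: groups without recording)
SHY = re.compile(r"(?:ab)+")

# Perl-style lookahead, missing from Emacs 22: "foo" only if "bar" follows
LOOKAHEAD = re.compile(r"foo(?=bar)")

assert PHONE.fullmatch("555-1234")
assert TWO_RES.search("carefree") and TWO_RES.search("première")
assert SHY.fullmatch("ababab") and SHY.groups == 0  # nothing recorded
assert LOOKAHEAD.search("foobar") and not LOOKAHEAD.search("foobaz")
```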

If you plan to be more than a casual Emacs user, you should study the regexp syntax carefully, because there are many useful commands in Emacs that operate on regular expression matches. The better you know the Emacs-specific syntax, the more productive you'll be.

I suppose before we move on to the next example, I should preemptively answer one of the most frequently asked questions about Emacs regexps.

Q: How do I embed a newline in a regexp I'm typing into the minibuffer?

A: You use the key sequence C-q C-j. The C-q invokes the Emacs `quoted-insert' command, which basically says "insert the next character literally, without invoking any commands with it." C-j (i.e., control-j) is how a newline character is represented in Emacs.

C-q is a useful general-purpose Emacs command. Whenever you want to insert a character (in the minibuffer or a regular buffer), and it's just refusing to go in, C-q <char> will almost always do the trick.

Example: Numbering Lists

With our new replacement-with-evaluation feature, it becomes straightforward to create numbered lists. Emacs 22 has introduced a new backreferencing metacharacter, `\#', which counts the number of replacements we've done so far in the current command. So even without using any Lisp, we already have one way to make numbered lists.

Let's see... we'll need a short list of words as an example. How about all the words in /usr/share/dict/words that don't end in [a-z]? Easy enough to find out. We M-x find-file /usr/share/dict/words (only if you're on a Unix system, of course, and the location varies), and then M-x list-matching-lines [^a-z]$. Ah, perfect: our *Occur* buffer shows 32 matches:

1987:Bogotá
5243:Fabergé
9772:Mallarmé
12044:Paraná
12499:Poincaré
16956:abbé
19923:appliqué
20932:attaché
23704:blasé
26223:café
26511:canapé
29314:cliché
31431:consommé
38981:décolleté
42995:fiancé
43623:flambé
44996:frappé
48317:habitué
58328:macramé
58898:manqué
62514:naiveté
65243:outré
66710:passé
71609:protégé
73675:recherché
76387:risqué
76847:roué
77811:sauté
82455:soufflé
89055:touché
96268:émigré
96274:études

They're prefixed by their line number, but we can make that disappear during the replacement. Let's turn them into a numbered list. First copy the matches into a new, writable buffer, then M-< to go to the top of the list, and then:
M-x replace-regexp
Replace regexp: \(.+:\)
Replace regexp with: \#. 
Boom!
0. Bogotá
1. Fabergé
2. Mallarmé
3. Paraná
4. Poincaré
5. abbé
6. appliqué
7. attaché
8. blasé
9. café
10. canapé
11. cliché
12. consommé
13. décolleté
14. fiancé
15. flambé
16. frappé
17. habitué
18. macramé
19. manqué
20. naiveté
21. outré
22. passé
23. protégé
24. recherché
25. risqué
26. roué
27. sauté
28. soufflé
29. touché
30. émigré
31. études

Oooh, but only Computer Science students like lists numbered from zero. So let's M-x undo (using C-/, of course; everyone uses C-x u, but that's way too many keystrokes for something as common as Undo!) and use a tiny bit of Lisp to start the numbering at 1.

It so happens that Emacs-Lisp defines a function called 1+, which increments a number. So we can just wrap the `\#' in our replacement string with that function, like so:

\,(1+ \#).  <-- (There's a trailing space after the ".")

The result is just what we wanted:
1. Bogotá
2. Fabergé
3. Mallarmé
4. Paraná
5. Poincaré
6. abbé
7. appliqué
8. attaché
9. blasé
...

The Lisp 1+ function operates on numbers, not strings, so you might have expected it to barf with a wrong-type-argument error. Our example works because the \# metacharacter returns the number of replacements made so far, which is a number, not a string. In our next example, we'll have to do the conversion ourselves.
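
Python's `re.sub' has no built-in `\#', but a callback can keep its own counter. In this comparison sketch, `number_lines' is a hypothetical helper whose counter starts at 0 like `\#'; adding `start' to it mirrors `\,(1+ \#)'. Because the counter is already an integer, no conversion is needed, just as in the Emacs version.

```python
import re
from itertools import count

OCCUR_LINES = """1987:Bogotá
5243:Fabergé
9772:Mallarmé"""

def number_lines(text: str, start: int = 1) -> str:
    # replacements_so_far plays the role of \#: 0 for the first match,
    # 1 for the second, and so on. It yields ints, not strings.
    replacements_so_far = count()
    # `.` never matches a newline, so `.+:` eats just "1987:" on each line.
    return re.sub(r".+:", lambda m: f"{next(replacements_so_far) + start}. ", text)

print(number_lines(OCCUR_LINES))           # numbered from 1
print(number_lines(OCCUR_LINES, start=0))  # like plain \#
```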

Example: Re-numbering Lists

We can use a Lisp snippet similar to our previous one to renumber an existing list. Let's say we want to insert a word in a numbered list, like so:
1. Bogotá
2. Fabergé
3. Flambé     <-- (We inserted this one. Nice word, eh?)
3. Mallarmé
4. Paraná
5. Poincaré
6. abbé
7. appliqué
8. attaché
9. blasé
...

Easy to fix in Emacs 22: place the cursor just after the word we inserted, and M-x replace-regexp `^\([0-9]+\)' with `\,(1+ (string-to-int \1))'. The result:
1. Bogotá
2. Fabergé
3. Flambé
4. Mallarmé
5. Paraná
6. Poincaré
7. abbé
8. appliqué
9. attaché
10. blasé
...

This time we used a numbered backreference (\1), which always returns a string. Don't be fooled by the fact that we appear to be matching a number: the regexp `^\([0-9]+\)' matches a string containing numeric digits, and we have to do a type conversion (using the Emacs-Lisp builtin function string-to-int) if we want to increment it.
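
The string-versus-number distinction shows up in Python too. In this comparison sketch, `renumber_after' is a hypothetical helper that leaves everything before a given position alone (standing in for "put the cursor just after the inserted word") and increments every line-leading number after it. The captured digits are a string, so they need an explicit `int()', the counterpart of `string-to-int'; adding 1 to the string directly would raise a TypeError, much as `1+' signals wrong-type-argument when handed a string.

```python
import re

LIST = """1. Bogotá
2. Fabergé
3. Flambé
3. Mallarmé
4. Paraná"""

def renumber_after(text: str, start_pos: int) -> str:
    def bump(m):
        if m.start() < start_pos:
            return m.group(0)  # before "point": leave the number alone
        # The captured digits are a string; convert before incrementing.
        return str(int(m.group(1)) + 1)
    # re.MULTILINE makes ^ match at the start of every line.
    return re.sub(r"^([0-9]+)", bump, text, flags=re.MULTILINE)

point = LIST.index("Flambé") + len("Flambé")
print(renumber_after(LIST, point))
```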

I hope by now you're beginning to suspect that knowing a little Emacs-Lisp can help you immensely with your editing tasks. Believe it! (Not surprisingly, knowing a lot of Emacs-Lisp helps even more.)

Example: Alphabetically Numbered Lists

Let's say we have a list of 26 or fewer items, and we want to "number" it with A, B, and C rather than 1, 2, and 3.

Well, shoot. The list we've been using has 32 items. I'd prefer a list of 26 (or so) for this example.

Let's see... we can write a wee Lisp function to count how many words in /usr/share/dict/words end in each letter from a to z:
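(require 'cl)  ; Emacs 22: provides loop, incf and cl-prettyprint
(cl-prettyprint
 (with-current-buffer "words"
   (save-excursion
     (loop for c from ?a to ?z
           collect (let ((i 0)
                         (tail (string c)))
                     ;; Count the lines in "words" that end in TAIL.
                     (goto-char (point-min))
                     (while (re-search-forward (concat tail "$") nil t)
                       (incf i))
                     (cons tail i))))))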

We just evaluate this snippet in our *scratch* buffer (by typing C-j after the last paren). It crunches the words buffer and produces:
(("a" . 1625)
("b" . 167)
("c" . 772)
("d" . 8331)
("e" . 7190)
("f" . 191)
("g" . 7401)
("h" . 991)
("i" . 457)
("j" . 4)
("k" . 784)
("l" . 2041)
("m" . 920)
("n" . 4347)
("o" . 718)
("p" . 450)
("q" . 5)
("r" . 4279)
("s" . 44857)
("t" . 4454)
("u" . 140)
("v" . 46)
("w" . 247)
("x" . 182)
("y" . 5519)
("z" . 124))

Hmmm... nothing really promising. We only get 9 words if we combine the ones ending in "q" and "j". A minor tweak to our search function will show us words ending with a doubled letter:
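(require 'cl)  ; Emacs 22: provides loop, incf and cl-prettyprint
(cl-prettyprint
 (with-current-buffer "words"
   (save-excursion
     (loop for c from ?a to ?z
           collect (let ((i 0)
                         (tail (concat (string c) (string c))))
                     ;; Count the lines ending in the doubled letter TAIL.
                     (goto-char (point-min))
                     (while (re-search-forward (concat tail "$") nil t)
                       (incf i))
                     (cons tail i))))))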

Evaluating it gives us:
(("aa" . 2)
("bb" . 3)
("cc" . 1)
("dd" . 5)
("ee" . 136)
("ff" . 75)
("gg" . 4)
("hh" . 0)
("ii" . 4)
("jj" . 0)
("kk" . 0)
("ll" . 291)
("mm" . 2)
("nn" . 38)
("oo" . 30)
("pp" . 4)
("qq" . 0)
("rr" . 13)
("ss" . 1276)
("tt" . 58)
("uu" . 1)
("vv" . 0)
("ww" . 0)
("xx" . 0)
("yy" . 0)
("zz" . 9))

Looks like we have more options with this list. Maybe if we just take the tails with a nonzero count of 5 or less... that's nine tails, covering 26 words. Perfect!
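
The arithmetic is easy to double-check. This small Python sketch uses the doubled-letter counts printed above (zero entries left out), keeps the tails with between 1 and 5 words, and builds the same alternation that gets handed to list-matching-lines:

```python
# Doubled-letter counts from the table above (zero entries left out).
counts = {
    "aa": 2, "bb": 3, "cc": 1, "dd": 5, "ee": 136, "ff": 75, "gg": 4,
    "ii": 4, "ll": 291, "mm": 2, "nn": 38, "oo": 30, "pp": 4,
    "rr": 13, "ss": 1276, "tt": 58, "uu": 1, "zz": 9,
}

# Keep only the tails with between 1 and 5 matching words.
small = {tail: n for tail, n in counts.items() if 0 < n <= 5}
print(sorted(small))        # ['aa', 'bb', 'cc', 'dd', 'gg', 'ii', 'mm', 'pp', 'uu']
print(sum(small.values()))  # 26

# The same selection as an Emacs-syntax regexp (escaped parens and bars).
emacs_regexp = r"\(" + r"\|".join(sorted(small)) + r"\)$"
print(emacs_regexp)         # \(aa\|bb\|cc\|dd\|gg\|ii\|mm\|pp\|uu\)$
```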

We could write a little more code to extract the words matching our criteria, but it's clearly going to be fastest to eyeball it. So we call M-x list-matching-lines with the regexp \(aa\|bb\|cc\|dd\|gg\|ii\|mm\|pp\|uu\)$, and we get our list of 26 words:
2145:Bragg
3436:Cobb
5662:Fromm
6284:Gregg
6317:Grimm
6675:Hawaii
8025:Judd
8290:Kellogg
8411:Kidd
8509:Knapp
8645:Krupp
8689:Kwanzaa
8804:Lapp
12549:Pompeii
15425:Todd
16257:Webb
16641:Yacc
17702:add
21427:baa
39145:ebb
39372:egg
46305:genii
62411:muumuu
64283:odd
72801:radii
78139:schlepp

And what a fine bunch of words they are. Just try doing that exercise in Java or C++ sometime.

In any case, now we have a list for our example, and we want to number it alphabetically. So we need to replace all the cruft up through each ':' with a counter converted to an alphabet character.

As we saw in our little function that produced this word list, Emacs uses `?c' syntax to represent characters, and internally they're just ints. So we just add the character `?a' to our `\#' counter this time, to loop through the characters 'a' to 'z':
M-x replace-regexp
Replace regexp: ^\(.+:\)
Replace regexp with: \,(+ ?a \#))

And, ladies and gentlemen... behold!
97) Bragg
98) Cobb
99) Fromm
100) Gregg
101) Grimm
102) Hawaii
103) Judd
104) Kellogg
105) Kidd
106) Knapp
107) Krupp
108) Kwanzaa
109) Lapp
110) Pompeii
111) Todd
112) Webb
113) Yacc
114) add
115) baa
116) ebb
117) egg
118) genii
119) muumuu
120) odd
121) radii
122) schlepp

D'oh!!!! I forgot to convert the counter back to a character. Haha. Oops.

After a quick C-/ to undo the operation, we can just change the replacement string to `\,(string (+ ?a \#))) ', and we finally have our alphabetically-enumerated word list:
a) Bragg
b) Cobb
c) Fromm
d) Gregg
e) Grimm
f) Hawaii
g) Judd
h) Kellogg
i) Kidd
j) Knapp
k) Krupp
l) Kwanzaa
m) Lapp
n) Pompeii
o) Todd
p) Webb
q) Yacc
r) add
s) baa
t) ebb
u) egg
v) genii
w) muumuu
x) odd
y) radii
z) schlepp

Or we could get capital letters by using `\,(upcase (string (+ ?a \#)))) ' instead.
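
The same int-versus-character trap exists elsewhere. In this Python comparison sketch (the helper name `letter_list' is made up), `ord' and `chr' play the roles of `?a' and `string'. Forget the `chr' and Python raises a TypeError when you try to glue an int onto ") ", rather than quietly printing 97.

```python
import re
from itertools import count

def letter_list(text: str, upper: bool = False) -> str:
    counter = count()  # 0, 1, 2, ... like \#

    def label(m):
        # ord("a") + n is an int (97, 98, ...); chr() converts it back to a
        # one-character string, the job `string' does in the Emacs version.
        letter = chr(ord("a") + next(counter))
        return (letter.upper() if upper else letter) + ") "

    return re.sub(r"^.+:", label, text, flags=re.MULTILINE)

words = "2145:Bragg\n3436:Cobb\n5662:Fromm"
print(letter_list(words))
print(letter_list(words, upper=True))
```

As with the Emacs version, this only makes sense for 26 items or fewer; a 27th item would be labeled with "{", the character after "z".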

Note that M-x replace-regexp has its own command history list, so you can just use up-arrow to fetch old regexps you've entered, and tweak them in place. Easier than re-entering them from scratch every time.

Some Even Snazzier Examples

So far we've used this amazing little new feature to generate (and renumber) various lists, and to change the capitalization of the replacement text on the fly (in two different ways). Both very practical and useful transformations.

In our next example, we'll assume you're working on a Java-based Web application, because your company is too lame to let you use Ruby on Rails. Hypothetically speaking, of course.

Suppose you have some JSP files containing references to various static images, e.g. <img src="images/foo_bar.gif">, and you decide you want to change them to calls into Java code to fetch the image URLs as the page is composed. So "images/foo_bar.gif" needs to change to (say) <%= StaticImageManager.FOO_BAR_GIF.getUrl()%>.

Well, clearly no fancy-pants refactoring IDE on the planet is going to be able to help you with this. If you're an Eclipse or IntelliJ or Visual Studio user, get ready for some carpal tunnel while you manually change every instance.

However, if you've followed the examples so far, you know it's trivial in Emacs 22:
M-x replace-regexp
Replace regexp: "images/\([a-z_]+\)\.\(gif\|jpg\)"
Replace regexp with: <%= StaticImageManager.\,(upcase (concat \1 "_" \2)).getUrl() %>

and they're all fixed in the blink of an eye.
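
If you ever need the same rewrite in a build script rather than an editor session, a Python `re.sub' callback is a close analogue. This is a sketch reusing the hypothetical `StaticImageManager' naming from the example above. As with the Emacs regexp, the double quotes are part of the match, so they disappear along with the path.

```python
import re

def jspify_images(jsp: str) -> str:
    def to_scriptlet(m):
        # FOO_BAR_GIF comes from upcasing "foo_bar" + "_" + "gif".
        constant = (m.group(1) + "_" + m.group(2)).upper()
        return "<%= StaticImageManager." + constant + ".getUrl() %>"
    return re.sub(r'"images/([a-z_]+)\.(gif|jpg)"', to_scriptlet, jsp)

print(jspify_images('<img src="images/foo_bar.gif">'))
```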

But you knew that by now. This example wasn't more complex than the others, just a little longer.

It starts to get even more interesting if you permit side effects in your lisp expressions. That is to say, persistent changes to the world, whether it's Emacs variables, your buffer configuration, or even your filesystem. You have to be a bit more careful, but you can use the new replace-regexp eval feature as a powerful interactive scripting engine.

Our last example will be opening files. Often you'll find yourself looking for files using the Unix `find' command in a shell. But what if you want to open the files it turned up?

Again, it's a slightly contrived example, because it's already possible to use Unix shell commands and the "emacsclient" program to instruct Emacs to open the files you find. But it should suffice to show you what we mean by "side-effecting replacements".

A simple example should work. Let's go to our installed emacs lisp directory. Mine's /usr/share/emacs/22.0.50/lisp, which I found by looking at my `load-path' variable in my *scratch* buffer. There are various subdirectories, including textmodes/, progmodes/, and others.

To have Emacs open (say) all the elisp files beginning with the letter `x', we M-x shell, cd to /usr/share/emacs/22.0.50/lisp, and use the `find' command:
/usr/share/emacs/22.0.50/lisp>find . -name "x*.el"
./progmodes/xscheme.el
./term/x-win.el
./term/xterm.el
./obsolete/x-apollo.el
./obsolete/x-menu.el
./x-dnd.el
./xml.el
./xt-mouse.el

If you select the lines naming the 8 files above (with transient-mark-mode on, so that the region is active), then M-x replace-regexp will operate only within the selected region. Opening the selected files is then one easy command:
M-x replace-regexp
Replace regexp: .+
Replace regexp with: \,(find-file-noselect \&)

The files are silently opened in the background when you execute this "replacement" command. We could alternately have used `find-file' to watch them opened noisily in the foreground, but when opening lots of files I personally prefer to open them in the background. (This also makes them appear at the bottom of your buffer-list.)

Note that we used a new metacharacter here, `\&', which stands for the entire matched string. That means we were able to omit the grouping parens. We also relied on the fact that `.' in an Emacs regexp matches any character except a newline, so `.+' matches exactly one line, not counting the newline character. Pretty convenient!

After the replacement, the lines in your *shell* buffer are replaced with the printed return values of the calls to `find-file{-noselect}', which in this case are the buffers visiting each file. But we don't really care, since it's a shell buffer; we were doing it purely for the side effect and not for the replacement.
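
Callbacks with side effects work the same way outside Emacs. In this Python comparison sketch, `open_in_background' is a made-up stand-in for `find-file-noselect': it only records each path in a list rather than visiting a file. The callback still has to return a string, which becomes the replacement text; here it returns the path unchanged, so the text stays put and only the side effect remains.

```python
import re

opened = []  # stands in for the Emacs buffer list

def open_in_background(path: str) -> str:
    # Hypothetical stand-in for find-file-noselect: record the path
    # instead of actually visiting the file.
    opened.append(path)
    return path  # the replacement text: unchanged

FIND_OUTPUT = """./progmodes/xscheme.el
./term/x-win.el
./xml.el"""

# `.` never matches a newline, so each `.+` match is exactly one line.
result = re.sub(r".+", lambda m: open_in_background(m.group(0)), FIND_OUTPUT)
print(opened)
```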

Armed with your new-found knowledge, replace-regexp and query-replace-regexp should become some of the most powerful tools in your editing toolchest. The more experience you have with Emacs regular expressions, and with Emacs Lisp, the more bang for your buck you'll get out of this enhancement.

Example: Heading-Tag Promotion/Demotion

Oh, OK, fine. One laaaaaast example, because I just ran into it as I was putting in my final edits. You know how most browsers like to render <h1> tags in 10-foot tall letters? So we all start with <h2> or even <h3> tags and work down from there? (Those of us too lazy to muck with CSS overly much, that is. Which is most of us.)

Well, I started my blog entry today with <h1> tags, and decided to bump them all up a number (and thus down in size).

I used to do this operation with N successive replacements, with N ranging from 2 to 5. You replace the <h5>'s with <h6>, then the <h4>'s with <h5>, and so on. No more of that hooey for me! I can renumber them all with a single replacement. You guessed it:
M-x replace-regexp
Replace regexp: <\(/?\)h\([0-9]\)>
Replace regexp with: <\1h\,(1+ (string-to-int \2))>
Zoom, zoom, zoom! All fixed in one swell foop. That's just awesome.
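
Why does one pass suffice? Every tag is rewritten in a single forward scan, so a heading that just became <h2> is never visited again and nothing cascades. Here's the same replacement as a Python comparison sketch, with the same string-to-number conversion before incrementing:

```python
import re

def bump_headings(html: str) -> str:
    # Group 1 keeps the optional "/" of a closing tag; group 2 is the level.
    # The digit arrives as a string, so convert it before adding 1.
    return re.sub(
        r"<(/?)h([0-9])>",
        lambda m: "<" + m.group(1) + "h" + str(int(m.group(2)) + 1) + ">",
        html,
    )

print(bump_headings("<h1>Shiny and New: Emacs 22</h1>\n<h2>Caveats</h2>"))
```

One caveat applies to both versions: an existing <h6> would become <h7>, which isn't an HTML heading, so check for those first.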

There's also a new query-replace-regexp-eval function, but it's not all that different from what we've talked about here, so you can read about it when you upgrade.

You are going to upgrade now, right? Well, it's your choice, it's your time. Gotta spend time to save time, as the old saying (almost) goes.

But I think it was worth it.

How do I learn this Emacs-Lisp doohickey, anyway?

It's really not too hard. Honest. Especially since I'm going to tell you some things that will make it much easier on you, because Emacs Lisp is pretty different from other languages you're used to. If you keep these points firmly in mind, learning it will be a snap, really. They're all things I wish someone had told me when I started learning elisp.

It'll Always be Useful

First, recognize that Emacs Lisp isn't going anywhere. Emacs is not going to magically become programmable in Python or Ruby or JavaScript or Perl overnight. (Or, God save us, Java or C++ or C# — all fine languages, to be sure, but they all suck at scripting.)

And judging from the last decade's pace of innovation in editing and coding environments, it will be many years before any editor begins to approach Emacs in the things Emacs does well. [The one noteworthy exception is VIM, which is also very powerful by all accounts, though I have no experience with it. If you have already developed a preference for vi over emacs, then you may experience greater happiness pursuing expertise with VIM. Psh.]

There do, in fact, exist packages that make it possible to write Emacs extensions in Python (PyMacs), Ruby (El4r), and Perl (EPL). But they're far from seamless: they're hard to install and they're hard to learn. They will only appeal to you (maybe) if you're a truly die-hard programmer in one of those languages, and you already know a fair amount of emacs-lisp, because they're closely tied to the elisp programming model. You will still need to know about buffers, overlays, markers, plists, symbols, and all the other Emacs-Lisp abstractions. And you'll have to deal with the sometimes complex mapping between language X and Emacs-Lisp.

Plus, if you want to share your extensions with your friends, they'll have to go through the install process as well. I just wouldn't go there, not if you're trying to learn how to get better with Emacs. If you're an expert in both languages, then sure. The package authors could use your help.

In the meantime, Emacs isn't going anywhere, and Emacs-Lisp isn't going anywhere, not for several decades at least, so it will benefit you to learn them deeply. It will never be obsolete knowledge. You might as well start learning it now, and reap the benefits now.

It's Kinda Like XML

You can think of Emacs Lisp as being very much like XML. There are some differences that will become apparent as you use it, but thinking of it as XML will help a lot.

In most programming languages, it's good style to avoid deeply indented code. With XML, indentation depth is entirely a function of your data domain; you don't generally think about restructuring it to avoid indentation. (Think of XHTML, for instance, in which the nesting can become arbitrarily deep.) With XML, you're building a tree structure, and it's easy to see it that way. Your XML processing tools help you manage navigating your way around complex documents.

With Lisp, you're also building an explicit tree structure. That means it's going to be indented very differently from your C/Java/Perl/Python code. The indentation is less something you decide, and more something that's decided for you based on the approach you take to the problem.

The perpetual indentation used to drive me nuts, but I've come to appreciate its advantages. I won't go into them here, but given what you know about XML, I'm sure you can imagine a few benefits without too much effort.

It Gets Better With Practice

If you do it enough, eventually you'll enjoy programming in Emacs-Lisp, no matter how much you hate it initially. You'll probably never love it, and you'll pine for a more powerful Lisp dialect, and for features from other languages you know. But like any other programming language, it becomes way more fun as you go from beginner to expert.

Programmers really hate new syntax. Most people find it harder to learn new syntax than to learn new Design Patterns or APIs or frameworks. So you'll initially dislike Emacs-Lisp's syntax; it's virtually guaranteed. Fortunately, it doesn't really have much in the way of syntax; almost everything follows the exact same s-expression form. So you should get past the syntax pretty quickly, and in a few weeks you'll start liking it just fine.

It's Oddly "Zippy"

Emacs-Lisp uses a radically different programming model from other languages. There's a strong (nearly 1:1) correspondence between what you can do in the editor and what you can do in the language. Writing elisp code is very much like scripting your actions in the editor.

For instance, to get the length of the current line (assuming the function doesn't exist), your code will remember where you are, then move the cursor to the beginning of the line, get the buffer position, jump the cursor to the end of the line, get the buffer position there, subtract the two buffer positions, return the cursor to where you were, and then finally return the value. That's a lot of moving around! [You can also use point-at-bol and point-at-eol, but in general, you still zip around a lot.]

As another example, if you wanted to determine whether any lines in the buffer started with a particular regexp, then you'd go look at them! You don't call a function that returns a list of lines in the buffer (though you could write one). You just save your position, go to the beginning of the buffer, and call forward-line and looking-at to check each line against the regexp. It's like you've got a little worker-bee version of yourself, doing automatically what you could have done by hand (albeit much more slowly) using editor commands.

Over the years, they've piled up thousands of shortcut functions, and much of the time you wind up programming "normally" by invoking functions on data structures, like in other languages. But it really helps to remember that little worker bee that zips around like Feynman's lonely electron. Your coding will go more smoothly if you keep it in mind.

You can Learn From its Peers

Lastly, it's useful to know that Emacs-Lisp has a lot in common with two other famous Lisp dialects: Common Lisp and Scheme. It's arguably closer to Common Lisp, and in many ways it's inferior to both of them, but learning a little about Common Lisp or Scheme will improve your Emacs-Lisp coding dramatically. And, as it happens, there are far more books published about Common Lisp and Scheme than there are about Emacs Lisp. I'll list a few of my favorites here.

Emacs-Lisp Books

You should start with Richard M. Stallman's book. He wrote Emacs, and he wrote the GNU Emacs Manual. It's a classic, and probably remains the best book on Emacs to date. It's a good idea to read it just to get an overview of all the things Emacs can do out of the box. Otherwise it'll be hard to know what kinds of Emacs-Lisp programs you can write.

Next, you'll want the all-time classic, Mastering Regular Expressions, by Jeffrey Friedl. Don't leave home without it.

Starting with Emacs 22, the Emacs-Lisp Reference Manual comes bundled with the distribution. I don't know if it's sold in hardcopy anymore, but it's chock-full of critically important information, so you'd do well to read it. (And re-read it periodically. It's a lot of information.)

I do have a couple of books on Emacs Lisp:

An Introduction to Programming in Emacs Lisp (Robert Chassell)
Writing GNU Emacs Extensions (Bob Glickstein)

They're not bad, but I honestly never got much from them. However, your mileage may vary. Go to Amazon, peek through them a bit, and decide for yourself whether they'll be helpful.

Common Lisp Books

There are lots — lots and lots — but these are the ones I personally found most directly relevant to helping me learn Emacs-Lisp:

ANSI Common Lisp (Paul Graham)
On Lisp (Paul Graham) — out of print, but available as a PDF. I printed it out and bound it at FedEx/Kinko's.

I own (and have read) essentially all of the other books on Common Lisp in print today, and the two above got me the furthest towards Emacs-Lisp proficiency. I'm not counting Peter Norvig's AI books, since their focus is AI, not Lisp. They're awesome books, though; I recommend them both highly.

And if you're actually trying to learn Common Lisp to use it (as opposed to applying what you can of it to Emacs), then you'd better get a copy of Peter Seibel's book, Practical Common Lisp. It's essential.

Scheme Books

Again, lots to choose from, and Scheme books tend to be more didactic, so I found they had a bigger impact in terms of ingraining the core ideas of Lisp. Listed in decreasing order of mind-opening wow-ness:

Structure and Interpretation of Computer Programs — a good candidate for the "Best Computer Science Book Ever" award.

The Little Schemer — I worked through every single exercise twice: once in Scheme, once in Emacs-Lisp. Ditto for the sequel, The Seasoned Schemer, and I just started on the brand-new third volume, The Reasoned Schemer.

The Scheme Programming Language — good book, though a bit heavy going, as it's long on concepts and short on explanations. I found it well worth wrestling through, though.

Scheme is a wonderful language, and worth learning in its own right.

Caveats

If Emacs 22 goes on a horrible disk-eating rampage, don't blame me.

As one might expect of alpha software, it's got some bugs and glitches. I've never had anything really scary happen. It hasn't corrupted my data so far, and it has only crashed once or twice (i.e., far less often than the supposedly "stable" releases of XEmacs). But it occasionally does surprising things, like minimizing the entire frame if I try to 'q' (quit) certain read-only windows, or suddenly bringing up a completions buffer when I'm typing along in fundamental mode. They're rare enough not to have bothered me much.

They've also broken backwards-compatibility with several in-house functions and modes, so I've had to do some work to rewrite them. If you rely on proprietary Emacs-Lisp software as part of your job, and you're not proficient with elisp yourself, then you should make sure you have a local guru available before you upgrade to Emacs 22, or some stuff may stop working for you.

This blog entry consists, as usual, of only my very own whimsical opinions. I don't speak for my employer, nor for anyone else's employer, nor for any of the authors cited here, nor for the most excellent development teams working on Emacs, XEmacs, VIM, Eclipse, IntelliJ, Visual Studio, Firefox, and Ruby on Rails. I just speak for me.

If you didn't like this article, please be sure to run over to Reddit and call me stupid there. I'm sure people would hate to miss an opportunity to hear how you've taken such a boldly prominent and decisive stand on some random guy's personal blog.

And if you did like the article — well, go play with Emacs 22!

21 Comments:

Blogger Jarno said...

I've noticed that you have a clear preference for GNU/Emacs over XEmacs. As someone unfamiliar with the differences between these incarnations of Emacs, I'd love to know your reasons for choosing GNU/Emacs specifically. I mean, to me, a casual Emacs user, they seem pretty much the same.

Would you care to share your personal reasons for your choice? I've recently begun teaching myself Emacs-Lisp, so it's not just a question of which has the "best looking splash screen" or anything like that. (I'm also pretty neutral on the "political" side, so I don't care whether it's GNU or not.)

I do realize that discussing the relative merits and weaknesses of GNU/Emacs and XEmacs might evoke harsh reactions. But I take it that you're pretty much used to it. ;-)

3:17 AM, June 11, 2006  
Blogger Steve Yegge said...

XEmacs is great and cool and spiffy and all that. I make sure all my elisp files work under FSF and XEmacs, and I do fire up XEmacs occasionally.

I just find, and maybe it's my personal bad luck, that it crashes all the time. It's been like that since I was trying to use it back on Sun SparcStations. Every year I try using it again, on a new platform, and every year I give up because it crashes. GNU Emacs pretty much never ever crashes (by comparison).

I have various friends that have noticed the same thing, independently, so I'm fairly sure they don't have an if-stevey-crash clause in their code.

I think it comes down to your personal crash tolerance. Mine is low. Restarting Emacs is painful. I leave it running for weeks at a time, with hundreds of buffers. But if you fire up a new XEmacs session every day, then you probably don't mind if it crashes occasionally.

11:13 AM, June 11, 2006  
Blogger Phil said...

I'm so glad to see you've written more about Emacs--when you switched to Blogger I thought you might turn your focus away from what I believe to be the best material on your older Drunken Blog Rants.

I'll admit I have this strange fascination with Emacs blog postings--I guess I'm always trying to find out more about what other people find useful in it. I'm actually kind of surprised I don't see more of it. When I learn something nifty and new, I get excited about it and want to share.

9:08 PM, June 11, 2006  
Blogger Phil said...

Also, regarding stability and the occasional odd behaviour, I've found the emacs-snapshot package that's included with Ubuntu to be completely well-behaved; it treats me as well as Emacs 21 did.

9:11 PM, June 11, 2006  
Blogger Zed said...

I'm a fan of Emacs 22's Xft support.

11:18 AM, June 12, 2006  
Blogger Marc said...

One of my favorite new things in Emacs 22 is the improved macro support. For instance, there's a function called kmacro-end-or-call-macro, which in my Aquamacs is bound by default to <F4>, intelligently combining two operations on one key. This is a clever way to save key bindings (and the memory required of the user to remember them).

-Marc

10:41 AM, June 13, 2006  
Blogger Chris Parker said...

man, you just compared emacs to XML.

2:33 PM, June 15, 2006  
Blogger chris smith said...

A Note About Note About Emacs Regexps:
The reason for needing to escape all of the control characters in an Emacs regexp is that the Emacs Lisp interpreter sees them prior to the regular expression engine.
You're embedding a DSL (regexps) in another language (Emacs Lisp), so you have to pay the 'slash tax'.
If Emacs Lisp picked up a Python triple-quote, you could short-circuit the Emacs Lisp.
For serious work, I'd bring in the excellent PyMacs, and just farm the regex work out to Python, where the syntax is more robust and less cumbersome.

9:40 AM, June 22, 2006  
Blogger Steve Yegge said...

Well, sort of. What you really mean is if it picked up the Python r"raw string" syntax (note the preceding 'r'). You still have to escape backslashes in triple-quoted strings. And the r-strings are a little kludgy, syntax-wise. I'd rather see a paired delimiter, e.g. /some-regexp/ (the way it's done in Perl, Ruby and JavaScript -- a substantial user base), or even better, a formal reader-macro system implemented in elisp so you can simply define your own syntactic constructs for this sort of thing.

I mention PyMacs in the article, and there's a very serious objection, which isn't entirely PyMacs's fault. The problem is that Emacs lacks a JAR-like distribution mechanism via archive files. So if you write a bunch of PyMacs code, you have to tell your users to install PyMacs, and most casual Emacs users lack the sophistication to do even that much. If Emacs introduced "EAR" (emacs archive) files, it would dramatically simplify using languages other than elisp.

Honestly, though, a few fixes to elisp would go a long way towards making it more likeable. Regexps are soooo common; it's amazing they haven't updated the regexp capabilities and made it simpler to embed them in elisp code.

11:50 AM, June 22, 2006  
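A small Emacs Lisp sketch of the "slash tax" discussed in the exchange above. The Lisp reader consumes one level of backslashes before the regexp engine ever sees the string, so regexp grouping has to be written `\\(...\\)` in source, and a lone `\(` silently turns into a plain paren. The function name `my-slash-tax-demo` is hypothetical; only the built-ins it calls come from Emacs.

```elisp
;; Hypothetical demo function; only the built-ins it calls come from Emacs.
(defun my-slash-tax-demo ()
  "Return a list illustrating how the Lisp reader eats regexp backslashes."
  (list
   ;; Source text "\\(fo+\\)" reads as the 7-character string \(fo+\),
   ;; which the regexp engine treats as a group around fo+.
   (length "\\(fo+\\)")
   ;; That group matches "foo", starting at index 1 of "xfoo".
   (string-match "\\(fo+\\)" "xfoo")
   (match-string 1 "xfoo")
   ;; With single backslashes the reader drops them: "\(fo+\)" is just
   ;; "(fo+)", a regexp that matches literal parens around fo+.
   (string= "\(fo+\)" "(fo+)")))
```

Evaluating `(my-slash-tax-demo)` should return `(7 1 "foo" t)`. A raw-string or `/regexp/` syntax of the kind Steve describes would remove that first layer of escaping.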
Blogger glroman said...

Another great use for regexp in Emacs is dired mode. I routinely use this on both Unix and Windows platforms to select and rename groups of files.

C-/ is undo? Thanks -- I've been using C-_ for 25 years!

BTW, great blog.

9:15 PM, September 13, 2006  
Blogger Aberdeen said...

Sorry for the insignificant detail, but I don't understand the "études" line. Why doesn't it end with [a-z]?

8:40 AM, September 21, 2006  
Blogger mishoo said...

Hey there,

I've been Googling for half an hour trying to figure out how to refer to a \(matched\) group outside the function. That is, I want to do a (re-search-backward "\\(fo+\\)") and then somehow get at what Perl would give me in $1 (whatever matched). Any hints?

Thanks.
-Mihai

3:33 AM, December 05, 2006  
Blogger mishoo said...

Nevermind, I just found it: http://www.delorie.com/gnu/docs/elisp-manual-21/elisp_577.html

:D

3:36 AM, December 05, 2006  
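For anyone with mishoo's question: a successful search sets Emacs's match data, and `match-string` then plays the role of Perl's `$1`. A minimal sketch follows; the helper name `my-last-fo-run` is hypothetical, and `match-string-no-properties` works the same way if you don't want text properties copied out of the buffer.

```elisp
;; Hypothetical helper: search backward from the end of TEXT for fo+
;; and return what capture group 1 matched, or nil if nothing matched.
(defun my-last-fo-run (text)
  (with-temp-buffer
    (insert text)
    ;; re-search-backward searches from point, so start at the end.
    (goto-char (point-max))
    ;; NOERROR = t makes a failed search return nil instead of signaling.
    (when (re-search-backward "\\(fo+\\)" nil t)
      ;; Group 1 is the analogue of Perl's $1; (match-beginning 1) and
      ;; (match-end 1) would give its buffer positions.
      (match-string 1))))
```

With this definition, `(my-last-fo-run "fo bar foooo baz")` should return `"foooo"`, and `(my-last-fo-run "bar")` returns `nil`.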
Blogger klang said...

Damnit! The replace super powers have made me ditch emacs-20.7.1 and upgrade. 20.7.1 has been with me on win95, win2K and winXP. Installed from the same tar-ball for 6 years!

2:48 AM, February 16, 2007  
Blogger giovanni said...

Would it be possible to get the replace-regexp super-powers in Emacs 21 too, maybe just by upgrading the replace.el file?

Thanks,
Giovanni

2:24 AM, March 05, 2007  
Blogger william said...

Hi All:
I am new to Emacs. Is there a way to do search-and-replace within a rectangle? Or to copy one rectangle and paste it into another? Thanks

William

7:08 PM, March 31, 2007  
Blogger Cymen said...

SICP is online now too:

http://mitpress.mit.edu/sicp/full-text/book/book.html

Thanks for this introduction. I've been on a minimalist streak for some time using vim, but I'm game for seeing the other side of the fence, and this was an interesting start.

2:37 PM, July 02, 2007  
Blogger lata said...

Loved your writing style; if there were more articles like this, I'd have to say less to convince people as to why I'm using 25+ year old editing technology.

7:04 AM, July 07, 2007  
Blogger Kristian said...

I had a problem with (1 + \#) in replace-regexp.

Maybe I missed something, but (+ 1 \#) works.

Great post!

/kristian

4:08 AM, September 27, 2007  
Blogger Sunnan said...

Kristian, it's 1+ without the space: that calls the function named 1+, rather than a 1 followed by a +.

6:32 AM, October 08, 2007  
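Sunnan's point in code: `1+` is a single symbol naming a built-in function, so `(1+ x)` and `(+ 1 x)` both add one, while `(1 + x)` tries to call `1` as a function and signals an `invalid-function` error. In M-x replace-regexp, Emacs 22 evaluates the Lisp expression after `\,` and inserts its value, with `\#` standing for the number of replacements made so far (0 for the first). The sketch below mimics a `\,(1+ \#)` replacement from Lisp with a plain `re-search-forward`/`replace-match` loop; the function name `my-number-matches` is hypothetical.

```elisp
;; Hypothetical helper: replace each match of REGEXP in TEXT with a running
;; number, much like typing \,(1+ \#) as the replacement in replace-regexp.
;; REGEXP must not match the empty string, or the loop never advances.
(defun my-number-matches (regexp text)
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((count 0))                    ; plays the role of \#
      (while (re-search-forward regexp nil t)
        ;; (1+ count) and (+ 1 count) are equivalent; (1 + count) is an error.
        ;; FIXEDCASE = t and LITERAL = t insert the digits as-is.
        (replace-match (number-to-string (1+ count)) t t)
        (setq count (1+ count))))
    (buffer-string)))
```

For example, `(my-number-matches "X" "X, X and X")` should return `"1, 2 and 3"`.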
