Saturday, October 8, 2011

Work the Shell - Mad Libs Generator, Tweaks and Hacks

By Dave Taylor In Linux Journal

We continue building a Mad Libs tool and slowly come to realize that it's a considerably harder problem than can be neatly solved in a 20-line shell script.
Last month, I ended with a script that could take an arbitrary set of sentences and randomly select, analyze and replace words with their parts of speech, with the intention of creating a fun and interesting Mad Libs-style puzzle game. With a few tweaks, and given a few simple sentences on party planning, it produces something like this:
If you're ((looking:noun)) [for] a fun ((way:noun)) 
[to] celebrate your next ((birthday:noun)) how 
((about:adjective)) a pirate-themed costume
party? Start by sending ((invitations:noun)) in the 
form of ((a:noun)) <buried> ((treasure:noun)) 
{map} with {X} ((marking:noun)) {the} ((location:noun)) 
[of] your house, then {put} {a} sign on the ((front:noun)) 
((door:noun)) [that] ((reads:noun)) "Ahoy, mateys" {and}
((fill:noun)) [the] ((house:noun)) [with] ((lots:noun)) 
of ((pirate:noun)) ((booty:noun))


In the current iteration of the script, words that were chosen but discarded as too short are marked with {}, words whose part of speech it couldn't determine unambiguously are marked with [], and words with what we defined as uninteresting parts of speech are marked with <>.
If we display them as regular words without any indication that they've been rejected for different reasons, here's what we have left:
If you're ((looking:noun)) for a fun ((way:noun)) 
to celebrate your next ((birthday:noun)) how 
((about:adjective)) a pirate-themed costume party?
Start by sending ((invitations:noun)) in the form of 
((a:noun)) buried ((treasure:noun)) map with X 
((marking:noun)) the ((location:noun)) of your
house, then put a sign on the ((front:noun)) 
((door:noun)) that ((reads:noun)) "Ahoy, mateys" 
and ((fill:noun)) the ((house:noun)) with
((lots:noun)) of ((pirate:noun)) ((booty:noun))

Next, let's look at the output by simply blanking out the words we've chosen:
If you're ___ for a fun ___ to celebrate your next 
___ how ___ a pirate-themed costume party? Start 
by sending ___ in the form of ___ buried ___ map 
with X ___ the ___ of your house, then put a sign on 
the ___ ___ that ___ "Ahoy, mateys" and ___ the ___ 
with ___ of ___ ___.
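Both of these views can be derived mechanically from the annotated output. Here's a minimal sketch using sed. It assumes that markers never nest and that the passage itself contains no literal braces, brackets or angle brackets. The function names strip_markers and blank_out are hypothetical and do not come from the original script:

```shell
# Hypothetical helpers for turning the script's annotated output into the
# two views above. Assumes markers never nest and the prose itself has no
# literal {} [] or <> characters.

# strip_markers: remove the {}, [] and <> rejection markers, keep the words.
strip_markers() {
  sed 's/[][{}<>]//g'
}

# blank_out: strip the rejection markers, then replace every chosen
# ((word:part-of-speech)) annotation with a blank.
blank_out() {
  strip_markers | sed 's/(([^)]*))/___/g'
}
```

Piping the annotated text through strip_markers gives the first view. Piping it through blank_out gives the fill-in-the-blanks version.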

It seems like too many words are being replaced, doesn't it? Fortunately, that's easily tweaked.
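One simple way to make that tweak is to keep a counter and replace only every Nth word that passes the other heuristics. This sketch assumes the script decides word by word whether to replace a candidate. RATE and should_replace are hypothetical names:

```shell
# Hypothetical tweak: of the words that pass the heuristics, replace only
# every RATE-th one. Raise RATE to blank out fewer words.
RATE=3
count=0

should_replace() {
  count=$((count + 1))
  [ $((count % RATE)) -eq 0 ]
}
```

Inside the word loop, a test such as `if should_replace; then ... fi` would gate the substitution. The counter lives in whatever shell runs the loop, so it stays consistent even when that loop runs in a pipeline subshell.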
What's a bit harder to tweak is that two bad choices survived the heuristics: “a” (in “form of a buried treasure map”) and “about” (in “how about a pirate-themed costume party?”). Should we simply require a word to have at least three letters before it can be substituted? Skip adjectives?
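Both suggested tweaks fit in a small filter function. This is a sketch under the assumption that the script has the candidate word and its tagged part of speech available when it decides. The name worth_replacing is hypothetical:

```shell
# Hypothetical filter implementing both suggested tweaks.
# $1 = candidate word, $2 = part of speech the script assigned to it.
# Succeeds (exit status 0) only if the word is still worth replacing.
worth_replacing() {
  # Tweak 1: require at least three letters, which rules out "a".
  if [ "${#1}" -lt 3 ]; then
    return 1
  fi
  # Tweak 2: skip adjectives, which rules out "about" (tagged adjective).
  if [ "$2" = "adjective" ]; then
    return 1
  fi
  return 0
}
```

Skipping adjectives wholesale also throws away legitimate ones like "sticky" or "wet". It trades variety for fewer bad picks.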
For the purposes of this column, let's just proceed because this is the kind of thing that's never going to be as good as a human editor taking a mundane passage of prose and pulling out the potential for amusing re-interpretation.
Prompting for Input
The next step in the evolution of the script is to prompt users for different parts of speech, then actually substitute those for the original words as the text passage is analyzed and output.
There are a couple of ways to tackle this, but let's take advantage of tr and fmt. First, tr replaces all spaces with newlines so that each word sits on its own line. Then fmt reassembles the words neatly into formatted text.
The problem is that both standard input and standard output are already redirected: input comes from an input file, and output goes to a pipe that reassembles the individual words into a paragraph.
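The overall shape of that tr-and-fmt pipeline can be sketched as follows. Here process_word is a hypothetical stand-in for the per-word analysis and substitution, and in this sketch it just passes each word through unchanged:

```shell
# Hypothetical stand-in for the per-word logic (analysis, prompting,
# substitution). Here it simply echoes the word back unchanged.
process_word() {
  printf '%s\n' "$1"
}

# Split the passage into one word per line (tr -s squeezes repeated
# newlines, so runs of spaces don't produce empty lines), handle each
# word, then let fmt reflow the words back into a paragraph.
madlib_pipeline() {
  tr -s ' ' '\n' |
  while read -r word; do
    process_word "$word"
  done |
  fmt
}
```

The loop's standard input is the word stream, and its standard output feeds fmt. Any prompting inside process_word therefore can't use either one.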
This means we end up needing a complicated solution like the following:
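/bin/echo -n "Enter a ${pos}: " > /dev/tty   # prompt on the terminal, not the pipe
read -r newword < /dev/tty                   # answer from the terminal, not the input file
echo "$newword"                              # only the answer goes down the pipe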


We have to be careful not to send the prompt to /dev/stdout, because standard output is already redirected into the pipe. For the same reason, a notation like >&1 would get our input and output hopelessly muddled.
With /dev/tty handling both the prompt and the answer, it actually works pretty well right off the bat:
$ sh madlib.sh < madlib-sample-text-2
Enter a noun: Starbucks
Enter a adjective: wet
Enter a adjective: sticky
Enter a noun: jeans
Enter a noun: dog
Enter a noun: window
Enter a noun: mouse
Enter a noun: bathroom
Enter a noun: Uncle Mort


That produced the following result:
If you're (( Starbucks )) for a fun way to celebrate 
your (( wet )) birthday, how (( sticky )) a pirate-themed 
costume (( jeans )) Start by sending invitations in the 
(( dog )) of a buried treasure map with X marking the 
(( window )) of your house, then put a (( mouse )) on 
the front (( bathroom )) that reads "Ahoy mateys" and fill 
the house with lots of pirate (( Uncle Mort ))

Now let's make the prompts more helpful, because if you're like me, you might not immediately remember the difference between a verb and an adjective. Here's what I came up with:
verb: an action word (eat, sleep, drink, jump)
noun: a person, place or thing (dog, Uncle Mort, Starbucks)
adjective: an attribute (red, squishy, sticky, wet)

Instead of just asking for the part of speech, we can have a simple case statement to include a useful prompt:
case $pos in
  noun )      prompt="Noun (person, place or thing: dog, Uncle Mort, Starbucks)" ;;
  verb )      prompt="Verb (action word: eat, sleep, drink, jump)" ;;
  adjective ) prompt="Adjective (attribute: red, squishy, sticky, wet)" ;;
  * )         prompt="$pos" ;;
esac
/bin/echo -n "${prompt}: " > /dev/tty

One more thing we need to add for completeness is detecting plural versus singular, particularly with nouns. This can be done simply by checking whether the last letter of a word is an s. It's not 100% accurate, but it's good enough for our purposes:
plural=""
# Take the last character of the word; an "s" suggests a plural.
if [ "$(echo "$word" | rev | cut -c1)" = "s" ] ; then
  plural="Plural "
fi
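The same ends-in-s test can also be written with a shell case pattern, which avoids starting rev and cut for every word. In this sketch, plural_prefix is a hypothetical helper name. Like the version above, it will misfire on singular words such as "glass" or "bus":

```shell
# Hypothetical helper: print "Plural " if the word ends in a lowercase s,
# otherwise print nothing. Same heuristic as the rev/cut version.
plural_prefix() {
  case $1 in
    *s) printf '%s' "Plural " ;;
    *)  : ;;    # singular (or so we guess): print nothing
  esac
}
```

It would be used as `plural=$(plural_prefix "$word")`. Command substitution strips only trailing newlines, so the trailing space in "Plural " survives.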

Then, just modify the prompt appropriately:
/bin/echo -n "$plural${prompt}: " > /dev/tty
But, There Are Problems
Looking back at what we've done, however, there are a couple of problems. The most important is that although we have a tool that identifies parts of speech, it's not particularly accurate. It turns out that many words can be identified properly only from their use and context: “looking,” “reads” and “fill” were all tagged as nouns above, though all three are verbs in that passage. A grammarian will already have spotted some of the problems above! Even more than that, I suspect that however much we hack the script to make smarter word selections and identify context, creating a really great Mad Libs requires human intervention. Given an arbitrary sentence, there are words that can be replaced to make it funny, and others that just make it incomprehensible.
Now, it wouldn't be too much work to write a somewhat less ambitious program that understood a Mad Libs-style markup language, prompted as appropriate and reassembled the results after user input. Perhaps “The in stays mainly in the plain”, which turns into:
Noun (person, place or thing):
Noun (a place):

But that I will leave as (ready for it?) an exercise for the reader!
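For anyone who wants a head start on that exercise, here's a minimal sketch. It assumes a hypothetical markup that reuses the script's own ((word:part-of-speech)) annotation, with no spaces inside a marker. By default it reads answers from /dev/tty and writes prompts there. ANSWERS and PROMPTS are assumed overrides, added only so the function can be run without a terminal:

```shell
# Hypothetical sketch: read text marked up as ((word:part-of-speech)),
# prompt for each marked word, and print the text with the answers filled in.
# ANSWERS and PROMPTS are assumed overrides; both default to /dev/tty.
fill_in() {
  # Open the answer source once on file descriptor 3, since stdin is the text.
  exec 3< "${ANSWERS:-/dev/tty}"
  tr -s ' ' '\n' |
  while read -r w; do
    case $w in
      '(('*:*'))'*)
        pos=${w#*:}               # drop "((word:"
        pos=${pos%%'))'*}         # drop "))" and anything after it
        rest=${w##*'))'}          # trailing punctuation, if any
        printf 'Enter a %s: ' "$pos" > "${PROMPTS:-/dev/tty}"
        read -r answer <&3
        printf '%s%s\n' "$answer" "$rest"
        ;;
      *)
        printf '%s\n' "$w"
        ;;
    esac
  done |
  fmt
  exec 3<&-
}
```

Reading answers from a separate descriptor keeps them off the pipeline's standard input, for the same reason the main script goes through /dev/tty.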
Note: Mad Libs is a registered trademark of Penguin Group USA.
Dave Taylor has been hacking shell scripts for a really long time: thirty years. He's the author of the popular Wicked Cool Shell Scripts and can be found on Twitter as @DaveTaylor and more generally at www.DaveTaylorOnline.com.
