Let’s see how to implement (and have fun with!) a command-line tool to find rhyming words.
Quick Tutorial
1) Get the rhyme-grep/ folder and enter it:
$ git clone https://github.com/massimo-nazaria/rhyme-grep.git
$ cd rhyme-grep/
2) Make rhymp.sh executable:
$ chmod +x rhymp.sh
3) Find words that rhyme with “leaves” in Leaves of Grass by Walt Whitman:
$ ./rhymp.sh "leaves" leaves-of-grass.txt
Output:
4) Let’s print only the rhyming words by using the -o argument, and at the same time remove duplicates with sort -fu:
$ cat leaves-of-grass.txt | ./rhymp.sh "leaves" -o | sort -fu
believes
heaves
perceives
receives
recitatives
sleeves
thieves
That’s quite fun (and useful)!
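As a side note on the sort -fu used in step 4, here is a tiny sketch of what it does to a word list: -f ignores case when comparing and -u keeps one line per group of equal lines. The input words are made up for illustration.

```shell
# Four lines, but only two distinct words once case is ignored.
unique=$(printf '%s\n' Thieves sleeves thieves Sleeves | sort -fu)

# One spelling of "sleeves" and one of "thieves" remain.
printf '%s\n' "$unique"
```

Which capitalization survives depends on the sort implementation, so the sketch does not rely on it.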
How it Works
Rhyme-Grep extracts the following data from the CMU Pronouncing Dictionary:
- The English pronunciation (namely a list of phonemes) of the input word; as well as
- The list of dictionary words whose pronunciation, from the primary accent onward, has the same phonemes as the input word.
Then it runs Grep to search the input text for those rhyming words.
ALGORITHM OVERVIEW
Let’s see how to search for words that rhyme with leaves in leaves-of-grass.txt.
Step 1: Input word pronunciation
Extract from the CMU dictionary the pronunciation of the word leaves, which is denoted by the list of phonemes L IY1 V Z.
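A minimal sketch of this lookup, not necessarily how rhymp.sh does it. It assumes dictionary lines of the form WORD followed by its phonemes; the three-line sample file and the function name lookup_pronunciation are made up for illustration, so the example runs without downloading the real dictionary.

```shell
# Hand-made sample in an assumed CMU-style line format: WORD, spaces, phonemes.
dict=$(mktemp)
cat > "$dict" <<'EOF'
GRASS  G R AE1 S
LEAVES  L IY1 V Z
PERCEIVES  P ER0 S IY1 V Z
EOF

# Print the phonemes of the first entry matching the word (case-insensitive).
lookup_pronunciation() {
    awk -v word="$1" '
        toupper($1) == toupper(word) {
            $1 = ""              # drop the word, keep only the phonemes
            sub(/^ +/, "")       # trim the leading separator
            print
            exit
        }' "$2"
}

lookup_pronunciation "leaves" "$dict"   # prints: L IY1 V Z
```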
Note that the primary accent falls on the phoneme IY, which is marked by 1 in the list.
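To make the role of the accent marker concrete, here is a small sketch that cuts a pronunciation down to its rhyming part, that is, everything from the first phoneme ending in 1 onward. The helper name rhyme_tail is hypothetical, and picking the first such phoneme is an assumption of this sketch.

```shell
# Print the phonemes from the first primary-stressed one (ending in 1) to the end.
rhyme_tail() {
    printf '%s\n' "$1" | awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /1$/) {
                out = $i
                for (j = i + 1; j <= NF; j++) out = out " " $j
                print out
                exit
            }
    }'
}

rhyme_tail "L IY1 V Z"          # prints: IY1 V Z
rhyme_tail "P ER0 S IY1 V Z"    # prints: IY1 V Z
```

Both words end up with the same tail, which is exactly what makes them rhyme.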
Step 2: List of rhyming dictionary words
Extract from the dictionary the list of words that rhyme with leaves, namely the words whose pronunciation ends with IY1 V Z.
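This step can be sketched as a suffix match over the dictionary. The sample file and the function name rhyming_words are assumptions made for illustration, and unlike the output shown in the tutorial, this sketch also returns the input word itself.

```shell
# Hand-made sample in an assumed CMU-style line format: WORD, spaces, phonemes.
dict=$(mktemp)
cat > "$dict" <<'EOF'
BELIEVES  B IH0 L IY1 V Z
GRASS  G R AE1 S
LEAVES  L IY1 V Z
PERCEIVES  P ER0 S IY1 V Z
EOF

# Print, in lowercase, every word whose phonemes end with the given tail.
rhyming_words() {
    awk -v tail="$1" '
        {
            word = $1
            $1 = ""                   # $0 is now " PH1 PH2 ..."
            suffix = " " tail         # leading space forces whole-phoneme matches
            start = length($0) - length(suffix) + 1
            if (start >= 1 && substr($0, start) == suffix)
                print tolower(word)
        }' "$2"
}

rhyming_words "IY1 V Z" "$dict"
# prints:
# believes
# leaves
# perceives
```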
Step 3: Rhyming words from the input text
Search for the rhyming words within leaves-of-grass.txt.
Let’s say the rhyming words are:
- eaves,
- perceives.
Run Grep as follows:
$ cat leaves-of-grass.txt | grep -E -wi --color "eaves|perceives"
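Here is a small sketch of how a word list could be turned into that alternation pattern and passed to Grep. The three-line sample text and the helper name build_pattern are made up for illustration; the extra -o flag makes Grep print only the matched words.

```shell
# Made-up sample text standing in for the real input file.
text=$(mktemp)
cat > "$text" <<'EOF'
The gutter hangs below the eaves.
Green leaves cover the path.
She perceives more than she says.
EOF

# Join words (one per line on stdin) into an alternation such as "eaves|perceives".
build_pattern() {
    paste -s -d '|' -
}

pattern=$(printf '%s\n' eaves perceives | build_pattern)
echo "$pattern"    # prints: eaves|perceives

# -w matches whole words only, so the "eaves" inside "leaves" is not reported.
grep -E -w -i -o "$pattern" "$text"
```

The -w flag matters here: without it, every occurrence of leaves would match through the eaves it contains.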
Implementation
For additional info, please get the code from GitHub and have fun playing with it!
Rhyme-Grep was initially inspired by Semantic Grep: a word2vec-based tool that searches for words with similar meanings to a given word.