If you don't know what acronyms are, it is simply a words formed from the initial letters of others. For instance, the words "pot" and "top" are acronyms; so are the words "slow" and "owls".
When I read it, I thought "This could be done with unix tools and pipes". Actually, I didn't find a way of doing it efficiently with native unix tools. The reason being that there's no horizontal sort implementation in unix. You could write something like this:
for i in `cat dict | sed 'p' | sed 'N;s/\n/*-/g'`; do echo $i |
cut -d "*" -f 2 | sed "s/\(.\)/\1-/g" | tr "-" "\n" | sort |
tr -d "\n"; echo "$i" | cut -d "*" -f 2 ; done | sort
But it won't be very efficient and it's very ugly. I didn't spend too much time thinking of a better way to do it unix-style. If you have one, please post it on the comments =)
Instead, I wrote a small python script
#!/usr/bin/env python
import sys
if __name__ == '__main__':
f = open(sys.argv[1])
for line in f.xreadlines():
line = line.strip()
print '%s-%s' % (''.join(sorted(line)), line)
and I executed:
./expand.py dict2 | sort
which runs in about 0.47 seconds. Then, we could add another step to pretty print the acronyms, but I wasn't interested in that.
Unfortunately, the only thing that remains 'unix' about this solution is the use of sort and one pipe. This post is not so much to show my findings but to look for a better solution. Does anyone have a suggestion?
|
No comments:
Post a Comment