

What’s in a word? (\w regexp shorthand class)

Post author By Admin
Post date January 4, 2018

Well, not just letters of the alphabet, it seems.

Take the case of the Logstash pattern WORD:

WORD \b\w+\b

but the shorthand character class \w matches [a-zA-Z0-9_]. Notice the digits and underscore! So WORD is not really a word: it also matches tokens such as abc123, 42 or snake_case.
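As a quick check of that claim, here is a minimal sketch in Python. Python's re module is only a stand-in for the regex engine Logstash uses (an assumption of this example, not something the pattern file says), and the re.ASCII flag is there to pin \w to the [a-zA-Z0-9_] definition. The sample strings are made up for illustration.

```python
import re

# Python's re module stands in for Logstash's regex engine here (an assumption).
# re.ASCII restricts \w to [a-zA-Z0-9_], the definition discussed above.
WORD = re.compile(r"\b\w+\b", re.ASCII)

# Hypothetical sample tokens: only the first is a "word" in the everyday sense.
samples = ["hello", "abc123", "snake_case", "42", "_"]

# fullmatch() succeeds only if the whole token matches the pattern.
matches = {s: bool(WORD.fullmatch(s)) for s in samples}

for token, ok in matches.items():
    print(f"{token!r}: {'matches' if ok else 'no match'}")
```

Every one of these tokens matches, including the pure number and the lone underscore.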

REALWORD \b[a-zA-Z]+\b

would be better, although I suppose things might be different with Unicode, where \w may also match non-ASCII letters. In practice, though, even when log files are Unicode-encoded, the data itself is frequently still effectively ASCII.
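To see the difference, here is a rough sketch, again using Python's re module as a stand-in (an assumption; Logstash's own engine may treat Unicode differently). The log line and the helper name real_words are hypothetical. In Python 3, \w and \b are Unicode-aware by default for string patterns, which makes the Unicode caveat easy to observe.

```python
import re

# Letters-only pattern, as suggested above.
REALWORD = re.compile(r"\b[a-zA-Z]+\b")

# Unicode-aware \w (the Python 3 default for str patterns).
UNICODE_WORD = re.compile(r"\b\w+\b")


def real_words(text):
    """Return every token in text made up of ASCII letters only."""
    return REALWORD.findall(text)


# Hypothetical log line: "host_7" and "3" are skipped entirely, because
# the "_" after "host" is a word character and so there is no \b there.
line = "user admin logged in from host_7 after 3 tries"
print(real_words(line))

# A non-ASCII word: the Unicode-aware \w accepts it, REALWORD does not.
accented = "caf\u00e9"
print(UNICODE_WORD.findall(accented))
print(REALWORD.findall(accented))
```

So REALWORD does what the name promises for ASCII data, but it would silently drop accented words; whether that matters depends on what actually ends up in the logs.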
