Is there an equivalent of java.util.regex for “glob” type patterns?

July 12, 2019

Question

Is there a standard (preferably Apache Commons or similarly non-viral) library for doing “glob” type matches in Java? When I had to do something similar in Perl once, I just changed all the “.” to “\.”, the “*” to “.*” and the “?” to “.” and that sort of thing, but I’m wondering if somebody has done the work for me.

Similar question: Create regex from glob expression

Solution

java.util.regex has no glob support built in, but it’s pretty simple to convert something glob-like to a regex:

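public static String createRegexFromGlob(String glob)
{
    // Anchor the pattern so the whole input has to match the glob.
    StringBuilder out = new StringBuilder("^");
    for(int i = 0; i < glob.length(); ++i)
    {
        final char c = glob.charAt(i);
        switch(c)
        {
        case '*': out.append(".*"); break;     // any run of characters
        case '?': out.append('.'); break;      // exactly one character
        case '.': out.append("\\."); break;    // literal dot
        case '\\': out.append("\\\\"); break;  // literal backslash
        default: out.append(c);                // other regex metacharacters pass through unescaped
        }
    }
    out.append('$');
    return out.toString();
}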

This works for me, but I’m not sure whether it covers the glob “standard”, if there is one 🙂
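One gap in the simple conversion is that characters such as “(”, “+” or “[” reach the regex unescaped. The following is a minimal sketch of one way to close that gap, assuming only “*” and “?” are meant to be special and everything else (including a backslash) is literal. The class name GlobRegexDemo and the helper globToRegex are hypothetical names chosen for illustration; Pattern.quote is the standard java.util.regex call for quoting literal text.

```java
import java.util.regex.Pattern;

class GlobRegexDemo {
    // Hypothetical helper: only '*' and '?' are treated as glob wildcards.
    // Every other character is quoted so it can never act as a regex metacharacter.
    static String globToRegex(String glob) {
        StringBuilder out = new StringBuilder("^");
        for (int i = 0; i < glob.length(); i++) {
            char c = glob.charAt(i);
            switch (c) {
                case '*': out.append(".*"); break;
                case '?': out.append('.'); break;
                default: out.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return out.append('$').toString();
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile(globToRegex("report(?).txt"));
        System.out.println(p.matcher("report(1).txt").matches()); // parentheses are literal
        System.out.println(p.matcher("reportX1Y.txt").matches()); // so this does not match
    }
}
```

Quoting character by character keeps the loop simple at the cost of a longer regex; collecting runs of literal characters and quoting each run once would produce a shorter pattern.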

Update by Paul Tomblin: I found a Perl program that does glob conversion, and adapting it to Java I ended up with:

private String convertGlobToRegEx(String line)
{
    // LOG is assumed to be a logger field declared on the enclosing class.
    LOG.info("got line [" + line + "]");
    line = line.trim();
    int strLen = line.length();
    StringBuilder sb = new StringBuilder(strLen);
    // Remove beginning and ending * globs: the result is not anchored, so they
    // add nothing when the regex is applied with Matcher.find()
    if (line.startsWith("*"))
    {
        line = line.substring(1);
        strLen--;
    }
    if (line.endsWith("*"))
    {
        line = line.substring(0, strLen-1);
        strLen--;
    }
    boolean escaping = false;
    int inCurlies = 0;
    for (char currentChar : line.toCharArray())
    {
        switch (currentChar)
        {
        case '*':
            if (escaping)
                sb.append("\\*");
            else
                sb.append(".*");
            escaping = false;
            break;
        case '?':
            if (escaping)
                sb.append("\\?");
            else
                sb.append('.');
            escaping = false;
            break;
        case '.':
        case '(':
        case ')':
        case '+':
        case '|':
        case '^':
        case '$':
        case '@':
        case '%':
            sb.append('\\');
            sb.append(currentChar);
            escaping = false;
            break;
        case '\\':
            if (escaping)
            {
                sb.append("\\\\");
                escaping = false;
            }
            else
                escaping = true;
            break;
        case '{':
            if (escaping)
            {
                sb.append("\\{");
            }
            else
            {
                sb.append('(');
                inCurlies++;
            }
            escaping = false;
            break;
        case '}':
            if (inCurlies > 0 && !escaping)
            {
                sb.append(')');
                inCurlies--;
            }
            else if (escaping)
                sb.append("\\}");
            else
                sb.append("}");
            escaping = false;
            break;
        case ',':
            if (inCurlies > 0 && !escaping)
            {
                sb.append('|');
            }
            else if (escaping)
                sb.append("\\,");
            else
                sb.append(",");
            escaping = false;
            break;
        default:
            escaping = false;
            sb.append(currentChar);
        }
    }
    return sb.toString();
}
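Because this conversion strips leading and trailing “*” and adds no “^” or “$” anchors, its output appears to be meant for Matcher.find() rather than Matcher.matches(). The sketch below illustrates the difference. The regex literal is what tracing the method by hand gives for the glob “*.{jpg,png}” (with the backslash escapes written as valid Java); it is an assumption of this example, not output captured from a run, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class UnanchoredGlobDemo {
    // Hand-traced conversion of "*.{jpg,png}": the leading '*' is dropped,
    // '.' becomes "\.", and the braces become an alternation group.
    static final Pattern IMAGE = Pattern.compile("\\.(jpg|png)");

    // find() looks for the pattern anywhere in the input.
    static boolean found(String name) {
        return IMAGE.matcher(name).find();
    }

    // matches() requires the pattern to cover the entire input.
    static boolean wholeMatch(String name) {
        return IMAGE.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(found("photo.png"));      // the extension is found
        System.out.println(wholeMatch("photo.png")); // the regex alone does not span the name
        System.out.println(found("photo.png.bak"));  // unanchored, so this is found as well
    }
}
```

The last call shows a side effect worth knowing about: without anchors, a glob that does not end in “*” still matches names with extra trailing text.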

I’m editing into this answer rather than making my own because this answer put me on the right track.
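For file names specifically, Java 7 and later ship glob matching in java.nio.file: FileSystem.getPathMatcher accepts a “glob:” prefixed pattern. The sketch below shows that route as an alternative to hand-rolled conversion. The class name and the globMatches helper are hypothetical; note that this API matches Path objects, so it may not suit arbitrary strings that are not valid paths on the current platform.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

class PathMatcherGlobDemo {
    // Hypothetical helper wrapping the JDK's built-in glob support.
    static boolean globMatches(String glob, String fileName) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(fileName));
    }

    public static void main(String[] args) {
        System.out.println(globMatches("*.{java,class}", "Foo.java")); // brace alternatives are supported
        System.out.println(globMatches("*.{java,class}", "Foo.txt"));
        System.out.println(globMatches("Foo.?ava", "Foo.java"));       // '?' matches one character
    }
}
```

In this syntax “*” does not cross directory boundaries, while “**” does, which differs from the regex conversions above where “.*” matches any character.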
