Wednesday, April 4, 2012

A Regular Expression HashMap Implementation in Java

Below is an implementation of a Regular Expression HashMap. It works with key-value pairs which the key is a regular expression. It compiles the key (regular expression) while adding (i.e. putting), so there is no compile time while getting. Once getting an element, you don't give regular expression; you give any possible value of a regular expression. 

As a result, this behaviour provides to map numerous values of a regular expression into the same value. The class does not depend to any external libraries, uses only default java.util. So, it will be used simply when a behaviour like that is required. 

  1. import java.util.ArrayList;
  2. import java.util.HashMap;
  3. import java.util.regex.Pattern;
  4. /**
  5.  * This class is an extended version of Java HashMap
  6.  * and includes pattern-value lists which are used to
  7.  * evaluate regular expression values. If given item
  8.  * is a regular expression, it is saved in regexp lists.
  9.  * If requested item matches with a regular expression,
  10.  * its value is get from regexp lists.
  11.  *
  12.  * @author cb
  13.  *
  14.  * @param <K> : Key of the map item.
  15.  * @param <V> : Value of the map item.
  16.  */
  17. public class RegExHashMap<K, V> extends HashMap<K, V> {
  18.     // list of regular expression patterns
  19.     private ArrayList<Pattern> regExPatterns = new ArrayList<Pattern>();
  20.     // list of regular expression values which match patterns
  21.     private ArrayList<V> regExValues = new ArrayList<V>();
  22.     /**
  23.      * Compile regular expression and add it to the regexp list as key.
  24.      */
  25.     @Override
  26.     public V put(K key, V value) {
  28.         regExPatterns.add(Pattern.compile(key.toString()));
  29.         regExValues.add(value);
  30.         return value;
  31.     }
  32.     /**
  33.      * If requested value matches with a regular expression,
  34.      * returns it from regexp lists.
  35.      */
  36.     @Override
  37.     public V get(Object key) {
  38.         CharSequence cs = new String(key.toString());
  40.         for (int i = 0; i < regExPatterns.size(); i++) {
  41.             if (regExPatterns.get(i).matcher(cs).matches()) {
  43.                 return regExValues.get(i);
  44.             }
  45.         }
  46.         return super.get(key);
  47.     }
  48. }


  1. You need to also Remove the patterns or this could become a memory leak.

    1. Memory leak can not seem to be possible in this design. Can you explain the problem more detailed?

  2. You are mixing the data structures. Hashmap is to be a data structure whose complexity is almost constant i.e. O(1), considering collisions arising from hash function. If you put a for statement inside the get method, the access time complexity turns into O(n), which is the same complexity of an ordinary list. It would be better use an apart Map for data and another one for save compiled regex patterns (like a pool) and use them combined to accomplish the desired task. Data structures are serious science. Don't invent a new one without serious research.

    1. You are right about O(1), but this is not a standard HashMap. RegExp part of HashMap requires O(n) of course. This implementation provides an easy and encapsulated way of mapping expressions of patterns to values. Reducing O(n) complexity is another problem and there may be many different solutions about that.

  3. Why is this even called a Map implementation or more precisely a HashMap. This has nothing to do with a Map structure. You could have just as easily looked at and study their implementation prior to attempting this. There are probably some uses for your code but it should not muddy the waters for those looking to learn, but thank you for your effort.

    1. Is doesn't completely work as a Java HashMap in the underlying structure as stated before, but its usage is the same as a HashMap for the user because of encapsulation.
      And what is more to the point, it maps keys to values.

  4. The get method would return the value for the first matched key which seems to be a bug as if there could be more matching keys in the regExPatterns list that would get skipped out. Thus extending a HashMap doesn't seem to be a good idea. Also was wondering how super.get(key) is supposed to work if the overriding put method is not adding the key/values to the parent.

  5. It would be better use an apart Map for information and another one for preserve collected regex styles and use them mixed to achieve the preferred process.

  6. The concept of a map from my point of view is to have random access, what I mean is that you could guess where is something without checking all the collection.

    What I think you want is to recover from a collection an instance that match a Regexp so, maybe you could use a List of Objects and a Comparator where the compare sentence uses a Regexp . Then you could do a Collections.binarySearch() to find the target value.

    An example of use could be to handle events from a state machine so you could have some concrete error handlers and one generic one (so if the concrete one match then it will handle the event but if there is no specific handler then the general handler will manage the event)... something like error.login handler and an error.* handler.