Delete a row from a table based on the value [Accumulo]

Question

I have a table that is currently set up like so:

rowId : colFam: colQual -> value

in001 : user : name -> erp
in001 : user : age -> 23
in001 : group : name -> employee
in001 : group : name -> developer

I can't think of a way to delete one of the group entries, or to change it for that matter. Say I want to remove the entry with employee because I am now a manager. Adding is obvious, but I can't figure out how to address employee specifically, since the two group entries have the same colFam and colQual.

I know about mutation.putDelete(colFam, colQual), but that doesn't apply here since it would delete both entries. Alternatively, I could scan each row and get the key-value pairs back like this:

for (Entry<Key,Value> e : scanner) {
    String value = e.getValue().toString(); // at least I can access it here
}

But even then, how do I know what to delete? Is this just a flaw in my table design?

Answer 1

While Accumulo's Key-Value schema does allow you to do this, it's problematic as you've found. The original intent of the value is that it can change over time, with versions of that Value being uniquely identified by the timestamp portion of the Key (assuming all other parts of the Key are equivalent). By turning off the VersioningIterator, you can keep a historical record of the Values for a Key.

The most common approach to this problem is to use a serialized data structure that stores all "group names" in one value. A simple approach is a CSV string such as "employee,developer"; your update would then write "employee,developer,manager". You can get fancier by using tools like Hadoop Writable, Google Protocol Buffers, or Apache Thrift (or many others) to get a more compact representation, easier programmatic access, and backwards compatibility.
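As a rough illustration of the single-value approach, the read-modify-write step on the CSV string might look like the sketch below. GroupList and its methods are hypothetical helpers, not part of Accumulo, and the code assumes group names never contain a comma.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Accumulo): treats the cell value as a
// comma-separated list of group names. Assumes names contain no commas.
final class GroupList {
  private GroupList() {}

  // Parse "employee,developer" into a mutable list; an empty value gives an empty list.
  static List<String> parse(String csv) {
    List<String> groups = new ArrayList<>();
    if (csv == null || csv.isEmpty()) {
      return groups;
    }
    groups.addAll(Arrays.asList(csv.split(",")));
    return groups;
  }

  // Return the CSV with the group appended, unless it is already present.
  static String add(String csv, String group) {
    List<String> groups = parse(csv);
    if (!groups.contains(group)) {
      groups.add(group);
    }
    return String.join(",", groups);
  }

  // Return the CSV with the first occurrence of the group removed.
  static String remove(String csv, String group) {
    List<String> groups = parse(csv);
    groups.remove(group);
    return String.join(",", groups);
  }

  public static void main(String[] args) {
    String value = "employee,developer";
    value = remove(value, "employee");
    value = add(value, "manager");
    System.out.println(value); // prints "developer,manager"
  }
}
```

The resulting string would then be written back with an ordinary put on the same row, colFam and colQual. Because the key is unchanged, the new value simply supersedes the old one under the default versioning.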

Answer 2

It is possible to delete exactly the entry

in001 : group : name -> employee

by running a compaction with a custom filter that drops exactly this value. (Not tested, but it should work.) Use:

IteratorSetting config = new IteratorSetting(10, "excludeTermFilter", ExcludeTermFilter.class);
ExcludeTermFilter.setTermToExclude(config, "group", "name", "employee");
List<IteratorSetting> filterList = new ArrayList<IteratorSetting>();
filterList.add(config);
connector.tableOperations().compact(tableName, startRow, endRow, filterList, true, false);

with the appropriate values and this custom filter (based on GrepIterator):

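import static java.nio.charset.StandardCharsets.UTF_8;

import java.io.IOException;
import java.util.Arrays;
import java.util.Map;

import org.apache.accumulo.core.client.IteratorSetting;
import org.apache.accumulo.core.data.ByteSequence;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.iterators.Filter;
import org.apache.accumulo.core.iterators.IteratorEnvironment;
import org.apache.accumulo.core.iterators.SortedKeyValueIterator;

public class ExcludeTermFilter extends Filter {
  private byte[] termToExclude;
  private byte[] columnFamily;
  private byte[] columnQualifier;

  // Reject (drop) an entry only when value, family and qualifier all match.
  @Override
  public boolean accept(Key k, Value v) {
    return !(match(v.get(), termToExclude) &&
             match(k.getColumnFamilyData(), columnFamily) &&
             match(k.getColumnQualifierData(), columnQualifier)
            );
  }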

  private boolean match(ByteSequence bs, byte[] term) {
    return indexOf(bs.getBackingArray(), bs.offset(), bs.length(), term) >= 0;
  }

  private boolean match(byte[] ba, byte[] term) {
    return indexOf(ba, 0, ba.length, term) >= 0;
  }

  // Adapted from java.lang.String.indexOf to work on byte arrays
  private static int indexOf(byte[] source, int sourceOffset, int sourceCount, byte[] target) {
    byte first = target[0];
    int targetCount = target.length;
    int max = sourceOffset + (sourceCount - targetCount);

    for (int i = sourceOffset; i <= max; i++) {
      /* Look for first character. */
      if (source[i] != first) {
        while (++i <= max && source[i] != first)
          continue;
      }

      /* Found first character, now look at the rest of target */
      if (i <= max) {
        int j = i + 1;
        int end = j + targetCount - 1;
        for (int k = 1; j < end && source[j] == target[k]; j++, k++)
          continue;

        if (j == end) {
          /* Found whole string. */
          return i - sourceOffset;
        }
      }
    }
    return -1;
  }

  @Override
  public SortedKeyValueIterator<Key,Value> deepCopy(IteratorEnvironment env) {
    ExcludeTermFilter copy = (ExcludeTermFilter) super.deepCopy(env);
    // Copy each array with its own length so the copy shares no state.
    copy.termToExclude = Arrays.copyOf(termToExclude, termToExclude.length);
    copy.columnFamily = Arrays.copyOf(columnFamily, columnFamily.length);
    copy.columnQualifier = Arrays.copyOf(columnQualifier, columnQualifier.length);
    return copy;
  }

  @Override
  public void init(SortedKeyValueIterator<Key,Value> source, Map<String,String> options, IteratorEnvironment env) throws IOException {
    super.init(source, options, env);
    termToExclude = options.get("etf.term").getBytes(UTF_8);
    columnFamily = options.get("etf.family").getBytes(UTF_8);
    columnQualifier = options.get("etf.qualifier").getBytes(UTF_8);
  }

  /**
   * Encode the family, qualifier and termToExclude as an option for a ScanIterator
   */
  public static void setTermToExclude(IteratorSetting cfg, String family, String qualifier, String termToExclude) {
    cfg.addOption("etf.family", family);
    cfg.addOption("etf.qualifier", qualifier);
    cfg.addOption("etf.term", termToExclude);
  }
}
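
One caveat worth checking before compacting with this filter: match is an indexOf-style substring search, so a term of "employee" would also hit a value such as "employees". The standalone sketch below (plain Java, no Accumulo classes; MatchDemo and its methods are made up for illustration) contrasts a substring test with an exact comparison.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustration only: compares substring matching with exact matching on bytes.
final class MatchDemo {
  private MatchDemo() {}

  // Substring test, equivalent in spirit to the indexOf-based match in the filter.
  static boolean containsTerm(byte[] source, byte[] term) {
    if (term.length == 0) {
      return true;
    }
    outer:
    for (int i = 0; i + term.length <= source.length; i++) {
      for (int j = 0; j < term.length; j++) {
        if (source[i + j] != term[j]) {
          continue outer; // mismatch, try the next start position
        }
      }
      return true;
    }
    return false;
  }

  // Exact test: true only when both arrays hold the same bytes.
  static boolean equalsTerm(byte[] source, byte[] term) {
    return Arrays.equals(source, term);
  }

  static byte[] utf8(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    byte[] term = utf8("employee");
    System.out.println(containsTerm(utf8("employees"), term)); // true
    System.out.println(equalsTerm(utf8("employees"), term));   // false
    System.out.println(equalsTerm(utf8("employee"), term));    // true
  }
}
```

If only exact values should be dropped, swapping the substring test for an equality test inside accept would be one way to get that behavior. Since a compaction rewrites the table's files, entries removed by the filter are gone for good, so trying it on a test table first seems prudent.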

Answer 3

Or you can use a different schema:

rowId : colFam: colQual -> value

in001 : user : name -> erp 
in001 : user : age -> 23
in001 : group/0 : name -> employee
in001 : group/1 : name -> developer

Or maybe

rowId : colFam: colQual -> value

in001 : user : name -> erp 
in001 : user : age -> 23
in001 : group : 0/name -> employee
in001 : group : 1/name -> developer

That is, for 'has-many' relationships you introduce a distinct key for each item (in either the column family or the column qualifier), which lets you manipulate each of them independently.
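
To make the effect of per-item keys concrete, here is a toy model in which a sorted map stands in for the table and keys are written as "row|family|qualifier", following the second layout above. It does not call the Accumulo API; IndexedGroups and findGroupKey are hypothetical names used only for this sketch.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model only: a TreeMap stands in for a sorted Accumulo table.
final class IndexedGroups {
  private IndexedGroups() {}

  // Return the full key of the row's group entry holding the value, or null.
  static String findGroupKey(TreeMap<String, String> table, String row, String value) {
    String prefix = row + "|group|";
    for (Map.Entry<String, String> e : table.tailMap(prefix, true).entrySet()) {
      if (!e.getKey().startsWith(prefix)) {
        break; // moved past this row's group entries
      }
      if (e.getValue().equals(value)) {
        return e.getKey();
      }
    }
    return null;
  }

  public static void main(String[] args) {
    TreeMap<String, String> table = new TreeMap<>();
    table.put("in001|user|name", "erp");
    table.put("in001|user|age", "23");
    table.put("in001|group|0/name", "employee");
    table.put("in001|group|1/name", "developer");

    // Each group has its own key, so one can be removed without touching the other.
    String key = findGroupKey(table, "in001", "employee");
    if (key != null) {
      table.remove(key);
    }
    table.put("in001|group|2/name", "manager");
    System.out.println(table);
  }
}
```

Against a real table the same idea would presumably be a scan limited to the row and the group family, followed by mutation.putDelete(colFam, colQual) on the one key whose value matched.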

solution1
1 2015-08-05 16:50:53

solution2
1 2015-08-19 11:20:07

solution3
0 2015-12-29 06:09:01
