
How to query an embedded database stored in HDFS from a MapReduce job?

I am trying to query the GeoLite database from a Hadoop MapReduce mapper to resolve the country of an IP address. I tried two approaches:

1. Using File. This only works on the local file system, so I get a file-not-found exception:

File database = new File("hdfs://localhost:9000/input/GeoLite2-City.mmdb"); // <<< HERE
DatabaseReader reader = new DatabaseReader.Builder(database).build();
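To illustrate why the first approach fails, here is a minimal sketch using only the JDK. java.io.File does not interpret URI schemes, so the hdfs:// string is treated as an ordinary relative path on the local disk. The class name HdfsPathAsFile and the helper visibleToLocalFile are made up for this illustration.

```java
import java.io.File;

class HdfsPathAsFile {
    // Hypothetical helper: true if java.io.File can see the path on the LOCAL file system.
    static boolean visibleToLocalFile(String path) {
        return new File(path).exists();
    }

    public static void main(String[] args) {
        String hdfsUri = "hdfs://localhost:9000/input/GeoLite2-City.mmdb";
        File f = new File(hdfsUri);

        // File knows nothing about the "hdfs" scheme; the whole string is
        // handled as a relative local path, so nothing is found.
        System.out.println("path seen by File: " + f.getPath());
        System.out.println("absolute: " + f.isAbsolute());
        System.out.println("exists locally: " + visibleToLocalFile(hdfsUri));
    }
}
```

The builder that takes a File therefore needs a path that exists on the local disk of the node running the task.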

2. Using streams, but I get this error at runtime:

Error: Java Heap Space

Path pt = new Path("hdfs://localhost:9000/input/GeoLite2-City.mmdb");
FileSystem fs = FileSystem.get(new Configuration());

FSDataInputStream stream = fs.open(pt);
DatabaseReader reader = new DatabaseReader.Builder(stream).build();

InetAddress ipAddress = InetAddress.getByName(address.getHostAddress());
CityResponse response = null;
try {
    response = reader.city(ipAddress);
} catch (GeoIp2Exception ex) {
    ex.printStackTrace();
    return;
}
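One possible workaround for the heap error, sketched under assumptions: as far as I understand the GeoIP2 Java API, the builder that takes an InputStream loads the whole database into memory, while the builder that takes a File memory-maps it by default. Copying the stream to a local temp file first would let the File builder be used. The helper copyToTempFile is hypothetical, and a ByteArrayInputStream stands in for the FSDataInputStream returned by fs.open(pt).

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class StreamToLocalFile {
    // Hypothetical helper: copies any InputStream to a temp file on local disk.
    static File copyToTempFile(InputStream in, String suffix) throws IOException {
        Path tmp = Files.createTempFile("geolite-", suffix);
        tmp.toFile().deleteOnExit();
        // createTempFile already created an empty file, so overwrite it.
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp.toFile();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for fs.open(pt); in a mapper this would be the HDFS stream.
        byte[] fakeDatabase = {1, 2, 3, 4};
        try (InputStream in = new ByteArrayInputStream(fakeDatabase)) {
            File local = copyToTempFile(in, ".mmdb");
            System.out.println(local.length() + " bytes copied to " + local);
            // Assumption: in the mapper the next step would be
            // new DatabaseReader.Builder(local).build();
        }
    }
}
```

This copies the database once per task, so it belongs in setup() rather than in map().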

My question is: how can I query the GeoLite database from a mapper in Hadoop?

I solved it with the distributed cache, by caching the GeoLite database file so that it is available locally to every mapper in the MapReduce job.

    // Fields assumed to be declared in the mapper class (not shown in the original):
    // private Path[] cachefiles;
    // private DatabaseReader reader;

    @Override
    public void setup(Context context) {
        Configuration conf = context.getConfiguration();
        try {
            // Local paths of the files distributed through the cache
            cachefiles = DistributedCache.getLocalCacheFiles(conf);
            // The GeoLite database is the first (and only) cached file
            File database = new File(cachefiles[0].toString());
            reader = new DatabaseReader.Builder(database).build();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public void map(Object key, Text line, Context context) throws IOException,
            InterruptedException {

        .....

        InetAddress ipAddress = InetAddress.getByName(address.getHostAddress());
        CityResponse response = null;
        try {
            response = reader.city(ipAddress);
        } catch (GeoIp2Exception ex) {
            ex.printStackTrace();
            return;
        }
        ......
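The setup() above takes cachefiles[0], which only works while the database is the sole cached file. A minimal sketch of picking the file by name instead, using plain strings in place of the Hadoop Path objects (path.toString() would give the same input). The class CacheFileLookup and the method findByName are hypothetical, and the paths in main are invented for the example.

```java
import java.io.File;

class CacheFileLookup {
    // Hypothetical helper: returns the first local path whose file name
    // equals fileName, or null if there is no match or no cache files.
    static String findByName(String[] localPaths, String fileName) {
        if (localPaths == null) {
            return null; // guard: the cache lookup may yield no files at all
        }
        for (String p : localPaths) {
            if (new File(p).getName().equals(fileName)) {
                return p;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Invented example paths; real ones come from the distributed cache.
        String[] cached = {
            "/tmp/cache/stopwords.txt",
            "/tmp/cache/GeoLite2-City.mmdb"
        };
        String db = findByName(cached, "GeoLite2-City.mmdb");
        System.out.println("database path: " + db);
    }
}
```

Failing fast in setup() when the lookup returns null would surface a missing cache file immediately, rather than as a NullPointerException on the first call to reader.city().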
