I am trying to query the GeoLite database from a Hadoop MapReduce mapper to resolve the country of an IP address. I tried two approaches:
1. Using File
This only works with the local file system, so I get a file-not-found exception:
File database = new File("hdfs://localhost:9000/input/GeoLite2-City.mmdb"); // <<< HERE
DatabaseReader reader = new DatabaseReader.Builder(database).build();
2. Using streams, but I get this error at runtime:
Error: Java Heap Space
Path pt = new Path("hdfs://localhost:9000/input/GeoLite2-City.mmdb");
FileSystem fs = FileSystem.get(new Configuration());
FSDataInputStream stream = fs.open(pt);
DatabaseReader reader = new DatabaseReader.Builder(stream).build();
InetAddress ipAddress = InetAddress.getByName(address.getHostAddress());
CityResponse response = null;
try {
    response = reader.city(ipAddress);
} catch (GeoIp2Exception ex) {
    ex.printStackTrace();
    return;
}
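On the first approach: as far as I understand, java.io.File knows nothing about URI schemes, so the hdfs:// string is treated as an ordinary path on the local file system, which would explain the file-not-found exception. A minimal sketch using only the JDK (the class and method names are made up for illustration and are not part of the original code):

```java
import java.io.File;
import java.net.URI;

// Hypothetical demo class, not part of the original post.
class HdfsPathAsFile {

    // Scheme as seen by a URI-aware API.
    static String scheme(String location) {
        return URI.create(location).getScheme();
    }

    // java.io.File does no scheme handling: the whole string is a local path.
    static boolean existsLocally(String location) {
        return new File(location).exists();
    }

    public static void main(String[] args) {
        String location = "hdfs://localhost:9000/input/GeoLite2-City.mmdb";
        System.out.println("scheme: " + scheme(location));
        // The printed form is platform dependent, but it is still a local path.
        System.out.println("path seen by File: " + new File(location).getPath());
        System.out.println("exists locally: " + existsLocally(location));
    }
}
```

Anything that has to read from HDFS therefore needs to go through the Hadoop FileSystem API, or work on a local copy of the file.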
My question is: how do I query the GeoLite database from a mapper in Hadoop?
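Regarding the second approach: as far as I know, a DatabaseReader built from an InputStream loads the whole database onto the heap, whereas one built from a File can memory-map it. One option may therefore be to copy the HDFS stream to a local temporary file once and then use the File-based builder. A rough sketch using only the JDK; copyToTempFile is a hypothetical helper, and in the mapper the input would be the FSDataInputStream returned by fs.open(pt):

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper class, not part of the original post.
class StreamToLocalFile {

    // Copies the stream to a local temp file and returns it.
    // The caller remains responsible for closing the input stream.
    static File copyToTempFile(InputStream in, String suffix) {
        try {
            Path tmp = Files.createTempFile("geolite-", suffix);
            // createTempFile already created the file, so overwrite it.
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            File file = tmp.toFile();
            file.deleteOnExit(); // clean up when the task JVM exits
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the HDFS stream: four dummy bytes.
        InputStream in = new ByteArrayInputStream(new byte[] {1, 2, 3, 4});
        File local = copyToTempFile(in, ".mmdb");
        System.out.println(local.length() + " bytes copied to " + local);
    }
}
```

The returned file could then be passed to new DatabaseReader.Builder(file), as in the first snippet. Doing the copy once per task rather than once per record keeps the cost down.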
// reader and cachefiles are assumed to be fields of the mapper class.
@Override
public void setup(Context context)
{
    Configuration conf = context.getConfiguration();
    try {
        // Local copies of the files placed in the distributed cache
        cachefiles = DistributedCache.getLocalCacheFiles(conf);
        // This is a local path, so the File-based builder works
        File database = new File(cachefiles[0].toString());
        reader = new DatabaseReader.Builder(database).build();
    } catch (IOException e) {
        e.printStackTrace();
    }
}

public void map(Object key, Text line, Context context) throws IOException,
        InterruptedException {
    .....
    InetAddress ipAddress = InetAddress.getByName(address.getHostAddress());
    CityResponse response = null;
    try {
        // Reuse the reader that was built once in setup()
        response = reader.city(ipAddress);
    } catch (GeoIp2Exception ex) {
        ex.printStackTrace();
        return;
    }
    ......
}
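The setup() above takes cachefiles[0], which presumably works when the database is the only cached file and was registered in the job driver (the driver code is not shown in the post). If several files are cached, selecting by name might be more robust. A small sketch, assuming the local copy keeps the original file name; pickByName is a hypothetical helper that takes the local paths as strings:

```java
import java.io.File;

// Hypothetical helper class, not part of the original post.
class CacheFilePicker {

    // Returns the first local path whose file name matches, or fails loudly.
    static File pickByName(String[] localPaths, String fileName) {
        if (localPaths != null) {
            for (String path : localPaths) {
                File candidate = new File(path);
                if (candidate.getName().equals(fileName)) {
                    return candidate;
                }
            }
        }
        throw new IllegalArgumentException("Not found in cache: " + fileName);
    }

    public static void main(String[] args) {
        // Made-up local cache paths for illustration only.
        String[] paths = {"/tmp/cache/other.txt", "/tmp/cache/GeoLite2-City.mmdb"};
        System.out.println(pickByName(paths, "GeoLite2-City.mmdb").getName());
    }
}
```

In setup() this could be fed with the toString() of each cached path. Failing fast with an exception may also be preferable to printing the stack trace and continuing with a null reader.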