http://spark.apache.org/docs/latest/sql-programming-guide.html#interoperating-with-rdds
The link shows how to change txt file into RDD, and then change to Dataframe.
So how to deal with binary file ?
Ask for an example ,Thank you very much .
There is a similar question without answer here : reading binary data into (py) spark DataFrame
To be more detail, I don't know how to parse the binary file .for example , I can parse txt file into lines or words like this:
JavaRDD<Person> people = sc.textFile("examples/src/main/resources/people.txt").map(
new Function<String, Person>() {
public Person call(String line) throws Exception {
String[] parts = line.split(",");
Person person = new Person();
person.setName(parts[0]);
person.setAge(Integer.parseInt(parts[1].trim()));
return person;
}
});
It seems that I just need the API that could parse the binary file or binary stream like this way:
JavaRDD<Person> people = sc.textFile("examples/src/main/resources/people.bin").map(
new Function<String, Person>() {
public Person call(/*stream or binary file*/) throws Exception {
/*code to construct every row*/
return person;
}
});
EDIT: The binary file contains structure data (relational database 's table,the database is a self-made database) and I know the meta info of the structure data.I plan to change the structure data into RDD[Row].
And I could change every thing about the binary file when I use FileSystem
's API ( http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html ) to write the binary stream into HDFS .And The binary file is splittable. I don't have any idea to parse the binary file like the example code above . So I cann't try anything so far.
There is a binary record reader that is already available for spark (I believe available in 1.3.1, atleast in the scala api).
sc.binaryRecord(path: string, recordLength: int, conf)
Its on you though to convert those binaries to an acceptable format for processing.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.