Importing an Avro schema in Scala

Question

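I am writing a simple Twitter program in which I read tweets using Kafka and want to serialize them with Avro. So far I have only set up the Twitter configuration in Scala, and now I want to use that configuration to read tweets.

How do I import the following Avro schema, defined in the file tweets.avsc, into my program?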

{
    "namespace": "tweetavro",
    "type": "record",
    "name": "Tweet",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "text", "type": "string"}
    ]
}

I have seen some examples online that bring the schema into Scala with something like import tweetavro.Tweet, so that it can be used like this:

def main(args: Array[String]): Unit = {
  val twitterStream = TwitterStream.getStream
  // Convert each incoming status to a Tweet and forward it to Kafka
  twitterStream.addListener(new OnTweetPosted(s => sendToKafka(toTweet(s))))
  twitterStream.filter(filterUsOnly)
}

// Build the Avro-generated Tweet from a Status
private def toTweet(s: Status): Tweet = {
  new Tweet(s.getUser.getName, s.getText)
}

private def sendToKafka(t: Tweet): Unit = {
  println(toJson(t.getSchema).apply(t))    // print the JSON form of the tweet
  val tweetEnc = toBinary[Tweet].apply(t)  // Avro binary encoding
  val msg = new KeyedMessage[String, Array[Byte]](KafkaTopic, tweetEnc)
  kafkaProducer.send(msg)
}

I added the following plugins to my pom.xml, following the same examples:

<!-- AVRO MAVEN PLUGIN -->
<plugin>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-maven-plugin</artifactId>
  <version>1.7.7</version>
  <executions>
    <execution>
      <phase>generate-sources</phase>
      <goals>
        <goal>schema</goal>
      </goals>
      <configuration>
        <sourceDirectory>${project.basedir}/src/main/avro/</sourceDirectory>
        <outputDirectory>${project.basedir}/src/main/scala/</outputDirectory>
      </configuration>
    </execution>
  </executions>
</plugin>


<!-- MAVEN COMPILER PLUGIN -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-compiler-plugin</artifactId>
  <configuration>
    <source>1.7</source>
    <target>1.7</target>
  </configuration>
</plugin>

Even after doing all of this, I still cannot import tweetavro.Tweet.

Can anyone help?

Thanks!

Answer 1

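You can also use avro4s. Define a case class based on the schema (or generate it). Let's call it Tweet. Then create an AvroOutputStream, which infers the schema from the case class and uses it to serialize instances. We can then write to a byte array and send that to Kafka. For example: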

import java.io.ByteArrayOutputStream
import com.sksamuel.avro4s.AvroOutputStream
import kafka.producer.KeyedMessage

val tweet: Tweet = ... // the instance you want to serialize

val out = new ByteArrayOutputStream // we collect the serialized output in this
// you specify the type here as well; newer avro4s releases may construct the stream differently
val avro = AvroOutputStream[Tweet](out)
avro.write(tweet)
avro.close()

val bytes = out.toByteArray
val msg = new KeyedMessage[String, Array[Byte]](KafkaTopic, bytes)
kafkaProducer.send(msg)

Answer 2

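I recommend Avrohugger. As far as Scala case classes for Avro go, it is the new kid on the block, but it supports everything I need, and I really like that it is not macro-based, so I can actually see what gets generated.

The maintainer is great to work with and very receptive to contributions and feedback. It is not, and may never be, as feature-rich as the official Java code generation, but it will meet most people's needs.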

Currently, it lacks support for unions (other than optional types) and for recursive types.
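
To make this concrete, here is roughly the shape of case class one would expect for the Tweet schema in the question. It is a hand-written sketch, not real Avrohugger output. TweetWithLocation, its location field, and CaseClassSketch are hypothetical names, added only to show what an optional-type union (a field declared as ["null", "string"]) looks like on the Scala side.

```scala
// Hand-written illustration, not actual Avrohugger output.
// Mirrors the Tweet schema from the question: two required string fields.
case class Tweet(name: String, text: String)

// Hypothetical variant: a field declared as ["null", "string"] in the schema
// is the "optional type" kind of union, which maps naturally to Option[String].
case class TweetWithLocation(name: String, text: String, location: Option[String] = None)

object CaseClassSketch {
  // Pattern matching on the optional field replaces null checks.
  def describe(t: TweetWithLocation): String =
    t.location match {
      case Some(place) => s"${t.name} tweeted from $place: ${t.text}"
      case None        => s"${t.name}: ${t.text}"
    }
}
```

Because the result is plain source code rather than a macro expansion, it can be read and checked like any other file in the project.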

The SBT plugin works very well, and if you want a quick look at what it does with an Avro schema, you can use the new web interface:

https://avro2caseclass.herokuapp.com/

More details here:

https://github.com/julianpeeters/avrohugger

Answer 3

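You should first compile the schema into a class. I am not sure whether there is a production-ready Avro library for Scala, but you can generate a Java class and use it from Scala: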

java -jar /path/to/avro-tools-1.7.7.jar compile schema tweet.avsc .

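Adjust this line to your needs; the tool should then generate the tweetavro.Tweet class. You can put that class into your project and use it the way you just described.

More information here

upd: FYI, there appears to be a library for Scala, but I have never used it.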
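
If generating a class is inconvenient, a different route (not the one described in this answer) is Avro's generic API, which parses the schema at runtime and needs no generated code. Below is a minimal sketch, assuming the Avro Java library is on the classpath. GenericTweetSketch, encode and decode are illustrative names, and the schema is inlined so that the example stands alone.

```scala
import java.io.ByteArrayOutputStream

import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericDatumReader, GenericDatumWriter, GenericRecord}
import org.apache.avro.io.{DecoderFactory, EncoderFactory}

object GenericTweetSketch {
  // Same schema as tweets.avsc in the question, inlined for the sketch.
  // Schema.Parser can also parse a java.io.File pointing at the .avsc file.
  private val schemaJson: String =
    """{
      |  "namespace": "tweetavro",
      |  "type": "record",
      |  "name": "Tweet",
      |  "fields": [
      |    {"name": "name", "type": "string"},
      |    {"name": "text", "type": "string"}
      |  ]
      |}""".stripMargin

  val schema: Schema = new Schema.Parser().parse(schemaJson)

  // Serialize one tweet to raw Avro binary (no container-file header).
  def encode(name: String, text: String): Array[Byte] = {
    val record: GenericRecord = new GenericData.Record(schema)
    record.put("name", name)
    record.put("text", text)

    val out = new ByteArrayOutputStream()
    val encoder = EncoderFactory.get().binaryEncoder(out, null)
    new GenericDatumWriter[GenericRecord](schema).write(record, encoder)
    encoder.flush() // the binary encoder buffers, so flush before reading the bytes
    out.toByteArray
  }

  // Read the bytes back; Avro returns strings as Utf8, hence toString.
  def decode(bytes: Array[Byte]): (String, String) = {
    val decoder = DecoderFactory.get().binaryDecoder(bytes, null)
    val record = new GenericDatumReader[GenericRecord](schema).read(null, decoder)
    (record.get("name").toString, record.get("text").toString)
  }
}
```

The byte array returned by encode could be wrapped in a KeyedMessage in the same way as in the question's sendToKafka. The trade-off is that field names are checked only at runtime, whereas a generated class catches mistakes at compile time.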


Answer 1: 3, 2015-12-17 11:12:03
Answer 2: 2, 2015-08-13 21:36:05
Answer 3: 1, accepted, 2015-08-05 06:55:51
