IBM Watson Speech to Text handling of large files
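I have been trying to use the BlueMix SpeechToText Java library, specifically the SpeechToText class in com.ibm.watson.developer_cloud.speech_to_text.v1.
I have very long wav files that I want to convert to text. The files are about 70 MB each. The goal is to recognize the text using the Java API ( http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/speech-to-text/api/v1/?java#recognize ). As I understand it, I need to check the status of the call every 30 seconds, because once the transcription has finished I only have 30 seconds to retrieve the final results.
To do this with the RESTful API, I need to create a session and then bind my engine to that session, so that I can query the status of the job running on the session.
I have tried to create a session, but the session is never available. I have verified that it appears to work with the provided webapp ( https://stream.watsonplatform.net/speech-to-text/api/v1/sessions?Method=GET ).
I have also tried writing my own client, in which I set the cookies retrieved from the session creation, but that did not work either.
I also tried to connect over secure websockets, but could not establish a successful connection.
Below is some sample code that I have been using.
Any ideas?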
public class Speech2Text extends WatsonService {

    private static final Logger logger = LoggerFactory.getLogger(Speech2Text.class);

    public static void main(String[] args) throws FileNotFoundException, UnsupportedEncodingException, InterruptedException {
        Speech2Text s2t = new Speech2Text();
        s2t.httpClient();
        // try {
        //     s2t.webSocketClient();
        // } catch (URISyntaxException e) {
        //     TODO Auto-generated catch block
        //     e.printStackTrace();
        // } catch (IOException e) {
        //     TODO Auto-generated catch block
        //     e.printStackTrace();
        // }
    }

    public void httpClient() throws FileNotFoundException, UnsupportedEncodingException {
        logger.info("Running http client");
        final Stopwatch stopwatch = Stopwatch.createStarted();
        SpeechToText service = new SpeechToText();
        service.setUsernameAndPassword("XXXXXX", "XXXXX");
        List<SpeechModel> models = service.getModels();
        for (SpeechModel model : models) {
            logger.info(model.getName());
        }
        SpeechSession session = service.createSession("en-US_NarrowbandModel");
        System.out.println(session.toString());
        SessionStatus status = service.getRecognizeStatus(session);
        logger.info(status.getModel());
        logger.info(service.getEndPoint());
        File audio = new File("/home/baaron/watson-bluemix/answer_06.wav");
        Map params = new HashMap();
        params.put("audio", audio);
        params.put("content_type", "audio/wav");
        params.put("continuous", "true");
        params.put("session_id", session.getSessionId());
        logger.info(service.getEndPoint());
        SpeechResults transcript = service.recognize(params);
        PrintWriter writer = new PrintWriter("/home/baaron/watson-bluemix/PCCJPApart1test.transcript", "UTF-8");
        writer.println(transcript.toString());
        SessionStatus status1 = service.getRecognizeStatus(session.getSessionId());
        System.out.println(status1);
        service.deleteSession(session.getSessionId());
        writer.close();
        stopwatch.stop();
        logger.info("Processing took: " + stopwatch + ".");
    }

    public void webSocketClient() throws URISyntaxException, IOException,
            InterruptedException {
        logger.info("Running web socket client");
        String encoding = new String(Base64.encodeBase64String("XXXXXXXXXX".getBytes()));
        HttpPost httppost = new HttpPost("https://stream.watsonplatform.net/authorization/api/v1/token?url=https://stream.watsonplatform.net/speech-to-text/api");
        httppost.setHeader("Authorization", "Basic " + encoding);
        System.out.println("executing request " + httppost.getRequestLine());
        DefaultHttpClient httpclient = new DefaultHttpClient();
        HttpResponse response = httpclient.execute(httppost);
        HttpEntity entity = response.getEntity();
        logger.info(response.getStatusLine().getReasonPhrase());
        WebSocketImpl.DEBUG = true;
        BufferedReader reader = new BufferedReader(new InputStreamReader(entity.getContent()));
        StringBuilder out = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            out.append(line);
        }
        String token = out.toString();
        final WebSocketClient client = new WebSocketClient(
                new URI("wss://stream.watsonplatform.net/speech-to-text-beta/api/v1/recognize?watson-token=" + token)) {
            @Override
            public void onMessage(String message) {
                JSONObject obj = new JSONObject(message);
                // String channel = obj.getString("channel");
            }

            @Override
            public void onOpen(ServerHandshake handshake) {
                System.out.println("opened connection");
            }

            @Override
            public void onClose(int code, String reason, boolean remote) {
                System.out.println("closed connection");
            }

            @Override
            public void onError(Exception ex) {
                ex.printStackTrace();
            }
        };
        // open websocket
        SSLContext sslContext = null;
        try {
            sslContext = SSLContext.getInstance("TLS");
            sslContext.init(null, null, null);
        } catch (NoSuchAlgorithmException e) {
            e.printStackTrace();
        } catch (KeyManagementException e) {
            e.printStackTrace();
        }
        client.setWebSocketFactory(new DefaultSSLWebSocketClientFactory(
                sslContext));
        logger.info("CONNECTED: " + client.connectBlocking());
        JSONObject obj = new JSONObject();
        obj.put("action", "start");
        obj.put("content-type", "audio/wav");
        client.send(obj.toString());
        logger.info("Done");
    }
}
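Doing a GET on https://stream.watsonplatform.net/speech-to-text/api/v1/sessions will not list your sessions, even if they have been created.
The way to check whether you have a session is to do a GET on https://stream.watsonplatform.net/speech-to-text/api/v1/sessions/yourSessionId
If the session exists you will get a 200 response, otherwise a 404. Remember to enable cookies for this.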
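The check described above can be sketched with plain JDK classes. This is only a minimal sketch, not the SDK's own method: the class name `SessionCheck` and its helper methods are made up for illustration, the credentials are placeholders, and it assumes the service accepts HTTP Basic authentication (as the question's own code does). `enableCookies()` installs a default `CookieManager`, which has to happen before the session is created in the same JVM, otherwise the session cookie is never captured.

```java
import java.io.IOException;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class SessionCheck {
    // Base URL taken from the answer above.
    static final String SESSIONS_URL =
            "https://stream.watsonplatform.net/speech-to-text/api/v1/sessions";

    /** Installs a JVM-wide cookie store so cookies set by the service are sent back. */
    static void enableCookies() {
        CookieHandler.setDefault(new CookieManager(null, CookiePolicy.ACCEPT_ALL));
    }

    /** Builds the per-session URL: .../sessions/{sessionId}. */
    static String sessionUrl(String sessionId) {
        return SESSIONS_URL + "/" + sessionId;
    }

    /** Builds an HTTP Basic Authorization header value from "username:password". */
    static String basicAuth(String username, String password) {
        String pair = username + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(pair.getBytes(StandardCharsets.UTF_8));
    }

    /** Interprets the status code: 200 means the session exists, 404 means it does not. */
    static boolean sessionExists(int httpStatus) {
        if (httpStatus == HttpURLConnection.HTTP_OK) {
            return true;
        }
        if (httpStatus == HttpURLConnection.HTTP_NOT_FOUND) {
            return false;
        }
        throw new IllegalStateException("Unexpected HTTP status: " + httpStatus);
    }

    /** Performs the GET on the per-session URL and interprets the response code. */
    static boolean checkSession(String sessionId, String username, String password)
            throws IOException {
        HttpURLConnection conn = (HttpURLConnection)
                URI.create(sessionUrl(sessionId)).toURL().openConnection();
        try {
            conn.setRequestMethod("GET");
            conn.setRequestProperty("Authorization", basicAuth(username, password));
            return sessionExists(conn.getResponseCode());
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 3) {
            System.out.println("usage: SessionCheck <sessionId> <username> <password>");
            return;
        }
        // In a real client, call enableCookies() first, then create the session
        // in this same JVM, and only then run the check.
        enableCookies();
        boolean exists = checkSession(args[0], args[1], args[2]);
        System.out.println(exists ? "session exists" : "session not found");
    }
}
```

Keeping the status interpretation in its own method makes the 200/404 rule easy to verify without touching the network.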
If all you want is to transcribe audio files, you can do:
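import java.io.File;
import com.ibm.watson.developer_cloud.speech_to_text.v1.SpeechToText;
import com.ibm.watson.developer_cloud.speech_to_text.v1.model.RecognizeOptions;
import com.ibm.watson.developer_cloud.speech_to_text.v1.model.SpeechResults;

SpeechToText service = new SpeechToText();
service.setUsernameAndPassword("{username}", "{password}");

RecognizeOptions options = new RecognizeOptions.Builder()
    .contentType("audio/wav")
    .continuous(true)
    .model("en-US_NarrowbandModel")
    .inactivityTimeout(-1) // Seconds after which the connection is closed if no audio is detected; -1 means no timeout
    .build();

String[] files = {"file1.wav", "file2.wav"};
for (String file : files) {
    // Synchronous call: blocks until the transcription of this file is returned
    SpeechResults results = service.recognize(new File(file), options).execute();
    System.out.println(results); // print results (you could write them to a file)
}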
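The loop above only prints each result; its comment suggests writing them to a file instead. Here is a minimal sketch of that step using only `java.nio`. It assumes the transcript text is obtained with `results.toString()`, as in the question's code. The class name `TranscriptWriter` and the `.transcript` suffix (borrowed from the question's output file name) are illustrative choices, not part of the SDK.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class TranscriptWriter {

    /** Maps an audio file name to a transcript name, e.g. "file1.wav" to "file1.transcript". */
    static String transcriptName(String audioFile) {
        // Drop any directory part first so a dot in a folder name is not mistaken for an extension.
        String name = Paths.get(audioFile).getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        return base + ".transcript";
    }

    /** Writes the transcript as UTF-8 into outputDir and returns the path that was written. */
    static Path writeTranscript(Path outputDir, String audioFile, String transcript)
            throws IOException {
        Files.createDirectories(outputDir);
        Path target = outputDir.resolve(transcriptName(audioFile));
        Files.write(target, transcript.getBytes(StandardCharsets.UTF_8));
        return target;
    }

    public static void main(String[] args) throws IOException {
        // The placeholder text stands in for results.toString() from the loop above.
        Path dir = Files.createTempDirectory("transcripts");
        Path written = writeTranscript(dir, "file1.wav", "placeholder transcript");
        System.out.println("Wrote " + written);
    }
}
```

Inside the loop, the `System.out.println(results)` call could then be replaced by something like `TranscriptWriter.writeTranscript(Paths.get("out"), file, results.toString())`, where `out` is a hypothetical output directory.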
Make sure you are using the latest version of the Java SDK.
Maven
<dependency>
    <groupId>com.ibm.watson.developer_cloud</groupId>
    <artifactId>java-sdk</artifactId>
    <version>3.8.0</version>
</dependency>
Gradle
compile 'com.ibm.watson.developer_cloud:java-sdk:3.8.0'