
Synchronized recording from a microphone array using the JavaSound API

I've gone through the tutorials for the Java Sound API and I've successfully read off data from my microphone.

I would now like to go a step further and get data synchronously from multiple microphones in a microphone array (like a PS3 Eye or Respeaker).

I could get a TargetDataLine for each microphone and open/start them and read the input into buffers - but I don't know how to do this in a way that will give me data that I can then line up time-wise (I would like to eventually do beamforming).
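For reference, here is a minimal sketch of that per-microphone approach, assuming each microphone is exposed as its own Mixer and captures 16-bit mono at 16 kHz (both assumptions; adjust for your device). Nothing in this code makes the separate lines start at the same instant, which is exactly the problem.

    import javax.sound.sampled.*;
    import java.util.ArrayList;
    import java.util.List;

    public class MultiMicCapture {
        public static void main(String[] args) throws LineUnavailableException {
            AudioFormat format = new AudioFormat(16000f, 16, 1, true, false); // 16 kHz, 16-bit, mono, little-endian
            DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);

            // Open one TargetDataLine per mixer that can capture in this format.
            List<TargetDataLine> lines = new ArrayList<>();
            for (Mixer.Info mixerInfo : AudioSystem.getMixerInfo()) {
                Mixer mixer = AudioSystem.getMixer(mixerInfo);
                if (mixer.isLineSupported(info)) {
                    TargetDataLine line = (TargetDataLine) mixer.getLine(info);
                    line.open(format);
                    lines.add(line);
                }
            }

            // Start them all, then read from each (in real code, one reader thread per line).
            for (TargetDataLine line : lines) {
                line.start();
            }
            for (TargetDataLine line : lines) {
                byte[] buffer = new byte[format.getFrameSize() * 1024]; // whole frames only
                int read = line.read(buffer, 0, buffer.length);         // blocking read
                System.out.println("read " + read + " bytes from one line");
            }
            for (TargetDataLine line : lines) {
                line.stop();
                line.close();
            }
        }
    }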

When reading from something like ALSA I would get the bytes from the different microphones simultaneously, so I know that each byte from each microphone is from the same time instant - but the Java Sound API seems to have an abstraction that obfuscates this, because you are just dumping/writing data out of separate line buffers and processing it, and each line is acting separately. You don't interact with the whole device/mic-array at once.

However, I've found someone who managed to do beamforming in Java with the Kinect 1.0, so I know it should be possible. The problem is that the secret sauce is inside a custom Mixer object inside a .jar that was pulled out of some other software, so I don't have any easy way to figure out how they pulled it off.

You will only be able to align data from multiple sources with the time-synchronous accuracy needed for beam-forming if this is supported by the underlying hardware drivers.

If the underlying hardware provides you with multiple, synchronised data streams (e.g. recording 2 channels, in stereo), then your array data will be time-synchronised.

If you are relying on the OS to simply provide you with two independent streams, then maybe you can rely on timestamping. Do you get the timestamp of the first element? If so, then you can re-align the data by dropping samples based on your sample rate. There may be a final difference (delta-t) that you will have to factor into your beam-forming algorithm.
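For what it's worth, a sketch of that re-alignment arithmetic (the timestamps and sample rate below are made-up values; the idea is just to convert the start-time difference into a whole number of frames to drop from the earlier stream, leaving a sub-sample delta-t for the beam-former):

    public class AlignByTimestamp {
        public static void main(String[] args) {
            // Hypothetical start timestamps (microseconds) for the first sample of each stream.
            long startA_us = 1_000_000L;
            long startB_us = 1_003_130L;       // stream B started ~3.13 ms after stream A
            double sampleRate = 16000.0;       // frames per second (assumed)

            double deltaSeconds = (startB_us - startA_us) * 1e-6;
            long framesToDrop = Math.round(deltaSeconds * sampleRate);        // whole frames to drop from stream A
            double residualDeltaT = deltaSeconds - framesToDrop / sampleRate; // sub-sample delta-t left for beam-forming
            System.out.println("drop " + framesToDrop + " frames, residual delta-t = " + residualDeltaT + " s");
        }
    }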

Reading about the PS3 Eye (which has an array of microphones), you will be able to do this if the audio driver provides all the channels at once.

For Java, this probably means "Can you open the line with an AudioFormat that includes 4 channels?" If yes, then each frame you read will contain one sample per channel, and the decoded frame data will (almost certainly) be time-aligned. To quote the Java docs: "A frame contains the data for all channels at a particular time".
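A minimal sketch of that check, assuming the array shows up as a single 4-channel, 16-bit, 16 kHz capture device (the format parameters are assumptions to be verified per device). The point is that each 8-byte frame then carries one sample per channel taken at the same instant:

    import javax.sound.sampled.*;

    public class ArrayCapture {
        public static void main(String[] args) throws LineUnavailableException {
            AudioFormat format = new AudioFormat(16000f, 16, 4, true, false); // 4 channels, signed, little-endian
            DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);
            if (!AudioSystem.isLineSupported(info)) {
                System.out.println("No mixer offers 4-channel capture in this format");
                return;
            }
            TargetDataLine line = (TargetDataLine) AudioSystem.getLine(info);
            line.open(format);
            line.start();

            int frameSize = format.getFrameSize();      // 4 channels * 2 bytes = 8 bytes per frame
            byte[] buffer = new byte[frameSize * 1024]; // room for 1024 frames
            int read = line.read(buffer, 0, buffer.length);

            // De-interleave: within a frame, the channel samples are stored back to back.
            for (int offset = 0; offset + frameSize <= read; offset += frameSize) {
                for (int ch = 0; ch < format.getChannels(); ch++) {
                    int lo = buffer[offset + 2 * ch] & 0xFF;
                    int hi = buffer[offset + 2 * ch + 1];     // signed high byte
                    short sample = (short) ((hi << 8) | lo);  // channel ch, same instant as its siblings
                    // ... hand `sample` to the per-channel buffers used by the beam-former
                }
            }
            line.stop();
            line.close();
        }
    }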

I don't know what "beamforming" is, but if there is hardware that can provide synchronization, using that would obviously be the best solution.

Here, for what it is worth, is what should be a plausible algorithmic way to manage synchronization.

(1) Set up a frame counter for each TargetDataLine. You will have to convert bytes to PCM as part of this process.

(2) Set up some code to monitor the volume level on each line (some sort of RMS algorithm on the PCM data, I would assume).

(3) Create a loud, instantaneous burst that reaches each microphone at the same time, one that the RMS algorithm is able to detect and that gives you the frame count for the onset.

(4) Adjust the frame counters as needed, and reference them going forward on each line of incoming data.
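A rough sketch of steps (1)-(3) for one line, under the assumption of 16-bit signed little-endian mono PCM (the window size and threshold are illustrative, not tuned values). Step (4) would then subtract the onset frames reported for the different lines from each other to get each line's offset:

    /** Counts frames on one line and reports the frame index where an RMS window first exceeds a threshold. */
    public class OnsetDetector {
        private long frameCount = 0;   // (1) running frame counter for this TargetDataLine
        private long onsetFrame = -1;  // frame index of the detected burst, -1 until found

        /** Feed bytes read from the line (16-bit signed little-endian mono assumed); returns onset frame or -1. */
        public long process(byte[] buffer, int length, double rmsThreshold) {
            final int frameSize = 2;       // bytes per mono 16-bit frame
            final int windowFrames = 256;  // RMS window length (assumption)
            for (int offset = 0; offset + windowFrames * frameSize <= length; offset += windowFrames * frameSize) {
                double sumSquares = 0;
                for (int f = 0; f < windowFrames; f++) {
                    int i = offset + f * frameSize;
                    int lo = buffer[i] & 0xFF;
                    int hi = buffer[i + 1];                                   // signed high byte
                    double sample = ((short) ((hi << 8) | lo)) / 32768.0;     // bytes -> PCM in [-1, 1)
                    sumSquares += sample * sample;
                }
                double rms = Math.sqrt(sumSquares / windowFrames);            // (2) volume level of this window
                if (onsetFrame < 0 && rms > rmsThreshold) {
                    onsetFrame = frameCount;                                  // (3) burst starts in this window
                }
                frameCount += windowFrames;
            }
            return onsetFrame;
        }
    }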

Rationale: Java doesn't offer real-time guarantees, as explained in this article on real-time, low-latency audio processing. But in my experience, the correspondence between the byte data and time (per the sample rate) is very accurate on lines closest to where Java interfaces with external audio services.

How long would frame counting remain accurate without drifting? I have never done any tests to research this. But on a practical level, I have coded a fully satisfactory "audio event" scheduler based on frame-counting, for playing multipart scores via real-time synthesis (all done with Java), and the timing is impeccable for the longest compositions attempted (6-7 minutes in length).
