Changing the volume level for a time range in an audio file using SoX

Question

I'd like to change the volume level of a particular time range/slice in an audio file using SoX.

Right now, I'm having to:

Trim the original file three times to get: the part before the audio effect change, the part during (where I'm changing the sound level), and the part after
Perform the effect to change the sound level on the extracted "middle" chunk of audio, in its own file
Splice everything back together, taking into account the 5ms fading/crossfading overlaps that SoX recommends

Is there a better way to do this that doesn't involve writing a script to do the above?

Answer 1

For anyone who stumbles across this highly ranked thread, searching for a way to duck the middle of an audio file:

I've been playing with SoX for ages, and the method I built uses pipes to process each part without creating all those temporary files!

The result is a single-line solution, though you will need to set the timings. Unless your fade timings are the same for all files, it may be useful to generate the line programmatically.

I was pleased to get piping working, as I know this aspect has proved difficult for others. The command-line options can be difficult to get right. However, I really didn't like the messy additional files as an alternative.

By using the mix functionality and positioning each part with pad, then giving each section its own trim and fade, we can also avoid using 'splice' here. I really wasn't a fan.

A working single-line example, tested in SoX 14.4.2 on Windows:

It fades (ducks) by -6 dB at 2 seconds, returning to 0 dB at 5 seconds (using linear fades of 0.4 seconds):

sox -m -t wav "|sox -V1 inputfile.wav -t wav - fade t 0 2.2 0.4" -t wav "|sox -V1 inputfile.wav -t wav - trim 1.8 fade t 0.4 3.4 0.4 gain -6 pad 1.8" -t wav "|sox -V1 inputfile.wav -t wav - trim 4.8 fade t 0.4 0 0 pad 4.8" outputfile.wav gain 9.542

Let's make that a little more readable by breaking it into sections:

Section 1 = full volume, Section 2 = ducked, Section 3 = full volume

sox -m
    -t wav "|sox -V1 inputfile.wav -t wav - fade t 0 2.2 0.4" 
    -t wav "|sox -V1 inputfile.wav -t wav - trim 1.8 fade t 0.4 3.4 0.4 gain -6 pad 1.8"
    -t wav "|sox -V1 inputfile.wav -t wav - trim 4.8 fade t 0.4 0 0 pad 4.8"
    outputfile.wav gain 9.542

Now, to break it down very thoroughly:

' -m ' .. says we're going to mix (this automatically reduces gain, see last parameter)

' -t wav ' .. says the piped command that follows will return a WAV (it seems the WAV header is being lost in the pipeline)

Then.. the FIRST piped part (full volume before the duck)

' -V1 ' .. says ignore warnings - there will be a warning about not knowing the length of the output file for this section because it is being piped out, but there should be no other warning from this operation

then the input filename

' -t wav ' .. forces the output type

' - ' .. is the standard name for a piped output, which is returned to the outer SoX command line

' fade t 0 2.2 0.4 ' .. fades out the full volume section. t = linear. 0 = no fade in. Then (as we want the crossfade's halfway point to be at 2 seconds) we fade out by 2.2 seconds, with a 0.4 second fade (the fadeout position parameter is for when the fade ENDS!)

' -t wav ' .. to advise type of next part - as above

Then.. the SECOND piped part (the ducked section)

' -V1 ' .. again, to ignore output length warning - see above

then the same input filename

' -t wav ' .. forces output type, as above

' - ' .. for piped output, see above

' trim 1.8 ' .. because this middle section will hit the middle of the transition at 2 seconds, so (with a 0.4 second crossfade) the ducked audio will start 0.2 seconds before that

' fade t 0.4 3.4 0.4 ' .. to fade in the ducked section and fade back out again. So a 0.4 second fade in. Then (the most complicated part), as the next crossfade will end at 5.2 seconds, we must take that figure minus the trimmed amount for this section, so 5.2 - 1.8 = 3.4 (again, this is because the fadeout position deals with the end timing of the fadeout)

' gain -6 ' .. is the amount, in dB, by which we should duck

' pad 1.8 ' .. must match the trim figure above, so that amount of silence is inserted at the start to keep it in sync when the sections are mixed

' -t wav ' .. to advise type of next part - as above

Then.. the THIRD piped part (return to full level)

' -V1 ' .. again - see above

then the same input filename

' -t wav ' .. to force output type, as above

' - ' .. for piped output, see above

' trim 4.8 ' .. the transition into this final section is centred on 5 seconds, but (with a 0.4 second crossfade) the audio will start 0.2 seconds before that

' fade t 0.4 0 0 ' .. just fade in to this full volume section. No fade out

' pad 4.8 ' .. must match the trim figure above, as explained above

then the output filename

' gain 9.542 ' .. looks tricky, but basically when you use "-m" to mix 3 files, SoX reduces the volume of each input to 1/3 (one third) to give headroom.

Rather than defeating that, we boost to 300%. We get the dB amount of 9.542 with this formula: 20*log(3)/log(10)
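The make-up gain generalises to any number of mixed inputs. A minimal sketch in Python, assuming (as described above) that "-m" scales each of n inputs by 1/n; the function name is only illustrative:

```python
import math


def mix_makeup_gain_db(num_inputs: int) -> float:
    """dB boost that undoes a 1/num_inputs amplitude reduction.

    20*log(n)/log(10) from the text is the same as 20*log10(n).
    """
    return 20 * math.log10(num_inputs)


print(round(mix_makeup_gain_db(3), 3))  # 9.542
```

If a fourth section were ever mixed in, the same formula would give the new figure to pass to the final gain effect.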

If you copy and paste the single line somewhere you can see it all easily, it's a lot less scary than the explanation!

Final thought - I was initially concerned about whether the crossfades needed to be logarithmic rather than linear, but in my case, from listening to the results, linear has definitely given the sound I expected.
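One way to see why linear works here: all three sections are copies of the same signal, so where they overlap their amplitudes simply add. The Python sketch below models the overall gain of the mixed result over time. It assumes that SoX's 't' fade is linear in amplitude and that the final gain exactly cancels the 1/3 mix attenuation; the function names and defaults are illustrative, not part of SoX.

```python
def ramp(t: float, start: float, end: float) -> float:
    """Linear ramp: 0 at or before `start`, 1 at or after `end`."""
    if t <= start:
        return 0.0
    if t >= end:
        return 1.0
    return (t - start) / (end - start)


def effective_gain(t, duck_start=2.0, duck_end=5.0, xfade=0.4, duck_db=-6.0):
    """Amplitude gain of the mixed output at time t (seconds)."""
    half = xfade / 2
    duck = 10 ** (duck_db / 20)          # -6 dB is roughly 0.501
    into_duck = ramp(t, duck_start - half, duck_start + half)
    out_of_duck = ramp(t, duck_end - half, duck_end + half)

    section1 = 1.0 - into_duck           # full volume, fading out
    section2 = into_duck * (1.0 - out_of_duck)  # ducked, fades in then out
    section3 = out_of_duck               # full volume, fading in
    return section1 + duck * section2 + section3


for t in (1.0, 2.0, 3.5, 5.0, 6.0):
    print(t, round(effective_gain(t), 3))
```

Under those assumptions the gain moves in a straight line between full level and the ducked level during each 0.4 second crossfade, with no dip or bump in the middle, which is consistent with the linear fades sounding right.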

You may like to try longer crossfades, or have the point of transition happening earlier or later, but I hope that single line gives hope to anyone who thought many temporary files would be required!

Let me know if more clarification would help!
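For different timings, the line can be generated rather than edited by hand. This is a rough Python sketch that only builds the command string following the arithmetic explained above; it does not run SoX. The function and parameter names are made up for illustration, and it assumes file names without spaces or quotes.

```python
import math


def fmt(x: float) -> str:
    """Format a number compactly, e.g. 2.2000000000000002 -> '2.2'."""
    return f"{x:.3f}".rstrip("0").rstrip(".")


def build_duck_command(infile, outfile, duck_start, duck_end,
                       xfade=0.4, duck_db=-6.0):
    half = xfade / 2
    first_stop = duck_start + half           # where section 1's fade-out ends
    mid_trim = duck_start - half             # where section 2 starts
    mid_stop = (duck_end + half) - mid_trim  # fade-out end, relative to the trim
    last_trim = duck_end - half              # where section 3 starts
    makeup = 20 * math.log10(3)              # undo the 1/3 scaling from -m

    pipe = f'-t wav "|sox -V1 {infile} -t wav - '
    parts = [
        pipe + f'fade t 0 {fmt(first_stop)} {fmt(xfade)}"',
        pipe + f'trim {fmt(mid_trim)} fade t {fmt(xfade)} {fmt(mid_stop)} '
               f'{fmt(xfade)} gain {fmt(duck_db)} pad {fmt(mid_trim)}"',
        pipe + f'trim {fmt(last_trim)} fade t {fmt(xfade)} 0 0 '
               f'pad {fmt(last_trim)}"',
    ]
    return "sox -m " + " ".join(parts) + f" {outfile} gain {fmt(makeup)}"


print(build_duck_command("inputfile.wav", "outputfile.wav", 2.0, 5.0))
```

With the values used in this answer (duck from 2 to 5 seconds, 0.4 second crossfade, -6 dB) it reproduces the single line shown earlier.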

[Image: Audacity waveform]

Answer 2

Okay, with ffmpeg and filters it's all quite simple.

Imagine that you have 2 tracks, A and B, and you want to crop them and adjust the volume. So the solution would be:

ffmpeg -y -i 1.mp3 -i 2.mp3 \
-filter_complex "[0]aformat=sample_fmts=fltp:sample_rates=44100:channel_layouts=stereo,volume='if(between(t,129.00,129.20),0.15000*(t - 129.00) + 0.03,1)':eval=frame,volume='if(between(t,129.20,181.50),-0.00057*(t - 129.20) + 0.06,1)':eval=frame,volume='if(between(t,181.50,181.60),0.40000*(t - 181.50) + 0.03,1)':eval=frame,volume='if(between(t,181.60,183.50),-0.03684*(t - 181.60) + 0.07,1)':eval=frame,volume='if(between(t,183.50,188.00),0.00000*(t - 183.50) + 0.00,1)':eval=frame,atrim=0.00:56.00,adelay=129000|129000|129000|129000,apad[0:o];[1]aformat=sample_fmts=fltp:sample_rates=44100:channel_layouts=stereo,volume='if(between(t,0.00,134.00),0.00000*(t - 0.00) + 0.06,1)':eval=frame,atrim=0.00:134.00,apad[1:o];[0:o][1:o]amix=inputs=2,atrim=duration=185.00" -shortest -ac 2 output.mp3

which will take 2 input files, transform both of the streams to the appropriate aformat and then apply volume filters.

The syntax for volume is simple: if time t is between some start and end time, apply the volume filter, with the level given by the desired start volume plus a coefficient multiplied by the difference between the current time t and the start time.

This changes the volume linearly from the initial volume to the desired value over a range.
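The coefficient is just the slope of the line between the two volume levels. A small Python sketch of how one such volume filter string could be generated; the helper name and its parameters are hypothetical, and the number formatting simply mirrors the command above:

```python
def linear_volume_filter(start: float, end: float,
                         v_start: float, v_end: float) -> str:
    """Build one ffmpeg volume filter that ramps linearly over [start, end].

    Outside the range the expression evaluates to 1 (volume unchanged).
    """
    slope = (v_end - v_start) / (end - start)
    return (
        f"volume='if(between(t,{start:.2f},{end:.2f}),"
        f"{slope:.5f}*(t - {start:.2f}) + {v_start:.2f},1)':eval=frame"
    )


# Ramp from 0.03 to 0.06 between 129.0 s and 129.2 s
print(linear_volume_filter(129.0, 129.2, 0.03, 0.06))
```

Several of these strings can be joined with commas to cover consecutive ranges, as in the command above.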

atrim will trim the audio chunk after the volume has been adjusted on all ranges.

ffmpeg is just amazing: the expressions can be very complex, and many math functions may be used in them.
