Changing Audio Pitch and Speed with FFmpeg + SoundTouch

ykva7679, 9 years ago
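   <p>This article uses FFmpeg + SoundTouch to decode audio, change its pitch and speed, and save the result as a WAV file.</p>
   <p>It covers the following:</p>
   <ul>
    <li> <p>A small FFmpeg helper class that holds the information needed to decode a media file</p> </li>
    <li> <p>Saving the decoded audio as a WAV file</p> </li>
    <li> <p>A guide to using SoundTouch</p> </li>
   </ul>
   <h2><strong>1. Extracting audio from a video file and saving it as WAV</strong></h2>
   <p>This section extracts the audio from a video file, decodes it, and saves it as a WAV file.</p>
   <p>Decoding with FFmpeg usually follows these steps:</p>
   <ul>
    <li>Open the media file</li>
    <li>Read the stream information</li>
    <li>Find the indices of the video and audio streams</li>
    <li>Look up the codec for each stream index and open an AVCodecContext</li>
   </ul>
   <p>After these steps you have everything decoding requires: the AVFormatContext, the AVCodecContext, and the index of each stream. Because every decoder needs this data, the steps are wrapped here behind a single interface that returns it.</p>
   <h3><strong>1.1 The MediaInfo helper class</strong></h3>
   <p>Decoding with FFmpeg requires the following:</p>
   <ul>
    <li>AVFormatContext</li>
    <li>AVCodecContext</li>
    <li>The stream index</li>
   </ul>
   <p>MediaInfo (class CMediaInfo) is declared as follows:</p>
   <pre>  <code class="language-java">class CMediaInfo
{
public:
    CMediaInfo();
    CMediaInfo(MEDIA_TYPE media);   // video, audio, or both
    ~CMediaInfo();

public:
    ERROR_TYPE open(const char *filename);   // fill the fields below from a media file
    void close();                            // release the contexts opened by open()
    void error_message(ERROR_TYPE error);

public:
    MEDIA_TYPE type;
    AVFormatContext *pFormatContext;

    AVCodecContext *pVideo_codec_context;
    AVCodecContext *pAudio_codec_context;

    int video_stream_index;
    int audio_stream_index;
};</code></pre>
   <ul>
    <li>The constructor takes one argument that says whether the object holds information for video, audio, or both.</li>
    <li>open fills in the fields from the given media file; close releases the opened AVFormatContext and AVCodecContext.</li>
    <li>The fields hold the information needed for decoding.</li>
   </ul>
   <p>For the implementation details, see the earlier articles in this series; the code used here is provided at the end, so it is not repeated.</p>
   <h3><strong>1.2 Extracting audio from a video</strong></h3>
   <p><strong>1.2.1 Getting the information needed for decoding</strong></p>
   <p>Using the MediaInfo helper class above, first fill in its fields from the path of the video file:</p>
   <pre>  <code class="language-java">const char* filename = "E:\\Wildlife.wmv";
CMediaInfo media(MEDIA_TYPE::AUDIO);
media.open(filename);</code></pre>
   <p><strong>1.2.2 Setting the output audio format</strong></p>
   <p>Before any decoding happens, the audio format of the WAV file to be written has to be set. FFmpeg uses an SwrContext to describe the audio conversion:</p>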
<pre>  <code class="language-java">AVSampleFormat dst_format = AV_SAMPLE_FMT_S16;
uint8_t dst_channels = 2;
auto dst_layout = av_get_default_channel_layout(dst_channels);

auto audio_ctx = media.pAudio_codec_context;
// Some decoders leave channel_layout unset; derive it from the channel count
if (audio_ctx->channel_layout <= 0)
    audio_ctx->channel_layout = av_get_default_channel_layout(audio_ctx->channels);

SwrContext *swr_ctx = swr_alloc();
swr_alloc_set_opts(swr_ctx, dst_layout, dst_format, audio_ctx->sample_rate,
    audio_ctx->channel_layout, audio_ctx->sample_fmt, audio_ctx->sample_rate, 0, nullptr);
if (!swr_ctx || swr_init(swr_ctx) < 0)
    return -1;</code></pre>
   <p>This sets the output sample format to 16-bit signed integer and the channel count to 2, and leaves the sample rate unchanged.</p>
   <p><strong>1.2.3 Decoding and saving as a WAV file</strong></p>
   <p>With the decoding information from MediaInfo and the SwrContext for the format conversion in place, call av_read_frame to read packets from the stream and decode them. Each decoded frame is converted to the output format, and the converted data is written to the WAV file.</p>
   <pre>  <code class="language-java">int pcm_data_size = 0;
while (av_read_frame(media.pFormatContext, packet) >= 0)
{
    if (packet->stream_index == media.audio_stream_index)
    {
        auto ret = avcodec_send_packet(media.pAudio_codec_context, packet);
        if (ret < 0 && ret != AVERROR(EAGAIN) && ret != AVERROR_EOF)
            return -1;

        ret = avcodec_receive_frame(media.pAudio_codec_context, frame);
        if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF)
        {
            // No frame available for this packet; read the next one
            av_packet_unref(packet);
            continue;
        }
        if (ret < 0)
            return -1;

        // nb is the number of converted samples per channel
        auto nb = swr_convert(swr_ctx, &buffer, 192000, (const uint8_t **)frame->data, frame->nb_samples);
        auto length = nb * dst_channels * av_get_bytes_per_sample(dst_format);
        ofs.write((char*)buffer, length);
        pcm_data_size += length;
    }
    av_packet_unref(packet); // release the packet's data before reading the next one
}</code></pre>
   <p>The file must be opened in binary mode, and the number of audio bytes written must be recorded, because it is needed when the WAV header is written at the end.</p>
   <p>Writing the WAV header:</p>
   <pre>  <code class="language-java">// Write the WAV header
Wave_header header(dst_channels, audio_ctx->sample_rate, av_get_bytes_per_sample(dst_format) * 8);
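header.data->cb_size = ((pcm_data_size + 1) / 2) * 2;   // data size rounded up to an even number of bytes
header.riff->cb_size = 4 + 4 + header.fmt->cb_size + 4 + 4 + header.data->cb_size + 4;
ofs.seekp(0, ios::beg);
CWaveFile::write_header(ofs, header);</code></pre>
   <p>The PCM data is written to the file first; the relevant fields of the WAV header are then filled in from the length of the PCM data.</p>
   <h2><strong>2. SoundTouch usage guide</strong></h2>
   <p>SoundTouch is an open-source audio library with these main features:</p>
   <ul>
    <li>Tempo change without pitch change (TSM, Time Scale Modification): plays the audio faster or slower without affecting its pitch.</li>
    <li>Pitch change without tempo change (pitch shifting): changes the pitch while keeping the playback speed the same.</li>
    <li>Rate change: changes pitch and speed together.</li>
   </ul>
   <h3><strong>2.1 Building</strong></h3>
   <p>Download the SoundTouch source code. After unpacking, <strong>README.html</strong> describes how to build it. On Windows there are two ways to build the source:</p>
   <ul>
    <li> <p>Run the <strong>make-win.bat</strong> script in the unpacked folder. This did not work for me; judging from the script's contents, it probably could not find the required environment variables (VS2008). The script mainly runs the following commands:</p> <pre>  <code class="language-java">devenv source\SoundStretch\SoundStretch.vcproj /upgrade
devenv source\SoundStretch\SoundStretch.vcproj /build debug
devenv source\SoundStretch\SoundStretch.vcproj /build release
devenv source\SoundStretch\SoundStretch.vcproj /build releasex64</code></pre> </li>
    <li> <p>Build from the Visual Studio IDE: open SoundTouch.sln under source\Soundtouch and build it. SoundTouch.sln produces a static library, and the solution targets Visual Studio 2008.</p> </li>
   </ul>
   <p>Two points to watch when using the built library:</p>
   <ul>
    <li>A static library built with VS2008 causes problems when linked from VS2013: the linker reports error LNK2019 (unresolved external symbol).</li>
    <li>The source directory also contains a <strong>SoundTouchDLL</strong> project which, as the name suggests, builds a DLL. I built it, configured the project settings (dll, lib), and then instantiated SoundTouch s_touch. This again produced LNK2019. I assumed the environment was misconfigured and the DLL could not be found, but the real cause is that the DLL does not export the whole SoundTouch class, only a set of functions such as these:</li>
   </ul>
   <pre>  <code class="language-java">/// Sets new rate control value. Normal rate = 1.0, smaller values
/// represent slower rate, larger faster rates.
SOUNDTOUCHDLL_API void __cdecl soundtouch_setRate(HANDLE h, float newRate);

/// Sets new tempo control value. Normal tempo = 1.0, smaller values
/// represent slower tempo, larger faster tempo.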
SOUNDTOUCHDLL_API void __cdecl soundtouch_setTempo(HANDLE h, float newTempo);

/// Sets new rate control value as a difference in percents compared
/// to the original rate (-50 .. +100 %);
SOUNDTOUCHDLL_API void __cdecl soundtouch_setRateChange(HANDLE h, float newRate);</code></pre>
   <p>Later, after looking at the Android example, I concluded that the functions exported by this DLL are probably the API intended for use from Android.</p>
   <h3><strong>2.2 Usage</strong></h3>
   <p>Once the static library is built, SoundTouch is simple to use; its public API is wrapped in the class SoundTouch. Using it takes the following steps:</p>
   <ul>
    <li>Instantiate the SoundTouch class</li>
    <li>Set the parameters (the change in speed and pitch)</li>
    <li>Call putSamples to pass in the audio samples to process, and call receiveSamples to receive the processed samples.</li>
    <li>When processing is finished, call flush() and receive the samples still left in the pipeline</li>
   </ul>
   <p>An example:</p>
   <pre>  <code class="language-java">////////////////////////////////////////////////////////////////////
// 1. Create SoundTouch and describe the input audio
soundtouch::SoundTouch s_touch;
s_touch.setSampleRate(audio_ctx->sample_rate); // sample rate
s_touch.setChannels(dst_channels);             // channel count of the resampled data passed in below

////////////////////////////////////////////
// 2. Set the rate or pitch change
//s_touch.setRate(0.5); // set the rate to 0.5; the original is 1.0
s_touch.setRateChange(-50.0);

//////////////////////////////////////////////////////////////
// 3. Pass samples in and receive the processed samples

// Convert the decoded buffer (uint8_t*) to soundtouch::SAMPLETYPE, i.e. signed int 16
auto count = nb * dst_channels; // number of 16-bit values, all channels included
for (auto i = 0; i < count; i++)
{
    touch_buffer[i] = (buffer[i * 2] | (buffer[i * 2 + 1] << 8));
}

// Pass the samples in (nb counts samples per channel)
s_touch.putSamples(touch_buffer, nb);
do
{
    // Receive the processed samples
    nb = s_touch.receiveSamples(touch_buffer, 96000);

    auto length = nb * dst_channels * av_get_bytes_per_sample(dst_format);
    ofs.write((char*)touch_buffer, length);

    pcm_data_size += length;
} while (nb != 0);

///////////////////////////////////////////////
// 4. Receive the processed data still left in the pipeline
s_touch.flush();
int nSamples;
do
{
    nSamples = s_touch.receiveSamples(touch_buffer, 96000);

    auto length = nSamples * dst_channels * av_get_bytes_per_sample(dst_format);
    ofs.write((char*)touch_buffer, length);

    pcm_data_size += length;
} while (nSamples != 0);</code></pre>
   <p>SoundTouch manages sample data internally as a pipeline, so once the main loop has finished receiving, the samples remaining in the pipeline still have to be received.</p>
   <p>Points to note when using it:</p>
   <ul>
    <li> <p>The sample type. SoundTouch supports two sample types, 16-bit signed integer and 32-bit float, and uses 32-bit float by default. The sample type is declared as SAMPLETYPE in the header STTypes.h. Near the top of that file, the macros SOUNDTOUCH_INTEGER_SAMPLES and SOUNDTOUCH_FLOAT_SAMPLES decide which sample type is used.</p> <pre>  <code class="language-java">#define SOUNDTOUCH_INTEGER_SAMPLES     1    //< 16bit integer samples
//#define SOUNDTOUCH_FLOAT_SAMPLES       1    //< 32bit float samples</code></pre> </li>
   </ul>
   <p>In addition, to prevent overflow during calculations, 32-bit signed integers and 64-bit floats are also supported, under the type LONG_SAMPLETYPE.</p>
   <ul>
    <li>Setting the speed and pitch parameters
     <ul>
      <li>Changing pitch without changing speed
       <ul>
        <li>setPitch(double newPitch): the original pitch is 1.0; below 1 lowers the pitch, above 1 raises it</li>
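        <li>setPitchOctaves(double newPitch): sets the new pitch relative to the original in octaves, [-1.00, 1.00].</li>
        <li>setPitchSemiTones(double or int newPitch): sets the new pitch relative to the original in semitones, [-12.0, 12.0]</li>
       </ul> </li>
      <li>Changing speed and pitch together (rate)
       <ul>
        <li>setRate(double newRate): sets the new rate; the original rate is 1.0, below 1 is slower, above 1 is faster</li>
        <li>setRateChange(double newRate): sets the new rate as a percentage change from the original rate, [-50, 100]</li>
       </ul> </li>
      <li>Changing speed without changing pitch (tempo)
       <ul>
        <li>setTempo(double newTempo): sets the new tempo; the original tempo is 1.0, below 1 is slower, above 1 is faster</li>
        <li>setTempoChange(double newTempo): sets the new tempo as a percentage change from the original tempo, [-50, 100]</li>
       </ul> </li>
     </ul> </li>
   </ul>
   <h2>3. FFmpeg + SoundTouch: changing pitch and speed</h2>
   <p>With the pieces above in place, all that remains is to pass the data decoded by FFmpeg to SoundTouch for processing. One point needs attention: FFmpeg delivers the decoded data in a buffer of type uint8, so before the samples are passed to SoundTouch they must be converted to SoundTouch's <strong>SAMPLETYPE</strong>. This article uses S16 as the SAMPLETYPE, so each pair of uint8 bytes is first combined into one S16 value (little-endian):</p>
   <pre>  <code class="language-java">// Convert the decoded buffer (uint8_t*) to soundtouch::SAMPLETYPE, i.e. signed int 16
auto count = nb * dst_channels; // number of 16-bit values, all channels included
for (auto i = 0; i < count; i++)
{
    touch_buffer[i] = (buffer[i * 2] | (buffer[i * 2 + 1] << 8));
}</code></pre>
   <p>The code first computes the number of 16-bit values in the buffer (samples per channel times the channel count), then combines each pair of bytes, low byte first, into a 16-bit signed integer. The loop must run over that count, not over the byte count, or it reads past the end of the decoded data. The converted buffer is then passed to SoundTouch:</p>
   <pre>  <code class="language-java">s_touch.putSamples(touch_buffer, nb);
do
{
    // Receive the processed samples
    nb = s_touch.receiveSamples(touch_buffer, 96000);

    auto length = nb * dst_channels * av_get_bytes_per_sample(dst_format);
    ofs.write((char*)touch_buffer, length);

    pcm_data_size += length;
} while (nb != 0);</code></pre>
   <p>The results of the pitch and speed changes are shown below:</p>
   <p style="text-align: center;"><img src="https://simg.open-open.com/show/123efdef9421be8322837735419e2305.png"></p>
   <p>Spectrum: the upper plot is the spectrum of the original audio; the lower plot is the spectrum after using setPitch(0.1) to set the pitch to 10% of the original.</p>
   <p style="text-align: center;"><img src="https://simg.open-open.com/show/b16b5cccf7ae642f12e1585995decb06.png"></p>
   <p>Waveform: the upper plot is the original waveform; the lower plot is the waveform after using setRateChange(-50.0) to reduce the speed by 50%.</p>
   <h2><strong>4. Summary</strong></h2>
   <p>This article combines FFmpeg and SoundTouch to extract the audio from a video, change its pitch and speed, and save the result as a WAV file. Together with the earlier articles in this series, this makes it straightforward to implement playback with changed pitch and speed.</p>
   <p>Source: http://www.cnblogs.com/wangguchangqing/p/6003087.html</p>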
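   <p>As a supplement to sections 2.2 and 3, the following is a minimal sketch of two steps described above: packing little-endian bytes into 16-bit samples, and relating the percent and semitone setters to plain ratios. The helper names bytes_to_s16le, percent_to_ratio and semitones_to_ratio are hypothetical and belong to neither FFmpeg nor SoundTouch. The ratio formulas reflect how the SoundTouch setters are documented (a percent change relative to 1.0, and 12 semitones per octave), and should be read as an illustration, not as the library's own code.</p>

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: combine interleaved little-endian bytes into signed
// 16-bit samples. byte_count is the number of bytes in `bytes` and must be even.
// The result holds byte_count / 2 values (all channels, still interleaved).
std::vector<int16_t> bytes_to_s16le(const uint8_t *bytes, std::size_t byte_count)
{
    assert(byte_count % 2 == 0);
    std::vector<int16_t> samples(byte_count / 2);
    for (std::size_t i = 0; i < samples.size(); ++i)
    {
        const uint16_t lo = bytes[2 * i];      // low byte comes first
        const uint16_t hi = bytes[2 * i + 1];  // high byte comes second
        samples[i] = static_cast<int16_t>(static_cast<uint16_t>(lo | (hi << 8)));
    }
    return samples;
}

// Hypothetical helper: ratio equivalent to a percent change,
// e.g. -50 % corresponds to a ratio of 0.5 (assumed mapping: 1 + percent / 100).
double percent_to_ratio(double percent)
{
    return 1.0 + percent / 100.0;
}

// Hypothetical helper: pitch ratio for a shift given in semitones
// (12 semitones make one octave, and one octave doubles the pitch).
double semitones_to_ratio(double semitones)
{
    return std::pow(2.0, semitones / 12.0);
}
```

   <p>For example, percent_to_ratio(-50.0) yields 0.5, which matches the commented-out setRate(0.5) next to setRateChange(-50.0) in the usage example. Iterating over byte_count / 2 values keeps the conversion loop inside the decoded data.</p>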