By /robex/, April 2018.
coub.com is a website for sharing "GIFs with sound", as they call them, even though in reality they're mp4 videos paired with mp3 audio. youtube-dl's supported sites list does include it, but this is what we get upon running it:
$ youtube-dl https://coub.com/view/11bxoz
[Coub] 11bxoz: Downloading JSON metadata
[download] Destination: coub.mp4
[download] 100% of 1.15MiB in 00:00
Looks good so far, right? Let's play it:
$ vlc coub.mp4
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7fdf78c4cd60] moov atom not found
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7fdf4cc01c60] moov atom not found
[00007fdf4cc017e8] avformat demux error: Could not open coub.mp4: Unknown error 1094995529
[00007fdf4cc017e8] ps demux error: cannot peek
[00007fdf780036b8] core input error: no suitable demux module for coub.mp4'
So... it looks like the demuxer can't find the moov atom (aka movie box, basically the metadata for the mp4). Downloading the video with the network analyzer in Firefox's developer tools gives the same result. Basically, the site is serving a corrupt file. Then how the hell is it playing on their site? Well, here's the culprit, embedded in their JavaScript:
MediaElementStrategy.prototype.decode = function(buf) {
  var x;
  x = new Uint16Array(buf, 0, 2);
  if (x[0] === 19392) {
    return x[0] = 0;
  }
};
This reads the first two bytes as a 16-bit integer and checks whether it equals 19392 (0x4BC0, which is stored little-endian as the bytes C0 4B). If it does, it overwrites them in place with 00 00. How ingenious, huh? Let's make sure this is true by checking our downloaded file:
$ xxd -l 32 coub.mp4
00000000: c04b 0018 6674 7970 6973 6f35 0000 0001  .K..ftypiso5....
00000010: 6973 6f35 6461 7368 0000 0048 6672 6565  iso5dash...Hfree
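To double-check the arithmetic behind that JavaScript comparison, here is a minimal sketch that reads the first two bytes of a file the way the Uint16Array does on a little-endian machine. The function name first_u16_le is made up for this guide; only od and shell arithmetic are used.

```shell
# first_u16_le is a hypothetical helper name, not part of any tool.
# It prints the first two bytes of a file as one little-endian
# 16-bit unsigned integer (low byte first, high byte second).
first_u16_le() {
    # od prints the two bytes as unsigned decimals, e.g. " 192  75"
    bytes=$(od -An -tu1 -N2 "$1") || return 1
    # unquoted on purpose: split the od output into $1 and $2
    set -- $bytes
    # bail out if the file had fewer than two bytes
    [ "$#" -eq 2 ] || return 1
    echo $(( $1 + 256 * $2 ))
}
```

Running first_u16_le coub.mp4 on the file dumped above should print 19392 (0xC0 is 192, 0x4B is 75, and 192 + 256 * 75 = 19392), which is exactly the value the site's script tests for.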
Sure enough, the bytes are there, sitting where the four-byte size of the ftyp box (00 00 00 18) should begin. Let's change them to the proper value. You can do this with a hex editor, but since we'll make a script later to automate the whole process, we will go with a more portable approach:
printf '\x00\x00' | dd of="coub.mp4" bs=1 count=2 conv=notrunc
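If you want the patch to be safe to run more than once, a guarded variant could look like the sketch below. The name fix_coub_header is hypothetical, and the sketch assumes the marker is always the two bytes c0 4b seen in this file. It uses octal escapes because not every printf understands \x.

```shell
# fix_coub_header is a hypothetical helper: it zeroes the first two
# bytes only when they are the c0 4b marker, so an already fixed
# (or unrelated) file is left untouched.
fix_coub_header() {
    # read the first two bytes as hex and strip od's padding
    magic=$(od -An -tx1 -N2 "$1" | tr -d ' \n') || return 1
    if [ "$magic" = "c04b" ]; then
        # same dd invocation as above, with octal escapes for the zeros
        printf '\000\000' | dd of="$1" bs=1 count=2 conv=notrunc 2>/dev/null
    else
        echo "$1: marker not found, leaving file unchanged" >&2
        return 1
    fi
}
```

The non-zero return status when the marker is missing lets a calling script tell "patched" apart from "nothing to do".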
The conv=notrunc option just stops dd from truncating the file after writing the first two bytes. Now the file plays fine, but coub stores the audio stream separately in an mp3 file. So how are we gonna download the audio? Well, thankfully youtube-dl lists audio-only formats when run with the -F option:
$ youtube-dl -F https://coub.com/view/11bxoz
[Coub] 11bxoz: Downloading JSON metadata
[info] Available formats for 11bxoz:
format code       extension  resolution  note
html5-audio-med   mp3        audio only  888.98KiB
html5-audio-high  mp3        audio only  1.16MiB
html5-video-med   mp4        unknown     420.57KiB
html5-video-high  mp4        unknown     1.15MiB (best)
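Picking the audio format out of that listing can be scripted too. The sketch below assumes the listing is ordered from lower to higher quality, as it is in the output above, and simply keeps the last "audio only" row. pick_audio_format is a made-up helper name, and the coub.mp3 output name is chosen to match the filenames used in this guide.

```shell
# pick_audio_format is a hypothetical helper: it reads a
# `youtube-dl -F` listing on stdin and prints the format code
# (first column) of the last row marked "audio only".
pick_audio_format() {
    awk '/audio only/ { code = $1 } END { if (code != "") print code }'
}

# Usage sketch (needs network access):
#   fmt=$(youtube-dl -F https://coub.com/view/11bxoz | pick_audio_format)
#   youtube-dl -f "$fmt" -o coub.mp3 https://coub.com/view/11bxoz
```

For the listing above this selects html5-audio-high. Passing that code to -f directly works just as well if you already know it.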
Passing one of those audio format codes to -f (for example -f html5-audio-high) downloads the mp3. At this point we have both working video and audio, so all that is left is merging them with ffmpeg. However, we have to make a decision here. On the website, the audio is sometimes longer than the video, which keeps looping until the audio finishes. We can do two things: trim the audio when the video finishes, or loop the video until the audio finishes. For this guide I'll go with the former approach.
Here is the command I used. The audio is encoded with lame, but you can of course use any other codec you want:
ffmpeg -i coub.mp4 -i coub.mp3 -shortest -c:a libmp3lame convcoub.mp4
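For completeness, the other option mentioned above (looping the video until the audio finishes) might be sketched as follows. This is not the approach used in this guide, merge_looped is a hypothetical name, and how precisely -shortest cuts the end can vary between ffmpeg versions, so treat it as a starting point.

```shell
# merge_looped is a hypothetical helper sketching the "loop the video"
# alternative: the video input repeats until the audio runs out.
merge_looped() {
    video=$1
    audio=$2
    out=$3
    # fail early if either input file is missing
    if [ ! -f "$video" ] || [ ! -f "$audio" ]; then
        echo "merge_looped: missing input file" >&2
        return 1
    fi
    # -stream_loop -1 loops the input that follows it indefinitely;
    # -shortest then ends the output when the finite audio stream ends.
    # -map picks the video from input 0 and the audio from input 1.
    ffmpeg -stream_loop -1 -i "$video" -i "$audio" \
        -map 0:v:0 -map 1:a:0 -shortest -c:a libmp3lame "$out"
}
```

It would be called as merge_looped coub.mp4 coub.mp3 loopcoub.mp4, where loopcoub.mp4 is just an example output name.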
Up until this point I've been using coub.mp4 as the filename, but here is a script that automates the whole process, keeping the original filename as well:
#!/bin/bash
# get video name, and remove newline
filevid=$(youtube-dl "$1" --get-filename | tr -d '\n')
# get audio name
fileaud=${filevid/.mp4/.mp3}
# download the video (default format) and the high quality audio
youtube-dl "$1" > /dev/null
youtube-dl -f html5-audio-high "$1" > /dev/null
# overwrite bad bytes
printf '\x00\x00' | dd of="$filevid" bs=1 count=2 conv=notrunc 2>/dev/null 1>/dev/null
# merge video and audio, stopping at the end of the shorter stream
ffmpeg -loglevel fatal -i "$filevid" -i "$fileaud" -shortest -c:a libmp3lame "out$filevid"
# replace the downloaded video with the merged file and clean up
mv "out$filevid" "$filevid"
rm "$fileaud"
The script runs silently because all output is redirected to /dev/null. Just remove the redirections and the -loglevel fatal option if you want to see the progress.
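One detail in the script worth knowing: ${filevid/.mp4/.mp3} replaces the first ".mp4" found anywhere in the name, so a video whose title itself contains ".mp4" would end up with the wrong audio filename. A suffix-only replacement avoids that. The sketch below wraps it in a made-up helper called audio_name.

```shell
# audio_name is a hypothetical helper: it swaps a trailing .mp4 for
# .mp3 and leaves any other ".mp4" inside the name alone.
audio_name() {
    # ${1%.mp4} strips the suffix only if the name ends with .mp4
    printf '%s\n' "${1%.mp4}.mp3"
}
```

In the script this would amount to writing fileaud=${filevid%.mp4}.mp3 instead of the substitution form.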