Download coub.com videos using youtube-dl and ffmpeg

By /robex/, April 2018.

The problem

coub.com is a website for sharing "GIFs with sound", as they call them, even though in reality they're mp4 videos with separate mp3 audio tracks. youtube-dl's supported sites list does include it, but this is what we get upon running it:

$ youtube-dl https://coub.com/view/11bxoz
[Coub] 11bxoz: Downloading JSON metadata
[download] Destination: coub.mp4
[download] 100% of 1.15MiB in 00:00

Looks good so far, right? Let's play it:

$ vlc coub.mp4 
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7fdf78c4cd60] moov atom not found
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7fdf4cc01c60] moov atom not found
[00007fdf4cc017e8] avformat demux error: Could not open coub.mp4: Unknown error 1094995529
[00007fdf4cc017e8] ps demux error: cannot peek
[00007fdf780036b8] core input error: no suitable demux module for coub.mp4'

The reason

So... it looks like the moov atom (aka movie box, basically the metadata for the mp4) is broken. Downloading the video with the network analyzer in Firefox's developer tools gives the same result. Basically, the site is serving a corrupt file. Then how the hell is it playing on their site? Well, here's the culprit, embedded in their JavaScript:

MediaElementStrategy.prototype.decode = function(buf) {
	var x;
	x = new Uint16Array(buf, 0, 2);
	if (x[0] === 19392) {
		return x[0] = 0;
	}
};

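This reads the first 16-bit value of the buffer and compares it with 19392 (0x4BC0), which on a little-endian machine means the first two bytes of the file are C0 4B. If they match, it overwrites them in place with 00 00. How ingenious, huh? Let's make sure this is true by checking our downloaded file: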

$ xxd -l 32 coub.mp4
00000000: c04b 0018 6674 7970 6973 6f35 0000 0001  .K..ftypiso5....
00000010: 6973 6f35 6461 7368 0000 0048 6672 6565  iso5dash...Hfree
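
To see why the decimal 19392 in the JavaScript shows up as c0 4b in the dump, here is a small sketch of the arithmetic. It assumes a little-endian machine (typed arrays use the platform's byte order), and u16_le_bytes is just a helper name made up for this example:

```shell
# Print the two bytes of a 16-bit value in little-endian order (low byte first).
# u16_le_bytes is a hypothetical helper, not part of any tool.
u16_le_bytes() {
    printf '%02x %02x\n' $(( $1 & 0xff )) $(( ($1 >> 8) & 0xff ))
}

printf '%04x\n' 19392   # the constant from the JavaScript, in hex: 4bc0
u16_le_bytes 19392      # the order the bytes appear in the file: c0 4b
```

The low byte (c0) comes first in the file, which matches the first column of the xxd output.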

The solution

Sure enough, the bytes are there. Let's change them to the proper value. You can do this with a hex editor, but since we'll make a script later to automate the whole process, we will go with a more portable approach:

printf '\000\000' | dd of="coub.mp4" bs=1 count=2 conv=notrunc
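
If you want to be a bit more careful, the same patch can be guarded so it only touches files that really start with c0 4b. This is a minimal sketch, not something the site or youtube-dl provides; fix_coub_header is a name I made up for the example:

```shell
# Hypothetical helper: zero the first two bytes only if they are the c0 4b marker.
# Usage: fix_coub_header FILE
fix_coub_header() {
    file=$1
    # Read the first two bytes as a hex string, e.g. "c04b"
    magic=$(dd if="$file" bs=1 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n')
    if [ "$magic" = "c04b" ]; then
        # Same overwrite as above; conv=notrunc keeps the rest of the file
        printf '\000\000' | dd of="$file" bs=1 count=2 conv=notrunc 2>/dev/null
    fi
}
```

Running fix_coub_header coub.mp4 twice is harmless, since the second run no longer finds the marker and leaves the file alone.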

The conv=notrunc option just stops dd from truncating the file after writing the first two bytes. Now the file plays fine, but coub stores the audio stream separately in an mp3 file. So how are we gonna download the audio? Well, thankfully youtube-dl also exposes the audio as separate formats, which we can list with the -F option:

$ youtube-dl -F https://coub.com/view/11bxoz
[Coub] 11bxoz: Downloading JSON metadata
[info] Available formats for 11bxoz:
format code       extension  resolution note
html5-audio-med   mp3        audio only 888.98KiB
html5-audio-high  mp3        audio only 1.16MiB
html5-video-med   mp4        unknown    420.57KiB
html5-video-high  mp4        unknown    1.15MiB (best)
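
Picking one of those format codes with -f downloads just that stream. Here is a small sketch wrapping it in a function; fetch_coub_audio and the default coub.mp3 output name are assumptions made for this example, while -f and -o are regular youtube-dl options:

```shell
# Hypothetical helper: download only the high-quality audio track of a coub.
# Usage: fetch_coub_audio URL [OUTPUT]
fetch_coub_audio() {
    # -f selects a format code from the -F listing; -o sets the output filename
    youtube-dl -f html5-audio-high -o "${2:-coub.mp3}" "$1"
}
```

For example, fetch_coub_audio https://coub.com/view/11bxoz should leave the audio in coub.mp3, next to the video.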

At this point we have both working video and audio, so all that is left is merging them with ffmpeg. However, we have to make a decision here. On the website, the audio is sometimes longer than the video, and the video keeps looping until the audio finishes. We can do two things: trim the audio when the video finishes, or loop the video until the audio finishes. For this guide I'll go with the former approach.

Here is the command I used. The audio is encoded with LAME (libmp3lame), but you can of course use any other codec you want:

ffmpeg -i coub.mp4 -i coub.mp3 -shortest -c:a libmp3lame convcoub.mp4
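
For the other approach (looping the video until the audio finishes), a possible variant is to add ffmpeg's -stream_loop input option to the same command. I haven't used this in the guide, so treat it as a sketch; merge_coub_looped is a made-up name, and the result may depend on your ffmpeg version:

```shell
# Hypothetical helper: loop the video until the audio track ends.
# Usage: merge_coub_looped VIDEO AUDIO OUTPUT
merge_coub_looped() {
    # -stream_loop -1 applies to the input that follows it (the video) and
    # repeats it indefinitely; -shortest then ends the output with the audio.
    ffmpeg -stream_loop -1 -i "$1" -i "$2" -shortest -c:a libmp3lame "$3"
}
```

Something like merge_coub_looped coub.mp4 coub.mp3 loopcoub.mp4 would do it. Note that the output always ends with the audio here, so if the audio happens to be shorter than the video, the video gets cut instead.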

Up until this point I've been using coub.mp4 as the filename, but here is a script that automates the whole process, keeping the original filename as well:

#!/bin/bash

# get video name, and remove newline
filevid=$(youtube-dl --get-filename "$1" | tr -d '\n')
# get audio name (same name, with the .mp4 extension swapped for .mp3)
fileaud="${filevid%.mp4}.mp3"

youtube-dl "$1" > /dev/null
youtube-dl -f html5-audio-high "$1" > /dev/null

# overwrite bad bytes
printf '\000\000' | dd of="$filevid" bs=1 count=2 conv=notrunc > /dev/null 2>&1

ffmpeg -loglevel fatal -i "$filevid" -i "$fileaud" -shortest -c:a libmp3lame "out$filevid"
# filenames are quoted because coub titles usually contain spaces
mv "out$filevid" "$filevid"
rm "$fileaud"

This makes the whole thing silent, since I'm redirecting the output. Just remove the redirections to /dev/null and the -loglevel fatal option if you want to see the progress.

/robex/ - Last edited: 2024-11-16 15:22:04