FFOPTS and default behavior doesn't re-encode.

README: removing xxd references.
Remove xxd dep & '\'. Change fun. names. </d/n in while read block. Comments.
2022-01-26 20:44:45 +01:00 · 2022-01-26 13:12:37 +01:00 · 2022-01-26 12:27:41 +01:00 · 2022-01-17 12:08:29 +01:00 · 2021-04-30 01:22:30 +02:00 · 2021-04-30 01:21:23 +02:00
2 changed files with 317 additions and 140 deletions
--- a/README.md
+++ b/README.md
@@ -1,45 +1,89 @@
+# Get Tracks
+
+`get-tracks.sh` allows you to easily extract music tracks from a file with multiple songs, like a CD-in-a-file.
+
 # Required applications

-* soxi
-* ffmpeg
+`get-tracks.sh` is a simple shell script based around `ffmpeg`, using only POSIX tools.
+
+It was tested on both OpenBSD and Linux (Ubuntu, Alpine).

 # Usage

+Get an album as a single audio file, write the starting time of each track in a text file, and you're good to go.
+
 ```Bash
-get-tracks.sh rip audio-file time-file
+get-tracks.sh <audio-file> <time-file>
+
+# Example
+get-tracks.sh doom-eternal.opus doom-eternal.txt
 ```


-`audio-file` can be in any format understood by `soxi` and `ffmpeg`
+*audio-file* can be in any format understood by `ffmpeg`.

-In the `time-file`:
+The *time-file* must have this format:

 ```
-0:00 First song
-2:20 Second song
+0:00 intro
+2:20 First song
 3:18 Awesome song
 ```

 # Environment variables

-* SIMULATION [0|1]: do not invoke ffmpeg
-* FORMAT [mp3,ogg,opus,…]: for the song file format
-* WITH_NUMBER [separator]: if not null, write song number with this separator\
-  example WITH_NUMBER=_, song names will be *1_song.opus*, *2_song.opus*, (...)
-* QUIET: if set to any value, ffmpeg commands are not displayed
+The behavior of the script can be changed by several environment variables.

-# Limitations
+* **SIMULATION** [empty or not]\
+    do not invoke ffmpeg

-This script doesn't work with UTF-8 content.
-If your file with timings contains UTF-8 characters, apply this:
+* **FORMAT** [mp3,ogg,opus,…]\
+    see the ffmpeg documentation for the output formats available
+* **FFOPTS** [any ffmpeg options] *(default: '-c:a copy')*\
+    ffmpeg options, can be used to change audio quality\
+    changing it can be **required** when input and output file formats differ\
+    see the ffmpeg documentation for available parameters

-```
-iconv -f utf-8 -t ascii timings-file > timings-file-fixed
+* **NONUMBER** [empty or 1]\
+    do not write song numbers
+* **SEPARATOR** [separator] *(default: ' - ')*\
+    separator between number and name\
+    example with SEPARATOR='_': 01_intro.opus 02_blah.opus…
+
+* **HEADERS** [empty or 1]\
+    print environment parameters (verbosity, simulation, etc.)
+* **VERBOSITY** [0-2] *(default: 1)*\
+    0: no output except errors from ffmpeg\
+    1: simple indications on the current track being extracted\
+    2: print actual ffmpeg commands the script currently runs
+
+# Different input and output file formats
+
+To change the file format, say from `flac` to `opus`, you need to override the default ffmpeg options provided by `get-tracks.sh`.
+This is done through the `FFOPTS` environment variable, which **must not be empty**: an empty value leaves the default `get-tracks.sh` behavior (`-c:a copy`) in place.
+Without a codec option, `ffmpeg` re-encodes by itself, so a single space is enough.
+
+```Bash
+FORMAT=opus FFOPTS=" " get-tracks.sh cd.flac cd.txt
 ```

-Then verify the content.
+### Warning: sometimes you don't even need to

+You may encounter files in a container format like `webm` that you want to extract as `opus` files.
+But, inside the `webm` format, you **may** have `opus`-encoded audio.
+In these cases, no re-encoding is necessary, and you can do something like:
+
+```Bash
+FORMAT=opus get-tracks.sh cd.webm cd.txt
+```
+
+You'll get a warning mentioning FFOPTS (because the formats differ).
+But the generated audio files won't have any quality loss.
+This sometimes happens with files downloaded by the `youtube-dl` utility.
+
+If `ffmpeg` reports actual errors and no output audio files are produced, the contained audio was not in the right format.
+You'll have to re-encode.

 # More

-Run `get-track.sh` without arguments.
+You can get some help by running `get-tracks.sh` without arguments.
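
Note on the `FFOPTS` rule documented in this README: the script only falls back to `-c:a copy` when `FFOPTS` is empty, which is why a value of a single space is enough to disable the default. Below is a minimal sketch of that rule. `resolve_ffopts` is a hypothetical helper written for illustration (it does not exist in `get-tracks.sh`), and `-b:a 96k` is only an example of an ffmpeg option one might pass.

```sh
#!/bin/sh
# Hypothetical helper (not part of get-tracks.sh): prints the ffmpeg
# options that would be used for a given FFOPTS value.
resolve_ffopts() {
	if [ "$1" != "" ]; then
		# Any non-empty value, even a single space, replaces the default.
		printf '%s\n' "$1"
	else
		# Empty or unset: stream copy, no re-encoding.
		printf '%s\n' "-c:a copy"
	fi
}

resolve_ffopts ""          # default: -c:a copy
resolve_ffopts " "         # a single space: default dropped, ffmpeg re-encodes
resolve_ffopts "-b:a 96k"  # explicit options replace the default
```

The single-space convention keeps one variable for both "use my options" and "use no options", at the cost of being easy to miss, which is presumably why the script prints a warning when the formats differ.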
--- a/get-tracks.sh
+++ b/get-tracks.sh
@@ -1,158 +1,291 @@
 #!/usr/bin/env sh

-get_time(){
-	echo "$*" | sed "s/[ \t].*//"
+# From a single byte in hexadecimal per line to lines ending with 0a
+# (hex for '\n'). Ex: 61 62 63 0a
+# Required to easily match (and remove) multi-byte characters.
+regroup_lines() awk '
+	BEGIN {
+		line_start=1
+	}
+
+	{
+		if (line_start == 1)
+			line = $1;
+		else
+			line = line " " $1;
+
+		line_start = 0;
+		if ($1 == "0a") {
+			print line;
+			line_start = 1
+		}
+	}
+
+	END {
+		if (line_start == 0)
+			print line
+	}
+	'
+
+# From ’ to '
+simple_quote()                sed "s/e2 80 99/27/g"
+
+# From / to '-'
+replace_slashes()             sed "s/2f/2d/g"
+
+remove_backslashes()          sed "s/5c//g"
+
+remove_multibyte_characters() sed "s/e2 80 .. //g"
+
+uppercase()                   tr "[a-z]" "[A-Z]"
+
+# One column decimal to plain text.
+from_dec()                    awk '{ printf ("%c", $1 + 0) }'
+
+# Replace spaces by line returns, outputs a single column.
+spaces_to_line_returns()      tr " " "\n"
+
+# Convert input into hexadecimal and a single byte per line.
+to_hex_one_column() { od -An -tx1 | awk '{for(i=1;i<=NF;i++){ print $i }}'; }
+
+# One column hexa to one column decimal.
+hex_to_dec() { { echo "obase=10;ibase=16;" ; cat ; } | bc ; }
+
+# Reverse hexadecimal (with space separators) to original value.
+from_hex() { spaces_to_line_returns | uppercase | hex_to_dec | from_dec; }
+
+# Remove non ascii, backslashes and invalid filename characters,
+# convert "’" to "'", "/" to "-".
+to_ascii(){
+	to_hex_one_column | # Input to hexadecimal, 1-byte representation per line.
+		regroup_lines | # From 1-byte to x-byte lines with space separators.
+		simple_quote |  # From "’" to "'".
+		replace_slashes | # From / to '-'.
+		remove_multibyte_characters | # Remove non ascii values.
+		remove_backslashes | # Can mess with the script.
+		from_hex # Convert back from hex (x-byte per line, space separator).
 }

-get_title(){
-	echo "$*" | cut -d ' ' -f 2-
-}
+comp_end_of_tracks() awk -v NONUMBER="$NONUMBER" -v SEPARATOR="$SEPARATOR" '
+	BEGIN {
+		OFS="	"
+	}

-reverse_word_order(){
-	local result=
-	for word in $@; do
-		result="$word $result"
-	done
-	echo "$result" 
-}
+	{
+		if (NR > 1) {
+			print timestamp, $1, title;
+		}

-# bc is mandatory: arythmetic operations are very limited in ash.
-get_seconds(){
-	local n=0
-	local v=0
+		timestamp = $1;

-	values=$(echo "$*" | sed 's/:/\
-/g' | sed "s/^0//")
-	for i in $(reverse_word_order $values); do
-		case $n in
-			0) v=$(echo "$v +         $i " | bc);;
-			1) v=$(echo "$v + (60   * $i)" | bc);;
-			2) v=$(echo "$v + (3600 * $i)" | bc);;
-			*) echo "invalid timecode $*"; exit 1;;
-		esac
+		if (NONUMBER == 1) {
+			title = $2
+		}
+		else {
+			if (NR < 10) {
+				title = "0" NR SEPARATOR $2
+			}
+			else {
+				title = NR SEPARATOR $2
+			}
+		}
+		for (i=3; i <= NF; i++) {
+			title = title " " $i
+		}
+	}

-		n=$((n+1))
-	done
+	END {
+		print timestamp, "END_OF_FILE", title;
+	}
+	'

-	echo $v
-}
+first_column_to_seconds() awk '
+	{
+		# from 10:30 to 630
+		n = split ($1, arr, ":")
+		for (i = 0; i < n; i++) {
+			if (i == 0) {
+				v = arr[n-i];
+			}
+			else if (i == 1) {
+				v += 60 * arr[n-i];
+			}
+			else if (i == 2) {
+				v += 3600 * arr[n-i];
+			}
+		}
+
+		$1 = v;
+		print;
+		v = 0;
+	}
+	'

 # Get a more usable time representation for the beginning and the end of songs.
-get_values(){
-
-	audio_file="$1"
-	time_file="$2"
-	total_length=$(soxi -D "${audio_file}" | sed "s/\..*//") # integer values only
-
-	n=0
-
-	while read X; do
-
-		if [ $n -ne 0 ]; then
-			to=$(get_time $X)
-			to_s=$(get_seconds $to)
-			echo -e "$from_s\t$to_s\t$title"
-		fi
-
-		#echo $X
-		from=$(get_time $X)
-		from_s=$(get_seconds $from)
-		title=$(get_title $X)
-
-		if [ -z "${WITH_NUMBER}" ]; then
-			:
-		else
-			title="${n}${WITH_NUMBER}${title}"
-		fi
-
-		n=$(echo $n + 1 | bc)
-	done < "${time_file}"
-
-	echo -e "$from_s\t$total_length\t$title"
-}
+get_timestamps(){ to_ascii | first_column_to_seconds | comp_end_of_tracks; }

 run_ffmpeg(){
-	local file=$1
-	local from=$2
-	local duration=$3
-	local title=$4
+	file=$1
+	from=$2
+	to=$3
+	final_title=$4

-	if [ "${SIMULATION}" = 1 ]; then
-		[ -z "${QUIET}" ] && echo "ffmpeg -loglevel error -ss '$from' -t '$duration' -i '${file}' '${title}'"
-	else
-		[ -z "${QUIET}" ] && echo "ffmpeg -loglevel error -ss '$from' -t '$duration' -i '${file}' '${title}'"
-		$(< /dev/null ffmpeg -loglevel quiet -ss "$from" -t "$duration" -i "${file}" "${title}")
+	LOG_LEVEL="-loglevel error"
+	FROM="-ss $from"
+	TO=""
+	if [ "$to" != "" ]; then
+		TO="-to $to"
+	fi
+	INPUT_FILE="$file"
+	OUTPUT_FILE="$final_title"
+
+	case "v$VERBOSITY" in
+		v0)
+			;;
+		v1)
+			echo "extracting '$final_title'"
+			;;
+		v2)
+			echo "ffmpeg $LOG_LEVEL $FROM $TO -i $INPUT_FILE $FFOPTS '$OUTPUT_FILE'"
+			;;
+		*)
+			echo "verbosity is not set properly" >&2
+			exit 1
+			;;
+	esac
+
+	if [ "$SIMULATION" = "" ]; then
+		ffmpeg $LOG_LEVEL $FROM $TO -i "$INPUT_FILE" $FFOPTS "$OUTPUT_FILE"
 	fi
 }

-rip(){
-	n=0
-	from=0
-	to=0
-
+extraction(){
 	audio_file="$1"
 	time_file="$2"

-	[ "$FORMAT" = "" ] && echo "default format: opus" && FORMAT="opus"
+	get_timestamps < "$time_file" | while read LINE; do
+		track_start=$(echo $LINE | cut -d ' ' -f 1)
+		track_end=$(echo $LINE | cut -d ' ' -f 2)
+		track_title=$(echo $LINE | cut -d ' ' -f 3-)

-	#echo "from	to	duration	title"
-	get_values "$audio_file" "$time_file" | while read LINE; do
-		from=$(echo $LINE | cut -d ' ' -f 1)
-		to=$(echo $LINE | cut -d ' ' -f 2)
-		title=$(echo $LINE | cut -d ' ' -f 3-)
-		duration=$(echo "$to - $from" | bc)
+		if [ "$track_end" = "END_OF_FILE" ]; then
+			track_end=""
+		fi

-		run_ffmpeg "${audio_file}" "${from}" "${duration}" "${title}.${FORMAT}"
-		n=$((n + 1))
+		# Input is /dev/null, otherwise subshells will take the output
+		# of "get_timestamps" as input.
+		# Be careful: "while read X" is a dangerous shell design.
+		< /dev/null run_ffmpeg "${audio_file}" \
+			"${track_start}" "${track_end}" \
+			"${track_title}.${FORMAT}"
 	done
 }

 usage(){
-	echo "usage: $0 command"
-	echo "command: show <single-file-playlist> <song-list>"
-	echo "command: rip  <single-file-playlist> <song-list>"
-	echo
-	echo "song-list line format example: 1:30 My second track of the playlist"
-	echo "show output format: start end title"
-	echo
-	echo "envvar: SIMULATION [0|1] (do not invoke ffmpeg)"
-	echo "envvar: FORMAT [mp3,ogg,opus,…] (see the ffmpeg documentation)"
-	echo "envvar: WITH_NUMBER [separator] (not null = write song number, with this separator)"
-	echo "        example: WITH_NUMBER=_ Song names will be 1_song.opus 2_song.opus…"
-	echo "envvar: QUIET (if set to any value, ffmpeg commands are not displayed)"
+	cat <<END
+Get tracks:
+usage: $0 <single-file-playlist> <song-list>
+
+Debug mode (displays starting and ending times for each song):
+usage: $0 <song-list>
+
+
+Format for <song-list>:
+  0:00 First track
+  1:30 Second track
+
+Environment variables:
+- SIMULATION [empty or not]         do not invoke ffmpeg
+
+- FORMAT [mp3,ogg,opus,…]           see ffmpeg documentation
+- FFOPTS (default: '-c:a copy')     see ffmpeg documentation
+
+- NONUMBER [empty or 1]             do not write song numbers
+- SEPARATOR [separator] (default: ' - ')
+    separator between number and name
+    example with SEPARATOR='_': 01_intro.opus 02_blah.opus…
+
+- HEADERS [empty or 1]             print env params (verbosity, quality, etc.)
+- VERBOSITY [0-2] (default: 1)
+    0: no output except errors from ffmpeg
+    1: simple indications on the current track being extracted
+    2: print actual ffmpeg commands the script currently runs
+END
 }

-if [ $# -lt 1 ]; then
-	usage
-	exit 0
+header(){
+	if [ "$HEADERS" = "1" ]; then
+		echo "$*"
+	fi
+}
+
+warning(){
+	echo "WARNING: $*"
+}
+
+# Default output format is based on the extension of the input audio file.
+if [ $# -eq 2 ]; then
+	DEFAULT_FORMAT="$(echo "$1" | awk -F . '{print $NF}')"
+else
+	header "no default FORMAT selected"
 fi

-command=$1
-shift
+if [ "$FORMAT" = "" ]; then
+	FORMAT="$DEFAULT_FORMAT"
+	header "default FORMAT: ${FORMAT}"
+else
+	header "FORMAT: $FORMAT"
+fi

-case "x-${command}" in
-	x-show)
+# For inexperienced users, print a warning when input and output formats differ.
+# In case FFOPTS is set, encoding is expected to be handled, drop the warning.
+# Example (remove the get-tracks.sh default behavior, perform re-encoding):
+#   FFOPTS=" "
+if [ "$FFOPTS" = "" ] && [ "$FORMAT" != "$DEFAULT_FORMAT" ]; then
+	warning "input and output formats seem to differ"
+	warning "1. re-encoding may be required (through the FFOPTS envvar)"
+	warning "2. FFOPTS represents ffmpeg options, directly given to ffmpeg"
+	warning '   (default: "-c:a copy" = copy without re-encoding)'
+	warning '   You can put FFOPTS=" " if you want to perform re-encoding.'
+fi

-		# Takes the audio file in first parameter
-		if [ $# -ne 2 ]; then
-			echo "Usage: $0 show music-file time-stamps-file"
-			exit 1
-		fi
+if [ "$VERBOSITY" = "" ]; then
+	header "default VERBOSITY: 1"
+	VERBOSITY=1
+else
+	header "VERBOSITY level: $VERBOSITY"
+fi

-		get_values "$1" "$2"
-		;;
+if [ "$NONUMBER" = "" ]; then
+	header "default NONUMBER: disabled"
+	NONUMBER=0

-	x-rip)
+	# Assume that there should be a separator.
+	if [ "$SEPARATOR" = "" ]; then
+		header "default SEPARATOR: ' - '"
+		SEPARATOR=" - "
+	else
+		header "SEPARATOR: '$SEPARATOR'"
+	fi
+else
+	header "NONUMBER: won't prefix tracks"
+	SEPARATOR=""
+fi

-		# Takes the audio file in first parameter
-		if [ $# -ne 2 ]; then
-			echo "Usage: $0 show music-file time-stamps-file"
-			exit 1
-		fi
+if [ "$FFOPTS" != "" ]; then
+	header "FFOPTS envvar is set: ${FFOPTS}."
+else
+	FFOPTS="-c:a copy"
+	header "default FFOPTS: ${FFOPTS}"
+fi

-		rip "$1" "$2"
-		;;
-	*)
-		usage 1>&2
-		exit 1
+if [ "$SIMULATION" != "" ]; then
+	header "SIMULATION envvar is set: this is a simulation."
+fi
+
+case $# in
+	0) usage; exit 0;;
+	1) get_timestamps < "$1";;
+	2) extraction "$1" "$2";;
+	*) usage 1>&2; exit 1;;
 esac
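
For readers of this diff, the timestamp handling (`first_column_to_seconds` followed by `comp_end_of_tracks`) can be approximated in a single awk pass. The sketch below is an illustration under stated assumptions, not the script's code: `track_ranges` is a hypothetical name, it accumulates `h:m:s` parts left to right instead of indexing from the end, it separates columns with a space rather than a tab, and it skips the numbering and ASCII clean-up steps.

```sh
#!/bin/sh
# Hypothetical sketch (not part of get-tracks.sh): turns "start title"
# lines into "start_seconds end_seconds title" lines.
track_ranges() {
	awk '
	{
		# Convert h:m:s, m:s or s into seconds, left to right.
		n = split($1, part, ":")
		start = 0
		for (i = 1; i <= n; i++)
			start = start * 60 + part[i]

		# The end of the previous track is the start of this one.
		if (NR > 1)
			print prev_start, start, prev_title

		prev_start = start
		$1 = ""            # drop the timestamp column
		sub(/^ +/, "")     # and the separator left behind
		prev_title = $0
	}
	END {
		# The last track has no known end: it runs to the end of the file.
		if (NR > 0)
			print prev_start, "END_OF_FILE", prev_title
	}
	'
}

printf '0:00 intro\n2:20 First song\n3:18 Awesome song\n' | track_ranges
# 0 140 intro
# 140 198 First song
# 198 END_OF_FILE Awesome song
```

Keeping the `END_OF_FILE` marker for the last track mirrors the diff's approach of omitting `-to` for that track, which is what made the `soxi` duration lookup unnecessary.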
Author	SHA1	Message	Date
Karchnu	13dd38ed56	FFOPTS and default behavior doesn't re-encode.	2022-01-26 20:44:45 +01:00
Karchnu	1bb2b97760	README: removing xxd references.	2022-01-26 13:12:37 +01:00
Karchnu	51538fd5c3	Remove xxd dep & '\'. Change fun. names. </d/n in while read block. Comments.	2022-01-26 12:27:41 +01:00
Karchnu	3f892dc644	Simpler code, README update (objective,usage,envvars) and better usage().	2022-01-17 12:08:29 +01:00
Karchnu	69e4cac809	do not add spaces to slash replacements (s_/_-_ and not s_/_ - _)	2021-04-30 01:22:30 +02:00
Karchnu	e8362b55da	HEADERS: print headers only when requested	2021-04-30 01:21:23 +02:00
Karchnu	f58bfdde9c	Grooming.	2021-04-30 01:13:53 +02:00
Karchnu	95358ff18d	fixes other english mistakes	2021-04-06 23:03:02 +02:00
Karchnu	ce2e72b3db	fixes english errors	2021-04-06 23:00:08 +02:00
Karchnu	6fde82167f	fixes invalid "/" in filenames, fixes handling of filenames with spaces.	2021-04-06 22:55:30 +02:00
Karchnu	f36c7d604e	Code grooming.	2021-04-06 17:24:26 +02:00
Karchnu	390659b000	README: text enhancements.	2021-04-06 16:10:02 +02:00
Karchnu	0e8a90dd81	Comments.	2021-04-06 16:09:15 +02:00
Karchnu	84422ac6ca	VERBOSITY => verbosity (in documentation)	2021-04-06 05:30:01 +02:00
Karchnu	56da871a4f	README: removed iconv references, update envvars, update requirements (awk, xxd).	2021-04-06 05:28:16 +02:00
Karchnu	bd9aa59a26	multi-byte characters removal, from "while read" to awk.	2021-04-06 05:18:19 +02:00
Karchnu	7d073a56fd	Removed soxi requirement.	2021-04-05 02:33:10 +02:00
Karchnu	8d25557220	portability, NONUMBER SEPARATOR and VERBOSITY variables, numbers from 00	2021-04-05 01:41:16 +02:00