Demo Video
1. Creating a function that gets the source code of the web page
We’ll be scraping Wiktionary!
getsource(){
    WORD=leben
    LANG=de # de for German (en for English, etc.)
    URL="https://${LANG}.wiktionary.org/wiki/${WORD}"
    curl "$URL" > url.html # quote the URL so the shell does not split or glob it
}
getsource # calling the function
- We declared a variable called WORD
- We used curl to get the source code of the page and redirected the output to url.html
url.html now contains the raw HTML source of the page.
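The function above hard-codes the word and the language. A minimal sketch of a parameterized variant follows, assuming a hypothetical helper called build_url (not part of the original script) and lowercase variable names, since LANG is also the environment variable that selects the locale. The curl flags -f, -s, -S and -L are standard options, but their use here is a suggestion rather than what the video shows.

```shell
#!/bin/sh
# Sketch: take the word and language as arguments instead of hard-coding them.
# build_url is a hypothetical helper introduced for illustration.
build_url() {
    word=$1
    lang=${2:-en}   # fall back to English when no language code is given
    printf 'https://%s.wiktionary.org/wiki/%s\n' "$lang" "$word"
}

getsource() {
    # -f: fail on HTTP errors, -sS: quiet but still report errors, -L: follow redirects
    curl -fsSL "$(build_url "$1" "$2")"
}

# Only the URL is printed here; the actual download would be:
#   getsource leben de > url.html
build_url leben de
```

Keeping the URL construction separate makes it possible to check the address before any request is sent.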
2. Formatting and cleaning up
Our goal is to get the audio files for the German word leben.
getaudio(){
grep -Eo "//[a-zA-Z0-9./?=_%:-]*\.+(ogg|mp3|flac|aac|wav)" url.html
}
# ogg, mp3, flac, aac, wav are extensions of audio files
Regex Explanation:
- [-] Matches any character within the range
- [a-zA-Z] Matches any alphabetical letter, lower case and upper case
- [0-9] Matches any digit
- . Matches any one character
- + The preceding item must match one or more times
- * The preceding item must match zero or more times
- \ The escape character, used for escaping any of the special characters mentioned previously
- ? The preceding item must match one or zero times
- | Specifies alternation: one item on either side of | should match
- () Treats the enclosed terms as one entity. Example: ma(tri)?x matches max or matrix.
Note that inside [ ] characters such as . and ? lose their special meaning and match themselves.
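To see these rules in action, here is a small sketch that runs the same pattern against a made-up HTML fragment instead of the real page. The fragment and the function name extract_audio_links are assumptions for illustration only.

```shell
#!/bin/sh
# Same pattern as getaudio, but reading from standard input.
extract_audio_links() {
    grep -Eo '//[a-zA-Z0-9./?=_%:-]*\.+(ogg|mp3|flac|aac|wav)'
}

# Made-up HTML fragment, for illustration only.
sample='<source src="//upload.wikimedia.org/wikipedia/commons/3/3f/De-leben.ogg" type="audio/ogg">
<a href="/wiki/leben">leben</a>
<img src="//upload.wikimedia.org/logo.png">'

# Only the first line contains // followed by a path ending in an audio extension.
printf '%s\n' "$sample" | extract_audio_links
# prints //upload.wikimedia.org/wikipedia/commons/3/3f/De-leben.ogg

# The grouping example from the list: (tri)? makes "tri" optional.
printf 'max\nmatrix\nmatx\n' | grep -Ex 'ma(tri)?x'
# prints max and matrix, but not matx
```

The .png link is skipped because its extension is not in the alternation, and the /wiki/ link is skipped because it does not start with two slashes.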
The output gives us the following links:
//upload.wikimedia.org/wikipedia/commons/3/3f/De-leben.ogg
//upload.wikimedia.org/wikipedia/commons/c/c2/De-leben2.ogg
//upload.wikimedia.org/wikipedia/commons/8/8a/De-riskant_leben.ogg
//upload.wikimedia.org/wikipedia/commons/6/6f/De-in_beschr%C3%A4nkten_Verh%C3%A4ltnissen_leben.ogg
All good!
Note: we need to add https: at the beginning of each line!
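One possible way to add the prefix is a sed substitution anchored at the start of each line. This is only a sketch; add_scheme is a hypothetical name, and the two input lines are taken from the output shown earlier.

```shell
#!/bin/sh
# Turn protocol-relative links (//host/path) into full https URLs.
# add_scheme is a hypothetical helper name.
add_scheme() {
    # ^// only matches at the start of a line, so slashes inside the path are untouched
    sed 's|^//|https://|'
}

printf '%s\n' \
    '//upload.wikimedia.org/wikipedia/commons/3/3f/De-leben.ogg' \
    '//upload.wikimedia.org/wikipedia/commons/c/c2/De-leben2.ogg' | add_scheme
```

Using | as the sed delimiter avoids having to escape every slash in the pattern.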
Let’s put it all together
getsource(){
    WORD=leben
    LANG=de # de for German (en for English, etc.)
    URL="https://${LANG}.wiktionary.org/wiki/${WORD}"
    curl "$URL"
}
getaudio(){
    grep -Eo "//[a-zA-Z0-9./?=_%:-]*\.+(ogg|mp3|flac|aac|wav)"
}
getsource | getaudio | xargs -I {} echo "https:{}" > audios_list
# and optionally download them with
# yt-dlp -a audios_list
# or
while read -r line; do
    wget -N "$line" &
done < audios_list
wait # wait for the background downloads to finish
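The download loop can also be wrapped in a function so it is easier to reuse and to dry-run. The sketch below is one way to do that, not the author's method: download_all and the DOWNLOAD_CMD variable are hypothetical names, and DOWNLOAD_CMD defaults to the same wget -N call used in the loop.

```shell
#!/bin/sh
# Sketch: download every URL listed in a file, one background job per URL.
# download_all and DOWNLOAD_CMD are hypothetical names introduced here.
download_all() {
    list=$1
    while IFS= read -r url; do
        [ -n "$url" ] || continue            # skip empty lines
        ${DOWNLOAD_CMD:-wget -N} "$url" &    # left unquoted on purpose so "wget -N" splits into words
    done < "$list"
    wait                                      # block until all background jobs finish
}

# Dry run (prints the URLs instead of fetching them):
#   DOWNLOAD_CMD=echo
#   download_all audios_list
# Real run:
#   download_all audios_list
```

Setting DOWNLOAD_CMD to echo is a cheap way to check the list before any file is fetched.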
For a more practical script, check out this sample on GitHub.
The script used in the video:
For the latest version of the script, check out my GitHub account:
GitHub: https://github.com/AnasBoubechra/Pronounce_this
set -e

lang=
query=
tmpfile=
dmenu=false
fzf=false
download_dir="$HOME/.pt"
version="dev 2.0"
sname="$(basename $0)"

show_help(){
printf "
Usage: $sname [ -q ARG ] [ -l ARG ] [vfdm]
-q   For the search query !
     Example: $sname -q hallo
-l   To add a language code
     Example $sname -l en -q hello
-d   To download the audios and store them locally.
     The default path is ${download_dir}
     * For each query a folder will be created and store all the audios inside it !
     * Support offline usage.
-v   Show version
-m   To use dmenu
-f   To use fzf
"
}

getsource(){
    trap cleanup INT QUIT TERM EXIT
    query=$(printf "$query" | tr '[:upper:]' '[:lower:]')
    tmpfile=`mktemp`
    cleanup(){
        [ -f $tmpfile ] && rm $tmpfile
    }
    curl -s "https://${lang:=en}.wiktionary.org/wiki/${query}" | \
        grep -Eo '//upload[a-zA-Z0-9./?=_%:-]*\.+(ogg|mp3|wav|aac)' | sed 's/\/\//https:&/g' >$tmpfile
}

show_version(){
    printf "$sname Version: %s\n" "$version"
    exit 0
}

check_dep(){
    if ! command -v "$1";then
        printf "$1 is not installed !\n" && exit 127
    fi
}

_main_(){
    if ls -A "$aud_dir" 2> /dev/null;then # check if a dir is not empty instead of existance
        selected=$(ls "$aud_dir" | $1)
        mpv "${aud_dir}/${selected}"
        exit 0
    else
        getsource
        if [ -s $tmpfile ];then
            selected=$(rev $tmpfile | cut -d'/' -f 1 | rev | $1 )
            grep $selected $tmpfile | mpv --playlist=-
            [ $download ] && download_aud
        else
            printf "-> No results found for %s :/\n" "$query" >&2
            exit 0
        fi
    fi
}

download_aud(){
    mkdir -p "${aud_dir}"
    # Buggy
    printf "Downloading the audios ..."
    cat $tmpfile | parallel curl -O --output-dir "${aud_dir}" && printf "Download completed." \
        || (printf "Enable to download the audio files !" && exit 1)
}

while getopts "q:hl:dfmv" OPT; do
    case "$OPT" in
        h) show_help && exit 0 ;;
        q) query=$OPTARG ;;
        l) lang=$OPTARG ;;
        f) fzf=true;;
        d) download=true ;;
        m) dmenu=true ;;
        v) show_version ;;
        *) printf "Wrong usage !\n Show help with: $sname -h\n" >&2 && exit 1;;
    esac
done

# If the query is empty or is not a casual word
if ! expr "$query" : "[a-zA-Z]" 1>/dev/null; then
    printf "Error: The query must be a non empty letter !\n" >&2
    exit 1
fi

[ $dmenu = "true" ] && [ $fzf = "true" ] && \
    printf "Error: -m and -f are mutually exclusive and may only be used once\n" >&2 && exit 2

aud_dir="${download_dir}/${query}"

if $dmenu;then
    check_dep dmenu
    _main_ 'dmenu -l 10'
elif $fzf
then
    check_dep fzf
    _main_ "fzf --reverse --height=40%"
else
    _main_ "head -n 1"
fi
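The script builds its menu entries from the last path component of each URL with rev, cut and rev. As a side note, the same file name can be obtained with POSIX parameter expansion alone; the sketch below assumes a hypothetical helper called url_basename and is not part of the script above.

```shell
#!/bin/sh
# Sketch: extract the file name from a URL without external tools.
# url_basename is a hypothetical helper name.
url_basename() {
    # ${1##*/} removes the longest prefix ending in "/", leaving the last path component
    printf '%s\n' "${1##*/}"
}

url_basename 'https://upload.wikimedia.org/wikipedia/commons/3/3f/De-leben.ogg'
# prints De-leben.ogg
```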
Enjoy 🤓 !