I like to buy things online and compare the offerings of different vendors before any purchase. There are a lot of online services that offer product price comparison. Those services are super useful, but fall short when it comes to alerting me. I signed up for alerts multiple times and never received one, even though my target price had been reached. I will describe how to write a short and simple script that watches a product's price development on a price comparison service and sends a daily report to my email address.
For this example I chose the service idealo to collect the offers of different vendors for me.
Identify product page
Copy the address of the page. In my example case, I'm looking for St. Lucia 1931 Rum 5th Edition. The price history says that this product was offered for roughly between 52€ and 63€ in the recent past. Today, the price is at the top of the price range of course.
Inspect source code
Right-click on the page and view the source code. Search for the latest price shown on the website (in my case 63.5) to find where it is located in the code. I found two occurrences in the code that look like this:
The other occurrence of this pattern just has a different price. The key here is the string "product_price". On another service, this will be different of course. This string is what the script will be looking for.
The basic idea is to use the price summary idealo offers and skim the data from there. This way, they do the hard work of collecting all the data from different shops and I take over where they fail: sending an alert by mail.
This is what the basic script looks like, written in Bash:
#!/bin/bash
# download a copy of the website:
wget https://www.idealo.de/preisvergleich/Typ/786992100743.html
# skim the price data and process it:
cat 786992100743.html | grep "product_price" | grep -oE "[0-9][0-9]\.[0-9]" | uniq
exit
What the script does so far:
- Fetch the website using wget
- Output the file using cat
- Pipe to grep to show only lines that contain "product_price"
- Pipe through another grep that discards everything except a string of two digits, a decimal point and another digit
- Collapse repeated occurrences of this string, since there may be many of them, by piping through uniq at the end
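The filter chain can be tried without touching the server by feeding it a few sample lines. This is only a sketch: the sample lines and the function name extract_prices are made up for illustration, and the real idealo markup will look different. It also swaps uniq for sort -u, because uniq only drops duplicates that sit on adjacent lines.

```shell
#!/bin/bash
# Hypothetical sample input; the real page markup will differ.
sample='"product_price": "63.5", "product_name": "Rum"
"product_price": "63.5"
"other_key": "12.3"
"product_price": "61.9"'

extract_prices() {
    # Keep lines containing the key, pull out every NN.N string,
    # then sort and drop duplicates (adjacent or not).
    grep "product_price" | grep -oE "[0-9][0-9]\.[0-9]" | sort -u
}

# Prints the two distinct prices, 61.9 and 63.5, one per line.
printf '%s\n' "$sample" | extract_prices
```

The line with "other_key" is dropped by the first grep, so its 12.3 never reaches the output.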
When run from the command line, the output is for example:
Now this data can be wrapped in some descriptive text and be sent by mail. A cron job needs to run the script each day. Of course the script could be extended to only send a mail when the price is below my target, but I find that monitoring the development on my own has some benefit.
This is what a more complete script could look like:
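For anyone who prefers the alert-only variant, the comparison against a target price could be sketched as follows. The target value, the function names below_target and lowest_price, and the sample prices are assumptions for illustration. Bash itself only does integer arithmetic, so awk handles the decimal comparison here.

```shell
#!/bin/bash
# Hypothetical target price; pick your own.
target="55.0"

below_target() {
    # Usage: below_target PRICE TARGET
    # Succeeds (exit status 0) when PRICE is lower than TARGET.
    LC_ALL=C awk -v p="$1" -v t="$2" 'BEGIN { exit !(p + 0 < t + 0) }'
}

lowest_price() {
    # Reads one price per line on stdin and prints the smallest one.
    LC_ALL=C sort -n | head -n 1
}

# Made-up prices stand in for the output of the grep pipeline.
lowest=$(printf '63.5\n54.9\n' | lowest_price)
if below_target "$lowest" "$target"; then
    echo "Price alert: $lowest is below $target"
fi
```

In a real script the echo would be replaced by the lines that write and send the mail.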
#!/bin/bash
# create empty document to write the email into
touch mail.txt
# write mail header
echo "Subject: Daily price report" >> mail.txt
echo "" >> mail.txt
# process the data for a single product
product_link="https://www.idealo.de/preisvergleich/Typ/786992100743.html"
product_document="786992100743.html"
echo "Today's price for the Saint Lucia 1931 Rum:" >> mail.txt
wget --random-wait --wait=60 --user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.190 Safari/537.36" "$product_link"
cat "$product_document" | grep "product_price" | grep -oE "[0-9][0-9]\.[0-9]" | uniq >> mail.txt
echo "" >> mail.txt
rm "$product_document"
sleep 60
# repeat above eight lines for another product you monitor
# send the mail using MSMTP and remove the temporary file that contains the mail
cat mail.txt | /usr/bin/msmtp "firstname.lastname@example.org"
rm "mail.txt"
exit
I would run the script from an extra directory where all the downloaded files are collected in order not to create a mess. In the script above I use MSMTP for sending mails since I don't want a huge mail system on my server. Setup is described in my article about the Raspberry Pi installation.
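Instead of copying the per-product lines for every item, the products could be kept in a list and processed in a loop. The following is a rough sketch, not a tested replacement: the second list entry is a placeholder URL, and the function names document_name, report_product and report_all are invented for this example.

```shell
#!/bin/bash
# One "description|URL" pair per line.
# The second entry is a hypothetical placeholder, not a real product page.
products='Saint Lucia 1931 Rum|https://www.idealo.de/preisvergleich/Typ/786992100743.html
Another product|https://www.example.com/products/0000000000000.html'

user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.190 Safari/537.36"

document_name() {
    # By default wget saves a page under the last path component of its URL.
    printf '%s\n' "${1##*/}"
}

report_product() {
    # Usage: report_product DESCRIPTION URL
    local description="$1" product_link="$2" product_document
    product_document=$(document_name "$product_link")
    echo "Today's price for the $description:"
    wget --quiet --user-agent="$user_agent" "$product_link"
    grep "product_price" "$product_document" | grep -oE "[0-9][0-9]\.[0-9]" | uniq
    echo ""
    rm -f "$product_document"
}

report_all() {
    # Read the list line by line and split each line at the "|".
    printf '%s\n' "$products" | while IFS='|' read -r description product_link; do
        report_product "$description" "$product_link"
        sleep 60
    done
}

# Nothing is downloaded until report_all is called, for example:
#   report_all >> mail.txt
```

Adding another product then only means adding one more line to the list.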
One day I noticed that I didn't receive reports anymore. Seems like the idealo server didn't like the way I requested data. I added wait times and changed the user agent to make sure not to get blocked. However, this did not help.
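A failed or blocked download can be made visible in the report itself, so that an empty result does not go unnoticed. A minimal sketch, assuming the same "product_price" pattern as above; the function name price_lines and the warning text are made up for illustration.

```shell
#!/bin/bash
price_lines() {
    # Usage: price_lines FILE
    # Prints the extracted prices, or a warning when the file is missing,
    # empty, or contains no price.
    local file="$1" prices=""
    if [ -s "$file" ]; then
        prices=$(grep "product_price" "$file" | grep -oE "[0-9][0-9]\.[0-9]" | uniq)
    fi
    if [ -n "$prices" ]; then
        printf '%s\n' "$prices"
    else
        echo "WARNING: no price found, the download may have been blocked"
        return 1
    fi
}
```

In the report script, a call like price_lines "$product_document" >> mail.txt would take the place of the cat and grep line.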
Here is some help to correctly set the cron job time and recurrence. Assuming the script resides in a directory called "product_price_monitor" in your home directory and is named "product_price_monitor.sh", the cron job running each day at 09:00 would be:
0 9 * * * ~/product_price_monitor/product_price_monitor.sh >/dev/null 2>&1
Don't forget to make the script executable:
chmod 700 product_price_monitor.sh
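One thing to keep in mind: cron usually starts jobs in the user's home directory, not in the directory of the script, so the downloaded pages and mail.txt would end up there. A small sketch of how the script could change into its own directory first; the variable name script_dir is just an example.

```shell
#!/bin/bash
# Change into the directory the script itself lives in, so that downloads
# and mail.txt end up there no matter where cron starts the script from.
script_dir=$(cd "$(dirname "$0")" && pwd) || exit 1
cd "$script_dir" || exit 1
```

These lines would go right below the shebang line of the report script.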
This is one of the shortest and simplest scripts I have. It could be extended and maybe I will do so one day. I'm not really good at programming and there are probably far more efficient and elegant ways to approach this. Nonetheless, it gets the job done. I like automation to save time and improve things, and this is a good example.
I bought a bottle of the rum a few days later for 56€, which means 7.5€ or roughly 12% saved. Cheers!
Meanwhile, this method of skimming the latest price from idealo has stopped working. They implemented some countermeasures and now reject any attempt to fetch a web page with wget. So they are not only incompetent at implementing a price alert on their own, but also uncooperative when someone works around their ineptitude.