Spotify Data ETL with Python
Wed 03 2021 #datascience
Based on Karolina Sowinska tutorial
Link to repo. You can clone or fork it at your will.
Preparation
- Getting Spotify token on this link
Extract
Extract means to download or obtain data from vendor. In this case we will get the data by calling an API. We will use requests library to call Spotify API. Recommended: Create a virtual environment for dependencies using conda. Activate it and install dependencies
conda create -n etl python=3 anaconda -y source activate etl conda install sqlalchemy pandas requests -y conda install -c conda-forge python-dotenv schedule -y
If you get this error:
{'error': {'status': 401, 'message': 'The access token expired'}}
Just reissue the token at Spotify. It expires very fast. Once you run the script you will have a data frame with the first 20 songs you listened to in the last 24 hours. Example:
song_name artlist_name played_at timestamp 0 Ahora - Directo Acústico Estopa 2021-03-04T16:51:58.535Z 2021-03-04 1 Just A Lover Hayden James 2021-03-03T19:48:52.589Z 2021-03-03 2 Nanã Polo & Pan 2021-03-03T19:46:01.361Z 2021-03-03 3 YOU'RE THE ONE KAYTRANADA 2021-03-03T19:42:50.098Z 2021-03-03 4 Can't Do Without You Caribou 2021-03-03T19:38:31.760Z 2021-03-03 5 Feel Like I Do Disclosure 2021-03-03T19:34:34.860Z 2021-03-03 6 Hong Kong C. Tangana 2021-03-03T19:31:10.274Z 2021-03-03 7 Los Tontos C. Tangana 2021-03-03T19:27:25.438Z 2021-03-03 8 Cuándo Olvidaré C. Tangana 2021-03-03T19:24:12.671Z 2021-03-03 9 CAMBIA! C. Tangana 2021-03-03T19:20:40.938Z 2021-03-03 10 Muriendo De Envidia C. Tangana 2021-03-03T19:17:33.074Z 2021-03-03 11 Te Olvidaste C. Tangana 2021-03-03T19:13:30.174Z 2021-03-03 12 Un Veneno - G-Mix C. Tangana 2021-03-03T19:10:22.709Z 2021-03-03 13 Nominao C. Tangana 2021-03-03T19:05:43.191Z 2021-03-03 14 Ingobernable C. Tangana 2021-03-03T19:02:46.900Z 2021-03-03 15 Párteme La Cara C. Tangana 2021-03-03T18:59:39.944Z 2021-03-03 16 Nunca Estoy C. Tangana 2021-03-03T18:56:52.031Z 2021-03-03 17 Comerte Entera C. Tangana 2021-03-03T18:54:09.563Z 2021-03-03 18 Tú Me Dejaste De Querer C. Tangana 2021-03-03T18:50:13.927Z 2021-03-03 19 Demasiadas Mujeres C. Tangana 2021-03-03T18:46:55.615Z 2021-03-03
Once you got this data the extract stage is completed.
Load
Load means validate the data to ensure quality. We will create a function to which we will pass a data frame as parameter to check:
- If dataframe is empty
- That there are no duplicates
- No nulls
- No data out of the specific time frame (last 24 hours)
If everything ok, the check will return the following message:
Data valid, proceed to Load stage
Extract
The goal of this stage is to save the data in our database, which in this case is SQLite. Once completed this step you can check table “my_played_tracks”:
sqlite3
sqlite > .open my_played_tracks.sqlite sqlite > select * from my_played_tracks; Just A Lover|Hayden James|2021-03-03T19:48:52.589Z|2021-03-03 Nanã|Polo & Pan|2021-03-03T19:46:01.361Z|2021-03-03 YOU'RE THE ONE|KAYTRANADA|2021-03-03T19:42:50.098Z|2021-03-03 Can't Do Without You|Caribou|2021-03-03T19:38:31.760Z|2021-03-03 Feel Like I Do|Disclosure|2021-03-03T19:34:34.860Z|2021-03-03 Hong Kong|C. Tangana|2021-03-03T19:31:10.274Z|2021-03-03 Los Tontos|C. Tangana|2021-03-03T19:27:25.438Z|2021-03-03 Cuándo Olvidaré|C. Tangana|2021-03-03T19:24:12.671Z|2021-03-03 CAMBIA!|C. Tangana|2021-03-03T19:20:40.938Z|2021-03-03 Muriendo De Envidia|C. Tangana|2021-03-03T19:17:33.074Z|2021-03-03 Te Olvidaste|C. Tangana|2021-03-03T19:13:30.174Z|2021-03-03 Un Veneno - G-Mix|C. Tangana|2021-03-03T19:10:22.709Z|2021-03-03 Nominao|C. Tangana|2021-03-03T19:05:43.191Z|2021-03-03 Ingobernable|C. Tangana|2021-03-03T19:02:46.900Z|2021-03-03 Párteme La Cara|C. Tangana|2021-03-03T18:59:39.944Z|2021-03
With this last step our ETL is completed. We have downloaded, validated and saved data from a vendor to our database. There are multiple variations of this process, please let me know if you want to explore a different setup Contact. i.e saving to a non relational database in the cloud.
Optional: Scheduling the ETL Process with Cron Jobs
Instead of triggering the code every 24 hours we can automate this step. In this particular case we used the schedule library, so while the program is running it will trigger each day at 00:00. Details in the code.