Spotify Data Analysis

This script analyzes your Spotify streaming history since you created your account.

How to get the data

The MyData folder should contain your entire streaming history for the life of your account. You can request it by pressing the Request data button on this website while logged in to your account. Make sure to check the option labeled Select Extended streaming history. After a few weeks, you will receive an email with your extended streaming history. The download is a zip file called my_spotify_data.zip, which contains a directory called MyData.
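If you prefer to unpack the archive from Python instead of by hand, here is a minimal sketch that uses the standard-library zipfile module. The function name extract_spotify_export is hypothetical. The sketch also assumes the archive's top-level folder is named MyData, as described above.

```python
import zipfile
from pathlib import Path


def extract_spotify_export(zip_path: str, dest_dir: str = ".") -> Path:
    """Extract the Spotify export archive and return the path to MyData.

    Assumes (hypothetically) that the archive's top-level directory is
    named MyData; adjust the returned path if your archive differs.
    """
    dest = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as archive:
        # Unpack every member into dest_dir, creating folders as needed.
        archive.extractall(dest)
    return dest / "MyData"
```

For example, extract_spotify_export("my_spotify_data.zip") unpacks the archive into the current directory, which is where main.py expects MyData to be.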

The MyData directory contains a PDF file that describes the contents of the other files in the directory. The files we care about are the JSON files whose names start with Streaming_History_Audio.
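The sketch below shows one way to combine those files. It reads every Streaming_History_Audio JSON file under MyData into a single pandas DataFrame. This is not necessarily how main.py loads them. The name load_streaming_history is hypothetical, and the code assumes each file holds a JSON array of stream records.

```python
import json
from pathlib import Path

import pandas as pd


def load_streaming_history(data_dir: str = "MyData") -> pd.DataFrame:
    """Combine every Streaming_History_Audio*.json file into one DataFrame.

    Assumes each file contains a JSON array of stream records (one dict
    per stream). rglob also searches subdirectories of data_dir.
    """
    records = []
    # Sort the paths so the rows come out in a deterministic file order.
    for path in sorted(Path(data_dir).rglob("Streaming_History_Audio*.json")):
        with path.open(encoding="utf-8") as f:
            records.extend(json.load(f))
    return pd.DataFrame(records)
```

Files that do not match the pattern, such as other JSON files in the export, are ignored.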

The following description is taken from the Understanding my Data document. The streaming history is a list of items (e.g. songs, videos, and podcasts) listened to or watched during the lifetime of your account, including the following details:

ts - Date and time when the stream ended, in UTC (Coordinated Universal Time).
username - Your Spotify username.
platform - Platform used when streaming the track (e.g. Android OS, Google Chromecast).
ms_played - Number of milliseconds the track was played.
conn_country - Country code of the country where the stream was played.
ip_addr_decrypted - IP address used when streaming the track.
user_agent_decrypted - User agent used when streaming the track (e.g. a browser).
master_metadata_track_name - Name of the track.
master_metadata_album_artist_name - Name of the artist, band or podcast.
master_metadata_album_album_name - Name of the album of the track.
spotify_track_uri - A Spotify track URI that uniquely identifies the music track.
episode_name - Name of the episode of the podcast.
episode_show_name - Name of the show of the podcast.
spotify_episode_uri - A Spotify episode URI that uniquely identifies the podcast episode.
reason_start - Reason why the track started (e.g. previous track finished or you picked it from the playlist).
reason_end - Reason why the track ended (e.g. the track finished playing or you hit the next button).
shuffle - Whether shuffle mode was used when playing the track.
skipped - Whether the user skipped to the next song.
offline - Whether the track was played in offline mode.
offline_timestamp - Timestamp of when offline mode was used, if it was used.
incognito_mode - Whether the track was played during a private session.
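Two of these fields usually need converting before analysis: ts is stored as a string, and ms_played is in milliseconds. The sketch below is minimal. It assumes ts values are ISO 8601 strings ending in Z, such as 2021-03-14T15:09:26Z. The names add_time_columns and minutes_played are hypothetical.

```python
import pandas as pd


def add_time_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with ts parsed as UTC datetimes and play time in minutes.

    Assumes ts holds ISO 8601 strings in UTC (e.g. "2021-03-14T15:09:26Z").
    The input DataFrame is left unchanged.
    """
    out = df.copy()
    # Parse to timezone-aware datetimes so local-time conversion stays possible.
    out["ts"] = pd.to_datetime(out["ts"], utc=True)
    # 1 minute = 60,000 milliseconds.
    out["minutes_played"] = out["ms_played"] / 60_000
    return out
```

With ts parsed, out["ts"].dt.tz_convert(...) can shift timestamps to your local time zone before grouping by hour or day.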

An example of the streaming data for one song is shown in the image at the top of this page.

All the functions in main.py have docstrings that describe their parameters and return values. Note that the columns of the pandas DataFrame returned by get_all_data() can be found here.

How to run main.py in the terminal

First, make sure that the MyData directory is in the same directory as main.py. Then run python3 main.py. This creates a text file called analysis.txt, which contains the results of running the functions in main.py on the data in the MyData directory.
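The sketch below illustrates the final step. It checks that MyData sits next to the script, then writes the results to analysis.txt in that same directory. write_analysis is a hypothetical helper, not a function from main.py. Resolving paths from the script's location is also a design choice made for this sketch: if you pass __file__ as script_path, the output no longer depends on the terminal's current working directory.

```python
from pathlib import Path


def write_analysis(lines: list[str], script_path: str) -> Path:
    """Write analysis lines to analysis.txt in the script's directory.

    Raises FileNotFoundError if no MyData directory sits next to the script.
    Returns the path of the written report.
    """
    base = Path(script_path).resolve().parent
    if not (base / "MyData").is_dir():
        raise FileNotFoundError(f"MyData directory not found in {base}")
    report = base / "analysis.txt"
    # One result per line, with a trailing newline at the end of the file.
    report.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return report
```

Inside a script, a call would look like write_analysis(results, __file__).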

The analysis.txt file contains the following analyses of the data in the MyData directory:

Total Time Listened
Most Streamed Artist by time
Most Streamed Artist by songs played
Most Streamed Songs by time played
Most Streamed Songs by number of times played
Percent of songs on shuffle vs not on shuffle
Percent of songs played offline vs played online
Reasons a song started, with the percentage for each
Reasons a song ended, with the percentage for each
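Most of these metrics come down to a pandas groupby or value_counts call. The sketch below shows one way to compute them from a DataFrame that has the columns listed earlier. The function names are hypothetical and may not match main.py. By default, groupby and value_counts drop rows whose key is missing, such as podcast episodes with no track or artist name.

```python
import pandas as pd

ARTIST = "master_metadata_album_artist_name"
TRACK = "master_metadata_track_name"
MS_PER_HOUR = 3_600_000


def total_hours(df: pd.DataFrame) -> float:
    """Total Time Listened, in hours."""
    return float(df["ms_played"].sum()) / MS_PER_HOUR


def top_by_time(df: pd.DataFrame, keys, n: int = 5) -> pd.Series:
    """Top n groups (e.g. ARTIST or [TRACK, ARTIST]) ranked by hours played."""
    hours = df.groupby(keys)["ms_played"].sum() / MS_PER_HOUR
    return hours.sort_values(ascending=False).head(n)


def top_by_count(df: pd.DataFrame, keys, n: int = 5) -> pd.Series:
    """Top n groups ranked by number of streams (rows)."""
    return df.groupby(keys).size().sort_values(ascending=False).head(n)


def percent_breakdown(df: pd.DataFrame, column: str) -> pd.Series:
    """Percentage of streams for each value of a column.

    Works for shuffle, offline, reason_start and reason_end.
    """
    return df[column].value_counts(normalize=True) * 100
```

For example, top_by_count(df, [TRACK, ARTIST]) ranks songs by play count. Grouping by both track and artist keeps different songs that share a title apart. percent_breakdown(df, "reason_end") gives the share of streams for each end reason.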

The source code can be found here.