Visualising my Goodreads data

I had to push quite hard to reach my Goodreads reading goal for 2025. I finished Circe by Madeline Miller, my 40th book of the year, on the 27th of December. You can download your personal Goodreads data from your profile, so let’s analyse mine.

Inspection

Install pandas.

%pip install pandas

Load the data.

import pandas as pd

goodreads_data = pd.read_csv("goodreads_library_export.csv", parse_dates=["Date Added", "Date Read"])

Show information about “Circe”.

circe_info = goodreads_data[goodreads_data["Title"] == "Circe"]
info_subset = ["Title", "Author", "My Rating", "Date Added", "Date Read"]
circe_info[info_subset]

   Title           Author  My Rating  Date Added   Date Read
1  Circe  Madeline Miller          3  2025-12-22  2025-12-27

See how many books I’ve read in total.

goodreads_data['Exclusive Shelf'].value_counts()
Exclusive Shelf
read                 349
to-read                6
currently-reading      1
Name: count, dtype: int64

See how many books I’ve read in 2025.

books_read = goodreads_data[goodreads_data["Exclusive Shelf"] == "read"]
book_read_2025 = books_read[books_read["Date Read"].dt.year == 2025]
len(book_read_2025.index)
40
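One caveat with this filter: a book on the "read" shelf that has no "Date Read" value is silently left out, because a missing date has no year to compare against. The sketch below shows the effect on a tiny made-up table (the titles and dates are hypothetical, not rows from my export), so the per-year counts need not add up to the shelf total.

```python
import pandas as pd

# Made-up rows for illustration only, not taken from the real export.
toy = pd.DataFrame(
    {
        "Title": ["Book A", "Book B", "Book C"],
        "Exclusive Shelf": ["read", "read", "to-read"],
        "Date Read": pd.to_datetime(["2025-03-01", None, None]),
    }
)

toy_read = toy[toy["Exclusive Shelf"] == "read"]

# Books on the "read" shelf without a finish date.
undated = toy_read[toy_read["Date Read"].isna()]

# The year filter only keeps dated rows: a missing date yields NaN as its year,
# and NaN == 2025 evaluates to False.
read_2025 = toy_read[toy_read["Date Read"].dt.year == 2025]

print(len(toy_read), len(undated), len(read_2025))
```

Running the same `isna()` check on `books_read` would show how many read books are missing from the yearly numbers.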

Visualising reading behaviour

Install matplotlib, then import numpy and matplotlib.

%pip install matplotlib
import numpy as np
import matplotlib.pyplot as plt

Show how many books per year I’ve read and the linear trend.

books_per_year = (
    books_read.groupby(books_read["Date Read"].dt.year)
    .size()
)
# Fill missing years with 0
books_per_year = books_per_year.reindex(range(int(books_per_year.index.min()), int(books_per_year.index.max()) + 1), fill_value=0)

# fit a linear trend
books_per_year_z = np.polyfit(books_per_year.index, books_per_year.values, 1)
books_per_year_p = np.poly1d(books_per_year_z)

plt.figure(figsize=(6, 3))
plt.plot(books_per_year.index, books_per_year.values, marker="o")
plt.plot(books_per_year.index, books_per_year_p(books_per_year.index), "r--", alpha=0.8, label="Trend")

plt.title("Books read per year")
plt.xlabel("Year")
plt.ylabel("Books read")
plt.xticks(books_per_year.index, rotation=45)
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

Books read per year

Nice! It looks like the number of books I read per year is trending upwards. Time for some critical thinking: does reading more books automatically mean that I read more words? It may be fairer to plot the number of pages per year.

pages_per_year = (
    books_read.groupby(books_read["Date Read"].dt.year)["Number of Pages"]
    .sum(min_count=1)
    .sort_index()
)

# fill missing years with 0
pages_per_year = pages_per_year.reindex(range(int(pages_per_year.index.min()), int(pages_per_year.index.max()) + 1), fill_value=0)

pages_per_year_z = np.polyfit(pages_per_year.index, pages_per_year.values, 1)
pages_per_year_p = np.poly1d(pages_per_year_z)

plt.figure(figsize=(6, 3))
plt.plot(pages_per_year.index, pages_per_year.values, marker="o")
plt.plot(pages_per_year.index, pages_per_year_p(pages_per_year.index), "r--", alpha=0.8, label="Trend")
plt.title("Pages read per year")
plt.xlabel("Year")
plt.ylabel("Pages read")
plt.xticks(pages_per_year.index, rotation=45)
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

Pages read per year
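A detail worth knowing about the pages aggregation: `sum(min_count=1)` returns NaN for a year in which no book has a page count, and the `fill_value=0` of `reindex` only fills years that are absent from the index, not years that are already NaN. My data did not trigger this, but a NaN would propagate into the trend fit. The sketch below uses made-up numbers (the `year` and `pages` columns are hypothetical) to show the difference and one possible guard.

```python
import numpy as np
import pandas as pd

# Made-up data: 2021 has a book without a page count, 2022 has no books at all.
toy = pd.DataFrame(
    {
        "year": [2020, 2020, 2021, 2023],
        "pages": [100.0, 200.0, np.nan, 300.0],
    }
)

# With min_count=1, a group with no valid values sums to NaN.
strict = toy.groupby("year")["pages"].sum(min_count=1)

# The default (min_count=0) sums an all-NaN group to 0 instead.
lenient = toy.groupby("year")["pages"].sum()

# reindex fills the absent year 2022 with 0, but 2021 stays NaN.
full_range = range(int(strict.index.min()), int(strict.index.max()) + 1)
reindexed = strict.reindex(full_range, fill_value=0)

# One possible guard before fitting a trend: treat unknown page counts as 0.
safe = reindexed.fillna(0)

print(reindexed)
print(safe)
```

Whether 0 is the right stand-in for an unknown page count is a judgment call; dropping those years from the fit would be another option.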

book_slope, pages_slope = books_per_year_z[0], pages_per_year_z[0]
book_mean, pages_mean = books_per_year.mean(), pages_per_year.mean()
book_normalized_slope, pages_normalized_slope = book_slope / book_mean, pages_slope / pages_mean

print(f"Books slope: {book_slope:.2f} books/year")
print(f"Pages slope: {pages_slope:.2f} pages/year")
print(f"Books normalized slope: {book_normalized_slope:.4f} (books/year/mean)")
print(f"Pages normalized slope: {pages_normalized_slope:.4f} (pages/year/mean)")
Books slope: 3.18 books/year
Pages slope: 1005.24 pages/year
Books normalized slope: 0.1477 (books/year/mean)
Pages normalized slope: 0.1249 (pages/year/mean)

On average, the number of books I read increases by 3.18 per year and the number of pages by 1005.24 per year. Relative to their respective means, books grow faster (14.77% of the mean per year) than pages (12.49% of the mean per year). This matches my own experience: having a Goodreads goal has led me to pick up a few shorter reads in recent years.
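The reason for dividing each slope by its mean is that books and pages live on very different scales, so the raw slopes cannot be compared directly. A minimal sketch with made-up numbers (the helper name `normalised_slope` and the toy series are my own, purely illustrative): when pages are an exact multiple of books, the raw slopes differ by that multiple, while the normalised slopes agree.

```python
import numpy as np


def normalised_slope(years, values):
    """Slope of a linear fit, divided by the mean of the values."""
    slope = np.polyfit(years, values, 1)[0]
    return slope / np.mean(values)


# Made-up series: every book has exactly 300 pages.
years = np.array([2021, 2022, 2023])
toy_books = np.array([10, 20, 30])
toy_pages = toy_books * 300

books_ns = normalised_slope(years, toy_books)
pages_ns = normalised_slope(years, toy_pages)

print(f"Books normalised slope: {books_ns:.4f}")
print(f"Pages normalised slope: {pages_ns:.4f}")
```

So a gap between the two normalised slopes, as in my real data, suggests that the average book length changed over time.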

For our final visualisation, let’s see which reading goals were met. This data is not in the library export, so I’ll add it by hand:

reading_goals = pd.DataFrame(
    {
        "Year": [2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023, 2024, 2025, 2026],
        "Goal": [0, 0, 0, 0, 0, 20, 25, 25, 25, 25, 35, 40, 40, 40, 40]
    }
)
reading_goals
Year Goal
2012 0
2013 0
2014 0
2015 0
2016 0
2017 20
2018 25
2019 25
2020 25
2021 25
2022 35
2023 40
2024 40
2025 40
2026 40

I have had reading goals since 2017, so I set the goal for the preceding years to 0.

from matplotlib.lines import Line2D
from matplotlib.patches import Patch

reading_goals["Books Read"] = reading_goals["Year"].map(books_per_year).fillna(0).astype(int)
reading_goals["Goal Met"] = reading_goals["Books Read"] >= reading_goals["Goal"]
colors = np.where(reading_goals["Goal Met"], "tab:green", "tab:red")

plt.figure(figsize=(6, 3))
plt.bar(reading_goals["Year"], reading_goals["Books Read"], color=colors, alpha=0.85)
plt.plot(reading_goals["Year"], reading_goals["Goal"], color="black", linestyle="--", marker="o")

legend_handles = [
    Patch(facecolor="tab:green", edgecolor="none", label="Goal met"),
    Patch(facecolor="tab:red", edgecolor="none", label="Goal not met"),
    Line2D([0], [0], color="black", linestyle="--", marker="o", label="Goal"),
]
plt.legend(handles=legend_handles)

plt.title("Reading goals vs books read")
plt.xlabel("Year")
plt.ylabel("Books")
plt.xticks(reading_goals["Year"], rotation=45)
plt.grid(True, axis="y", alpha=0.3)
plt.tight_layout()
plt.show()

Reading goals vs books read

Looking at my own “goal-picking” behaviour, I seem to have raised my target almost every year after hitting one. There are two exceptions. The first is 2012-2016, when I wasn’t actually using Goodreads yet, so that is more of an administrative issue than a reading issue. The second is this year (2026): I only just made it to 40 books in 2025, and I actively sought out a couple of shorter books to get there, so I kept the goal at 40.
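This “raise the goal after hitting it” pattern could also be checked in code rather than by eye. The sketch below is one way to do it, assuming a table with the same `Year`, `Goal` and `Books Read` columns as `reading_goals`. The numbers are made up for illustration, and the column names `Met Previous Goal`, `Goal Change` and `Raised After Hit` are my own.

```python
import pandas as pd

# Made-up goals and counts, not my real numbers.
toy_goals = pd.DataFrame(
    {
        "Year": [2020, 2021, 2022, 2023],
        "Goal": [10, 10, 15, 15],
        "Books Read": [8, 12, 15, 14],
    }
)

toy_goals["Goal Met"] = toy_goals["Books Read"] >= toy_goals["Goal"]

# Did I meet the goal in the year before? The first year has no predecessor.
toy_goals["Met Previous Goal"] = toy_goals["Goal Met"].shift(1, fill_value=False)

# How much did the goal change compared to the previous year?
toy_goals["Goal Change"] = toy_goals["Goal"].diff().fillna(0).astype(int)

# True where a met goal was followed by a higher one.
toy_goals["Raised After Hit"] = toy_goals["Met Previous Goal"] & (
    toy_goals["Goal Change"] > 0
)

print(toy_goals[["Year", "Goal", "Books Read", "Raised After Hit"]])
```

In this toy table only 2022 qualifies: the 2021 goal was met and the goal then rose from 10 to 15.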