Trending-Collaborative-filtering-recommendation-system-ML

I have created a Flask API for both of these filtering methods. You can use it to build a frontend that showcases trending books and finds book recommendations based on user input.

Link to GitHub repo - GitHub

In [ ]:
import numpy as np
import pandas as pd
In [ ]:
books = pd.read_csv(r'Data\Books.csv')
users = pd.read_csv(r'Data\Users.csv')
ratings = pd.read_csv(r'Data\Ratings.csv')
C:\Users\omkar\AppData\Local\Temp\ipykernel_6436\1990932736.py:1: DtypeWarning: Columns (3) have mixed types. Specify dtype option on import or set low_memory=False.
  books = pd.read_csv(r'Data\Books.csv')
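The `DtypeWarning` above means pandas found mixed types in column index 3 (`Year-Of-Publication`) while reading the file in chunks. One way to avoid it, sketched here on a small made-up CSV held in memory rather than the real `Books.csv`, is to declare the column's dtype up front (passing `low_memory=False`, as the warning suggests, is the other option). The sample rows and the choice of `str` are assumptions for illustration.

```python
import io
import pandas as pd

# Made-up sample standing in for Books.csv: the year column mixes numbers and text.
csv_text = (
    "ISBN,Book-Title,Book-Author,Year-Of-Publication\n"
    "0195153448,Classical Mythology,Mark P. O. Morford,2002\n"
    "0000000000,Hypothetical Broken Row,Unknown,not-a-year\n"
)

# Read the ambiguous columns as strings so pandas does not have to guess a type.
sample_books = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"ISBN": str, "Year-Of-Publication": str},
)

# Convert afterwards; values that are not numbers become NaN instead of raising.
sample_books["Year-Of-Publication"] = pd.to_numeric(
    sample_books["Year-Of-Publication"], errors="coerce"
)
print(sample_books.dtypes)
```

Reading `ISBN` as a string also preserves leading zeros, which matters because ISBN is the merge key later on.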
In [ ]:
books.head()
Out[ ]:
ISBN Book-Title Book-Author Year-Of-Publication Publisher Image-URL-S Image-URL-M Image-URL-L
0 0195153448 Classical Mythology Mark P. O. Morford 2002 Oxford University Press http://images.amazon.com/images/P/0195153448.0... http://images.amazon.com/images/P/0195153448.0... http://images.amazon.com/images/P/0195153448.0...
1 0002005018 Clara Callan Richard Bruce Wright 2001 HarperFlamingo Canada http://images.amazon.com/images/P/0002005018.0... http://images.amazon.com/images/P/0002005018.0... http://images.amazon.com/images/P/0002005018.0...
2 0060973129 Decision in Normandy Carlo D'Este 1991 HarperPerennial http://images.amazon.com/images/P/0060973129.0... http://images.amazon.com/images/P/0060973129.0... http://images.amazon.com/images/P/0060973129.0...
3 0374157065 Flu: The Story of the Great Influenza Pandemic... Gina Bari Kolata 1999 Farrar Straus Giroux http://images.amazon.com/images/P/0374157065.0... http://images.amazon.com/images/P/0374157065.0... http://images.amazon.com/images/P/0374157065.0...
4 0393045218 The Mummies of Urumchi E. J. W. Barber 1999 W. W. Norton & Company http://images.amazon.com/images/P/0393045218.0... http://images.amazon.com/images/P/0393045218.0... http://images.amazon.com/images/P/0393045218.0...
In [ ]:
users.head()
Out[ ]:
User-ID Location Age
0 1 nyc, new york, usa NaN
1 2 stockton, california, usa 18.0
2 3 moscow, yukon territory, russia NaN
3 4 porto, v.n.gaia, portugal 17.0
4 5 farnborough, hants, united kingdom NaN
In [ ]:
ratings.head()
Out[ ]:
User-ID ISBN Book-Rating
0 276725 034545104X 0
1 276726 0155061224 5
2 276727 0446520802 0
3 276729 052165615X 3
4 276729 0521795028 6
In [ ]:
print("books data",books.shape)
print("rating data",ratings.shape)
print("user data",users.shape)
books data (271360, 8)
rating data (1149780, 3)
user data (278858, 3)

So the dataset contains about 2.71 lakh books (271,360), about 2.79 lakh users (278,858), and more than 11 lakh ratings (1,149,780). One lakh is 100,000.
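Since the counts are quoted in lakhs (1 lakh = 100,000), a quick conversion makes the figures easy to check against the shapes printed above. The helper name `to_lakh` is made up for this sketch.

```python
# Row counts copied from the shapes printed above.
counts = {"books": 271360, "users": 278858, "ratings": 1149780}

def to_lakh(n):
    """Convert a plain count to lakhs (1 lakh = 100,000)."""
    return n / 100_000

for name, n in counts.items():
    print(f"{name}: {n:,} rows = {to_lakh(n):.2f} lakh")
```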

In [ ]:
books.isnull().sum()
Out[ ]:
ISBN                   0
Book-Title             0
Book-Author            2
Year-Of-Publication    0
Publisher              2
Image-URL-S            0
Image-URL-M            0
Image-URL-L            3
dtype: int64

Only a handful of values are missing in the books data, so those rows can simply be dropped.
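Dropping those rows could look like the following minimal sketch. It runs on a tiny made-up frame rather than the real `books` table; on the real data the same `dropna()` call would be applied to `books`.

```python
import pandas as pd

# Tiny made-up stand-in for the books table, with one missing author.
sample_books = pd.DataFrame({
    "ISBN": ["0000000001", "0000000002", "0000000003"],
    "Book-Title": ["Book A", "Book B", "Book C"],
    "Book-Author": ["Author A", None, "Author C"],
})

print(sample_books.isnull().sum())  # missing values per column, as in the cell above

# Drop every row that has at least one missing value.
cleaned = sample_books.dropna().reset_index(drop=True)
print(cleaned.shape)
```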

In [ ]:
ratings.isnull().sum()
Out[ ]:
User-ID        0
ISBN           0
Book-Rating    0
dtype: int64
In [ ]:
books.duplicated().sum()
Out[ ]:
0

Good to know: there are no duplicate rows in the books table.

In [ ]:
ratings.duplicated().sum()
Out[ ]:
0
In [ ]:
users.duplicated().sum()
Out[ ]:
0

Popularity-based recommender system

In [ ]:
ratings.merge(books,on='ISBN').shape
Out[ ]:
(1031136, 10)

The merge returns about 10.3 lakh rows (1,031,136) rather than the 11.5 lakh rows (1,149,780) in the ratings table. The reason is that some ISBNs in the ratings table do not appear in the books table, so the inner merge filters those ratings out.
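To see how many ratings are lost this way, one option is a left merge with `indicator=True`, which labels each row by where its key was found. This is a sketch on made-up rows, not the real tables.

```python
import pandas as pd

# Made-up ratings: the ISBN "999" has no entry in the books table below.
sample_ratings = pd.DataFrame({
    "User-ID": [1, 2, 3],
    "ISBN": ["111", "222", "999"],
    "Book-Rating": [5, 0, 8],
})
sample_books = pd.DataFrame({
    "ISBN": ["111", "222"],
    "Book-Title": ["Book A", "Book B"],
})

# A left merge keeps every rating; the _merge column records the match status.
checked = sample_ratings.merge(sample_books, on="ISBN", how="left", indicator=True)
unmatched = (checked["_merge"] == "left_only").sum()

# The default (inner) merge keeps only the ratings whose ISBN exists in both tables.
inner = sample_ratings.merge(sample_books, on="ISBN")
print(unmatched, inner.shape)
```

On the real data, the number of `left_only` rows should equal the gap between the ratings table and the merged result.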

In [ ]:
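# Attach book details to every rating (inner merge on ISBN)
ratings_with_name=ratings.merge(books,on='ISBN')

# One-time conversion of Book-Rating to numeric; unparseable values become NaN
ratings_with_name['Book-Rating'] = pd.to_numeric(
    ratings_with_name['Book-Rating'], errors='coerce'
)
# Drop rows whose rating could not be converted
ratings_with_name = ratings_with_name.dropna(subset=['Book-Rating'])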
ratings_with_name
Out[ ]:
User-ID ISBN Book-Rating Book-Title Book-Author Year-Of-Publication Publisher Image-URL-S Image-URL-M Image-URL-L
0 276725 034545104X 0 Flesh Tones: A Novel M. J. Rose 2002 Ballantine Books http://images.amazon.com/images/P/034545104X.0... http://images.amazon.com/images/P/034545104X.0... http://images.amazon.com/images/P/034545104X.0...
1 276726 0155061224 5 Rites of Passage Judith Rae 2001 Heinle http://images.amazon.com/images/P/0155061224.0... http://images.amazon.com/images/P/0155061224.0... http://images.amazon.com/images/P/0155061224.0...
2 276727 0446520802 0 The Notebook Nicholas Sparks 1996 Warner Books http://images.amazon.com/images/P/0446520802.0... http://images.amazon.com/images/P/0446520802.0... http://images.amazon.com/images/P/0446520802.0...
3 276729 052165615X 3 Help!: Level 1 Philip Prowse 1999 Cambridge University Press http://images.amazon.com/images/P/052165615X.0... http://images.amazon.com/images/P/052165615X.0... http://images.amazon.com/images/P/052165615X.0...
4 276729 0521795028 6 The Amsterdam Connection : Level 4 (Cambridge ... Sue Leather 2001 Cambridge University Press http://images.amazon.com/images/P/0521795028.0... http://images.amazon.com/images/P/0521795028.0... http://images.amazon.com/images/P/0521795028.0...
... ... ... ... ... ... ... ... ... ... ...
1031131 276704 0876044011 0 Edgar Cayce on the Akashic Records: The Book o... Kevin J. Todeschi 1998 A.R.E. Press (Association of Research & Enlig http://images.amazon.com/images/P/0876044011.0... http://images.amazon.com/images/P/0876044011.0... http://images.amazon.com/images/P/0876044011.0...
1031132 276704 1563526298 9 Get Clark Smart : The Ultimate Guide for the S... Clark Howard 2000 Longstreet Press http://images.amazon.com/images/P/1563526298.0... http://images.amazon.com/images/P/1563526298.0... http://images.amazon.com/images/P/1563526298.0...
1031133 276706 0679447156 0 Eight Weeks to Optimum Health: A Proven Progra... Andrew Weil 1997 Alfred A. Knopf http://images.amazon.com/images/P/0679447156.0... http://images.amazon.com/images/P/0679447156.0... http://images.amazon.com/images/P/0679447156.0...
1031134 276709 0515107662 10 The Sherbrooke Bride (Bride Trilogy (Paperback)) Catherine Coulter 1996 Jove Books http://images.amazon.com/images/P/0515107662.0... http://images.amazon.com/images/P/0515107662.0... http://images.amazon.com/images/P/0515107662.0...
1031135 276721 0590442449 10 Fourth Grade Rats Jerry Spinelli 1996 Scholastic http://images.amazon.com/images/P/0590442449.0... http://images.amazon.com/images/P/0590442449.0... http://images.amazon.com/images/P/0590442449.0...

1031136 rows × 10 columns

In [ ]:
# Count the ratings per title (selecting the column first avoids counting every column)
num_rating_df=ratings_with_name.groupby('Book-Title')['Book-Rating'].count().reset_index()
num_rating_df.rename(columns={'Book-Rating':'num_ratings'},inplace=True)
num_rating_df
Out[ ]:
Book-Title num_ratings
0 A Light in the Storm: The Civil War Diary of ... 4
1 Always Have Popsicles 1
2 Apple Magic (The Collector's series) 1
3 Ask Lily (Young Women of Faith: Lily Series, ... 1
4 Beyond IBM: Leadership Marketing and Finance ... 1
... ... ...
241066 Ã?Â?lpiraten. 2
241067 Ã?Â?rger mit Produkt X. Roman. 4
241068 Ã?Â?sterlich leben. 1
241069 Ã?Â?stlich der Berge. 3
241070 Ã?Â?thique en toc 2

241071 rows × 2 columns

In the cell above I grouped by book title rather than by ISBN, because this dataset lists the same book under multiple ISBNs. An ISBN is just a catalog number: the paperback and hardcover editions of Harry Potter, for example, have different ISBNs but are the same book. Ratings for every edition should count toward the same book, so we aggregate by title.
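A small made-up example shows the difference. Two editions of one title have different ISBNs, so counting by ISBN splits the ratings while counting by title pools them. The ISBNs and ratings here are invented for illustration.

```python
import pandas as pd

# Invented rows: one title in two editions, plus a second title.
sample = pd.DataFrame({
    "ISBN": ["A-paperback", "A-paperback", "A-hardcover", "B-only"],
    "Book-Title": ["Book A", "Book A", "Book A", "Book B"],
    "Book-Rating": [8, 6, 10, 7],
})

# Counting per ISBN treats each edition as a separate book.
per_isbn = sample.groupby("ISBN")["Book-Rating"].count()

# Counting per title pools the ratings of all editions.
per_title = sample.groupby("Book-Title")["Book-Rating"].count()

print(per_isbn)
print(per_title)
```

One caveat of this design choice: grouping by title also pools different books that happen to share a title, and it does not pool editions whose titles differ slightly.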

In [ ]:
# Earlier version, commented out:
# avg_rating_df=ratings_with_name.groupby('Book-Title').mean()['Book-Rating'].reset_index()
# avg_rating_df.rename(columns={'Book-Rating':'avg_ratings'},inplace=True)
# avg_rating_df
# Book-Rating is converted to numeric in the ratings_with_name cell above,
# where rows with NaN (unparseable) ratings are dropped rather than set to 0.
In [ ]:
avg_rating_df = (
    ratings_with_name
    .groupby('Book-Title')['Book-Rating']
    .mean()
    .reset_index(name='avg_ratings')
)
avg_rating_df
Out[ ]:
Book-Title avg_ratings
0 A Light in the Storm: The Civil War Diary of ... 2.250000
1 Always Have Popsicles 0.000000
2 Apple Magic (The Collector's series) 0.000000
3 Ask Lily (Young Women of Faith: Lily Series, ... 8.000000
4 Beyond IBM: Leadership Marketing and Finance ... 0.000000
... ... ...
241066 Ã?Â?lpiraten. 0.000000
241067 Ã?Â?rger mit Produkt X. Roman. 5.250000
241068 Ã?Â?sterlich leben. 7.000000
241069 Ã?Â?stlich der Berge. 2.666667
241070 Ã?Â?thique en toc 4.000000

241071 rows × 2 columns

In [ ]:
# just merging the two dataframes to get the popular books by number of ratings and average rating
popular_df=num_rating_df.merge(avg_rating_df,on='Book-Title')
popular_df
Out[ ]:
Book-Title num_ratings avg_ratings
0 A Light in the Storm: The Civil War Diary of ... 4 2.250000
1 Always Have Popsicles 1 0.000000
2 Apple Magic (The Collector's series) 1 0.000000
3 Ask Lily (Young Women of Faith: Lily Series, ... 1 8.000000
4 Beyond IBM: Leadership Marketing and Finance ... 1 0.000000
... ... ... ...
241066 Ã?Â?lpiraten. 2 0.000000
241067 Ã?Â?rger mit Produkt X. Roman. 4 5.250000
241068 Ã?Â?sterlich leben. 1 7.000000
241069 Ã?Â?stlich der Berge. 3 2.666667
241070 Ã?Â?thique en toc 2 4.000000

241071 rows × 3 columns

In [ ]:
popular_df = popular_df[popular_df['num_ratings'] >= 250].sort_values('avg_ratings', ascending=False).head(50)
#popular_df.shape
popular_df
Out[ ]:
Book-Title num_ratings avg_ratings
80434 Harry Potter and the Prisoner of Azkaban (Book 3) 428 5.852804
80422 Harry Potter and the Goblet of Fire (Book 4) 387 5.824289
80441 Harry Potter and the Sorcerer's Stone (Book 1) 278 5.737410
80426 Harry Potter and the Order of the Phoenix (Boo... 347 5.501441
80414 Harry Potter and the Chamber of Secrets (Book 2) 556 5.183453
191612 The Hobbit : The Enchanting Prelude to The Lor... 281 5.007117
187377 The Fellowship of the Ring (The Lord of the Ri... 368 4.948370
80445 Harry Potter and the Sorcerer's Stone (Harry P... 575 4.895652
211384 The Two Towers (The Lord of the Rings, Part 2) 260 4.880769
219741 To Kill a Mockingbird 510 4.700000
183573 The Da Vinci Code 898 4.642539
187880 The Five People You Meet in Heaven 430 4.551163
180556 The Catcher in the Rye 449 4.545657
196326 The Lovely Bones: A Novel 1295 4.468726
764 1984 284 4.454225
144165 Prodigal Summer: A Novel 253 4.450593
128670 Neverwhere 265 4.449057
206502 The Secret Life of Bees 774 4.447028
168719 Stupid White Men ...and Other Sorry Excuses fo... 283 4.356890
223135 Tuesdays with Morrie: An Old Man, a Young Man,... 493 4.354970
204387 The Red Tent (Bestselling Backlist) 723 4.334716
191589 The Hitchhiker's Guide to the Galaxy 268 4.328358
129379 Nickel and Dimed: On (Not) Getting By in America 335 4.289552
93381 Into the Wild 252 4.273810
63867 Fahrenheit 451 409 4.264059
74750 Girl with a Pearl Earring 526 4.218631
136145 Outlander 283 4.173145
233370 Where the Heart Is (Oprah's Book Club (Paperba... 585 4.105983
156102 Seabiscuit: An American Legend 275 4.098182
107962 Life of Pi 664 4.088855
176845 The Bean Trees 389 4.087404
2281 A Child Called \It\": One Child's Courage to S... 265 4.086792
8434 ANGELA'S ASHES 279 4.075269
76343 Good in Bed 490 4.055102
64931 Fast Food Nation: The Dark Side of the All-Ame... 321 4.037383
12700 American Gods 302 4.006623
161645 Skipping Christmas 322 4.006211
105777 Left Behind: A Novel of the Earth's Last Days ... 318 4.003145
189551 The Golden Compass (His Dark Materials, Book 1) 336 4.000000
181679 The Color Purple 314 3.964968
160336 Silence of the Lambs 256 3.960938
8752 About a Boy 262 3.900763
158138 Seven Up (A Stephanie Plum Novel) 278 3.888489
175097 The Alchemist: A Fable About Following Your Dream 266 3.875940
80069 Hard Eight : A Stephanie Plum Novel (A Stephan... 269 3.825279
170101 Suzanne's Diary for Nicholas 457 3.820569
111073 Lord of the Flies 259 3.818533
5664 A Prayer for Owen Meany 413 3.796610
212070 The Vampire Lestat (Vampire Chronicles, Book II) 301 3.777409
233851 White Oleander : A Novel (Oprah's Book Club) 356 3.772472

These are the top 50 books, which I will showcase as the trending books. The same book can still show up more than once, because it exists under different ISBNs or with minor changes, so we drop any duplicate titles when fetching the book details.

In [ ]:
# Fetch the book details (author, image URLs, ISBN) from the books DataFrame for the popular books
popular_df=popular_df.merge(books, on='Book-Title').drop_duplicates('Book-Title')
popular_df
Out[ ]:
Book-Title num_ratings avg_ratings ISBN Book-Author Year-Of-Publication Publisher Image-URL-S Image-URL-M Image-URL-L
0 Harry Potter and the Prisoner of Azkaban (Book 3) 428 5.852804 0439136350 J. K. Rowling 1999 Scholastic http://images.amazon.com/images/P/0439136350.0... http://images.amazon.com/images/P/0439136350.0... http://images.amazon.com/images/P/0439136350.0...
3 Harry Potter and the Goblet of Fire (Book 4) 387 5.824289 0439139597 J. K. Rowling 2000 Scholastic http://images.amazon.com/images/P/0439139597.0... http://images.amazon.com/images/P/0439139597.0... http://images.amazon.com/images/P/0439139597.0...
5 Harry Potter and the Sorcerer's Stone (Book 1) 278 5.737410 0590353403 J. K. Rowling 1998 Scholastic http://images.amazon.com/images/P/0590353403.0... http://images.amazon.com/images/P/0590353403.0... http://images.amazon.com/images/P/0590353403.0...
9 Harry Potter and the Order of the Phoenix (Boo... 347 5.501441 043935806X J. K. Rowling 2003 Scholastic http://images.amazon.com/images/P/043935806X.0... http://images.amazon.com/images/P/043935806X.0... http://images.amazon.com/images/P/043935806X.0...
13 Harry Potter and the Chamber of Secrets (Book 2) 556 5.183453 0439064872 J. K. Rowling 2000 Scholastic http://images.amazon.com/images/P/0439064872.0... http://images.amazon.com/images/P/0439064872.0... http://images.amazon.com/images/P/0439064872.0...
16 The Hobbit : The Enchanting Prelude to The Lor... 281 5.007117 0345339681 J.R.R. TOLKIEN 1986 Del Rey http://images.amazon.com/images/P/0345339681.0... http://images.amazon.com/images/P/0345339681.0... http://images.amazon.com/images/P/0345339681.0...
17 The Fellowship of the Ring (The Lord of the Ri... 368 4.948370 0345339703 J.R.R. TOLKIEN 1986 Del Rey http://images.amazon.com/images/P/0345339703.0... http://images.amazon.com/images/P/0345339703.0... http://images.amazon.com/images/P/0345339703.0...
26 Harry Potter and the Sorcerer's Stone (Harry P... 575 4.895652 059035342X J. K. Rowling 1999 Arthur A. Levine Books http://images.amazon.com/images/P/059035342X.0... http://images.amazon.com/images/P/059035342X.0... http://images.amazon.com/images/P/059035342X.0...
28 The Two Towers (The Lord of the Rings, Part 2) 260 4.880769 0345339711 J.R.R. TOLKIEN 1986 Del Rey http://images.amazon.com/images/P/0345339711.0... http://images.amazon.com/images/P/0345339711.0... http://images.amazon.com/images/P/0345339711.0...
39 To Kill a Mockingbird 510 4.700000 0446310786 Harper Lee 1988 Little Brown & Company http://images.amazon.com/images/P/0446310786.0... http://images.amazon.com/images/P/0446310786.0... http://images.amazon.com/images/P/0446310786.0...
47 The Da Vinci Code 898 4.642539 0385504209 Dan Brown 2003 Doubleday http://images.amazon.com/images/P/0385504209.0... http://images.amazon.com/images/P/0385504209.0... http://images.amazon.com/images/P/0385504209.0...
53 The Five People You Meet in Heaven 430 4.551163 0786868716 Mitch Albom 2003 Hyperion http://images.amazon.com/images/P/0786868716.0... http://images.amazon.com/images/P/0786868716.0... http://images.amazon.com/images/P/0786868716.0...
55 The Catcher in the Rye 449 4.545657 0316769487 J.D. Salinger 1991 Little, Brown http://images.amazon.com/images/P/0316769487.0... http://images.amazon.com/images/P/0316769487.0... http://images.amazon.com/images/P/0316769487.0...
62 The Lovely Bones: A Novel 1295 4.468726 0316666343 Alice Sebold 2002 Little, Brown http://images.amazon.com/images/P/0316666343.0... http://images.amazon.com/images/P/0316666343.0... http://images.amazon.com/images/P/0316666343.0...
63 1984 284 4.454225 0451524934 George Orwell 1990 Signet Book http://images.amazon.com/images/P/0451524934.0... http://images.amazon.com/images/P/0451524934.0... http://images.amazon.com/images/P/0451524934.0...
72 Prodigal Summer: A Novel 253 4.450593 0060959037 Barbara Kingsolver 2001 Perennial http://images.amazon.com/images/P/0060959037.0... http://images.amazon.com/images/P/0060959037.0... http://images.amazon.com/images/P/0060959037.0...
73 Neverwhere 265 4.449057 0380789019 Neil Gaiman 1998 Avon http://images.amazon.com/images/P/0380789019.0... http://images.amazon.com/images/P/0380789019.0... http://images.amazon.com/images/P/0380789019.0...
78 The Secret Life of Bees 774 4.447028 0142001740 Sue Monk Kidd 2003 Penguin Books http://images.amazon.com/images/P/0142001740.0... http://images.amazon.com/images/P/0142001740.0... http://images.amazon.com/images/P/0142001740.0...
84 Stupid White Men ...and Other Sorry Excuses fo... 283 4.356890 0060392452 Michael Moore 2002 Regan Books http://images.amazon.com/images/P/0060392452.0... http://images.amazon.com/images/P/0060392452.0... http://images.amazon.com/images/P/0060392452.0...
85 Tuesdays with Morrie: An Old Man, a Young Man,... 493 4.354970 0385484518 MITCH ALBOM 1997 Doubleday http://images.amazon.com/images/P/0385484518.0... http://images.amazon.com/images/P/0385484518.0... http://images.amazon.com/images/P/0385484518.0...
88 The Red Tent (Bestselling Backlist) 723 4.334716 0312195516 Anita Diamant 1998 Picador USA http://images.amazon.com/images/P/0312195516.0... http://images.amazon.com/images/P/0312195516.0... http://images.amazon.com/images/P/0312195516.0...
89 The Hitchhiker's Guide to the Galaxy 268 4.328358 0671461494 Douglas Adams 1982 Pocket http://images.amazon.com/images/P/0671461494.0... http://images.amazon.com/images/P/0671461494.0... http://images.amazon.com/images/P/0671461494.0...
98 Nickel and Dimed: On (Not) Getting By in America 335 4.289552 0805063897 Barbara Ehrenreich 2002 Owl Books http://images.amazon.com/images/P/0805063897.0... http://images.amazon.com/images/P/0805063897.0... http://images.amazon.com/images/P/0805063897.0...
100 Into the Wild 252 4.273810 0385486804 Jon Krakauer 1997 Anchor http://images.amazon.com/images/P/0385486804.0... http://images.amazon.com/images/P/0385486804.0... http://images.amazon.com/images/P/0385486804.0...
103 Fahrenheit 451 409 4.264059 3257208626 Ray Bradbury 1994 Distribooks Inc http://images.amazon.com/images/P/3257208626.0... http://images.amazon.com/images/P/3257208626.0... http://images.amazon.com/images/P/3257208626.0...
116 Girl with a Pearl Earring 526 4.218631 0452282152 Tracy Chevalier 2001 Plume Books http://images.amazon.com/images/P/0452282152.0... http://images.amazon.com/images/P/0452282152.0... http://images.amazon.com/images/P/0452282152.0...
117 Outlander 283 4.173145 0440222915 DIANA GABALDON 1996 Dell http://images.amazon.com/images/P/0440222915.0... http://images.amazon.com/images/P/0440222915.0... http://images.amazon.com/images/P/0440222915.0...
123 Where the Heart Is (Oprah's Book Club (Paperba... 585 4.105983 0446672211 Billie Letts 1998 Warner Books http://images.amazon.com/images/P/0446672211.0... http://images.amazon.com/images/P/0446672211.0... http://images.amazon.com/images/P/0446672211.0...
124 Seabiscuit: An American Legend 275 4.098182 0449005615 LAURA HILLENBRAND 2002 Ballantine Books http://images.amazon.com/images/P/0449005615.0... http://images.amazon.com/images/P/0449005615.0... http://images.amazon.com/images/P/0449005615.0...
126 Life of Pi 664 4.088855 0151008116 Yann Martel 2002 Harcourt http://images.amazon.com/images/P/0151008116.0... http://images.amazon.com/images/P/0151008116.0... http://images.amazon.com/images/P/0151008116.0...
130 The Bean Trees 389 4.087404 0060915544 Barbara Kingsolver 1989 Perennial http://images.amazon.com/images/P/0060915544.0... http://images.amazon.com/images/P/0060915544.0... http://images.amazon.com/images/P/0060915544.0...
133 A Child Called \It\": One Child's Courage to S... 265 4.086792 1558743669 Dave Pelzer 1995 Health Communications http://images.amazon.com/images/P/1558743669.0... http://images.amazon.com/images/P/1558743669.0... http://images.amazon.com/images/P/1558743669.0...
135 ANGELA'S ASHES 279 4.075269 0684874350 Frank McCourt 1996 Scribner http://images.amazon.com/images/P/0684874350.0... http://images.amazon.com/images/P/0684874350.0... http://images.amazon.com/images/P/0684874350.0...
137 Good in Bed 490 4.055102 0743418174 Jennifer Weiner 2002 Washington Square Press http://images.amazon.com/images/P/0743418174.0... http://images.amazon.com/images/P/0743418174.0... http://images.amazon.com/images/P/0743418174.0...
139 Fast Food Nation: The Dark Side of the All-Ame... 321 4.037383 0060938455 Eric Schlosser 2002 Perennial http://images.amazon.com/images/P/0060938455.0... http://images.amazon.com/images/P/0060938455.0... http://images.amazon.com/images/P/0060938455.0...
140 American Gods 302 4.006623 0380789035 Neil Gaiman 2002 HarperTorch http://images.amazon.com/images/P/0380789035.0... http://images.amazon.com/images/P/0380789035.0... http://images.amazon.com/images/P/0380789035.0...
141 Skipping Christmas 322 4.006211 0385505833 JOHN GRISHAM 2001 Doubleday http://images.amazon.com/images/P/0385505833.0... http://images.amazon.com/images/P/0385505833.0... http://images.amazon.com/images/P/0385505833.0...
147 Left Behind: A Novel of the Earth's Last Days ... 318 4.003145 0842329129 Tim Lahaye 1996 Tyndale House Publishers http://images.amazon.com/images/P/0842329129.0... http://images.amazon.com/images/P/0842329129.0... http://images.amazon.com/images/P/0842329129.0...
149 The Golden Compass (His Dark Materials, Book 1) 336 4.000000 037582345X PHILIP PULLMAN 2002 Knopf Books for Young Readers http://images.amazon.com/images/P/037582345X.0... http://images.amazon.com/images/P/037582345X.0... http://images.amazon.com/images/P/037582345X.0...
157 The Color Purple 314 3.964968 0671617028 Alice Walker 1985 Pocket Books http://images.amazon.com/images/P/0671617028.0... http://images.amazon.com/images/P/0671617028.0... http://images.amazon.com/images/P/0671617028.0...
163 Silence of the Lambs 256 3.960938 0312924585 Thomas Harris 1991 St. Martin's Press http://images.amazon.com/images/P/0312924585.0... http://images.amazon.com/images/P/0312924585.0... http://images.amazon.com/images/P/0312924585.0...
166 About a Boy 262 3.900763 1573227331 Nick Hornby 1999 Riverhead Books http://images.amazon.com/images/P/1573227331.0... http://images.amazon.com/images/P/1573227331.0... http://images.amazon.com/images/P/1573227331.0...
173 Seven Up (A Stephanie Plum Novel) 278 3.888489 0312265840 Janet Evanovich 2001 St. Martin's Press http://images.amazon.com/images/P/0312265840.0... http://images.amazon.com/images/P/0312265840.0... http://images.amazon.com/images/P/0312265840.0...
175 The Alchemist: A Fable About Following Your Dream 266 3.875940 0062502174 Paulo Coelho 1993 HarperSanFrancisco http://images.amazon.com/images/P/0062502174.0... http://images.amazon.com/images/P/0062502174.0... http://images.amazon.com/images/P/0062502174.0...
177 Hard Eight : A Stephanie Plum Novel (A Stephan... 269 3.825279 0312983867 Janet Evanovich 2003 St. Martin's Paperbacks http://images.amazon.com/images/P/0312983867.0... http://images.amazon.com/images/P/0312983867.0... http://images.amazon.com/images/P/0312983867.0...
182 Suzanne's Diary for Nicholas 457 3.820569 0316969443 James Patterson 2001 Little, Brown http://images.amazon.com/images/P/0316969443.0... http://images.amazon.com/images/P/0316969443.0... http://images.amazon.com/images/P/0316969443.0...
186 Lord of the Flies 259 3.818533 0399501487 William Gerald Golding 1959 Perigee Trade http://images.amazon.com/images/P/0399501487.0... http://images.amazon.com/images/P/0399501487.0... http://images.amazon.com/images/P/0399501487.0...
192 A Prayer for Owen Meany 413 3.796610 0345361792 John Irving 1990 Ballantine Books http://images.amazon.com/images/P/0345361792.0... http://images.amazon.com/images/P/0345361792.0... http://images.amazon.com/images/P/0345361792.0...
194 The Vampire Lestat (Vampire Chronicles, Book II) 301 3.777409 0345313860 ANNE RICE 1986 Ballantine Books http://images.amazon.com/images/P/0345313860.0... http://images.amazon.com/images/P/0345313860.0... http://images.amazon.com/images/P/0345313860.0...
195 White Oleander : A Novel (Oprah's Book Club) 356 3.772472 0316284955 Janet Fitch 2000 Back Bay Books http://images.amazon.com/images/P/0316284955.0... http://images.amazon.com/images/P/0316284955.0... http://images.amazon.com/images/P/0316284955.0...
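The popularity steps above can be gathered into one function, which is convenient when the same logic has to run inside an API. This is only a sketch: the function name `top_popular_books` and its parameters are hypothetical, and it is exercised on made-up data with a threshold of 2 instead of 250.

```python
import pandas as pd

def top_popular_books(ratings_with_name, books, min_ratings=250, top_n=50):
    """Hypothetical helper: rank titles by mean rating among well-rated titles."""
    stats = (
        ratings_with_name
        .groupby("Book-Title")["Book-Rating"]
        .agg(num_ratings="count", avg_ratings="mean")
        .reset_index()
    )
    popular = (
        stats[stats["num_ratings"] >= min_ratings]
        .sort_values("avg_ratings", ascending=False)
        .head(top_n)
    )
    # Attach book details, keeping one row per title.
    return popular.merge(books, on="Book-Title").drop_duplicates("Book-Title")

# Made-up data: "Book A" exists in two editions, "Book C" has too few ratings.
sample_books = pd.DataFrame({
    "ISBN": ["1", "2", "3", "4"],
    "Book-Title": ["Book A", "Book A", "Book B", "Book C"],
})
sample_ratings = pd.DataFrame({
    "Book-Title": ["Book A", "Book A", "Book B", "Book B", "Book C"],
    "Book-Rating": [10, 8, 6, 4, 10],
})

result = top_popular_books(sample_ratings, sample_books, min_ratings=2, top_n=10)
print(result[["Book-Title", "num_ratings", "avg_ratings"]])
```

The minimum-ratings threshold is what keeps a title with a single perfect score, like "Book C" here, from outranking widely rated books.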

Collaborative-filtering-based recommendation system

In [ ]:
ratings_with_name
Out[ ]:
User-ID ISBN Book-Rating Book-Title Book-Author Year-Of-Publication Publisher Image-URL-S Image-URL-M Image-URL-L
0 276725 034545104X 0 Flesh Tones: A Novel M. J. Rose 2002 Ballantine Books http://images.amazon.com/images/P/034545104X.0... http://images.amazon.com/images/P/034545104X.0... http://images.amazon.com/images/P/034545104X.0...
1 276726 0155061224 5 Rites of Passage Judith Rae 2001 Heinle http://images.amazon.com/images/P/0155061224.0... http://images.amazon.com/images/P/0155061224.0... http://images.amazon.com/images/P/0155061224.0...
2 276727 0446520802 0 The Notebook Nicholas Sparks 1996 Warner Books http://images.amazon.com/images/P/0446520802.0... http://images.amazon.com/images/P/0446520802.0... http://images.amazon.com/images/P/0446520802.0...
3 276729 052165615X 3 Help!: Level 1 Philip Prowse 1999 Cambridge University Press http://images.amazon.com/images/P/052165615X.0... http://images.amazon.com/images/P/052165615X.0... http://images.amazon.com/images/P/052165615X.0...
4 276729 0521795028 6 The Amsterdam Connection : Level 4 (Cambridge ... Sue Leather 2001 Cambridge University Press http://images.amazon.com/images/P/0521795028.0... http://images.amazon.com/images/P/0521795028.0... http://images.amazon.com/images/P/0521795028.0...
... ... ... ... ... ... ... ... ... ... ...
1031131 276704 0876044011 0 Edgar Cayce on the Akashic Records: The Book o... Kevin J. Todeschi 1998 A.R.E. Press (Association of Research & Enlig http://images.amazon.com/images/P/0876044011.0... http://images.amazon.com/images/P/0876044011.0... http://images.amazon.com/images/P/0876044011.0...
1031132 276704 1563526298 9 Get Clark Smart : The Ultimate Guide for the S... Clark Howard 2000 Longstreet Press http://images.amazon.com/images/P/1563526298.0... http://images.amazon.com/images/P/1563526298.0... http://images.amazon.com/images/P/1563526298.0...
1031133 276706 0679447156 0 Eight Weeks to Optimum Health: A Proven Progra... Andrew Weil 1997 Alfred A. Knopf http://images.amazon.com/images/P/0679447156.0... http://images.amazon.com/images/P/0679447156.0... http://images.amazon.com/images/P/0679447156.0...
1031134 276709 0515107662 10 The Sherbrooke Bride (Bride Trilogy (Paperback)) Catherine Coulter 1996 Jove Books http://images.amazon.com/images/P/0515107662.0... http://images.amazon.com/images/P/0515107662.0... http://images.amazon.com/images/P/0515107662.0...
1031135 276721 0590442449 10 Fourth Grade Rats Jerry Spinelli 1996 Scholastic http://images.amazon.com/images/P/0590442449.0... http://images.amazon.com/images/P/0590442449.0... http://images.amazon.com/images/P/0590442449.0...

1031136 rows × 10 columns

In [ ]:
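# "padhe likhe" is Hindi for "educated"; here it means well-read users with more than 200 ratings
x=ratings_with_name.groupby('User-ID').count()['Book-Rating']>200
padhe_likhe_users=x[x].index # keep only the User-IDs where the condition is True
padhe_likhe_users.shape # only 811 users have rated more than 200 books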
Out[ ]:
(811,)
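The `x[x].index` idiom may look odd at first: `x` is a boolean Series indexed by `User-ID`, so indexing it with itself keeps only the `True` entries, and `.index` then returns the matching user IDs. A minimal sketch on made-up data, with the threshold lowered to 2:

```python
import pandas as pd

# Made-up ratings: user 1 rated three books, user 2 rated one, user 3 rated two.
sample = pd.DataFrame({
    "User-ID": [1, 1, 1, 2, 3, 3],
    "Book-Rating": [5, 0, 7, 9, 3, 8],
})

# Boolean Series indexed by User-ID: True where the user has more than 2 ratings.
x = sample.groupby("User-ID")["Book-Rating"].count() > 2

# Indexing the mask with itself drops the False entries; .index holds the user IDs.
active_users = x[x].index
print(list(active_users))
```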
In [ ]:
filtered_rating = ratings_with_name[ratings_with_name['User-ID'].isin(padhe_likhe_users)]
filtered_rating
Out[ ]:
User-ID ISBN Book-Rating Book-Title Book-Author Year-Of-Publication Publisher Image-URL-S Image-URL-M Image-URL-L
1150 277427 002542730X 10 Politically Correct Bedtime Stories: Modern Ta... James Finn Garner 1994 John Wiley & Sons Inc http://images.amazon.com/images/P/002542730X.0... http://images.amazon.com/images/P/002542730X.0... http://images.amazon.com/images/P/002542730X.0...
1151 277427 0026217457 0 Vegetarian Times Complete Cookbook Lucy Moll 1995 John Wiley & Sons http://images.amazon.com/images/P/0026217457.0... http://images.amazon.com/images/P/0026217457.0... http://images.amazon.com/images/P/0026217457.0...
1152 277427 003008685X 8 Pioneers James Fenimore Cooper 1974 Thomson Learning http://images.amazon.com/images/P/003008685X.0... http://images.amazon.com/images/P/003008685X.0... http://images.amazon.com/images/P/003008685X.0...
1153 277427 0030615321 0 Ask for May, Settle for June (A Doonesbury book) G. B. Trudeau 1982 Henry Holt & Co http://images.amazon.com/images/P/0030615321.0... http://images.amazon.com/images/P/0030615321.0... http://images.amazon.com/images/P/0030615321.0...
1154 277427 0060002050 0 On a Wicked Dawn (Cynster Novels) Stephanie Laurens 2002 Avon Books http://images.amazon.com/images/P/0060002050.0... http://images.amazon.com/images/P/0060002050.0... http://images.amazon.com/images/P/0060002050.0...
... ... ... ... ... ... ... ... ... ... ...
1029357 275970 1931868123 0 There's a Porcupine in My Outhouse: Misadventu... Mike Tougias 2002 Capital Books (VA) http://images.amazon.com/images/P/1931868123.0... http://images.amazon.com/images/P/1931868123.0... http://images.amazon.com/images/P/1931868123.0...
1029358 275970 3411086211 10 Die Biene. Sybil Gr�¤fin Sch�¶nfeldt 1993 Bibliographisches Institut, Mannheim http://images.amazon.com/images/P/3411086211.0... http://images.amazon.com/images/P/3411086211.0... http://images.amazon.com/images/P/3411086211.0...
1029359 275970 3829021860 0 The Penis Book Joseph Cohen 1999 Konemann http://images.amazon.com/images/P/3829021860.0... http://images.amazon.com/images/P/3829021860.0... http://images.amazon.com/images/P/3829021860.0...
1029360 275970 4770019572 0 Musashi Eiji Yoshikawa 1995 Kodansha International (JPN) http://images.amazon.com/images/P/4770019572.0... http://images.amazon.com/images/P/4770019572.0... http://images.amazon.com/images/P/4770019572.0...
1029361 275970 9626340762 8 Northanger Abbey (Classic Literature with Clas... Jane Austen 1996 Naxos Audiobooks Ltd. http://images.amazon.com/images/P/9626340762.0... http://images.amazon.com/images/P/9626340762.0... http://images.amazon.com/images/P/9626340762.0...

474007 rows × 10 columns

So only 474,007 ratings come from the well-read ("padhe likhe") users. The remaining 5.57 lakh or so come from users who rated 200 books or fewer.
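The split can be checked directly from the row counts shown above:

```python
# Row counts copied from the outputs above.
total_ratings = 1031136     # rows in ratings_with_name
kept_ratings = 474007       # rows in filtered_rating

remaining = total_ratings - kept_ratings
share_kept = kept_ratings / total_ratings

print(f"remaining ratings: {remaining:,}")
print(f"share kept: {share_kept:.1%}")
```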

In [ ]:
# Keep only the titles that received at least 50 ratings from the well-read users
y=filtered_rating.groupby('Book-Title').count()['Book-Rating']>=50
famous_books=y[y].index
famous_books
Out[ ]:
Index(['1984', '1st to Die: A Novel', '2nd Chance', '4 Blondes',
       'A Bend in the Road', 'A Case of Need',
       'A Child Called \It\": One Child's Courage to Survive"',
       'A Civil Action', 'A Day Late and a Dollar Short', 'A Fine Balance',
       ...
       'Winter Solstice', 'Wish You Well', 'Without Remorse',
       'Wizard and Glass (The Dark Tower, Book 4)', 'Wuthering Heights',
       'Year of Wonders', 'You Belong To Me',
       'Zen and the Art of Motorcycle Maintenance: An Inquiry into Values',
       'Zoya', '\O\" Is for Outlaw"'],
      dtype='object', name='Book-Title', length=706)
In [ ]:
final_ratings=filtered_rating[filtered_rating['Book-Title'].isin(famous_books)]
final_ratings
Out[ ]:
User-ID ISBN Book-Rating Book-Title Book-Author Year-Of-Publication Publisher Image-URL-S Image-URL-M Image-URL-L
1150 277427 002542730X 10 Politically Correct Bedtime Stories: Modern Ta... James Finn Garner 1994 John Wiley & Sons Inc http://images.amazon.com/images/P/002542730X.0... http://images.amazon.com/images/P/002542730X.0... http://images.amazon.com/images/P/002542730X.0...
1163 277427 0060930535 0 The Poisonwood Bible: A Novel Barbara Kingsolver 1999 Perennial http://images.amazon.com/images/P/0060930535.0... http://images.amazon.com/images/P/0060930535.0... http://images.amazon.com/images/P/0060930535.0...
1165 277427 0060934417 0 Bel Canto: A Novel Ann Patchett 2002 Perennial http://images.amazon.com/images/P/0060934417.0... http://images.amazon.com/images/P/0060934417.0... http://images.amazon.com/images/P/0060934417.0...
1168 277427 0061009059 9 One for the Money (Stephanie Plum Novels (Pape... Janet Evanovich 1995 HarperTorch http://images.amazon.com/images/P/0061009059.0... http://images.amazon.com/images/P/0061009059.0... http://images.amazon.com/images/P/0061009059.0...
1174 277427 006440188X 0 The Secret Garden Frances Hodgson Burnett 1998 HarperTrophy http://images.amazon.com/images/P/006440188X.0... http://images.amazon.com/images/P/006440188X.0... http://images.amazon.com/images/P/006440188X.0...
... ... ... ... ... ... ... ... ... ... ...
1029196 275970 1400031354 0 Tears of the Giraffe (No.1 Ladies Detective Ag... Alexander McCall Smith 2002 Anchor http://images.amazon.com/images/P/1400031354.0... http://images.amazon.com/images/P/1400031354.0... http://images.amazon.com/images/P/1400031354.0...
1029197 275970 1400031362 0 Morality for Beautiful Girls (No.1 Ladies Dete... Alexander McCall Smith 2002 Anchor http://images.amazon.com/images/P/1400031362.0... http://images.amazon.com/images/P/1400031362.0... http://images.amazon.com/images/P/1400031362.0...
1029270 275970 1573229725 0 Fingersmith Sarah Waters 2002 Riverhead Books http://images.amazon.com/images/P/1573229725.0... http://images.amazon.com/images/P/1573229725.0... http://images.amazon.com/images/P/1573229725.0...
1029309 275970 1586210661 9 Me Talk Pretty One Day David Sedaris 2001 Time Warner Audio Major http://images.amazon.com/images/P/1586210661.0... http://images.amazon.com/images/P/1586210661.0... http://images.amazon.com/images/P/1586210661.0...
1029310 275970 1586212230 0 Naked David Sedaris 2001 Time Warner Audio Major http://images.amazon.com/images/P/1586212230.0... http://images.amazon.com/images/P/1586212230.0... http://images.amazon.com/images/P/1586212230.0...

58586 rows × 10 columns

In [ ]:
pt=final_ratings.pivot_table(index='Book-Title',columns='User-ID',values='Book-Rating').fillna(0)
pt
Out[ ]:
User-ID 254 2276 2766 2977 3363 4017 4385 6251 6323 6543 ... 271705 273979 274004 274061 274301 274308 275970 277427 277639 278418
Book-Title
1984 9.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 10.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1st to Die: A Novel 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 9.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2nd Chance 0.0 10.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
4 Blondes 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
A Bend in the Road 0.0 0.0 7.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
Year of Wonders 0.0 0.0 0.0 7.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 9.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
You Belong To Me 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Zen and the Art of Motorcycle Maintenance: An Inquiry into Values 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Zoya 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
\O\" Is for Outlaw" 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 8.0 0.0 0.0 0.0 0.0 0.0

706 rows × 810 columns

So the pivot table covers 706 books (rows) and 810 readers (columns).
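
To make the shape of this table concrete, here is a minimal sketch on a made-up ratings frame (the data and the names `toy` and `toy_pt` are hypothetical, not taken from the dataset). It shows how `pivot_table` turns long-format ratings into a book-by-user matrix and how `fillna(0)` fills the pairs that have no rating.

```python
import pandas as pd

# Hypothetical toy ratings, only to illustrate the reshaping step
toy = pd.DataFrame({
    'User-ID': [1, 1, 2],
    'Book-Title': ['A', 'B', 'A'],
    'Book-Rating': [8, 5, 10],
})

# Rows are books, columns are users; missing (book, user) pairs become 0
toy_pt = toy.pivot_table(index='Book-Title', columns='User-ID',
                         values='Book-Rating').fillna(0)
print(toy_pt)
print(toy_pt.shape)
```

Note that `pivot_table` aggregates with the mean by default, so if a user rated the same title more than once (for example under different ISBNs), the cell holds the average of those ratings.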

In [ ]:
!pip install scikit-learn
from sklearn.metrics.pairwise import cosine_similarity
Requirement already satisfied: scikit-learn in c:\python312\lib\site-packages (1.5.2)
Requirement already satisfied: numpy>=1.19.5 in c:\python312\lib\site-packages (from scikit-learn) (1.26.4)
Requirement already satisfied: scipy>=1.6.0 in c:\python312\lib\site-packages (from scikit-learn) (1.11.4)
Requirement already satisfied: joblib>=1.2.0 in c:\python312\lib\site-packages (from scikit-learn) (1.4.2)
Requirement already satisfied: threadpoolctl>=3.1.0 in c:\python312\lib\site-packages (from scikit-learn) (3.5.0)
In [ ]:
similarity_scores=cosine_similarity(pt)
similarity_scores
Out[ ]:
array([[1.        , 0.10255025, 0.01220856, ..., 0.12110367, 0.07347567,
        0.04316046],
       [0.10255025, 1.        , 0.2364573 , ..., 0.07446129, 0.16773875,
        0.14263397],
       [0.01220856, 0.2364573 , 1.        , ..., 0.04558758, 0.04938579,
        0.10796119],
       ...,
       [0.12110367, 0.07446129, 0.04558758, ..., 1.        , 0.07085128,
        0.0196177 ],
       [0.07347567, 0.16773875, 0.04938579, ..., 0.07085128, 1.        ,
        0.10602962],
       [0.04316046, 0.14263397, 0.10796119, ..., 0.0196177 , 0.10602962,
        1.        ]])

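For every book we compute the cosine similarity to every book in the table. This is a similarity, not a distance: 1 means the two rating vectors point in the same direction, and 0 means no user gave both books a non-zero rating. The result is a 706x706 matrix, so each book's row holds its similarity score against all 706 books, including itself.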

In [ ]:
similarity_scores.shape
Out[ ]:
(706, 706)
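
The similarity matrix above can be reproduced by hand on a small example. The sketch below uses a hypothetical 3-book by 4-user matrix (`toy` is made up for illustration) and plain NumPy: each row is scaled to unit length, and the pairwise dot products of the scaled rows are the cosine similarities. For rows that are not all zeros this should match what `cosine_similarity` returns.

```python
import numpy as np

# Hypothetical rating matrix: rows are books, columns are users
toy = np.array([
    [9.0, 0.0, 0.0, 7.0],
    [8.0, 0.0, 0.0, 6.0],
    [0.0, 5.0, 0.0, 0.0],
])

# cos(u, v) = (u . v) / (||u|| * ||v||)
# Scale every row to unit length, then take all pairwise dot products
norms = np.linalg.norm(toy, axis=1, keepdims=True)
unit = toy / norms
sim = unit @ unit.T

print(sim.shape)        # one row and one column per book
print(np.round(sim, 4))
```

Books 0 and 1 were rated by the same users in nearly the same proportion, so their score is close to 1. Book 2 shares no raters with the other two, so its score against them is 0.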
In [ ]:
# def recommend(book_name):
#     index=np.where(pt.index==book_name)[0][0]
#     similar_items=list(enumerate(similarity_scores[index]))
#     similar_items=sorted(similar_items,key=lambda x:x[1],reverse=True)[1:6]
    
#     data=[]
#     for item in similar_items:
#         data.append((pt.index[item[0]],item[1]))
        
#     return pd.DataFrame(data,columns=['Book-Title','similarity_score'])
In [ ]:
def recommend(book_name):
    # Row position of the requested title in the pivot table
    # (raises IndexError if the title is not one of the 706 books)
    index = np.where(pt.index==book_name)[0][0]
    # Sort by similarity, highest first; skip the first entry
    # (normally the book itself, score 1.0) and keep the next 4
    similar_items = sorted(list(enumerate(similarity_scores[index])),key=lambda x:x[1],reverse=True)[1:5]
    
    data = []
    for i in similar_items:
        item = []
        # A title can appear under several ISBNs, so keep one row per title
        temp_df = books[books['Book-Title'] == pt.index[i[0]]].drop_duplicates('Book-Title')
        item.extend(list(temp_df['Book-Title'].values))
        item.extend(list(temp_df['Book-Author'].values))
        item.extend(list(temp_df['Image-URL-M'].values))
        
        data.append(item)
    
    return data
In [ ]:
recommend('Message in a Bottle')
Out[ ]:
[['Nights in Rodanthe',
  'Nicholas Sparks',
  'http://images.amazon.com/images/P/0446531332.01.MZZZZZZZ.jpg'],
 ['The Mulberry Tree',
  'Jude Deveraux',
  'http://images.amazon.com/images/P/0743437640.01.MZZZZZZZ.jpg'],
 ['A Walk to Remember',
  'Nicholas Sparks',
  'http://images.amazon.com/images/P/0446608955.01.MZZZZZZZ.jpg'],
 ["River's End",
  'Nora Roberts',
  'http://images.amazon.com/images/P/0515127833.01.MZZZZZZZ.jpg']]
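
`recommend` fails with an `IndexError` when the title is not one of the rows of the pivot table. Below is a minimal sketch of a guarded variant on hypothetical toy data; `recommend_titles`, `toy_pt` and `toy_sim` are illustrative names, not part of the notebook. It swaps `sorted(enumerate(...))` for `np.argsort` and removes the queried book by its index rather than by its position in the sorted list.

```python
import numpy as np
import pandas as pd

# Hypothetical 4-book x 3-user pivot table
toy_pt = pd.DataFrame(
    [[9.0, 0.0, 7.0], [8.0, 0.0, 6.0], [0.0, 5.0, 0.0], [7.0, 1.0, 0.0]],
    index=pd.Index(['A', 'B', 'C', 'D'], name='Book-Title'),
)

# Cosine similarity between rows, computed with plain NumPy
unit = toy_pt.values / np.linalg.norm(toy_pt.values, axis=1, keepdims=True)
toy_sim = unit @ unit.T

def recommend_titles(book_name, pt, sim, k=4):
    """Return up to k (title, score) pairs most similar to book_name."""
    if book_name not in pt.index:
        return []  # unknown title: empty result instead of IndexError
    index = pt.index.get_loc(book_name)
    order = np.argsort(sim[index])[::-1]          # highest score first
    order = [i for i in order if i != index][:k]  # drop the book itself
    return [(pt.index[i], float(sim[index][i])) for i in order]

result = recommend_titles('A', toy_pt, toy_sim, k=2)
missing = recommend_titles('Not a real title', toy_pt, toy_sim)
print(result)
print(missing)
```

Returning an empty list for an unknown title is one possible choice; an API endpoint could just as well respond with an error message.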
In [ ]:
import pickle
with open('popular.pkl', 'wb') as f:
    pickle.dump(popular_df, f)
In [ ]:
with open('data/pt.pkl', 'wb') as f:
    pickle.dump(pt, f)
with open('data/similarity_scores.pkl', 'wb') as f:
    pickle.dump(similarity_scores, f)
with open('data/books.pkl', 'wb') as f:
    pickle.dump(books, f)
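
On the serving side (for example the Flask API mentioned at the top), the pickled objects can be read back with `pickle.load`. The sketch below is a round trip under stated assumptions: it writes to a temporary directory and uses a small stand-in list rather than the real files, and `save_pickle` and `load_pickle` are hypothetical helper names.

```python
import os
import pickle
import tempfile

def save_pickle(obj, path):
    # The with-block closes the file even if dump() raises
    with open(path, 'wb') as f:
        pickle.dump(obj, f)

def load_pickle(path):
    # Only unpickle files you created yourself: loading can execute code
    with open(path, 'rb') as f:
        return pickle.load(f)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'similarity_scores.pkl')
    stand_in = [[1.0, 0.1], [0.1, 1.0]]  # stand-in for the real matrix
    save_pickle(stand_in, path)
    restored = load_pickle(path)

print(restored == stand_in)
```

`open(..., 'wb')` does not create missing folders, so the target directory has to exist before the dump cells run.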