=====
Usage
=====

To use LightFM Dataset Helper in a project, first import the module:

.. code:: python

   from lightfm_dataset_helper.lightfm_dataset_helper import DatasetHelper

Load the CSV files:

.. code:: python

   # using pandas to load csv files
   import pandas as pd

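   def read_csv(filename):
       # on_bad_lines="skip" (pandas >= 1.3) replaces error_bad_lines=False,
       # which was removed in pandas 2.0
       return pd.read_csv(filename, sep=";", on_bad_lines="skip", encoding="latin-1", low_memory=False)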

   books = read_csv("Data/BX-Books.csv")
   users = read_csv("Data/BX-Users.csv")
   ratings = read_csv("Data/BX-Book-Ratings.csv")

Define the column names:

.. code:: python

   items_column = "ISBN"
   user_column = "User-ID"
   ratings_column = "Book-Rating"

   items_feature_columns = [
       "Book-Title",
       "Book-Author",
       "Year-Of-Publication",
       "Publisher",
   ]

   user_features_columns = ["Location", "Age"]
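
Before passing these names to the helper, it can be useful to confirm that each one exists in the corresponding dataframe. The following is a minimal sketch; ``missing_columns`` is a hypothetical function, not part of the library, and the toy dataframe stands in for ``books``.

.. code:: python

   import pandas as pd

   def missing_columns(dataframe, expected_columns):
       """Return the expected column names that are absent from the dataframe."""
       return [column for column in expected_columns if column not in dataframe.columns]

   # toy stand-in for the books dataframe (hypothetical data)
   toy_books = pd.DataFrame({"ISBN": ["111"], "Book-Title": ["A"], "Publisher": ["P"]})
   missing = missing_columns(toy_books, ["ISBN", "Book-Title", "Book-Author"])
   print(missing)

An empty list means every expected column is present.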

Optional: to test on a small amount of data (500 ratings), subset the dataframes first:

.. code:: python

   # keep only the first 500 ratings to shorten the run, then keep only the
   # books and users that appear in those ratings
   test_amount = 500
   ratings = ratings[:test_amount]
   books = books[books[items_column].isin(ratings[items_column])]
   users = users[users[user_column].isin(ratings[user_column])]

Feed the dataframes to the helper and run the routine:

.. code:: python

   dataset_helper_instance = DatasetHelper(
       users_dataframe=users,
       items_dataframe=books,
       interactions_dataframe=ratings,
       item_id_column=items_column,
       items_feature_columns=items_feature_columns,
       user_id_column=user_column,
       user_features_columns=user_features_columns,
       interaction_column=ratings_column,
       clean_unknown_interactions=True,
   )

   # run the routine
   dataset_helper_instance.routine()
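
The ``clean_unknown_interactions=True`` flag suggests that interactions referring to users or items missing from the corresponding dataframes are dropped before the dataset is built. The helper's internals are not shown here, so the following is only a sketch of that idea in plain pandas, under that assumption; ``drop_unknown_interactions`` and the toy data are hypothetical, not the library's API.

.. code:: python

   import pandas as pd

   def drop_unknown_interactions(interactions, users, items, user_column, item_column):
       """Keep only interactions whose user and item both appear in the lookup frames."""
       known_user = interactions[user_column].isin(users[user_column])
       known_item = interactions[item_column].isin(items[item_column])
       return interactions[known_user & known_item]

   # toy data (hypothetical): item "z" and user 3 are unknown
   toy_users = pd.DataFrame({"User-ID": [1, 2]})
   toy_books = pd.DataFrame({"ISBN": ["a", "b"]})
   toy_ratings = pd.DataFrame(
       {"User-ID": [1, 2, 3], "ISBN": ["a", "z", "b"], "Book-Rating": [5, 7, 0]}
   )

   cleaned = drop_unknown_interactions(toy_ratings, toy_users, toy_books, "User-ID", "ISBN")
   print(cleaned)

Only the first toy rating survives, because its user and its item are both known.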

Feed the dataset to the LightFM model:

.. code:: python

   from lightfm import LightFM

   model = LightFM(no_components=24, loss="warp", k=15)
   model.fit(
       interactions=dataset_helper_instance.interactions,
       sample_weight=dataset_helper_instance.weights,
       item_features=dataset_helper_instance.item_features_list,
       user_features=dataset_helper_instance.user_features_list,
       verbose=True,
       epochs=10,
       num_threads=20,
   )
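
Once the model is fitted, ``LightFM.predict`` returns one score per (user, item) pair, and higher scores rank higher. How the helper maps internal indices back to ISBNs is not shown above, so the sketch below covers only the ranking step on a plain NumPy array of scores; ``top_k_items`` and the toy scores and labels are hypothetical.

.. code:: python

   import numpy as np

   def top_k_items(scores, item_labels, k=3):
       """Return the k labels with the highest scores, best first."""
       order = np.argsort(-np.asarray(scores))[:k]
       return [item_labels[index] for index in order]

   # In practice the scores would come from a call such as
   # model.predict(user_index, np.arange(n_items), item_features=..., user_features=...)
   toy_scores = [0.1, 2.5, -0.3, 1.2]
   toy_labels = ["isbn-a", "isbn-b", "isbn-c", "isbn-d"]

   best = top_k_items(toy_scores, toy_labels, k=2)
   print(best)

Sorting the negated scores keeps the ranking step to a single ``argsort`` call.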

Used Dataset
------------

The examples above use the Book-Crossing dataset, available `here`_.

The Book-Crossing dataset comprises three tables::

   BX-Users
   Contains the users. Note that user IDs (`User-ID`) have been anonymized and map to integers. Demographic data is provided (`Location`, `Age`) if available. Otherwise, these fields contain NULL-values.

   BX-Books
   Books are identified by their respective ISBN. Invalid ISBNs have already been removed from the dataset. Moreover, some content-based information is given (`Book-Title`, `Book-Author`, `Year-Of-Publication`, `Publisher`), obtained from Amazon Web Services. Note that in case of several authors, only the first is provided. URLs linking to cover images are also given, appearing in three different flavours (`Image-URL-S`, `Image-URL-M`, `Image-URL-L`), i.e., small, medium, large. These URLs point to the Amazon web site.

   BX-Book-Ratings
   Contains the book rating information. Ratings (`Book-Rating`) are either

.. _here: http://www2.informatik.uni-freiburg.de/~cziegler/BX/