autofunc.find_similarities

Builds a similarity matrix with product IDs as the rows and headers and the similarity between each combination as the matrix value in that index. The diagonal is 1 because each product is 100% similar to itself.

Similarity here is defined as the percentage of components that two products have in common. The matrix is not symmetric because each product can have a different number of components.

Module Contents

Functions

find_similarities(input_dataframe)

Find the similarity between all products in a repository

autofunc.find_similarities.find_similarities(input_dataframe)

Find the similarity between all products in a repository

Parameters

input_dataframe (Pandas dataframe) – A Pandas dataframe with the product information

Returns

Returns a Pandas dataframe in an nxn matrix format with the similarity between each product

Return type

similarity_df