Locality-sensitive hashing (LSH): code written as an assignment.
Requirements:
1. Python 2.7
Special notes:
1. Inputs that are 100% similar (exact duplicates) are removed during preprocessing and added back to the output at the end of the process. At most 106 pairs are found with the current configuration.
3. Edit the configuration fields in the script to change the configuration.
4. The configuration values are set to achieve maximum accuracy.
5. The script uses 24 random permutations of the shingle list, instead of random hash functions, to simulate random permutations of the characteristic matrix.
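The preprocessing note and the permutation note above can be sketched as follows. This is a minimal illustration, not the assignment's code: it assumes each document is compared as a set of shingles, sets exact textual duplicates aside before hashing, and uses each of 24 shuffled copies of the shingle list as one random permutation of the characteristic matrix (MinHash). The function names split_exact_duplicates, build_permutations, minhash_signature and estimated_similarity are hypothetical. The code runs on Python 2.7 and Python 3.

```python
import random

NUM_PERMUTATIONS = 24  # number of permutations named in the note above


def split_exact_duplicates(docs):
    """docs maps a name to raw text. Returns (unique, pairs): unique keeps
    the first copy of each identical text (in sorted-name order), and pairs
    lists (first_name, copy_name) tuples to add back to the output at the end."""
    first_seen = {}
    unique = {}
    pairs = []
    for name in sorted(docs):
        text = docs[name]
        if text in first_seen:
            pairs.append((first_seen[text], name))
        else:
            first_seen[text] = name
            unique[name] = text
    return unique, pairs


def build_permutations(vocabulary, num_permutations, seed=0):
    """Return num_permutations independently shuffled copies of the shingle list."""
    rng = random.Random(seed)
    permutations = []
    for _ in range(num_permutations):
        perm = list(vocabulary)
        rng.shuffle(perm)
        permutations.append(perm)
    return permutations


def minhash_signature(doc_shingles, permutations):
    """For each permutation, record the position of the first shingle
    (in permuted order) that the document contains."""
    signature = []
    for perm in permutations:
        for position, shingle in enumerate(perm):
            if shingle in doc_shingles:
                signature.append(position)
                break
        else:
            signature.append(len(perm))  # document contains no known shingle
    return signature


def estimated_similarity(sig_a, sig_b):
    """Fraction of permutations on which the two signatures agree;
    this estimates the Jaccard similarity of the two shingle sets."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return float(matches) / len(sig_a)


# Usage: exact duplicates are split off first.
unique, pairs = split_exact_duplicates({"d1": "a b c", "d2": "a b c", "d3": "x y"})
print(pairs)  # [('d1', 'd2')]

# Usage: MinHash signatures for two toy shingle sets (true Jaccard = 0.5).
shingle_sets = {"d1": {"a b", "b c", "c d"}, "d2": {"a b", "b c", "c e"}}
vocabulary = sorted(set().union(*shingle_sets.values()))
perms = build_permutations(vocabulary, NUM_PERMUTATIONS)
signatures = dict((name, minhash_signature(s, perms)) for name, s in shingle_sets.items())
print(estimated_similarity(signatures["d1"], signatures["d2"]))
```

Shuffling the shingle list directly is exact but costs one full pass over the vocabulary per permutation; random hash functions are the usual cheaper approximation, which the note says this assignment deliberately avoids.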
Guidelines for changing the configuration:
1. If 'wordsForShingle' is changed, the shingle() function must be updated to match.
2. Make sure that noOfHashFunctions % sizeOfBand == 0, so that the signature divides evenly into bands of sizeOfBand rows.
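The two guidelines above can be illustrated with a minimal sketch of word shingling and the standard LSH banding step, assuming the signature of length noOfHashFunctions is cut into bands of sizeOfBand rows and that documents agreeing on every row of any band become candidate pairs. The constants WORDS_FOR_SHINGLE, NO_OF_HASH_FUNCTIONS and SIZE_OF_BAND are hypothetical stand-ins for the script's configuration fields, and their values are illustrative, not the script's defaults; shingle() and candidate_pairs() here are not the assignment's own functions.

```python
from collections import defaultdict

# Hypothetical stand-ins for the script's configuration fields;
# the values are illustrative only.
WORDS_FOR_SHINGLE = 2
NO_OF_HASH_FUNCTIONS = 24
SIZE_OF_BAND = 4

# The guideline above: the signature must split into whole bands.
assert NO_OF_HASH_FUNCTIONS % SIZE_OF_BAND == 0


def shingle(text, k=WORDS_FOR_SHINGLE):
    """Return the set of k-word shingles of text (empty if it has fewer than k words)."""
    words = text.split()
    return set(" ".join(words[i:i + k]) for i in range(len(words) - k + 1))


def candidate_pairs(signatures, size_of_band=SIZE_OF_BAND):
    """Banding: documents whose signatures agree on every row of at least
    one band are reported as candidate pairs (sorted name tuples)."""
    length = len(next(iter(signatures.values())))
    if length % size_of_band != 0:
        raise ValueError("signature length must be a multiple of the band size")
    candidates = set()
    for start in range(0, length, size_of_band):
        buckets = defaultdict(list)  # one fresh set of buckets per band
        for name, sig in signatures.items():
            buckets[tuple(sig[start:start + size_of_band])].append(name)
        for names in buckets.values():
            names.sort()
            for i in range(len(names)):
                for j in range(i + 1, len(names)):
                    candidates.add((names[i], names[j]))
    return candidates


# Usage with toy 8-row signatures split into two bands of 4 rows:
# x and y agree on the first band, so they become a candidate pair.
toy = {"x": [1, 2, 3, 4, 5, 6, 7, 8],
       "y": [1, 2, 3, 4, 9, 9, 9, 9],
       "z": [0, 0, 0, 0, 0, 0, 0, 0]}
print(shingle("the quick brown fox"))
print(candidate_pairs(toy, 4))
```

Because this sketch takes the shingle length as a parameter, changing it does not require editing the function body; the guideline above suggests the assignment's own shingle() hard-codes it instead.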