How do I improve string similarity in my current code?

SUGSKY · May-27-2020, 08:10 AM

Here’s my current code:

new_list = []

# Compare every Amazon title against every Google title
for i in range(len(title1)):
    for j in range(len(title2)):
        # Similarity score from 0 to 100 (higher means more similar)
        title_distance = fuzz.token_sort_ratio(title1[i], title2[j])
        if title_distance > threshold:
            # Record the matching pair of product IDs
            new_list.append([amazon_s['idAmazon'][i], google_s['idGoogleBase'][j]])

df = pd.DataFrame(new_list)
df.to_csv('task.csv')

title1 is a list of product titles from the Amazon website.
title2 is a list of product titles from the Google website.

id1 is the ID of the corresponding Amazon product.
id2 is the ID of the corresponding Google product.

Two titles should be matched in either of these cases:
Case 1: The titles are the same
Case 2: The titles are similar

After matching them,
I would like to use pandas to write the results to an Excel file
containing id1 and id2 for every pair that satisfies Case 1 or Case 2.

Any thoughts on this problem?

Calli · May-27-2020, 08:50 AM

You will need to import the pandas module first, like this: import pandas as pd

SUGSKY · May-27-2020, 09:16 AM

(May-27-2020, 08:50 AM)Calli Wrote: You will need to import the pandas module first like import pandas as pd

It's not the code that I need to improve, but the algorithm for solving this problem.
Right now the results are:

Recall: 0.72307 out of 1
Precision: 0.77049 out of 1

recall = tp/(tp+fn)
precision = tp/(tp+fp)

True Positive Count: 94
False Positive Count: 28
False Negative Count: 36
True Negative Count: 22042
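
For reference, the recall and precision figures can be reproduced from the four counts above. This is only a minimal sketch: the function name `precision_recall_f1` is made up for illustration, and the F1 score is not something reported in this thread. It is included on the assumption that a single number is handy when comparing different thresholds.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, f1) from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall (not reported in the thread)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts taken from the post above
precision, recall, f1 = precision_recall_f1(tp=94, fp=28, fn=36)
print(f"precision={precision:.5f} recall={recall:.5f} f1={f1:.5f}")
```

Raising the threshold will usually trade recall for precision and lowering it will do the opposite, so re-running this for a few threshold values is one way to see where the balance sits.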

deanhystad · May-28-2020, 05:16 AM

Nice description of the fuzz functions here:

https://stackoverflow.com/questions/3180...-2-strings
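
To make the token-sort idea concrete, here is a rough sketch using only the standard library. It is a stand-in for `fuzz.token_sort_ratio`, not the library's actual implementation, and the names `normalize`, `token_sort_similarity` and `best_matches` are invented for this example. It also tries one possible tweak to the matching loop: keeping only the best-scoring Google title for each Amazon title. That assumes each product has at most one true match, which may not hold for your data.

```python
import re
from difflib import SequenceMatcher


def normalize(title):
    """Lowercase, replace punctuation with spaces, and sort the tokens."""
    tokens = re.sub(r"[^a-z0-9]+", " ", title.lower()).split()
    return " ".join(sorted(tokens))


def token_sort_similarity(a, b):
    """Score from 0 to 100 computed on the sorted-token form of both titles."""
    ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return round(100 * ratio)


def best_matches(titles_a, titles_b, threshold):
    """For each title in titles_a, keep only its best match in titles_b.

    Returns (index_a, index_b, score) tuples whose score exceeds threshold.
    """
    if not titles_b:
        return []
    pairs = []
    for i, a in enumerate(titles_a):
        scores = [token_sort_similarity(a, b) for b in titles_b]
        j = max(range(len(scores)), key=scores.__getitem__)
        if scores[j] > threshold:
            pairs.append((i, j, scores[j]))
    return pairs


# Hypothetical titles, only to show the call shape
amazon_titles = ["Adobe Photoshop CS3", "Microsoft Office 2007"]
google_titles = ["photoshop cs3 adobe", "office 2007 (microsoft)", "Norton Antivirus"]
print(best_matches(amazon_titles, google_titles, threshold=80))
```

Keeping a single best match per title should remove some false positives where one title clears the threshold against several candidates, but it can also drop true matches if duplicates exist, so it is worth checking against the precision and recall numbers before adopting it.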
