Thread: Naive Bayes too slow (Python Forum, Data Science)
Naive Bayes too slow - pythlang - Oct-21-2016

So, after fooling around with this algorithm, I've noticed that it's far too slow, since NLTK is a learning toolkit, especially when analyzing large data sets. I want to be able to retain the function of Naive Bayes without the insane amount of time it takes to process. Can I use scikit-learn as a wrapper of some sort instead? That seems like it would be better equipped to deal with the problem.

Here's my code. Feel free to make revisions in addition to helping me speed up the processing time:

    import nltk
    import random
    from nltk.corpus import movie_reviews

    # Pair each review's word list with its category label.
    documents = [(list(movie_reviews.words(fileid)), category)
                 for category in movie_reviews.categories()
                 for fileid in movie_reviews.fileids(category)]

    random.shuffle(documents)

    # Count every (lowercased) word in the corpus.
    all_words = []
    for w in movie_reviews.words():
        all_words.append(w.lower())

    all_words = nltk.FreqDist(all_words)

    word_features = list(all_words.keys())[:3000]

    def find_features(document):
        # One boolean per feature word: is it present in this document?
        words = set(document)
        features = {}
        for w in word_features:
            features[w] = (w in words)
        return features

    print((find_features(movie_reviews.words('neg/cv000_29416.txt'))))

    featuresets = [(find_features(rev), category) for (rev, category) in documents]

    training_set = featuresets[:1900]
    testing_set = featuresets[1900:]  # was featuresets[:1900:], which repeats the training slice

    classifier = nltk.NaiveBayesClassifier.train(training_set)
    print("Naive Bayes Algo accuracy percent:", (nltk.classify.accuracy(classifier, testing_set))*100)
    classifier.show_most_informative_features(15)
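A side note on splitting `featuresets` into training and testing parts: in Python, `seq[:n:]` selects the same items as `seq[:n]`, so a test set taken that way is identical to the training set and the reported accuracy is measured on data the classifier has already seen. A minimal sketch, assuming a stand-in list of 2000 integers in place of the real (features, label) pairs:

```python
# Stand-in for the list of (features, label) pairs; the integers are
# hypothetical placeholders used only to show which items each slice picks.
featuresets = list(range(2000))

training_set = featuresets[:1900]
overlapping = featuresets[:1900:]  # trailing colon only adds an empty step: same as [:1900]
testing_set = featuresets[1900:]   # the remaining items, never seen during training

print("same as training set:", overlapping == training_set)
print("held-out items:", len(testing_set))
print("shared items:", len(set(training_set) & set(testing_set)))
```

With a disjoint split the accuracy figure will usually be lower than one computed on the training data, but it is the figure that says something about unseen reviews.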
RE: Naive Bayes too slow - snippsat - Oct-21-2016

(Oct-21-2016, 09:22 PM)pythlang Wrote: I want to be able to retain the function of Naive Bayes without the insane amount of time it takes to process.

What do you mean by a long time? That code takes 9 seconds for me.

RE: Naive Bayes too slow - pythlang - Oct-21-2016

(Oct-21-2016, 09:48 PM)snippsat Wrote: What do you mean by a long time? That code takes 9 seconds for me.

It takes like 5 minutes for me. EDIT: What could be causing this to happen?

RE: Naive Bayes too slow - Ofnuts - Oct-21-2016

(Oct-21-2016, 09:52 PM)pythlang Wrote: It takes like 5 minutes for me. EDIT: What could be causing this to happen?

Not enough memory, causing swapping? See what your process monitor displays.

RE: Naive Bayes too slow - pythlang - Oct-21-2016

(Oct-21-2016, 10:03 PM)Ofnuts Wrote: Not enough memory, causing swapping? See what your process monitor displays.

How would I be able to view or change this, and what are acceptable standards for these types of processes?

RE: Naive Bayes too slow - snippsat - Oct-21-2016

(Oct-21-2016, 09:52 PM)pythlang Wrote: EDIT: What could be causing this to happen?

Have you downloaded all the NLTK data?

    >>> import nltk
    >>> nltk.download()

Quote: A new window should open, showing the NLTK Downloader.
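On the question of how to see where the time and memory go, one option besides the system's process monitor is to measure each stage from inside the script. The following is only a sketch under stated assumptions: `measured` and `stats` are hypothetical names, the documents are synthetic stand-ins rather than the movie_reviews corpus, and `tracemalloc` counts only memory allocated by Python itself, so it will not show swapping directly.

```python
import time
import tracemalloc
from contextlib import contextmanager

stats = {}  # stage label -> (elapsed seconds, peak MiB of Python allocations)

@contextmanager
def measured(label):
    """Record wall-clock time and peak traced memory for the with-block.

    Not meant to be nested: the inner block would stop the tracing.
    """
    tracemalloc.start()
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        _current, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        stats[label] = (elapsed, peak / (1024 * 1024))

# Synthetic stand-ins (hypothetical data, not the real corpus).
vocabulary = ["word%d" % i for i in range(1000)]
documents = [vocabulary[i::7] for i in range(100)]

def find_features(document, word_features):
    # Same idea as the posted function: one boolean per feature word.
    words = set(document)
    return {w: (w in words) for w in word_features}

with measured("feature extraction"):
    featuresets = [find_features(doc, vocabulary) for doc in documents]

for label, (seconds, peak_mib) in stats.items():
    print("%-20s %8.3f s  %8.2f MiB peak" % (label, seconds, peak_mib))
```

Wrapping the corpus loading, the feature extraction and the training call in separate `with measured(...)` blocks would show which stage accounts for the gap between 9 seconds and 5 minutes. Tracing slows the code down somewhat, so the timings are best compared with each other rather than read as absolute figures.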
RE: Naive Bayes too slow - pythlang - Oct-21-2016

(Oct-21-2016, 10:18 PM)snippsat Wrote:
(Oct-21-2016, 09:52 PM)pythlang Wrote: EDIT: What could be causing this to happen?
Have you downloaded all the NLTK data?

    >>> import nltk
    >>> nltk.download()

Quote: A new window should open, showing the NLTK Downloader. Click on the File menu and select Change Download Directory. For central installation, set this to C:\nltk_data (Windows), /usr/local/share/nltk_data (Mac), or /usr/share/nltk_data (Unix). Next, select the packages or collections you want to download.

    Jordans-MBP:~ jordan$ which python
    /usr/bin/python
    Jordans-MBP:~ jordan$ python
    Python 2.7.10 (default, Jul 30 2016, 18:31:42)
    [GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.34)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import sys
    >>> sys.path
    ['',
     '/Library/Python/2.7/site-packages/pip-8.1.2-py2.7.egg',
     '/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python27.zip',
     '/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7',
     '/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/plat-darwin',
     '/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/plat-mac',
     '/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/plat-mac/lib-scriptpackages',
     '/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-tk',
     '/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-old',
     '/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload',
     '/Users/jordanXXX/Library/Python/2.7/lib/python/site-packages',
     '/Library/Python/2.7/site-packages',
     '/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python',
     '/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/PyObjC']
    >>> quit()
    Jordans-MBP:~ jordan$ python3
    Python 3.5.2 (v3.5.2:4def2a2901a5, Jun 26 2016, 10:47:25)
    [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import sys
    >>> sys.path
    ['',
     '/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/nltk-3.2.1-py3.5.egg',
     '/Library/Frameworks/Python.framework/Versions/3.5/lib/python35.zip',
     '/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5',
     '/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/plat-darwin',
     '/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/lib-dynload',
     '/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages']
    >>>
    >>> import nltk.data path in nltk.data.path
    >>> path in nltk.data.path
    True
    >>> import os, os.path
    >>> path = os.path.expanduser('~/nltk_data')
    >>> if not os.path.exists(path):
    ...     os.mkdir(path)
    ...     os.path.exists(path)
    ...
    >>> import nltk.data
    >>> path in nltk.data.path
    True
    >>>

As far as I know, I've downloaded all the NLTK data; otherwise I probably wouldn't be able to use these tools at all and would run into an error like the one I hit when I first tried to use matplotlib. Are the installed paths for python3 and nltk_data OK?
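The path question can also be checked without opening the downloader. The sketch below uses only the standard library and looks for a `corpora/movie_reviews` folder (or a `movie_reviews.zip` archive) inside the directories mentioned in this thread. The helper names are hypothetical, and treating the `NLTK_DATA` environment variable as an extra list of search locations is an assumption about the local setup.

```python
import os

def existing_data_dirs(candidates):
    """Return the candidate directories that exist, in the given order."""
    return [path for path in candidates if os.path.isdir(path)]

def has_corpus(data_dir, name):
    """True if data_dir/corpora holds `name` as a folder or as a .zip archive."""
    corpora = os.path.join(data_dir, "corpora")
    return (os.path.isdir(os.path.join(corpora, name))
            or os.path.isfile(os.path.join(corpora, name + ".zip")))

# Directories named in the quoted instructions, plus the per-user folder.
candidates = [
    os.path.expanduser("~/nltk_data"),
    "/usr/local/share/nltk_data",
    "/usr/share/nltk_data",
]

# Assumption: NLTK_DATA, if set, lists additional data directories.
env_value = os.environ.get("NLTK_DATA", "")
candidates = [p for p in env_value.split(os.pathsep) if p] + candidates

for data_dir in existing_data_dirs(candidates):
    print(data_dir, "-> movie_reviews present:", has_corpus(data_dir, "movie_reviews"))
```

If no directory is printed, or none reports the corpus as present, the data was probably installed somewhere else or for a different Python installation.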
RE: Naive Bayes too slow - pythlang - Oct-22-2016

As I go along, I've run into a problem with scikit-learn. Can anyone shed some light on this? I have scoured Google to no avail for an explanation I can understand:

    import pickle
    import random

    import nltk
    from nltk.classify.scikitlearn import SklearnClassifier
    from nltk.corpus import movie_reviews
    from sklearn.naive_bayes import MultinomialNB, GaussianNB, BernoulliNB

    # Pair each review's word list with its category label.
    documents = [(list(movie_reviews.words(fileid)), category)
                 for category in movie_reviews.categories()
                 for fileid in movie_reviews.fileids(category)]

    random.shuffle(documents)

    all_words = []
    for w in movie_reviews.words():
        all_words.append(w.lower())

    all_words = nltk.FreqDist(all_words)

    word_features = list(all_words.keys())[:3000]

    def find_features(document):
        # One boolean per feature word: is it present in this document?
        words = set(document)
        features = {}
        for w in word_features:
            features[w] = (w in words)
        return features

    # print((find_features(movie_reviews.words('neg/cv000_29416.txt'))))

    featuresets = [(find_features(rev), category) for (rev, category) in documents]

    training_set = featuresets[:1900]
    testing_set = featuresets[1900:]  # was featuresets[:1900:], which repeats the training slice

    # classifier = nltk.NaiveBayesClassifier.train(training_set)

    # Load the previously trained NLTK classifier instead of retraining it.
    classifier_f = open("naivebayes.pickle", "rb")
    classifier = pickle.load(classifier_f)
    classifier_f.close()

    print("Original Naive Bayes Algo accuracy percent:", (nltk.classify.accuracy(classifier, testing_set))*100)
    classifier.show_most_informative_features(15)

    # save_classifier = open("naivebayes.pickle", "wb")
    # pickle.dump(classifier, save_classifier)
    # save_classifier.close()

    MNB_classifier = SklearnClassifier(MultinomialNB())
    MNB_classifier.train(training_set)
    print("MNB_classifier accuracy percent:", (nltk.classify.accuracy(MNB_classifier, testing_set))*100)

    GaussianNB_classifier = SklearnClassifier(GaussianNB())
    GaussianNB_classifier.train(training_set)
    print("GaussianNB_classifier:", (nltk.classify.accuracy(GaussianNB_classifier, testing_set))*100)

    BernoulliNB_classifier = SklearnClassifier(BernoulliNB())
    BernoulliNB_classifier.train(training_set)
    print("BernoulliNB_classifier:", (nltk.classify.accuracy(BernoulliNB_classifier, testing_set))*100)

EDIT:
RE: Naive Bayes too slow - Larz60+ - Oct-22-2016

There was a build problem - see here for a workaround.

RE: Naive Bayes too slow - pythlang - Oct-22-2016

(Oct-22-2016, 02:42 AM)Larz60+ Wrote: There was a build problem - see here for a workaround.

Thanks for replying. I read that, but I'm still unsure what it means or how to work around it. Could you clarify? Is there no way to "rebuild" scikit-learn properly? Thanks.