Python Forum
Naive Bayes too slow - Printable Version



Naive Bayes too slow - pythlang - Oct-21-2016

So, after fooling around with this algorithm, I've noticed that it's far too slow, probably because NLTK is a learning toolkit, and it gets worse when analyzing large sets of data.

I want to be able to retain the function of Naive Bayes without the insane amount of time it takes to process.

Can I use scikit-learn as a wrapper of some sort instead?

That seems like it would be better equipped to deal with the problem.

Here's my code, feel free to make revisions in addition to helping me speed up the processing time:

import nltk
import random
from nltk.corpus import movie_reviews

documents = [(list(movie_reviews.words(fileid)), category)
            for category in movie_reviews.categories()
            for fileid in movie_reviews.fileids(category)]

random.shuffle(documents)

all_words = []
for w in movie_reviews.words():
    all_words.append(w.lower())

all_words = nltk.FreqDist(all_words)

word_features = list(all_words.keys())[:3000]

def find_features(document):
    words = set(document)
    features = {}
    for w in word_features:
        features[w] = (w in words)

    return features

print((find_features(movie_reviews.words('neg/cv000_29416.txt'))))

featuresets = [(find_features(rev), category) for (rev, category) in documents]

training_set = featuresets[:1900]
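# Hold out the documents after the first 1900 for testing;
# featuresets[:1900:] would just reuse the training data.
testing_set = featuresets[1900:]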

classifier = nltk.NaiveBayesClassifier.train(training_set)
print("Naive Bayes Algo accuracy percent:", (nltk.classify.accuracy(classifier, testing_set))*100)
classifier.show_most_informative_features(15)
Output:
False, u'effected': False, u'compared': False, u'nonetheless': False, u'deadly': False, u'purproses': False, u'lately': False, u'kerrigans': False, u'compares': False, u'details': False, u'behold': False, u'vulgarize': False, u'illusion': False, u'ponytail': False, u'rebelled': False, u'repeat': False, u'zhou': False, u'treason': False, u'allotting': False, u'impregnating': False, u'tinier': False, u'trunchbull': False, u'laude': False, u'exposure': False, u'searches': False, u'ustinov': False, u'disatisfaction': False, u'mishears': False, u'torrid': False, u'compete': False, u'lestat': False, u'villainous': False, u'searched': False, u'gardens': False, u'homerian': False}
('Naive Bayes Algo accuracy percent:', 87.78947368421053)
Most Informative Features
               insulting = True              neg : pos    =     10.6 : 1.0
                    sans = True              neg : pos    =      8.4 : 1.0
                 wasting = True              neg : pos    =      8.4 : 1.0
            refreshingly = True              pos : neg    =      8.3 : 1.0
              mediocrity = True              neg : pos    =      7.7 : 1.0
               dismissed = True              pos : neg    =      7.0 : 1.0
             bruckheimer = True              neg : pos    =      6.3 : 1.0
               sumptuous = True              pos : neg    =      6.3 : 1.0
              cronenberg = True              pos : neg    =      6.3 : 1.0
                  fabric = True              pos : neg    =      6.3 : 1.0
                     ugh = True              neg : pos    =      5.8 : 1.0
                  doubts = True              pos : neg    =      5.8 : 1.0
                  bounce = True              neg : pos    =      5.7 : 1.0
                   wires = True              neg : pos    =      5.7 : 1.0
                    wits = True              pos : neg    =      5.7 : 1.0



RE: Naive Bayes too slow - snippsat - Oct-21-2016

(Oct-21-2016, 09:22 PM)pythlang Wrote: I want to be able to retain the function of Naive Bayes without the insane amount of time it takes to process.
What do you mean by a long time? That code takes 9 seconds for me.
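To see where the time actually goes, each stage of the script could be timed separately. The following is only a rough sketch for Python 3: the `timed` helper and the stand-in workloads are made up for illustration, and in the real script the `with` blocks would wrap the corpus loading, the `featuresets` list comprehension, and the `train()` call.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Record how long the enclosed block takes, in seconds, under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


timings = {}

# Stand-in workloads (hypothetical); replace with the real stages of the script.
with timed("build features", timings):
    features = [{w: (w % 2 == 0) for w in range(300)} for _ in range(200)]

with timed("train", timings):
    total = sum(sum(f.values()) for f in features)

for label, seconds in timings.items():
    print("{}: {:.3f} s".format(label, seconds))
```

If one stage accounts for nearly all of the runtime, that is the part worth looking at first.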


RE: Naive Bayes too slow - pythlang - Oct-21-2016

(Oct-21-2016, 09:48 PM)snippsat Wrote: What do you mean by a long time? That code takes 9 seconds for me.

It takes like 5 minutes for me.

EDIT: What could be causing this to happen?


RE: Naive Bayes too slow - Ofnuts - Oct-21-2016

(Oct-21-2016, 09:52 PM)pythlang Wrote: It takes like 5 minutes for me. EDIT: What could be causing this to happen?

Could it be not enough memory, causing swapping? Check what your process monitor displays.
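Besides the process monitor, the script can report its own peak memory use. This is a minimal sketch using the standard-library `resource` module, which is available on macOS and Linux but not on Windows; the `peak_memory_mb` helper and the `big_list` stand-in are names invented for the example.

```python
import resource
import sys


def peak_memory_mb():
    """Peak resident memory of this process in megabytes (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


before = peak_memory_mb()
big_list = [str(i) for i in range(500000)]  # stand-in for the featuresets list
after = peak_memory_mb()
print("peak memory before: {:.1f} MB, after: {:.1f} MB".format(before, after))
```

If the peak gets close to the amount of physical RAM in the machine, swapping becomes a plausible explanation for the slowdown.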


RE: Naive Bayes too slow - pythlang - Oct-21-2016

(Oct-21-2016, 10:03 PM)Ofnuts Wrote: Could it be not enough memory, causing swapping? Check what your process monitor displays.

How would I be able to view or change this, and what are acceptable values for these types of processes?


RE: Naive Bayes too slow - snippsat - Oct-21-2016

(Oct-21-2016, 09:52 PM)pythlang Wrote: EDIT: What could be causing this to happen?
Have you downloaded all the NLTK data?
>>> import nltk
>>> nltk.download()
Quote:A new window should open, showing the NLTK Downloader.
Click on the File menu and select Change Download Directory.
For central installation, set this to C:\nltk_data (Windows),
/usr/local/share/nltk_data (Mac), or /usr/share/nltk_data (Unix).
Next, select the packages or collections you want to download.
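To confirm that the corpus really ended up in one of those directories, a quick check along these lines may help. It is a sketch only: the directory list is taken from the instructions above plus the per-user default `~/nltk_data`, the `find_corpus` helper is hypothetical, and the `corpora/<name>` layout is an assumption about how the downloader stores corpora. The list NLTK actually searches is `nltk.data.path`.

```python
import os

# Candidate locations (an assumption; nltk.data.path holds the real list).
CANDIDATE_DIRS = [
    os.path.expanduser("~/nltk_data"),
    "/usr/local/share/nltk_data",
    "/usr/share/nltk_data",
    r"C:\nltk_data",
]


def find_corpus(name, search_dirs=CANDIDATE_DIRS):
    """Return the directories that contain corpora/<name> or corpora/<name>.zip."""
    hits = []
    for base in search_dirs:
        unpacked = os.path.join(base, "corpora", name)
        zipped = unpacked + ".zip"
        if os.path.isdir(unpacked) or os.path.isfile(zipped):
            hits.append(base)
    return hits


found = find_corpus("movie_reviews")
print(found if found else "movie_reviews not found in the candidate directories")
```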



RE: Naive Bayes too slow - pythlang - Oct-21-2016

(Oct-21-2016, 10:18 PM)snippsat Wrote: Have you downloaded all the NLTK data?
>>> import nltk
>>> nltk.download()

Jordans-MBP:~ jordan$ which python
/usr/bin/python
Jordans-MBP:~ jordan$ python
Python 2.7.10 (default, Jul 30 2016, 18:31:42) 
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.34)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.path
['', '/Library/Python/2.7/site-packages/pip-8.1.2-py2.7.egg', '/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python27.zip', '/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7', '/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/plat-darwin', '/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/plat-mac', '/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/plat-mac/lib-scriptpackages', '/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-tk', '/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-old', '/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload', '/Users/jordanXXX/Library/Python/2.7/lib/python/site-packages', '/Library/Python/2.7/site-packages', '/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python', '/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/PyObjC']
>>> quit()
Jordans-MBP:~ jordan$ python3
Python 3.5.2 (v3.5.2:4def2a2901a5, Jun 26 2016, 10:47:25) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.path
['', '/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/nltk-3.2.1-py3.5.egg', '/Library/Frameworks/Python.framework/Versions/3.5/lib/python35.zip', '/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5', '/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/plat-darwin', '/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/lib-dynload', '/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages']
>>> 
>>> import nltk.data
>>> path in nltk.data.path
True
>>> import os, os.path
>>> path = os.path.expanduser('~/nltk_data')
>>> if not os.path.exists(path):
...     os.mkdir(path)
...     os.path.exists(path)
... 
>>> import nltk.data
>>> path in nltk.data.path
True
>>> 
As far as I know, I've downloaded all the NLTK data; otherwise I probably wouldn't be able to use these tools and would run into something like this, which happened when I tried to use matplotlib for the first time:
Error:
no module named "X"
Are the installed paths for python3 and nltk_data OK?
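One way to check this might be to ask the interpreter itself which executable is running and where it would load each package from. The sketch below assumes Python 3 (it relies on `importlib.util.find_spec`), and `describe_module` is just an illustrative helper name. Running it once with `python` replaced by `python3` on the command line should show whether the two installations see different packages.

```python
import importlib.util
import sys


def describe_module(name):
    """Return the file a module would be loaded from, or None if not importable."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


print("interpreter:", sys.executable)
print("version:", "{}.{}".format(*sys.version_info[:2]))
for name in ("nltk", "sklearn"):
    print(name, "->", describe_module(name) or "not importable from this interpreter")
```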


RE: Naive Bayes too slow - pythlang - Oct-22-2016

As I'm going along, I have run into a problem with scikit-learn.

Can anyone shed some light on this? I have scoured Google to no avail for an explanation that I can understand:

import nltk
import random
from nltk.corpus import movie_reviews
from nltk.classify.scikitlearn import SklearnClassifier
import pickle
from sklearn.naive_bayes import MultinomialNB, GaussianNB, BernoulliNB


documents = [(list(movie_reviews.words(fileid)), category)
            for category in movie_reviews.categories()
            for fileid in movie_reviews.fileids(category)]

random.shuffle(documents)

all_words = []
for w in movie_reviews.words():
    all_words.append(w.lower())

all_words = nltk.FreqDist(all_words)

word_features = list(all_words.keys())[:3000]

def find_features(document):
    words = set(document)
    features = {}
    for w in word_features:
        features[w] = (w in words)

    return features

# print((find_features(movie_reviews.words('neg/cv000_29416.txt'))))

featuresets = [(find_features(rev), category) for (rev, category) in documents]

training_set = featuresets[:1900]
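# Hold out the documents after the first 1900 for testing;
# featuresets[:1900:] would just reuse the training data.
testing_set = featuresets[1900:]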

# classifier = nltk.NaiveBayesClassifier.train(training_set)

classifier_f = open("naivebayes.pickle", "rb")
classifier = pickle.load(classifier_f)
classifier_f.close()

print("Original Naive Bayes Algo accuracy percent:", (nltk.classify.accuracy(classifier, testing_set))*100)
classifier.show_most_informative_features(15)

# save_classifier = open("naivebayes.pickle", "wb")
# pickle.dump(classifier, save_classifier)
# save_classifier.close()

MNB_classifier = SklearnClassifier(MultinomialNB())
MNB_classifier.train(training_set)
print("MNB_classifier accuracy percent:", (nltk.classify.accuracy(MNB_classifier, testing_set))*100)

# GaussianNB needs dense input, so turn off the wrapper's default sparse matrices.
GaussianNB_classifier = SklearnClassifier(GaussianNB(), sparse=False)
GaussianNB_classifier.train(training_set)
print("GaussianNB_classifier:", (nltk.classify.accuracy(GaussianNB_classifier, testing_set))*100)

BernoulliNB_classifier = SklearnClassifier(BernoulliNB())
BernoulliNB_classifier.train(training_set)
print("BernoulliNB_classifier:", (nltk.classify.accuracy(BernoulliNB_classifier, testing_set))*100)
Error:
Traceback (most recent call last):
  File "/Users/jordanXXX/Documents/NLP/scikitlearn", line 6, in <module>
    from sklearn.naive_bayes import MultinomialNB, GaussianNB, BernoulliNB
  File "/Library/Python/2.7/site-packages/sklearn/__init__.py", line 56, in <module>
    from . import __check_build
  File "/Library/Python/2.7/site-packages/sklearn/__check_build/__init__.py", line 46, in <module>
    raise_build_error(e)
  File "/Library/Python/2.7/site-packages/sklearn/__check_build/__init__.py", line 41, in raise_build_error
    %s""" % (e, local_dir, ''.join(dir_content).strip(), msg))
ImportError: No module named _check_build
___________________________________________________________________________
Contents of /Library/Python/2.7/site-packages/sklearn/__check_build:
__init__.py               __init__.pyc              __pycache__
_check_build.cpython-35m-darwin.sosetup.py
___________________________________________________________________________
It seems that scikit-learn has not been built correctly.

If you have installed scikit-learn from source, please do not forget
to build the package before using it: run `python setup.py install` or
`make` in the source directory.

If you have used an installer, please check that it is suited for your
Python version, your operating system and your platform.


RE: Naive Bayes too slow - Larz60+ - Oct-22-2016

There was a build problem; see here for a workaround.
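Judging from the error listing, the compiled file in `/Library/Python/2.7/site-packages/sklearn/__check_build` carries a `cpython-35m` tag, which suggests it was built for Python 3.5 while the script is being run by Python 2.7. A small sketch to read that tag follows; the helper names are invented for the example, and it assumes the fused entry in the listing is really `_check_build.cpython-35m-darwin.so` followed by `setup.py`.

```python
import re
import sys


def cpython_version_of(filename):
    """Extract (major, minor) from a name like 'mod.cpython-35m-darwin.so'.

    Returns None when the name carries no 'cpython-XY' tag.
    """
    match = re.search(r"\.cpython-(\d)(\d+)", filename)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def built_for_running_interpreter(filename):
    """True if the tag in `filename` matches the interpreter running this code."""
    return cpython_version_of(filename) == tuple(sys.version_info[:2])


name = "_check_build.cpython-35m-darwin.so"  # assumed file name from the listing
print(cpython_version_of(name))  # (3, 5)
print(built_for_running_interpreter(name))
```

If the second line prints False under the interpreter that runs the script, the package and the interpreter do not match, and installing scikit-learn with that same interpreter's pip is probably the simplest fix.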


RE: Naive Bayes too slow - pythlang - Oct-22-2016

(Oct-22-2016, 02:42 AM)Larz60+ Wrote: There was a build problem; see here for a workaround.


Thanks for replying.

I read that, but I'm still unsure what it means or how to work around it. Could you clarify?

Is there no way to "rebuild" scikit-learn in the proper manner?

Thanks.