Python Forum
Twitter scraping exclude some data


Twitter scraping exclude some data - Robbert - Aug-29-2017

Hello everyone! I'm new here and I'm also new to Python. I'm eager to learn Python because the possibilities are immense. Currently I'm working on a Twitter streaming script, which I pasted in the code section below.
I'm wondering how I should exclude data from the streamer.
1. For instance, I want to check whether the 'status' or 'location' fields are not null.
2. I would like to exclude some fields, for instance 'retweets'.

If someone could explain how I'm supposed to program [1] and [2], I would be very happy :)

from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener
import json

# consumer key, consumer secret, access token, access secret.
consumer_key = "xxx"
consumer_secret = "xxx"
access_token = "xxxx"
access_token_secret = "xxxx"


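class StdOutlistener(StreamListener):
    def on_data(self, data):
        # data is the raw JSON string of one message from the stream
        json_data = json.loads(data)
        print(json_data)

        # Append the raw message to a text file to save the tweets
        with open('tweets.json', 'a') as tf:
            tf.write(data)
        return True

    def on_error(self, status):
        # status is the HTTP status code returned by Twitter
        print(status)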


auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

twitterStream = Stream(auth, StdOutlistener())
twitterStream.filter(track=["Test"])



RE: Twitter scraping exclude some data - Robbert - Aug-31-2017

Can anyone help me?
Is my question unclear?


RE: Twitter scraping exclude some data - nilamo - Aug-31-2017

Are those fields part of the json response you're receiving?


RE: Twitter scraping exclude some data - Robbert - Aug-31-2017

(Aug-31-2017, 05:33 PM)nilamo Wrote: Are those fields part of the json response you're receiving?

Yes. The JSON response contains all the data that is available.
This tutorial, http://adilmoujahid.com/posts/2014/07/twitter-analytics/, gives an overview of the data and the JSON output when no filters are applied. Literally everything is passed through, and I would like to know whether it is possible to skip some fields. For instance, I'm not interested in the fact that someone has 20 followers.
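
One way to skip fields is to copy only the wanted keys out of the decoded dictionary before printing or saving it. The following is a minimal sketch, not something taken from the tutorial: the helper name trim_tweet and the choice of fields are assumptions made for illustration, and the sample message is made up. The key names themselves follow Twitter's tweet JSON.

```python
import json

# Hypothetical choice of fields to keep; adjust to taste.
KEEP_TOP_LEVEL = ("created_at", "text")
KEEP_USER = ("screen_name", "location")


def trim_tweet(tweet):
    """Return a new dict holding only the fields listed above."""
    trimmed = {key: tweet.get(key) for key in KEEP_TOP_LEVEL}
    user = tweet.get("user") or {}
    trimmed["user"] = {key: user.get(key) for key in KEEP_USER}
    return trimmed


# Made-up sample standing in for one decoded message from the stream.
sample = json.loads(
    '{"created_at": "Thu Aug 31 17:00:00 +0000 2017", "text": "Test", '
    '"retweet_count": 0, '
    '"user": {"screen_name": "someone", "location": "Amsterdam", '
    '"followers_count": 20}}'
)
print(json.dumps(trim_tweet(sample)))
```

In the streaming script, the result of json.loads(data) could be passed through such a helper before it is written to the file, so that fields like followers_count never reach tweets.json.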


RE: Twitter scraping exclude some data - nilamo - Aug-31-2017

Here's a direct link to the StreamListener class from the tweepy module: https://github.com/tweepy/tweepy/blob/v3.5.0/tweepy/streaming.py#L30

You're currently using on_data(), which fires off for every single message.  Have you tried using one of the more specific ones, like on_status()?


RE: Twitter scraping exclude some data - Robbert - Sep-02-2017

(Aug-31-2017, 08:55 PM)nilamo Wrote: Here's a direct link to the StreamListener class from the tweepy module: https://github.com/tweepy/tweepy/blob/v3.5.0/tweepy/streaming.py#L30

You're currently using on_data(), which fires off for every single message.  Have you tried using one of the more specific ones, like on_status()?

Thanks for your reply and suggestion. I will have a look at the webpage you mentioned.
No, I haven't tried on_status, which would probably be better. But I have no idea how to use on_status in this particular script.

Do you perhaps have a link for that too?


RE: Twitter scraping exclude some data - nilamo - Sep-02-2017

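You currently use on_data(). Replace the word "data" with "status" in the method name, and it should run. Keep in mind that on_status() is passed an already-parsed Status object rather than the raw JSON string, so the json.loads() and tf.write(data) calls need adjusting; in tweepy 3.5.0 the decoded dictionary should be available as status._json.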
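
For the null checks and the retweet exclusion asked about in the first post, a small filter applied to each decoded message may be enough. This is only a sketch under assumptions: should_keep and handle_raw are hypothetical helper names, and "text" is used here in place of the 'status' field mentioned in the question. The key names "retweeted_status", "text", and "user"/"location" follow Twitter's tweet JSON.

```python
import json


def should_keep(tweet):
    """Return True for original tweets that have text and a user location."""
    if "retweeted_status" in tweet:      # key is present only on retweets
        return False
    if not tweet.get("text"):            # missing, null, or empty text
        return False
    user = tweet.get("user") or {}
    return bool(user.get("location"))    # null or empty location is skipped


def handle_raw(data, path="tweets.json"):
    """Decode one raw message and append it to `path` if it passes the filter."""
    tweet = json.loads(data)
    if not should_keep(tweet):
        return False
    with open(path, "a") as tf:
        tf.write(json.dumps(tweet) + "\n")
    return True
```

Inside the listener from the first post, the body of on_data(self, data) could then call handle_raw(data) and return True to keep the stream running. Messages that are not tweets, such as delete notices, have no "text" key and are dropped by the same check.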