Twitter scraping exclude some data
Python Forum (https://python-forum.io), General Coding Help, thread /thread-4599.html

Twitter scraping exclude some data - Robbert - Aug-29-2017

Hello everyone!

I'm new here and I'm also new to Python. I'm eager to learn Python because the possibilities are immense. Currently I'm working on a Twitter streaming script, which I pasted in the code section below. I'm wondering how I should exclude data from the streamer.

1. For instance, I want to check whether the 'status' or 'location' fields are not null.
2. I would like to exclude some fields, for instance 'retweets'.

If someone could explain how I'm supposed to program [1] and [2], I would be very happy :)

    from tweepy import Stream
    from tweepy import OAuthHandler
    from tweepy.streaming import StreamListener
    import json

    # consumer key, consumer secret, access token, access secret.
    consumer_key = "xxx"
    consumer_secret = "xxx"
    access_token = "xxxx"
    access_token_secret = "xxxx"

    class StdOutlistener(StreamListener):

        def on_data(self, data):
            json_data = json.loads(data)
            print(json_data)

            # Open json text file to save the tweets
            with open('tweets.json', 'a') as tf:
                tf.write(data)
            return True

        def on_error(self, status):
            print(status)

    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    twitterStream = Stream(auth, StdOutlistener())
    twitterStream.filter(track=["Test"])

RE: Twitter scraping exclude some data - Robbert - Aug-31-2017

Can anyone help me? Is my question unclear?

RE: Twitter scraping exclude some data - nilamo - Aug-31-2017

Are those fields part of the json response you're receiving?

RE: Twitter scraping exclude some data - Robbert - Aug-31-2017

(Aug-31-2017, 05:33 PM)nilamo Wrote: Are those fields part of the json response you're receiving?

Yes. The JSON format contains all the data that is available. This tutorial: http://adilmoujahid.com/posts/2014/07/twitter-analytics/ gives an overview of the data and the JSON output when no filters are applied.
Literally everything is passing through, and I would like to know whether it is possible to skip some fields. For instance, I'm not interested in the fact that someone has 20 followers or something.

RE: Twitter scraping exclude some data - nilamo - Aug-31-2017

Here's a direct link to the StreamListener class from the tweepy module: https://github.com/tweepy/tweepy/blob/v3.5.0/tweepy/streaming.py#L30

You're currently using on_data(), which fires off for every single message. Have you tried using one of the more specific ones, like on_status()?

RE: Twitter scraping exclude some data - Robbert - Sep-02-2017

(Aug-31-2017, 08:55 PM)nilamo Wrote: Here's a direct link to the StreamListener class from the tweepy module: https://github.com/tweepy/tweepy/blob/v3.5.0/tweepy/streaming.py#L30

Thanks for your reply and suggestion. I will have a look at the webpage you mentioned. No, I haven't tried on_status, which would probably be better. But I have no idea how to use on_status in this particular script. Do you perhaps have a link for that too?

RE: Twitter scraping exclude some data - nilamo - Sep-02-2017

You currently use on_data. Replace the word "data" with "status", and it should run.
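
For the two original questions (skip tweets with empty fields, and drop unwanted fields), one possible approach is to filter the parsed JSON before writing it to the file. The sketch below is not from the thread. It assumes the classic streaming JSON layout, where the tweet body is under "text", the location is under "user" -> "location", and retweets carry a "retweeted_status" key. The names filter_tweet and WANTED_FIELDS are hypothetical helpers made up for this illustration.

```python
import json

# Hypothetical selection of top-level fields to keep; adjust as needed.
WANTED_FIELDS = ("created_at", "text")


def filter_tweet(raw):
    """Return a trimmed dict for a tweet worth keeping, or None to skip it.

    Assumes the classic Twitter streaming JSON layout ("text",
    "user" -> "location", "retweeted_status" on retweets).
    """
    try:
        tweet = json.loads(raw)
    except ValueError:
        # Not valid JSON (for example an empty keep-alive line).
        return None
    if not isinstance(tweet, dict):
        return None

    # [1] Skip messages whose text or user location is missing or null.
    # Non-tweet messages such as delete notices have no "text" either.
    if not tweet.get("text"):
        return None
    user = tweet.get("user") or {}
    if not user.get("location"):
        return None

    # [2] Skip retweets entirely.
    if "retweeted_status" in tweet:
        return None

    # Keep only the selected fields, plus the location from the user object.
    trimmed = {key: tweet.get(key) for key in WANTED_FIELDS}
    trimmed["location"] = user["location"]
    return trimmed
```

Inside on_data this could be called as kept = filter_tweet(data), writing json.dumps(kept) plus a newline to tweets.json only when kept is not None. If the method is renamed to on_status, note that in tweepy 3.x it receives an already parsed Status object rather than the raw string, so the json.loads call would no longer apply to it.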