Python Forum
Same Data Showing Several Times With Beautifulsoup Query

Python Coding > General Coding Help (/thread-37342.html)

Same Data Showing Several Times With Beautifulsoup Query - eddywinch82 - May-29-2022

Hi there,

I have the following Python code:

import pandas as pd
import requests
import numpy as np
from bs4 import BeautifulSoup
import xlrd
import re

pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

res3 = requests.get("")     
soup3 = BeautifulSoup(res3.content,'lxml')

BBMF_2022 = []

#BBMF_elem = soup3.find_all('a', string=re.compile(r'between|Flypast'))

for item in soup3.find_all('a', string=re.compile(r'between|Flypast')):
    li1 = item.find_parent().text
    #li2 = li1.find_previous().font

#check if links are in dataframe
#df = pd.DataFrame(BBMF_2022, columns=['BBMF_2022'])

The issue I have is that when I run the code, the data for the 15 entries from May 28th to May 29th is printed several times, and I am not sure why. Could someone suggest the reason, and tell me what I need to change in the code so that the data is printed only once? I am trying to scrape entries from a website that contain the word "between" or "Flypast".
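
One common cause of this symptom, offered here as an assumption because the page's HTML is not shown in the thread: if all of the matching <a> tags sit inside the same parent element, item.find_parent().text returns that whole block of text for every match, so N matching links print the same N entries N times. A minimal sketch with made-up markup (the HTML and entry names are hypothetical):

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup: three matching links share ONE parent <td>.
html = """
<table><tr><td>
  <a href="#1">Flypast Duxford</a>
  <a href="#2">Display between 14:00 and 14:10</a>
  <a href="#3">Flypast Cosford</a>
</td></tr></table>
"""

soup = BeautifulSoup(html, "html.parser")
links = soup.find_all("a", string=re.compile(r"between|Flypast"))

# Every link has the same parent <td>, so .text is the same combined block each time.
parent_texts = [a.find_parent().text for a in links]
for text in parent_texts:
    print(" ".join(text.split()))  # the whole block is printed once per link

# Reading each link's own text gives one entry per match instead.
link_texts = [a.get_text(strip=True) for a in links]
print(link_texts)
```

If the text you need really does live in the parent element, checking what find_parent() returns for two different links is a quick way to confirm whether they share one container.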

When I use the following piece of code instead:

for item in soup3.find_all('a', string=re.compile(r'between|Flypast')):
    li1 = item.find_parent().text
    #li2 = li1.find_previous().font
df = pd.DataFrame(BBMF_2022, columns=['BBMF_2022'])

The first entry for 28th May is printed in the DataFrame 15 times, instead of the 15 separate entries I mentioned before.
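
If the DataFrame is created inside the loop, it is rebuilt on every pass. A sketch of the usual pattern, again using hypothetical markup in place of the real page: append one string per matching link inside the loop, then build the DataFrame once after the loop, which gives one row per entry.

```python
import re

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (not shown in the thread).
html = """
<div>
  <a href="#1">Flypast Duxford</a>
  <a href="#2">Display between 14:00 and 14:10</a>
  <a href="#3">Static display only</a>
</div>
"""

soup3 = BeautifulSoup(html, "html.parser")

BBMF_2022 = []
for item in soup3.find_all("a", string=re.compile(r"between|Flypast")):
    # Append the link's own text: one entry per matching link.
    BBMF_2022.append(item.get_text(strip=True))

# Build the DataFrame once, after the loop has collected every entry.
df = pd.DataFrame(BBMF_2022, columns=["BBMF_2022"])
print(df)
```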

Any help would be much appreciated.

Best Regards

Eddie Winch ))

RE: Same Data Showing Several Times With Beautifulsoup Query - Larz60+ - May-29-2022

You are using a redirected URL; instead use: ?

This code will get all the links (text and URL) and save them as a JSON file, without any filtering. You can add filters and any other data you need:
import requests
from bs4 import BeautifulSoup
import json
import sys


class airshowdata:
    def __init__(self):
        self.cd = CreateDict()          # helper that builds the nested dictionary
        self.airshow_details = {}
        self.jsonfile = 'airshow.json'

    def get_links(self):
        url = ''

        res3 = requests.get(url)
        if res3.status_code == 200:
            soup3 = BeautifulSoup(res3.content, 'lxml')
        else:
            print(f"Cannot load page {url}")
            sys.exit(1)

        # one node per link: {link text: {'url': href}}
        links = soup3.find_all('a')
        for link in links:
            anode = self.cd.add_node(self.airshow_details, link.text.strip())
            self.cd.add_cell(anode, 'url', link.get('href'))

        with open(self.jsonfile, 'w') as fp:
            json.dump(self.airshow_details, fp)

        # following not needed and can be removed (displays dictionary contents)
        self.cd.display_dict(self.airshow_details)


class CreateDict:
    """ Contains methods to simplify node and cell creation within
                    a dictionary

        new_dict(dictname) - Creates a new dictionary instance with the name
            contained in dictname

        add_node(parent, nodename) - Creates a new node (nested dictionary)
            named in nodename, in parent dictionary.

        add_cell(nodename, cellname, value) - Creates a leaf node within node
            named in nodename, with a cell name of cellname, and value of value.

        display_dict(dictname) - Recursively displays a nested dictionary.

        Python standard library:
    Author: Larz60+  -- May 2019.
    """
    def __init__(self):
        pass

    def new_dict(self, dictname):
        setattr(self, dictname, {})

    def add_node(self, parent, nodename):
        node = parent[nodename] = {}
        return node

    def add_cell(self, nodename, cellname, value):
        cell = nodename[cellname] = value
        return cell

    def display_dict(self, dictname, level=0):
        indent = " " * (4 * level)
        for key, value in dictname.items():
            if isinstance(value, dict):
                # print the node name, then its contents one level deeper
                print(f'{indent}{key}')
                self.display_dict(value, level + 1)
            else:
                print(f'{indent}{key}: {value}')


def main():
    airs = airshowdata()
    airs.get_links()


if __name__ == '__main__':
    main()
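
Larz60+ mentions that filters can be added. One way, sketched here assuming the layout the code above aims to write (link text mapped to {'url': href}), is to filter the dictionary with the same regular expression used in the original question. The sample dictionary is hypothetical; with the real file you would load it first with json.load.

```python
import json
import re

# Hypothetical sample in the same shape as airshow.json: {link text: {"url": href}}
airshow_details = {
    "Flypast Duxford": {"url": "#1"},
    "Display between 14:00 and 14:10": {"url": "#2"},
    "Contact us": {"url": "#3"},
}

pattern = re.compile(r"between|Flypast")

# Keep only the links whose text matches the pattern; insertion order is preserved.
filtered = {text: data for text, data in airshow_details.items() if pattern.search(text)}

print(json.dumps(filtered, indent=2))
```

Because the link text is used as the dictionary key, links with identical text collapse into a single entry, which also avoids repeated rows.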

RE: Same Data Showing Several Times With Beautifulsoup Query - eddywinch82 - May-29-2022

Many thanks for that code, Larz60+; it's very much appreciated, and thank you for taking the time to type it. I chose
the web.archive link because that snapshot holds the data from a week ago; the 21st May data was removed from the website the other day.

Does anyone have any idea how I can change my code to solve the issue I am having with it?

Any help would be very much appreciated.


Eddie Winch ))