Hello,
I need to loop through a list of URLs to grab each page's title, which might contain a substring I want to ignore.
For some reason, the substring isn't removed:
import html
import re

import requests

with open('list.txt') as f:
    for line in f:
        url = line.strip()
        print(url)
        n = requests.get(url)
        al = n.text
        # Doesn't remove possible ( - dummy)?
        d = re.search(r'<\W*title\W*(.*)( - dummy)?</title', al, re.IGNORECASE)
        title = html.unescape(d.group(1))
        print(title)

How is my regex wrong?
Thank you.
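
A likely cause, sketched under some assumptions: `(.*)` is greedy, and because `( - dummy)?` is optional, the engine is happy to let `(.*)` swallow " - dummy" and match the optional group against nothing. Making the first group lazy (`(.*?)`) should let the optional suffix be matched first. The helper name `extract_title`, its `suffix` parameter, and the sample HTML strings below are made up for illustration; they are not from the original script, and real pages (multi-line titles, attributes on the tag) may need more care or an HTML parser.

```python
import html
import re

# Pattern from the question: the greedy (.*) also consumes " - dummy".
GREEDY = re.compile(r'<\W*title\W*(.*)( - dummy)?</title', re.IGNORECASE)


def extract_title(page_html, suffix=" - dummy"):
    """Hypothetical helper: return the page title without the optional suffix.

    Returns None when no <title> element is found.
    """
    # (.*?) is lazy, so it stops as soon as the optional suffix
    # followed by </title can match.
    pattern = r'<\W*title\W*(.*?)(?:' + re.escape(suffix) + r')?</title'
    match = re.search(pattern, page_html, re.IGNORECASE)
    if match is None:
        return None
    return html.unescape(match.group(1))


sample = "<html><head><title>My Page - dummy</title></head></html>"

# Greedy version keeps the suffix inside group 1.
greedy_result = GREEDY.search(sample).group(1)
print(greedy_result)

# Lazy version drops it.
print(extract_title(sample))
```

In the loop from the question, this would amount to changing `(.*)` to `(.*?)` in the pattern and checking that `d` is not `None` before calling `d.group(1)`.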