Thread: Match substring using regex (Python Forum, General Coding Help, /thread-37750.html)
Match substring using regex - Pavel_47 - Jul-17-2022

Hello,

Here is a case that doesn't work:

    import re

    video_info = 'Réalisation :\nPierre Lazarus\nScénario :\nEmmanuelle Moreau\nNoémie Parreaux\nProduction :\nFilmakademie Baden-Württemberg\nSWR\nARTE\nProducteur/-trice :\nGiacomo Vernetti Prot\nJennifer Miola\nImage :\nHovig Hagopian\nMontage :\nMathieu Pluquet\nMusique :\nLouis-Ronan Choisy'
    realisation = re.search(r'Réalisation :\n(.*?)\n', video_info).group(1)
    scenario = re.search(r'Scénario :\n(.*?) :\n', video_info).group(1)
    print(realisation)
    print(scenario)

In this example, realisation is OK but scenario fails.
Any suggestions?
Thanks.

RE: Match substring using regex - Pavel_47 - Jul-17-2022

I've just tested the regex on https://regex101.com/. With this (admittedly non-ideal) expression I can select what I want.
Surprisingly, if I use the same expression in Python's re.search, it doesn't work.
Any comments?

RE: Match substring using regex - snippsat - Jul-17-2022

Like this.

    import re

    video_info = 'Réalisation :\nPierre Lazarus\nScénario :\nEmmanuelle Moreau\nNoémie Parreaux\nProduction :\nFilmakademie Baden-Württemberg\nSWR\nARTE\nProducteur/-trice :\nGiacomo Vernetti Prot\nJennifer Miola\nImage :\nHovig Hagopian\nMontage :\nMathieu Pluquet\nMusique :\nLouis-Ronan Choisy'
    realisation = re.search(r'Réalisation :\n(.*?)\n', video_info).group(1)
    scenario = re.search(r"Scénario :\n(.*?)\n", video_info).group(1)
    print(realisation)
    print(scenario)

As this data is from your last thread, you can also parse the elements directly, so there is no need for regex.

    >>> content.find_element(By.CSS_SELECTOR, 'div.css-m3r1o3 > div.css-vhqfin > p:nth-child(1)').text
    'Réalisation :'
    >>> content.find_element(By.CSS_SELECTOR, 'div.css-m3r1o3 > div.css-vhqfin > ul:nth-child(2)').text
    'Pierre Lazarus'

RE: Match substring using regex - Pavel_47 - Jul-17-2022

Quote:
    scenario = re.search(r"Scénario :\n(.*?)\n", video_info).group(1)

No. What I'm looking for is to get Emmanuelle Moreau, Noémie Parreaux ...
i.e. what is located between "Scénario :" and the next template that looks like "something :".

RE: Match substring using regex - bowlofred - Jul-18-2022

I think your regex101 test is succeeding because you have pasted the Python string literal in as the example text. But the paste contains the literal characters \n, while in the Python string that sequence is a newline. Instead you should print that string (so it appears as multiple lines) and then paste those individual lines into the form.

The dot does not (normally) match a newline, so the (.*?) can't capture the newlines separating the names you want. You can turn that on by setting DOTALL in the regex.

You'll still have the problem of separating out the next template from the names, but at least you can get the capture to happen.

    import re

    video_info = 'Réalisation :\nPierre Lazarus\nScénario :\nEmmanuelle Moreau\nNoémie Parreaux\nProduction :\nFilmakademie Baden-Württemberg\nSWR\nARTE\nProducteur/-trice :\nGiacomo Vernetti Prot\nJennifer Miola\nImage :\nHovig Hagopian\nMontage :\nMathieu Pluquet\nMusique :\nLouis-Ronan Choisy'
    realisation = re.search(r'Réalisation :\n(.*?)\n', video_info).group(1)
    # (?s) is the inline form of re.DOTALL: '.' now also matches '\n'
    scenario = re.search(r'(?s)Scénario :\n(.*?) :\n', video_info).group(1)
    print(realisation)
    print(scenario)
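To make the two points above concrete, here is a small sketch. It uses a shortened, made-up sample string (the names sample and pasted are assumptions for illustration, not the thread's full data). It shows that a raw string keeps \n as two literal characters, and that '.' only crosses line breaks when DOTALL is set.

```python
import re

# Hypothetical shortened sample, not the full string from the thread.
sample = "Scénario :\nEmmanuelle Moreau\nNoémie Parreaux\nProduction :\nSWR"

# Roughly what a copied source literal looks like: backslash + 'n', no real newline.
pasted = r"Scénario :\nEmmanuelle Moreau"
has_real_newline = "\n" in pasted

# Without DOTALL, '.' stops at a newline, so the capture cannot span two lines.
without_dotall = re.search(r"Scénario :\n(.*?) :\n", sample)

# With DOTALL, '.' also matches '\n', so the lazy capture runs up to the next ' :\n'.
with_dotall = re.search(r"Scénario :\n(.*?) :\n", sample, flags=re.DOTALL)

print(has_real_newline)        # False
print(without_dotall)          # None
print(with_dotall.group(1))    # three lines, the last one being 'Production'
```

The last result also shows the remaining problem: the capture still ends with the word Production, because the pattern only stops at the ' :' that follows it.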
RE: Match substring using regex - Pavel_47 - Jul-18-2022

Thanks, I've tried your suggestion with Python.
It works ... although it doesn't match the original request, i.e. scenario should return only Emmanuelle Moreau, Noémie Parreaux (without Production).
BTW, the 2nd regex doesn't work in regex101.

RE: Match substring using regex - Pavel_47 - Jul-18-2022

BTW, here is a way to solve the problem using ordinary Python stuff (sure, I'm aware that it is probably far from optimal):

    video_info = 'Réalisation :\nPierre Lazarus\nScénario :\nEmmanuelle Moreau\nNoémie Parreaux\nProduction :\nFilmakademie Baden-Württemberg\nSWR\nARTE\nProducteur/-trice :\nGiacomo Vernetti Prot\nJennifer Miola\nImage :\nHovig Hagopian\nMontage :\nMathieu Pluquet\nMusique :\nLouis-Ronan Choisy'
    video_info_list = video_info.split('\n')
    list_keys = []
    list_values = []
    for item in video_info_list:
        # endswith() also copes with an empty line, where item[-1] would raise IndexError
        if item.endswith(':'):
            # Header line such as 'Scénario :'; drop the trailing ' :'
            list_keys.append(item[:-2])
            new_key = True
        else:
            if new_key:
                # First value after a header starts a new entry
                list_values.append(item)
            else:
                # Further values are appended to the current entry
                list_values[-1] = list_values[-1] + ', ' + item
            new_key = False
    video_dict = dict(zip(list_keys, list_values))
    for k, v in video_dict.items():
        print(f"{k:<20}{v}")
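The original request (only the names between "Scénario :" and the next "something :" header) can probably also be met with a regex, by combining DOTALL with a lookahead. This is a minimal sketch, assuming every header line ends in " :" and is followed by at least one value line. The helper name extract_section is hypothetical and not from the thread.

```python
import re

video_info = 'Réalisation :\nPierre Lazarus\nScénario :\nEmmanuelle Moreau\nNoémie Parreaux\nProduction :\nFilmakademie Baden-Württemberg\nSWR\nARTE\nProducteur/-trice :\nGiacomo Vernetti Prot\nJennifer Miola\nImage :\nHovig Hagopian\nMontage :\nMathieu Pluquet\nMusique :\nLouis-Ronan Choisy'


def extract_section(text, label):
    """Return the values listed under 'label :' joined by ', ', or None if absent."""
    # (?=...) is a lookahead: the lazy capture stops right before the next
    # 'something :' header line, or at the end of the string (\Z) for the
    # last section. The lookahead itself is not part of the captured text.
    pattern = r"^" + re.escape(label) + r" :\n(.*?)(?=\n[^\n]* :\n|\Z)"
    match = re.search(pattern, text, flags=re.DOTALL | re.MULTILINE)
    if match is None:
        return None
    return match.group(1).replace("\n", ", ")


print(extract_section(video_info, "Scénario"))  # Emmanuelle Moreau, Noémie Parreaux
print(extract_section(video_info, "Musique"))   # Louis-Ronan Choisy
```

Returning None for a missing label avoids the AttributeError that calling .group(1) directly on a failed re.search would raise.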