Python Forum
Failing regex




Failing regex - tester_V - Aug-15-2022

Greetings!
I'm trying to find strings that contain a particular code.
It comes in two forms:
BG13TPPVxxxx, where xxxx is any 4 digits
BG13TPPVxxxxB or BG13TPPVxxxxC, i.e. the same code with the letter "B" or "C" at the end

I have this regex, but it fails: it also picks up codes with SPPV instead of TPPV,
like this one:
BG13SPPVxxxx

if re.search(r'^[a-zA-Z]{2}\d{2}TPPV\d{4}\.|[b|C]\.', x):
Thank you!
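Why the pattern above also matches SPPV: in a regex, `|` has the lowest precedence, so the pattern is really two separate alternatives, `^[a-zA-Z]{2}\d{2}TPPV\d{4}\.` OR `[b|C]\.`, and the second one can match anywhere on its own. (Inside a character class `|` is a literal character, so `[b|C]` means "b, | or C".) The sketch below assumes, as the `\.` in the pattern suggests, that the code is followed by a dot, for example in a file name; the file names are made-up samples.

```python
import re

# The asker's pattern. '|' binds loosest, so this is two alternatives:
#   ^[a-zA-Z]{2}\d{2}TPPV\d{4}\.   OR   [b|C]\.
original = r'^[a-zA-Z]{2}\d{2}TPPV\d{4}\.|[b|C]\.'

# Made-up file name with SPPV: the second alternative finds "C." in it.
m = re.search(original, 'BG13SPPV1234C.txt')
print(m.group())  # C.

# Sketch of a fix: a single pattern, with the B/C suffix made optional by '?'.
fixed = r'^[a-zA-Z]{2}\d{2}TPPV\d{4}[BC]?\.'

candidates = ['BG13TPPV1234.txt', 'BG13TPPV1234B.txt',
              'BG13TPPV1234C.txt', 'BG13SPPV1234C.txt']
results = {c: re.search(fixed, c) is not None for c in candidates}
for c, ok in results.items():
    print(c, ok)
```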


RE: Failing regex - snippsat - Aug-15-2022

This should work.
>>> import re
>>> 
>>> s = 'BG13TPPV1234B'
>>> r = re.search(r"\w+\d{2}TPPV\d+[BC]", s)
>>> r.group()
'BG13TPPV1234B'
>>> 
>>> s = 'BG13TPPV9999C'
>>> r = re.search(r"\w+\d{2}TPPV\d+[BC]", s)
>>> r.group()
'BG13TPPV9999C'
>>> 
>>> s = 'BG13SPPV1245B'
>>> r = re.search(r"\w+\d{2}TPPV\d+[BC]", s)
>>> r.group()
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
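The pattern above requires the B or C at the end, so a plain BG13TPPVxxxx is not found, and `\d+` accepts any number of digits. When each string is expected to hold exactly one code, one option is `re.fullmatch()`, which only succeeds if the whole string matches, combined with a check for `None` before calling `.group()` (that `None` is what causes the `AttributeError` above). A sketch, assuming a two-letter prefix as in the asker's own `[a-zA-Z]{2}`; `PATTERN` and `check` are illustrative names only.

```python
import re

# Assumption: the prefix is two letters, as in the asker's [a-zA-Z]{2}.
PATTERN = re.compile(r"[A-Za-z]{2}\d{2}TPPV\d{4}[BC]?")

def check(s):
    """Hypothetical helper: return the code if s is exactly one code, else None."""
    m = PATTERN.fullmatch(s)  # the whole string must match, not just a part
    # fullmatch() returns None on failure, so test it before calling .group().
    return m.group() if m else None

for s in ["BG13TPPV1234", "BG13TPPV1234B", "BG13TPPV9999C",
          "BG13SPPV1245B", "BG13TPPV12345"]:
    print(s, "->", check(s))
```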



RE: Failing regex - tester_V - Aug-15-2022

I never thought about it as three different search words!
Let me try it...

Thank you!


RE: Failing regex - deanhystad - Aug-16-2022

This kind of works.
import re

text = ".BG13TPPV123..BG13TPPV2345...BG13TPPV3456B...BG13TPPV4567C...BG13TPPV7890D"

print(re.findall(r"\w+\d{2}TPPV\d{4}[BC]?", text))
Output:
['BG13TPPV2345', 'BG13TPPV3456B', 'BG13TPPV4567C', 'BG13TPPV7890']
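Notice that it matches part of "BG13TPPV7890D": since the [BC]? suffix is optional, "BG13TPPV7890" on its own already matches the pattern and the "D" is simply left out. "BG13TPPV123" is skipped because it has only three digits.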
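The `\w+` at the start is loose as well: it takes any run of letters, digits or underscores before the two digits, so extra leading characters end up inside the match. A sketch, assuming the prefix is always exactly two letters (as in the asker's `[a-zA-Z]{2}`) and using a made-up sample string, puts a word boundary `\b` in front of a fixed-length prefix instead:

```python
import re

# Made-up sample: the first code has two extra letters glued to the front.
text = "xxBG13TPPV2345 BG13TPPV3456B"

# \w+ happily absorbs the extra leading "xx".
loose = re.findall(r"\w+\d{2}TPPV\d{4}[BC]?", text)

# \b demands a word boundary right before the two-letter prefix,
# so "xxBG13TPPV2345" is rejected as a whole.
strict = re.findall(r"\b[A-Za-z]{2}\d{2}TPPV\d{4}[BC]?", text)

print(loose)   # ['xxBG13TPPV2345', 'BG13TPPV3456B']
print(strict)  # ['BG13TPPV3456B']
```

This only tightens the left edge; on its own it would still return a partial match for "BG13TPPV7890D".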

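A stricter match is possible if we are willing to specify the character that follows the string. This pattern says the string must be followed by a non-word character (\W: anything that is not a letter, digit or underscore, such as whitespace or punctuation). Because the pattern contains one group, findall() returns only the part inside the parentheses.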
import re

text = ".BG13TPPV123..BG13TPPV2345...BG13TPPV3456B...BG13TPPV4567C...BG13TPPV7890D."

print(re.findall(r"(\w+\d{2}TPPV\d{4}[BC]?)\W", text))
Output:
['BG13TPPV2345', 'BG13TPPV3456B', 'BG13TPPV4567C']
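Because `\W` has to consume an actual character, this version misses a code that sits at the very end of the string, which is presumably why the second `text` ends with an extra ".". The zero-width word boundary `\b` avoids that and needs no capture group, since nothing after the code is consumed. A sketch with a made-up sample, again assuming a two-letter prefix:

```python
import re

# Made-up sample: the last code is at the very end, with no trailing punctuation.
text = "BG13TPPV2345 BG13TPPV7890D BG13TPPV4567C"

# \W must consume a real character, so the final code is not found.
consuming = re.findall(r"(\w+\d{2}TPPV\d{4}[BC]?)\W", text)

# \b is zero-width: it also matches at the end of the string,
# and on both sides it rejects codes glued to other word characters.
zero_width = re.findall(r"\b[A-Za-z]{2}\d{2}TPPV\d{4}[BC]?\b", text)

print(consuming)   # ['BG13TPPV2345']
print(zero_width)  # ['BG13TPPV2345', 'BG13TPPV4567C']
```

"BG13TPPV7890D" is rejected by both patterns, because the "D" is a word character directly after the digits.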