After months of putzing around with toy code and simple example scripts, I decided to get started on some real code I will actually use: extending my file name filter collection, but in Python (instead of Perl).
Python, I see, has a few gotchas, like being finicky about the backslash and requiring extras where I am not used to them. \. doesn't seem to work to escape a dot! I need \\. instead. And '.'+foo seemed to want to produce '.(space)/foo' for file operations.
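The backslash trouble looks like ordinary string-literal escaping rather than anything specific to re. A minimal sketch, assuming a raw string (the r prefix) is acceptable: it hands backslashes to the regex engine untouched, so \. needs no doubling. The sample file name and the folder name are made up for illustration, and PureWindowsPath is used only so the output is the same on any platform.

```python
import re
from pathlib import PureWindowsPath

# In a raw string the backslash is kept as-is, so r"\." is the
# two-character regex "backslash, dot", the same as "\\." spelled out.
assert r"\." == "\\."

sample = "my.holiday.photo"            # hypothetical file stem
no_dots = re.sub(r"\.", " ", sample)   # replace every literal dot
print(no_dots)                         # my holiday photo

# A path object joins the pieces itself, so there is no need to glue
# '.' and separators together by hand.
joined = PureWindowsPath("photos") / "my.holiday.photo.jpg"
print(joined)                          # photos\my.holiday.photo.jpg
print(joined.stem, joined.suffix)      # my.holiday.photo .jpg
```

On a real run, pathlib.Path would take the place of PureWindowsPath.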
One potential issue here is that these scripts are disastrous if used in the wrong directories.
In Perl I can force the scripts to run only at a minimum of 3-4 levels deep in the directory structure. I don't know how to do this in Python (or how to count the bloody backslashes!). I can limit them to a specific directory, but that is not practical here, as these are run from a bunch of directories.
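A depth check along these lines might do the job without counting backslashes at all, since pathlib already splits a path into its parts. This is only a sketch: MIN_DEPTH, depth_of and require_depth are hypothetical names, and the threshold of 3 is an assumption taken from the "3-4 levels" mentioned above.

```python
import sys
from pathlib import Path, PurePath, PureWindowsPath

MIN_DEPTH = 3  # assumed threshold; adjust to taste


def depth_of(directory: PurePath) -> int:
    """Number of folders below the drive/root of an absolute path."""
    # parts[0] is the anchor ('C:\\' or '/'), so it is not counted.
    return len(directory.parts) - 1


def require_depth(min_depth: int = MIN_DEPTH) -> Path:
    """Exit unless the current directory is at least min_depth levels deep."""
    here = Path.cwd().resolve()
    if depth_of(here) < min_depth:
        sys.exit(f"Refusing to run in {here}: fewer than {min_depth} levels deep")
    return here


# Pure paths never touch the disk, so these lines are safe to try anywhere.
print(depth_of(PureWindowsPath(r"C:\Users\me\pics\sorted")))  # 4
print(depth_of(PureWindowsPath("C:\\")))                      # 0
```

Calling require_depth() as the first statement of a rename script would stop it before any file is touched.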
Is re the best library for regexes? In normal practice I often use them in place of split, but I don't see a simple way of doing that with re.sub.
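For the split use case, the re module that ships with Python has re.split, which is probably closer to the Perl habit than re.sub. A small sketch with a made-up file stem:

```python
import re

name = "2019-07-04_beach.trip  final"   # hypothetical file stem

# Split on any run of hyphens, underscores, dots or whitespace.
words = re.split(r"[-_.\s]+", name)
print(words)             # ['2019', '07', '04', 'beach', 'trip', 'final']

# maxsplit limits the number of splits, counted from the left.
first, rest = re.split(r"[-_.\s]+", name, maxsplit=1)
print(first, "|", rest)  # 2019 | 07-04_beach.trip  final

# A capturing group keeps the separators in the result.
pieces = re.split(r"([-_.])", "a-b_c")
print(pieces)            # ['a', '-', 'b', '_', 'c']
```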
Any pointers on fixing and cleaning up the code would be greatly appreciated, as would non-trivial links on doing some heavy lifting with real-world regexes. The basic tutorials are often so simplistic as to be confusing.
Until I get up to speed, I will be over-commenting my scripts. A screwup can cause, and has caused, the loss of hundreds of files here.
I am over the 'hump' now. Python has passed the *acid test* and I do love the simplicity of file and directory access here. And parsing arrays like strings!
```python
#!c:\python38\python38.exe
"""
This is designed to be a general filter for renaming files.
All lines in FILTER section are meant to be replaceable for the situation at hand.

DANGER! DANGER!!! WILL ROBINSON!!!!!
This script is designed to be run ONLY in an isolated directory with select files.
It *WILL* likely kill a system directory.
"""
import os
import pathlib
import time
import re
from pathlib import Path

path = '.'
for file in os.listdir(path):
    dir = [os.path.join(path, file)]          # Directory Listing
    for filename in dir:                      # Each file - must be in list format else will parse as chars
        newname = filename                    # Keep original name
        if Path(newname).is_file():           # Only if file
            if re.search(r"\.py\Z", newname):
                break                         # dont do .py files
            base = os.path.splitext(newname)  # split name[0] and extension[1]
            extension = base[1]               # extension
            if extension == '.py':
                break                         # dont do .py files
            newfile = base[0]                 # file name. Remove .extensions for now.

            ###################################### FILTERS
            newfile = re.sub(r"\-", " ", newfile)            # substitutions here, with mutation
            newfile = re.sub(r'\d\d\d\d\d+', r' ', newfile)  # Kill numbers greater than date
            newfile = re.sub(r'_', r' ', newfile)
            # Kill dots. Negative lookahead on ^ to make sure not to delete a dot at the start of the name.
            newfile = re.sub(r"(?!^)\.", r" ", newfile)
            newfile = re.sub(r"\s+", " ", newfile)           # Kill extra spaces
            ######################################

            newfilext = newfile + extension
            # Inelegant way to make sure equal strings match, as newfilext
            # cannot equal filename with .(space)\ at start
            a = re.sub(' ', '', newfilext).strip()
            b = re.sub(' ', '', filename).strip()
            # print(a, b)                     # For testing
            if a != b:
                print(a, b)
            # print(newfilext+filename)       # For testing
            if a != b:                        # dont overwrite existing
                print(f"{filename} is being moved to {newfilext}")
                os.rename(filename, newfilext)  # rename old to new
            # time.sleep(3)                   # For testing
```

The code works, albeit with some rough edges that need to be ironed out as this expands to a couple of hundred lines.
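One possible way to keep the rough edges manageable as the script grows is to hold the FILTER section in a list and to build a rename plan before touching the disk. This is a sketch under assumptions, not a drop-in replacement: FILTERS, clean_stem and plan_renames are hypothetical names, the final strip() is an addition the original script does not have, and the check for an existing target is one guess at what "dont overwrite existing" was meant to guard against.

```python
import os
import re

# (pattern, replacement) pairs, applied in order; edit this list per job.
FILTERS = [
    (r"-", " "),         # hyphens to spaces
    (r"\d{5,}", " "),    # runs of five or more digits
    (r"_", " "),         # underscores to spaces
    (r"(?!^)\.", " "),   # dots, except one at the very start
    (r"\s+", " "),       # collapse whitespace
]


def clean_stem(stem):
    """Apply every filter to a file name without its extension."""
    for pattern, replacement in FILTERS:
        stem = re.sub(pattern, replacement, stem)
    return stem.strip()  # assumption: stray edge spaces are unwanted


def plan_renames(directory="."):
    """Return (old_name, new_name) pairs without renaming anything."""
    plan = []
    for entry in os.scandir(directory):
        if not entry.is_file() or entry.name.endswith(".py"):
            continue     # skip folders and .py files
        stem, extension = os.path.splitext(entry.name)
        new_name = clean_stem(stem) + extension
        target = os.path.join(directory, new_name)
        if new_name != entry.name and not os.path.exists(target):
            plan.append((entry.name, new_name))
    return plan


print(clean_stem("my-file_name.v2.20190704123"))  # my file name v2
```

Printing the result of plan_renames() gives a dry run to read through; os.rename can then be applied to each pair once the list looks right.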