Dec-09-2022, 08:00 PM
Hi. I'm trying to create an excel sheet using data made up from specific, recurring blocks of text in a .txt file and ignoring all other data in the file (table of contents, etc). The blocks of text are in the format shown here and I'm having two different issues with the output in the final excel sheet.
Currently, my code is using a dictionary made by searching for a block pattern, inserting it into a data-frame (pandas) and exporting to excel. The issue I'm having, however, is that it creates a new row for each line in the text block rather than creating a new row for each block. Shown in the attached image.
How can I prevent this with my block pattern?
The second issue is that the top item ("ITEMNAME") is not in the same format as it doesn't have a "key" to use in the dictionary. How would I include it in the block pattern to add to the dictionary and export being in a different format?
Code:
First change:
>
Second Change:
I've been scouring the internet for solutions and have really had no luck. Thanks in advance for any assistance.
Quote:ItemName
Status: StatusItem
Category: CategoryItem
Sub-Category: SubcategoryItem
Date: MM/dd/yyyy
Frequency: FrequenceItem
Language: LanguageItem
Description: DescriptionText
Currently, my code is using a dictionary made by searching for a block pattern, inserting it into a data-frame (pandas) and exporting to excel. The issue I'm having, however, is that it creates a new row for each line in the text block rather than creating a new row for each block. Shown in the attached image.
How can I prevent this with my block pattern?
The second issue is that the top item ("ITEMNAME") is not in the same format as it doesn't have a "key" to use in the dictionary. How would I include it in the block pattern to add to the dictionary and export being in a different format?
Code:
import pandas as pd import re txtFilePath = r'/test.txt' with open(txtFilePath, 'r') as f: text = f.read() # define pattern for block search - first method block_pattern = re.compile(r"^(?:Status|Event Category|Event Sub-Category|" r"Added Date|Update Frequency|Language|Description):.+$", re.MULTILINE) # find all blocks that match the pattern blocks = block_pattern.findall(text) df = pd.DataFrame() # columns=["Status", "Category", "Sub-Category", "Date", "Frequency", "Language", "Description"] # iterate over blocks and extract patterns for block in blocks: # split blocks on individual lines lines = block.strip().split("\n") # dict to store items for this block item_dict = {} # iterate to find type and values for line in lines: item_parts = line.strip().split(":", 1) item_type = item_parts[0].strip() item_value = item_parts[1].strip() # add item to dict item_dict[item_type] = item_value df = df.append(item_dict, ignore_index=True) # print(df) df.to_excel(r'/text.xlsx')I've tried two changes to the append line to fix the extra rows.
First change:
df.loc[0] = df.append(item_dict, ignore_index=True)but it gives this error:
>
Quote:ValueError: cannot set row with mismatched columns
Second Change:
df.loc[0] = item_dictThis change creates the columns but imports no data from the dictionary. The excel sheet has columns but nothing else.
I've been scouring the internet for solutions and have really had no luck. Thanks in advance for any assistance.