Python Forum
Error while parsing tables from docx file
I see. Does this make it better?
The input file (input.txt):

My Document = some_doc.docx
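Here is a minimal sketch of reading that line more defensively, assuming input.txt only needs this one key. `read_doc_path` is a hypothetical helper and not part of the script. It returns `None` when no `My Document = ...` line exists. In `get_inputs` as posted, `my_doc` would then never be assigned, and `os.stat(my_doc)` would raise `NameError`. It also splits on the first `=` instead of the last, so a path that contains `=` is kept whole. Its key match is stricter than the original `re.search`, because the key must be the whole text before the `=`.

```python
import re


def read_doc_path(lines):
    """Return the text after '=' on the first 'My Document = ...' line, or None.

    Hypothetical helper (not in the original script). The key is matched
    case-insensitively, like re.search(r'My document', ..., re.IGNORECASE),
    but it must be the whole text before the first '='.
    """
    for line in lines:
        key, sep, value = line.partition("=")   # split on the first '=' only
        if sep and re.fullmatch(r"\s*my document\s*", key, re.IGNORECASE):
            return value.strip()                # drop surrounding spaces and the newline
    return None                                 # no matching line: let the caller decide
```

In `get_inputs`, `my_doc = read_doc_path(names)` followed by an explicit `if my_doc is None:` check would replace the `for` loop.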

The script (gen_scr.py):

import os
import re
from docx import Document

f = open("sometext.txt", "w")
def get_inputs():
    global my_doc, my_doc_file

    input_file = open('input.txt','r')                     #reading inputs text file
    names = input_file.readlines()
    input_file.close()
    for i in range(0,len(names)):
        if re.search(r'My document',names[i],re.IGNORECASE):
            pos = str(names[i]).rfind("=")
            my_doc = str(names[i][pos+1:]).strip()
    if os.stat(my_doc).st_size == 0:                       #checking the size of my_doc, if zero then message displays
        print("Empty Document! Please check and retry!")    
    my_doc_file = Document(my_doc)                         #reading the .docx file, throws error if it does not exist
    print(my_doc, my_doc_file)


def extract(my_doc):
    tlist = []
    tab_list = []
    #global my_doc, my_doc_file
    for table in my_doc.tables:                                        #looping through all tables in the .docx file
        if re.search("mystring", table.cell(0,1).text, re.IGNORECASE):            
            for row in table.rows:                                  #looping through all rows in the table under consideration
                for cell in row.cells:                                  #looping through all cells(grid cells are considered, not actual) in the row under consideration
                    tabdata = cell.text                                 #cell.text is the text in each cell
                    tabdata = re.sub(r'\s+',"",tabdata)                 #removing all whitespace from the cell text
                    tlist.append(tabdata)                               #appending to tlist(tlist is a list/array) a list of all the text in a row
                tab_list.append(tlist)                                  #list of all rows
                tlist = []
            f.write(str(tab_list) + "\n")                      #write() takes a str, so convert the list of rows
            tab_list=[]

def main():
    global my_doc, my_doc_file
    get_inputs()
    extract(my_doc_file)
    print("DONE!")

main()
And the error is:
Error:
Traceback (most recent call last):
  File "gen_scr.py", line 44, in <module>
    main()
  File "gen_scr.py", line 41, in main
    extract(my_doc_file)
  File "gen_scr.py", line 27, in extract
    if re.search("mystring", table.cell(0,1).text, re.IGNORECASE):
  File "$PYTHONPATH/python3.6/site-packages/docx/table.py", line 81, in cell
    return self._cells[cell_idx]
IndexError: list index out of range
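The traceback ends in python-docx's `Table.cell()`, which looks up index `col_idx + row_idx * column_count` in a flat list of grid cells. That is the `self._cells[cell_idx]` line. For `cell(0, 1)` the index is always 1, so the `IndexError` means the list python-docx built for that table has at most one entry, as in a single-cell table. A one-column table with several rows would not raise here. It would silently return the first cell of the second row instead.

The sketch below is a guarded version that assumes the rest of the script stays as posted. `header_text` and `extract_tables` are hypothetical names. `header_text` checks `len(table.columns)` before calling `table.cell(0, 1)` and still catches `IndexError` for irregular tables. `extract_tables` writes each row as one tab-separated string because `file.write()` accepts only `str`.

```python
import re


def header_text(table):
    """Return the text of cell (0, 1), or None if the table has no usable second column."""
    if len(table.columns) < 2:   # one column: cell(0, 1) is out of range or a different cell
        return None
    try:
        return table.cell(0, 1).text
    except IndexError:           # flat cell list shorter than the grid (irregular table)
        return None


def extract_tables(doc, out_file, pattern="mystring"):
    """Write every row of each table whose cell (0, 1) matches pattern.

    Each row becomes one tab-separated line with all whitespace removed from
    the cell text, as in the original; a blank line separates tables.
    Returns the number of tables written.
    """
    written = 0
    for table in doc.tables:
        text = header_text(table)
        if text is None or not re.search(pattern, text, re.IGNORECASE):
            continue             # skip tables without a matching (0, 1) cell
        for row in table.rows:
            cells = [re.sub(r"\s+", "", cell.text) for cell in row.cells]
            out_file.write("\t".join(cells) + "\n")   # write() accepts a str, not a list
        out_file.write("\n")
        written += 1
    return written
```

In `main()` this could be called as `extract_tables(my_doc_file, f)` with `f` opened through `with open("sometext.txt", "w") as f:`, which also closes the file when done. Tabs are a safe separator here because the cell text has already had all whitespace removed.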
Posted by aditi, Jul-14-2020, 09:24 PM
