Error while parsing tables from docx file

aditi · (This post was last modified: Jul-14-2020, 09:26 PM by aditi.)

I see. Does this make it better?
The input file (input.txt):

My Document = some_doc.docx
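The input file uses a simple `Key = value` format. The sketch below reads such lines into a dictionary, assuming one setting per line and case-insensitive keys. `read_settings` is a hypothetical helper and is not part of the original script. Unlike the original `rfind("=")`, it splits on the first `=`, so a value may itself contain `=`.

```python
def read_settings(lines):
    """Parse 'Key = value' lines into a dict (hypothetical helper).

    Keys are stripped and lower-cased; values are stripped.
    Lines without '=' are ignored. Only the first '=' separates
    key from value, so values may contain '=' themselves.
    """
    settings = {}
    for line in lines:
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key/value line, skip it
        settings[key.strip().lower()] = value.strip()
    return settings
```

With the input shown above, `with open("input.txt") as fh: my_doc = read_settings(fh).get("my document")` gives `"some_doc.docx"`. If the key is missing, it gives `None` instead of leaving `my_doc` undefined.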

import os
import re
from docx import Document

f = open("sometext.txt", "w")
def get_inputs():
    global my_doc, my_doc_file

    with open('input.txt', 'r') as input_file:             #reading inputs text file
        names = input_file.readlines()
    for name in names:
        if re.search(r'My document', name, re.IGNORECASE):
            pos = name.rfind("=")
            my_doc = name[pos+1:].strip()
    if os.stat(my_doc).st_size == 0:                       #checking the size of my_doc; if it is zero, stop with a message
        raise SystemExit("Empty Document! Please check and retry!")
    my_doc_file = Document(my_doc)                         #reading the .docx file, throws error if it does not exist
    print(my_doc, my_doc_file)


def extract(my_doc):
    tlist = []
    tab_list = []
    #global my_doc, my_doc_file
    for table in my_doc.tables:                                        #looping through all tables in the .docx file
        if re.search("mystring", table.cell(0,1).text, re.IGNORECASE):            
            for row in table.rows:                                  #looping through all rows in the table under consideration
                for cell in row.cells:                                  #looping through all cells in the row; python-docx returns one cell per grid column, so a merged cell appears more than once
                    tabdata = cell.text                                 #cell.text is the text in each cell
                    tabdata = re.sub(r'\s+',"",tabdata)                 #removing all whitespace from the cell text
                    tlist.append(tabdata)                               #tlist collects the text of every cell in the current row
                tab_list.append(tlist)                                  #tab_list collects all rows of the table
                tlist = []
            f.write(str(tab_list) + "\n")                               #f.write() only accepts a str, so convert the list first
            tab_list=[]

def main():
    global my_doc, my_doc_file
    get_inputs()
    extract(my_doc_file)
    f.close()                                              #flush and close the output file
    print("DONE!")

main()

And the error is:

Traceback (most recent call last):
  File "gen_scr.py", line 44, in <module>
    main()
  File "gen_scr.py", line 41, in main
    extract(my_doc_file)
  File "gen_scr.py", line 27, in extract
    if re.search("mystring", table.cell(0,1).text, re.IGNORECASE):            
  File "$PYTHONPATH/python3.6/site-packages/docx/table.py", line 81, in cell
    return self._cells[cell_idx]
IndexError: list index out of range
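The traceback shows that `Table.cell()` indexes into a flat list of the table's grid cells (`self._cells[cell_idx]`). Because of that, `table.cell(0, 1)` raises `IndexError` when some table in the document has too few cells for that position, for example a one-cell table. The sketch below checks the size before reading the cell and skips tables that are too small. It assumes the objects expose python-docx's `tables`, `rows`, `columns` and `cell()`. The helper names `header_text` and `matching_tables` are hypothetical. Note that it also skips one-column tables: with flat indexing, such a table could otherwise return a cell from the wrong row instead of failing.

```python
import re


def header_text(table, row=0, col=1):
    """Return table.cell(row, col).text, or None if that cell does not exist.

    Hypothetical helper: checks the table size first so that small
    tables (e.g. a single cell) are skipped instead of raising IndexError.
    """
    if len(table.rows) <= row or len(table.columns) <= col:
        return None
    try:
        return table.cell(row, col).text
    except IndexError:
        # Irregular tables (merged or missing cells) can still be short.
        return None


def matching_tables(doc, pattern="mystring"):
    """Yield only the tables whose cell (0, 1) matches pattern, ignoring case."""
    for table in doc.tables:
        text = header_text(table)
        if text is not None and re.search(pattern, text, re.IGNORECASE):
            yield table
```

In the original script, `extract()` could loop over `matching_tables(my_doc)` instead of calling `table.cell(0,1)` on every table directly.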
