Hi all,
I am using Python 3.6 with the python-docx package to parse .docx files.
The code:
import re
from docx import Document

my_doc = Document(doc)  # doc: the .docx file to open

def extract(my_doc, w1, w2):
    tabdata = []
    for table in my_doc.tables:  # loop through all tables in the .docx file
        # only consider tables whose cell (0, 1) contains the keyword
        if re.search("My String", table.cell(0,1).text, re.IGNORECASE):
            for row in table.rows:  # loop through all rows of the matching table
                for cell in row.cells:
                    tabdata.append(cell.text)  # collect each cell instead of overwriting
    return tabdata
The same code behaves differently across document files. Each document contains a combination of text and tables, and I am trying to parse just the tables.
For some files that contain both text and tables, parsing works.
For other files, the error below shows up.
All the files are similar and contain the keywords and tables I am searching for with re.search(). All the tables in the different files have the same number of rows and columns.
The error doesn't show up if the document contains only the tables and no other text/paragraphs.
I am unsure whether the issue is a corrupted docx file, characters in the docx file that my script cannot parse, or something missing in the script.
The error I am facing:
Traceback (most recent call last):
  File "my_python_script.py", line 579, in main_1
    extract(mld,w1,w2)
  File "my_python_script.py", line 128, in extract
    if re.search("My String", table.cell(0,1).text, re.IGNORECASE):
  File $PYTHONPATH/python3.6/site-packages/docx/table.py", line 81, in cell
    return self._cells[cell_idx]
IndexError: list index out of range
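The traceback shows that table.cell(0,1) was asked for a cell that some table does not have. One possible explanation (a guess, not confirmed for these files) is that the failing documents contain an extra table whose first row has fewer than two cells. A minimal sketch for narrowing this down is to check the size of each table before indexing into it and to report the tables that get skipped. The helper names safe_cell_text and extract_matching_tables are hypothetical; the sketch only relies on the tables, rows, cells and text attributes already used in the code above.

```python
import re


def safe_cell_text(table, row_idx, col_idx):
    """Return the text of the cell at (row_idx, col_idx), or None if there is no such cell."""
    rows = table.rows
    if row_idx >= len(rows):
        return None
    cells = rows[row_idx].cells
    if col_idx >= len(cells):
        return None
    return cells[col_idx].text


def extract_matching_tables(document, pattern="My String"):
    """Collect the cell texts of every table whose cell (0, 1) matches pattern."""
    matched = []
    for index, table in enumerate(document.tables):
        text = safe_cell_text(table, 0, 1)
        if text is None:
            # This table is too small to have a cell at (0, 1); report it and move on.
            print(f"table {index}: no cell at (0, 1), skipping")
            continue
        if re.search(pattern, text, re.IGNORECASE):
            # One list of cell texts per row of the matching table.
            matched.append([[cell.text for cell in row.cells] for row in table.rows])
    return matched
```

Calling extract_matching_tables(my_doc) on a failing file should print the index of any table that has no cell at (0, 1), which would show whether an unexpected table is what triggers the IndexError.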
Any help on this issue would be much appreciated!