Python Forum
How to summarize an article that is stored in a word document on your laptop? - Printable Version


How to summarize an article that is stored in a word document on your laptop? - Mikedicenso87 - Oct-05-2023

I am new here. I wrote some code in PyCharm that summarizes online articles; it is shown below and works fine. What if I want to summarize an article that is stored in a Word document on my laptop instead? Can somebody help me with the code? I am using Anaconda Prompt and PyCharm.

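import tkinter as tk            # not used in this snippet
import nltk                     # newspaper's nlp() step depends on NLTK
from textblob import TextBlob   # not used in this snippet
from newspaper import Article

# Page to summarize
url = "https://www.news.com/index.html"

article = Article(url)

# Fetch the page, then extract the title, authors, date and body text
article.download()
article.parse()

# Build the keywords and the summary from the parsed text
article.nlp()

print(f'Title: {article.title}')
print(f'Authors: {article.authors}')
print(f'Publication Date: {article.publish_date}')
print(f'Summary: {article.summary}')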



RE: How to summarize an article that is stored in a word document on your laptop? - Pedroski55 - Oct-06-2023

Word documents have metadata information. You can access that, if that is what you are looking for.

I think a word document will only have a title, author name, etc. if the author actually puts that data in the document metadata.

For personal stuff, I don't think many people will do that.

Maybe the publish date and modified date are recorded automatically.

I copied this from Stack Overflow:

# if you don't have it, first install python-docx module: pip3 install python-docx
import docx

path2file = "/home/pedro/myStuff/mydocument1.docx"

def getMetaData(doc):
    metadata = {}
    prop = doc.core_properties
    metadata["author"] = prop.author
    metadata["category"] = prop.category
    metadata["comments"] = prop.comments
    metadata["content_status"] = prop.content_status
    metadata["created"] = prop.created
    metadata["identifier"] = prop.identifier
    metadata["keywords"] = prop.keywords
    metadata["last_modified_by"] = prop.last_modified_by
    metadata["language"] = prop.language
    metadata["modified"] = prop.modified
    metadata["subject"] = prop.subject
    metadata["title"] = prop.title
    metadata["version"] = prop.version
    return metadata

doc = docx.Document(path2file)
metadata_dict = getMetaData(doc)
for item in metadata_dict.items():
    print(item)

Sometimes I want to get the text from .docx files. I never needed the metadata!
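
Getting the text out is probably closer to what the original question needs. Below is a minimal sketch of that, assuming python-docx is installed and the article is a plain .docx made of ordinary paragraphs. The names document_text, read_docx_text, summarize and STOP_WORDS are made up for this example. The word-frequency scoring is a simple stand-in and not what newspaper's article.nlp() does, and the path in the usage comment is just the one from the metadata example above.

```python
import re
from collections import Counter


def document_text(doc):
    """Join the non-empty paragraphs of a python-docx Document into one string."""
    return "\n".join(p.text.strip() for p in doc.paragraphs if p.text.strip())


def read_docx_text(path):
    """Open a .docx file and return its body text (needs python-docx)."""
    import docx  # imported here so summarize() can be tried without python-docx
    return document_text(docx.Document(path))


# A tiny stop-word list, for illustration only; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
              "that", "this", "for", "on", "with", "as", "are", "was", "be"}


def summarize(text, max_sentences=3):
    """Return the highest-scoring sentences, kept in their original order."""
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    # Count how often each non-stop word appears in the whole text.
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(w for w in words if w not in STOP_WORDS)

    def score(sentence):
        # Sum of word frequencies; stop words were never counted, so they add 0.
        return sum(freq[w] for w in re.findall(r"[a-z']+", sentence.lower()))

    # Rank sentence indexes by score, keep the best ones, restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)


# Example usage (path assumed, taken from the metadata example):
# text = read_docx_text("/home/pedro/myStuff/mydocument1.docx")
# print(f"Summary: {summarize(text)}")
```

Because the score is a plain sum, long sentences tend to win, and headings without a full stop get merged into the sentence that follows them. Text inside tables is not included, since only doc.paragraphs is read.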


RE: How to summarize an article that is stored in a word document on your laptop? - Mikedicenso87 - Oct-06-2023

This code only pulls the high-level metadata, not the article text. I will try to write new code and will post it when done. Thanks so much, Pedro!