How to summarize an article that is stored in a word document on your laptop?
Python Forum (https://python-forum.io) - Forum: General Coding Help - Thread: /thread-40868.html

How to summarize an article that is stored in a word document on your laptop? - Mikedicenso87 - Oct-05-2023

So I am new here. I wrote some code in PyCharm that summarizes articles online. The code is below, and it works fine. What if I want to summarize an article that is stored in a Word document on my laptop? Can somebody help me with the code? Again, I am using Anaconda Prompt and PyCharm.

    # tkinter, nltk and TextBlob are imported but not used directly below
    import tkinter as tk
    import nltk
    from textblob import TextBlob
    from newspaper import Article

    url = "https://www.news.com/index.html"

    # Download and parse the page, then run newspaper's NLP step to build the summary
    article = Article(url)
    article.download()
    article.parse()
    article.nlp()

    print(f'Title: {article.title}')
    print(f'Authors: {article.authors}')
    print(f'Publication Date: {article.publish_date}')
    print(f'Summary: {article.summary}')

RE: How to summarize an article that is stored in a word document on your laptop? - Pedroski55 - Oct-06-2023

Word documents have metadata. You can access that, if that is what you are looking for. I think a Word document will only have a title, author name, etc. if the author actually puts that data in the document metadata. For personal stuff, I don't think many people will do that. Maybe the created and modified dates are recorded automatically.

I copied this from Stack Overflow:

    # If you don't have it, first install the python-docx module: pip3 install python-docx
    import docx

    path2file = "/home/pedro/myStuff/mydocument1.docx"

    def getMetaData(doc):
        # Copy the document's core properties into a plain dict
        metadata = {}
        prop = doc.core_properties
        metadata["author"] = prop.author
        metadata["category"] = prop.category
        metadata["comments"] = prop.comments
        metadata["content_status"] = prop.content_status
        metadata["created"] = prop.created
        metadata["identifier"] = prop.identifier
        metadata["keywords"] = prop.keywords
        metadata["last_modified_by"] = prop.last_modified_by
        metadata["language"] = prop.language
        metadata["modified"] = prop.modified
        metadata["subject"] = prop.subject
        metadata["title"] = prop.title
        metadata["version"] = prop.version
        return metadata

    doc = docx.Document(path2file)
    metadata_dict = getMetaData(doc)
    for item in metadata_dict.items():
        print(item)

Sometimes I want to get the text from .docx files. I never needed the metadata!

RE: How to summarize an article that is stored in a word document on your laptop? - Mikedicenso87 - Oct-06-2023

This code basically pulls just high-level information. I will try to write new code and will post it when done. Thanks so much, Pedro!
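
The thread stops before anyone posts code that summarizes the body text of a .docx file, so here is a minimal sketch of one way it might be done. It reads the paragraphs with python-docx (the same module used for the metadata above) and then applies a simple word-frequency extractive summarizer. Note the assumptions: this is not the algorithm that newspaper's article.nlp() uses, the function names read_docx_text and summarize are made up for illustration, the stop-word list is a tiny hand-picked one, and the sentence splitting is naive.

```python
import re
from collections import Counter

# Tiny hand-picked stop-word list (an assumption; NLTK ships a far more complete one).
STOP_WORDS = {
    "a", "an", "the", "and", "or", "but", "of", "to", "in", "on", "for",
    "with", "is", "are", "was", "were", "it", "this", "that", "as", "at",
    "by", "be", "from",
}


def read_docx_text(path):
    """Return the body text of a .docx file, one paragraph per line."""
    import docx  # provided by the python-docx package (pip3 install python-docx)

    doc = docx.Document(path)
    # Skip empty paragraphs; text inside tables is not collected here.
    return "\n".join(p.text for p in doc.paragraphs if p.text.strip())


def summarize(text, max_sentences=3):
    """Return the highest-scoring sentences, kept in their original order."""
    # Naive split: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    # Count how often each non-stop word occurs in the whole text.
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(w for w in words if w not in STOP_WORDS)

    def score(sentence):
        # Sum of word frequencies; stop words are absent from freq and count as 0.
        return sum(freq[w] for w in re.findall(r"[a-z']+", sentence.lower()))

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)


# Example usage (path taken from the post above; change it to your own file):
# text = read_docx_text("/home/pedro/myStuff/mydocument1.docx")
# print(f"Summary: {summarize(text)}")
```

Because the score is a plain sum, long sentences tend to win; dividing by the sentence length would be one way to even that out. Headings without a final period get merged into the following sentence by the naive split.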