HTML Decoder pandas dataframe column
Python Forum (https://python-forum.io), Forum: Data Science, Thread: /thread-40800.html
HTML Decoder pandas dataframe column - mbrown009 - Sep-27-2023

I am getting HTML that I want to decode. If I do it with an example it works, but not with my pandas dataframe. Any suggestions?

    #!/usr/bin/env python
    # coding: utf-8

    # import statements
    import requests
    import pandas as pd
    import html

    # constants
    url = "https://chartexp1.sha.maryland.gov/CHARTExportClientService/getDMSMapDataJSON.do"

    # getting response
    response = requests.request("GET", url).json()

    # converting to dataframe
    df = pd.DataFrame(response['data'])

    # adding new column/converting msgHTML Encoded to decoded
    df['decodedHtml'] = html.unescape(df['msgHTML'])

    # saving dataframe to csv
    df.to_csv('output/response_python.csv')

    ##TESTING ONLY##
    myHtml = "<body><h1> How to use html.unescape() in Python </h1></body>"
    encodedHtml = html.escape(myHtml)
    print("Encoded HTML: ", encodedHtml)
    decodedHtml = html.unescape(encodedHtml)
    print("Decoded HTML: ", decodedHtml)
    print(html.unescape('© 2023'))

RE: HTML Decoder pandas dataframe column - noisefloor - Sep-27-2023

Hi,

the information provided is a bit thin... Is `response['data']` really HTML? I tried to make an API call with the URL from your post, but I receive a time-out error... What do you get instead when exporting your dataframe to CSV?

Regards, noisefloor

RE: HTML Decoder pandas dataframe column - mbrown009 - Sep-27-2023

Thanks for the reply.
I apologize for the lack of information. The issue is with the following line:

    # adding new column/converting msgHTML Encoded to decoded
    df['decodedHtml'] = html.unescape(df['msgHTML'])

The issue is that df['msgHTML'] has content similar to the following:

    <table class='dmsMsg'><tr class='dmsMsgRow'><td class='dmsMsgTextCenter'>I-695 15 MILES</td></tr><tr class='dmsMsgRow'><td class='dmsMsgTextCenter'>&nbsp;</td></tr><tr class='dmsMsgRow'><td class='dmsMsgTextCenter'> 14 MINUTES</td></tr></table>

What I am attempting to do is convert that to the following format:

    <table class='dmsMsg'><tr class='dmsMsgRow'><td class='dmsMsgTextCenter'>I-695 15 MILES</td></tr><tr class='dmsMsgRow'><td class='dmsMsgTextCenter'> </td></tr><tr class='dmsMsgRow'><td class='dmsMsgTextCenter'> 14 MINUTES</td></tr></table>

RE: HTML Decoder pandas dataframe column - deanhystad - Sep-29-2023

html.unescape(str) works on a single string, so it cannot be used as a vectorized operation on a whole column. You have to fall back to calling it once per element with Series.apply(func):

    df["msgHTML"] = df["msgHTML"].apply(html.unescape)
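
For anyone who cannot reach the API, here is a minimal sketch of the apply approach suggested above. The three-row frame is made-up sample data standing in for response['data'], and unescape_or_keep is a hypothetical helper name, not something from the thread. It is only needed if the column can contain missing values, since html.unescape raises a TypeError when handed None.

```python
import html

import pandas as pd

# Hypothetical sample data standing in for the API response; in the
# original post the 'msgHTML' column comes from response['data'].
df = pd.DataFrame(
    {
        "msgHTML": [
            "&lt;td class=&#x27;dmsMsgTextCenter&#x27;&gt;I-695 15 MILES&lt;/td&gt;",
            "&lt;td&gt;14 MINUTES&lt;/td&gt;",
            None,  # assumed: some rows may have no message
        ]
    }
)


def unescape_or_keep(value):
    """Unescape strings; pass anything else (None, NaN) through unchanged."""
    return html.unescape(value) if isinstance(value, str) else value


# Series.apply calls the function once per element of the column.
df["decodedHtml"] = df["msgHTML"].apply(unescape_or_keep)

print(df["decodedHtml"].tolist())
```

If the column never has missing values, df["msgHTML"].apply(html.unescape) on its own should be enough. As far as I can tell, the original html.unescape(df['msgHTML']) call does not raise an error in CPython; it seems to hand the Series back unchanged, which would explain why the new column looked identical to the old one.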