Apr-26-2018, 05:37 AM
I have created a pandas DataFrame that stores the HTML content of product descriptions. The HTML looks like this:
<p><img src="//ad.xyz.com/s/files/1/2352/2977/files/logo-3_large.png?v=1512189111" alt="10mois 5 in 1 Convertible Baby Bed & Desk"><br><br></p>\n<h1><strong>10 mois 5 in 1 Convertible Baby Bed & Desk<br><br></strong></h1>
Now I need to write a function that parses the HTML with BeautifulSoup and returns a filtered version containing only whitelisted tags.
The whitelist is simply a list of the tags I want to keep:
whitelist = ['p', 'h1', 'b', 'i', 'u', 'br', 'li']
Can anyone please help me achieve this using Python 3.6?
Thanks!
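To make the goal concrete, here is a minimal sketch of the kind of function described above. It is only one possible approach, not a definitive solution. It assumes the `bs4` package is installed and that tags outside the whitelist should be dropped while their inner text and whitelisted children are kept. The function name `filter_html` and the `sample` string are hypothetical, chosen for illustration.

```python
from bs4 import BeautifulSoup

whitelist = ['p', 'h1', 'b', 'i', 'u', 'br', 'li']


def filter_html(html, allowed=whitelist):
    """Return html with every tag not in `allowed` removed.

    Assumption: a disallowed tag is unwrapped, so its text and any
    whitelisted children survive; only the tag itself is dropped.
    """
    soup = BeautifulSoup(html, "html.parser")
    # find_all(True) matches every tag; the result is built up front,
    # so modifying the tree while looping over it is safe.
    for tag in soup.find_all(True):
        if tag.name not in allowed:
            tag.unwrap()  # replace the tag with its own contents
    return str(soup)


# Hypothetical sample modelled on the snippet in the question.
sample = (
    '<p><img src="//ad.xyz.com/s/files/1/2352/2977/files/logo-3_large.png'
    '?v=1512189111" alt="10mois 5 in 1 Convertible Baby Bed & Desk">'
    '<br><br></p>\n'
    '<h1><strong>10 mois 5 in 1 Convertible Baby Bed & Desk'
    '<br><br></strong></h1>'
)

print(filter_html(sample))
```

Applied to a DataFrame, something like `df['description'].apply(filter_html)` should work, where `description` is an assumed column name. Note that `unwrap()` keeps the inner text of a removed tag; for tags such as `script` or `style`, whose contents should disappear as well, `tag.decompose()` may be the better choice. Attributes on whitelisted tags are left untouched in this sketch.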