python - How to find all comments with Beautiful Soup
This question was asked 4 years ago, but the answer is out of date for BS4.
I want to delete all comments in my HTML file using Beautiful Soup. Since BS4 makes each comment a special type of navigable string, I thought this code would work:
    for comments in soup.find_all('comment'):
        comments.decompose()
That didn't work. How do I find all comments using BS4?
You can pass a function to find_all() that checks whether a string is a comment.
For example, given the HTML below:
    <body>
      <!-- branding , main navigation -->
      <div class="branding">the science & safety behind favorite products</div>
      <div class="l-branding">
        <p>just brand</p>
      </div>
      <!-- test comment here -->
      <div class="block_content">
        <a href="https://www.google.com">google</a>
      </div>
    </body>
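Code:
    from bs4 import BeautifulSoup as bs
    from bs4 import Comment
    # ....
    soup = bs(html, 'html.parser')
    # Match every string node whose type is Comment
    comments = soup.find_all(string=lambda text: isinstance(text, Comment))
    for c in comments:
        print(c)
        print("===========")
        # extract() removes the comment node from the tree
        c.extract()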
The output will be:
     branding , main navigation 
    ===========
     test comment here 
    ===========
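Since the goal in the question is to delete the comments rather than just print them, the same filter can be wrapped in a small helper. This is only a sketch: the function name strip_comments and the sample markup are made up for illustration, and it assumes the built-in 'html.parser' backend.

```python
from bs4 import BeautifulSoup, Comment


def strip_comments(html):
    """Return the markup with every comment node removed (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all() returns a list, so removing nodes while looping over it is safe
    for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
        comment.extract()
    return str(soup)


# Illustrative input, not taken from the question
sample = "<body><!-- nav --><p>just brand</p><!-- test --></body>"
cleaned = strip_comments(sample)
print(cleaned)
```

The tags and their text are left untouched; only the Comment nodes are dropped from the tree before it is serialized again.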
BTW, I think the reason why find_all('comment') doesn't work is this (from the Beautiful Soup documentation):
Pass in a value for name and you'll tell Beautiful Soup to only consider tags with certain names. Text strings will be ignored, as will tags whose names don't match.
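A quick way to see the difference is to parse markup that contains both a comment and a tag that happens to be named "comment". The markup here is invented for the demonstration, and 'html.parser' is assumed as the backend.

```python
from bs4 import BeautifulSoup, Comment

# Invented markup: one real comment plus a tag literally named <comment>
html = "<div><!-- a comment --><comment>a tag named comment</comment></div>"
soup = BeautifulSoup(html, "html.parser")

# A plain string argument is treated as a tag name, so only the <comment> tag matches
by_name = soup.find_all("comment")

# The string= filter is applied to text nodes, so the Comment node matches
by_type = soup.find_all(string=lambda text: isinstance(text, Comment))

print(by_name)
print(by_type)
```

So find_all('comment') only ever looks for tags with that name, while the string= filter is what reaches the comment nodes.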