python - Parse an answer on Quora that contains code
I want to parse a Quora post, or a generic post that contains code.
Example: http://qr.ae/rkplrt
Using Selenium, a Python library, I can get the HTML inside the post:
    import html2text

    h = html2text.HTML2Text()
    # ans is the Selenium element for one answer, found earlier
    content = ans.find_element_by_class_name('inline_editor_value')
    html_string = content.get_attribute('innerHTML')
    text = h.handle(html_string)
    print(text)
I get a single chunk of text. In the case of tables that contain code, html2text inserts many \n characters and does not handle the row indices.
So I see this:
https://imageshack.com/i/paekbzt4p (this is the main div that contains the code table)
https://imageshack.com/i/hlixfayop (the text that html2text extracts)
https://imageshack.com/i/hlhfbxvqp (the final printed text, with the problems of row indices and \n characters)
I tried different settings, such as bypass_tables, described in the guide on GitHub (https://github.com/alir3z4/html2text/blob/master/docs/usage.md#available-options), but had no success.
Could someone tell me how to use html2text in this case?
You don't need to use html2text at all. Selenium can get the "text" directly:
    from selenium import webdriver

    driver = webdriver.Chrome()
    driver.get("http://qr.ae/rkplrt")
    print(driver.find_element_by_class_name('inline_editor_content').text)

It prints the content of the post:
The single line of code must be useful, not meant to be confusing or obfuscating. ... examples you have created or encountered?
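
If the extracted text still contains row-index lines from a code table or long runs of blank lines, a small post-processing step may help. The following is only a sketch using the standard library: the helper name clean_extracted_text and the sample string are hypothetical, and it assumes that any line made up solely of digits is a row index rather than real content.

```python
import re


def clean_extracted_text(text):
    """Drop digit-only lines and collapse runs of blank lines."""
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        # Assumption: a line holding only digits is a row index from the code table.
        if stripped.isdigit():
            continue
        kept.append(line.rstrip())
    joined = "\n".join(kept)
    # Collapse three or more consecutive newlines into a single blank line.
    return re.sub(r"\n{3,}", "\n\n", joined).strip()


# Hypothetical sample imitating text scraped from a post with a code table.
sample = "Intro\n\n\n\n1\n2\nprint('a')\n\n\nprint('b')\n"
print(clean_extracted_text(sample))
```

Note that this would also remove a genuine code line consisting only of a number, so the digit check may need to be tightened for real posts.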