python - Parse an answer on Quora that contains code


I want to parse a post on Quora, or any generic post, that contains code.
For example: http://qr.ae/rkplrt

Using Selenium, a Python library, I can get the HTML inside the post:

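    import html2text
    from selenium.webdriver.common.by import By

    # `ans` is the WebElement of the answer, located earlier (not shown)
    h = html2text.HTML2Text()
    content = ans.find_element(By.CLASS_NAME, 'inline_editor_value')
    html_string = content.get_attribute('innerHTML')
    text = h.handle(html_string)
    print(text)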

I get a single chunk of text. When the post contains tables of code, html2text inserts many \n characters and does not handle the row indices correctly.

You can see this here:
https://imageshack.com/i/paekbzt4p (the main div, which contains the code table)
https://imageshack.com/i/hlixfayop (the text html2text extracts)
https://imageshack.com/i/hlhfbxvqp (the final printed text, with the problems in the row indices and the \n characters)
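As a partial workaround for the extra newlines (it does nothing about the row indices), the html2text output could be post-processed. This is a minimal standard-library sketch; the helper name tidy_html2text_output is hypothetical and not part of html2text. Note that it also shrinks intentional double blank lines inside the code.

```python
import re


def tidy_html2text_output(text: str) -> str:
    """Hypothetical post-processing step for html2text output.

    - strips trailing whitespace from every line
    - collapses runs of three or more newlines into one blank line
    """
    # Remove trailing spaces/tabs at the end of each line.
    text = "\n".join(line.rstrip() for line in text.splitlines())
    # Replace 3+ consecutive newlines (2+ blank lines) with a single blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    # Drop leading/trailing newlines around the whole text.
    return text.strip("\n")


sample = "Intro\n\n\n\n    x = 1   \n\n\n\n    print(x)\n"
print(tidy_html2text_output(sample))
```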

I have tried different settings, such as bypass_tables, listed in the usage guide on GitHub (https://github.com/alir3z4/html2text/blob/master/docs/usage.md#available-options), but had no success.
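Another option is to skip html2text for the code table and read the cells directly from the innerHTML string. The sketch below uses only the standard library's html.parser. It assumes, hypothetically, that each code row looks like <tr><td>N</td><td>code</td></tr> (row index first, code last); the real Quora markup was not checked, so inspect it and adjust. The names CodeTableParser and extract_code are illustrative only.

```python
from html.parser import HTMLParser


class CodeTableParser(HTMLParser):
    """Collect the last cell of each table row, dropping the row-index cells.

    Hypothetical markup assumption: <tr><td>N</td><td>code</td></tr>.
    """

    def __init__(self):
        super().__init__()  # convert_charrefs=True: &nbsp; arrives as "\xa0"
        self.lines = []     # extracted code lines, in document order
        self._row = None    # cell texts of the current <tr>, or None
        self._cell = None   # text fragments of the current <td>, or None

    def _close_cell(self):
        # Store the current cell's text in the current row, if any.
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell))
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._close_cell()  # tolerate an omitted </td>
            self._cell = []
        elif tag == "br" and self._cell is not None:
            self._cell.append("\n")

    def handle_endtag(self, tag):
        if tag == "td":
            self._close_cell()
        elif tag == "tr" and self._row is not None:
            self._close_cell()
            if self._row:
                # Keep only the last cell; earlier cells hold the row index.
                self.lines.append(self._row[-1].replace("\xa0", " ").rstrip())
            self._row = None

    def handle_data(self, data):
        # Only text inside a <td> is collected; whitespace between tags is ignored.
        if self._cell is not None:
            self._cell.append(data)


def extract_code(html_string):
    """Return the code column of every table in html_string, one line per row."""
    parser = CodeTableParser()
    parser.feed(html_string)
    parser.close()
    return "\n".join(parser.lines)


html = ("<table><tr><td>1</td><td>for i in range(3):</td></tr>"
        "<tr><td>2</td><td>&nbsp;&nbsp;&nbsp;&nbsp;print(i)</td></tr></table>")
print(extract_code(html))
```

In the question's code, html_string from content.get_attribute('innerHTML') would be passed to extract_code instead of h.handle.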

Could you tell me how to use html2text in this case?

You don't need to use html2text at all.

Selenium can get the text directly through an element's .text property:

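    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    driver.get("http://qr.ae/rkplrt")
    # .text returns the element's visible text, so no HTML-to-text conversion is needed
    print(driver.find_element(By.CLASS_NAME, 'inline_editor_content').text)
    driver.quit()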

It prints the content of the post:

    The single line of code must be useful, not meant to be confusing or obfuscating.  ...  What examples have you created or encountered?
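find_element returns only the first matching element, so on a page with several answers only one is printed. A minimal sketch, assuming a driver that has already loaded the page: find_elements returns every match. The helper name post_texts is hypothetical; the string "class name" is the value of Selenium's By.CLASS_NAME, so the function itself needs no Selenium import.

```python
def post_texts(driver, class_name="inline_editor_content"):
    """Return the visible text of every element with the given CSS class.

    `driver` is assumed to be a Selenium WebDriver with the page already loaded.
    """
    # "class name" is the locator string behind By.CLASS_NAME.
    elements = driver.find_elements("class name", class_name)
    # .text gives each element's rendered text, with line breaks preserved.
    return [element.text for element in elements]
```

With the driver from the snippet above (before calling driver.quit()), post_texts(driver) would return one string per matching element.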
