upsies.utils.html

HTML parsing

Functions

upsies.utils.html.as_text(html)

Strip HTML tags from string and return text without markup
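The idea of tag stripping can be sketched with the standard library alone. This is not the upsies implementation; as_text_sketch and _TextCollector are hypothetical names used only for illustration, and the real function may treat whitespace, entities, or script contents differently.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect text nodes and ignore every tag (hypothetical helper)."""

    def __init__(self):
        # convert_charrefs=True turns entities like &amp; into plain characters
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called for the text between tags
        self.chunks.append(data)


def as_text_sketch(html):
    """Return the text of an HTML string without markup (illustrative only)."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()  # flush any text that is still buffered
    return "".join(collector.chunks)


print(as_text_sketch("<p>Hello <b>world</b> &amp; friends</p>"))
# Hello world & friends
```

Note that this sketch keeps the contents of <script> and <style> elements, because the parser reports them as text.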

upsies.utils.html.dump(html, filepath)

Write html to filepath for debugging

Parameters:

html – String or BeautifulSoup instance

filepath – Path of the file that html is written to
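A minimal sketch of a debugging dump that accepts either input type, assuming that converting the argument with str() is enough (a BeautifulSoup instance serializes to its markup that way). dump_sketch is a hypothetical name, and the real function may format the output differently.

```python
import os
import tempfile


def dump_sketch(html, filepath):
    """Write html to filepath (illustrative only, not the upsies code)."""
    # str() leaves a plain string unchanged and serializes a parsed document
    with open(filepath, "w", encoding="utf-8") as f:
        f.write(str(html))


# Write to a throwaway location and read the file back
path = os.path.join(tempfile.mkdtemp(), "debug.html")
dump_sketch("<p>Hi</p>", path)
with open(path, encoding="utf-8") as f:
    print(f.read())
# <p>Hi</p>
```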

upsies.utils.html.get(soup, *attributes)

Follow a chain of attributes on soup and return the final value

These two calls are equivalent if all attributes exist:

>>> soup.table.tr.td
"td value"
>>> html.get(soup, "table", "tr", "td")
"td value"

But if any attribute in the chain is None (which is what BeautifulSoup returns for unknown tags), get() returns None, so you don't have to catch an AttributeError:

>>> soup.table.no_such_attribute.td
AttributeError: 'NoneType' object has no attribute 'td'
>>> html.get(soup, "table", "no_such_attribute", "td")
None
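The behavior described above can be sketched as a loop that stops at the first None. This is an assumption about how such a helper could work, not the upsies source; get_sketch and FakeTag are hypothetical names, and FakeTag only imitates the BeautifulSoup habit of returning None for unknown tags so that the example runs without third-party packages.

```python
class FakeTag:
    """Tiny stand-in for a parsed tag: unknown attributes are None."""

    def __init__(self, **children):
        self._children = children

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for child tag names
        return self._children.get(name)


def get_sketch(obj, *attributes):
    """Walk attributes one by one; return None as soon as one is missing."""
    for name in attributes:
        obj = getattr(obj, name, None)
        if obj is None:
            return None
    return obj


soup = FakeTag(table=FakeTag(tr=FakeTag(td="td value")))
print(get_sketch(soup, "table", "tr", "td"))
# td value
print(get_sketch(soup, "table", "no_such_attribute", "td"))
# None
```

Stopping early is the point of the design: the caller checks the result for None once instead of guarding every step of the chain.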

upsies.utils.html.parse(string)

Return BeautifulSoup instance

Parameters:

string – HTML document

Raises:

ContentError – if string is invalid HTML

upsies.utils.html.purge_tags(html)

Return html with <script> and <style> tags removed

Parameters:

html (str) – HTML string
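A rough sketch of removing <script> and <style> elements from a string, assuming a regular expression is good enough for well-formed input. purge_tags_sketch is a hypothetical name; the real function may use a proper parser, and a regular expression like this one is not robust against malformed or nested markup.

```python
import re

# Match an opening <script> or <style> tag, its content, and the matching
# closing tag. DOTALL lets the content span several lines.
_SCRIPT_OR_STYLE = re.compile(
    r"<(script|style)\b[^>]*>.*?</\1\s*>",
    flags=re.IGNORECASE | re.DOTALL,
)


def purge_tags_sketch(html):
    """Return html without <script> and <style> elements (illustrative only)."""
    return _SCRIPT_OR_STYLE.sub("", html)


page = "<html><style>p {color: red}</style><p>Hi</p><script>alert(1)</script></html>"
print(purge_tags_sketch(page))
# <html><p>Hi</p></html>
```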