API

The primary Parser class is defined in the models module, and the serialize decorator is defined in the serializers module.

>>> from soupstars.models import Parser
>>> from soupstars.serializers import serialize

Both objects can also be imported from the top-level package.

>>> from soupstars import Parser, serialize

Models

The primary model provided by soupstars is the Parser class. It should generally be subclassed when building your own parsers.

When you initialize a parser with a URL, it automatically downloads the webpage at that URL and stores both the request and the response as attributes.

>>> from soupstars import Parser, serialize
>>> class MyParser(Parser):
...     @serialize
...     def item(self):
...         return 'An item!'
>>> parser = MyParser('https://jsonplaceholder.typicode.com/todos/1')
>>> print(parser.response)
<Response [200]>
>>> print(parser.request)
<PreparedRequest [GET]>

class soupstars.models.Parser(url)

Primary class for building parsers.

Parameters: url (str) – The URL to parse

serializer_names()

Returns a list of the names of the functions to be serialized.

serializer_functions()

Returns a list of the functions to be serialized.

to_tuples()

Returns a list of (name, value) tuples of each function to be serialized.

to_dict()

Converts the parser to a dictionary whose keys are the serializer names and whose values are the corresponding return values.

to_json()

Convert the parser to a JSON object
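
The three conversion methods describe the same data in different shapes. The following is only a sketch of how those shapes relate, not the soupstars implementation: the sample tuples are made up, and the use of the standard-library json module is an assumption.

```python
import json

# Hypothetical sample of what to_tuples() might return for a parser
# with two serialized functions; the names and values are made up.
tuples = [("title", "An item!"), ("length", 83)]

# A dictionary keyed by serializer name, matching the description of to_dict().
as_dict = dict(tuples)

# JSON text built from that dictionary (assumes every value is JSON-serializable).
as_json = json.dumps(as_dict)

print(as_dict)
print(as_json)
```

A serialized function that returns something json cannot encode, such as raw bytes, would need converting before this last step.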

Serializers

Serializers help convert parsers into storable objects. The functions defined in this module are used to instruct soupstars about how to perform the serialization.

soupstars.serializers.serialize(function)

Decorating a function defined on a parser with serialize instructs soupstars to include that function’s return value when building its own serialization.

>>> from soupstars import Parser, serialize
>>> class MyParser(Parser):
...     @serialize
...     def length(self):
...         return len(self.response.content)
...
>>> parser = MyParser('https://jsonplaceholder.typicode.com/todos/1')
>>> parser.serializer_names()
['length']
>>> 'length' in parser.to_dict()
True
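
One common way to build a decorator like this is to tag the function and have the class look for tagged methods later. The sketch below illustrates that pattern under stated assumptions; it is not how soupstars is known to work. SketchParser, the _is_serialized attribute, and the stand-in serialize function are all hypothetical, and no webpage is downloaded.

```python
def serialize(function):
    # Stand-in for soupstars.serializers.serialize: tag the function so
    # the owning class can find it later. The attribute name is made up.
    function._is_serialized = True
    return function


class SketchParser:
    # Stand-in for a Parser subclass; it takes content directly
    # instead of downloading a URL.
    def __init__(self, content):
        self.content = content

    @serialize
    def length(self):
        return len(self.content)

    def helper(self):
        # Not decorated, so it is left out of the serialization.
        return "ignored"

    def serializer_names(self):
        # Collect the names of methods that carry the tag.
        return [
            name
            for name, member in vars(type(self)).items()
            if getattr(member, "_is_serialized", False)
        ]

    def to_dict(self):
        # Call each tagged method and key its result by name.
        return {name: getattr(self, name)() for name in self.serializer_names()}


parser = SketchParser(b"hello")
print(parser.serializer_names())
print(parser.to_dict())
```

Because only decorated methods carry the tag, helper methods can sit alongside serialized ones without appearing in the output.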