Getting Started

This tutorial gives you a quick tour of the Pogam scraping library. It is written for macOS but should also work on Linux.

Requirements

Make sure you have the following installed:

Installation

The easiest way to install the library is by using pip:

$ pip install git+https://github.com/ludaavics/pogam.git

Verify your installation …

$ pogam --version
pogam, version 0.1.0

Alternatively, if you want to modify the library, you can do a developer install and initialize the project using the provided make recipe:

$ git clone https://github.com/ludaavics/pogam.git
$ cd pogam
$ make init
$ pip install -e .

Configuration

By default, the scrape results are saved in a SQLite database stored in the .pogam/ folder of your user directory. You can point to a different database by setting the POGAM_DATABASE_URL environment variable to a valid database URL.
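
As a rough sketch, the variable can also be set from Python before the application is created. This assumes the URL follows the SQLAlchemy convention (sqlite:/// followed by the file path), which this tutorial does not state explicitly. The sqlite_url helper and the ~/scrapes/pogam.db location are made up for illustration.

```python
import os
from pathlib import Path


def sqlite_url(path):
    """Return a SQLite URL for `path` (assumes SQLAlchemy-style URLs)."""
    absolute = Path(path).expanduser().resolve()
    # An absolute POSIX path starts with "/", which yields "sqlite:////...".
    return "sqlite:///" + absolute.as_posix()


# Hypothetical location; set this before Pogam reads its configuration.
os.environ["POGAM_DATABASE_URL"] = sqlite_url("~/scrapes/pogam.db")
print(os.environ["POGAM_DATABASE_URL"])
```

Exporting the variable in your shell profile has the same effect and also covers the command line tool.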

Usage

Command Line

You can kick off a scrape directly from the command line:

$ pogam scrape rent 75009 75010
Scraping seloger...
...

Because the process can take a long time, it is often useful to turn on verbose output:

$ pogam -v DEBUG scrape rent 75009 75010 --min-price=1000
Scraping seloger...
Starting the scrape of 12 listings fetched from https://www.seloger.com/list.html?projects=2&types=1,2&places=[{cp:75009}|{cp:75010}]&price=0/NaN&surface=0/NaN&rooms=0,1,2,3,4,5,6,7,8,9&bedrooms=2,3,4,5,6,7,8&enterprise=0&qsVersion=1.0&natures=1,2 .
Scraping https://www.seloger.com/annonces/achat/appartement/paris-10eme-75/louis-blanc-aqueduc/153106473.htm ...
Scraping https://www.seloger.com/annonces/achat/appartement/paris-10eme-75/louis-blanc-aqueduc/150587457.htm ...
Scraping https://www.seloger.com/annonces/achat/appartement/paris-9eme-75/lorette-martyrs/145989607.htm ...
...

You can list all the supported query options with pogam scrape --help:

$ pogam scrape --help
Usage: pogam scrape [OPTIONS] TRANSACTION [POST_CODES]...

  Scrape offers for a TRANSACTION in the given POST_CODES.

  TRANSACTION is 'rent' or 'buy'. POSTCODES are postal or zip codes of the
  search.

Options:
  --type [apartment|house|parking|store]
                                  Type of property.
  --min-price FLOAT               Minimum property price.
  --max-price FLOAT               Maximum property price.
  --min-size FLOAT                Minimum property size, in square meters.
  --max-size FLOAT                Maximum property size, in square meters.
  --min-rooms FLOAT               Minimum number of rooms.
  --max-rooms FLOAT               Maximum number of rooms.
  --min-beds FLOAT                Minimum number of bedrooms.
  --max-beds FLOAT                Maximum number of bedrooms.
  --num-results INTEGER           Approximate maximum number of listings to
                                  add to the database.  [default: 100]
  --max-duplicates INTEGER        Stop further scrapes once we see this many
                                  consecutive results that are already in the
                                  database.
  --sources [seloger]             Sources to scrape.
  --help                          Show this message and exit.
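
If you drive the command line tool from a script, the options listed above map naturally onto keyword arguments. The sketch below only builds the argument list; build_scrape_command is a hypothetical helper, not part of Pogam, and it does not check option names against the help output.

```python
import shlex


def build_scrape_command(transaction, post_codes, **options):
    """Build the argv list for `pogam scrape` from keyword options."""
    if transaction not in ("rent", "buy"):
        raise ValueError("transaction must be 'rent' or 'buy'")
    argv = ["pogam", "scrape", transaction, *post_codes]
    for name, value in options.items():
        # min_price=1000 becomes --min-price=1000
        argv.append(f"--{name.replace('_', '-')}={value}")
    return argv


argv = build_scrape_command("rent", ["75009", "75010"], min_price=1000)
print(shlex.join(argv))
# To actually run it: subprocess.run(argv, check=True)
```

Keeping the command as a list rather than a single string avoids shell quoting issues when it is eventually passed to subprocess.run.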

Library

Alternatively, you can use Pogam as a library in your Python code:

In [1]: from pogam import create_app, db, scrapers

In [2]: app = create_app()

In [3]: with app.app_context():
   ...:     results = scrapers.seloger("rent", "92130", min_size=29, max_size=31)
   ...:     db.session.commit()
   ...:     print(results)
   ...:     print(results['added'][0].to_dict() if results['added'] else None)
   ...: 
27Dec2019 21:36:58 - pogam.scrapers.seloger.254 - INFO - Starting the scrape of 5 listings fetched from https://www.seloger.com/list.html?projects=1&types=1,2&places=[{cp:92130}]&price=0/NaN&surface=29/31&rooms=0,1,2,3,4,5,6,7,8,9&bedrooms=0,1,2,3,4,5,6,7,8&enterprise=0&qsVersion=1.0 .
27Dec2019 21:36:58 - pogam.scrapers.seloger.271 - DEBUG - Scraping link #0: https://www.seloger.com/annonces/locations/appartement/issy-les-moulineaux-92/espaces-jeunes/154722451.htm ...
27Dec2019 21:37:00 - pogam.scrapers.seloger.296 - DEBUG - 💫Scrape suceeded.💫
27Dec2019 21:37:00 - pogam.scrapers.seloger.271 - DEBUG - Scraping link #1: https://www.seloger.com/annonces/locations/appartement/issy-les-moulineaux-92/espaces-jeunes/154231671.htm ...
27Dec2019 21:37:02 - pogam.scrapers.seloger.296 - DEBUG - 💫Scrape suceeded.💫
27Dec2019 21:37:02 - pogam.scrapers.seloger.271 - DEBUG - Scraping link #2: https://www.seloger.com/annonces/locations/appartement/issy-les-moulineaux-92/epinettes/153620347.htm ...
27Dec2019 21:37:05 - pogam.scrapers.seloger.296 - DEBUG - 💫Scrape suceeded.💫
27Dec2019 21:37:05 - pogam.scrapers.seloger.271 - DEBUG - Scraping link #3: https://www.seloger.com/annonces/locations/appartement/issy-les-moulineaux-92/espaces-jeunes/154659507.htm ...
27Dec2019 21:37:10 - pogam.scrapers.seloger.296 - DEBUG - 💫Scrape suceeded.💫
27Dec2019 21:37:10 - pogam.scrapers.seloger.271 - DEBUG - Scraping link #4: https://www.seloger.com/annonces/locations/appartement/issy-les-moulineaux-92/espaces-jeunes/154635449.htm ...
27Dec2019 21:37:15 - pogam.scrapers.seloger.282 - DEBUG - 👻Failed to retrieve the page (ConnectTimeout).👻
27Dec2019 21:37:15 - pogam.scrapers.seloger.271 - DEBUG - Scraping link #4: https://www.seloger.com/annonces/locations/appartement/issy-les-moulineaux-92/espaces-jeunes/154635449.htm ...
27Dec2019 21:37:21 - pogam.scrapers.seloger.282 - DEBUG - 👻Failed to retrieve the page (ProxyError).👻
27Dec2019 21:37:36 - pogam.scrapers.seloger.316 - DEBUG - Failed to scrape https://www.seloger.com/annonces/locations/appartement/issy-les-moulineaux-92/espaces-jeunes/154635449.htm.
{'added': [<Listing 1>, <Listing 2>, <Listing 3>, <Listing 4>], 'seen': [], 'failed': ['https://www.seloger.com/annonces/locations/appartement/issy-les-moulineaux-92/espaces-jeunes/154635449.htm']}
{'id': 1, 'transaction': 'rent', 'source': 'seloger', 'first_publication_date': None, 'price': 762.0, 'currency': '€', 'broker_fee': 446.1, 'broker_fee_is_included': None, 'security_deposit': None, 'is_furnished': None, 'description': "ISSY - QUARTIER Citeaux. 5'Métro, RER et T2.20 RUE D'ESTIENNE D'ORVES.. Dans immeuble de 1960, studio de 29.74 m² situé au 4ème et dernier étage avec ascenseur offrant: entrée avec placard, cuisine indépendante, séjour ouvrant sur balcon, salle de bains avec placard.. Honoraires d'agence: 446.10euros / dépôt de garantie: 642.30 euros. Loyer mensuel 642 euros - Charges locatives 120 euros - Honoraire TTC à la charge du locataire 446.1 euros dont 89.22 euros d'honoraires d'état des lieux.", 'property': {'id': 1, 'type': 'apartment', 'size': 29.74, 'floor': 4, 'floors': 1, 'rooms': 1.0, 'bedrooms': None, 'bathrooms': 1.0, 'balconies': 1, 'terraces': None, 'heating': 'central', 'kitchen': 'séparée', 'has_lawn': False, 'has_pool': False, 'has_elevator': True, 'has_fireplace': False, 'has_hardwood_floors': False, 'has_view': False, 'exposure': None, 'has_cellar': False, 'parkings': None, 'has_super': False, 'dpe_consumption': 450, 'dpe_emissions': 80, 'postal_code': '92130', 'city': 'issy les moulineaux', 'neighborhood': 'espaces jeunes', 'latitude': None, 'longitude': None, 'north_east_lat': 48.82523, 'north_east_long': 2.26799, 'south_west_lat': 48.82116, 'south_west_long': 2.26127}, 'url': 'https://www.seloger.com/annonces/locations/appartement/issy-les-moulineaux-92/espaces-jeunes/154722451.htm', 'external_listing_id': '154722451'}
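
The returned dictionary and the output of to_dict() are plain Python structures, so they are easy to post-process. The following sketch uses stand-in data shaped like the output above, trimmed to the fields it needs; summarize and price_per_square_meter are illustrative helpers, not part of Pogam.

```python
def summarize(results):
    """Count listings per outcome in a scrape result dictionary."""
    return {outcome: len(items) for outcome, items in results.items()}


def price_per_square_meter(listing):
    """Price divided by size, or None if either value is missing."""
    price = listing.get("price")
    size = (listing.get("property") or {}).get("size")
    if not price or not size:
        return None
    return round(price / size, 2)


# Stand-in data; a real run returns Listing objects and URLs instead.
results = {
    "added": ["listing-1", "listing-2"],
    "seen": [],
    "failed": ["url-of-failed-listing"],
}
listing = {"price": 762.0, "property": {"size": 29.74}}

print(summarize(results))               # {'added': 2, 'seen': 0, 'failed': 1}
print(price_per_square_meter(listing))  # 25.62
```

The URLs under 'failed' can be collected in the same way if you want to retry them later.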

Check out the API section for a complete reference.

Scheduled Tasks

The command line tool can be used with a task scheduler to periodically fetch new listings matching criteria of interest. For example, let’s set up a cron job that looks, every hour on the hour, for properties with at least two bedrooms for sale in the 9th arrondissement for less than 800,000€. Open your crontab file …

$ crontab -e

… and add the following line:

0 * * * * pogam scrape buy 75009 --min-beds=2 --max-price=800000
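
Cron usually runs jobs with a minimal PATH, so a pogam executable installed in a virtual environment may not be found. One possible workaround is to write the absolute path into the crontab entry. The sketch below formats such an entry; crontab_line is a hypothetical helper and /usr/local/bin/pogam is a placeholder path.

```python
import shlex
import shutil


def crontab_line(schedule, argv, executable=None):
    """Format a crontab entry, replacing argv[0] with an absolute path."""
    executable = executable or shutil.which(argv[0])
    if executable is None:
        raise FileNotFoundError(f"{argv[0]} not found on PATH")
    return f"{schedule} {shlex.join([executable, *argv[1:]])}"


argv = ["pogam", "scrape", "buy", "75009", "--min-beds=2", "--max-price=800000"]
# The path below is a placeholder; omit `executable` to look it up on PATH.
print(crontab_line("0 * * * *", argv, executable="/usr/local/bin/pogam"))
```

Running which pogam in the shell where the library is installed gives the path to use in place of the placeholder.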