Getting Started
This tutorial gives a quick tour of the Pogam scraping library. It was written on macOS but should also work on Linux.
Installation
The easiest way to install the library is by using pip:
$ pip install git+https://github.com/ludaavics/pogam.git
Verify your installation …
$ pogam --version
pogam, version 0.1.0
Alternatively, if you want to modify the library, you can do a developer
install and initialize the project using the provided make recipe:
$ git clone https://github.com/ludaavics/pogam.git
$ cd pogam
$ make init
$ pip install -e .
Configuration
By default, the scrape results are saved in a SQLite database stored in the
.pogam/ folder of your user directory. You can point to a different
database by setting the POGAM_DATABASE_URL environment variable to
a valid database URL.
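As an illustration, the variable can also be set from Python before the application is created. This is a minimal sketch under two assumptions that the text above does not confirm: that Pogam accepts SQLAlchemy-style URLs such as sqlite:///..., and that it reads the variable when the app is created. The helper sqlite_url and the file path are hypothetical.

```python
import os
from pathlib import Path


def sqlite_url(db_path):
    """Build an SQLAlchemy-style SQLite URL from a file path (hypothetical helper)."""
    absolute = Path(db_path).expanduser().resolve()
    return f"sqlite:///{absolute.as_posix()}"


# Hypothetical location; any writable path would do.
url = sqlite_url("~/data/pogam-listings.db")

# Assumption: Pogam reads this variable when the app is created,
# so it has to be set before that point.
os.environ["POGAM_DATABASE_URL"] = url
print(url)
```

The same effect can be had by exporting the variable in your shell before running pogam.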
Usage
Command Line
You can kick off a scrape directly from the command line:
$ pogam scrape rent 75009 75010
Scraping seloger...
...
Because the process can take a long time, it is often useful to turn on verbose output:
$ pogam -v DEBUG scrape rent 75009 75010 --min-price=1000
Scraping seloger...
Starting the scrape of 12 listings fetched from https://www.seloger.com/list.html?projects=2&types=1,2&places=[{cp:75009}|{cp:75010}]&price=0/NaN&surface=0/NaN&rooms=0,1,2,3,4,5,6,7,8,9&bedrooms=2,3,4,5,6,7,8&enterprise=0&qsVersion=1.0&natures=1,2 .
Scraping https://www.seloger.com/annonces/achat/appartement/paris-10eme-75/louis-blanc-aqueduc/153106473.htm ...
Scraping https://www.seloger.com/annonces/achat/appartement/paris-10eme-75/louis-blanc-aqueduc/150587457.htm ...
Scraping https://www.seloger.com/annonces/achat/appartement/paris-9eme-75/lorette-martyrs/145989607.htm ...
...
You can list all the supported query options with pogam scrape --help:
$ pogam scrape --help
Usage: pogam scrape [OPTIONS] TRANSACTION [POST_CODES]...

  Scrape offers for a TRANSACTION in the given POST_CODES.

  TRANSACTION is 'rent' or 'buy'. POSTCODES are postal or zip codes of the
  search.

Options:
  --type [apartment|house|parking|store]
                                  Type of property.
  --min-price FLOAT               Minimum property price.
  --max-price FLOAT               Maximum property price.
  --min-size FLOAT                Minimum property size, in square meters.
  --max-size FLOAT                Maximum property size, in square meters.
  --min-rooms FLOAT               Minimum number of rooms.
  --max-rooms FLOAT               Maximum number of rooms.
  --min-beds FLOAT                Minimum number of bedrooms.
  --max-beds FLOAT                Maximum number of bedrooms.
  --num-results INTEGER           Approximate maximum number of listings to
                                  add to the database. [default: 100]
  --max-duplicates INTEGER        Stop further scrapes once we see this many
                                  consecutive results that are already in the
                                  database.
  --sources [seloger]             Sources to scrape.
  --help                          Show this message and exit.
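When the scrape is launched from a script rather than typed by hand, it can help to assemble the command from a set of filters. The sketch below is not part of Pogam: build_scrape_command is a hypothetical helper, and it only uses the flags listed in the help output above.

```python
import shlex

# Flags copied from the `pogam scrape --help` output above.
KNOWN_OPTIONS = {
    "type", "min-price", "max-price", "min-size", "max-size",
    "min-rooms", "max-rooms", "min-beds", "max-beds",
    "num-results", "max-duplicates", "sources",
}


def build_scrape_command(transaction, post_codes, **options):
    """Return the argument list for a `pogam scrape` call (hypothetical helper)."""
    if transaction not in ("rent", "buy"):
        raise ValueError(f"unknown transaction: {transaction!r}")
    argv = ["pogam", "scrape", transaction, *post_codes]
    for name, value in options.items():
        flag = name.replace("_", "-")  # min_price -> min-price
        if flag not in KNOWN_OPTIONS:
            raise ValueError(f"unknown option: --{flag}")
        argv.append(f"--{flag}={value}")
    return argv


command = build_scrape_command("rent", ["75009", "75010"], min_price=1000, max_beds=3)
print(shlex.join(command))
# pogam scrape rent 75009 75010 --min-price=1000 --max-beds=3
```

Once Pogam is installed, the resulting list can be passed to subprocess.run(command, check=True).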
Library
Alternatively, you can use Pogam as a library in your Python code:
In [1]: from pogam import create_app, db, scrapers
In [2]: app = create_app()
In [3]: with app.app_context():
   ...:     results = scrapers.seloger("rent", "92130", min_size=29, max_size=31)
   ...:     db.session.commit()
   ...:     print(results)
   ...:     print(results['added'][0].to_dict() if results['added'] else None)
   ...:
27Dec2019 21:36:58 - pogam.scrapers.seloger.254 - INFO - Starting the scrape of 5 listings fetched from https://www.seloger.com/list.html?projects=1&types=1,2&places=[{cp:92130}]&price=0/NaN&surface=29/31&rooms=0,1,2,3,4,5,6,7,8,9&bedrooms=0,1,2,3,4,5,6,7,8&enterprise=0&qsVersion=1.0 .
27Dec2019 21:36:58 - pogam.scrapers.seloger.271 - DEBUG - Scraping link #0: https://www.seloger.com/annonces/locations/appartement/issy-les-moulineaux-92/espaces-jeunes/154722451.htm ...
27Dec2019 21:37:00 - pogam.scrapers.seloger.296 - DEBUG - 💫Scrape suceeded.💫
27Dec2019 21:37:00 - pogam.scrapers.seloger.271 - DEBUG - Scraping link #1: https://www.seloger.com/annonces/locations/appartement/issy-les-moulineaux-92/espaces-jeunes/154231671.htm ...
27Dec2019 21:37:02 - pogam.scrapers.seloger.296 - DEBUG - 💫Scrape suceeded.💫
27Dec2019 21:37:02 - pogam.scrapers.seloger.271 - DEBUG - Scraping link #2: https://www.seloger.com/annonces/locations/appartement/issy-les-moulineaux-92/epinettes/153620347.htm ...
27Dec2019 21:37:05 - pogam.scrapers.seloger.296 - DEBUG - 💫Scrape suceeded.💫
27Dec2019 21:37:05 - pogam.scrapers.seloger.271 - DEBUG - Scraping link #3: https://www.seloger.com/annonces/locations/appartement/issy-les-moulineaux-92/espaces-jeunes/154659507.htm ...
27Dec2019 21:37:10 - pogam.scrapers.seloger.296 - DEBUG - 💫Scrape suceeded.💫
27Dec2019 21:37:10 - pogam.scrapers.seloger.271 - DEBUG - Scraping link #4: https://www.seloger.com/annonces/locations/appartement/issy-les-moulineaux-92/espaces-jeunes/154635449.htm ...
27Dec2019 21:37:15 - pogam.scrapers.seloger.282 - DEBUG - 👻Failed to retrieve the page (ConnectTimeout).👻
27Dec2019 21:37:15 - pogam.scrapers.seloger.271 - DEBUG - Scraping link #4: https://www.seloger.com/annonces/locations/appartement/issy-les-moulineaux-92/espaces-jeunes/154635449.htm ...
27Dec2019 21:37:21 - pogam.scrapers.seloger.282 - DEBUG - 👻Failed to retrieve the page (ProxyError).👻
27Dec2019 21:37:36 - pogam.scrapers.seloger.316 - DEBUG - Failed to scrape https://www.seloger.com/annonces/locations/appartement/issy-les-moulineaux-92/espaces-jeunes/154635449.htm.
{'added': [<Listing 1>, <Listing 2>, <Listing 3>, <Listing 4>], 'seen': [], 'failed': ['https://www.seloger.com/annonces/locations/appartement/issy-les-moulineaux-92/espaces-jeunes/154635449.htm']}
{'id': 1, 'transaction': 'rent', 'source': 'seloger', 'first_publication_date': None, 'price': 762.0, 'currency': '€', 'broker_fee': 446.1, 'broker_fee_is_included': None, 'security_deposit': None, 'is_furnished': None, 'description': "ISSY - QUARTIER Citeaux. 5'Métro, RER et T2.20 RUE D'ESTIENNE D'ORVES.. Dans immeuble de 1960, studio de 29.74 m² situé au 4ème et dernier étage avec ascenseur offrant: entrée avec placard, cuisine indépendante, séjour ouvrant sur balcon, salle de bains avec placard.. Honoraires d'agence: 446.10euros / dépôt de garantie: 642.30 euros. Loyer mensuel 642 euros - Charges locatives 120 euros - Honoraire TTC à la charge du locataire 446.1 euros dont 89.22 euros d'honoraires d'état des lieux.", 'property': {'id': 1, 'type': 'apartment', 'size': 29.74, 'floor': 4, 'floors': 1, 'rooms': 1.0, 'bedrooms': None, 'bathrooms': 1.0, 'balconies': 1, 'terraces': None, 'heating': 'central', 'kitchen': 'séparée', 'has_lawn': False, 'has_pool': False, 'has_elevator': True, 'has_fireplace': False, 'has_hardwood_floors': False, 'has_view': False, 'exposure': None, 'has_cellar': False, 'parkings': None, 'has_super': False, 'dpe_consumption': 450, 'dpe_emissions': 80, 'postal_code': '92130', 'city': 'issy les moulineaux', 'neighborhood': 'espaces jeunes', 'latitude': None, 'longitude': None, 'north_east_lat': 48.82523, 'north_east_long': 2.26799, 'south_west_lat': 48.82116, 'south_west_long': 2.26127}, 'url': 'https://www.seloger.com/annonces/locations/appartement/issy-les-moulineaux-92/espaces-jeunes/154722451.htm', 'external_listing_id': '154722451'}
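The returned dictionary groups listings under the 'added', 'seen' and 'failed' keys, so it is easy to post-process. The sketch below works on a stand-in dictionary shaped like the output above, with the listing already converted by to_dict() and abridged to a few fields. The helpers summarize and price_per_sqm are hypothetical and not part of Pogam.

```python
# Stand-in for the scraper's return value, abridged from the sample output
# above. In a real session, 'added' holds Listing objects; call .to_dict()
# on each of them first.
results = {
    "added": [
        {
            "id": 1,
            "transaction": "rent",
            "price": 762.0,
            "property": {"size": 29.74, "postal_code": "92130"},
        },
    ],
    "seen": [],
    "failed": [
        "https://www.seloger.com/annonces/locations/appartement/"
        "issy-les-moulineaux-92/espaces-jeunes/154635449.htm",
    ],
}


def summarize(results):
    """Count the listings in each bucket of the results dictionary."""
    return {key: len(results[key]) for key in ("added", "seen", "failed")}


def price_per_sqm(listing):
    """Return the price per square meter, or None if a field is missing."""
    price = listing.get("price")
    size = listing.get("property", {}).get("size")
    if not price or not size:
        return None
    return round(price / size, 2)


print(summarize(results))                  # {'added': 1, 'seen': 0, 'failed': 1}
print(price_per_sqm(results["added"][0]))  # 25.62
```

The URLs under 'failed' can be kept aside and retried in a later run.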
Check out the API section for a complete reference.
Scheduled Tasks
The command line tool can be combined with a task scheduler to periodically fetch new listings matching your criteria. For example, let’s set up a cron job that looks, every hour on the hour, for properties for sale in the 9th arrondissement with at least two bedrooms and a price of at most 800,000€. Open your crontab file …
$ crontab -e
… and add the following line:
0 * * * * pogam scrape buy 75009 --min-beds=2 --max-price=800000
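In the schedule 0 * * * *, the first field pins the minute to 0 and the remaining four fields (hour, day of month, month, day of week) are wildcards, which is what makes the job fire at the top of every hour. As a rough illustration that does not rely on cron itself, the hypothetical helper below computes when such an entry would next run after a given moment.

```python
from datetime import datetime, timedelta


def next_top_of_hour(now):
    """Return the next time strictly after `now` at which a `0 * * * *` entry fires."""
    return now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)


# Timestamp taken from the first log line of the library example.
print(next_top_of_hour(datetime(2019, 12, 27, 21, 36, 58)))
# 2019-12-27 22:00:00
```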