python-stdnum branch master updated. 1.12-4-gde50109
- From: Commits of the python-stdnum project <python-stdnum-commits [at] lists.arthurdejong.org>
- To: python-stdnum-commits [at] lists.arthurdejong.org
- Reply-to: python-stdnum-users [at] lists.arthurdejong.org, python-stdnum-commits [at] lists.arthurdejong.org
- Subject: python-stdnum branch master updated. 1.12-4-gde50109
- Date: Fri, 27 Dec 2019 15:07:46 +0100 (CET)
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "python-stdnum".
The branch, master has been updated
via de501093728d1b106e912923ad711adf06a6d29e (commit)
from 831c66990ab8fb3040e5b7fc74668836a1d4ef64 (commit)
Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.
- Log -----------------------------------------------------------------
https://arthurdejong.org/git/python-stdnum/commit/?id=de501093728d1b106e912923ad711adf06a6d29e
commit de501093728d1b106e912923ad711adf06a6d29e
Author: Arthur de Jong <arthur@arthurdejong.org>
Date: Fri Dec 27 15:01:35 2019 +0100
Switch to using lxml for HTML parsing
This avoids an extra dependency on BeautifulSoup and makes the code more
consistent.
diff --git a/stdnum/do/ncf.py b/stdnum/do/ncf.py
index 1b55133..8a9038f 100644
--- a/stdnum/do/ncf.py
+++ b/stdnum/do/ncf.py
@@ -161,11 +161,8 @@ def check_dgii(rnc, ncf, timeout=30): # pragma: no cover
}
Will return None if the number is invalid or unknown."""
+ import lxml.html
import requests
- try:
- from bs4 import BeautifulSoup
- except ImportError:
- from BeautifulSoup import BeautifulSoup
from stdnum.do.rnc import compact as rnc_compact
rnc = rnc_compact(rnc)
ncf = compact(ncf)
@@ -173,10 +170,11 @@ def check_dgii(rnc, ncf, timeout=30): # pragma: no cover
headers = {
'User-Agent': 'Mozilla/5.0 (python-stdnum)',
}
- result = BeautifulSoup(
+ # Get the page to pick up needed form parameters
+ document = lxml.html.fromstring(
requests.get(url, headers=headers, timeout=timeout).text)
- validation = result.find('input', {'name': '__EVENTVALIDATION'})['value']
- viewstate = result.find('input', {'name': '__VIEWSTATE'})['value']
+ validation = document.find('.//input[@name="__EVENTVALIDATION"]').get('value')
+ viewstate = document.find('.//input[@name="__VIEWSTATE"]').get('value')
data = {
'__EVENTVALIDATION': validation,
'__VIEWSTATE': viewstate,
@@ -184,14 +182,15 @@ def check_dgii(rnc, ncf, timeout=30): # pragma: no cover
'ctl00$cphMain$txtNCF': ncf,
'ctl00$cphMain$txtRNC': rnc,
}
- result = BeautifulSoup(
+ # Do the actual request
+ document = lxml.html.fromstring(
requests.post(url, headers=headers, data=data, timeout=timeout).text)
- results = result.find(id='ctl00_cphMain_pResultado')
- if results:
+ result = document.find('.//div[@id="ctl00_cphMain_pResultado"]')
+ if result is not None:
data = {
- 'validation_message': result.find(id='ctl00_cphMain_lblInformacion').get_text().strip(),
+ 'validation_message': document.findtext('.//*[@id="ctl00_cphMain_lblInformacion"]').strip(),
}
data.update(zip(
- [x.get_text().strip().rstrip(':') for x in results.find_all('strong')],
- [x.get_text().strip() for x in results.find_all('span')]))
+ [x.text.strip().rstrip(':') for x in result.findall('.//strong')],
+ [x.text.strip() for x in result.findall('.//span')]))
return _convert_result(data)
-----------------------------------------------------------------------
Summary of changes:
stdnum/do/ncf.py | 25 ++++++++++++-------------
1 file changed, 12 insertions(+), 13 deletions(-)
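The path expressions introduced by this commit can be tried without network
access. The following is a minimal sketch, not the project's code: it uses the
standard library's xml.etree.ElementTree as a stand-in for lxml.html (lxml
implements the same find()/findtext()/findall() interface), and the SAMPLE
markup and its values are hypothetical, not the real DGII page. ElementTree
only parses well-formed XML, so real-world HTML would still need lxml.html.

```python
# Stand-in for lxml.html: ElementTree accepts the same path expressions
# in find()/findtext()/findall(), but requires well-formed XML input.
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real page is not reproduced here.
SAMPLE = """
<html><body>
  <form>
    <input name="__VIEWSTATE" value="state-token"/>
    <input name="__EVENTVALIDATION" value="validation-token"/>
  </form>
  <span id="ctl00_cphMain_lblInformacion"> Example message </span>
  <div id="ctl00_cphMain_pResultado">
    <strong>RNC:</strong><span>123</span>
    <strong>NCF:</strong><span>A01</span>
  </div>
</body></html>
"""

document = ET.fromstring(SAMPLE)

# Hidden form fields are looked up by attribute predicate.
viewstate = document.find('.//input[@name="__VIEWSTATE"]').get('value')
validation = document.find('.//input[@name="__EVENTVALIDATION"]').get('value')

# find() returns None when nothing matches, hence the explicit None check.
result = document.find('.//div[@id="ctl00_cphMain_pResultado"]')
missing = document.find('.//div[@id="no_such_id"]')

data = {}
if result is not None:
    data['validation_message'] = document.findtext(
        './/*[@id="ctl00_cphMain_lblInformacion"]').strip()
    # Pair each <strong> label with the <span> value in document order.
    data.update(zip(
        [x.text.strip().rstrip(':') for x in result.findall('.//strong')],
        [x.text.strip() for x in result.findall('.//span')]))

# .text only holds the text before an element's first child;
# itertext() walks all nested text, similar to BeautifulSoup's get_text().
nested = ET.fromstring('<span>Valid <b>until</b> 2020</span>')
leading_text = nested.text
full_text = ''.join(nested.itertext())
```

Two behaviours are worth keeping in mind when reading the diff: an element
without child elements is falsy in the ElementTree API, so "is not None" is
the reliable test for a match, and .text (unlike get_text()) does not include
text nested inside child elements and is None when an element has no leading
text.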
hooks/post-receive
--
python-stdnum