lists.arthurdejong.org

python-stdnum branch master updated. 1.12-4-gde50109



This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "python-stdnum".

The branch, master has been updated
       via  de501093728d1b106e912923ad711adf06a6d29e (commit)
      from  831c66990ab8fb3040e5b7fc74668836a1d4ef64 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://arthurdejong.org/git/python-stdnum/commit/?id=de501093728d1b106e912923ad711adf06a6d29e

commit de501093728d1b106e912923ad711adf06a6d29e
Author: Arthur de Jong <arthur@arthurdejong.org>
Date:   Fri Dec 27 15:01:35 2019 +0100

    Switch to using lxml for HTML parsing
    
    This avoids an extra dependency on BeautifulSoup and makes the code more
    consistent.

diff --git a/stdnum/do/ncf.py b/stdnum/do/ncf.py
index 1b55133..8a9038f 100644
--- a/stdnum/do/ncf.py
+++ b/stdnum/do/ncf.py
@@ -161,11 +161,8 @@ def check_dgii(rnc, ncf, timeout=30):  # pragma: no cover
         }
 
     Will return None if the number is invalid or unknown."""
+    import lxml.html
     import requests
-    try:
-        from bs4 import BeautifulSoup
-    except ImportError:
-        from BeautifulSoup import BeautifulSoup
     from stdnum.do.rnc import compact as rnc_compact
     rnc = rnc_compact(rnc)
     ncf = compact(ncf)
@@ -173,10 +170,11 @@ def check_dgii(rnc, ncf, timeout=30):  # pragma: no cover
     headers = {
         'User-Agent': 'Mozilla/5.0 (python-stdnum)',
     }
-    result = BeautifulSoup(
+    # Get the page to pick up needed form parameters
+    document = lxml.html.fromstring(
         requests.get(url, headers=headers, timeout=timeout).text)
-    validation = result.find('input', {'name': '__EVENTVALIDATION'})['value']
-    viewstate = result.find('input', {'name': '__VIEWSTATE'})['value']
+    validation = document.find('.//input[@name="__EVENTVALIDATION"]').get('value')
+    viewstate = document.find('.//input[@name="__VIEWSTATE"]').get('value')
     data = {
         '__EVENTVALIDATION': validation,
         '__VIEWSTATE': viewstate,
@@ -184,14 +182,15 @@ def check_dgii(rnc, ncf, timeout=30):  # pragma: no cover
         'ctl00$cphMain$txtNCF': ncf,
         'ctl00$cphMain$txtRNC': rnc,
     }
-    result = BeautifulSoup(
+    # Do the actual request
+    document = lxml.html.fromstring(
         requests.post(url, headers=headers, data=data, timeout=timeout).text)
-    results = result.find(id='ctl00_cphMain_pResultado')
-    if results:
+    result = document.find('.//div[@id="ctl00_cphMain_pResultado"]')
+    if result is not None:
         data = {
-            'validation_message': result.find(id='ctl00_cphMain_lblInformacion').get_text().strip(),
+            'validation_message': document.findtext('.//*[@id="ctl00_cphMain_lblInformacion"]').strip(),
         }
         data.update(zip(
-            [x.get_text().strip().rstrip(':') for x in results.find_all('strong')],
-            [x.get_text().strip() for x in results.find_all('span')]))
+            [x.text.strip().rstrip(':') for x in result.findall('.//strong')],
+            [x.text.strip() for x in result.findall('.//span')]))
         return _convert_result(data)

-----------------------------------------------------------------------

Summary of changes:
 stdnum/do/ncf.py | 25 ++++++++++++-------------
 1 file changed, 12 insertions(+), 13 deletions(-)
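
The lookups introduced in the diff use the ElementTree-style find(),
findtext() and findall() calls with simple path expressions. As a rough
sketch only, the same pattern can be tried without network access using the
standard library's xml.etree.ElementTree in place of lxml.html. This is a
substitution for illustration: it needs well-formed markup, whereas lxml.html
also accepts real-world HTML. The PAGE snippet, its values and the field
names "Name" and "Status" are hypothetical and are not taken from the DGII
site.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the form page and the result page.
PAGE = """
<html><body>
  <form>
    <input name="__EVENTVALIDATION" value="abc"/>
    <input name="__VIEWSTATE" value="xyz"/>
  </form>
  <div id="ctl00_cphMain_pResultado">
    <p id="ctl00_cphMain_lblInformacion"> Valid </p>
    <strong>Name:</strong> <span> Example </span>
    <strong>Status:</strong> <span>OK</span>
  </div>
</body></html>
""".strip()

document = ET.fromstring(PAGE)

# Pick up the hidden form parameters by attribute value.
validation = document.find('.//input[@name="__EVENTVALIDATION"]').get('value')
viewstate = document.find('.//input[@name="__VIEWSTATE"]').get('value')

# find() returns None when nothing matches, so compare against None
# explicitly instead of relying on the truthiness of the element.
result = document.find('.//div[@id="ctl00_cphMain_pResultado"]')
data = {}
if result is not None:
    data['validation_message'] = document.findtext(
        './/*[@id="ctl00_cphMain_lblInformacion"]').strip()
    # Pair each <strong> label with the <span> value at the same position.
    data.update(zip(
        [x.text.strip().rstrip(':') for x in result.findall('.//strong')],
        [x.text.strip() for x in result.findall('.//span')]))

print(validation, viewstate)
print(data)
```

Note that .text only holds the text before an element's first child, while
BeautifulSoup's get_text() collects the text of all descendants, so the two
agree only for elements without nested markup, as in this snippet.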


hooks/post-receive
-- 
python-stdnum