
【Python】Not getting all rows with BeautifulSoup


Overview

When scraping a table with Beautiful Soup, I ran into a problem where only some of the table's rows were returned instead of all of them. This post shows how I fixed it.

Environment

Python 3.7.3

Example of the problem

This is an example of a table you might want to scrape.

Number  Name
1       Sato
2       Kato
3       Ito
4       Goto

I inspected the HTML in the web browser's developer tools (opened with the F12 key).

<table class="test_table">
<thead>
<tr>
<th>Number</th>
<th>Name</th>
</tr>
</thead>
<tbody>
    <tr>
    <td>1</td>
    <td>Sato</td>
    </tr>
</tbody>
<tbody>
    <tr>
    <td>2</td>
    <td>Kato</td>
    </tr>
</tbody>
<tbody>
    <tr>
    <td>3</td>
    <td>Ito</td>
    </tr>
</tbody>
<tbody>
    <tr>
    <td>4</td>
    <td>Goto</td>
    </tr>
</tbody>
</table>

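When I used the following code to scrape the table with BeautifulSoup, the result contained only the first few rows. I didn't know why.

import requests
from bs4 import BeautifulSoup

# Placeholder request headers (assumed; the original values are not shown)
headers = {'User-Agent': 'Mozilla/5.0'}

r = requests.get('http://example.com', headers=headers)
soup = BeautifulSoup(r.content, "html.parser")

# Take the first table with class "test_table"
table = soup.find_all('table', {'class': "test_table"})[0]

# Print every <tr> found inside the table
rows = table.find_all('tr')
for row in rows:
    print(row)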

Result

<tr>
<th>Number</th>
<th>Name</th>
</tr>
</thead>
<tbody>
    <tr>
    <td>1</td>
    <td>Sato</td>
    </tr>
</tbody>
<tbody>
    <tr>
    <td>2</td>
    <td>Kato</td>
    </tr>
</tbody>
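
One thing worth checking at this point: the markup shown by F12 is the browser's own parsed version of the page, which is not necessarily identical to the raw HTML that requests receives. The sketch below is only a diagnostic idea, not something confirmed for the page in this article. It parses the table exactly as the browser displays it, using html.parser, and counts the rows. The names DEVTOOLS_HTML and count_rows are made up for this example.

```python
from bs4 import BeautifulSoup

# The table as displayed in the browser's developer tools (well-formed markup).
DEVTOOLS_HTML = """
<table class="test_table">
<thead><tr><th>Number</th><th>Name</th></tr></thead>
<tbody><tr><td>1</td><td>Sato</td></tr></tbody>
<tbody><tr><td>2</td><td>Kato</td></tr></tbody>
<tbody><tr><td>3</td><td>Ito</td></tr></tbody>
<tbody><tr><td>4</td><td>Goto</td></tr></tbody>
</table>
"""


def count_rows(html, parser="html.parser"):
    """Return the number of <tr> elements in the first 'test_table' table."""
    soup = BeautifulSoup(html, parser)
    table = soup.find("table", class_="test_table")
    if table is None:
        return 0
    return len(table.find_all("tr"))


print(count_rows(DEVTOOLS_HTML))  # 5: the header row plus four body rows
```

If the well-formed markup gives all five rows but the downloaded page does not, the raw response (r.text in the code above) probably differs from what the browser shows, and it is worth looking at it directly.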

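How to fix it

Switching the Beautiful Soup parser from html.parser to lxml fixed it: all rows were returned. Note that lxml is a separate package and has to be installed first (for example with pip install lxml).

import requests
from bs4 import BeautifulSoup

# Placeholder request headers (assumed; the original values are not shown)
headers = {'User-Agent': 'Mozilla/5.0'}

r = requests.get('http://example.com', headers=headers)
# The only change: use the lxml parser instead of html.parser
soup = BeautifulSoup(r.content, "lxml")

# Take the first table with class "test_table"
table = soup.find_all('table', {'class': "test_table"})[0]

# Print every <tr> found inside the table
rows = table.find_all('tr')
for row in rows:
    print(row)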
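
Once every row comes back, the cell text is usually more useful than the raw <tr> tags. The following is a minimal sketch of one way to pull it out. The function name table_to_rows and the shortened sample HTML are assumptions for illustration, and it uses html.parser so that it runs without lxml installed; pass "lxml" as the parser to match the fix above.

```python
from bs4 import BeautifulSoup

# Shortened sample of the table from this article (illustrative only).
HTML = """
<table class="test_table">
<thead><tr><th>Number</th><th>Name</th></tr></thead>
<tbody><tr><td>1</td><td>Sato</td></tr></tbody>
<tbody><tr><td>2</td><td>Kato</td></tr></tbody>
</table>
"""


def table_to_rows(html, parser="html.parser"):
    """Return the table as a list of rows, each a list of cell strings."""
    soup = BeautifulSoup(html, parser)
    table = soup.find("table", class_="test_table")
    if table is None:
        return []
    return [
        # Header cells (<th>) and data cells (<td>) are both collected.
        [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        for tr in table.find_all("tr")
    ]


print(table_to_rows(HTML))
# [['Number', 'Name'], ['1', 'Sato'], ['2', 'Kato']]
```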

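I don't fully understand the difference between lxml and html.parser, but the Beautiful Soup documentation notes that different parsers can build different trees from invalid HTML. I will remember to try another parser if I run into the same problem again.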
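
To check whether the parser really is what changes the result for a given page, the same HTML can be fed to several parsers and the row counts compared. This is a rough sketch under a few assumptions: rows_per_parser is a made-up helper name, the sample string stands in for the real response text, and lxml and html5lib are optional packages that may not be installed (in that case the count is reported as None).

```python
from bs4 import BeautifulSoup, FeatureNotFound


def rows_per_parser(html, parsers=("html.parser", "lxml", "html5lib")):
    """Count <tr> elements found by each parser; None if it is not installed."""
    counts = {}
    for name in parsers:
        try:
            soup = BeautifulSoup(html, name)
        except FeatureNotFound:
            # Beautiful Soup raises this when the requested parser is missing.
            counts[name] = None
            continue
        counts[name] = len(soup.find_all("tr"))
    return counts


# Stand-in input; in practice pass the downloaded page, e.g. r.text.
SAMPLE = "<table><tr><td>1</td></tr><tr><td>2</td></tr></table>"

for parser_name, count in rows_per_parser(SAMPLE).items():
    print(parser_name, count)
```

If the counts differ between parsers on the real page, the markup is most likely invalid somewhere, and the parser that matches what the browser shows is the one to use.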
