【Python】Not getting all rows with BeautifulSoup

by SEBONE · September 9, 2020

Overview

When scraping a table with Beautiful Soup, I ran into a problem where only some of the table's rows were returned instead of all of them. This post shows how I fixed it.

Environment

Python 3.7.3

Example of the problem

This is an example of the table I wanted to scrape.

Number	Name
1	Sato
2	Kato
3	Ito
4	Goto

I inspected the HTML in the web browser's developer tools, opened with the F12 key.

<table class="test_table">
<thead>
<tr>
<th>Number</th>
<th>Name</th>
</tr>
</thead>
<tbody>
    <tr>
    <td>1</td>
    <td>Sato</td>
    </tr>
</tbody>
<tbody>
    <tr>
    <td>2</td>
    <td>Kato</td>
    </tr>
</tbody>
<tbody>
    <tr>
    <td>3</td>
    <td>Ito</td>
    </tr>
</tbody>
<tbody>
    <tr>
    <td>4</td>
    <td>Goto</td>
    </tr>
</tbody>
</table>

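When I scraped the table with BeautifulSoup using the following code, the result contained only some of the rows. I didn't know why.

import requests
from bs4 import BeautifulSoup

# Placeholder value; use whatever request headers the site needs.
headers = {'User-Agent': 'Mozilla/5.0'}

r = requests.get('http://example.com', headers=headers)
soup = BeautifulSoup(r.content, "html.parser")

# Take the first table with class "test_table".
table = soup.find_all('table', {'class': "test_table"})[0]

# Print every <tr> the parser found inside that table.
rows = table.find_all('tr')
for row in rows:
    print(row)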

Result

<tr>
<th>Number</th>
<th>Name</th>
</tr>
</thead>
<tbody>
    <tr>
    <td>1</td>
    <td>Sato</td>
    </tr>
</tbody>
<tbody>
    <tr>
    <td>2</td>
    <td>Kato</td>
    </tr>
</tbody>
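Before changing anything, it can help to check whether the missing rows are absent from the downloaded HTML or are lost during parsing. The following is a minimal sketch of such a check, not part of my original code: the helper name count_rows is made up for illustration, and SAMPLE_HTML is the example table from above pasted in as a string. In a real case you would pass r.text instead.

```python
import re
from bs4 import BeautifulSoup

# The example table from this post, used here as stand-in input.
SAMPLE_HTML = """
<table class="test_table">
<thead><tr><th>Number</th><th>Name</th></tr></thead>
<tbody><tr><td>1</td><td>Sato</td></tr></tbody>
<tbody><tr><td>2</td><td>Kato</td></tr></tbody>
<tbody><tr><td>3</td><td>Ito</td></tr></tbody>
<tbody><tr><td>4</td><td>Goto</td></tr></tbody>
</table>
"""


def count_rows(html, parser="html.parser"):
    """Hypothetical helper: return (raw_count, parsed_count) of <tr> tags.

    raw_count is a plain text search for opening <tr> tags;
    parsed_count is how many <tr> elements the chosen parser produced.
    """
    raw_count = len(re.findall(r"<tr[\s>]", html, flags=re.IGNORECASE))
    soup = BeautifulSoup(html, parser)
    parsed_count = len(soup.find_all("tr"))
    return raw_count, parsed_count


raw, parsed = count_rows(SAMPLE_HTML)
print(f"raw <tr> tags: {raw}, rows found by parser: {parsed}")
```

If the two numbers differ, the parser is probably dropping or restructuring rows. If both are lower than what the browser shows, the rows are likely not in the downloaded HTML at all, and changing the parser would not help.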

The way to fix it

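I switched the Beautiful Soup parser from html.parser to lxml, and then all of the rows were returned. Note that lxml is a separate package and has to be installed (for example with pip install lxml).

import requests
from bs4 import BeautifulSoup

# Placeholder value; use whatever request headers the site needs.
headers = {'User-Agent': 'Mozilla/5.0'}

r = requests.get('http://example.com', headers=headers)
soup = BeautifulSoup(r.content, "lxml")  # the only change: lxml instead of html.parser

# Take the first table with class "test_table".
table = soup.find_all('table', {'class': "test_table"})[0]

# Print every <tr> the parser found inside that table.
rows = table.find_all('tr')
for row in rows:
    print(row)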

I don't fully understand the difference between lxml and html.parser, but I will remember this fix in case I run into the same problem again.
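One way to look into the difference is to parse the same HTML with several parsers and compare how many rows each one finds. The sketch below is only an illustration under some assumptions: rows_per_parser is a made-up helper name, SAMPLE_HTML is the example table from this post, and lxml and html5lib are optional packages that may not be installed, so the code skips any parser that Beautiful Soup cannot find.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# The example table from this post, used here as stand-in input.
SAMPLE_HTML = """
<table class="test_table">
<thead><tr><th>Number</th><th>Name</th></tr></thead>
<tbody><tr><td>1</td><td>Sato</td></tr></tbody>
<tbody><tr><td>2</td><td>Kato</td></tr></tbody>
<tbody><tr><td>3</td><td>Ito</td></tr></tbody>
<tbody><tr><td>4</td><td>Goto</td></tr></tbody>
</table>
"""


def rows_per_parser(html, parsers=("html.parser", "lxml", "html5lib")):
    """Hypothetical helper: map each parser name to the number of <tr> found.

    A parser that is not installed is recorded as None instead of raising.
    """
    counts = {}
    for name in parsers:
        try:
            soup = BeautifulSoup(html, name)
        except FeatureNotFound:
            counts[name] = None  # parser library not available here
            continue
        counts[name] = len(soup.find_all("tr"))
    return counts


for name, count in rows_per_parser(SAMPLE_HTML).items():
    print(name, "->", "not installed" if count is None else count)
```

On clean markup like this sample, the installed parsers should agree. On the real page, passing r.content instead of SAMPLE_HTML would show which parsers lose rows, since each parser repairs invalid HTML in its own way.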
