7. Data Storage

Chapter 1 `txt` File Storage

Requirement: scrape the question titles from Zhihu's "好问题广场" (Good Questions Plaza) section.


# =================================
# @Time    : 2024-12-04
# @Author  : 明廷盛
# @File    : 1.text文件读写.py
# @Software: PyCharm
# @ProjectBackground: Requirement: scrape the question titles from the "好问题广场" (Good Questions Plaza) section of [Zhihu](https://www.zhihu.com/explore)
# =================================

import os

import requests
from lxml import etree

# Scraper STEP1: determine the target website
aim_url = "https://www.zhihu.com/explore"
headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
}

# Scraper STEP2: send the request and get the response
result = requests.get(aim_url, headers=headers)

element = etree.HTML(result.content.decode())

# Scraper STEP3: extract the data
# Each group only needs one field here, so the XPath does not need to select groups first.
# Auto-generated class names such as "css-1g4zjtl" may change when the site is updated.
titles = element.xpath('//div[@class="css-1g4zjtl"]//a/text()')

# Scraper STEP4: store the data
# Create the ./file folder (no error if it already exists)
folder_path = "./file"
os.makedirs(folder_path, exist_ok=True)

# Write to a txt file: open it once and append one title per line
with open(os.path.join(folder_path, "1.txt"), "a", encoding="utf-8") as f:
    for title in titles:
        f.write(title + "\n")
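The storage step above can be wrapped in a small reusable function. This is a minimal sketch, assuming the titles are already extracted into a list; the function name `save_lines` is hypothetical. It creates the parent folder with `Path.mkdir(parents=True, exist_ok=True)`, opens the file once, and lets the caller choose between `'a'` (append, as in the script) and `'w'` (overwrite).

```python
from pathlib import Path


def save_lines(lines, path, mode="a"):
    """Write each string in `lines` to `path`, one per line.

    mode="a" appends to an existing file (as in the script above);
    mode="w" truncates the file first.
    """
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)  # same effect as os.makedirs(..., exist_ok=True)
    with path.open(mode, encoding="utf-8") as f:
        # the file is opened once; every line gets its own trailing newline
        f.writelines(line + "\n" for line in lines)
    return path
```

Usage in the script would be `save_lines(titles, "./file/1.txt")`.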

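Chapter 2 `json` File Storage

Requirement: scrape game data from 4399, collecting ① the game URL, ② the game name, ③ the tags, and ④ the date.

Syntax: see the JSON chapter of [[7.数据提取方式(json,正则)]].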
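A minimal sketch of the JSON storage step for this requirement, assuming each game has already been extracted into a dict with the four fields. The key names (`url`, `name`, `tags`, `date`), the sample record, its `example.com` URL, and the helper names are all hypothetical. `ensure_ascii=False` keeps Chinese text readable in the file instead of `\uXXXX` escapes, and `indent=2` pretty-prints it.

```python
import json
from pathlib import Path


def save_json(records, path):
    """Write a list of dicts to `path` as one JSON array."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        # ensure_ascii=False writes non-ASCII characters as-is; indent=2 pretty-prints
        json.dump(records, f, ensure_ascii=False, indent=2)
    return path


def load_json(path):
    """Read the JSON array back into a list of dicts."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


# Hypothetical record with the four required fields ("示例游戏" means "example game")
games = [
    {"url": "https://example.com/game/1", "name": "示例游戏", "tags": ["puzzle"], "date": "2024-12-04"},
]
```

Calling `save_json(games, "./file/2.json")` writes the file, and `load_json()` returns the same list.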


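Chapter 3 CSV File Storage (`csv` module)

Section 1 Writing strings to a CSV file

Syntax: concatenate the strings yourself, separating columns with commas and ending each row with `\n`; then simply save the file with a `.csv` extension.

import os

# Method 1: without the `csv` module
os.makedirs("./file", exist_ok=True)  # make sure the output folder exists
with open("./file/3.csv", 'w', encoding="utf-8") as f:
    f.write("name,age,gender\n")
    f.write("Tom,12,boy\n")
    f.write("Jack,15,boy\n")
    f.write("Alice,23,girl\n")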
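Manual concatenation breaks as soon as a value itself contains a comma, a quote, or a newline, because nothing marks where the field ends. The sketch below compares the two approaches with a hypothetical value `"Smith, John"`, writing to an in-memory buffer (`io.StringIO`) instead of a file so the output is easy to inspect. The `csv` module quotes such fields automatically.

```python
import csv
import io

row = ["Smith, John", "30", "boy"]  # hypothetical value containing a comma

# Manual concatenation: the comma inside the name becomes a column separator
manual = ",".join(row) + "\n"

# csv.writer: fields that contain the delimiter are wrapped in double quotes
buf = io.StringIO()
csv.writer(buf).writerow(row)
quoted = buf.getvalue()

# Parse both lines back to see how many columns a CSV reader finds
manual_fields = next(csv.reader(io.StringIO(manual)))
quoted_fields = next(csv.reader(io.StringIO(quoted)))

print(manual_fields)  # ['Smith', ' John', '30', 'boy']
print(quoted_fields)  # ['Smith, John', '30', 'boy']
```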

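Section 2 Writing lists to a CSV file (`csv` module)

Steps: ① get a writer object with `csv.writer(f)`; ② write rows with `writerow()` / `writerows()`.
! If extra blank lines appear between rows, pass `newline=''` to `open()`.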


import csv

# Method 2: use the csv module [recommended]
# Note: if extra blank lines appear between rows, pass newline='' to open()
with open("./file/3.csv", 'w', encoding="utf-8", newline='') as f:
    csv_f = csv.writer(f)  # get a csv writer object wrapping the file
    # Single row: writerow() takes a one-dimensional list
    csv_f.writerow(["name", "age", "gender"])
    csv_f.writerow(["Tom", "12", "boy"])
    csv_f.writerow(["Jack", "15", "boy"])
    csv_f.writerow(["Alice", "23", "girl"])

    # Multiple rows: writerows() takes a two-dimensional list (a list of rows)
    csv_f.writerows([["Porrty", "21", "boy"], ["Jock", "17", "girl"]])
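Why `newline=''` matters: `csv.writer` ends every row with `\r\n` by default (the `lineterminator` of the default `excel` dialect). If the file is opened in text mode without `newline=''`, Python on Windows translates each `\n` into `\r\n`, so a row ends with `\r\r\n` and spreadsheet programs show a blank line between rows. The sketch writes into in-memory buffers to show the raw row endings, and also shows the standard `lineterminator` option as an alternative; opening files with `newline=''` remains what the `csv` documentation recommends.

```python
import csv
import io

# Default dialect: each row ends with "\r\n"
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["name", "age", "gender"])
writer.writerows([["Tom", "12", "boy"], ["Jack", "15", "boy"]])
default_output = buf.getvalue()

# Alternative: choose a plain "\n" line terminator explicitly
buf_lf = io.StringIO()
csv.writer(buf_lf, lineterminator="\n").writerow(["name", "age", "gender"])
lf_output = buf_lf.getvalue()

print(repr(default_output))  # 'name,age,gender\r\nTom,12,boy\r\nJack,15,boy\r\n'
print(repr(lf_output))       # 'name,age,gender\n'
```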

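Section 3 Writing dicts to a CSV file (`csv` module)

Syntax:
1. `csv.DictWriter(f, fieldnames=[list of header fields])`: get an object that can write dicts to CSV
2. `.writeheader()`: write the `fieldnames` as the header row
3. `writerow()` / `writerows()`: write the data
Steps: ① decide the header fields, create the writer, and write the header; ② write the data.
! If the generated CSV has extra blank lines, open the file with `open(..., newline="")`.
! Note ⚠️: `DictWriter.writerow()` accepts only dicts (mapping header fields to values), not lists.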


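import csv

# Method 3: write dicts
with open("./file/3.csv", 'w', encoding="utf-8", newline='') as f:
    # STEP1: decide the header fields, create the DictWriter, and write the header
    field_list = ['name', 'age', 'gender']  # header (field names)
    csv_f = csv.DictWriter(f, fieldnames=field_list)  # writer that turns dicts into CSV rows
    csv_f.writeheader()  # write the header row

    # STEP2: write the data (dict keys must match the field names)
    csv_f.writerow({"name": "Tom", "age": 18, "gender": "un"})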
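Records scraped from an API often carry more keys than the CSV needs, or miss some. By default `DictWriter.writerow()` raises `ValueError` for keys that are not in `fieldnames`; the standard `extrasaction="ignore"` option drops them instead, and `restval` fills in missing ones. A sketch with hypothetical records, writing to an in-memory buffer and reading it back with `csv.DictReader` (note that values come back as strings):

```python
import csv
import io

field_list = ["name", "age", "gender"]
records = [
    {"name": "Tom", "age": 18, "gender": "un", "id": 1},  # extra key "id"
    {"name": "Alice", "age": 23},  # missing key "gender"
]

buf = io.StringIO()
writer = csv.DictWriter(
    buf,
    fieldnames=field_list,
    extrasaction="ignore",  # silently drop keys that are not in fieldnames
    restval="",  # value written for fieldnames missing from a dict
)
writer.writeheader()
writer.writerows(records)

# Read the CSV back: each row becomes a dict keyed by the header row
buf.seek(0)
rows = list(csv.DictReader(buf))
print(rows)
```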

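Section 4 Homework

Requirement: scrape Bilibili search results and, for each video, collect ① the title, ② the URL, and ③ the author, storing the data as CSV.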


# =================================
# @Time    : 2024-12-04
# @Author  : 明廷盛
# @File    : 4.作业(爬取b站搜索内容).py
# @Software: PyCharm
# @ProjectBackground: Requirement: scrape [Bilibili](https://search.bilibili.com/all?keyword=%E7%91%9E%E5%B9%B8&page=3&o=72) search results,
#                     collecting ① the title, ② the URL, and ③ the author of each video, stored as CSV
# =================================
""" Analysis
==> The data is loaded dynamically.
The request API is:
https://api.bilibili.com/x/web-interface/wbi/search/type?category_id=&search_type=video&ad_resource=5654&__refresh__=true&_extra=&context=&page=2&page_size=42&pubtime_begin_s=0&pubtime_end_s=0&from_source=&from_spmid=333.337&platform=pc&highlight=1&single_column=0&keyword=%E7%91%9E%E5%B9%B8&qv_id=ljobsqyJP7uWz9E6zgbYBxfp7LSuIgOl&source_tag=3&gaia_vtoken=&dynamic_offset=36&web_location=1430654&w_rid=e26d79f7f43bbb50b521c9e6d865fdad&wts=1733320986
① `page` controls the page number; ② `page_size` is the number of results per page.
"""
import csv
import re

import requests

# Scraper STEP1: determine the request URL
# w_rid and wts appear to be a request signature and timestamp; if the request fails, copy a fresh URL from the browser's Network tab
aim_url = "https://api.bilibili.com/x/web-interface/wbi/search/type?category_id=&search_type=video&ad_resource=5654&__refresh__=true&_extra=&context=&page=2&page_size=42&pubtime_begin_s=0&pubtime_end_s=0&from_source=&from_spmid=333.337&platform=pc&highlight=1&single_column=0&keyword=%E7%91%9E%E5%B9%B8&qv_id=ljobsqyJP7uWz9E6zgbYBxfp7LSuIgOl&source_tag=3&gaia_vtoken=&dynamic_offset=36&web_location=1430654&w_rid=e26d79f7f43bbb50b521c9e6d865fdad&wts=1733320986"
headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
    # Paste your own logged-in cookie from the browser here; never publish real session cookies
    "cookie": "<your bilibili cookie>",
}

# Scraper STEP2: send the request and get the response
result = requests.get(aim_url, headers=headers)

# Scraper STEP3: process the data
video_list = result.json()  # parse the JSON response body into a dict

# Scraper STEP4: store the data
with open("./file/4(b站).csv", "w", encoding="utf-8", newline="") as f:
    # csv①: decide the header fields, create the writer, write the header
    fieldname = ['title', 'arcurl', 'author']  # header fields
    f_csv = csv.DictWriter(f, fieldnames=fieldname)  # writer object
    f_csv.writeheader()  # write the header row

    # csv②: iterate over the results and write each one to the CSV file
    for video in video_list['data']['result']:
        one_dict = {}
        # light cleaning: remove the <em class="keyword"> tags that highlight the search keyword
        one_dict['title'] = re.sub('<em class="keyword">|</em>', "", video['title'])
        one_dict['arcurl'] = video['arcurl']
        one_dict['author'] = video['author']
        f_csv.writerow(one_dict)  # write one row
        print(one_dict)
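The `re.sub()` call above removes exactly the `<em class="keyword">` highlight tags. If titles can also contain other tags or HTML entities such as `&amp;`, a slightly more general cleaner helps. A minimal sketch (the name `clean_title` and the sample title are hypothetical) that strips any tag with a simple regex and then decodes entities with the standard `html.unescape()`. A regex like this is fine for short, tag-light strings such as search titles, but it is not a general HTML parser.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")  # matches any single tag, e.g. <em class="keyword"> or </em>


def clean_title(raw):
    """Remove HTML tags, then turn entities like &amp; back into characters."""
    return html.unescape(TAG_RE.sub("", raw)).strip()


print(clean_title('<em class="keyword">Coffee</em> review &amp; tasting'))  # Coffee review & tasting
```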

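Chapter 4 Python and MySQL

Section 1 Establishing a connection and getting a cursor (`pymysql`)

STEP1: Establish the connection

STEP2: Get a cursor object

STEP3: Use the cursor object

STEP4: Close the connection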

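import pymysql

# STEP1: establish the connection (database= is the database to use; db= is an older alias)
db = pymysql.connect(host="localhost", user="root", password="root", database="py_spider")

# STEP2: create a cursor object
cursor = db.cursor()

# STEP3: use the cursor to run an SQL statement
cursor.execute("select version();")
res = cursor.fetchone()  # fetch one row of the statement's result
print(res)

# STEP4: close the connection
db.close()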
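The four steps follow the Python DB-API 2.0 pattern (PEP 249: connection, cursor, execute, fetch, close), which `pymysql` implements. To try the pattern without a MySQL server, here is a sketch that uses the standard-library `sqlite3` module as a stand-in; this substitution is an assumption for illustration, and SQLite's SQL dialect and version query differ from MySQL's. `contextlib.closing()` guarantees `close()` runs even if a statement fails.

```python
import sqlite3
from contextlib import closing


def get_version():
    # STEP1: connect (":memory:" creates a throwaway in-memory database)
    with closing(sqlite3.connect(":memory:")) as db:
        # STEP2: get a cursor
        cursor = db.cursor()
        # STEP3: run a statement and fetch one row (SQLite's counterpart of "select version();")
        cursor.execute("select sqlite_version();")
        row = cursor.fetchone()
    # STEP4: closing() has already called db.close() at this point
    return row


print(get_version())  # a one-element tuple holding the SQLite version string
```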

Section 2 Creating databases/tables

4.2.1 Creating a database


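import pymysql
db = pymysql.connect(host="localhost", user="root", password="root")  # no database= yet: py_spider may not exist
cursor = db.cursor()
# STEP3: use the cursor object (utf8mb4 can store any Unicode character, including emoji)
cursor.execute("create database if not exists py_spider charset = utf8mb4;")
db.close()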

4.2.2 Creating a table

@ Takeaway: nothing new here; just like creating a database, you run the statement with `execute()`.

import pymysql

def create_table(sql):
    # STEP1: create the connection
    db = pymysql.connect(host="localhost", user="root", password="root", database="py_spider")

    # STEP2: get a cursor object
    cursor = db.cursor()

    # STEP3: use the cursor
    try:
        res = cursor.execute(sql)  # execute() returns the number of affected rows (0 for CREATE TABLE)
        print("Table created", res)
    except Exception as e:
        print("error => failed to create table", e)
    finally:
        # STEP4: close the connection
        db.close()


if __name__ == '__main__':
    sql_statement = """
    create table if not exists students(
        id int auto_increment primary key,
        name varchar(255) not null,
        age tinyint unsigned
    ) engine=innodb;
    """
    create_table(sql_statement)  # create the table

Section 3 Executing SQL: examples

4.3.1 A self-written helper for executing SQL statements

On variable-length arguments, see [[2.需要用到的python基础知识#七. 函数传递无限参数*args]].

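import pymysql

def my_execute(sql, *params):
    # STEP1: establish the connection
    db = pymysql.connect(host="localhost", user="root", password="root", database="py_spider")
    # STEP2: get a cursor
    cursor = db.cursor()
    # STEP3: use the cursor
    try:
        # params is a tuple of values bound to the %s placeholders (pymysql escapes them)
        res = cursor.execute(sql, params)
        db.commit()  # commit the transaction (needed for InnoDB tables)
        print("success ", res)
        # print the rows returned by the statement (empty for insert/update/delete)
        for row in cursor.fetchall():
            print(row)
    except Exception as e:
        print("==>error", e)
        db.rollback()  # roll back the transaction
    finally:
        # STEP4: close the connection
        db.close()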

4.3.2 Inserting data

! Note: in pymysql the parameter placeholder (which protects against SQL injection) is `%s`, not `?`.

# Insert data
sql = "insert into students(name, age) values(%s, %s);"  # pymysql placeholders are %s, not ?
my_execute(sql, "mts", 20)

4.3.3 Updating data

! Wrap column names in backticks (`` ` ``, the key at the top-left of the keyboard) to avoid clashes with MySQL keywords and reserved words.

# Update data
sql = "update students set `name` = %s, `age` = %s where `id` = %s"
my_execute(sql, "xixi", 21, 1)

4.3.4 Querying data

# Query data
sql = "select * from students"
my_execute(sql)

4.3.5 Deleting data

# Delete data
sql = "delete from students where name = %s"
my_execute(sql, "mts")
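Why the placeholders matter: if values are pasted into the SQL string with string formatting, a crafted value can change the statement itself. This sketch uses `sqlite3` as a stand-in so it runs without a server; SQLite's placeholder is `?` (its module-level `paramstyle` is `qmark`), while pymysql uses `%s`, but the mechanism is the same. The table rows and the malicious input are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
cursor = db.cursor()
cursor.execute("create table students(id integer primary key, name text, age integer)")
cursor.executemany("insert into students(name, age) values(?, ?)", [("mts", 20), ("xixi", 21)])
db.commit()

malicious = "nobody' or '1'='1"  # hypothetical user input

# UNSAFE: string formatting lets the input rewrite the WHERE clause
unsafe_sql = "select name from students where name = '%s' order by id" % malicious
unsafe_rows = cursor.execute(unsafe_sql).fetchall()

# SAFE: the driver passes the value separately, so it is compared as plain text
safe_sql = "select name from students where name = ? order by id"
safe_rows = cursor.execute(safe_sql, (malicious,)).fetchall()

print(unsafe_rows)  # [('mts',), ('xixi',)]  every row matches
print(safe_rows)    # []  no row has that literal name
db.close()
```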

Section 4 Homework

Requirement: use an object-oriented approach to scrape Tencent Careers and store the results in a database.

# =================================
# @Time    : 2024-12-31
# @Author  : 明廷盛
# @File    : 8.作业(爬取腾讯招聘).py
# @Software: PyCharm
# @ProjectBackground: Requirement: use an object-oriented approach to scrape [Tencent Careers](https://careers.tencent.com/search.html?keyword=python) and store the results in a database
# =================================

import pymysql
import requests


class TxWork:
    # Scraper STEP1: determine the target API (pageIndex is filled in per page)
    aim_url = "https://careers.tencent.com/tencentcareer/api/post/Query?timestamp=1736323379245&countryId=&cityId=&bgIds=&productId=&categoryId=&parentCategoryId=&attrId=&keyword=python" \
              "&pageIndex={}&pageSize=10&language=zh-cn&area=cn"
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36'
    }

    def __init__(self):
        self.db = pymysql.connect(host="localhost", user="root", password="root", database="py_spider")
        self.cursor = self.db.cursor()

    @classmethod
    def get_information(cls):
        """Fetch all job postings page by page.
        Returns:
            generator: yields the list of postings on each page
        """
        for page in range(1, 100):
            # Scraper STEP2: send the request
            res = requests.get(cls.aim_url.format(page), headers=cls.headers).json()
            if res['Data']['Count'] == 0:
                print(f"All data fetched: {page - 1} pages in total")
                break
            print(f"Fetching page {page}")
            yield res['Data']['Posts']

    def create_table(self):
        """Create the tx_work table."""
        # post_id is varchar: in MySQL, LONG is an alias for MEDIUMTEXT, not a large integer type
        sql = """
        create table if not exists tx_work(
            id int auto_increment not null primary key,
            post_id varchar(64),
            recruit_postName varchar(100),
            responsibility text,
            post_url varchar(255)
        ) charset=utf8mb4 engine=innodb;
        """
        try:
            self.cursor.execute(sql)
            print("tx_work table created")
        except Exception as e:
            print("==> failed to create tx_work table", e)

    def insert_data(self, *params):
        """Insert one row.
        :arg
            post_id: job ID
            recruit_postName(str): job title
            responsibility(str): job responsibilities
            post_url(str): job URL
        :return
            None
        :raises
            nothing: insert errors are caught, rolled back, and printed
        """
        sql = "insert into tx_work(post_id, recruit_postName, responsibility, post_url) values(%s, %s, %s, %s);"
        try:
            self.cursor.execute(sql, params)
            self.db.commit()
            print("Inserted")
        except Exception as e:
            self.db.rollback()
            print("==> failed to insert data", params, e)

    def main(self):
        try:
            # create the table
            self.create_table()

            # Scraper STEP3: clean the data
            for works in self.get_information():
                for work in works:
                    print("Inserting: PostId:{}, RecruitPostName:{}, Responsibility:{}, PostURL:{}".format(
                        work['PostId'], work['RecruitPostName'], work['Responsibility'], work['PostURL']))
                    # Scraper STEP4: store the data
                    self.insert_data(work['PostId'], work['RecruitPostName'], work['Responsibility'], work['PostURL'])
        finally:
            self.db.close()  # always close the connection, even after an error


if __name__ == '__main__':
    tx_work = TxWork()
    tx_work.main()
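`get_information()` ties the paging logic to the HTTP call. A sketch of the same paging logic with the fetch step passed in as a function makes the stop condition testable offline. `iter_pages` and `fake_fetch` are hypothetical names, and the fake response only mimics the `res['Data']['Count']` / `res['Data']['Posts']` fields used above.

```python
def iter_pages(fetch, max_pages=100):
    """Yield the post list of each page until a page reports Count == 0.

    fetch(page) must return a dict shaped like {"Data": {"Count": int, "Posts": list}}.
    """
    for page in range(1, max_pages):
        data = fetch(page)["Data"]
        if data["Count"] == 0:
            break
        yield data["Posts"]


def fake_fetch(page):
    # Hypothetical stand-in for requests.get(...).json(): two pages of data, then empty
    pages = {1: [{"PostId": "a"}, {"PostId": "b"}], 2: [{"PostId": "c"}]}
    posts = pages.get(page, [])
    return {"Data": {"Count": len(posts), "Posts": posts}}


all_posts = [post for posts in iter_pages(fake_fetch) for post in posts]
print([p["PostId"] for p in all_posts])  # ['a', 'b', 'c']
```

In the class, `fetch` would be something like `lambda page: requests.get(TxWork.aim_url.format(page), headers=TxWork.headers).json()`.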

Chapter 5 SQLAlchemy

Chapter 6 MongoDB

Download

Section 1 Basic usage

# =================================
# @Time    : 2025-01-08
# @Author  : 明廷盛
# @File    : 9.使用mongodb数据库.py
# @Software: PyCharm
# @ProjectBackground:
# =================================

import pymongo

# Get a client (host "localhost" and port 27017 are the defaults, so they can be omitted)
client = pymongo.MongoClient()

# Pick database "stu" and collection "info" (MongoDB creates them on first insert; no CREATE needed)
collection = client['stu']['info']

# Insert one document
# (the _id values are fixed, so running this script a second time raises DuplicateKeyError)
stu_info = {"_id": 1, "name": "Anna", "age": 18}
collection.insert_one(stu_info)

# Insert multiple documents
students = [{"_id": 2, "name": "xixi", "age": 19}, {"_id": 3, "name": "haha", "age": 20}]
collection.insert_many(students)

# Query data: find() with no filter returns a cursor over all documents
res = collection.find()
for stu in res:
    print(stu)

Section 2 Homework

Scrape iQIYI film and TV information (title, synopsis, link).

# =================================
# @Time    : 2025-01-09
# @Author  : 明廷盛
# @File    : 10.作业(爬取爱奇艺影视信息).py
# @Software: PyCharm
# @ProjectBackground: iQIYI listing page: https://list.iqiyi.com/
# =================================

import requests
import pymongo


class AQY:
    aim_url = "https://pcw-api.iqiyi.com/search/recommend/list?channel_id=1&data_type=1&mode=24&" \
              "page_id={}&ret_num=48&session=6edd98b29ba0a0950a4d3556849e8506"
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36'
    }

    def __init__(self):
        self.client = pymongo.MongoClient()
        self.collection = self.client["aqy"]["item"]  # database "aqy", collection "item"

    @classmethod
    def get_information(cls, page):
        """Fetch one page of data.
        :arg:
            page(int): the page number to fetch
        :return
            the list of items if the page has data, otherwise None
        """
        res = requests.get(cls.aim_url.format(page), headers=cls.headers).json()
        # code 'A00003' means the page has no data, so return None
        return None if res['code'] == 'A00003' else res['data']['list']

    def save(self, data):
        """Store one record in MongoDB.
        :arg
            data(dict): the record to store
        """
        try:
            self.collection.insert_one(data)
            print("Inserted")
        except Exception as e:
            print("==> insert failed:", e)

    def main(self):
        """Entry point."""
        for page in range(1, 100):
            lists = self.get_information(page)
            if lists is None:
                break
            print("Processing page {}".format(page))
            for one_data in lists:
                # keep only the fields we need
                data = {
                    'name': one_data['name'],
                    'playUrl': one_data['playUrl'],
                    'description': one_data['description'],
                }
                print(data, "saving...")
                self.save(data)


if __name__ == '__main__':
    aqy = AQY()
    aqy.main()

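Chapter 7 Data Deduplication

Section 1 How deduplication works

Online encryption/decryption website

7.1.1 Deduplicating with a set

Drawback: the set exists only in memory. If the program crashes, the set is lost, so when the crawl resumes, new data can no longer be deduplicated against what was collected before.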
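A minimal sketch of in-memory set deduplication, assuming each record is a flat dict of hashable values; `dedupe` is a hypothetical helper and the sample records are made up. The set lives only as long as the process, which is exactly the drawback described above.

```python
def dedupe(records):
    """Return records in their original order, dropping exact duplicates."""
    seen = set()  # in-memory only: lost when the program exits or crashes
    unique = []
    for record in records:
        # dicts are unhashable, so use a sorted tuple of their items as the set key
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


movies = [{"title": "A", "story": "x"}, {"story": "x", "title": "A"}, {"title": "B", "story": "y"}]
print(dedupe(movies))  # [{'title': 'A', 'story': 'x'}, {'title': 'B', 'story': 'y'}]
```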

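7.1.2 Using Redis

How Redis fixes the problem in 7.1.1: for now, do not set an expiry time on the Redis key. Then even if the program crashes, the earlier entries are still in Redis, so the deduplication record is not lost.
How to store entries in Redis: use a Redis Set. The members should not be all fields concatenated into one string (that works, but it is not recommended because members become long and costly to store).
==Use MD5:== MD5 turns any input (a string, or a dict converted to a string) into a fixed-length hash of 128 bits, written as 32 hex characters. It is available via `import hashlib`.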
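A sketch of the MD5 fingerprint step. One assumption worth making explicit: `str(dict)` depends on key insertion order, so two dicts with the same content can hash differently; serializing with `json.dumps(..., sort_keys=True)` first gives a stable string. `fingerprint` and `add_if_new` are hypothetical names, and the Python set stands in for the Redis set: `add_if_new` returns 1 or 0 to mirror what Redis `SADD` returns when adding a single member.

```python
import hashlib
import json


def fingerprint(record):
    """MD5 of a canonical JSON form of `record`: 32 hex characters."""
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


seen = set()  # stand-in for a Redis set such as "movie:filter"


def add_if_new(record):
    """Return 1 if the record's fingerprint was new, 0 if already seen (like SADD)."""
    fp = fingerprint(record)
    if fp in seen:
        return 0
    seen.add(fp)
    return 1


a = {"title": "A", "story": "x"}
b = {"story": "x", "title": "A"}  # same content, different key order
print(str(a) == str(b))                  # False
print(fingerprint(a) == fingerprint(b))  # True
print(add_if_new(a), add_if_new(b))      # 1 0
```

With redis-py, the set check becomes `redis_client.sadd("movie:filter", fingerprint(record))`, which is the call the homework below relies on.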

Section 2 Homework

Requirement: scrape video information from Mango TV.

# =================================
# @Time    : 2025-01-10
# @Author  : 明廷盛
# @File    : 11.取出(爬取芒果TV影视信息).py
# @Software: PyCharm
# @ProjectBackground: > Requirement: scrape video information from [Mango TV](https://www.mgtv.com/lib/2?lastp=list_index&lastp=ch_tv&kind=19&area=10&year=all&sort=c2&chargelnfo=a1&fpa=2912&fpos=)
# =================================

import hashlib

import pymongo
import redis
import requests


class MovieInfo:
    aim_url = "https://pianku.api.mgtv.com/rider/list/pcweb/v3"
    headers = {
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36"
    }

    def __init__(self):
        self.mongo_client = pymongo.MongoClient()  # MongoDB client
        self.collection = self.mongo_client['mg']['item']  # MongoDB collection
        self.redis_client = redis.Redis()  # Redis client (localhost:6379 by default)

    @classmethod
    def get_information(cls, params):
        """Fetch one page of data.
        :arg
            params(dict): query parameters for the GET request
        :return:
            the list of items if the page has data, otherwise None
        """
        res = requests.get(cls.aim_url, headers=cls.headers, params=params).json()
        hit_docs = res['data']['hitDocs']
        return hit_docs if hit_docs else None

    @staticmethod
    def convert_by_MD5(data):
        """Convert a dict into a fixed-length 32-character MD5 hex string.
        :arg
            data(dict): the dict to convert
        """
        # str(data) follows key insertion order, which is stable here because save() always builds the dict the same way
        return hashlib.md5(str(data).encode()).hexdigest()

    def save(self, data):
        insert_data = {'title': data['title'], 'story': data['story']}
        print("About to insert:", insert_data, end=" ")
        md5_value = self.convert_by_MD5(insert_data)
        # Add the fingerprint to the Redis set: sadd returns 1 if it was new, 0 if it already existed
        flag = self.redis_client.sadd("movie:filter", md5_value)
        if flag:
            self.collection.insert_one(insert_data)
            print("Inserted!")
        else:
            print("Already exists, skipped!")

    def main(self):
        for page in range(1, 10000):
            params = {
                "allowedRC": "1",
                "platform": "pcweb",
                "channelId": "2",
                "pn": page,
                "pc": "80",
                "hudong": "1",
                "_support": "10000000",
                "kind": "19",
                "area": "10",
                "year": "all",
                "sort": "c2"
            }
            res = self.get_information(params)
            if res is None:
                print("Finished reading ____end____")
                break
            print("Read page {} ({} items)".format(page, len(res)))
            for movie in res:
                self.save(movie)


if __name__ == '__main__':
    movie_info = MovieInfo()
    movie_info.main()
