17. From CSV Upload to a Data Analysis Report: A Full Workflow Walkthrough.md

# %%--- [html] cell-9fdcc99f7a17
# properties:
#   run_on_load: true
# ---%%
Please upload a CSV file:<br><input class="btn btn-primary" type="file" id="fileInput" multiple />
# %%--- [python] cell-f396f9a44240
# properties:
#   run_on_load: true
# ---%%
import pandas as pd
# %%--- [javascript] cell-287e48d98051
# properties:
#   run_on_load: true
# ---%%
initFileInputs("fileInput", {
    "csv": "/tourist_arrivals_countries.csv"
})
# %% [python] cell-f922fe6ed3b9
import js

prompt = '''
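You are an analyst who is proficient in Python pandas. Write Python code for the following task:
Read the following dataset into a pandas DataFrame from the file path `/tourist_arrivals_countries.csv`, and parse the Date field as dates.
The dataset contains the fields Date, IT, FR, DE, PT, ES, and UK. All fields other than Date are country codes.
Filter out records before 1994 and after 2018.
Extract the year from the Date field and create a new column named Year.
Group the data by year and compute each country's total tourist arrivals per year, reset the index after grouping, and store the result in a DataFrame named `yearly_arrivals`.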
'''

content = await js.window.chatDeepseek(prompt)
js.window.saveCodeBlock(content)
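# %% [markdown]
The cell above sends the prompt to the model and hands the reply to `saveCodeBlock`. How `saveCodeBlock` is implemented is not shown in this notebook. The following is only a minimal sketch of one plausible way to pull the first fenced code block out of a model reply in plain Python. The helper name `extract_first_code_block` and the sample reply are hypothetical and exist only for illustration.

```python
import re

# Build the fence marker programmatically so this example does not
# contain a literal nested fence.
FENCE = "`" * 3


def extract_first_code_block(reply: str) -> str:
    """Return the body of the first fenced code block in `reply`.

    Falls back to the whole reply (stripped) when no fence is found.
    Hypothetical helper, not part of any library used in this notebook.
    """
    pattern = re.compile(FENCE + r"[\w+-]*[ \t]*\n(.*?)" + FENCE, re.DOTALL)
    match = pattern.search(reply)
    return match.group(1).strip() if match else reply.strip()


# A made-up reply in the shape chat models commonly return.
sample_reply = (
    "Here is the code:\n"
    + FENCE + "python\n"
    + "import pandas as pd\nprint('ok')\n"
    + FENCE + "\nHope this helps."
)

code = extract_first_code_block(sample_reply)
print(code)
```

Falling back to the raw reply when no fence is present is a design choice of this sketch: some replies contain bare code without any fence, and returning it unchanged lets the caller decide what to do with it.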
# %% [python] cell-1f8f638e95d8
# Read the dataset and parse the Date field
df = pd.read_csv('/tourist_arrivals_countries.csv', parse_dates=['Date'])

# Filter rows between 1994 and 2018 (inclusive)
df = df[(df['Date'].dt.year >= 1994) & (df['Date'].dt.year <= 2018)]

# Extract year from Date and create new Year column
df['Year'] = df['Date'].dt.year


# Group by Year and calculate sum of tourist arrivals for each country
yearly_arrivals = df.groupby('Year').sum(numeric_only=True).reset_index()

# Store Year as a string (presumably so it is treated as a label rather than a number downstream)
yearly_arrivals['Year'] = yearly_arrivals['Year'].astype(str)

# Display the result
print(yearly_arrivals)
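# %% [markdown]
Because the code above was generated by a model, it can help to check the same steps (parse dates, keep 1994 to 2018 inclusive, extract the year, sum per year) on a few rows whose totals are easy to verify by hand. The sketch below does that on an in-memory sample. The rows and values are made up for illustration and only two of the country columns are included; nothing here comes from the real `/tourist_arrivals_countries.csv`.

```python
import io

import pandas as pd

# Made-up sample rows in the same layout as the real file
# (illustrative values only, two country columns for brevity).
sample_csv = io.StringIO(
    "Date,IT,FR\n"
    "1993-12-01,5,7\n"
    "1994-01-01,10,20\n"
    "1994-02-01,30,40\n"
    "2018-12-01,1,2\n"
    "2019-01-01,100,200\n"
)

sample = pd.read_csv(sample_csv, parse_dates=["Date"])

# Extract the year once, then filter on it (inclusive on both ends).
sample["Year"] = sample["Date"].dt.year
sample = sample[sample["Year"].between(1994, 2018)]

# Sum every country column per year; as_index=False keeps Year as a column.
yearly = sample.drop(columns="Date").groupby("Year", as_index=False).sum()
print(yearly)
```

The 1993 and 2019 rows should be dropped, leaving one row for 1994 (the sum of its two months) and one for 2018.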
# %% [python] cell-d5b3652017fd
import requests
import micropip
await micropip.install('https://shuyouqi.com/shuyouqi-0.0.0-py3-none-any.whl')
from shuyouqi import profiling

report_url = profiling.build(yearly_arrivals)
print("Analysis report:", report_url)