20. Part 2, From Data to Information in Seven Prompts: Analyzing and Visualizing Global Population Data by Continent

# %% [python] cell-ae508cef942a
import pyodide, os, requests, js, json, micropip, pandas as pd

# Install the local wheel packages (et_xmlfile is a dependency of openpyxl)
await micropip.install('shuyouqi-0.0.0-py3-none-any.whl')
await micropip.install('et_xmlfile-2.0.0-py3-none-any.whl')
await micropip.install('openpyxl-3.1.5-py2.py3-none-any.whl')

# Import only after all wheels have been installed
from shuyouqi import profiling
# %% [plaintext] cell-9ba6fcad55b1
《Sample Population Data by Country》
Country Name,1960,1961,1962,1963
Aruba,54922,55578,56320,57002
Africa Eastern and Southern,130072080,133534923,137171659,140945536
Afghanistan,9035043,9214083,9404406,9604487
Africa Western and Central,97630925,99706674,101854756,104089175
# %% [prompt] cell-3bd85190e2a9
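I need to load a CSV file named population.csv from the root directory of the Pyodide file system. The file contains population data for each country from 1960 to 2023: the first column is the country name (Country Name), and each remaining column holds the population for one year. See 《Sample Population Data by Country》 for a sample of the table. Please generate code that reads this CSV file with the pandas library and stores it in a DataFrame named df.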
# %% [python] cell-cca0c96ebdd1
import pandas as pd

# Read the CSV file from the root of the Pyodide file system
df = pd.read_csv('/population.csv')
print(df)
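# %% [markdown]
Before moving on, it can help to confirm that the file has the expected layout. The sketch below is a minimal check. It assumes the file looks like the sample rows shown earlier in this notebook and reads those rows from an in-memory string via `io.StringIO`, so it runs without `/population.csv`. The names `SAMPLE_CSV`, `sample_df` and `has_expected_layout` are illustrative only.

```python
import io

import pandas as pd

# The sample rows from the notebook's sample-data cell; in practice read '/population.csv'.
SAMPLE_CSV = """Country Name,1960,1961,1962,1963
Aruba,54922,55578,56320,57002
Africa Eastern and Southern,130072080,133534923,137171659,140945536
Afghanistan,9035043,9214083,9404406,9604487
Africa Western and Central,97630925,99706674,101854756,104089175
"""

sample_df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Expected layout: a 'Country Name' column followed only by year columns.
year_columns = list(sample_df.columns[1:])
has_expected_layout = (
    sample_df.columns[0] == 'Country Name'
    and all(col.isdigit() for col in year_columns)
)

print(sample_df.shape)      # (4, 5): 4 rows, 1 name column + 4 year columns
print(has_expected_layout)  # True
print(sample_df.dtypes)     # year columns are inferred as int64 for this sample
```

In the full file, any year column that contains missing values would be inferred as float64 instead of int64.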
# %% [prompt] cell-b7c9339dd81f
I need to list all country names in df; see 《Sample Population Data by Country》 for a sample of the table. Please generate Python code.
# %% [python] cell-64b49fbe9c55
# Extract the Country Name column and convert it to a list
country_list = df['Country Name'].tolist()

# Print the result
print(country_list)
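# %% [markdown]
The printed list mixes countries with aggregates such as "World", regional groups and income groups. These names feed the dictionary lookup and the per-region sums that follow. If a name appeared twice, its population would be counted twice in those sums, so a quick duplicate check is cheap insurance. The sketch below runs that check on `sample_df`, a hypothetical four-row stand-in for `df`.

```python
import pandas as pd

# Hypothetical stand-in for df (names from the sample rows); use the real df in the notebook.
sample_df = pd.DataFrame({
    'Country Name': ['Aruba', 'Africa Eastern and Southern', 'Afghanistan',
                     'Africa Western and Central'],
    '1960': [54922, 130072080, 9035043, 97630925],
})

names = sample_df['Country Name']
duplicates = names[names.duplicated()].tolist()  # every repeat after the first occurrence
missing = int(names.isna().sum())                # rows with no name at all

print(f'{names.nunique()} unique names, {len(duplicates)} duplicates, {missing} missing')
# 4 unique names, 0 duplicates, 0 missing
```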
# %% [plaintext] cell-ae1a1c97c601
['Aruba', 'Africa Eastern and Southern', 'Afghanistan', 'Africa Western and Central', 'Angola', 'Albania', 'Andorra', 'Arab World', 'United Arab Emirates', 'Argentina', 'Armenia', 'American Samoa', 'Antigua and Barbuda', 'Australia', 'Austria', 'Azerbaijan', 'Burundi', 'Belgium', 'Benin', 'Burkina Faso', 'Bangladesh', 'Bulgaria', 'Bahrain', 'Bahamas, The', 'Bosnia and Herzegovina', 'Belarus', 'Belize', 'Bermuda', 'Bolivia', 'Brazil', 'Barbados', 'Brunei Darussalam', 'Bhutan', 'Botswana', 'Central African Republic', 'Canada', 'Central Europe and the Baltics', 'Switzerland', 'Channel Islands', 'Chile', 'China', "Cote d'Ivoire", 'Cameroon', 'Congo, Dem. Rep.', 'Congo, Rep.', 'Colombia', 'Comoros', 'Cabo Verde', 'Costa Rica', 'Caribbean small states', 'Cuba', 'Curacao', 'Cayman Islands', 'Cyprus', 'Czechia', 'Germany', 'Djibouti', 'Dominica', 'Denmark', 'Dominican Republic', 'Algeria', 'East Asia & Pacific (excluding high income)', 'Early-demographic dividend', 'East Asia & Pacific', 'Europe & Central Asia (excluding high income)', 'Europe & Central Asia', 'Ecuador', 'Egypt, Arab Rep.', 'Euro area', 'Eritrea', 'Spain', 'Estonia', 'Ethiopia', 'European Union', 'Fragile and conflict affected situations', 'Finland', 'Fiji', 'France', 'Faroe Islands', 'Micronesia, Fed. Sts.', 'Gabon', 'United Kingdom', 'Georgia', 'Ghana', 'Gibraltar', 'Guinea', 'Gambia, The', 'Guinea-Bissau', 'Equatorial Guinea', 'Greece', 'Grenada', 'Greenland', 'Guatemala', 'Guam', 'Guyana', 'High income', 'Hong Kong SAR, China', 'Honduras', 'Heavily indebted poor countries (HIPC)', 'Croatia', 'Haiti', 'Hungary', 'IBRD only', 'IDA & IBRD total', 'IDA total', 'IDA blend', 'Indonesia', 'IDA only', 'Isle of Man', 'India', 'Not classified', 'Ireland', 'Iran, Islamic Rep.', 'Iraq', 'Iceland', 'Israel', 'Italy', 'Jamaica', 'Jordan', 'Japan', 'Kazakhstan', 'Kenya', 'Kyrgyz Republic', 'Cambodia', 'Kiribati', 'St. 
Kitts and Nevis', 'Korea, Rep.', 'Kuwait', 'Latin America & Caribbean (excluding high income)', 'Lao PDR', 'Lebanon', 'Liberia', 'Libya', 'St. Lucia', 'Latin America & Caribbean', 'Least developed countries: UN classification', 'Low income', 'Liechtenstein', 'Sri Lanka', 'Lower middle income', 'Low & middle income', 'Lesotho', 'Late-demographic dividend', 'Lithuania', 'Luxembourg', 'Latvia', 'Macao SAR, China', 'St. Martin (French part)', 'Morocco', 'Monaco', 'Moldova', 'Madagascar', 'Maldives', 'Middle East & North Africa', 'Mexico', 'Marshall Islands', 'Middle income', 'North Macedonia', 'Mali', 'Malta', 'Myanmar', 'Middle East & North Africa (excluding high income)', 'Montenegro', 'Mongolia', 'Northern Mariana Islands', 'Mozambique', 'Mauritania', 'Mauritius', 'Malawi', 'Malaysia', 'North America', 'Namibia', 'New Caledonia', 'Niger', 'Nigeria', 'Nicaragua', 'Netherlands', 'Norway', 'Nepal', 'Nauru', 'New Zealand', 'OECD members', 'Oman', 'Other small states', 'Pakistan', 'Panama', 'Peru', 'Philippines', 'Palau', 'Papua New Guinea', 'Poland', 'Pre-demographic dividend', 'Puerto Rico', "Korea, Dem. People's Rep.", 'Portugal', 'Paraguay', 'West Bank and Gaza', 'Pacific island small states', 'Post-demographic dividend', 'French Polynesia', 'Qatar', 'Romania', 'Russian Federation', 'Rwanda', 'South Asia', 'Saudi Arabia', 'Sudan', 'Senegal', 'Singapore', 'Solomon Islands', 'Sierra Leone', 'El Salvador', 'San Marino', 'Somalia', 'Serbia', 'Sub-Saharan Africa (excluding high income)', 'South Sudan', 'Sub-Saharan Africa', 'Small states', 'Sao Tome and Principe', 'Suriname', 'Slovak Republic', 'Slovenia', 'Sweden', 'Eswatini', 'Sint Maarten (Dutch part)', 'Seychelles', 'Syrian Arab Republic', 'Turks and Caicos Islands', 'Chad', 'East Asia & Pacific (IDA & IBRD countries)', 'Europe & Central Asia (IDA & IBRD countries)', 'Togo', 'Thailand', 'Tajikistan', 'Turkmenistan', 'Latin America & the Caribbean (IDA & IBRD countries)', 'Timor-Leste', 'Middle East & North Africa (IDA & IBRD countries)', 'Tonga', 'South Asia (IDA & IBRD)', 'Sub-Saharan Africa (IDA & IBRD countries)', 'Trinidad and Tobago', 'Tunisia', 'Turkiye', 'Tuvalu', 'Tanzania', 'Uganda', 'Ukraine', 'Upper middle income', 'Uruguay', 'United States', 'Uzbekistan', 'St. Vincent and the Grenadines', 'Venezuela, RB', 'British Virgin Islands', 'Virgin Islands (U.S.)', 'Viet Nam', 'Vanuatu', 'World', 'Samoa', 'Kosovo', 'Yemen, Rep.', 'South Africa', 'Zambia', 'Zimbabwe']
# %% [prompt] cell-787d5cf9272b
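Based on this country list, map each country/region name, according to its geographic location, to the Chinese name of a continental sub-region (east, south, west, north and so on). For example, the United States maps to 北美 (North America), the United Kingdom to 西欧 (Western Europe), China to 东亚 (East Asia) and India to 南亚 (South Asia). Leave entries that cannot be clearly classified as empty strings. Generate the result as a Python dictionary (a JSON-style object) named country_to_continent.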
# %% [python] cell-779516905e7a
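# Aggregates (World, regional groups, income and lending groups) map to "" and are filtered out later.
# Note: this mapping also leaves Oceania (e.g. Australia, New Zealand) and most dependent
# territories blank, so they are excluded from the continent-region totals.
country_to_continent = {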
    "Aruba": "",
    "Africa Eastern and Southern": "",
    "Afghanistan": "南亚",
    "Africa Western and Central": "",
    "Angola": "中非",
    "Albania": "南欧",
    "Andorra": "南欧",
    "Arab World": "",
    "United Arab Emirates": "西亚",
    "Argentina": "南美",
    "Armenia": "西亚",
    "American Samoa": "",
    "Antigua and Barbuda": "加勒比",
    "Australia": "",
    "Austria": "西欧",
    "Azerbaijan": "西亚",
    "Burundi": "东非",
    "Belgium": "西欧",
    "Benin": "西非",
    "Burkina Faso": "西非",
    "Bangladesh": "南亚",
    "Bulgaria": "东欧",
    "Bahrain": "西亚",
    "Bahamas, The": "加勒比",
    "Bosnia and Herzegovina": "南欧",
    "Belarus": "东欧",
    "Belize": "中美洲",
    "Bermuda": "",
    "Bolivia": "南美",
    "Brazil": "南美",
    "Barbados": "加勒比",
    "Brunei Darussalam": "东南亚",
    "Bhutan": "南亚",
    "Botswana": "南非",
    "Central African Republic": "中非",
    "Canada": "北美",
    "Central Europe and the Baltics": "",
    "Switzerland": "西欧",
    "Channel Islands": "",
    "Chile": "南美",
    "China": "东亚",
    "Cote d'Ivoire": "西非",
    "Cameroon": "中非",
    "Congo, Dem. Rep.": "中非",
    "Congo, Rep.": "中非",
    "Colombia": "南美",
    "Comoros": "东非",
    "Cabo Verde": "西非",
    "Costa Rica": "中美洲",
    "Caribbean small states": "",
    "Cuba": "加勒比",
    "Curacao": "",
    "Cayman Islands": "",
    "Cyprus": "西亚",
    "Czechia": "东欧",
    "Germany": "西欧",
    "Djibouti": "东非",
    "Dominica": "加勒比",
    "Denmark": "北欧",
    "Dominican Republic": "加勒比",
    "Algeria": "北非",
    "East Asia & Pacific (excluding high income)": "",
    "Early-demographic dividend": "",
    "East Asia & Pacific": "",
    "Europe & Central Asia (excluding high income)": "",
    "Europe & Central Asia": "",
    "Ecuador": "南美",
    "Egypt, Arab Rep.": "北非",
    "Euro area": "",
    "Eritrea": "东非",
    "Spain": "南欧",
    "Estonia": "东欧",
    "Ethiopia": "东非",
    "European Union": "",
    "Fragile and conflict affected situations": "",
    "Finland": "北欧",
    "Fiji": "",
    "France": "西欧",
    "Faroe Islands": "",
    "Micronesia, Fed. Sts.": "",
    "Gabon": "中非",
    "United Kingdom": "西欧",
    "Georgia": "西亚",
    "Ghana": "西非",
    "Gibraltar": "",
    "Guinea": "西非",
    "Gambia, The": "西非",
    "Guinea-Bissau": "西非",
    "Equatorial Guinea": "中非",
    "Greece": "南欧",
    "Grenada": "加勒比",
    "Greenland": "",
    "Guatemala": "中美洲",
    "Guam": "",
    "Guyana": "南美",
    "High income": "",
    "Hong Kong SAR, China": "东亚",
    "Honduras": "中美洲",
    "Heavily indebted poor countries (HIPC)": "",
    "Croatia": "南欧",
    "Haiti": "加勒比",
    "Hungary": "东欧",
    "IBRD only": "",
    "IDA & IBRD total": "",
    "IDA total": "",
    "IDA blend": "",
    "Indonesia": "东南亚",
    "IDA only": "",
    "Isle of Man": "",
    "India": "南亚",
    "Not classified": "",
    "Ireland": "西欧",
    "Iran, Islamic Rep.": "西亚",
    "Iraq": "西亚",
    "Iceland": "北欧",
    "Israel": "西亚",
    "Italy": "南欧",
    "Jamaica": "加勒比",
    "Jordan": "西亚",
    "Japan": "东亚",
    "Kazakhstan": "中亚",
    "Kenya": "东非",
    "Kyrgyz Republic": "中亚",
    "Cambodia": "东南亚",
    "Kiribati": "",
    "St. Kitts and Nevis": "加勒比",
    "Korea, Rep.": "东亚",
    "Kuwait": "西亚",
    "Latin America & Caribbean (excluding high income)": "",
    "Lao PDR": "东南亚",
    "Lebanon": "西亚",
    "Liberia": "西非",
    "Libya": "北非",
    "St. Lucia": "加勒比",
    "Latin America & Caribbean": "",
    "Least developed countries: UN classification": "",
    "Low income": "",
    "Liechtenstein": "西欧",
    "Sri Lanka": "南亚",
    "Lower middle income": "",
    "Low & middle income": "",
    "Lesotho": "南非",
    "Late-demographic dividend": "",
    "Lithuania": "东欧",
    "Luxembourg": "西欧",
    "Latvia": "东欧",
    "Macao SAR, China": "东亚",
    "St. Martin (French part)": "",
    "Morocco": "北非",
    "Monaco": "西欧",
    "Moldova": "东欧",
    "Madagascar": "东非",
    "Maldives": "南亚",
    "Middle East & North Africa": "",
    "Mexico": "北美",
    "Marshall Islands": "",
    "Middle income": "",
    "North Macedonia": "南欧",
    "Mali": "西非",
    "Malta": "南欧",
    "Myanmar": "东南亚",
    "Middle East & North Africa (excluding high income)": "",
    "Montenegro": "南欧",
    "Mongolia": "东亚",
    "Northern Mariana Islands": "",
    "Mozambique": "东非",
    "Mauritania": "西非",
    "Mauritius": "东非",
    "Malawi": "东非",
    "Malaysia": "东南亚",
    "North America": "",
    "Namibia": "南非",
    "New Caledonia": "",
    "Niger": "西非",
    "Nigeria": "西非",
    "Nicaragua": "中美洲",
    "Netherlands": "西欧",
    "Norway": "北欧",
    "Nepal": "南亚",
    "Nauru": "",
    "New Zealand": "",
    "OECD members": "",
    "Oman": "西亚",
    "Other small states": "",
    "Pakistan": "南亚",
    "Panama": "中美洲",
    "Peru": "南美",
    "Philippines": "东南亚",
    "Palau": "",
    "Papua New Guinea": "",
    "Poland": "东欧",
    "Pre-demographic dividend": "",
    "Puerto Rico": "",
    "Korea, Dem. People's Rep.": "东亚",
    "Portugal": "南欧",
    "Paraguay": "南美",
    "West Bank and Gaza": "",
    "Pacific island small states": "",
    "Post-demographic dividend": "",
    "French Polynesia": "",
    "Qatar": "西亚",
    "Romania": "东欧",
    "Russian Federation": "东欧",
    "Rwanda": "东非",
    "South Asia": "",
    "Saudi Arabia": "西亚",
    "Sudan": "北非",
    "Senegal": "西非",
    "Singapore": "东南亚",
    "Solomon Islands": "",
    "Sierra Leone": "西非",
    "El Salvador": "中美洲",
    "San Marino": "南欧",
    "Somalia": "东非",
    "Serbia": "南欧",
    "Sub-Saharan Africa (excluding high income)": "",
    "South Sudan": "东非",
    "Sub-Saharan Africa": "",
    "Small states": "",
    "Sao Tome and Principe": "中非",
    "Suriname": "南美",
    "Slovak Republic": "东欧",
    "Slovenia": "南欧",
    "Sweden": "北欧",
    "Eswatini": "南非",
    "Sint Maarten (Dutch part)": "",
    "Seychelles": "东非",
    "Syrian Arab Republic": "西亚",
    "Turks and Caicos Islands": "",
    "Chad": "中非",
    "East Asia & Pacific (IDA & IBRD countries)": "",
    "Europe & Central Asia (IDA & IBRD countries)": "",
    "Togo": "西非",
    "Thailand": "东南亚",
    "Tajikistan": "中亚",
    "Turkmenistan": "中亚",
    "Latin America & the Caribbean (IDA & IBRD countries)": "",
    "Timor-Leste": "东南亚",
    "Middle East & North Africa (IDA & IBRD countries)": "",
    "Tonga": "",
    "South Asia (IDA & IBRD)": "",
    "Sub-Saharan Africa (IDA & IBRD countries)": "",
    "Trinidad and Tobago": "加勒比",
    "Tunisia": "北非",
    "Turkiye": "西亚",
    "Tuvalu": "",
    "Tanzania": "东非",
    "Uganda": "东非",
    "Ukraine": "东欧",
    "Upper middle income": "",
    "Uruguay": "南美",
    "United States": "北美",
    "Uzbekistan": "中亚",
    "St. Vincent and the Grenadines": "加勒比",
    "Venezuela, RB": "南美",
    "British Virgin Islands": "",
    "Virgin Islands (U.S.)": "",
    "Viet Nam": "东南亚",
    "Vanuatu": "",
    "World": "",
    "Samoa": "",
    "Kosovo": "南欧",
    "Yemen, Rep.": "西亚",
    "South Africa": "南非",
    "Zambia": "东非",
    "Zimbabwe": "东非"
}
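# %% [markdown]
Because the mapping above was generated from a prompt, it is worth checking it against the data before using it. The sketch below compares key sets. It assumes two hypothetical stand-ins, `sample_names` and `mapping_excerpt`, in place of `df['Country Name']` and `country_to_continent`. It also lists entries that are deliberately blank. In the mapping above these include Australia, so Oceania is absent from the later totals.

```python
import pandas as pd

# Hypothetical stand-ins: a few names as they might appear in df['Country Name'],
# and a small excerpt of a mapping ('Vietnam' vs 'Viet Nam' is a spelling mismatch).
sample_names = pd.Series(['China', 'India', 'World', 'Australia', 'Vietnam'])
mapping_excerpt = {
    'China': '东亚', 'India': '南亚', 'World': '', 'Australia': '', 'Viet Nam': '东南亚',
}

name_set = set(sample_names)

# In the data but not in the mapping: Series.map would turn these into NaN.
unmapped = sorted(name_set - set(mapping_excerpt))
# In the mapping but not in the data: often a sign of a spelling mismatch.
unused_keys = sorted(set(mapping_excerpt) - name_set)
# Present in both but deliberately left blank: these rows are dropped before summing.
blank = sorted(name for name in name_set & set(mapping_excerpt) if mapping_excerpt[name] == '')

print('unmapped:', unmapped)        # unmapped: ['Vietnam']
print('unused keys:', unused_keys)  # unused keys: ['Viet Nam']
print('blank:', blank)              # blank: ['Australia', 'World']
```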
# %% [prompt] cell-d2349d1b1c7
Using the country_to_continent dictionary and the "Country Name" column of the DataFrame df, add a new column "大陆洲" (continent region) to df.
# %% [python] cell-dd797fc2e19a
# Map each country name to its continent region with Series.map (names missing from the dict become NaN)
df['大陆洲'] = df['Country Name'].map(country_to_continent)

print(df)
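# %% [markdown]
When `Series.map` is given a dict, it returns the dict value for each name it finds and `NaN` for each name it does not. After this step the 大陆洲 column can therefore hold two kinds of blanks: `''` for entries the mapping left unclassified, and `NaN` for names missing from the mapping. This is why the later filtering step tests for both. The sketch below uses hypothetical names; the optional `fillna('')` collapses both kinds into one.

```python
import pandas as pd

sample = pd.DataFrame({'Country Name': ['China', 'World', 'Vietnam']})
mapping_excerpt = {'China': '东亚', 'World': ''}  # hypothetical excerpt of the mapping

sample['大陆洲'] = sample['Country Name'].map(mapping_excerpt)
missing_before_fill = sample['大陆洲'].isna().tolist()
print(missing_before_fill)  # [False, False, True]: only 'Vietnam' has no key

# Optional: treat "missing from the mapping" the same as "left blank".
sample['大陆洲'] = sample['大陆洲'].fillna('')
print(sample['大陆洲'].tolist())  # ['东亚', '', '']
```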
# %% [prompt] cell-2ce768ef3af1
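The DataFrame df was originally loaded with data like that shown in 《Sample Population Data by Country》, and preprocessing has since added a "大陆洲" column. Now I need to unpivot (melt) the data from wide format to long format. The year columns should become a single column named Year, the "Country Name" and "大陆洲" columns should be kept, and each year's population value should go into a separate "Population" column. The result will later be used to draw a line chart of population change for each continent region by year. I also need to convert the Year column to an integer type. Please generate code that uses pandas for this task and stores the result in a DataFrame named df2.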
# %% [python] cell-9784ff138a60
# Reshape from wide to long format; the former year columns become the Year column
df2 = df.melt(
    id_vars=['Country Name', '大陆洲'],
    var_name='Year',
    value_name='Population'
)

# Convert the Year column to an integer type
df2['Year'] = df2['Year'].astype(int)

print(df2)
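# %% [markdown]
The following worked example shows what `melt` does on two hypothetical rows shaped like the sample data. The population values are copied from the sample, and the 大陆洲 values are those assigned by the mapping. Each (country, year) pair becomes one row, so the long table has (number of countries) × (number of year columns) rows. Note that `astype(int)` would raise if a non-year column were left among the value columns, for example if the CSV also carried a code column.

```python
import pandas as pd

# Two rows copied from the sample data; 大陆洲 values as assigned by country_to_continent.
sample = pd.DataFrame({
    'Country Name': ['Aruba', 'Afghanistan'],
    '大陆洲': ['', '南亚'],
    '1960': [54922, 9035043],
    '1961': [55578, 9214083],
})

long_df = sample.melt(
    id_vars=['Country Name', '大陆洲'],  # identifier columns, repeated for every year
    var_name='Year',                     # receives the former column headers '1960', '1961'
    value_name='Population',             # receives the former cell values
)
long_df['Year'] = long_df['Year'].astype(int)  # '1960' -> 1960

print(long_df)
```

The rows are stacked year by year: both 1960 rows first, then both 1961 rows.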
# %% [prompt] cell-a29849c8d5f2
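I want to process df2 further. First, drop every record whose "大陆洲" value is missing (including empty strings). Then compute, for each continent region and year, the total population of its countries, and store the result in a DataFrame named df3. Please generate Python code.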
# %% [python] cell-c07e5244ef8e
# Drop records whose 大陆洲 value is NaN or an empty string
df_filtered = df2[df2['大陆洲'].notna() & (df2['大陆洲'] != '')]

# Group by continent region and year, then sum the population
df3 = df_filtered.groupby(['大陆洲', 'Year'])['Population'].sum().reset_index()

# Show the result
print(df3)
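# %% [markdown]
The following worked example runs the filter-and-sum step on made-up toy numbers, not real population figures. It shows two details. First, rows whose 大陆洲 is `''` and rows where it is `NaN` are both dropped. Second, `sum` skips missing `Population` values, so a region-year where some countries lack data is summed over the countries that have it. Passing `min_count=1` to `sum` would instead return `NaN` for a group with no values at all.

```python
import numpy as np
import pandas as pd

# Toy long-format data; the numbers are made up for illustration.
long_df = pd.DataFrame({
    'Country Name': ['China', 'Japan', 'World', 'Vietnam', 'China', 'Japan'],
    '大陆洲': ['东亚', '东亚', '', np.nan, '东亚', '东亚'],
    'Year': [1960, 1960, 1960, 1960, 1961, 1961],
    'Population': [10, 5, 100, 7, 11, np.nan],
})

# Keep only rows with a non-empty region label ('' and NaN are both dropped).
filtered = long_df[long_df['大陆洲'].notna() & (long_df['大陆洲'] != '')]

# Sum per (region, year); NaN populations are skipped by default.
totals = filtered.groupby(['大陆洲', 'Year'])['Population'].sum().reset_index()
print(totals)
```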
# %% [prompt] cell-f37e6193c73
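I have finished loading and processing the population data. Now I want to use Vega-Lite to draw a line chart showing the population trend of every continent region from 1960 to 2023. The x-axis shows the year (Year), the y-axis shows the population (Population), and the lines are colored by continent region. The data comes from the df3 variable. Please generate the corresponding Vega-Lite JSON.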
# %% [vegalite] cell-cf47fd177a3b
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Global population trends by continent region (1960-2023)",
  "width": 800,
  "height": 500,
  "data": {
    "values": "df3"
  },
  "mark": {
    "type": "line",
    "strokeWidth": 3,
    "tooltip": true
  },
  "encoding": {
    "x": {
      "field": "Year",
      "type": "quantitative",
      "axis": {"title": "Year", "format": "d", "labelFlush": false}
    },
    "y": {
      "field": "Population",
      "type": "quantitative",
      "title": "Population",
      "axis": {"format": "~s"}
    },
    "color": {
      "field": "大陆洲",
      "type": "nominal",
      "title": "Continent region",
      "scale": {"scheme": "category20"}
    }
  },
  "config": {
    "view": {"stroke": "transparent"},
    "legend": {
      "columns": 2,
      "symbolType": "stroke",
      "title": null
    }
  }
}
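# %% [markdown]
The string "df3" in `data.values` is presumably resolved to the DataFrame by the notebook environment; that mechanism is not shown here. In plain Vega-Lite, `data.values` holds an array of records, so one portable option is to build the spec in Python. The sketch below assumes a toy `df3_sample` with made-up numbers in place of `df3`. It goes through `to_json` so that NumPy integer and float types become plain JSON numbers. The `"format": "d"` on the x-axis prints years as 1960 rather than 1,960.

```python
import json

import pandas as pd

# Toy stand-in for df3 (made-up numbers); in the notebook use the real df3.
df3_sample = pd.DataFrame({
    '大陆洲': ['东亚', '东亚', '南亚', '南亚'],
    'Year': [1960, 1961, 1960, 1961],
    'Population': [10.0, 11.0, 8.0, 9.0],
})

# to_json handles NumPy dtypes; json.loads turns the result into plain Python records.
records = json.loads(df3_sample.to_json(orient='records', force_ascii=False))

spec = {
    '$schema': 'https://vega.github.io/schema/vega-lite/v5.json',
    'data': {'values': records},  # inline data: a list of {column: value} records
    'mark': {'type': 'line', 'tooltip': True},
    'encoding': {
        'x': {'field': 'Year', 'type': 'quantitative', 'axis': {'format': 'd'}},
        'y': {'field': 'Population', 'type': 'quantitative', 'axis': {'format': '~s'}},
        'color': {'field': '大陆洲', 'type': 'nominal'},
    },
}

# ensure_ascii=False keeps the Chinese region names readable in the output.
spec_json = json.dumps(spec, ensure_ascii=False, indent=2)
print(spec_json[:200])
```

The resulting `spec_json` string can be pasted into any Vega-Lite renderer as-is.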