Pull two columns from Excel and append key value pairs to dictionary
My apologies if similar questions have been asked. I dug through quite a few, but they did not match my specific issue.
Basically, I have an Excel spreadsheet with 2 columns: Name and Email. I'm using pandas to grab the two columns from the file. I want to grab the values from the columns in order and add them to a dictionary so that I can easily reference name and email pairs later on.
I currently have two functions in two files. One is my main file/function, and the other is a file named readExcel with a function named read:
    # readExcel.py
    import pandas as pd

    def read(fileName: str, sheetName: str):
        f = pd.read_excel(fileName, sheet_name=sheetName)
        return f

    # __main__.py
    import readExcel as re
    from pathlib import Path

    def main():
        contacts = {}
        p = Path(__file__).with_name('contacts.xlsx')
        f = re.read(p, "Sheet1")
        for n in f["Name"]:
            for e in f["Email"]:
                contacts[n] = e
        print(contacts)
The issue I'm facing here is that the resulting dictionary is unordered, i.e., names end up paired with the wrong emails, e.g., Bob Testerson: jim.tester@gmail.com, Jim Tester: bob.testerson@gmail.com
How would I go about properly ordering the data I'm pulling from the spreadsheet?
EDIT: Per request, I'll add more information regarding the Excel file and preferred order.
The Excel file looks like this: [image: Excel preview]
As for the ordering of the data, it seems it would be best done before adding it to the dictionary, but that's not a requirement for me. Also, I don't specifically care about the order in which the key/value pairs appear in the dictionary, but rather that each key/value pair matches a row in the Excel file, e.g.,
    {
        "Jon Testerson": "jon.test@gmail.com",
        "Henry": "henrytest@gmail.com",
        "Bryce Testington": "brycetestington@gmail.com",
        "Greg": "greg_test@yahoo.com",
        "Jerry Testerfield": "jerrytester@hotmail.com"
    }
Try this, using the pandas to_dict method. Just change the column names if you need to.
    import pandas as pd

    def read_excel(path_to_file):
        df = pd.read_excel(path_to_file)
        return df

    def dataframe_to_dict(df, key_column, value_column):
        # Index by the key column, then map each key to the value in the same row
        name_email_dict = df.set_index(key_column)[value_column].to_dict()
        return name_email_dict

    if __name__ == "__main__":
        # Raw string so the backslashes in the Windows path are not treated as escapes
        path_to_file = r'C:\projects\scratchwork\excel_dict.xlsx'
        df = read_excel(path_to_file)
        name_email_dict = dataframe_to_dict(df, 'Name', 'Email')
        print(name_email_dict)
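For what it's worth, the mismatch in the question most likely comes from the nested loops: the inner loop runs over every email for each name, so each name is overwritten until it holds the last email in the column. A minimal sketch of that behaviour and of a row-by-row alternative using zip, assuming a small in-memory DataFrame (the sample rows are made up and only stand in for the spreadsheet):

```python
import pandas as pd

# Hypothetical stand-in for the spreadsheet, so the example runs without a file.
df = pd.DataFrame({
    "Name": ["Jon Testerson", "Henry", "Greg"],
    "Email": ["jon.test@gmail.com", "henrytest@gmail.com", "greg_test@yahoo.com"],
})

# Nested loops: the inner loop runs to completion for every name,
# so each name ends up holding the last email in the column.
nested = {}
for n in df["Name"]:
    for e in df["Email"]:
        nested[n] = e

# zip walks both columns in step, pairing values taken from the same row.
contacts = dict(zip(df["Name"], df["Email"]))

print(nested)
print(contacts)
```

Since Python 3.7 a dict keeps insertion order, so contacts also lists the pairs in the same order as the rows. If a name appears more than once, the later row overwrites the earlier one, both here and with set_index(...).to_dict().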
I'm sure there's an easier way to do it, but I would put the data into a DataFrame and then use the sort_values method to sort it. This would look something like:
    # readExcel.py
    import pandas as pd

    def read(fileName: str, sheetName: str):
        f = pd.read_excel(fileName, sheet_name=sheetName)
        return f

    # __main__.py
    import readExcel as re  # note: this alias shadows the standard-library re module
    from pathlib import Path

    def main():
        p = Path(__file__).with_name('contacts.xlsx')
        # read() already returns a DataFrame, so no empty frame or append is needed
        # (DataFrame.append was removed in pandas 2.0)
        df = re.read(p, "Sheet1")
        print(df.sort_values(by=["Name", "Email"]))

    if __name__ == "__main__":
        main()
Again, this may not be the best way to do it, but it should work. If there is extra information on Sheet1, then before the print I would do:
    df = df[['Name', 'Email']]
which would then select only the Name and Email columns.
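To illustrate the column selection and sort on data that can be run without a file, here is a rough sketch. The extra Phone column and all sample rows are assumptions made up for the example. It also shows that sorting only changes the order of the rows, not which email belongs to which name:

```python
import pandas as pd

# Hypothetical sheet contents, including an extra "Phone" column.
df = pd.DataFrame({
    "Name": ["Jon Testerson", "Henry", "Bryce Testington"],
    "Email": ["jon.test@gmail.com", "henrytest@gmail.com", "brycetestington@gmail.com"],
    "Phone": ["555-0100", "555-0101", "555-0102"],
})

# Keep only the two columns of interest.
pairs = df[["Name", "Email"]]

# Sorting reorders whole rows, so each name keeps its own email.
by_name = pairs.sort_values(by="Name")

# Build dictionaries in sheet order and in sorted order.
sheet_order = dict(zip(pairs["Name"], pairs["Email"]))
sorted_order = dict(zip(by_name["Name"], by_name["Email"]))

print(sheet_order)
print(sorted_order)
```

Both dictionaries hold the same pairs and differ only in key order, so sorting is optional if all that matters is that each name maps to the email from its own row.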