Python/Pandas: Create column in appended file based on Excel cell
I appended information from several Excel files into a single data frame. Each Excel file has the same structure but corresponds to a different city. The city name is always located in the same cell (C2).
How can I extract the city name in each file so that it appears as a column for the corresponding rows in my newly created data frame?
My appended data frame looks like this:
Col1 Col2
40 34
104 108
23 1
43 21
Hence, I can't tell which rows belong to file X or file Y. Ideally, I'd like to have a data frame such as:
Col1 Col2 Col3
City A 40 34
City A 104 108
City B 23 1
City B 43 21
I'm not sure whether I should edit the Excel files directly to add the city column before appending them, or whether I should do this after, or in the process of, appending to my data frame.
Any guidance would be great.
Edit: This is my best attempt at reproducing the structure of my Excel sheets. Note that column A and rows 5, 6 and 7 are blank. The city name is located in row 2, column C.
I want to extract the information in rows 8 through 11 and add the city name in cell C2 as a column next to these rows.
       ColA  ColB      ColC  ColD  ColE  ColF  ColG
Row1         Type      XYZ
Row2         CityName  XXX
Row3         CityCode  10
Row4         RYear     13
Row5
Row6
Row7
Row8         Rank      Cat.  88    89    90    91
Row9         11        A     111   106   102   101
Row10        12        B     121   144   126   121
Row11        13        C     100   107   100   101
Edit2: Following ALollz's advice, I tried the following code unsuccessfully. I get the error "'DataFrame' object has no attribute 'ColC'". Note that files_xlsx is a list that includes all Excel files.
all_data = pd.DataFrame()
for f in files_xlsx:
    city_name = pd.read_excel(f, "SheetA", nrows=2).ColC[1]
    data = pd.read_excel(f, "SheetA", parse_cols="B:J")
    data['col_city'] = city_name
    all_data = all_data.append(data,ignore_index=True)
Edit3: I kept trying and finally found something that works. The only issue is that the city name is set on only one row rather than on the entire column, which is what I want. Any help?
df = pd.DataFrame()
for f in files_xlsx:
    city_name = pd.read_excel(f, "Sheet1", nrows=2, parse_cols="C", header=None, skiprows=1, skip_footer=264)
    data = pd.read_excel(f, "Sheet1", parse_cols="B:J", header=None, skiprows=8)
    data['City'] = city_name
    df = df.append(data)
You can use nrows=1 to read only one value into a one-element DataFrame, and then select that value with DataFrame.iat:
f = 'file.xlsx'
city_name = pd.read_excel(f, "Sheet1", nrows=1, parse_cols="C", header=None, skiprows=1)
print (city_name)
     0
0  XXX
data = pd.read_excel(f, "Sheet1", parse_cols="B:J", header=None, skiprows=8)
data['City'] = city_name.iat[0,0]
print (data)
    0  1    2    3    4    5 City
0  11  A  111  106  102  101  XXX
1  12  B  121  144  126  121  XXX
2  13  C  100  107  100  101  XXX
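A likely reason the Edit3 attempt filled only one row is that city_name there is a DataFrame rather than a plain value, and pandas aligns a Series or DataFrame on index labels when it is assigned to a column. The sketch below illustrates that difference on a small made-up frame; the column names and the one-element Series standing in for city_name are assumptions for illustration, not taken from the real files.

```python
import pandas as pd

# Made-up stand-in for the data rows of one workbook.
data = pd.DataFrame({"Rank": [11, 12, 13], "Cat.": ["A", "B", "C"]})

# Assigning a one-element Series aligns on index labels:
# only the row labelled 0 receives a value, the rest become NaN.
aligned = data.copy()
aligned["City"] = pd.Series(["XXX"])

# Assigning a scalar (what .iat[0, 0] returns) broadcasts to every row.
broadcast = data.copy()
broadcast["City"] = "XXX"

print(aligned)
print(broadcast)
```

This is why pulling the single cell out with .iat[0, 0] before the assignment fills the whole column.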
In a loop:
dfs = []
for f in files_xlsx:
    city_name = pd.read_excel(f, "Sheet1", nrows=1, parse_cols="C", header=None, skiprows=1)
    data = pd.read_excel(f, "Sheet1", parse_cols="B:J", header=None, skiprows=8)
    data['City'] = city_name.iat[0,0]
    dfs.append(data)
df = pd.concat(dfs, ignore_index=True)
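On newer pandas releases the parse_cols argument has been replaced by usecols, and DataFrame.append is no longer available, so the same idea might be written roughly as below. This is only a sketch under assumptions: the helper names read_city_sheet and combine_city_sheets are made up for illustration, the sheet is assumed to be called "Sheet1", and the column range is narrowed to "B:G" to match the sample layout in the question (the real files used "B:J").

```python
import pandas as pd


def read_city_sheet(path, sheet_name="Sheet1"):
    """Read one workbook and tag its data rows with the city name from cell C2."""
    # Cell C2: skip row 1, read a single row, keep only column C.
    city = pd.read_excel(
        path, sheet_name=sheet_name, header=None,
        skiprows=1, nrows=1, usecols="C",
    ).iat[0, 0]
    # Data rows: skip rows 1-8 (row 8 holds the Rank/Cat. headers).
    # "B:G" is an assumption based on the sample sheet.
    data = pd.read_excel(
        path, sheet_name=sheet_name, header=None,
        skiprows=8, usecols="B:G",
    )
    data["City"] = city  # a scalar is broadcast to every row
    return data


def combine_city_sheets(paths, sheet_name="Sheet1"):
    """Concatenate the tagged data rows of all workbooks into one frame."""
    frames = [read_city_sheet(p, sheet_name) for p in paths]
    return pd.concat(frames, ignore_index=True)
```

With the list from the question this would be called as df = combine_city_sheets(files_xlsx). Collecting the frames in a list and concatenating once also avoids growing a data frame inside the loop.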