Python/Pandas: Create column in appended file based on Excel cell

I appended information from several Excel files into a single data frame. Each Excel file has the same structure but corresponds to a different city. The city name is always located in the same cell (C2).

How can I extract the city name from each file so that it appears as a column for the corresponding rows in my newly created data frame?

My appended data frame looks like this:

 Col1     Col2      
 40       34
 104      108
 23        1
 43        21

Hence, I can't tell which rows belong to file X or file Y. Ideally, I'd like to have a data frame such as:

Col3     Col1   Col2
City A     40     34
City A    104    108
City B     23      1
City B     43     21

I'm not sure whether I should edit the Excel files directly to add the corresponding city column before appending them, or whether I should do this during or after appending them to my data frame.
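The second option (adding the city while collecting the data, leaving the Excel files untouched) can be sketched as follows. The two small DataFrames are hypothetical stand-ins for the rows read from each file; frames_by_city and the City column name are illustrative choices, not part of the original code.

```python
import pandas as pd

# Hypothetical stand-ins for the rows read from each city's Excel file.
frames_by_city = {
    "City A": pd.DataFrame({"Col1": [40, 104], "Col2": [34, 108]}),
    "City B": pd.DataFrame({"Col1": [23, 43], "Col2": [1, 21]}),
}

# Tag each frame with its city as it is collected, then concatenate once.
tagged = [df.assign(City=city) for city, df in frames_by_city.items()]
combined = pd.concat(tagged, ignore_index=True)
print(combined)
```

Because assign receives a single string per frame, every row of that frame gets the same city, so the files themselves never need to be modified.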

Any guidance would be great.

Edit: This is my best attempt at reproducing the structure of my Excel sheets. Note that column A and rows 5, 6 and 7 are blank. The city name is located in row 2, column C.

I want to extract the information in rows 8 through 11 and add the city name from cell C2 as a column next to these rows.

     ColA     ColB       ColC     ColD  ColE  ColF ColG
Row1          Type       XYZ                
Row2      CityName       XXX                
Row3      CityCode        10                
Row4         RYear        13                
Row5                        
Row6                        
Row7                        
Row8          Rank       Cat.       88    89   90    91
Row9            11         A       111   106  102   101
Row10           12         B       121   144  126   121
Row11           13         C       100   107  100   101
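Because the layout is fixed, the cell coordinates translate directly into zero-based positions once the sheet is read without a header row (for example with header=None). The sketch below assumes raw is such a frame; it is built by hand from the layout above so that no Excel file is needed. Cell C2 becomes raw.iat[1, 2], row 8 (the header) becomes raw.iloc[7], and rows 9 to 11 become raw.iloc[8:11].

```python
import pandas as pd

# Hand-built stand-in for the sheet as read with header=None:
# rows 1-11 -> positions 0-10, columns A-G -> positions 0-6 (None = blank cell).
rows = [
    [None, "Type", "XYZ", None, None, None, None],
    [None, "CityName", "XXX", None, None, None, None],
    [None, "CityCode", 10, None, None, None, None],
    [None, "RYear", 13, None, None, None, None],
    [None] * 7,
    [None] * 7,
    [None] * 7,
    [None, "Rank", "Cat.", 88, 89, 90, 91],
    [None, 11, "A", 111, 106, 102, 101],
    [None, 12, "B", 121, 144, 126, 121],
    [None, 13, "C", 100, 107, 100, 101],
]
raw = pd.DataFrame(rows)

city = raw.iat[1, 2]                 # cell C2 -> row position 1, column position 2
header = raw.iloc[7, 1:].tolist()    # row 8, columns B-G, used as column names
table = raw.iloc[8:11, 1:].copy()    # rows 9-11, columns B-G
table.columns = header
table["City"] = city                 # a scalar is broadcast to every row
table = table.reset_index(drop=True)
print(table)
```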

Edit2: Following ALollz's advice, I tried the following code, without success. I get the error "'DataFrame' object has no attribute 'ColC'". Note that files_xlsx is a list of all the Excel files.

all_data = pd.DataFrame()

for f in files_xlsx:
    # Fails with: 'DataFrame' object has no attribute 'ColC'
    city_name = pd.read_excel(f, "SheetA", nrows=2).ColC[1]
    data = pd.read_excel(f, "SheetA", parse_cols="B:J")
    data['col_city'] = city_name
    all_data = all_data.append(data, ignore_index=True)
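The AttributeError most likely comes from the header: by default read_excel treats the first row of the sheet as column names, so the frame has no column called ColC (those letters only label the layout above). The sketch below illustrates the same default with read_csv on an in-memory text buffer, a stand-in chosen only so it runs without an Excel file; the two rows of content are hypothetical and mirror rows 1 and 2 of the layout.

```python
import io

import pandas as pd

# Rows 1-2 of the layout as CSV text; the empty first field is the blank column A.
text = ",Type,XYZ\n,CityName,XXX\n"

with_header = pd.read_csv(io.StringIO(text))             # header=0 is the default
no_header = pd.read_csv(io.StringIO(text), header=None)  # every row stays data

print(with_header.columns.tolist())
print(no_header.iat[1, 2])  # row 2, column C, selected by position
```

With the default header, the column names come from row 1 ("Type", "XYZ", and a generated name for the blank cell); with header=None the columns are just positions, so the city cell has to be selected positionally.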

Edit3: I kept trying and finally found something that works. The only issue is that the city name is set on only one row rather than on the entire column, which is what I want. Any help?

df = pd.DataFrame()

for f in files_xlsx:
    city_name = pd.read_excel(f, "Sheet1", nrows=2, parse_cols="C", header=None, skiprows=1, skip_footer=264)
    data = pd.read_excel(f, "Sheet1", parse_cols="B:J", header=None, skiprows=8)
    data['City'] = city_name
    df = df.append(data)
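The single-row result is consistent with how pandas assigns a column from a Series or DataFrame rather than from a single value: the values are aligned on the row index, so only rows whose index label also appears in city_name receive a city and the rest become NaN. A scalar, by contrast, is broadcast to every row. This minimal sketch reproduces both cases with hypothetical in-memory data instead of the Excel files.

```python
import pandas as pd

data = pd.DataFrame({"Rank": [11, 12, 13], "Cat": ["A", "B", "C"]})
city_name = pd.DataFrame({0: ["XXX"]})  # one-element frame with index [0]

aligned = data.copy()
aligned["City"] = city_name[0]           # Series with index [0]: aligned on the index

broadcast = data.copy()
broadcast["City"] = city_name.iat[0, 0]  # scalar "XXX": copied to every row

print(aligned)
print(broadcast)
```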

You can use nrows=1 to read only a single value into a one-element DataFrame, and then select that value with DataFrame.iat:

import pandas as pd

f = 'file.xlsx'  # path to one workbook
city_name = pd.read_excel(f, "Sheet1", nrows=1, parse_cols="C", header=None, skiprows=1)    
print (city_name)
     0
0  XXX

data = pd.read_excel(f, "Sheet1", parse_cols="B:J", header=None, skiprows=8) 
data['City'] = city_name.iat[0,0]
print (data)
    0  1    2    3    4    5 City
0  11  A  111  106  102  101  XXX
1  12  B  121  144  126  121  XXX
2  13  C  100  107  100  101  XXX

In a loop:

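import pandas as pd

dfs = []
for f in files_xlsx:
    # One-element DataFrame holding cell C2 (skip row 1, read one row of column C).
    # usecols replaces parse_cols, which later pandas versions no longer accept.
    city_name = pd.read_excel(f, "Sheet1", nrows=1, usecols="C", header=None, skiprows=1)
    # Data rows from row 9 onwards, columns B:J.
    data = pd.read_excel(f, "Sheet1", usecols="B:J", header=None, skiprows=8)
    # .iat[0, 0] extracts the scalar, which is broadcast to every row.
    data['City'] = city_name.iat[0, 0]
    dfs.append(data)

# Concatenate once at the end instead of appending inside the loop.
df = pd.concat(dfs, ignore_index=True)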
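An alternative to adding the column inside the loop is to collect the frames in a dict keyed by city name and let pd.concat turn the keys into an index level, which reset_index then moves into a column. This is a different technique from the loop above, sketched with hypothetical in-memory frames in place of the read_excel results; the city names are made up.

```python
import pandas as pd

# Hypothetical per-file results, keyed by the city name read from cell C2.
frames = {
    "XXX": pd.DataFrame({"Rank": [11, 12], "Cat": ["A", "B"]}),
    "YYY": pd.DataFrame({"Rank": [11], "Cat": ["A"]}),
}

# The dict keys become the outer index level, named "City".
stacked = pd.concat(frames, names=["City", None])
df = stacked.reset_index(level="City").reset_index(drop=True)
print(df)
```

In a real loop, frames[city] = data would replace data['City'] = ..., and the city column is created once by concat rather than per file.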
