Appending DataFrames with a loop

Question

Morning,

I have 3 Excel files that I have imported with pandas. I am trying to create a DataFrame that takes the name ('Ticker') column from each import, adds the title of the Excel file ('Sector'), and appends them to each other to create a new DataFrame. This new DataFrame will then be exported to Excel.

import pandas as pd

AA  = ['Aero&Def','REITs', 'Auto&Parts']

File = 'FTSEASX_'+AA[0]+'_Price.xlsx'
xlsx = pd.ExcelFile('C:/Users/Ben/'+File)
df = pd.read_excel(xlsx, 'Price_Data')
df = df[df.Identifier.notnull()]
df.fillna(0)
a = []
b = []
for i in df['Ticker']:
    a.append(i)
    b.append(AA[0])
raw_data = {'Ticker': a, 'Sector': b}
df2 = pd.DataFrame(raw_data, columns = ['Ticker', 'Sector'])

del AA[0]

for j in AA:
    File = 'FTSEASX_'+j+'_Price.xlsx'
    xlsx = pd.ExcelFile('C:/Users/Ben/'+File)
    df3 = pd.read_excel(xlsx, 'Price_Data')
    df3 = df3[df3.Identifier.notnull()]
    df3.fillna(0)
    a = []
    b = []
    for i in df3['Ticker']:
        a.append(i)
        b.append(j)
    raw_data = {'Ticker': a, 'Sector': b}
    df4 = pd.DataFrame(raw_data, columns = ['Ticker', 'Sector'])
    df5 = df2.append(df4)

I am currently getting the output below, but obviously the 2nd import, titled 'REITs', is not being captured.

Ticker  Sector
0   AVON-GB Aero&Def
1   BA-GB   Aero&Def
2   COB-GB  Aero&Def
3   MGGT-GB Aero&Def
4   SNR-GB  Aero&Def
5   ULE-GB  Aero&Def
6   QQ-GB   Aero&Def
7   RR-GB   Aero&Def
8   CHG-GB  Aero&Def
0   GKN-GB  Auto&Parts
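The reason 'REITs' disappears is that df5 = df2.append(df4) rebuilds the result from df2 (the first file) on every pass, so each iteration discards the previous one and only the last sector survives. A minimal sketch of the bug and the usual fix (collect one frame per file in a list, then concatenate once), assuming small in-memory DataFrames in place of the Excel files; SHEETS, load_sector, combine_buggy, combine_fixed and the tickers are hypothetical. It uses pd.concat rather than DataFrame.append, which was deprecated in pandas 1.4 and removed in 2.0.

```python
import pandas as pd

# Hypothetical stand-ins for each file's 'Price_Data' sheet; in the question
# these come from pd.read_excel('C:/Users/Ben/FTSEASX_<sector>_Price.xlsx', 'Price_Data').
SHEETS = {
    'Aero&Def': pd.DataFrame({'Ticker': ['A1', 'A2'], 'Identifier': ['id1', 'id2']}),
    'REITs': pd.DataFrame({'Ticker': ['R1', 'R2'], 'Identifier': ['id3', None]}),
    'Auto&Parts': pd.DataFrame({'Ticker': ['P1'], 'Identifier': ['id4']}),
}


def load_sector(sector):
    """Return Ticker/Sector rows for one sector, dropping rows with no Identifier."""
    df = SHEETS[sector]
    df = df[df['Identifier'].notnull()]
    return pd.DataFrame({'Ticker': df['Ticker'], 'Sector': sector})


def combine_buggy(sectors):
    first = load_sector(sectors[0])
    result = first
    for sector in sectors[1:]:
        # Same shape as the question's loop: always appends to the FIRST frame,
        # so each pass overwrites `result` and only the last sector is kept.
        result = pd.concat([first, load_sector(sector)])
    return result


def combine_fixed(sectors):
    # Collect one frame per sector, then concatenate once at the end.
    frames = [load_sector(sector) for sector in sectors]
    return pd.concat(frames, ignore_index=True)


AA = ['Aero&Def', 'REITs', 'Auto&Parts']
print(combine_buggy(AA)['Sector'].unique().tolist())  # ['Aero&Def', 'Auto&Parts']
print(combine_fixed(AA))
```

With the fix, every sector appears in order, and ignore_index=True renumbers the rows instead of restarting at 0 for each file.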

How would I go about achieving this? Or is there a better, more Pythonic way of achieving this?

Answer 1

I would do it this way:

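import pandas as pd

AA  = ['Aero&Def','REITs', 'Auto&Parts']

# assuming that ['Ticker','Sector','Identifier'] columns are in 'B,D,E' Excel columns
xl_cols = 'B,D,E'

# read each sector's 'Price_Data' sheet (columns B, D and E only)
# and keep only the rows where Identifier is not null
dfs = [pd.read_excel('FTSEASX_{0}_Price.xlsx'.format(f),
                     'Price_Data',
                     usecols=xl_cols,  # named parse_cols in older pandas versions
                     ).query('Identifier == Identifier')
       for f in AA]

# concatenate all sectors once, renumbering the index 0..n-1
df = pd.concat(dfs, ignore_index=True)

print(df[['Ticker', 'Sector']])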

Explanation:

.query('Identifier == Identifier') returns only the rows where Identifier is NOT NULL (this relies on the fact that any comparison with NaN, including NaN == NaN, is always False).
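A small sketch of that NaN self-comparison trick next to the more explicit notna() filter, assuming a hypothetical frame with one missing Identifier:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second row has no Identifier.
df = pd.DataFrame({'Ticker': ['A1', 'A2', 'A3'],
                   'Identifier': ['id1', np.nan, 'id3']})

# NaN is never equal to anything, itself included, so this drops the null row.
via_query = df.query('Identifier == Identifier')

# The more explicit spelling of the same filter.
via_notna = df[df['Identifier'].notna()]

print(via_query['Ticker'].tolist())  # ['A1', 'A3']
```

Both forms keep the same rows; notna() states the intent more plainly, while the query form is handy inside a method chain.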

PS: When working with pandas, you don't want to loop through your DataFrames row by row...
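For example, the question's inner loop that copies each Ticker into a list and repeats the sector name can be replaced by selecting the column and broadcasting the label with assign. A minimal sketch, assuming a hypothetical sheet with Ticker and Price columns:

```python
import pandas as pd

# Hypothetical sheet contents; the real data comes from read_excel.
df = pd.DataFrame({'Ticker': ['A1', 'A2', 'A3'], 'Price': [1.0, 2.0, 3.0]})
sector = 'Aero&Def'

# Loop version, as in the question: copy Ticker row by row.
a, b = [], []
for ticker in df['Ticker']:
    a.append(ticker)
    b.append(sector)
looped = pd.DataFrame({'Ticker': a, 'Sector': b})

# Vectorized version: keep the Ticker column and broadcast the sector label.
vectorized = df[['Ticker']].assign(Sector=sector)

print(vectorized.equals(looped))  # True
```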

Answer 1 (accepted, 2016-03-16 09:43:50)
