Append a dataframe with a loop
Morning,
I have three Excel files that I have imported with pd.read_excel. I am trying to create a DataFrame that takes the name ('Ticker') column from each import, adds the title of the Excel file as a 'Sector' column, and appends the results to each other to create a new DataFrame. This new DataFrame will then be exported to Excel.
AA = ['Aero&Def','REITs', 'Auto&Parts']
File = 'FTSEASX_'+AA[0]+'_Price.xlsx'
xlsx = pd.ExcelFile('C:/Users/Ben/'+File)
df = pd.read_excel(xlsx, 'Price_Data')
df = df[df.Identifier.notnull()]
df.fillna(0)
a = []
b = []
for i in df['Ticker']:
    a.append(i)
    b.append(AA[0])
raw_data = {'Ticker': a, 'Sector': b}
df2 = pd.DataFrame(raw_data, columns = ['Ticker', 'Sector'])
del AA[0]
for j in AA:
    File = 'FTSEASX_'+j+'_Price.xlsx'
    xlsx = pd.ExcelFile('C:/Users/Ben/'+File)
    df3 = pd.read_excel(xlsx, 'Price_Data')
    df3 = df3[df3.Identifier.notnull()]
    df3.fillna(0)
    a = []
    b = []
    for i in df3['Ticker']:
        a.append(i)
        b.append(j)
    raw_data = {'Ticker': a, 'Sector': b}
    df4 = pd.DataFrame(raw_data, columns = ['Ticker', 'Sector'])
    df5 = df2.append(df4)
I am currently getting the output below, but the second import, titled 'REITs', is obviously not being captured.
Ticker Sector
0 AVON-GB Aero&Def
1 BA-GB Aero&Def
2 COB-GB Aero&Def
3 MGGT-GB Aero&Def
4 SNR-GB Aero&Def
5 ULE-GB Aero&Def
6 QQ-GB Aero&Def
7 RR-GB Aero&Def
8 CHG-GB Aero&Def
0 GKN-GB Auto&Parts
How would I go about achieving this? Or is there a better, more Pythonic way of doing it?
I would do it this way:
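import pandas as pd

AA = ['Aero&Def', 'REITs', 'Auto&Parts']

# assuming that ['Ticker','Sector','Identifier'] columns are in 'B,D,E' Excel columns
xl_cols = 'B,D,E'

# read each workbook and keep only the rows with a non-null Identifier;
# usecols selects Excel columns by letter (named parse_cols before pandas 0.21)
dfs = [pd.read_excel('FTSEASX_{0}_Price.xlsx'.format(f),
                     sheet_name='Price_Data',
                     usecols=xl_cols,
                     ).query('Identifier == Identifier')
       for f in AA]

# stack the per-file frames into one, renumbering the index from 0
df = pd.concat(dfs, ignore_index=True)
print(df[['Ticker', 'Sector']])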
Explanation:
.query('Identifier == Identifier') gives you only those rows where Identifier is NOT NULL (using the fact that value == NaN will always be False, even when value is itself NaN).
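To make the NaN trick concrete, here is a minimal sketch on a small in-memory frame rather than the real workbooks. The Identifier values are made up for illustration, and the numeric dtype is an assumption; the point is only that the self-comparison keeps the same rows as an explicit notnull() mask.

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for one imported sheet; the Identifier values are invented.
df = pd.DataFrame({
    "Ticker": ["AVON-GB", "BA-GB", "COB-GB"],
    "Identifier": [101.0, np.nan, 103.0],
})

# NaN never compares equal to itself, so the comparison is False on missing rows.
via_query = df.query("Identifier == Identifier")

# The explicit equivalent, as used in the question.
via_mask = df[df["Identifier"].notnull()]

print(via_query)
```

Both forms drop the BA-GB row here; which one to use is mostly a matter of taste, and notnull() is arguably easier to read at a glance.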
PS: You don't want to loop through your data frames when working with Pandas...
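If the sector really has to come from the file name rather than from a column in the sheet, one possible sketch is below. It also shows why 'REITs' went missing in the question: df5 is rebuilt from df2 and the latest df4 on every pass, so only the first and last files survive. Collecting the frames in a list and concatenating once avoids that, and pd.concat is used because DataFrame.append was removed in pandas 2.0. The `sheets` dict stands in for the frames that pd.read_excel would return, and the REITs ticker is a made-up placeholder.

```python
import pandas as pd

# Stand-ins for the three imported sheets. In the real script each frame would
# come from pd.read_excel; the REITs ticker below is a hypothetical placeholder.
sheets = {
    "Aero&Def": pd.DataFrame({"Ticker": ["AVON-GB", "BA-GB"]}),
    "REITs": pd.DataFrame({"Ticker": ["XXX-GB"]}),
    "Auto&Parts": pd.DataFrame({"Ticker": ["GKN-GB"]}),
}

# Tag each frame with its sector in one vectorised step (no row-by-row loop),
# then combine all of them in a single concat call.
frames = [frame[["Ticker"]].assign(Sector=sector) for sector, frame in sheets.items()]
combined = pd.concat(frames, ignore_index=True)

print(combined)
```

The combined frame could then be written out with DataFrame.to_excel, which needs an Excel writer engine such as openpyxl installed.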