Append data into existing pandas dataframe at specific location
I have found separate solutions for parts of what I want to do, but nothing that works together.
```python
import netCDF4
import pandas as pd

# DataFrame.append was removed in pandas 2.0, so collect one dict per file
# and build each DataFrame once, after its loop.
vc5_rows = []
for files in VC5filelist:
    filedate_df = files[18:26]  # date of the file, sliced from the filename
    filedump_df = files[31:33]  # dump number of the file, sliced from the filename
    with netCDF4.Dataset(files, 'r') as ds:  # read each netCDF file into a dataset
        VC5gpsstart = ds.variables['TIME'][0]  # first GPS timestamp of the VC5 file
    vc5_rows.append({'Date': filedate_df, 'Dump': filedump_df, 'VC5 Start': VC5gpsstart})
# 'VC2 Start' is not in the dicts, so it starts out as NaN
df = pd.DataFrame(vc5_rows, columns=['Date', 'Dump', 'VC5 Start', 'VC2 Start'])

vc2_rows = []
for vc2files in VC2filelist:
    print(vc2files)
    vc2filedate_df = vc2files[18:26]  # date of the file
    vc2filedump_df = vc2files[31:33]  # dump number of the file
    print(vc2filedate_df + ':' + vc2filedump_df)
    with netCDF4.Dataset(vc2files, 'r') as dsvc2:  # read each netCDF file into a dataset
        VC2gpsstart = dsvc2.variables['time'][0]  # first GPS timestamp of the VC2 file
    vc2_rows.append({'Date': vc2filedate_df, 'Dump': vc2filedump_df, 'VC2 Start': VC2gpsstart})
VC2df = pd.DataFrame(vc2_rows)
```
I want to append/insert the VC2 time data into the last column (VC2 Start), and use the date and dump numbers of the second set of files to determine where in the dataframe each start time should go. Example:

| Date | Dump | vc5start | vc2start |
|---|---|---|---|
| 2022.001 | 05 | 121651215 | ***456447156*** |
The bold and italic value is the only thing I cannot produce right now. I have been trying to find the correct row to insert my data with

```python
row=df.index.get_loc(df.query('Date' == vc2filedate_df) and ('Dump'==vc2filedump_df).index[0])
```

to no avail. My next step was going to be

```python
df.loc[row:'VC2 Start']=VC2gpsstart
```
What I want to know is:

A: Given the date and dump number of a file from my second set, how do I find the row of the dataframe with the same date and dump number?

B: How do I then add the VC2 start value into the VC2 Start column of the dataframe, on the row found in A?
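For reference, here is a minimal sketch of one common way to do A and B directly: a boolean mask plus `.loc`. It assumes `Date` and `Dump` are stored as strings (as they are when sliced out of the filenames) and uses the small example values from the tables later in this thread; `vc2_date`, `vc2_dump` and `vc2_start` are hypothetical stand-ins for the values read in the VC2 loop.

```python
import pandas as pd

# Example VC5 data; Date and Dump are strings, as when sliced from filenames
df = pd.DataFrame({
    'Date': ['2022.001', '2022.001', '2022.001', '2022.002'],
    'Dump': ['01', '02', '05', '01'],
    'VC5 Start': [125.0, 128.0, 260.0, 35.0],
})
df['VC2 Start'] = float('nan')  # empty column, to be filled from the VC2 files

# Hypothetical values read from one VC2 file
vc2_date, vc2_dump, vc2_start = '2022.001', '05', 261.0

# A: boolean mask for the row(s) whose Date AND Dump both match.
# Element-wise conditions are combined with & (and parentheses), not `and`.
mask = (df['Date'] == vc2_date) & (df['Dump'] == vc2_dump)
rows = df.index[mask]  # index label(s) of the matching row(s)

# Equivalent selection with query(); @ refers to Python variables
rows_q = df.query('Date == @vc2_date and Dump == @vc2_dump').index

# B: in .loc the row selector and the column selector are separated by a comma
df.loc[mask, 'VC2 Start'] = vc2_start
print(df)
```

Note that `df.loc[row:'VC2 Start']` (with a colon) is read by pandas as a row slice from `row` to a row labelled `'VC2 Start'`, not as "row `row`, column `'VC2 Start'`". Assigning through the mask also avoids `get_loc` entirely, and simply writes nothing if no row matches.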
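@LarryBird

VC5df:

| Date | Dump | VC5time |
|---|---|---|
| 2022.001 | 01 | 125 |
| 2022.001 | 02 | 128 |
| 2022.001 | 05 | 260 |
| 2022.002 | 01 | 035 |

VC2df:

| Date | Dump | VC2time |
|---|---|---|
| 2022.001 | 01 | 125 |
| 2022.001 | 02 | 130 |
| 2022.001 | 05 | 261 |
| 2022.002 | 01 | 035 |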
@LarryBird, after researching merge I found a solution. First, create the dataframes:

```python
VC5df = pd.DataFrame(columns=['Date', 'Dump', 'VC5 Start'])
VC2df = pd.DataFrame(columns=['Date', 'Dump', 'VC2 Start'])
```

then append data to them within the loops (as above), and then

```python
merged_df = pd.merge(VC5df, VC2df, on=["Date", "Dump"])
```

creates the following (looking at the first and second days of 2022):
```
        Date Dump     VC5 Start     VC2 Start
0   2022.001   01  1325029429.0  1325029440.0
1   2022.001   02  1325030705.0  1325030760.0
2   2022.001   03  1325034031.0  1325034060.0
3   2022.001   04  1325035511.0  1325035560.0
4   2022.001   05  1325036791.0  1325036879.0
..       ...  ...           ...           ...
103 2022.002   48  1325188946.0  1325188980.0
104 2022.002   49  1325191628.0  1325191680.0
105 2022.002   50  1325192627.0  1325192640.0
106 2022.002   51  1325195052.0  1325195100.0
107 2022.002   52  1325198890.0  1325198940.0
```
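One thing worth checking with this approach: `pd.merge` defaults to an inner join, so a (Date, Dump) pair that exists in only one of the two frames is silently dropped. The sketch below runs the same merge on the small example tables from this thread, plus one extra VC5 row (`2022.002` / `02`, value 40) that is made up purely to show the difference; `how='left'` keeps every VC5 row and leaves `VC2 Start` as NaN where there is no VC2 file.

```python
import pandas as pd

VC5df = pd.DataFrame({
    'Date': ['2022.001', '2022.001', '2022.001', '2022.002', '2022.002'],
    'Dump': ['01', '02', '05', '01', '02'],  # last row: hypothetical, no VC2 match
    'VC5 Start': [125.0, 128.0, 260.0, 35.0, 40.0],
})
VC2df = pd.DataFrame({
    'Date': ['2022.001', '2022.001', '2022.001', '2022.002'],
    'Dump': ['01', '02', '05', '01'],
    'VC2 Start': [125.0, 130.0, 261.0, 35.0],
})

inner = pd.merge(VC5df, VC2df, on=['Date', 'Dump'])             # default how='inner'
left = pd.merge(VC5df, VC2df, on=['Date', 'Dump'], how='left')  # keep all VC5 rows
print(inner)
print(left)
```

Both frames should also store the key columns with the same dtype (here, strings): pandas refuses to merge a string key column against an integer one, and `'05'` would never equal `5` anyway.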
It sounds like you might be better off doing a `.join()` / `.merge()` instead of trying to find the row index yourself. For example, if both dataframes are indexed by `Date` and `Dump`, you could do `df1.merge(df2, on=['Date', 'Dump'])` (or something to that effect).
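As a sketch of the indexed case described above (an assumption: both frames have a `(Date, Dump)` MultiIndex, built here with `set_index` from the example tables in the question), `merge` accepts index level names in `on` as well as column names:

```python
import pandas as pd

keys = ['Date', 'Dump']
# Example frames indexed by (Date, Dump); values from the question's tables
VC5df = pd.DataFrame({
    'Date': ['2022.001', '2022.001', '2022.001', '2022.002'],
    'Dump': ['01', '02', '05', '01'],
    'VC5 Start': [125.0, 128.0, 260.0, 35.0],
}).set_index(keys)
VC2df = pd.DataFrame({
    'Date': ['2022.001', '2022.001', '2022.001', '2022.002'],
    'Dump': ['01', '02', '05', '01'],
    'VC2 Start': [125.0, 130.0, 261.0, 35.0],
}).set_index(keys)

# `on` names index levels here rather than columns
merged = VC5df.merge(VC2df, on=keys)
print(merged)
```

If the keys are ordinary columns instead (as in the question's `merged_df` code), the same call works unchanged.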
If you are interested, there is an excellent summary of `join()` and `merge()` in this answer. Basically, if both dataframes have a matching index and you wish to join on that index, you can use `df1.join(df2)` to save typing. `merge()` is more flexible in that you can specify various combinations of index levels or columns to join on.
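A minimal sketch of the `join()` shortcut, assuming both frames are indexed by `(Date, Dump)`. The VC5 row `2022.002` / `02` (value 40) is hypothetical, added only to show that `DataFrame.join` defaults to `how='left'`, unlike `merge`, which defaults to `how='inner'`.

```python
import pandas as pd

keys = ['Date', 'Dump']
VC5df = pd.DataFrame({
    'Date': ['2022.001', '2022.001', '2022.002'],
    'Dump': ['01', '05', '02'],  # ('2022.002', '02') has no VC2 entry (hypothetical)
    'VC5 Start': [125.0, 260.0, 40.0],
}).set_index(keys)
VC2df = pd.DataFrame({
    'Date': ['2022.001', '2022.001'],
    'Dump': ['01', '05'],
    'VC2 Start': [125.0, 261.0],
}).set_index(keys)

# Same (Date, Dump) index on both sides, so join() needs no key arguments.
# how='left' by default: every VC5 row is kept, missing VC2 values become NaN.
joined = VC5df.join(VC2df)
print(joined)
print(joined.loc[('2022.001', '05'), 'VC2 Start'])  # look up one row by its keys
```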
Also worth knowing is `pd.concat` (see the docs here), which is another useful function when you are combining data. In particular, it can be more efficient (and more readable) if you need to combine many dataframes, since you can call it on a list of dataframes in one line instead of looping through and joining multiple times.
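A short sketch of both uses of `pd.concat` for this kind of data. The one-row-per-file frames in `per_file` are hypothetical (built from the example VC5 values); the first call stacks them row-wise in a single step, which also replaces appending inside a loop, and the second aligns the VC5 and VC2 frames side by side on a `(Date, Dump)` index. The `axis=1` form assumes each `(Date, Dump)` pair appears at most once per frame.

```python
import pandas as pd

# Hypothetical per-file results: one small DataFrame per VC5 file
per_file = [
    pd.DataFrame({'Date': ['2022.001'], 'Dump': ['01'], 'VC5 Start': [125.0]}),
    pd.DataFrame({'Date': ['2022.001'], 'Dump': ['02'], 'VC5 Start': [128.0]}),
    pd.DataFrame({'Date': ['2022.001'], 'Dump': ['05'], 'VC5 Start': [260.0]}),
]
# Row-wise: stack all per-file frames in one call, with a fresh 0..n-1 index
VC5df = pd.concat(per_file, ignore_index=True)

VC2df = pd.DataFrame({'Date': ['2022.001', '2022.001'], 'Dump': ['01', '05'],
                      'VC2 Start': [125.0, 261.0]})

# Column-wise: axis=1 aligns the frames on their index; the default
# join='outer' keeps keys present in either frame and fills gaps with NaN
keys = ['Date', 'Dump']
combined = pd.concat([VC5df.set_index(keys), VC2df.set_index(keys)], axis=1)
print(combined)
```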
Hope this helps.