PYTHON: df.drop does not work in function
I have some problems with a pandas DataFrame and hope that someone can help me. I downloaded some data from cryptocompare and wrote it to a CSV file. My goal is to update this CSV file daily.
After downloading the new data into a separate DataFrame, I want to merge it with the existing data. For that I wrote a function (read_dataset) that reads the existing data from the CSV file into a DataFrame. The next step should be to merge the new data with the existing data. I tried pd.merge and pd.concat, but neither of them works.
My DataFrames look like this:
open time volumefrom volumeto Timestamp
0 0.04951 1279324800 20.00 9.902000e-01 2010-07-17
1 0.04951 1279411200 75.01 5.090000e+00 2010-07-18
2 0.08584 1279497600 574.00 4.966000e+01 2010-07-19
3 0.08080 1279584000 262.00 2.059000e+01 2010-07-20
4 0.07474 1279670400 575.00 4.226000e+01 2010-07-21
5 0.07921 1279756800 2160.00 1.297800e+02 2010-07-22
6 0.05050 1279843200 2402.50 1.410700e+02 2010-07-23
7 0.06262 1279929600 496.32 2.673000e+01 2010-07-24
8 0.05454 1280016000 1551.48 8.506000e+01 2010-07-25
9 0.05050 1280102400 877.00 4.691000e+01 2010-07-26
10 0.05600 1280188800 3373.69 1.969200e+02 2010-07-27
11 0.06000 1280275200 4390.29 2.557600e+02 2010-07-28
12 0.05890 1280361600 8058.49 5.283200e+02 2010-07-29
13 0.06990 1280448000 3020.85 1.985300e+02 2010-07-30
14 0.06270 1280534400 4022.25 2.439000e+02 2010-07-31
15 0.06785 1280620800 2601.00 1.626500e+02 2010-08-01
16 0.06110 1280707200 3599.00 2.212000e+02 2010-08-02
17 0.06000 1280793600 9821.46 6.060500e+02 2010-08-03
18 0.06000 1280880000 3494.00 2.107700e+02 2010-08-04
19 0.05700 1280966400 5034.07 3.036100e+02 2010-08-05
20 0.06100 1281052800 1395.00 8.591000e+01 2010-08-06
21 0.06230 1281139200 2619.00 1.573400e+02 2010-08-07
22 0.05900 1281225600 2201.00 1.326000e+02 2010-08-08
23 0.06090 1281312000 13631.09 8.869300e+02 2010-08-09
24 0.07100 1281398400 1310.39 8.887000e+01 2010-08-10
25 0.07000 1281484800 14061.18 1.015640e+03 2010-08-11
26 0.06700 1281571200 2062.31 1.344900e+02 2010-08-12
27 0.07000 1281657600 3591.77 2.338000e+02 2010-08-13
28 0.06450 1281744000 4404.20 2.953100e+02 2010-08-14
29 0.06700 1281830400 4462.87 2.949500e+02 2010-08-15
... ... ... ... ...
2791 9928.56000 1520467200 154879.22 1.492236e+09 2018-03-08
2792 9316.77000 1520553600 233598.15 2.081621e+09 2018-03-09
2793 9252.76000 1520640000 117409.38 1.084926e+09 2018-03-10
2794 8797.27000 1520726400 149877.66 1.374815e+09 2018-03-11
2795 9543.98000 1520812800 152959.80 1.435404e+09 2018-03-12
2796 9142.27000 1520899200 133768.47 1.228556e+09 2018-03-13
2797 9160.12000 1520985600 161775.05 1.385573e+09 2018-03-14
2798 8216.22000 1521072000 187365.71 1.519850e+09 2018-03-15
2799 8267.95000 1521158400 129688.11 1.082790e+09 2018-03-16
2800 8283.23000 1521244800 111641.32 9.019394e+08 2018-03-17
2801 7882.67000 1521331200 198796.34 1.535519e+09 2018-03-18
2802 8215.50000 1521417600 171829.52 1.447813e+09 2018-03-19
2803 8623.14000 1521504000 131959.66 1.150462e+09 2018-03-20
2804 8920.53000 1521590400 109985.22 9.913764e+08 2018-03-21
2805 8911.37000 1521676800 116522.98 1.023287e+09 2018-03-22
2806 8724.98000 1521763200 109649.39 9.399973e+08 2018-03-23
2807 8935.51000 1521849600 93296.24 8.276632e+08 2018-03-24
2808 8548.39000 1521936000 76775.64 6.576435e+08 2018-03-25
2809 8472.56000 1522022400 131859.97 1.079039e+09 2018-03-26
2810 8152.18000 1522108800 116523.10 9.307550e+08 2018-03-27
2811 7808.42000 1522195200 82590.62 6.577121e+08 2018-03-28
2812 7959.78000 1522281600 185805.88 1.379180e+09 2018-03-29
2813 7106.62000 1522368000 229837.79 1.584675e+09 2018-03-30
2814 6853.75000 1522454400 129526.48 9.154006e+08 2018-03-31
2815 6943.77000 1522540800 131344.01 8.898877e+08 2018-04-01
2816 6835.58000 1522627200 106513.22 7.488614e+08 2018-04-02
2817 7074.65000 1522713600 122807.02 9.053268e+08 2018-04-03
2818 7434.30000 1522800000 123910.33 8.771998e+08 2018-04-04
2819 6815.50000 1522886400 114426.84 7.771452e+08 2018-04-05
2820 6790.45000 1522972800 72568.93 4.848647e+08 2018-04-06
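In the table above, the Timestamp column is derived from the integer 'time' column, which counts seconds since the Unix epoch. A minimal sketch, using two rows copied from the table, shows that pd.to_datetime with unit='s' reproduces those dates (pandas returns timezone-naive values measured from 1970-01-01, i.e. UTC). Because 'time' is an exact integer, it is a natural key to merge or deduplicate on.

```python
import pandas as pd

# Two rows copied from the table above: 'time' holds Unix seconds.
df = pd.DataFrame({"open": [0.04951, 0.04951], "time": [1279324800, 1279411200]})

# unit="s" tells pandas the integers are seconds since 1970-01-01.
df["Timestamp"] = pd.to_datetime(df["time"], unit="s")
print(df)
```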
The existing and new DataFrames should be merged on the key 'time', which is a Unix timestamp.
# Read the old data
df_old = read_dataset('BTC_historical_data_daily')
# Download the new data
df_new = download_historical_data('BTC', 'USD', 'CCCAGG', 'day')
# Merge the two DataFrames on 'time'
df_merged_inner = pd.merge(left=df_old, right=df_new, how='left', left_on='time', right_on='time')
# Convert Unix Timestamp into a readable format
df_merged_inner['Timestamp'] = pd.to_datetime(df_merged_inner['time'], unit='s')
# Drop the Unix Timestamp
df_merged_inner = df_merged_inner.drop('time', axis=1)
# Save the new DataFrame as a CSV file
df_merged_inner.to_csv('BTC_historical_data_daily_' + current_datetime)
This code returns a DataFrame with no updated data, but with doubled values for each key.
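This symptom is consistent with how a left merge behaves: it keeps only the keys already present in df_old, so newly downloaded days are never added, and every non-key column that exists in both frames comes back twice with _x and _y suffixes. A small sketch with made-up frames (the data is hypothetical, not from the question) illustrates both effects:

```python
import pandas as pd

# Hypothetical stand-ins: the "old" file covers days 1-3, the download covers days 3-4.
df_old = pd.DataFrame({"time": [1, 2, 3], "open": [10.0, 11.0, 12.0]})
df_new = pd.DataFrame({"time": [3, 4], "open": [12.0, 13.0]})

# A left merge keeps exactly the keys of df_old, so day 4 never appears.
# Both frames have an 'open' column, so pandas renames them open_x and open_y.
merged = pd.merge(left=df_old, right=df_new, how="left", on="time")
print(merged)
```

Merging is meant for joining columns side by side; appending new rows is what pd.concat is for.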
pd.concat gives the following error:
d = pd.concat(df_old,df_new)
Traceback (most recent call last):
File "/Users/audiodeep/anaconda/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2881, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-50-891cefa897e1>", line 1, in <module>
d = pd.concat(df_old,df_new)
File "/Users/audiodeep/anaconda/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 212, in concat
copy=copy)
File "/Users/audiodeep/anaconda/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 227, in __init__
'"{name}"'.format(name=type(objs).__name__))
TypeError: first argument must be an iterable of pandas objects, you passed an object of type "DataFrame"
Does anyone have a solution for me? Thanks a lot :D
pd.concat([df_old, df_new])
The error message basically says that your DataFrames have to be passed inside an iterable object, such as a list.
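A minimal reproduction with hypothetical two- and one-row frames: passing the frames as separate positional arguments raises a TypeError (the exact message depends on the pandas version), while wrapping them in a list works.

```python
import pandas as pd

df_old = pd.DataFrame({"time": [1, 2], "open": [10.0, 11.0]})  # hypothetical data
df_new = pd.DataFrame({"time": [3], "open": [12.0]})           # hypothetical data

# Separate positional arguments: pandas rejects this with a TypeError.
try:
    pd.concat(df_old, df_new)
except TypeError as exc:
    print("TypeError:", exc)

# A list is an iterable of pandas objects, which is what concat expects.
combined = pd.concat([df_old, df_new])
print(combined)
```

Note that the result keeps each frame's original row labels, so the index here is 0, 1, 0.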
As czr mentioned in a comment, pd.concat should work for your example when you supply it with a tuple (df_old, df_new). That is because it expects an iterable, for example a tuple or a list. The way you supplied df_old and df_new does not work because you passed each as an individual positional argument, i.e. pd.concat(df_old, df_new). Any of the following should work:
d = pd.concat((df_old, df_new))
d = pd.concat([df_old, df_new])
The official documentation refers to this iterable as objs.
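A short sketch (hypothetical data) confirming that a tuple and a list passed as objs give the same result. It also shows ignore_index=True, an existing pd.concat parameter that renumbers the rows 0..n-1 instead of keeping the duplicate labels from each input frame.

```python
import pandas as pd

df_old = pd.DataFrame({"time": [1, 2], "open": [10.0, 11.0]})  # hypothetical data
df_new = pd.DataFrame({"time": [3], "open": [12.0]})           # hypothetical data

# objs can be any iterable of DataFrames; a tuple and a list behave the same.
from_tuple = pd.concat((df_old, df_new))
from_list = pd.concat([df_old, df_new])
print(from_tuple.equals(from_list))  # True

# By default the original row labels are kept (here 0, 1, 0);
# ignore_index=True renumbers the result instead.
renumbered = pd.concat([df_old, df_new], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2]
```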
Additionally, you might want to keep only one data point for each time point that has multiple rows. You can do this as follows:
d = d.drop_duplicates('time')
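By default, drop_duplicates('time') keeps the first occurrence, i.e. the row that came from df_old. A sketch of the whole daily update, using a hypothetical helper update_history, instead passes keep='last' so that a day present in both frames takes the freshly downloaded values. That choice assumes the last stored day may have been incomplete when it was saved; if the old values should win, keep the default. Sorting by 'time' and resetting the index are optional tidying steps.

```python
import pandas as pd


def update_history(df_old: pd.DataFrame, df_new: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: append new rows and keep one row per 'time' value.

    keep="last" prefers the row from df_new when both frames contain the same
    day, on the assumption that the newly downloaded value is more current.
    """
    combined = pd.concat([df_old, df_new], ignore_index=True)
    combined = combined.drop_duplicates(subset="time", keep="last")
    return combined.sort_values("time").reset_index(drop=True)


# Hypothetical data: day 3 was still incomplete when df_old was saved.
df_old = pd.DataFrame({"time": [1, 2, 3], "open": [10.0, 11.0, 12.0], "volumeto": [5.0, 6.0, 1.0]})
df_new = pd.DataFrame({"time": [3, 4], "open": [12.0, 13.0], "volumeto": [7.0, 8.0]})

updated = update_history(df_old, df_new)
print(updated)
```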