
Python pandas: make a new column from data in an existing column and from another DataFrame

I have a DataFrame called 'mydata', and if I do

len(mydata.loc['2015-9-2'])

it counts the number of rows in mydata that have that date and returns a number like

1067
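mydata.loc['2015-9-2'] relies on partial string indexing: a date-only string selects every row on that calendar day, but only when the timestamps form a DatetimeIndex. A minimal sketch, assuming the timestamp column is first moved into the index (the three rows below are made-up illustration data, not the asker's):

```python
import pandas as pd

# Made-up sample data; the real mydata has many more rows
mydata = pd.DataFrame({
    'timestamp': pd.to_datetime(['2015-09-02 08:00:00',
                                 '2015-09-02 12:30:00',
                                 '2015-09-03 09:15:00']),
    'BPM': [65, 70, 68],
})

# Partial string indexing needs a DatetimeIndex, so move timestamp into the index
indexed = mydata.set_index('timestamp')

# A date-only string selects every row that falls on that calendar day
rows_on_day = indexed.loc['2015-09-02']
print(len(rows_on_day))  # 2
```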

I have another DataFrame called 'yourdata', which looks something like

     timestamp
51   2015-06-22
52   2015-06-23
53   2015-06-24
54   2015-06-25
43   2015-07-13

Now I want to use each date in yourdata, so instead of typing in each date

len(mydata.loc['2015-9-2'])

I can iterate through 'yourdata' and use them, like

len(mydata.loc[yourdata['timestamp']])

and produce a new DataFrame with the results, or just add a new column to yourdata with the result for each date. But I'm lost as to how to do this.

The following does not work:

yourdata['result'] = len(mydata.loc[yourdata['timestamp']])

Neither does this:

yourdata['result'] = len(mydata.loc[yourdata.iloc[:,-3]])

This does work:

yourdata['result'] = len(mydata.loc['2015-9-2'])

But that's no good, as I want to use the date in each row, not some fixed date.
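Both failing attempts compute a single len(...) for the whole selection, so at best one number gets broadcast to every row of yourdata; in addition, a list of timestamps passed to .loc is matched against exact labels (midnight), not whole days. What the question asks for is one count per row. A minimal, deliberately simple sketch of that per-row logic, assuming made-up frames shaped like the ones above (it loops via apply, so it will be slow on large data):

```python
import pandas as pd

# Small made-up frames shaped like the question's mydata and yourdata
mydata = pd.DataFrame({
    'timestamp': pd.to_datetime(['2015-06-22 16:48:00', '2015-06-22 17:00:00',
                                 '2015-06-23 09:00:00']),
    'BPM': [65, 66, 70],
})
yourdata = pd.DataFrame({'timestamp': pd.to_datetime(['2015-06-22', '2015-06-23',
                                                      '2015-06-24'])})

# Midnight of each reading's day, computed once up front
days = mydata['timestamp'].dt.normalize()

# One count per row of yourdata: how many readings fall on that row's date
yourdata['result'] = yourdata['timestamp'].apply(lambda d: int((days == d).sum()))
print(yourdata)
```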

Edit: first few rows of mydata

    timestamp            BPM
 0  2015-08-30 16:48:00   65
 1  2015-08-30 16:48:10   65
 2  2015-08-30 16:48:15   66
 3  2015-08-30 16:48:20   67
 4  2015-08-30 16:48:30   70
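One approach is to count the rows per calendar day with resample('D').size() and then select the dates listed in yourdata with .loc:

import pandas as pd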

mydata = pd.DataFrame({'timestamp': ['2015-06-22 16:48:00']*3 +
                                    ['2015-06-23 16:48:00']*2 +
                                    ['2015-06-24 16:48:00'] +
                                    ['2015-06-25 16:48:00']*4 +
                                    ['2015-07-13 16:48:00',
                                     '2015-08-13 16:48:00'],
                       'BPM': [65]*8 + [70]*4})
mydata['timestamp'] = pd.to_datetime(mydata['timestamp'])
print(mydata)

#     BPM           timestamp
# 0    65 2015-06-22 16:48:00
# 1    65 2015-06-22 16:48:00
# 2    65 2015-06-22 16:48:00
# 3    65 2015-06-23 16:48:00
# 4    65 2015-06-23 16:48:00
# 5    65 2015-06-24 16:48:00
# 6    65 2015-06-25 16:48:00
# 7    65 2015-06-25 16:48:00
# 8    70 2015-06-25 16:48:00
# 9    70 2015-06-25 16:48:00
# 10   70 2015-07-13 16:48:00
# 11   70 2015-08-13 16:48:00

yourdata = pd.Series(['2015-06-22', '2015-06-23', '2015-06-24',
                      '2015-06-25', '2015-07-13'], name='timestamp')
yourdata = pd.to_datetime(yourdata).to_frame()
print(yourdata)

# 0   2015-06-22
# 1   2015-06-23
# 2   2015-06-24
# 3   2015-06-25
# 4   2015-07-13

result = (mydata.set_index('timestamp').resample('D')
                .size().loc[yourdata['timestamp']]
                .reset_index())
result.columns = ['timestamp', 'result']
print(result)

#    timestamp  result
# 0 2015-06-22       3
# 1 2015-06-23       2
# 2 2015-06-24       1
# 3 2015-06-25       4
# 4 2015-07-13       1
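One caveat with the code above: resample('D') only produces days between the first and last reading in mydata, so .loc raises a KeyError if yourdata contains a date outside that range. A minimal variant, assuming the same column names, swaps .loc for reindex with fill_value=0 (the frames here are made-up illustration data):

```python
import pandas as pd

mydata = pd.DataFrame({
    'timestamp': pd.to_datetime(['2015-06-22 16:48:00', '2015-06-22 16:48:10',
                                 '2015-06-23 16:48:00']),
    'BPM': [65, 65, 70],
})
# 2015-06-21 lies before the first reading in mydata
yourdata = pd.DataFrame({'timestamp': pd.to_datetime(['2015-06-21', '2015-06-22',
                                                      '2015-06-23'])})

# Rows per calendar day, covering only the span of mydata
daily_counts = mydata.set_index('timestamp').resample('D').size()

# reindex gives 0 for dates outside that span instead of raising KeyError;
# to_numpy() assigns by position, so yourdata's own index does not matter
yourdata['result'] = daily_counts.reindex(yourdata['timestamp'], fill_value=0).to_numpy()
print(yourdata)
```

Assigning the NumPy array rather than the Series avoids index alignment, which matters when yourdata has a non-default index such as 51, 52, 53 in the question.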

I think you need value_counts, but first convert the timestamps to dates with dt.date, convert those back to datetimes with to_datetime, and finally use join:

print (yourdata.join(pd.to_datetime(mydata.timestamp.dt.date)
                       .value_counts()
                       .rename('len'), on='timestamp'))

Sample:

print (mydata)
             timestamp  BPM
0  2015-06-23 16:48:00   65
1  2015-06-23 16:48:10   65
2  2015-06-23 16:48:15   66
3  2015-06-23 16:48:20   67
4  2015-06-22 16:48:30   70

print (yourdata)
     timestamp
51  2015-06-22
52  2015-06-23
53  2015-06-24
54  2015-06-25
43  2015-07-13

# if the dtype is not already datetime
mydata['timestamp'] = pd.to_datetime(mydata['timestamp'])
yourdata['timestamp'] = pd.to_datetime(yourdata['timestamp'])

print (yourdata.join(pd.to_datetime(mydata.timestamp.dt.date)
                       .value_counts()
                       .rename('len'), on='timestamp'))
    timestamp  len
51 2015-06-22  1.0
52 2015-06-23  4.0
53 2015-06-24  NaN
54 2015-06-25  NaN
43 2015-07-13  NaN
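The join leaves NaN for dates with no readings, which also turns the counts into floats. A minimal sketch on the same sample data, assuming integer counts with 0 for missing dates are wanted; it swaps in dt.normalize() for the dt.date/to_datetime round trip and Series.map plus fillna for join (my substitution, not the answer's method):

```python
import pandas as pd

mydata = pd.DataFrame({
    'timestamp': pd.to_datetime(['2015-06-23 16:48:00', '2015-06-23 16:48:10',
                                 '2015-06-23 16:48:15', '2015-06-23 16:48:20',
                                 '2015-06-22 16:48:30']),
    'BPM': [65, 65, 66, 67, 70],
})
yourdata = pd.DataFrame(
    {'timestamp': pd.to_datetime(['2015-06-22', '2015-06-23', '2015-06-24',
                                  '2015-06-25', '2015-07-13'])},
    index=[51, 52, 53, 54, 43],
)

# normalize() keeps the datetime64 dtype but truncates each timestamp to midnight
counts = mydata['timestamp'].dt.normalize().value_counts()

# map looks each date up in counts; dates with no readings give NaN, replaced by 0
yourdata['len'] = yourdata['timestamp'].map(counts).fillna(0).astype(int)
print(yourdata)
```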

