
Creating new columns of a pandas DataFrame based on unique values?

I have a pandas DataFrame with a date, a stock symbol (e.g. 'MSFT'), and the Open, Close, and other data points for that stock on that day. As a result, each date appears once per stock symbol in my dataset.

I want to convert my DataFrame:

               Open     High      Low    Close  Adj Close    Volume  Name
Date
2006-12-04  0.06508  0.06508  0.06508  0.06508  -0.098360  193352.0  AAIT
2006-12-05  0.06464  0.06464  0.06464  0.06464  -0.097695   81542.0  AAIT
2006-12-06  0.06596  0.06596  0.06552  0.06596  -0.099690  158115.0  AAIT
2006-12-07  0.06596  0.06596  0.06596  0.06596  -0.099690   65731.0  AAIT
2006-12-11  0.06596  0.06596  0.06596  0.06596  -0.099690  542561.0  AAIT
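
To reproduce the problem, the sample rows above can be rebuilt as a small long-format DataFrame. This is a minimal sketch that uses only the five AAIT rows shown; the variable name `stocks_df` follows the question's code, and parsing `Date` into a `DatetimeIndex` is an assumption about how the original data was loaded.

```python
import pandas as pd

# The five AAIT rows from the sample above, in long format:
# one row per (Date, Name) pair, with the price fields as columns.
stocks_df = pd.DataFrame(
    {
        "Open": [0.06508, 0.06464, 0.06596, 0.06596, 0.06596],
        "High": [0.06508, 0.06464, 0.06596, 0.06596, 0.06596],
        "Low": [0.06508, 0.06464, 0.06552, 0.06596, 0.06596],
        "Close": [0.06508, 0.06464, 0.06596, 0.06596, 0.06596],
        "Adj Close": [-0.098360, -0.097695, -0.099690, -0.099690, -0.099690],
        "Volume": [193352.0, 81542.0, 158115.0, 65731.0, 542561.0],
        "Name": ["AAIT"] * 5,
    },
    index=pd.DatetimeIndex(
        ["2006-12-04", "2006-12-05", "2006-12-06", "2006-12-07", "2006-12-11"],
        name="Date",
    ),
)
print(stocks_df)
```

With more than one symbol, the same dates simply repeat with a different `Name` value, which is the duplication the reshape below removes.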

into something like this:

            ADBE_Adj Close  ADBE_Close   ADBE_High   ADBE_Low   ADBE_Open  ADBE_Volume  ADXS_Adj Close  ADXS_Close  ADXS_High  ADXS_Low  ...
2019-12-19      327.630005  327.630005  327.959991  324.26001  324.380005    2561400.0           0.581       0.581       0.59     0.550  ...
2020-11-17      467.950012  467.950012  469.910004  460.00000  461.660004    2407600.0           0.393       0.393       0.40     0.383  ...

I'm currently doing it manually with this code:

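import pandas as pd
from tqdm import tqdm

# stocks_df: long-format data indexed by Date, one row per (Date, Name)
# filtered_stock_list: the symbols to keep
rows = []  # DataFrame.append was removed in pandas 2.0; build the frame once instead
dates_set = set(stocks_df.index)
print('Going through {} days of data.'.format(len(dates_set)))
for _date in tqdm(dates_set):
    row = {}
    for symbol in filtered_stock_list:
        # This boolean mask scans all of stocks_df for every (date, symbol) pair,
        # which is the main reason the loop is so slow.
        stock_at_date = stocks_df.loc[(stocks_df['Name'] == symbol) &
                                      (stocks_df.index == _date)]
        for attribute in ['Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume']:
            try:
                row[symbol + '_' + attribute] = float(stock_at_date[attribute].iloc[0])
            except IndexError:
                # No row for this symbol on this date
                row[symbol + '_' + attribute] = None
    rows.append(pd.Series(data=row, name=_date))
df = pd.DataFrame(rows).sort_index()  # sets are unordered, so sort the dates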

Unfortunately, this code is very slow and takes hours to run. I've looked through many different pandas operations, but I can't figure out how to do this efficiently.

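Use `set_index` with `append=True` to move `Name` into a second index level, then `unstack` it into the columns and flatten the resulting column MultiIndex: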

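import pandas as pd
# (Date, Name) -> pivot Name into columns; each (Date, Name) pair must be unique
new_df = (df.set_index('Name', append=True)
            .loc[:, ['Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume']]
            .unstack('Name')
            .sort_index(axis=1, level='Name'))  # group columns by symbol
new_df.columns = [f'{name}_{attr}' for attr, name in new_df.columns]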
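
A self-contained run of this reshape on a toy frame may help confirm the column layout. This is a sketch under stated assumptions: the two symbols and their values are copied from the desired-output sample above, only `Close`, `High`, and `Low` are used because that sample truncates the remaining ADXS fields, and `DataFrame.pivot` is shown only as an equivalent alternative, not as part of the original answer.

```python
import pandas as pd

# Toy long-format input: one row per (Date, Name). Values are copied from the
# desired-output sample; only Close/High/Low are used because the sample
# truncates the remaining ADXS fields.
df = pd.DataFrame(
    {
        "Close": [327.630005, 0.581, 467.950012, 0.393],
        "High": [327.959991, 0.59, 469.910004, 0.40],
        "Low": [324.26001, 0.550, 460.00000, 0.383],
        "Name": ["ADBE", "ADXS", "ADBE", "ADXS"],
    },
    index=pd.DatetimeIndex(
        ["2019-12-19", "2019-12-19", "2020-11-17", "2020-11-17"], name="Date"
    ),
)
fields = ["Close", "High", "Low"]

# set_index/unstack approach: (Date, Name) MultiIndex, then move Name into columns.
wide = (df.set_index("Name", append=True)
          .loc[:, fields]
          .unstack("Name")
          .sort_index(axis=1, level="Name"))  # group columns by symbol
wide.columns = [f"{name}_{attr}" for attr, name in wide.columns]

# Equivalent alternative: pivot on the existing Date index.
alt = df.pivot(columns="Name", values=fields).sort_index(axis=1, level="Name")
alt.columns = [f"{name}_{attr}" for attr, name in alt.columns]

print(wide)
```

Dates on which a symbol has no row come out as `NaN`, which plays the role of the `None` fallback in the loop version. Both `unstack` and `pivot` raise a `ValueError` if any (Date, Name) pair occurs more than once, so duplicates have to be resolved first, for example with `drop_duplicates` or with `pivot_table` and an explicit `aggfunc`.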

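Disclaimer: The technical posts on this site are published under the CC BY-SA 4.0 license. If you republish this content, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.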

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM