
Extracting indices from a multi-level DataFrame as strings in a list

I have a multi-level DataFrame with 36 entries. It's a pandas DataFrame that has 36 levels (36 stocks) attached to a single datetime. For anyone curious, this is output created by the Zipline API, or Quantopian's Pipeline API, so I don't have much control over how the DataFrame is created.

As you can see in the DataFrame, each level is represented by an equity (a stock), for example: Equity(1251 [CAJ])

I'm trying to figure out how to extract the ticker symbol for each level as a string and add them all to a list, such as list = ['CAJ', 'CBT', 'GILD', ...], followed by all the other equities the program spat out.

This has caused me a huge headache. I've tried proceeding with:
result = df
asset_list = result.index.levels[1]
stocks = asset_list.get_level_values(0).unique()

But this doesn't seem to work and I get something way off. It might be an API issue, as len(stocks) gives a value of 8830. I don't even know how that is possible.

I would greatly appreciate some coding help here. Please let me know if you need more information from my side.

As suggested by @sammywemmy in the thread below, here is a dictionary of the first 5 rows of the table above.

result.head().to_dict()

produces:

{'current ratio': {(Timestamp('2020-04-06 00:00:00+0000', tz='UTC', offset='C'),
   Equity(1251 [CAJ])): 1.921883,
  (Timestamp('2020-04-06 00:00:00+0000', tz='UTC', offset='C'),
   Equity(1315 [CBT])): 2.0836239999999999,
  (Timestamp('2020-04-06 00:00:00+0000', tz='UTC', offset='C'),
   Equity(3212 [GILD])): 3.1044160000000001,
  (Timestamp('2020-04-06 00:00:00+0000', tz='UTC', offset='C'),
   Equity(3460 [HAS])): 5.3676060000000003,
  (Timestamp('2020-04-06 00:00:00+0000', tz='UTC', offset='C'),
   Equity(3798 [IDA])): 1.5076229999999999},
 'dividend yield': {(Timestamp('2020-04-06 00:00:00+0000', tz='UTC', offset='C'),
   Equity(1251 [CAJ])): 7.4899999999999993,
  (Timestamp('2020-04-06 00:00:00+0000', tz='UTC', offset='C'),
   Equity(1315 [CBT])): 5.3900000000000006,
  (Timestamp('2020-04-06 00:00:00+0000', tz='UTC', offset='C'),
   Equity(3212 [GILD])): 3.29,
  (Timestamp('2020-04-06 00:00:00+0000', tz='UTC', offset='C'),
   Equity(3460 [HAS])): 4.0599999999999996,
  (Timestamp('2020-04-06 00:00:00+0000', tz='UTC', offset='C'),
   Equity(3798 [IDA])): 3.04},
 'interest coverage': {(Timestamp('2020-04-06 00:00:00+0000', tz='UTC', offset='C'),
   Equity(1251 [CAJ])): 227.99559500000001,
  (Timestamp('2020-04-06 00:00:00+0000', tz='UTC', offset='C'),
   Equity(1315 [CBT])): 4.5714290000000002,
  (Timestamp('2020-04-06 00:00:00+0000', tz='UTC', offset='C'),
   Equity(3212 [GILD])): 8.8230450000000005,
  (Timestamp('2020-04-06 00:00:00+0000', tz='UTC', offset='C'),
   Equity(3460 [HAS])): 9.5895290000000006,
  (Timestamp('2020-04-06 00:00:00+0000', tz='UTC', offset='C'),
   Equity(3798 [IDA])): 3.1692089999999999},
 'marketcap': {(Timestamp('2020-04-06 00:00:00+0000', tz='UTC', offset='C'),
   Equity(1251 [CAJ])): 21652842873.0,
  (Timestamp('2020-04-06 00:00:00+0000', tz='UTC', offset='C'),
   Equity(1315 [CBT])): 1471969134.0,
  (Timestamp('2020-04-06 00:00:00+0000', tz='UTC', offset='C'),
   Equity(3212 [GILD])): 98467576445.0,
  (Timestamp('2020-04-06 00:00:00+0000', tz='UTC', offset='C'),
   Equity(3460 [HAS])): 9171245233.0,
  (Timestamp('2020-04-06 00:00:00+0000', tz='UTC', offset='C'),
   Equity(3798 [IDA])): 4308534238.0},
 'payout ratio': {(Timestamp('2020-04-06 00:00:00+0000', tz='UTC', offset='C'),
   Equity(1251 [CAJ])): 138.62,
  (Timestamp('2020-04-06 00:00:00+0000', tz='UTC', offset='C'),
   Equity(1315 [CBT])): 63.009999999999998,
  (Timestamp('2020-04-06 00:00:00+0000', tz='UTC', offset='C'),
   Equity(3212 [GILD])): 59.719999999999999,
  (Timestamp('2020-04-06 00:00:00+0000', tz='UTC', offset='C'),
   Equity(3460 [HAS])): 65.930000000000007,
  (Timestamp('2020-04-06 00:00:00+0000', tz='UTC', offset='C'),
   Equity(3798 [IDA])): 55.530000000000001},
 'pe_ratio': {(Timestamp('2020-04-06 00:00:00+0000', tz='UTC', offset='C'),
   Equity(1251 [CAJ])): 18.307486999999998,
  (Timestamp('2020-04-06 00:00:00+0000', tz='UTC', offset='C'),
   Equity(1315 [CBT])): 11.858447,
  (Timestamp('2020-04-06 00:00:00+0000', tz='UTC', offset='C'),
   Equity(3212 [GILD])): 18.533175,
  (Timestamp('2020-04-06 00:00:00+0000', tz='UTC', offset='C'),
   Equity(3460 [HAS])): 16.528395,
  (Timestamp('2020-04-06 00:00:00+0000', tz='UTC', offset='C'),
   Equity(3798 [IDA])): 18.540130000000001},
 'price': {(Timestamp('2020-04-06 00:00:00+0000', tz='UTC', offset='C'),
   Equity(1251 [CAJ])): 19.949999999999999,
  (Timestamp('2020-04-06 00:00:00+0000', tz='UTC', offset='C'),
   Equity(1315 [CBT])): 25.969999999999999,
  (Timestamp('2020-04-06 00:00:00+0000', tz='UTC', offset='C'),
   Equity(3212 [GILD])): 78.210000000000008,
  (Timestamp('2020-04-06 00:00:00+0000', tz='UTC', offset='C'),
   Equity(3460 [HAS])): 67.060000000000002,
  (Timestamp('2020-04-06 00:00:00+0000', tz='UTC', offset='C'),
   Equity(3798 [IDA])): 85.480000000000004},
 'sector': {(Timestamp('2020-04-06 00:00:00+0000', tz='UTC', offset='C'),
   Equity(1251 [CAJ])): True,
  (Timestamp('2020-04-06 00:00:00+0000', tz='UTC', offset='C'),
   Equity(1315 [CBT])): True,
  (Timestamp('2020-04-06 00:00:00+0000', tz='UTC', offset='C'),
   Equity(3212 [GILD])): True,
  (Timestamp('2020-04-06 00:00:00+0000', tz='UTC', offset='C'),
   Equity(3460 [HAS])): True,
  (Timestamp('2020-04-06 00:00:00+0000', tz='UTC', offset='C'),
   Equity(3798 [IDA])): True}}

Use a regular expression with str.extract

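# I'm not sure of the column name, but the equities appear to be
# the second column once you reset the index.
# astype(str) is added on the assumption that this column holds
# Equity objects rather than plain strings, as the dict output suggests.
df.reset_index().iloc[:, 1].astype(str).str.extract(r'\[([A-Z_]+)\]')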
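
To make the approach above easier to try without Zipline installed, here is a minimal sketch. The Equity class below is a hypothetical stand-in that only mimics the Equity(1251 [CAJ]) text form and a symbol attribute; the prices and the price > 50 filter are illustrative as well. It shows two ways to get a plain list of tickers. It also shows one plausible (not confirmed) reason for the 8830 figure: index.levels keeps every value the index was originally built from, even after rows are filtered out.

```python
import pandas as pd


class Equity:
    """Hypothetical stand-in for a Zipline Equity (sid and symbol only)."""

    def __init__(self, sid, symbol):
        self.sid = sid
        self.symbol = symbol

    def __repr__(self):
        return f"Equity({self.sid} [{self.symbol}])"

    def __lt__(self, other):
        # Lets pandas sort the objects when it builds the index levels.
        return self.sid < other.sid


date = pd.Timestamp("2020-04-06", tz="UTC")
universe = [
    Equity(1251, "CAJ"),
    Equity(1315, "CBT"),
    Equity(3212, "GILD"),
    Equity(3460, "HAS"),
    Equity(3798, "IDA"),
]
index = pd.MultiIndex.from_product([[date], universe])
full = pd.DataFrame({"price": [19.95, 25.97, 78.21, 67.06, 85.48]}, index=index)

# Simulate a screened result: only some rows of the universe survive.
result = full[full["price"] > 50]

# Option 1: regex on the string form of the second index level.
tickers_regex = (
    result.reset_index()
    .iloc[:, 1]
    .astype(str)
    .str.extract(r"\[([A-Z_]+)\]", expand=False)
    .tolist()
)

# Option 2: read the symbol attribute directly (assumes the objects expose it).
tickers_attr = [asset.symbol for asset in result.index.get_level_values(1)]

print(tickers_regex)
print(tickers_attr)

# index.levels still lists the whole original universe, not just the
# rows that remain; get_level_values reflects the actual rows.
print(len(result.index.levels[1]), len(result.index.get_level_values(1)))

# remove_unused_levels() trims the levels down to what is really present.
trimmed = result.index.remove_unused_levels()
print(len(trimmed.levels[1]))
```

Using get_level_values(1) rather than levels[1] is the safer choice here, because it follows the rows that are actually in the frame.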
