
Ignore lines in pandas DataFrame

I have a list called reassembly organized like this:

['AFLT', 228468.0, 'B'],
['TATN', 1108.6, 'B'],
['TATN', 4434.4, 'B'],
['MOEX', 3480.0, 'S'],
['YNDX', 5934.0, 'B'],
['MTSS', 36003.0, 'S'],
['SBERP', 33837.1, 'S'],
['SBERP', 1780.8, 'S'],
['MTSS', 3273.0, 'S'],
['AFLT', 124356.0, 'B'],
['AFLT', 20244.0, 'B'],
['MGNT', 72990.0, 'B'],
['NLMK', 230917.0, 'B'],
['NLMK', 156050.0, 'B'],
['NLMK', 31220.0, 'B'],
['MGNT', 36450.0, 'S'],
['TCSG', 14045.2, 'S'],
['TCSG', 2160.4, 'S'],

Also there is a dictionary called medians with data:

{'TATNP': 11968.05, 'TCSG': 8647.2, 'TRNFP': 130250.0, 'UPRO': 7941.0, 'VTBR': 3828.28, 'YNDX': 17660.4}

The keys of the dictionary are tickers, the same kind of value that appears first in each list entry ('AFLT', 'VTBR' and others).

I convert reassembly to pandas:

df = pd.DataFrame(reassembly, columns=['ticker','vol','operation'])

Now I want to do something like this:

df = df[df['vol'] < median['ticker']]

I mean that if vol < median for a row's ticker, the script should ignore that row.

Please help me write this code correctly.

You want map:

high_volumes = df[df['vol'] > df['ticker'].map(medians)]

# do stuff with the high-volume transactions
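To see what map does here, a minimal sketch on a few rows taken from the question. The shortened sample data and the name per_row_median are only for illustration.

```python
import pandas as pd

# A few rows from the question's list, shortened for illustration.
reassembly = [
    ['AFLT', 228468.0, 'B'],
    ['YNDX', 5934.0, 'B'],
    ['TCSG', 14045.2, 'S'],
    ['TCSG', 2160.4, 'S'],
]
medians = {'TCSG': 8647.2, 'YNDX': 17660.4}

df = pd.DataFrame(reassembly, columns=['ticker', 'vol', 'operation'])

# Look up each row's ticker in the dict.
# Tickers that are not keys of the dict (AFLT here) come back as NaN.
per_row_median = df['ticker'].map(medians)

# A comparison against NaN is False, so the AFLT row is excluded as well.
high_volumes = df[df['vol'] > per_row_median]
print(high_volumes)
```

Only the TCSG row with vol 14045.2 survives: YNDX is below its median, and AFLT has no median to compare against.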

Note that the above silently drops every row whose ticker is missing from medians: map returns NaN for those rows, and any comparison with NaN is False. If you would rather keep the tickers that are not in medians:

meds = df['ticker'].map(medians)
high_volumes = df[(df['vol']>meds)|(meds.isna())]
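As a rough check of how much the two variants differ on the data from the question, here is a self-contained sketch. The names strict and lenient are mine, not part of the original answer.

```python
import pandas as pd

reassembly = [
    ['AFLT', 228468.0, 'B'], ['TATN', 1108.6, 'B'], ['TATN', 4434.4, 'B'],
    ['MOEX', 3480.0, 'S'], ['YNDX', 5934.0, 'B'], ['MTSS', 36003.0, 'S'],
    ['SBERP', 33837.1, 'S'], ['SBERP', 1780.8, 'S'], ['MTSS', 3273.0, 'S'],
    ['AFLT', 124356.0, 'B'], ['AFLT', 20244.0, 'B'], ['MGNT', 72990.0, 'B'],
    ['NLMK', 230917.0, 'B'], ['NLMK', 156050.0, 'B'], ['NLMK', 31220.0, 'B'],
    ['MGNT', 36450.0, 'S'], ['TCSG', 14045.2, 'S'], ['TCSG', 2160.4, 'S'],
]
medians = {'TATNP': 11968.05, 'TCSG': 8647.2, 'TRNFP': 130250.0,
           'UPRO': 7941.0, 'VTBR': 3828.28, 'YNDX': 17660.4}

df = pd.DataFrame(reassembly, columns=['ticker', 'vol', 'operation'])
meds = df['ticker'].map(medians)  # NaN where the ticker has no median

# Variant 1: rows without a median are dropped along with the low volumes.
strict = df[df['vol'] > meds]

# Variant 2: rows without a median are kept.
lenient = df[(df['vol'] > meds) | meds.isna()]

print(len(strict), len(lenient))  # 1 16
```

With the sample dictionary, only YNDX and TCSG have medians, so the first variant keeps a single row while the second keeps 16 of the 18.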

df = df[df['vol'] > df['ticker'].map(medians)]

I suggest solving this with a list comprehension and passing the result to pandas instead.

import pandas as pd

reassembly = [['AFLT', 228468.0, 'B'],
['TATN', 1108.6, 'B'],
['TATN', 4434.4, 'B'],
['MOEX', 3480.0, 'S'],
['YNDX', 5934.0, 'B'],
['MTSS', 36003.0, 'S'],
['SBERP', 33837.1, 'S'],
['SBERP', 1780.8, 'S'],
['MTSS', 3273.0, 'S'],
['AFLT', 124356.0, 'B'],
['AFLT', 20244.0, 'B'],
['MGNT', 72990.0, 'B'],
['NLMK', 230917.0, 'B'],
['NLMK', 156050.0, 'B'],
['NLMK', 31220.0, 'B'],
['MGNT', 36450.0, 'S'],
['TCSG', 14045.2, 'S'],
['TCSG', 2160.4, 'S']]

medians = {'TATNP': 11968.05, 'TCSG': 8647.2, 'TRNFP': 130250.0, 'UPRO': 7941.0, 'VTBR': 3828.28, 'YNDX': 17660.4}

ready_for_panda = [x for x in reassembly if x[0] in medians and x[1] > medians[x[0]]]

pd.DataFrame(ready_for_panda, columns=["ticker", "vol", "operation"])

ticker  vol      operation
TCSG    14045.2  S

I have assumed that you want to filter out any element of reassembly whose volume is not above the median for its ticker. Note that this also drops every element whose ticker has no entry in medians.
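If rows without a median should be kept rather than dropped, one possible variation of the list comprehension uses dict.get with a default of negative infinity, so that a missing ticker always passes the comparison. This is a sketch under that assumption, not part of the original answer, and the shortened sample data is only for illustration.

```python
import pandas as pd

# A few rows from the question's list, shortened for illustration.
reassembly = [
    ['AFLT', 228468.0, 'B'],
    ['YNDX', 5934.0, 'B'],
    ['TCSG', 14045.2, 'S'],
    ['TCSG', 2160.4, 'S'],
]
medians = {'TCSG': 8647.2, 'YNDX': 17660.4}

# Tickers missing from medians get -inf as their threshold,
# so any volume is greater and the row is kept.
ready_for_panda = [
    x for x in reassembly
    if x[1] > medians.get(x[0], float('-inf'))
]

df = pd.DataFrame(ready_for_panda, columns=['ticker', 'vol', 'operation'])
print(df)
```

Here AFLT is kept because it has no median, YNDX is dropped, and only the larger TCSG trade remains.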

