简体   繁体   中英

How to filter a Pandas series while writing to list

For a given column 'A' in a pandas dataframe 'tbl' , I have been trying to write the column .tolist() and filter out the items that are '.' . This works, but doesn't seem very readable:

list_of_A = tbl['A'][~tbl['A'].isin(['.'])].tolist()

Also, checking a list of one seems needlessly slow, though str.contains('.') seems like it would be slower due to pattern matching. Am I missing a better method?

Update: @jpp, @piRSquared and @Scott-Boston all had good methods, so it came down to a test:

>>> tbl = pd.DataFrame(np.random.randn(50000, 3), columns=list('ABC'))
>>> tbl.loc[tbl.sample(10000).index, 'A'] = '.'
>>> min(timeit.repeat("list_of_A = tbl.loc[tbl['A'].ne('.'), 'A'].tolist()", repeat=1000, number=100, globals=globals()))
0.37328900000102294
>>> min(timeit.repeat("list_of_A = tbl.A.values[tbl.A.values != '.'].tolist()", repeat=1000, number=100, globals=globals()))
0.1470019999997021
>>> min(timeit.repeat("tbl.query('A != \".\"')['A'].tolist()", repeat=1000, number=100, globals=globals()))
0.45748099999946135

Discovering .loc has opened up a world of possibilities for me, but to smash 'n' grab a filtered column list, looks like converting to an ndarray is fastest.

Two things to note:

  • Chained indexing is explicitly discouraged ; avoid as it is never necessary.
  • isin is only recommended if you are comparing to a list / set / pd.Series . loc accessor with == operator should be sufficient.

Try this instead:

list_of_A = tbl.loc[~(tbl['A'] == '.'), 'A'].tolist()

As @BradSolomon points out, there is an alternative:

list_of_A = tbl.loc[tbl['A'].ne('.'), 'A'].tolist()

Use a list comprehension

list_of_A = [x for x in df['A'] if x != '.']

Probably more performant

list_of_A = [x for x in df['A'].values.tolist() if x != '.']

Faster still

tbl.A.values[tbl.A.values != '.'].tolist()

您可以使用query来提高可读性:

tbl.query('A != "."')['A'].tolist()

Using a temporary variable and two lines of code makes this more readable:

exclude   = tbl['A'].isin(['.'])
list_of_A = tbl['A'][~exclude].tolist()

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM