
Pythonic way to create a dictionary by iterating

I'm trying to write something that answers "what are the possible values in every column?"

I created a dictionary called all_col_vals and loop over the column positions, from 0 up to the number of columns in my dataframe. However, while reading about this online, I saw someone say that this looks too much like Java and that the more Pythonic way would be to use zip. I can't see how I could use zip here.

all_col_vals = {}
for index in range(RCSRdf.shape[1]):
    all_col_vals[RCSRdf.iloc[:,index].name] = set(RCSRdf.iloc[:,index])
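Here is a self-contained sketch of the loop above, run on a small made-up DataFrame. The name RCSRdf is reused from the question, but the data is hypothetical. It also shows one way zip could fit: pairing the column labels with per-column sets. Both versions build the same dict.

```python
import pandas as pd

# Hypothetical stand-in for the question's RCSRdf
RCSRdf = pd.DataFrame({
    "CFN Network": ["N521", "N536", "N521", "N401"],
    "Exam": ["EXRC", "MXRN", "EXRC", "HXRT"],
})

# Original index-based loop from the question
all_col_vals = {}
for index in range(RCSRdf.shape[1]):
    all_col_vals[RCSRdf.iloc[:, index].name] = set(RCSRdf.iloc[:, index])

# zip-based variant: pair each column label with the set of that column's values
zipped_col_vals = dict(zip(RCSRdf.columns, (set(RCSRdf[col]) for col in RCSRdf.columns)))

print(all_col_vals == zipped_col_vals)  # True
```

The zip version mainly removes the index bookkeeping. It still loops over the columns, just implicitly.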

The output maps each column name (the key) to the set of all possible values in that column, for example 'CFN Network': {nan, 'N521', 'N536', 'N401', 'N612', 'N204'}, 'Exam': {'EXRC', 'MXRN', 'HXRT', 'MXRC'}.

I think @piRSquared's comment is the best option, so I'm going to steal it as an answer and add some explanation.

Answer

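Assuming you don't have duplicate column names (with duplicates, df[k] returns a DataFrame rather than a Series, and the dict keys would collide anyway), use the following: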

{k: {*df[k]} for k in df}
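Below is a runnable version of the one-liner above. It assumes a small hypothetical df shaped like the question's data.

```python
import pandas as pd

# Hypothetical sample data; replace with your own DataFrame
df = pd.DataFrame({
    "CFN Network": ["N521", "N536", "N521", "N612"],
    "Exam": ["EXRC", "MXRN", "MXRC", "EXRC"],
})

# Map each column name to the set of distinct values in that column
all_col_vals = {k: {*df[k]} for k in df}
print(all_col_vals["Exam"] == {"EXRC", "MXRN", "MXRC"})  # True
```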

Explanation

k is a column name in df. You don't need the .columns attribute to get the names, because iterating over a pandas.DataFrame yields its column labels, much like iterating over a Python dict yields its keys.
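Here is a quick check of the dict-like behaviour described above, using a made-up DataFrame. Iterating a DataFrame yields the same labels as .columns, and membership tests and .keys() also work on column labels.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Iterating a DataFrame directly yields column labels, like iterating a dict yields keys
labels_from_iteration = [k for k in df]
labels_from_attribute = list(df.columns)
print(labels_from_iteration)  # ['a', 'b']
print(labels_from_iteration == labels_from_attribute)  # True

# `in` and .keys() also refer to column labels
print("a" in df, list(df.keys()))  # True ['a', 'b']
```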

df[k] is the Series holding column k.
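That only holds when the column label is unique, which is why the answer assumes no duplicate columns. The sketch below uses hypothetical data with a deliberately duplicated label x to show what changes.

```python
import pandas as pd

# Unique label: df[k] returns a Series
df = pd.DataFrame({"x": [1, 2, 2], "y": [3, 3, 4]})
print(type(df["x"]).__name__)  # Series
print({*df["x"]})  # {1, 2}

# Duplicated label: df[k] returns a DataFrame instead
dup = pd.DataFrame([[1, 5], [2, 6]], columns=["x", "x"])
print(type(dup["x"]).__name__)  # DataFrame
# Unpacking a DataFrame iterates over its column labels, not its values
print({*dup["x"]})  # {'x'}
```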

{*df[k]} unpacks the values of the Series into a set literal ({...}), which by definition keeps only distinct elements.
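To illustrate the deduplication, here is a sketch comparing {*s} with the set() constructor on a hypothetical Series. Both produce the same set of distinct values.

```python
import pandas as pd

s = pd.Series(["EXRC", "MXRN", "EXRC", "HXRT", "MXRN"])

# Star-unpacking inside braces builds a set literal, dropping repeats
via_unpacking = {*s}
via_constructor = set(s)
print(via_unpacking == via_constructor)  # True
print(len(via_unpacking))  # 3
```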

Lastly, building the dict with a dict comprehension is more concise, and generally somewhat faster, than defining an empty dict and adding new keys to it in a for loop.
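If you want to check the speed claim on your own data, you can use the standard timeit module. The DataFrame below is synthetic, with sizes chosen arbitrarily. Timings vary by machine and pandas version, so none are shown here. Expect the gap to be small in this setup, since building each column's set likely dominates the cost.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data (hypothetical sizes): 20 columns of random small integers
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 10, size=(5_000, 20)), columns=[f"c{i}" for i in range(20)])


def with_loop(frame):
    # Build the dict by adding keys one at a time
    result = {}
    for k in frame:
        result[k] = set(frame[k])
    return result


def with_comprehension(frame):
    # Build the dict in a single dict comprehension
    return {k: {*frame[k]} for k in frame}


assert with_loop(df) == with_comprehension(df)
print("loop:         ", timeit.timeit(lambda: with_loop(df), number=5))
print("comprehension:", timeit.timeit(lambda: with_comprehension(df), number=5))
```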

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this content, please cite the site URL or the original address. For questions, contact: yoyou2525@163.com.
