Create a dataframe with columns and their unique values in pandas
I have been looking for a way to create a dataframe of columns and their unique values. I know this has few use cases, but it would be a great way to get an initial idea of the unique values in each column. It would look something like this...
State | County | City |
---|---|---|
Colorado | Denver | Denver |
Colorado | El Paso | Colorado Springs |
Colorado | Larimar | Fort Collins |
Colorado | Larimar | Loveland |
Turns into this...
State | County | City |
---|---|---|
Colorado | Denver | Denver |
 | El Paso | Colorado Springs |
 | Larimar | Fort Collins |
 | | Loveland |
I would use mask and a lambda:
df.mask(cond=df.apply(lambda x : x.duplicated(keep='first')), other='')
      State   County              City
0  Colorado   Denver            Denver
1            El Paso  Colorado Springs
2            Larimar      Fort Collins
3                             Loveland
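To see why this works, it can help to look at the intermediate boolean frame that is passed as the condition. Below is a minimal sketch that reuses the example data from this thread; the names `is_repeat` and `masked` are only illustrative and not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'State': ['Colorado', 'Colorado', 'Colorado', 'Colorado'],
    'County': ['Denver', 'El Paso', 'Larimar', 'Larimar'],
    'City': ['Denver', 'Colorado Springs', 'Fort Collins', 'Loveland'],
})

# True marks a value that already appeared earlier in the same column.
is_repeat = df.apply(lambda col: col.duplicated(keep='first'))

# mask() replaces every cell where the condition is True with `other`.
masked = df.mask(cond=is_repeat, other='')

print(is_repeat)
print(masked)
```

Note that the blanked cells stay on the row where the repeat occurred; nothing is shifted upwards.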
Reproducible example. Please add one to your future questions to help others answer them.
import pandas as pd
df = pd.DataFrame({
    'State': ['Colorado', 'Colorado', 'Colorado', 'Colorado'],
    'County': ['Denver', 'El Paso', 'Larimar', 'Larimar'],
    'City': ['Denver', 'Colorado Springs', 'Fort Collins', 'Loveland']
})
df
      State   County              City
0  Colorado   Denver            Denver
1  Colorado  El Paso  Colorado Springs
2  Colorado  Larimar      Fort Collins
3  Colorado  Larimar          Loveland
Drop duplicates from each column separately and then concatenate. Fill NaN with an empty string.
pd.concat([df[col].drop_duplicates() for col in df], axis=1).fillna('')
      State   County              City
0  Colorado   Denver            Denver
1            El Paso  Colorado Springs
2            Larimar      Fort Collins
3                             Loveland
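One caveat, as far as I can tell: `pd.concat` aligns the columns on their original index, so the unique values only end up packed at the top when the repeats happen to come last, as they do in the example. The sketch below uses made-up data (`df2`, `gappy` and `packed` are hypothetical names) to show the gap and one possible way around it, resetting each column's index before concatenating.

```python
import pandas as pd

# Hypothetical data where the repeated County value is not at the end.
df2 = pd.DataFrame({
    'City': ['Fort Collins', 'Loveland', 'Denver'],
    'County': ['Larimar', 'Larimar', 'Denver'],
})

# Without resetting the index, 'Denver' stays on row 2 and row 1 is left blank.
gappy = pd.concat([df2[col].drop_duplicates() for col in df2], axis=1).fillna('')

# Resetting each column's index first packs the unique values to the top.
packed = pd.concat(
    [df2[col].drop_duplicates().reset_index(drop=True) for col in df2],
    axis=1,
).fillna('')

print(gappy)
print(packed)
```

Whether the gaps matter depends on whether you want the values kept on their original rows or just a compact list per column.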
This is the best solution I have come up with; I hope it helps others looking for something similar!
import pandas as pd

def create_unique_df(df) -> pd.DataFrame:
    """Take a dataframe and create a new one containing the unique values of each column.

    Note: it only works for two columns or more.

    :param df: dataframe you want to see unique values for
    :type df: pandas.DataFrame
    :return: dataframe of columns with unique values
    """
    # Using list() allows us to combine lists down the line
    data_series = df.apply(lambda x: list(x.unique()))
    list_df = data_series.to_frame()
    # To create a df from lists they all need to be the same length, so we append null
    # values to the lists to make them the same length. First, find the difference in
    # length between the longest list and the rest
    list_df['needed_nulls'] = list_df[0].str.len().max() - list_df[0].str.len()
    # Second, create a column of lists with one None value
    list_df['null_list_placeholder'] = [[None] for _ in range(list_df.shape[0])]
    # Third, multiply the null list by the difference to get a list we can add to the list
    # of unique values, making all the lists the same length.
    # Example: [None] * 3 == [None, None, None]
    list_df['null_list_needed'] = list_df.null_list_placeholder * list_df.needed_nulls
    list_df['full_list'] = list_df[0] + list_df.null_list_needed
    unique_df = pd.DataFrame(
        list_df['full_list'].to_dict()
    )
    return unique_df
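For comparison, here is a shorter sketch of the same idea. It is not the function above but an alternative under my own assumptions: the helper name `unique_values_frame` is hypothetical, and it pads the shorter columns with missing values (NaN) rather than None. It does not depend on `df.apply` returning a Series of lists, so it should also handle a single column.

```python
import pandas as pd

def unique_values_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one column per input column, holding its unique values."""
    # Each pd.Series gets a fresh 0..n-1 index, so the DataFrame constructor
    # aligns them on that index and pads the shorter columns with NaN.
    return pd.DataFrame({col: pd.Series(df[col].unique()) for col in df.columns})

df = pd.DataFrame({
    'State': ['Colorado', 'Colorado', 'Colorado', 'Colorado'],
    'County': ['Denver', 'El Paso', 'Larimar', 'Larimar'],
    'City': ['Denver', 'Colorado Springs', 'Fort Collins', 'Loveland'],
})

unique_df = unique_values_frame(df)
print(unique_df)
```

Chaining `.fillna('')` onto the result gives the blank-cell look from the question.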