Create a dataframe from columns and their unique values in pandas

Question

I have been looking for a way to create a dataframe of columns and their unique values. I know this has few use cases, but it would be a great way to get an initial idea of the unique values. It would look something like this...

State	County	City
Colorado	Denver	Denver
Colorado	El Paso	Colorado Springs
Colorado	Larimar	Fort Collins
Colorado	Larimar	Loveland

Turns into this...

State	County	City
Colorado	Denver	Denver
	El Paso	Colorado Springs
	Larimar	Fort Collins
		Loveland

Answer 1

I would use mask and a lambda:

df.mask(cond=df.apply(lambda x : x.duplicated(keep='first')), other='')

      State   County              City
0  Colorado   Denver            Denver
1            El Paso  Colorado Springs
2            Larimar      Fort Collins
3                             Loveland
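
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The sample frame is copied from the question; the names `is_repeat` and `result` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'State': ['Colorado', 'Colorado', 'Colorado', 'Colorado'],
    'County': ['Denver', 'El Paso', 'Larimar', 'Larimar'],
    'City': ['Denver', 'Colorado Springs', 'Fort Collins', 'Loveland'],
})

# True wherever a value has already appeared higher up in the same column
is_repeat = df.apply(lambda col: col.duplicated(keep='first'))

# Replace the repeats with an empty string; first occurrences stay where they are
result = df.mask(is_repeat, other='')
print(result)
```

Note that each first occurrence keeps its original row, so the values are not shifted upward.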

Answer 2

Reproducible example. Please add one to your future questions to help others answer them.

import pandas as pd

df = pd.DataFrame({
    'State': ['Colorado', 'Colorado', 'Colorado', 'Colorado'], 
    'County': ['Denver', 'El Paso', 'Larimar', 'Larimar'],
    'City': ['Denver', 'Colorado Springs', 'Fort Collins', 'Loveland']
})

df

    State     County   City
0   Colorado  Denver   Denver
1   Colorado  El Paso  Colorado Springs
2   Colorado  Larimar  Fort Collins
3   Colorado  Larimar  Loveland

Drop duplicates from each column separately, then concatenate. Fill NaN with an empty string.

pd.concat([df[col].drop_duplicates() for col in df], axis=1).fillna('')

    State     County   City
0   Colorado  Denver   Denver
1             El Paso  Colorado Springs
2             Larimar  Fort Collins
3                      Loveland
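
One thing to be aware of: the concatenation aligns on the original row labels, so a repeated value that sits between two first occurrences leaves a blank cell at that label. The sketch below uses hypothetical data (not from the question) to show this, together with one possible variant that rebuilds each column from `unique()` so the values are packed from the top. Treat it as an illustration rather than a replacement for the approach above.

```python
import pandas as pd

# Hypothetical data: the repeated 'Denver' county sits between two first occurrences
df = pd.DataFrame({
    'County': ['Denver', 'Denver', 'El Paso'],
    'City': ['Denver', 'Aurora', 'Colorado Springs'],
})

# Index-aligned: 'El Paso' stays at row label 2, and row label 1 is blank for County
aligned = pd.concat([df[col].drop_duplicates() for col in df], axis=1).fillna('')

# Variant: wrap each column's unique values in a fresh Series so they are
# renumbered from 0; shorter columns are padded with NaN, then blanked
compact = pd.DataFrame({col: pd.Series(df[col].unique()) for col in df}).fillna('')
```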

Answer 3

This is the best solution I have come up with; I hope it helps others looking for something similar!

def create_unique_df(df) -> pd.DataFrame:
    """Take a dataframe and create a new one containing the unique values of each column.
    Note: it only works for two columns or more.

    :param df: dataframe you want to see unique values for
    :type df: pandas.DataFrame
    :return: dataframe of columns with unique values
    """
    # using list() allows us to combine lists down the line
    data_series = df.apply(lambda x: list( x.unique() ) )

    list_df = data_series.to_frame()

    # to create a df from lists they all need to be the same length, so we append null
    # values to the shorter lists. First, find the difference in length between the
    # longest list and the rest
    list_df['needed_nulls'] = list_df[0].str.len().max() - list_df[0].str.len()

    # Second create a column of lists with one None value
    list_df['null_list_placeholder'] = [[None] for _ in range(list_df.shape[0])]

    # Third, multiply the null list by the difference to get a list we can add to the list of
    # unique values, making all the lists the same length.
    # Example: [None] * 3 == [None, None, None]
    list_df['null_list_needed'] = list_df.null_list_placeholder * list_df.needed_nulls
    list_df['full_list'] = list_df[0] + list_df.null_list_needed

    unique_df = pd.DataFrame(
        list_df['full_list'].to_dict()
    )

    return unique_df
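
The padding step above can also be sketched with `itertools.zip_longest` from the standard library, which pads the shorter lists with `None` while transposing them into rows. This is an alternative technique, not the method used in the function above, and `unique_values_frame` is a hypothetical name chosen for illustration.

```python
from itertools import zip_longest

import pandas as pd


def unique_values_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one column of unique values per input column."""
    # One list of unique values per column, in order of first appearance
    uniques = [list(df[col].unique()) for col in df.columns]
    # zip_longest transposes the lists into rows, padding short ones with None
    rows = list(zip_longest(*uniques, fillvalue=None))
    return pd.DataFrame(rows, columns=df.columns)


df = pd.DataFrame({
    'State': ['Colorado', 'Colorado', 'Colorado', 'Colorado'],
    'County': ['Denver', 'El Paso', 'Larimar', 'Larimar'],
    'City': ['Denver', 'Colorado Springs', 'Fort Collins', 'Loveland'],
})
unique_df = unique_values_frame(df)
```

Because the rows are built in plain Python before the frame is constructed, this sketch should also accept a single-column frame.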
