Create a DataFrame of columns and their unique values in pandas

Question

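I'm trying to find a way to create a DataFrame that shows each column together with its unique values. I know this has few use cases, but it would be a good way to get a first impression of the unique values. It would look like this...

State	County	City
Colorado	Denver	Denver
Colorado	El Paso	Colorado Springs
Colorado	Larimar	Fort Collins
Colorado	Larimar	Loveland

becomes this...

State	County	City
Colorado	Denver	Denver
	El Paso	Colorado Springs
	Larimar	Fort Collins
		Loveland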

Answer 1

I would use mask with a lambda:

df.mask(cond=df.apply(lambda x : x.duplicated(keep='first')), other='')

      State   County              City
0  Colorado   Denver            Denver
1            El Paso  Colorado Springs
2            Larimar      Fort Collins
3                             Loveland
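
To see why this works, it can help to look at the boolean mask on its own. The following is a minimal sketch that assumes the sample frame from the question; the variable names `dup_mask` and `result` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'State': ['Colorado', 'Colorado', 'Colorado', 'Colorado'],
    'County': ['Denver', 'El Paso', 'Larimar', 'Larimar'],
    'City': ['Denver', 'Colorado Springs', 'Fort Collins', 'Loveland'],
})

# duplicated(keep='first') is True for every repeat after the first occurrence,
# evaluated independently within each column.
dup_mask = df.apply(lambda col: col.duplicated(keep='first'))

# mask() replaces the cells where the condition is True with `other`.
result = df.mask(dup_mask, other='')
print(result)
```

Note that the surviving values stay on their original rows, so a column whose repeats are not contiguous would end up with blanks in the middle rather than at the bottom.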

Answer 2

A reproducible example. Please include one in your future questions to help others answer them.

import pandas as pd

df = pd.DataFrame({
    'State': ['Colorado', 'Colorado', 'Colorado', 'Colorado'], 
    'County': ['Denver', 'El Paso', 'Larimar', 'Larimar'],
    'City': ['Denver', 'Colorado Springs', 'Fort Collins', 'Loveland']
})

df

      State   County              City
0  Colorado   Denver            Denver
1  Colorado  El Paso  Colorado Springs
2  Colorado  Larimar      Fort Collins
3  Colorado  Larimar          Loveland

Drop the duplicates from each column separately, then concatenate the results. Fill the NaN values with empty strings.

pd.concat([df[col].drop_duplicates() for col in df], axis=1).fillna('')

      State   County              City
0  Colorado   Denver            Denver
1            El Paso  Colorado Springs
2            Larimar      Fort Collins
3                             Loveland
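
The values stay on their original rows because pd.concat with axis=1 aligns the pieces on their index labels. Here is a small sketch of the intermediate step before fillna, again assuming the sample frame; the names `pieces` and `aligned` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'State': ['Colorado', 'Colorado', 'Colorado', 'Colorado'],
    'County': ['Denver', 'El Paso', 'Larimar', 'Larimar'],
    'City': ['Denver', 'Colorado Springs', 'Fort Collins', 'Loveland'],
})

# Each piece keeps the original index labels of the rows that survived.
pieces = [df[col].drop_duplicates() for col in df]
print([list(p.index) for p in pieces])

# concat(axis=1) outer-joins on the index, so labels missing from a piece
# show up as NaN in that column.
aligned = pd.concat(pieces, axis=1)
print(aligned.isna().sum())
```

The NaN cells counted here are exactly the ones that fillna('') turns into empty strings.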

Answer 3

This is the best solution I came up with; hopefully it helps others looking for something similar!

import pandas as pd

def create_unique_df(df) -> pd.DataFrame:
    """Take a dataframe and create a new one containing the unique values of each column.

    Note: it only works for two columns or more.

    :param df: dataframe you want to see unique values for
    :type df: pandas.DataFrame
    :return: dataframe of columns with unique values
    """
    # using list() allows us to combine lists down the line
    data_series = df.apply(lambda x: list(x.unique()))

    list_df = data_series.to_frame()

    # To create a df from lists they all need to be the same length, so we append null
    # values to the shorter lists. First, find the difference in length between the
    # longest list and the rest
    list_df['needed_nulls'] = list_df[0].str.len().max() - list_df[0].str.len()

    # Second, create a column of lists with one None value
    list_df['null_list_placeholder'] = [[None] for _ in range(list_df.shape[0])]

    # Third, multiply the null list by the difference to get a list we can add to the list
    # of unique values, making all the lists the same length.
    # Example: [None] * 3 == [None, None, None]
    list_df['null_list_needed'] = list_df.null_list_placeholder * list_df.needed_nulls
    list_df['full_list'] = list_df[0] + list_df.null_list_needed

    unique_df = pd.DataFrame(
        list_df['full_list'].to_dict()
    )

    return unique_df
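
If the goal is only to pack each column's unique values at the top, a shorter alternative may be enough. This is a sketch of a different technique, not the approach above: it wraps each column's unique values in a pd.Series and relies on the DataFrame constructor to pad the shorter ones with NaN. The function name `unique_values_frame` is made up for illustration.

```python
import pandas as pd

def unique_values_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Return a frame whose columns hold the unique values of df's columns."""
    # Each Series gets a fresh 0..n-1 index; the constructor aligns them on
    # that index and fills the shorter columns with NaN.
    return pd.DataFrame({col: pd.Series(df[col].unique()) for col in df.columns})

df = pd.DataFrame({
    'State': ['Colorado', 'Colorado', 'Colorado', 'Colorado'],
    'County': ['Denver', 'El Paso', 'Larimar', 'Larimar'],
    'City': ['Denver', 'Colorado Springs', 'Fort Collins', 'Loveland'],
})

unique_df = unique_values_frame(df)
print(unique_df.fillna(''))
```

Because the values are re-indexed from zero, this version always moves the blanks to the bottom of each column, and it should also cope with a single column or with columns that all have the same number of unique values.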

3 solutions

Solution 1
2 2022-11-10 21:07:34

Solution 2
1 2022-11-10 21:32:05

Solution 3
0 2022-11-10 21:02:32