How to split string data into a single column without affecting the dataset itself (Python Pandas)?

Question

The dataset I'm working on has a column containing zipcodes. Some entries have only one zipcode, while others have 2, 5, or 10+ zipcodes, like this:

Zipcode(s)
1245
5863, 5682, 1995
6978, 1123, 5659, 34554
4539, 6453

I want to do some simple analysis: apply value_counts() to the column to see which zipcodes are the most common. But I can't do that properly, because most cells contain more than one zipcode. That is also why I want an approach that doesn't modify the dataset itself; I only need a temporary result in which all the zipcodes are split out into a single column.

I've tried splitting them into multiple columns with .str.split(',', n=20, expand=True), but that's not really what I'm looking for. I want them all in a single column.
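The expand=True result can still be turned into a single column: it is a wide DataFrame with one column per zipcode position, padded with missing values for rows that have fewer zipcodes. A minimal sketch, assuming the ", " separator used in the sample data, that melts those columns back into one long column and drops the padding (DataFrame.melt and Series.dropna are standard pandas methods; the sample values are the ones from the question):

```python
import pandas as pd

df = pd.DataFrame({
    "Zipcode(s)": ["1245", "5863, 5682, 1995", "6978, 1123, 5659, 34554", "4539, 6453"]
})

# One column per zipcode position; shorter rows are padded with missing values
wide = df["Zipcode(s)"].str.split(", ", expand=True)

# melt() stacks every column into one long "value" column,
# and dropna() removes the padding cells from the shorter rows
single_column = wide.melt()["value"].dropna()

# df is untouched: only new objects were created above
print(single_column.value_counts())
```

Since `wide` and `single_column` are separate objects, the original column in `df` still holds the unsplit strings.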

Answer 1

I think explode is what you're looking for (pandas.DataFrame.explode, or Series.explode when, as below, you are working on a single column).
It takes every element of each list (the lists you created with the split function) and puts it on its own row.
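When other columns need to stay aligned with each zipcode, DataFrame.explode can be applied to one column of a temporary copy built with assign(), which returns a new DataFrame and leaves df itself untouched. A minimal sketch; the "Store" column is a hypothetical extra column added only for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "Store": ["A", "B"],                 # hypothetical extra column
    "Zipcodes": ["8000", "2000, 2002"],
})

# assign() returns a new DataFrame, so df keeps its original strings
long_df = df.assign(Zipcodes=df["Zipcodes"].str.split(", ")).explode("Zipcodes")

# Each zipcode now has its own row; "Store" is repeated and the original
# row labels (0, 1, 1) are kept, so every zipcode can be traced back
print(long_df)
print(long_df["Zipcodes"].value_counts())
```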

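import pandas as pd

df = pd.DataFrame({
    "Zipcodes": ["8000", "2000, 2002, 3003", "8000, 2002", "3004, 2004, 3003"]
})

print(df)

# Build a new Series of counts; df itself is not modified
counts = (
    df.Zipcodes
    .str.replace(" ", "")  # optional: remove spaces (or skip this and split on ", ")
    .str.split(",")        # each cell becomes a list of zipcodes
    .explode()             # one zipcode per row
    .value_counts()        # how often each zipcode appears
)
print(counts)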

Output: 8000, 2002 and 3003 each appear twice; 2000, 3004 and 2004 each appear once.
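If the spacing around the commas is inconsistent (for example "2000,2002 , 3003"), a regular expression that absorbs optional whitespace on either side of each comma removes the need for the separate replace step. A sketch, assuming pandas 1.4 or later for the regex= argument of str.split; the sample strings are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "Zipcodes": ["8000", "2000,2002 , 3003", "8000,  2002", " 3004, 2004,3003 "]
})

counts = (
    df["Zipcodes"]
    .str.strip()                        # drop leading/trailing spaces in each cell
    .str.split(r"\s*,\s*", regex=True)  # split on commas with any surrounding spaces
    .explode()                          # one zipcode per row
    .value_counts()
)
print(counts)
```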

Answer 2

You can use the Python snippet below:

import pandas as pd

df = pd.DataFrame({
    "Zipcode(s)": ["1245", "5863, 5682, 1995", "6978, 1123, 5659, 34554", "4539, 6453"]
})
# Split each cell into a list, keeping the result separate so df is not overwritten
split_lists = df["Zipcode(s)"].map(lambda zcode: zcode.split(", "))
# Flatten the list of lists into one flat list of zipcodes
zipcodes = sum(split_lists.to_list(), [])
# Create a separate (dummy) DataFrame with a single zipcode column
dummydf = pd.DataFrame({"Zipcode(s)": zipcodes})
print(dummydf["Zipcode(s)"].value_counts())

Output:

1245     1
5863     1
5682     1
1995     1
6978     1
1123     1
5659     1
34554    1
4539     1
6453     1
Name: Zipcode(s), dtype: int64
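On the flattening step: sum(list_of_lists, []) builds a new list on every addition, which gets slow as the number of rows grows. A sketch of the same idea using the standard library's itertools.chain.from_iterable, which flattens in a single pass, again keeping the split lists in a separate variable so the original column is not overwritten:

```python
from itertools import chain

import pandas as pd

df = pd.DataFrame({
    "Zipcode(s)": ["1245", "5863, 5682, 1995", "6978, 1123, 5659, 34554", "4539, 6453"]
})

# Split into lists in a separate Series; df["Zipcode(s)"] is left unchanged
split_lists = df["Zipcode(s)"].str.split(", ")

# Flatten all the lists into one sequence of zipcodes in a single pass
zipcodes = pd.Series(list(chain.from_iterable(split_lists)), name="Zipcode(s)")

print(zipcodes.value_counts())
```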

2 answers

Answer 1: score 1, accepted, 2022-12-31 10:00:16

Answer 2: score 0, 2022-12-31 10:46:49