How to add duplicated rows to a Pandas DF?

Question

I appreciate the help in advance!

The question may seem weird at first, so let me illustrate what I am trying to accomplish:

I have this df of cities and abbreviations:

I need to add another column called 'Queries', and those queries are in a list as follows:

queries = ['Document Management','Document Imaging','Imaging Services']

The trick, though, is that I need to duplicate my df rows for each query in the list. For instance, for row 0 I have PHOENIX, AZ. I now need 3 rows saying PHOENIX, AZ, 'query[n]'.

Something that would look like this:

Of course I created that manually, but I need to scale it to a large number of cities and a large list of queries.

This sounds simple, but I've been trying for some hours now and I don't see how to engineer any code for it. Again, thanks for the help!

Answer 1

Here is one way, using .explode():

import pandas as pd

df = pd.DataFrame({'City_Name': ['Phoenix', 'Tucson', 'Mesa', 'Los Angeles'],
                   'State': ['AZ', 'AZ', 'AZ', 'CA']})

# 'Query' is a column of tuples
df['Query'] = [('Doc Mgmt', 'Imaging', 'Services')] * len(df.index)

# ... and explode 'unpacks' the tuples, putting one item on each line
df = df.explode('Query')
print(df)

     City_Name State     Query
0      Phoenix    AZ  Doc Mgmt
0      Phoenix    AZ   Imaging
0      Phoenix    AZ  Services
1       Tucson    AZ  Doc Mgmt
1       Tucson    AZ   Imaging
1       Tucson    AZ  Services
2         Mesa    AZ  Doc Mgmt
2         Mesa    AZ   Imaging
2         Mesa    AZ  Services
3  Los Angeles    CA  Doc Mgmt
3  Los Angeles    CA   Imaging
3  Los Angeles    CA  Services
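
The same idea can be applied directly to the queries list from the question. This is a minimal sketch, assuming pandas 0.25 or later (where DataFrame.explode is available); the two sample cities are made up, and the reset_index(drop=True) step is an optional extra that replaces the repeated index labels (0, 0, 0, 1, ...) with a fresh 0..n-1 index.

```python
import pandas as pd

# Sample data (illustrative only; the question's real frame is not shown)
df = pd.DataFrame({'City_Name': ['Phoenix', 'Tucson'],
                   'State': ['AZ', 'AZ']})
queries = ['Document Management', 'Document Imaging', 'Imaging Services']

# Give every row the full tuple of queries, then explode to one query per row
expanded = (df.assign(Queries=[tuple(queries)] * len(df))
              .explode('Queries')
              .reset_index(drop=True))
print(expanded)
```

The row count grows to len(df) * len(queries), so for very large inputs it is worth checking that the result fits in memory.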

Answer 2

I'm new to Python myself, but I would get around it by creating n identical data frames without "Query" (n = number of unique query values). Then, for each data frame, create a new column holding one of the "Query" values. Finally, stack all the data frames together with pd.concat (DataFrame.append did the same job in older pandas, but it was removed in pandas 2.0). A short example:

import pandas as pd

adf1 = pd.DataFrame([['city1', 'state1'], ['city2', 'state2']])
# .copy() matters: a plain `adf2 = adf1` would make both names refer to the same frame
adf2 = adf1.copy()

adf1['query'] = 'doc management'
adf2['query'] = 'doc imaging'

# Stack the two frames on top of each other
df = pd.concat([adf1, adf2])
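
The two-frame example extends to any number of queries with a loop. This is a minimal sketch with made-up sample data and column names; it relies on assign() returning a new frame on each pass, so the original frame is never modified, and it stacks the pieces with pd.concat.

```python
import pandas as pd

cities = pd.DataFrame({'city': ['city1', 'city2'],
                       'state': ['state1', 'state2']})
queries = ['doc management', 'doc imaging', 'imaging services']

# One copy of the frame per query; assign() returns a new frame each time
parts = [cities.assign(query=q) for q in queries]

# Stack the copies and renumber the rows 0..n-1
df = pd.concat(parts, ignore_index=True)
print(df)
```

The rows come out grouped by query rather than by city; sorting on the city column afterwards would regroup them if that layout is needed.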

Another method, if there are many types of queries: create a dummy column, say 'key', in both the original data frame and the query data frame, and merge the two on 'key'.

adf = pd.DataFrame([['city1','state1'],['city2','state2']])
q = pd.DataFrame([['doc management'],['doc imaging']])

adf['key'] = 'key'
q['key'] = 'key'

df = pd.merge(adf, q, on='key', how='outer')
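
On pandas 1.2 or later, merge also accepts how='cross', which builds the same Cartesian product without the dummy 'key' column. A minimal sketch, again with made-up sample data and column names:

```python
import pandas as pd

cities = pd.DataFrame({'city': ['city1', 'city2'],
                       'state': ['state1', 'state2']})
q = pd.DataFrame({'query': ['doc management', 'doc imaging']})

# Cross join: every row of `cities` is paired with every row of `q`
df = cities.merge(q, how='cross')
print(df)
```

Because no helper column is created, there is nothing to drop afterwards, and each city's rows stay together in the result.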

More advanced users should have better ways. This is a temporary solution if you are in a hurry.

Answer 3

You should definitely go with jsmart's answer, but I'm posting this as an exercise.

This can also be achieved by exporting the original cities/towns dataframe (df) to a list of records, manually duplicating each record for each query, then reconstructing the final dataframe.

The entire thing can fit in a single line, and is even relatively readable if you can follow what's going on ;)

pd.DataFrame([{**record, 'query': query}
               for query in queries
               for record in df.to_dict(orient='records')])
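
For completeness, here is a self-contained version of the one-liner. It is a minimal sketch with made-up sample data; note that it swaps the two for clauses so that each city's rows stay together, as in the layout the question asks for, whereas the order above groups the rows by query.

```python
import pandas as pd

# Sample data (illustrative only)
df = pd.DataFrame({'City_Name': ['Phoenix', 'Tucson'],
                   'State': ['AZ', 'AZ']})
queries = ['Document Management', 'Document Imaging', 'Imaging Services']

# Outer loop over records, inner loop over queries: rows are grouped by city
result = pd.DataFrame([{**record, 'query': query}
                       for record in df.to_dict(orient='records')
                       for query in queries])
print(result)
```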

3 answers

Answer 1
3 Accepted 2020-08-15 23:16:43

Answer 2
1 2020-08-15 22:58:10

Answer 3
1 2020-08-15 23:31:16
