Use a pandas DataFrame column's values and paste them into the next row

Question

I am new to pandas DataFrames in Python. I am working on code that reads a CSV file structured as shown below:

Id, Title, Body, Tags, Date
1, First question, My first question, robot Python, 2015
2, Second question, My second question, C++ Python, 2015
3, Third question, My third question, Selenium, 2016
4, Fourth question, My fourth question, Java C++, 2016

I have loaded this CSV into my Python code using the pandas library.

I am trying to get a DataFrame that looks like this:

Id, Title, Body, Tags, Date
1, First question, My first question, robot, 2015
2, First question, My first question, Python, 2015
3, Second question, My second question, C++, 2015
4, Second question, My second question, Python, 2015
.......

Please let me know if there is a suitable way to achieve this.

Answer 1

You can do it like this:

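import pandas as pd

df = df.drop(columns=["Id"])
rows = []
for _, row in df.iterrows():
    # Emit one copy of the row per whitespace-separated tag.
    for tag in row["Tags"].split():
        new_row = row.copy()  # copy so each appended row keeps its own tag
        new_row["Tags"] = tag
        rows.append(new_row)
# DataFrame.append was removed in pandas 2.0, so build the frame in one step.
df2 = pd.DataFrame(rows, columns=df.columns).reset_index(drop=True)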

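Here df is your DataFrame and df2 is the updated one. You iterate over each row of df and split the "Tags" value into as many tags as it contains (the maximum in your example is 2, but I assume you could have more). Each row is then added to the new DataFrame df2 once per individual tag. (I dropped Id and reset the index, because otherwise the original index values would be kept.)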

   Title            Body                Tags      Date
0  First question   My first question   robot     2015
1  First question   My first question   Python    2015
2  Second question  My second question  C++       2015
3  Second question  My second question  Python    2015
4  Third question   My third question   Selenium  2016
5  Fourth question  My fourth question  Java      2016
6  Fourth question  My fourth question  C++       2016
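
As an alternative to the row-by-row loop, the same reshaping can probably be done without iterating, using str.split followed by DataFrame.explode (available since pandas 0.25). This is only a sketch, under the assumption that the file looks exactly like the sample in the question: the sample text is inlined through io.StringIO for illustration, and skipinitialspace=True is used so that the blank after each comma does not end up in the column names or values. Renumbering Id at the end is my reading of the desired output, not something the answer above does.

```python
import io

import pandas as pd

# Sample data copied from the question; skipinitialspace drops the blank after each comma.
csv_text = """Id, Title, Body, Tags, Date
1, First question, My first question, robot Python, 2015
2, Second question, My second question, C++ Python, 2015
3, Third question, My third question, Selenium, 2016
4, Fourth question, My fourth question, Java C++, 2016
"""
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

# Turn each Tags string into a list, then emit one row per list element.
df2 = (
    df.assign(Tags=df["Tags"].str.split())
      .explode("Tags")
      .reset_index(drop=True)
)

# Renumber Id so it runs 1..n over the expanded rows, as in the desired output.
df2["Id"] = range(1, len(df2) + 1)

print(df2)
```

For a real file you would pass its path to pd.read_csv instead of the StringIO object.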

Answer 2

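It is best practice to post the complete code for what you are trying to do, so that we can help you fully.

I think all you are trying to do is replace some values. You can use this pattern.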

df['column name'] = df['column name'].replace(['old value'],'new value')

So, for your example:

# Map each old value to its replacement; dict keys must be unique and matching is case-sensitive.
df['Title'] = df['Title'].replace({'Second question': 'First question',
                                   'Third question': 'Second question'})

And so on.
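
To make the replace pattern above concrete, here is a minimal runnable sketch. The small frame is hypothetical sample data built from the titles in the question, not the asker's real file. Note that Series.replace swaps values one for one and keeps the row count, so on its own it does not produce one row per tag.

```python
import pandas as pd

# Hypothetical sample frame using titles from the question.
df = pd.DataFrame({"Title": ["First question", "Second question", "Third question"]})

# Map each old value to its replacement; matching is exact and case-sensitive.
df["Title"] = df["Title"].replace({"Second question": "First question"})

print(df["Title"].tolist())
print(len(df))  # the number of rows is unchanged by replace
```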

Answer 1
Score 1, accepted, 2020-10-31 11:41:06

Answer 2
Score -1, 2020-10-31 11:41:05
