How to add data entered by the user to a pandas DataFrame column?

I have the following dataset:

import pandas as pd

data = {'type': ['train', 'train', 'train', 'pool', 'pool', 'pool', 'pool', 'pool'],
        'index': [0, 1, 2, 3, 4, 5, 6, 7],
        'corpus': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'],
        'labels': [[1, 0, 0], [0, 1, 0], [1, 1, 0], None, None, None, None, None]}


data = pd.DataFrame(data)

data

What I want to do is show the user the text in the "corpus" column for each row whose "type" is "pool", so that they can add some labels to it. After that, my program should insert the labels the user entered for each displayed text into the dataset. With the code below, however, the program takes the last label entered by the user and overwrites all the labels in the original dataset.

for row, c in data.iterrows():
    if c['type'] == 'pool':
        a = input(f"Please enter your labels for the below text: \n\n {c['corpus']}")
        data['labels'] = a

So, my current output is:

    type    corpus   labels
0   train       a    0,0,1
1   train       b    0,0,1
2   train       c    0,0,1
7   pool        h    0,0,1
4   pool        e    0,0,1
3   pool        d    0,0,1
5   pool        f    0,0,1
6   pool        g    0,0,1

My goal is:

    type    corpus   labels
0   train       a   [1, 0, 0]
1   train       b   [0, 1, 0]
2   train       c   [1, 1, 0]
7   pool        h   [1, 0, 0]
4   pool        e   [0, 0, 1]
3   pool        d   [1, 1, 1]
5   pool        f   [0, 1, 0]
6   pool        g   [0, 0, 1]

There are two things to fix in the code:

Firstly, if you assign a to data['labels'], you are assigning it to the whole column (this is why you get the same value in all rows).
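A minimal illustration on a throwaway three-row frame (the frame and the label strings are made up for this example) shows the difference between assigning to the column and assigning to a single cell with df.at:

```python
import pandas as pd

df = pd.DataFrame({"corpus": ["a", "b", "c"], "labels": [None, None, None]})

# Assigning to df["labels"] broadcasts the value to EVERY row of the column.
df["labels"] = "0,0,1"
after_column = df["labels"].tolist()
print(after_column)  # ['0,0,1', '0,0,1', '0,0,1']

# df.at[row_label, column_name] targets exactly one cell.
df.at[1, "labels"] = "1,1,0"
after_cell = df["labels"].tolist()
print(after_cell)  # ['0,0,1', '1,1,0', '0,0,1']
```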

Secondly, input always returns a string, so assigning its return value stores a string, while the other rows contain lists of ints. To solve this, we can use split to get the elements, map int over them, and assign the resulting list to a single cell using df.at:
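The string-to-list conversion can be tried on its own first. The value of raw below is just an example of what a user might type; like the answer, this assumes the labels are entered comma-separated:

```python
raw = "1, 0,1"  # example of what input() could return for one text

parts = raw.split(",")          # ['1', ' 0', '1']: still strings
labels = list(map(int, parts))  # int() ignores surrounding whitespace
print(labels)  # [1, 0, 1]
```

In Python 3, map returns a lazy iterator, which is why the result is wrapped in list() before it is stored. An empty entry such as "1,,0" makes int() raise ValueError.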

import pandas as pd

data = {
    "type": ["train", "train", "train", "pool", "pool", "pool", "pool", "pool"],
    "index": [0, 1, 2, 3, 4, 5, 6, 7],
    "corpus": ["a", "b", "c", "d", "e", "f", "g", "h"],
    "labels": [[1, 0, 0], [0, 1, 0], [1, 1, 0], None, None, None, None, None],
}


data = pd.DataFrame(data)
print(data)

for idx, row in data.iterrows():
    if row["type"] == "pool":
        a = input(f"Please enter your labels for the below text: \n\n {row['corpus']} ")
        data.at[idx, "labels"] = list(map(int, a.split(",")))
print(data)
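If you want to validate what the user types, or run the loop without a keyboard (for example in a test), one option is to pass the prompt function in as a parameter. This is a sketch, not part of the original answer: parse_labels and label_pool_rows are hypothetical helper names, and the rule that every label list has exactly three 0/1 values is an assumption based on the sample data. The core step is still the same data.at[idx, "labels"] = ... assignment.

```python
import pandas as pd


def parse_labels(text, n_labels=3):
    """Convert a string like "1,0,1" into [1, 0, 1].

    Assumption: labels are comma-separated 0/1 values, n_labels per text.
    Raises ValueError for non-integers, the wrong count, or values other than 0/1.
    """
    labels = [int(part) for part in text.split(",")]  # int() tolerates spaces
    if len(labels) != n_labels or any(x not in (0, 1) for x in labels):
        raise ValueError(f"expected {n_labels} comma-separated 0/1 values, got {text!r}")
    return labels


def label_pool_rows(df, ask=input):
    """Ask for labels for every 'pool' row and store each list in its own cell.

    `ask` defaults to the built-in input(); any callable that takes a prompt
    string and returns a string works, which makes the loop scriptable.
    """
    for idx, row in df.iterrows():
        if row["type"] == "pool":
            answer = ask(f"Please enter your labels for the below text: \n\n {row['corpus']} ")
            df.at[idx, "labels"] = parse_labels(answer)  # one cell, not the whole column
    return df


# Demo with scripted answers instead of typing (chosen to match the goal table).
data = pd.DataFrame({
    "type": ["train", "train", "train", "pool", "pool", "pool", "pool", "pool"],
    "index": [0, 1, 2, 3, 4, 5, 6, 7],
    "corpus": ["a", "b", "c", "d", "e", "f", "g", "h"],
    "labels": [[1, 0, 0], [0, 1, 0], [1, 1, 0], None, None, None, None, None],
})
answers = iter(["1,1,1", "0,0,1", "0,1,0", "0,0,1", "1,0,0"])  # for d, e, f, g, h
label_pool_rows(data, ask=lambda prompt: next(answers))
print(data[["type", "corpus", "labels"]])
```

In interactive use you would simply call label_pool_rows(data), which falls back to input(). Raising ValueError on a malformed answer is one design choice; a loop that re-prompts until parse_labels succeeds would be another.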

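Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reproduce them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.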