How to quickly add values from a large list to the matching rows of a Python pandas DataFrame

Question

I have a large csv file with the following format (example); the report_date column is currently empty:

| ids | disease_code | report_date |
| --- | ------------ | ----------- |
| 10  |    I202      |             |
| 11  |    I232      |             |
| 11  |    I242      |             |

I generated a list of tuples from a data source, like the following:

[(10, ['I202'], '2021-10-22'), (11, ['I232', 'I242'], '2021-11-22'), (11, ['I232', 'I242'], '2021-11-12'), ...]

Each tuple holds patient_id, disease_code and reported_date, in that order. The dates are in the same order as the disease codes, so for a patient with more than one disease the reported dates are unfortunately split across two tuples. Now I want to fill the report_date column by matching the first two values of each tuple against the current csv, like this:

| ids | disease_code | report_date |
| --- | ------------ | ----------- |
| 10  |    I202      | 2021-10-22  |
| 11  |    I232      | 2021-11-22  |
| 11  |    I242      | 2021-11-12  |

I tried a nested loop, but it looks like it would take 480 hours to complete. I believe there is a simpler answer, but I could not figure it out. Any hint would be appreciated.

Answer 1

First, you can create a dataframe from your data. You'll see that the "disease_code" column contains a list of values, just as you mentioned:

>>> import pandas as pd
>>> df = pd.DataFrame(
...     [(10, ['I202'], "2021-10-22"), (11, ['I232', 'I242'], "2021-11-22"), (11, ['I232', 'I242'], "2021-11-12")],
...     columns=["ids", "disease_code", "report_date"],
... )
>>> df["report_date"] = pd.to_datetime(df["report_date"])  # Parse the date strings
>>> df
   ids  disease_code report_date
0   10        [I202]  2021-10-22
1   11  [I232, I242]  2021-11-22
2   11  [I232, I242]  2021-11-12

Now you need to split the values in the "disease_code" column, repeating the values in the other columns. pd.DataFrame.explode does exactly that: it transforms each element of a list-like column into its own row:

>>> df.explode(["disease_code"])  # Explode the "disease_code" column
   ids disease_code report_date
0   10         I202  2021-10-22
1   11         I232  2021-11-22
1   11         I242  2021-11-22
2   11         I232  2021-11-12
2   11         I242  2021-11-12
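
Note that explode pairs every code in a list with that tuple's date, so patient 11 ends up with four rows instead of two. The output wanted in the question suggests a positional pairing instead. The following is only a sketch under that reading: it assumes the k-th tuple for a patient carries the date of the k-th code in its list, and that tuples for one patient appear in that order. The names `records`, `lookup`, `csv_df` and `pos` are illustrative and do not come from the question.

```python
import pandas as pd

# Sample tuples from the question (dates kept as strings for simplicity).
records = [
    (10, ["I202"], "2021-10-22"),
    (11, ["I232", "I242"], "2021-11-22"),
    (11, ["I232", "I242"], "2021-11-12"),
]
lookup = pd.DataFrame(records, columns=["ids", "codes", "report_date"])

# Position of each tuple within its patient: 0 for the first, 1 for the second, ...
lookup["pos"] = lookup.groupby("ids").cumcount()

# Assumption: the k-th tuple of a patient holds the date of the k-th code.
# This raises IndexError if a patient has more tuples than codes.
lookup["disease_code"] = [
    codes[pos] for codes, pos in zip(lookup["codes"], lookup["pos"])
]
lookup = lookup[["ids", "disease_code", "report_date"]]

# Stand-in for the csv contents (report_date column omitted because it is empty).
csv_df = pd.DataFrame(
    {"ids": [10, 11, 11], "disease_code": ["I202", "I232", "I242"]}
)

# A single vectorised join replaces the nested loop.
result = csv_df.merge(lookup, on=["ids", "disease_code"], how="left")
print(result)
```

If the positional assumption does not hold for the real data, the explode approach above still gives every candidate (code, date) pair, and some other rule is needed to pick one date per pair.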

Answer 2

To build the new DataFrame, use a list comprehension:

import pandas as pd

L = [(10, ['I202'], '2021-10-22'), 
     (11, ['I232', 'I242'], '2021-11-22'),
     (11, ['I232', 'I242'], '2021-11-12')]

# One row per (id, code) pair: each code in the list gets that tuple's date
df1 = pd.DataFrame([(a, x, c) for a, b, c in L for x in b], 
                   columns=["ids", "disease_code", "report_date"])
print (df1)
   ids disease_code report_date
0   10         I202  2021-10-22
1   11         I232  2021-11-22
2   11         I242  2021-11-22
3   11         I232  2021-11-12
4   11         I242  2021-11-12

Then use DataFrame.merge to join it onto the original DataFrame df. Because df1 contains duplicate (ids, disease_code) pairs, remove them first. Note that drop_duplicates keeps the first row for each pair, so I242 gets 2021-11-22 here rather than the 2021-11-12 shown in the question:

print (df)
   ids disease_code  report_date
0   10         I202          NaN
1   11         I232          NaN
2   11         I242          NaN

print (df1.drop_duplicates(['ids','disease_code']))
   ids disease_code report_date
0   10         I202  2021-10-22
1   11         I232  2021-11-22
2   11         I242  2021-11-22

df = (df.drop('report_date', axis=1)
        .merge(df1.drop_duplicates(['ids','disease_code']), 
               on=['ids','disease_code']))
print (df)
   ids disease_code report_date
0   10         I202  2021-10-22
1   11         I232  2021-11-22
2   11         I242  2021-11-22
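
One thing to watch in the merge above: DataFrame.merge performs an inner join by default, so csv rows that have no matching tuple would silently disappear. A minimal sketch of a more defensive variant follows, using `how="left"` to keep every csv row and `validate="many_to_one"` to fail loudly if the lookup still has duplicate keys. The extra row (12, 'I999') and the name `csv_df` are hypothetical and only there to show the unmatched case.

```python
import pandas as pd

L = [(10, ['I202'], '2021-10-22'),
     (11, ['I232', 'I242'], '2021-11-22'),
     (11, ['I232', 'I242'], '2021-11-12')]

# Same flattening as above: one row per (id, code) pair.
df1 = pd.DataFrame([(a, x, c) for a, b, c in L for x in b],
                   columns=["ids", "disease_code", "report_date"])
lookup = df1.drop_duplicates(['ids', 'disease_code'])

# Stand-in for the csv; the last row is hypothetical and has no matching tuple.
csv_df = pd.DataFrame({"ids": [10, 11, 11, 12],
                       "disease_code": ["I202", "I232", "I242", "I999"]})

out = csv_df.merge(lookup,
                   on=['ids', 'disease_code'],
                   how='left',               # keep csv rows without a match
                   validate='many_to_one')   # raise if lookup keys repeat
print(out)
```

Unmatched rows keep a missing report_date, which makes it easy to count how many csv rows the tuple list did not cover.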

Answer 1
1 2022-02-18 06:13:16

Answer 2
0 2022-02-18 06:16:47
